Species abundances surpass richness effects in the biodiversity-ecosystem function relationship across marine fishes

We defined ecosystem functions as the movement or storage of energy or materials through an ecosystem41,42. We therefore adapted and applied a relatively novel approach for calculating biomass production40, which is quantified as the rate of biomass produced via ontogenetic growth of all surveyed individuals over 500 m2 over the course of one day, instead of using a proxy for functioning (e.g., standing stock biomass as seen in ref. 22). We first conducted a systematic literature review on all published von Bertalanffy growth models on marine teleost fishes from around the world to build a machine learning model to predict growth coefficients for fishes without empirically measured growth curves (as in ref. 39). Predictive models used a combination of ecological and environmental trait data, while also accounting for the aging method used (see below). We then used predicted growth coefficients to calculate the biomass production of reef fishes censused from one of the largest standardised global survey programs in the world: the Reef Life Survey program37,38. Although the Reef Life Survey program specifically censuses relatively shallow reef habitats, some of the species recorded have lower depth limits beyond 1000 m, hence we included all fishes in our systematic literature review. We then used Hill numbers to holistically capture multiple axes of biodiversity and assessed the biodiversity-ecosystem function (BEF) relationship using the best-fitting Hill diversity metric. All models accounted for potential sources of environmental variation (i.e., water visibility, survey depth, temperature) and spatial heterogeneity (i.e., realm- and site-level effects).

Data collation

We expanded on the framework developed by ref. 69 to generate standardised species-level estimates of somatic growth across marine teleost fishes. In short, we used a derivation of the von Bertalanffy Growth Model70 known as Kmax, which can be interpreted as the rate at which an individual along a specific growth trajectory would reach its asymptotic size if it grew to the species’ maximum recorded size69. Please see ref. 69 for a detailed description of the quantification of Kmax and its associated equations. Because ref. 69 only considered coral reef fishes, we expanded on this by (1) first conducting a systematic literature review of all published von Bertalanffy growth curves for all marine fishes, including reef and non-reef habitats; and (2) generating new predictive models to predict Kmax across all marine teleost fishes. Although we are estimating growth rates for fishes surveyed in relatively shallow coastal, hard substratum-associated habitats, fishes found in pelagic systems or from deeper waters have been recorded on Reef Life Survey transects. To ensure accurate growth estimates for all available fishes, we included observations from all non-reef habitats in our predictive models.
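For orientation, the von Bertalanffy growth function gives length at age as L(t) = L∞(1 − exp(−K(t − t0))). The minimal sketch below evaluates that curve and then rescales K to the species' maximum length. The rescaling shown goes through Pauly's growth performance index (φ' = log10 K + 2 log10 L∞); this form is an assumption for illustration and is not given in this section, so ref. 69 remains the authority for the equations actually used. All function names are hypothetical.

```python
import math


def vbgf_length(age_yr, l_inf, k, t0=0.0):
    """Length at age under the von Bertalanffy growth function."""
    return l_inf * (1.0 - math.exp(-k * (age_yr - t0)))


def kmax_from_curve(k, l_inf, l_max):
    """Rescale K to the species' maximum length.

    ASSUMPTION: the rescaling keeps the growth performance index
    phi' = log10(K) + 2*log10(L_inf) fixed while swapping L_inf for L_max.
    See ref. 69 for the published definition of Kmax.
    """
    phi_prime = math.log10(k) + 2.0 * math.log10(l_inf)
    return 10.0 ** (phi_prime - 2.0 * math.log10(l_max))


# A curve fitted with L_inf = 30 cm and K = 0.4 per year, for a species
# whose maximum recorded length is 60 cm.
kmax = kmax_from_curve(k=0.4, l_inf=30.0, l_max=60.0)
```

Under this assumed form, a curve whose asymptotic length already equals the maximum recorded length keeps its original K, and curves fitted to smaller individuals are scaled down.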

We collated a list of growth studies from FishBase71 (accessed May 2022; n = 5770), which we supplemented with a systematic literature review (completed September 2022; n = 1916). We searched the ISI Web of Science database using the search string fish* AND (growth OR von Bertalanffy), which returned 141649 articles. We filtered articles based on their titles and abstracts, removed all duplicates between the two sources, and corrected any typographical or rounding errors recorded from FishBase. We only included studies where species were collected from marine habitats (we omitted all freshwater species that were recorded in euryhaline habitats) and those where the discrete geographic locations were provided (see ref. 69 for a detailed explanation). This yielded 7686 growth curves encompassing 1480 species. From each study, we extracted the von Bertalanffy growth parameters L∞ (asymptotic length) and K (growth coefficient), the length measurement type (i.e., total length, standard length, or fork length), the method used for aging (i.e., mark recapture, growth rings (e.g., otolith, scale, or vertebrae), length frequency, or unknown), and the geographic location. We then used the length-length conversion factors from FishBase to transform all estimates of L∞ to total length in cm.

With the mined growth values from the literature, we generated predictive Extreme Gradient Boosting models to use a series of traits and environmental variables to (1) explain variation in Kmax across fishes with empirically measured growth values and (2) predict Kmax for fishes lacking data (as in ref. 39). Specifically, we predicted growth trajectories for all the fishes recorded using a standardised global underwater visual census program, the Reef Life Survey program (accessed 16 December 2021). In short, reef fish communities were censused along 50 m-long transects along coral and rocky reefs around the world. Trained divers recorded the identity, size class, and abundance of all fishes across one-to-two 250 m2-area blocks along a transect along relatively shallow depth ranges (<20 m). Detailed explanations of the survey methods can be found online (www.reeflifesurvey.com). We removed all surveys that were only composed of a single block from analyses (n = 171). For each species, including those from the mined literature and those recorded from the Reef Life Survey, we recorded the following trait data: maximum body size (total length, cm), trophic level (continuous), maximum depth (m), and position in the water column (categorical). Most of the trait data came from FishBase or the primary literature; when data were not available at the species level, we used estimates at the genus or family level. We used a 2° buffer around their respective locations and the years that they were sampled to generate an estimate of sea surface temperature (°C) using raster data from the National Oceanic and Atmospheric Administration (https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.html).

Predicting growth

We combined the trait data (i.e., maximum body size, maximum depth, trophic level, and position in the water column) with estimates of sea surface temperature and the aging method (i.e., mark recapture, otolith rings, scale rings, other rings, length frequency, and unknown) in an Extreme Gradient Boosting framework72,73 to explain the variation and predict the growth trajectories (Kmax) of all the fishes recorded on Reef Life Survey (i.e., all surveyed fishes were given a predicted growth trajectory). Extreme Gradient Boosting models are a form of machine learning algorithm that combines multiple decision trees with a boosting algorithm, exhibits relatively high predictive accuracy, and is able to handle non-linearities and complex interactions72,73. We modelled Kmax using a Gamma loss function and selected hyperparameters using the two-step tuning method from refs. 39,69, which involved varying the learning rate (eta; 0.1–0.9), regularising parameter (gamma; 0.1–0.9), maximum tree depth (max_depth; 5, 10, or 15), and the subsample rate (0.1–0.9). The final hyperparameters (eta = 0.098, gamma = 0.89, max_depth = 10, subsample = 0.096) reduced the negative log likelihood from 2.88 to 1.70.
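The search space described above can be written down as a plain grid. This sketch only enumerates candidate settings; it does not reproduce the two-step tuning procedure of refs. 39,69. The 0.1 spacing between candidates and the `reg:gamma` objective name (XGBoost's gamma regression objective) are assumptions, since the text gives only the ranges and states that a Gamma loss function was used.

```python
from itertools import product

# Ranges are from the text; the 0.1 step between candidates is an ASSUMPTION.
STEPS = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

SEARCH_SPACE = {
    "eta": STEPS,              # learning rate
    "gamma": STEPS,            # regularising parameter
    "max_depth": [5, 10, 15],  # maximum tree depth
    "subsample": STEPS,        # row subsample rate
}


def candidate_settings(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


# Final values as reported in the text.
FINAL_PARAMS = {
    "objective": "reg:gamma",  # ASSUMED mapping of "Gamma loss function"
    "eta": 0.098,
    "gamma": 0.89,
    "max_depth": 10,
    "subsample": 0.096,
}

n_candidates = sum(1 for _ in candidate_settings(SEARCH_SPACE))
```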

We used a cross-validation procedure to model and predict Kmax. The model was built and trained on 80% of the data and assessed against the remaining 20% test set. We assessed the model’s bias as the difference between the observed Kmax values in the test set and the corresponding predicted values; a well-fitted model should have a bias close to zero. The model performance was also assessed by extracting the R2 (i.e., the goodness of fit) value from a linear model of log(predicted) against log(observed) values from the test set. Finally, we predicted Kmax values for all the species recorded from the Reef Life Survey program. Due to the stochastic model-building procedure of Extreme Gradient Boosting models, we bootstrapped the entire process for 1000 iterations, using the XGBoost package v.1.4.1.1 (refs. 72,73) in R v.4.1.0 (ref. 74). We chose to predict growth using the aforementioned method because our XGBoost models achieved a low median prediction bias of −0.005 (minimum, maximum: −0.01, 0.0008) and a high median precision R2 of 0.72 (0.65, 0.76; Fig. S5), whereas phylogenetic predictive models (e.g., ref. 75) can produce predictions with an accuracy as low as 51%. As in refs. 39,69, maximum body size was the best predictor of Kmax (median variable importance: 43.1%), followed by sea surface temperature (21.1%), maximum depth (14.6%), trophic level (12.1%), aging method (5.3%), and position in the water column (3.7%; Fig. S6; Table S3).
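The two test-set diagnostics can be sketched as follows. Averaging the per-row differences into a single bias value is an assumption (the text does not say how rows were summarised within an iteration), and the function names are hypothetical. For a simple linear fit with one predictor, R2 equals the squared Pearson correlation, which is what the second function computes.

```python
import math


def prediction_bias(observed, predicted):
    """Mean of (observed - predicted); near zero for an unbiased model.

    ASSUMPTION: rows are summarised by their mean.
    """
    diffs = [o - p for o, p in zip(observed, predicted)]
    return sum(diffs) / len(diffs)


def r_squared_loglog(observed, predicted):
    """R2 of a simple linear fit of log(predicted) against log(observed)."""
    x = [math.log(o) for o in observed]
    y = [math.log(p) for p in predicted]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Toy test set: predictions are exactly twice the observed values, so the
# log-log fit is perfect even though the predictions are biased.
observed = [0.2, 0.5, 1.0, 2.0]
predicted = [2.0 * value for value in observed]
bias = prediction_bias(observed, predicted)
r2 = r_squared_loglog(observed, predicted)
```

The toy case shows why both diagnostics are reported: a model can track the observed values perfectly in log space while still being systematically offset.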

Calculating productivity

We calculated productivity following the methods set forth by ref. 40 using our newly estimated values of Kmax. In short, productivity is measured as the amount of biomass acquired via somatic growth of all individuals in a community over the course of one day.

$${Productivity}={Expected\, biomass}_{{t}} - {Observed\, biomass}_{t-1}$$

(1)

Here, the expected biomass is the standing biomass plus the expected biomass arising via somatic growth, the observed biomass is the community standing biomass, and t is the day. This generates a functional, process-based quantification of biomass production41,42,76, which captures the underlying energetic and elemental fluxes experienced by the community compared to static proxies, such as standing stock biomass25,40. To place each individual within their growth trajectories, we randomly drew Kmax values from a truncated normal distribution from their 90% predicted quantile range. We then simulated the growth that would be expressed by each individual over the course of one day and used the difference between the two biomass estimates as the estimate of productivity. The total productivity was therefore calculated as the sum of the somatic growth of all individuals over the course of one day over 1000 bootstrapped simulations40.
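A minimal sketch of one bootstrap draw of Eq. (1) follows. Several pieces are assumptions made for illustration and are not taken from the text or confirmed against ref. 40: each fish grows along a von Bertalanffy curve toward the species' maximum length with rate Kmax (per year), mass comes from a length-weight relationship W = a·L^b with hypothetical coefficients `a` and `b`, and the truncated normal is centred on the midpoint of the 5–95% range.

```python
import math
import random


def draw_kmax(q05, q95, rng):
    """Draw Kmax from a normal distribution truncated to [q05, q95].

    ASSUMPTION: mean at the midpoint, with the bounds at +/-1.645 sd.
    """
    mean = 0.5 * (q05 + q95)
    sd = (q95 - q05) / (2.0 * 1.645)
    while True:  # rejection sampling; about 90% of draws are accepted
        value = rng.gauss(mean, sd)
        if q05 <= value <= q95:
            return value


def one_day_growth_g(length_cm, l_max, kmax, a, b, days=1.0):
    """Mass gained (g) by one fish over `days` of somatic growth."""
    length_cm = min(length_cm, l_max)  # fish at maximum size do not grow
    dt_years = days / 365.0
    # Exact von Bertalanffy increment from the current length.
    new_length = l_max - (l_max - length_cm) * math.exp(-kmax * dt_years)
    return a * new_length ** b - a * length_cm ** b


def transect_productivity(fish_rows, rng):
    """Expected biomass (t) minus observed biomass (t-1), summed over fish."""
    total = 0.0
    for row in fish_rows:
        kmax = draw_kmax(row["k_q05"], row["k_q95"], rng)
        gain = one_day_growth_g(row["length_cm"], row["l_max"], kmax,
                                row["a"], row["b"])
        total += row["count"] * gain
    return total


rng = random.Random(42)
rows = [  # hypothetical survey rows
    {"length_cm": 12.5, "l_max": 40.0, "k_q05": 0.3, "k_q95": 0.7,
     "a": 0.02, "b": 3.0, "count": 15},
    {"length_cm": 30.0, "l_max": 90.0, "k_q05": 0.1, "k_q95": 0.3,
     "a": 0.01, "b": 3.1, "count": 2},
]
productivity_g_per_day = transect_productivity(rows, rng)
```

Repeating `transect_productivity` with fresh draws gives a distribution of daily production for the transect, in the spirit of the 1000 bootstrapped simulations described above.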

Quantifying biodiversity

To capture multiple axes of biodiversity, namely abundance and richness, we used Hill numbers with different scaling parameters. Common indices used to quantify biodiversity (e.g., species richness) can be sensitive to rare species occurring in low abundances, whereas Hill numbers provide a continuous framework to estimate the effective number of species present46,47. Following refs. 12,48, we calculated Hill diversity (D) as:

$$D={\left({\sum}_{i=1}^{S}{p}_{i}{\left(\frac{1}{{p}_{i}}\right)}^{l}\right)}^{(1/l)}$$

(2)

Here, p1, p2, …, pS are species’ relative abundances for species richness S. This formulation of D allows for the differentiation between weights via abundance and rarity by controlling values of the scaling parameter \(l\) (refs. 12,48). Specifically, setting \(l\) to different values (or calculating the limit as \(l\) approaches zero) generates different diversity indices and can drastically influence BEF relationships48. We explicitly tested the inverse Simpson Index (\(l\) = −1), exponentiated Shannon entropy (\(l\) = 0), species richness (\(l\) = 1), and the maximum abundance-emphasised value tested by ref. 48 (\(l\) = 10). Increasing emphasis on rarity will mathematically emphasise abundances: because the lowest possible count for a species is one individual (i.e., a singleton), increasing rarity can only be achieved by increasing the total abundance of all other species in the community48. Therefore, Hill numbers that emphasise rarity more than species richness place de facto emphasis on communities with high total abundances48. We ran individual BEF analyses with each metric of biodiversity (i.e., Simpson Index, exponentiated Shannon entropy, species richness, and abundance index) and compared models by assessing relative effect sizes and using leave-one-out information criterion (LOOIC).
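Eq. (2) is short enough to check by hand. The sketch below implements it directly from species counts, using the exponential of Shannon entropy for the limit as the scaling parameter approaches zero. The function name and the example community are illustrative only.

```python
import math


def hill_diversity(abundances, l):
    """Hill diversity D from Eq. (2) for scaling parameter l."""
    counts = [n for n in abundances if n > 0]
    total = sum(counts)
    p = [n / total for n in counts]
    if abs(l) < 1e-12:
        # Limit as l -> 0: exponentiated Shannon entropy.
        return math.exp(-sum(p_i * math.log(p_i) for p_i in p))
    return sum(p_i * (1.0 / p_i) ** l for p_i in p) ** (1.0 / l)


# One dominant species and two rare ones.
community = [80, 10, 10]
inverse_simpson = hill_diversity(community, -1)  # about 1.52
exp_shannon = hill_diversity(community, 0)       # about 1.89
richness = hill_diversity(community, 1)          # 3 species
rarity_weighted = hill_diversity(community, 10)  # about 8.5
```

For a perfectly even community every value of the scaling parameter returns the species count. In the uneven example, the value at 10 exceeds richness because the rare species carry most of the weight, which is the rarity emphasis described in the text.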

Analyses

We first conducted a Principal Component Analysis (PCA) to assess the correlation of all Hill diversity metrics, which were log-transformed prior to all analyses to reduce the leverage of transects with disproportionately high diversity values. We presented figures with non-logged values, however, to highlight our hypothesis that each additional species unit would have the greatest impact in low-diversity systems, with the effect of each additional species unit having a saturating effect in log space. To assess the biodiversity-ecosystem function relationships, we used Bayesian generalised linear mixed-effects models (GLMMs) with productivity as the response variable. We fitted a separate global BEF model for each Hill diversity metric (i.e., Simpson index, exponentiated Shannon entropy, species richness, and abundance index), with productivity as a function of that metric. We chose the top model as that which produced the smallest LOOIC value. All models included an interaction term between Hill diversity and latitudinal position; latitudinal position was separated into temperate and tropical locations based on their respective geographic realms. We specified a gamma error distribution for all models, parameterised by shape (ϕ) and rate (λi).

$${{\rm{Gamma}}}\left({y}_{i}|\phi,{\lambda }_{i}\right)$$

(3)

$${\lambda }_{i}=\frac{\phi }{{\mu }_{i}}$$

(4)

$$\begin{aligned}\log \left({\mu }_{i}\right)= {} & {\beta }_{0}+{\beta }_{D}\log \left({x}_{D}\right)+{\beta }_{trop}{x}_{trop}+{\beta }_{D*{trop}}\log \left({x}_{D}\right){x}_{{trop}}\\ & +{\beta }_{{sst},{vis},{depth}}{{\bf{X}}}_{i}+{\alpha }_{site}\end{aligned}$$

(5)

$${\alpha }_{site} \sim {\alpha }_{realm}+{\varepsilon }_{site}$$

(6)

$${\varepsilon }_{site} \sim {{\mathrm{Normal}}}\left(0,\,{\sigma }_{site}\right)$$

(7)

$${{{\rm{\beta }}}}_{0} \sim {{\rm{Normal}}}\left(0,\,5\right)$$

(8)

$${\beta }_{D,{trop},\,D*{trop},{sst},{vis},{depth}} \sim {\mathrm{Normal}}\left(0,\,1\right)$$

(9)

Here, yi is the observed community productivity for transect i, β0 is the overall intercept, βD*trop is the interaction term between diversity and the tropics, and βD,trop,sst,vis,depth are the estimated effects for Hill diversity (D), tropics (trop), sea surface temperature (sst), visibility (vis), and depth, respectively. The term Xi is the design matrix of covariates. Site- and realm-level grouping factors are denoted as αsite and αrealm, respectively, following a nested structure of site within realm. We specified weakly informative priors on all estimated effects. We ran an additional model following the top model structure, but used the Boltzmann-Arrhenius relationship to convert temperature to inverse temperature, 1/(kT), where k is Boltzmann’s constant and T is temperature in Kelvin (ref. 53). We modelled per-capita productivity following the same model structure as total community productivity, but we added log-transformed abundance as an offset in the model. Adding an offset in the model allows us to measure productivity on a per-capita basis, while still following the same underlying gamma error distribution.
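Three small pieces of this model structure can be made concrete: the shape/rate link in Eq. (4), the inverse-temperature transform, and the effect of a log-abundance offset. The sketch assumes Boltzmann's constant is expressed in eV per Kelvin and that sea surface temperature arrives in °C; function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # ASSUMED units: eV per Kelvin


def inverse_temperature(temp_celsius):
    """Boltzmann-Arrhenius inverse temperature 1/(kT), T in Kelvin."""
    return 1.0 / (BOLTZMANN_EV_PER_K * (temp_celsius + 273.15))


def gamma_rate(shape, mean):
    """Rate parameter from Eq. (4): lambda_i = phi / mu_i."""
    return shape / mean


def expected_productivity(linear_predictor, abundance=None):
    """Mean productivity under the log link.

    With an offset, log(mu) = log(abundance) + eta, so exp(eta) is the
    per-capita mean and mu is abundance * exp(eta).
    """
    offset = math.log(abundance) if abundance is not None else 0.0
    return math.exp(linear_predictor + offset)


inv_temp_20c = inverse_temperature(20.0)  # roughly 39.6 per eV
mu_total = expected_productivity(linear_predictor=-1.0, abundance=200)
mu_per_capita = mu_total / 200            # equals exp(-1)
```

Because the offset enters with a fixed coefficient of one, the remaining coefficients describe how productivity per individual, rather than per transect, changes with each predictor.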

We followed a similar model structure when modelling changes in abundance (yi) with diversity between temperate and tropical regions, but we specified a negative binomial distribution using the inverse shape parameter (\(\omega\)) for overdispersion.

$${{\rm{Negative\; Binomial}}}\left({y}_{i}\,|{\,\mu }_{i},\omega \right)$$

(10)

$$\begin{aligned}\log \left({\mu }_{i}\right)= {} & {\beta }_{0}+{\beta }_{D}\log \left({x}_{D}\right)+{\beta }_{trop}{x}_{trop}+{\beta }_{D*{trop}}\log \left({x}_{D}\right){x}_{{trop}}\\ & +{\beta }_{{sst},{vis},{depth}}{{\bf{X}}}_{i}+{\alpha }_{{site}}\end{aligned}$$

(11)

$${{\rm{Var}}}\left({y}_{i}\right)=\,{\mu }_{i}+\,\frac{{\mu }_{i}^{2}}{\omega }$$

(12)

$${\alpha }_{site} \, \sim \,{\alpha }_{realm}+\, {\varepsilon }_{site}$$

(13)

$${\varepsilon }_{{site}}\, \sim {{\rm{Normal}}}\left(0,\,{\sigma }_{{site}}\right)$$

(14)

$${{{\rm{\beta }}}}_{0}\, \sim {{\rm{Normal}}}\left(0,\,5\right)$$

(15)

$${\beta }_{D,{trop},{sst},{vis},{depth}}\, \sim {{\mathrm{Normal}}}\left(0,\,1\right)$$

(16)
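The variance function in Eq. (12) shows how the overdispersion parameter works: small values of ω inflate the variance well beyond the mean, while large values recover the Poisson case. A short illustrative check, with a hypothetical function name:

```python
def negbin_variance(mu, omega):
    """Variance of the negative binomial in Eq. (12): mu + mu^2 / omega."""
    return mu + mu ** 2 / omega


strongly_overdispersed = negbin_variance(mu=10.0, omega=2.0)  # 10 + 100/2
nearly_poisson = negbin_variance(mu=10.0, omega=1e6)          # close to 10
```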

For all models, we scaled and centred all continuous fixed effects by subtracting the mean and dividing by the standard deviation prior to analysis. All models used four Markov chain Monte Carlo chains, each run for 4000 iterations after an initial warmup phase of 2000 iterations, using the brms package v.2.16.4 (ref. 77). We assessed model fit using posterior predictive checks and simulated residuals from the DHARMa package v.0.4.5 (ref. 78). Chains converged for all estimated parameters (scale reduction factor Rhat <1.01), and effective sample sizes were all greater than 1800 (Table S2). Using Moran’s I, we detected no spatial autocorrelation in the simulated residuals of any of the models. All modelled coefficients from GLMMs can be found in Table S2. No statistical method was used to predetermine sample size. Surveys comprising a single block were excluded (see above). There were no experiments involved, therefore blinding and randomisation were not used.
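The scaling and centring step is a standard z-score transform. The sketch below assumes the sample standard deviation (n − 1 denominator), which is what R's `scale()` uses by default; the text does not state which denominator was used.

```python
import statistics


def standardise(values):
    """Centre on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # ASSUMPTION: n - 1 denominator
    return [(value - mean) / sd for value in values]


survey_depth_m = [4.0, 8.0, 12.0]  # hypothetical depths
depth_z = standardise(survey_depth_m)
```

After this transform each coefficient is expressed per one standard deviation of its predictor, which is what makes a common Normal(0, 1) prior across predictors reasonable.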

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
