Archived natural DNA samplers reveal four decades of biodiversity change across the tree of life


Specimen bank data

This research complies with all relevant ethical regulations. All study protocols are approved by the German Environment Agency.

The German Environmental Specimen Bank (ESB) has been in operation since the early 1980s. The ESB collects samples of indicator species from various terrestrial and aquatic ecosystems. These species serve as accumulators of environmental chemicals and provide a detailed image of pollution and ecosystem health. To make pollution analysis comparable between years, ESB samples are collected according to highly standardized protocols. Samples are taken at the same time of the year, at identical sites and using identical protocols. Collection is done using sterile equipment to avoid carryover of even trace amounts of pollutants between samples. To ensure preservation of unstable chemical compounds, the samples are stored over liquid nitrogen after collection and for the long term, halting all chemical and biological degradation. To acquire an integrative view of pollution in an ecosystem, ESB samples are large, each one including hundreds to thousands of specimens or tissue compartments (leaves in case of trees)10,50,51,52,53,54,55. Each sample is cryomilled to a fine powder of a grain size of 200 µm, thoroughly homogenizing all traces of chemicals56.

Recent work shows that ESBs are ideal for studies on biodiversity change. ESB indicator species can be considered natural community DNA samplers, which preserve an imprint of the surrounding biological community at the time of sampling. The highly standardized and contamination-free sampling and sample processing conditions, coupled with storage at ultra-low temperatures, make ESB samples perfectly suited for metabarcoding. The cryomilling of large ESB samples also guarantees an even distribution of community DNA traces among the sample and breaks open cell walls of various microorganisms, whose DNA is then uniformly released. Previous studies have already extensively tested and highlighted the suitability of different ESB samples for retrospective biodiversity monitoring12,13,14. Here, we use four different types of ESB samples from terrestrial, limnic and marine habitats as natural community DNA samplers to measure biodiversity change across four decades.

Tree leaves

The ESB collects leaves from three common tree species in Germany, namely European beech (Fagus sylvatica), Norway spruce (Picea abies) and Lombardy poplar (Populus nigra). The leaves are collected once annually or biannually and serve as samplers for aerial pollutants deposited on the leaves51,52,54. ESB leaf samples are collected from different forest ecosystem types, spanning a land use gradient from core zones of national parks to timber forests, forests neighbouring agricultural sites, and urban parks. Sample series from nine sites were included in this study, starting from 1985. Each sample contains hundreds of leaves from at least 15 individual trees, milled to a fine powder51,52,54,56. These samples contain DNA traces of all organisms that interacted with the tree canopy at the time of collection13. Here, we characterize communities of canopy-associated arthropods, fungi and bacteria. The results for arthropod diversity shown in our study represent a novel dataset compared with Krehenwinkel et al.13, including additional and longer time series and an improved DNA extraction protocol to deal with polymerase chain reaction (PCR) inhibitors. Only the poplar datasets as well as the 16S ribosomal DNA (rDNA) datasets were taken from the original dataset by Krehenwinkel et al.13.

Bladderwrack (Fucus vesiculosus)

This macroalgae is widespread along the European coastline, where it makes up a substantial part of the biomass. Marine pollutants are enriched in the tissue of the algae, making it an ideal sentinel species for pollution monitoring53. Three sites have been sampled annually or biannually for bladderwrack thalli beginning in 1985. ESB samples from two North Sea sites are collected at intervals of two months, six times a year, and then merged into a pooled annual sample. The third site at the Baltic Sea is sampled twice a year. Bladderwrack is a critical species in coastal ecosystems, providing a habitat for countless taxa. All these taxa leave detectable DNA traces in the ESB sample12. Here we characterize communities of animals and bacteria that interacted with the bladderwrack.

Blue mussels (Mytilus edulis)

This is the most common mussel in coastal ecosystems of northern and central Europe. Blue mussels constantly filter the water column for planktonic organisms and organic particles. In doing so, they enrich pollutants in their tissue, making them an excellent sentinel species for pollution monitoring55. The ESB has collected blue mussels at three coastal sites in Germany since 1985. The mussel’s entire soft tissue including respiratory water is used for the ESB sample. Annual or biannual samples of hundreds of mussels are compiled from six sampling events at the North Sea and two at the Baltic Sea. With each mussel filtering roughly 1 litre of water per hour, these samples contain a comprehensive imprint of the annual planktonic biodiversity at the sampling site12. Here we characterize communities of eukaryotic plankton and mussel-associated bacteria.

Zebra mussels (Dreissena polymorpha)

The limnic zebra mussel is an invasive species from the Black Sea region, which has colonized nearly all major rivers of Germany since the 1960s57. Like the blue mussel, zebra mussels are highly efficient filter feeders. Since the 1990s, zebra mussels are reared by the ESB on special plate stacks, which are then placed in four major German rivers for about one year, allowing the mussels to accumulate pollutants in their tissue. The mussels are then collected from the plate stacks, immediately deep-frozen and a sample of soft tissue including respiratory water is compiled from thousands of mussels50. The samples from nine sites used here provide an overview of limnic biodiversity from major rivers of Germany12. Here we characterize communities of animals, eukaryotic plankton and mussel-associated bacteria.

Laboratory workflow and sequence processing

Samples were processed as described in refs. 13,14. Work steps were performed on clean benches to avoid carryover and cross-contamination. We isolated DNA from 200 mg of homogenate from each sample. This amount was shown to be sufficient to recover the sample-associated diversity in ESB leaf samples13 and is assumed to apply to all ESB sample types due to the identical grinding process56. DNA was extracted in one or two replicates depending on the sample type (Supplementary Data 5) using a CTAB protocol (OPS Diagnostics), which proved best suited to extract high-purity DNA from these sample types. The DNA extracts were then amplified for different DNA metabarcode markers to enrich various taxonomic groups from the samples (for a list of metabarcode markers and PCR conditions see Supplementary Data 5). PCR was performed using the Qiagen Multiplex PCR Kit in 10-µl volumes according to the manufacturer’s protocols. Primers were chosen to amplify the associated community, but not the ESB indicator species itself, whose DNA dominates the extract. To characterize bacterial communities (all sample types), we amplified the V1 or V5–7 region of 16S rDNA58,59,60. For the bladderwrack and tree samples, mitochondrial and chloroplast DNA will be greatly overrepresented over bacterial DNA in the samples. We thus used primers that exclude chloroplast and mitochondrial amplification for these samples. This may also result in slightly lower species numbers of bacteria recovered, but should not affect the community composition within sample types. For terrestrial arthropods (tree leaf samples), we used a mitochondrial cytochrome oxidase I (COI) marker48. For fungi (tree leaf samples) we used the ITS1 region of the nuclear ribosomal cluster61,62. The variable V9 region of nuclear 18S rDNA was targeted to characterize communities of aquatic animals and eukaryotic plankton (bladderwrack and mussel samples12; for primer details, see Supplementary Data 5). PCR success was checked on 1.5% agarose gels, and the PCR products were then amplified in another round of PCR to add Illumina TruSeq adaptors and unique combinations of dual indexes63 (Supplementary Data 5). All final libraries were pooled in approximately equimolar amounts, cleaned of leftover primers using 1X AMPure beads XP (Beckman Coulter) and then sequenced on an Illumina MiSeq using paired-end sequencing with 500-cycle V2 and 600-cycle V3 kits. To ensure reproducibility of our data and to recover rare species, we ran several PCR replicates for every sample, which were indexed and sequenced separately. The number of PCR replicates was adapted based on sample type and marker, and varied between three and six (Supplementary Data 5). Blank DNA extractions were included in every batch of extractions, and non-template control PCRs were run alongside all PCR reactions. All controls were sequenced along with the samples to provide a baseline for carryover or cross-contamination during processing.

Forward and reverse reads were merged using PEAR64 with a minimum quality of 20 and a minimum overlap of 50 base pairs (bp). The merged reads were then quality-filtered by limiting the number of expected errors in a sequence to 1 (ref. 65) and transformed to FASTA format using USEARCH66. Primer sequences were trimmed off using Unix scripts. Long 18S rDNA amplicons (~350 bp) of limnic metazoan and microeukaryotic plankton generated from zebra mussels were trimmed to match the corresponding short amplicon of ~150 bp. As both metazoan and phytoplankton amplicons span exactly the same nuclear 18S rDNA region, all sequences were combined into one file. Likewise, reads generated from the three tree species were saved to one file for each marker. After trimming, the resulting file for each marker and sample type was dereplicated and clustered into zero-radius OTUs using the USEARCH pipeline. OTU tables were built for each sample type and marker, also using USEARCH. Taxonomy was annotated using blast2taxonomy script v1.4.2.67 after BLAST searching68 against the entire National Center for Biotechnology Information GenBank database for 18S rDNA and COI with a maximum number of ten target sequences. The SILVA database69 was used for annotating 16S rDNA sequences, and the UNITE database70 for fungal ITS1. The FungalTraits database71 was used for the functional annotation of fungi. Taxonomic assignments based on BLAST hits with a base pair length of less than 80% of the amplicon length and/or less than 85% sequence identity were removed. We excluded all taxa except bacteria, algae/protozoa, metazoa or fungi from the respective datasets. Some fungal and bacterial DNA in the samples may also stem from small metazoans associated with the samples. These may bias the recovered diversity patterns from these microorganisms. To ensure independence between recovered biodiversity trends of different taxonomic groups, we functionally annotated fungi and bacteria in our terrestrial dataset and excluded those associated with arthropods. Results for the calculated diversity trends did not differ from those presented in Fig. 4. Furthermore, we used the FungalTraits71 and PLaBAse72,73 databases to check for the proportion of plant-associated fungi and bacteria in our terrestrial dataset. Of all OTUs with a genus-level annotation and a match in the respective reference database, we verified 34% (fungi) and 76% (bacteria) as plant-associated taxa.

Per OTU and sample replicate, we removed all reads below 3, as this is the read carryover between samples commonly observed in our workflows. The OTU tables were checked for contamination using negative controls by excluding OTUs present in the negative controls from the dataset (typical laboratory contaminants). We only detected negligible contamination in the samples. PCR replicates were merged and all datasets were checked for sufficient sequencing depth and sampling size (Extended Data Fig. 1; for resulting sampling and OTU count as well as number of phyla and orders see Supplementary Data 2 ‘Cleaned dataset’).

Statistical model and analyses of community diversity

For each sample type, we (1) only selected sites that were sampled for at least 5 years; (2) only kept sampling years represented by at least 50% of the sites; and (3) removed years that were isolated from the others (>2 years). We also removed samples with low read coverage (less than 50% of the median number of reads). Finally, because OTU read abundances from metabarcoding datasets are subject to many biases74, we converted OTU abundances into binary presence/absence data and only analysed trends in terms of OTU occurrence. To limit cross-contamination, we considered an OTU as present in a sample if it represented at least 0.01% of the total reads. A total of 537 samples were included in the analysis (for resulting sampling and OTU count as well as number of phyla and orders per dataset; see Supplementary Data 2 ‘Filtered dataset (model)’ and Extended Data Table 1).

We measured community diversity trends in four different ways. First, in each community, in each year, we computed the α-diversity using the OTU richness as a measure of local diversity at a given time. Second, we computed β-diversities between pairs of communities (1) sampled at the same site in different years; or (2) sampled at different sites in the same year. Option (1) gives an idea of changes in community composition over time (temporal β-diversity), whereas (2) indicates changes in community composition across space (spatial β-diversity). We measured β-diversities using the full Jaccard distances (turnover and nestedness) and the turnover component of the Jaccard distances alone (R package betapart75). Last, for each dataset, each year, γ-diversity (regional diversity) was computed using bootstrapping with the specpool function (R package vegan76).

To identify temporal trends in these diversity indices, temporal models were fitted using mixed linear models accounting for the temporal autocorrelation between sampling years and the effect of the different sampling sites. We used the lme function (R package nlme77) with the corAR1 temporal correlation and the different sites as random effects. We fitted these temporal models with either the α-diversities, the temporal or spatial β-diversities, or the γ-diversities as response variables.

The significance of the observed trends was evaluated by comparing them with null expectations of community changes. Changes in community composition occur as a result of immigration and local extinction events, which are influenced by neutral and/or niche factors. This generates a dynamic equilibrium with ever-changing communities, even in the absence of any kind of disturbance. Following ref. 17, we built upon the ETIB24 to set up a dynamic model for community assembly that generates null expectations of diversity trends in the absence of disturbance (Fig. 3a). The ETIB is a lineage-based model of species colonization of a local community (the island) from a metacommunity (the continent). In its simplest form24, at each time step, it assumes that each OTU has a probability mi to migrate from the metacommunity to the local community i, and once settled in the community, each OTU has a probability ei to go extinct. The number of new immigration events (that is, of OTUs not already in the community i) per time step is given by mi(S − si), where S is the total number of OTUs in the metacommunity and si is the number of OTUs already present in community i; it declines as the number of OTUs in the community increases. The number of local extinction events per time step, given by eisi, increases with the number of OTUs in the community. An equilibrium is reached when the number of immigration events per time step equals the number of extinction events, that is, mi(S − si) = eisi. The equilibrium number of OTUs in the community is given by si = miS/(mi + ei). This simple form of ETIB implies a linear decrease (respectively, increase) of the number of new immigration events (respectively, extinction events) per unit of time with the number of settled OTUs in the local community. It assumes that all OTUs have the same probabilities to migrate or go extinct (neutrality), and that these probabilities do not depend on the number of OTUs in the community, implying that there is a negligible influence of interspecific competition on immigration and extinction. It thus applies best to communities that are far from carrying capacity. This model has the advantage of being very straightforward to simulate using a simple discrete-time Markov chain78.

Given the incomplete sampling of communities typically achieved with metabarcoding techniques, we assumed that each OTU present in community i is observed at each time step with a fixed probability ρi. This extra parameter can be handled using hidden Markov models. We assumed that the rates mi, si and ρi vary from one community to the other due to various extrinsic and intrinsic factors (for example, distance to the metacommunity, community size and environmental factors).

Assuming neutrality, that is, that all OTUs have the same immigration and extinction rates, is a strong assumption often proved to be unrealistic79. We therefore relaxed the assumption of neutrality. In our non-neutral model, we assumed that immigration (respectively, extinction) rates for each OTU are sampled from a beta distribution with parameters α and α(1 − mi)/mi (respectively, α(1 − ei)/ei). The rates are therefore sampled around mi (respectively, ei) with a variance that is inversely proportional to α: a large α corresponds to scenarios of neutrality whereas α closer to 0 indicates that immigration and extinction rates are very different across OTUs. While each OTU has specific immigration and extinction rates in each community, we assumed that the ranks of the OTUs in terms of immigration and extinction rates are conserved across communities (that is, an OTU with a low extinction rate in community i compared with other OTUs also has a low extinction rate in community j). We thus obtained a non-neutral model derived from the ETIB, which assumes that the presence of an OTU in a community results from the balance between immigration and local extinction, and that each OTU is characterized by specific rates of immigration and local extinction centred around the average rates. At equilibrium, some OTUs are more likely to be present in the community (for example, OTUs with higher immigration and lower extinction rates).

Instead of testing different parameter values chosen a priori17, we implemented an inference strategy to adjust the model parameters to the empirical data using a sequential technique (Fig. 3a). First, in each community, the sampling fraction ρi was inferred using ACE (R package vegan80). Second, we estimated the average rates mi and ei by fitting the neutral model to each community using a hidden Markov model (R package seqHMM81). We used these estimates as community-specific estimates of the average rates of immigration and extinction of the non-neutral model. Third, given ρi, mi and ei, we used simulation-based inferences using artificial neural networks to estimate the parameters S and α. We generated a large number of simulated datasets by sampling S and log α from uniform prior distributions and simulating the corresponding non-neutral model of community assembly, and for each of these simulations, we recorded ɑ-, spatial and temporal β-, and γ-diversities through time across all the sampled sites. We specifically incorporated subsampling into the simulations (with probability ρi), such that not all OTUs present in the local communities are observed, mimicking the detection bias of metabarcoding: simulated ɑ-, spatial and temporal β-, and γ-diversities are therefore directly comparable to empirical diversites. For S, we used a uniform prior distribution between the number of observed OTUs and three times the estimated γ-diversity; for log α, we used a uniform prior between 1 and 5. We started the simulations at year 1500 (providing ample time to ensure they reach equilibrium) with a random community composition at each site (each OTU has a probability si/S to be initially present in the community, where si is the theoretical number of species at equilibrium in the neutral model, given S, mi and ei: si = miS/(mi + ei)). Next, we simulated community composition over time in each site until 2023, sampled the communities according to ρi and recorded community composition through time for the years of sampling. We then computed for each simulation the ɑ-, β- and γ-diversities: we used the same methods and sampling scheme as for empirical data (the number of sampling sites varied through time) to obtain comparable measures of ɑ-, spatial and temporal β- as well as γ-diversities. We trained an artificial neural network to estimate S and log α from time series of ɑ-, β- and γ-diversities using the Python library Keras82, with 100,000 simulations per dataset until reaching a sufficient predictive power. Once trained, the artificial neural network takes as input ɑ-, β- and γ-diversities through time and outputs estimates of S and log α. We used a neural network with three intermediate layers, containing 132, 64 and 32 neurons respectively, and with exponential linear unit (ELU) activation functions. We prevented overfitting by using a dropout of 0.5 at each intermediate layer. Input and output data were scaled between 0 and 1 before fitting, and the simulations were split between the training set (90%) and the test set (10%). Once validated (Extended Data Fig. 3), we finally applied the trained neural network to our empirical data, and obtained corresponding S and log α values.

Given the estimated S and log α values, we simulated our non-neutral model 1,000 times with these parameters to generate time series of ɑ-, β- and γ-diversities under an equilibrium model of community assembly. We then compared empirical and simulated temporal trends and considered an empirical trend to be significant if it fell outside the central 95% of the distribution of the simulated trends. P values were computed as the proportions of simulated trends that were higher or lower than the empirical trend. We interpreted significant deviations from the simulated equilibrium model of community assembly as indicative of out-of-equilibrium dynamics in the empirical data, potentially driven by anthropogenic disturbances.

We used simulations to test the validity of our approach that assesses the significance of the observed trends by comparing them to null expectations of community changes under the non-neutral dynamic model for community assembly. We simulated two scenarios: (1) a scenario without any disturbance; and (2) a scenario of biotic homogenization and regional diversity loss. To generate realistic simulations, we designed them using the number of sites, the total number of OTUs, the rates of immigrations and local extinctions, and the sampling probabilities estimated from the bacterial communities of beech (one of the largest datasets). In the first simulated scenario, we simply simulated community changes under the assumptions of our non-neutral dynamic model for community assembly—we therefore did not expect any deviations from the null expectations when applying our approach. Conversely, in the second simulated scenario, we simulated the regional extinctions of 10% of the OTUs and their replacement by widespread invasive OTUs across all communities—we therefore expected no variation of ɑ-diversities, a decrease of spatial β-diversities (spatial homogenization) as well as a decrease of γ-diversities (regional diversity loss). We performed ten simulations for each scenario. When applying our approach, we correctly recovered the simulated scenario in almost all cases (Extended Data Table 2), confirming the validity of our approach.

To interpret the observed variations of community compositions, we used the conceptual framework of Blowes et al.21. For each sample type and taxonomic group, we plotted the temporal variation log γ-diversities as a function of the temporal variation of log ɑ-diversities. By construction, Δα < Δγ indicates spatial differentiation of the community composition (increase of spatial β-diversities), whereas Δα > Δγ corresponds to spatial homogenization (decrease of spatial β-diversities).

Finally, we investigated the temporal trends of OTU occurrences, as some OTUs may be recent invaders and others may have gone extinct at the regional scale. For each sample type and taxonomic group, we only looked at OTUs present in at least 10% of the samples and across at least two sites. For each OTU, we first tested whether its occurrence in the communities tended to vary through time. To do so, we fitted a generalized linear mixed model with a binomial response (presence/absence of the OTU in a community) and considered the sampling site as random effects.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.



Source link

More From Forest Beat

Impact of Gaura parviflora invasion on urban wildness biodiversity: a campus...

Profiles of the species composition of uninvaded weed communities on campus ...
Biodiversity
6
minutes

Harnessing eDNA technology to identify fish diversity and distribution in the...

Sequence statistics and fish compositionIn this study, a total of 4,594,782 high-quality sequences (2,457,586 in April and 2,137,196 in September) were obtained after...
Biodiversity
7
minutes

Global divergence in plant and mycorrhizal fungal diversity hotspots

Alpha diversity data layersWe used previously published global predictions of AM and ECM fungal richness from Van Nuland et al.26 and Mikryukov et...
Biodiversity
15
minutes

Here’s how you can make your garden a safe and biodiverse...

Biodiversity is essential to mitigating and adapting to climate change, enhancing the resilience of ecosystems and safeguarding the ecological...
Biodiversity
4
minutes
spot_imgspot_img