Comprehensive Dataset for Polychaetes in the IPC: Species Distribution, DNA Barcodes, and Functional Traits

The dataset is available in a figshare repository (Weng et al.²²) and is licensed under CC BY.

The database comprises three main components (occurrence records, DNA barcode data, and functional traits), each stored in a separate ‘.xlsx’ file. Missing data in these files are consistently designated as NA.

The “Occurrence_data.xlsx” file consists of two sheets: ‘species distribution records’ and ‘species list’. Each record in the ‘species distribution records’ sheet gives the taxonomic category of the species (family, genus, and species) together with specimen details: latitude and longitude, date of collection, habitat type, depth, source of data, and country/region. The taxonomic columns give the classification hierarchy, with scientific names accompanied by the author’s surname and the year of naming. Records sourced from public databases are marked with the database name, such as GBIF or OBIS, whereas literature-derived records carry the title of the publication. The depth column gives the water depth (in meters) at which the species was found, and the habitat column describes the environment from which the specimens were collected. The ‘species list’ sheet provides the taxonomic classification (family, genus, and species) of each species.
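As an illustration of how the NA marker and the source column described above might be handled once the rows are loaded, here is a minimal sketch. The column key `source` and the helper names are hypothetical, and the set of database names contains only the two examples given in the text; the actual file may use other headers and list further databases.

```python
# Assumption: only the two database names mentioned in the text are listed here.
KNOWN_DATABASES = {"GBIF", "OBIS"}


def clean(value):
    """Map the dataset's NA marker (and empty cells) to None; strip other values."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text in ("", "NA") else text


def source_kind(record):
    """Classify a record as 'database', 'literature', or 'unknown'.

    `record` is a dict for one row; the key "source" is a hypothetical
    stand-in for the sheet's source-of-data column.
    """
    source = clean(record.get("source"))
    if source is None:
        return "unknown"
    return "database" if source.upper() in KNOWN_DATABASES else "literature"
```

Any entry that is not a recognised database name is treated as a publication title, which matches the description above only if the list of database names is complete.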

The dataset comprises approximately 39,310 records of polychaete annelids, representing 2,831 species in 696 genera and 75 families and covering the period from 1776 to 2024. Notably, about 13% of these records are derived from the scientific literature and are not included in existing public databases. Most species records are marine, with a small number from terrestrial or freshwater environments (Fig. 3). The period from 1991 to 2010 had the highest number of sampling events (12,089), nearly twice the 6,258 events documented from 1971 to 1990. Australia had the most sampling events, contributing 61.4% of the total, followed by Indonesia, China, India, and the Philippines. Most sampling took place within the 0–100 m depth range. The five families with the greatest species diversity are Syllidae (329 species), Nereididae (220 species), Terebellidae (215 species), Spionidae (174 species), and Polynoidae (142 species), as presented in Fig. 4.
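The 20-year periods compared above (1971 to 1990 and 1991 to 2010) can be reproduced from collection years with a simple binning step. The sketch below is an assumption about how such a tally could be made, not the authors' procedure; the function names and the choice to anchor the bins at 1971 are illustrative.

```python
from collections import Counter


def period_label(year, start=1971, width=20):
    """Return the 20-year bin containing `year`, e.g. 1995 -> '1991-2010'."""
    offset = (year - start) // width  # floor division also handles years before `start`
    first = start + offset * width
    return f"{first}-{first + width - 1}"


def events_per_period(years):
    """Count sampling events per bin from an iterable of collection years."""
    return Counter(period_label(y) for y in years)
```

Because the records end in 2024, the final bin (2011-2030) is only partly covered and is not directly comparable with the complete periods.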

The “Functional_traits_data.xlsx” file consists of a matrix of 2,831 species by 13 trait variables. A total of 11,953 valid trait records were collected, with temperature tolerance, salinity tolerance, depth zonation, and branching structure/branchiae being the four most frequently recorded traits. Conversely, the traits with the fewest records were population spawning frequency, epistasis, and longevity (Fig. 5).
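The figures above imply a sparsely filled matrix: 2,831 species by 13 traits gives 36,803 cells, of which 11,953 (about 32.5%) hold valid values. A minimal sketch of how per-trait counts and the overall fill rate could be computed follows; the row layout (one dict per species, keyed by trait name) and the trait names used in any call are hypothetical.

```python
def trait_counts(matrix, traits):
    """Count valid (non-NA) entries per trait.

    `matrix` is a list of dicts, one per species; absent keys count as NA.
    """
    counts = {t: 0 for t in traits}
    for row in matrix:
        for t in traits:
            if row.get(t, "NA") not in ("NA", "", None):
                counts[t] += 1
    return counts


def fill_rate(valid, n_species, n_traits):
    """Fraction of species-by-trait cells that hold a valid value."""
    return valid / (n_species * n_traits)
```

For example, `fill_rate(11953, 2831, 13)` evaluates to roughly 0.325.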

The “DNA_barcode_data.xlsx” file consists of five separate sheets.

The first sheet, “COI”, contains the COI gene sequence data: class, family, genus, and species, together with the gene name (abbreviated as COI), gene length, GenBank ID, BOLD ID, and the nucleotide sequence.

The second sheet, “16S”, contains the 16S gene sequence data, and the third sheet, “18S”, contains the corresponding data for the 18S gene. The columns in both sheets match those of the COI sheet, ensuring uniformity across the dataset. The fourth sheet, “mtDNA”, contains mitochondrial genome data, with columns for class, family, genus, species name, length, molecule type, GenBank ID, and sequence. The fifth sheet summarizes the genes available for each species in four columns (COI, 16S, 18S, and mtDNA), where each cell gives the number of sequences for that gene.

In the present study, we catalogued 3,973 COI sequences, covering 20.10% of all species; 1,574 16S sequences, covering 17.20% of species; and 1,505 18S sequences, covering 20.28% of species. We also catalogued 154 mitochondrial genome sequences, of which 55 were generated in the present study. These sequences encompass 33 families, with Nereididae and Spionidae the most abundant (Fig. 6).
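Species-level coverage of this kind can be derived from the summary sheet, where each row is a species and each gene column holds a sequence count. The sketch below assumes rows are dicts keyed by the four column names given in the text and that non-numeric cells such as NA mean zero sequences; both are assumptions about the file, not documented behaviour.

```python
GENES = ("COI", "16S", "18S", "mtDNA")


def to_count(cell):
    """Convert a cell to an integer count, treating NA or blanks as zero (assumption)."""
    try:
        return int(cell)
    except (TypeError, ValueError):
        return 0


def species_coverage(summary_rows):
    """Return, per gene, the fraction of species with at least one sequence."""
    total = len(summary_rows)
    coverage = {}
    for gene in GENES:
        with_seq = sum(1 for row in summary_rows if to_count(row.get(gene)) > 0)
        coverage[gene] = with_seq / total if total else 0.0
    return coverage
```

Coverage computed this way counts species, not sequences, so a species with many sequences for one gene contributes once.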

In the pie chart, the grey sections are species with sequences and the white sections are species without sequences.
