Curated global occurrence dataset of the insect order Zoraptera


To incorporate all species currently classified in this insect order into the dataset, we used the most recent comprehensive catalog of Zoraptera¹⁷. Subsequently described taxa were then added (e.g.,³,⁴,⁵,⁶,¹⁸,¹⁹,²⁰), while taxa that were not zorapterans were removed²¹. We followed the currently used and widely accepted higher classification with two families (each with two subfamilies) and ten genera. This classification was recently proposed by Kočárek et al.² and Kočárek & Kočárková²², and it is based on molecular phylogenetic analyses in combination with morphological characters. The current version of the dataset includes all recent members of the order Zoraptera described before October 1, 2024.

Data sources

The geographical position of each species was obtained from published sources, from material deposited in museum collections, and from other material collected and/or identified by the authors of this contribution. Additionally, data from iNaturalist²³ and GBIF²⁴ were included. Initially, data from all original descriptions of new species were incorporated, and subsequently, all Zoraptera distributional records found in the remaining literature were added. The search encompassed all references cited in the original descriptions and in the catalog by Hubbard¹⁷, and then all references cited in those works. This process was repeated until no new references or occurrence records were identified. To ensure the comprehensiveness of the results, systematic searches were conducted on Google Scholar, Google, and Web of Knowledge using the keyword “Zoraptera” and all supraspecific taxon names historically used in this order (Zorotypidae, Spiralizoridae, Zorotypinae, Spermozorinae, Latinozorinae, Spiralizorinae, Aspiralizoros, Brazilozoros, Centrozoros, Cordezoros, Floridazoros, Latinozoros, Meridozoros, Scapulizoros, Spermozoros, Spiralizoros, Zorotypus, and Usazoros). We therefore reviewed not only the taxonomic and faunistic literature but also studies focusing on the biology, morphology, and phylogeny of Zoraptera (many of which included useful distributional data on the material examined), as well as various general books and other documents. Most publications were in English; the few studies written in other languages (German, Latin, Portuguese, Spanish, Chinese) were analyzed in consultation with colleagues and translated using online translation tools (DeepL or Google Translate).
The references included in the final dataset were either those providing original data or, in cases where multiple references reported the same distribution information, only those with the first record or with the most complete information.

All records from iNaturalist were revised and identified to the lowest reliably determinable taxonomic rank directly on the iNaturalist website by Petr Kočárek, and only records with an appropriate license (CC-BY, CC-BY-NC, CC0) were imported. GBIF was then queried, excluding iNaturalist data, resulting in a GBIF dataset²⁴ that was further manually revised (see the Technical Validation section below for common errors in Zoraptera identification). Specifically, we excluded sequence records imported from genetic databases (Barcode of Life Data System, BOLD, and European Nucleotide Archive, ENA) whenever no voucher specimen deposited in a publicly available collection was specified, as well as fossil records from amber and records without any geographic information. Finally, we matched corresponding records by adding a GBIF id to existing records in our dataset and then added the remaining records with revised taxon identification.
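The inclusion rules described above can be illustrated with a minimal Python sketch. It is not the authors' workflow (the identifications were revised manually). The field names `license`, `source`, `voucher`, `is_fossil`, `lat`, `lon`, and `locality` are hypothetical and chosen only for this example.

```python
# Licenses accepted for iNaturalist records, as listed in the text.
ALLOWED_LICENSES = {"CC-BY", "CC-BY-NC", "CC0"}
# Genetic databases whose sequence records need a voucher specimen.
SEQUENCE_SOURCES = {"BOLD", "ENA"}


def keep_inaturalist(record):
    """Keep an iNaturalist record only if its license is allowed."""
    return record.get("license") in ALLOWED_LICENSES


def keep_gbif(record):
    """Apply the three GBIF exclusion rules (hypothetical field names)."""
    # Rule 1: sequence records without a specified voucher specimen.
    if record.get("source") in SEQUENCE_SOURCES and not record.get("voucher"):
        return False
    # Rule 2: fossil records from amber.
    if record.get("is_fossil"):
        return False
    # Rule 3: records without any geographic information.
    has_coords = record.get("lat") is not None and record.get("lon") is not None
    if not has_coords and not record.get("locality"):
        return False
    return True
```

A record that passes these filters would still require manual revision of its identification, which this sketch does not attempt to model.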

Digitizing locations

Information about the geographic location of the records was stored in the available sources in different ways and with varying quality. In all cases, we attempted to derive the most likely accurate coordinates and then determine the degree of positional uncertainty. If the record already had coordinates, they were converted to decimal degree format, and the uncertainty was set based on the coordinate precision according to Wieczorek²⁵. In cases where coordinates were missing or uninterpretable and only a record description was available, we obtained the coordinates by digitizing the areas based on all available information, including maps in publications. To find locations by name or address, we used the Nominatim geocoding tool, which searches for features in OpenStreetMap (OSM) data²⁶. The digitizing process was conducted in QGIS 3.38²⁷. We accessed Nominatim in QGIS through the plugin OSM place search 1.4.5²⁸, which enabled us to import OSM geometry features with attributes directly into QGIS for further processing. We followed the Georeferencing Quick Reference Guide²⁹, and each record was treated separately and carefully based on the context of the environment; i.e., we excluded from the digitized area any parts that the collector would more likely have described by another reference, such as a major city or other notable geographic feature. We used OSM geometry features in various ways depending on the amount of available information and the context surrounding the target area; e.g., if there was little information, such as the name of the state or city or only part of it, we used the exact OSM feature. In other cases, the OSM feature was edited or used only as a reference point, e.g., if the location was described as ‘near’ a given location or by distance and direction from it. The identification numbers (ids) of the original OSM features were stored in the dataset and can be retrospectively examined and compared.
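As a small illustration of the conversion to decimal degrees mentioned above, the following Python sketch converts a degrees-minutes-seconds value with a hemisphere letter. The function name and signature are assumptions made for this example, and the sketch does not cover the derivation of uncertainty from coordinate precision.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus hemisphere to decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative in decimal notation.
    return -value if hemisphere in ("S", "W") else value
```

For instance, `dms_to_decimal(10, 30, 0, "S")` yields `-10.5`.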
If the record site could not be localized with Nominatim, we used other sources, such as Google Maps or various sites and publications found through Google search, and either digitized the area manually or used corresponding OSM features. If altitude was given, we used OpenTopoMap³⁰ to derive the area corresponding to that altitude. If the description referred to a line or point feature (e.g., a road, river, hill, or other feature), we converted these features to polygons with a buffer of at least 100 meters. In general, if the polygon extended beyond the coastline (e.g., Java), it was cropped to the coastline from the OSM data (OSM tag ‘natural=coastline’). Finally, all polygons were simplified using the QGIS native Simplify tool with the Visvalingam algorithm and a tolerance threshold of 100. This removed redundant polygon vertices, which reduced the storage size of the dataset with negligible loss of information. The resulting polygons were stored in a GeoPackage file that was published as part of the dataset repository. In addition, the polygons were written into the dataset itself as well-known text (WKT). This allows users to check the individual areas from which the coordinates were derived and to make further edits and updates.
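To give an idea of how Visvalingam simplification removes redundant vertices, here is a minimal Python sketch of the basic idea on an open line in planar coordinates. It is not the QGIS implementation. The function names, the `min_area` parameter, and the handling of the endpoints are assumptions made for illustration.

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def visvalingam(points, min_area):
    """Repeatedly drop the interior vertex with the smallest triangle area.

    Stops once every remaining interior vertex spans a triangle of at
    least `min_area` with its two neighbours. Endpoints are always kept.
    This simple version recomputes all areas on each pass.
    """
    pts = list(points)
    while len(pts) > 2:
        areas = [
            triangle_area(pts[i - 1], pts[i], pts[i + 1])
            for i in range(1, len(pts) - 1)
        ]
        smallest = min(areas)
        if smallest >= min_area:
            break
        # areas[k] belongs to pts[k + 1], so shift the index by one.
        del pts[areas.index(smallest) + 1]
    return pts
```

With `min_area=0.1`, the nearly collinear vertex `(1, 0.01)` in `[(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]` is removed while the pronounced peak at `(3, 5)` is kept.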

We assigned a country name and code to every record based on ISO 3166-1 alpha-2. To obtain coordinates and positional uncertainty from polygon geometries, we computed enclosing circles and their centroids. If such a centroid was outside the original polygon geometry, we calculated the nearest point that intersects the polygon. We then calculated the geodetic distance from that point to the most distant point on the polygon (i.e., the radius). These points represent the coordinates of the record, and the distances represent the positional uncertainties. These values were calculated in R 4.3.3³¹ with the packages sf³² and lwgeom³³.
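The final step, taking the distance from the chosen point to the most distant polygon vertex as the positional uncertainty, can be sketched in Python as follows. This is an illustration rather than the authors' R code: it uses the haversine formula on a sphere instead of an ellipsoidal geodesic distance, assumes the point has already been determined, and uses hypothetical function names.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def positional_uncertainty_m(point, vertices):
    """Distance from `point` to the farthest polygon vertex, in meters.

    `point` is (lat, lon); `vertices` is an iterable of (lat, lon) pairs.
    """
    lat, lon = point
    return max(haversine_m(lat, lon, vlat, vlon) for vlat, vlon in vertices)
```

Because the farthest point of a polygon from any location is always one of its vertices, checking the vertices alone is sufficient in the planar case and a reasonable approximation for small polygons on the sphere.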

Dataset updates

Records in the dataset can be updated by directly editing ‘zoraptera_occs.csv’ or semiautomatically from iNaturalist and GBIF. The iNaturalist update workflow starts with revising the identifications directly in iNaturalist; the rinat R package³⁴ is then used to check for new records or identification updates verified by specific users (for the initial version of the dataset, only Petr Kočárek). Compliant data are automatically appended to the ‘zoraptera_occs.csv’ dataset, and the date of the update is recorded in the log file. The GBIF update workflow starts with downloading the current Zoraptera data from GBIF using the rgbif R package³⁵. On each update, the current GBIF dataset is compared with the last downloaded and revised GBIF dataset based on the gbifID of the records. The date and the dataset DOI are stored in a log file so that the process can be repeated. All new GBIF records are temporarily stored, manually revised, and then incorporated into ‘zoraptera_occs.csv’. Polygon geometry can be manually added or edited within the GeoPackage, and the coordinates with positional uncertainties can be automatically recalculated and updated in ‘zoraptera_occs.csv’.
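The comparison of a new GBIF download with the previously revised one can be sketched in Python as a set difference on `gbifID`. The authors' workflow uses R with rgbif, so this is only an illustration of the idea. The function names and the tab-separated log format are assumptions.

```python
def new_gbif_records(current_records, previous_ids):
    """Return records whose gbifID was not in the last revised download."""
    seen = set(previous_ids)
    return [rec for rec in current_records if rec["gbifID"] not in seen]


def log_line(download_date, dataset_doi):
    """Format one log entry (hypothetical tab-separated layout)."""
    return f"{download_date}\t{dataset_doi}"
```

The records returned by `new_gbif_records` would correspond to the set that is held temporarily for manual revision before being added to the dataset.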
