Overall framework
Figure 1 illustrates the methodological framework for generating this high-resolution global dataset of projected distributions for 10 rodent genera. First, we selected the target species for modeling and obtained georeferenced global distribution data for representative rodents at the species level from the GBIF database for the period 1970–2000. Next, we collected historical and future environmental data, including temperature, precipitation, DEM (Digital Elevation Model), and NDVI (Normalized Difference Vegetation Index), as input environmental variables. Future environmental data were derived from four SSP–RCP scenarios and 10 global climate models (GCMs). Finally, we applied the MaxEnt algorithm, optimized the model parameters, and generated global probability distributions for representative rodents at 20-year intervals from 2021 to 2100.
Overall framework for constructing the GridScopeRodents dataset using ecological niche modeling. This diagram illustrates the overall workflow for developing the GridScopeRodents dataset. It integrates species occurrence data, environmental variables, and ecological niche modeling using the MaxEnt algorithm. Historical and future projections are generated under multiple SSP–RCP scenarios and GCMs, resulting in a high-resolution global dataset of rodent habitat suitability from 2021 to 2100.
Species selected and processing
We selected ten representative rodent genera for modeling: Akodon, Apodemus, Cricetulus, Mastomys, Microtus, Mus, Oligoryzomys, Peromyscus, Phyllotis, and Rattus. These genera were chosen based on four main criteria:
(1) ecological or economic relevance, particularly regarding agricultural damage and zoonotic disease transmission;
(2) sufficient georeferenced occurrence records (at least 300 records) to support robust MaxEnt modeling;
(3) broad geographic and climatic coverage, encompassing multiple continents and major biomes; and
(4) taxonomic clarity and consistency in global biodiversity databases such as GBIF and IUCN.
This selection includes both globally invasive genera (e.g., Rattus and Mus) and regionally dominant genera (e.g., Akodon, Microtus, and Phyllotis), ensuring taxonomic, ecological, and biogeographic diversity in the projections. A detailed summary of each genus’s ecological and economic relevance, with supporting literature, is provided in a supplementary table hosted on Figshare, titled “Rodent Genus Selection Justification (Literature)” (see Usage Notes).
Based on GBIF data, we extracted georeferenced global distribution records at the species level for these genera30. The data retrieval criteria included: “HasCoordinate is true”, “HasGeospatialIssue is false”, “OccurrenceStatus is Present”, and “TaxonKey is one of (Phyllotis Waterhouse, 1837; Mastomys Thomas, 1915; Rattus Fischer, 1803; Oligoryzomys Bangs, 1900; Peromyscus Gloger, 1841; Cricetulus Milne-Edwards, 1867; Mus Linnaeus, 1758; Akodon Meyen, 1833; Apodemus Kaup, 1829; Microtus Schrank, 1798)”, “Year 1970–2000”. The extracted dataset underwent coordinate validation and taxonomic name verification, resulting in a total of 548,571 occurrence records (see Fig. 2 for species sample sizes and distributions, with rodent images sourced from iNaturalist contributor31 and Observation.org32).
Representative rodents selected (b) and their global distribution (a). The circular phylogenetic tree displays the rodent species included in the model at the terminal nodes, annotated with the number of samples per species (b). Labels of the form “Genus + None” indicate records that could only be identified to the genus level in the original dataset.
To mitigate spatial autocorrelation and estimation bias caused by uneven sampling, and to improve model accuracy and reliability, we applied the Spatially Rarefy Occurrence Data for SDMs function in the SDMtoolbox Pro extension of ArcGIS Pro (version 3.1.6). The occurrence points for each genus were spatially rarefied to a 10 km resolution, ensuring that only one occurrence point exists within each 10 km grid cell.
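The one-point-per-cell rule described above can be sketched in a few lines. This is a minimal illustration only, not the SDMtoolbox implementation (whose internal algorithm may differ); the function name `rarefy_to_grid` is hypothetical, and the sketch assumes the coordinates have already been projected to a CRS with metre units.

```python
def rarefy_to_grid(points, cell_size_m=10_000.0):
    """Keep the first occurrence point encountered in each square grid cell.

    points: iterable of (x, y) pairs in a projected CRS with metre units
            (an assumption of this sketch; lon/lat degrees would need
            projecting first).
    """
    kept = {}
    for x, y in points:
        # Integer index of the 10 km cell containing this point.
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        # setdefault stores the point only if the cell is still empty.
        kept.setdefault(cell, (x, y))
    return list(kept.values())


# Hypothetical occurrence points: the first two share a cell.
occurrences = [(1000.0, 2000.0), (9000.0, 9999.0), (10000.0, 0.0), (-5.0, 3.0)]
thinned = rarefy_to_grid(occurrences)
```

Keeping the first point per cell is an arbitrary choice; a random pick or the point nearest the cell centre would satisfy the same rule.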
Environmental variables selected and processing
We selected climate (bioclimatic variables, including temperature and precipitation), DEM (including elevation and slope), and NDVI data as input environmental variables (Table 1). In addition to climatic variables that directly influence species distributions, non-climatic factors such as elevation, slope, and NDVI also play important roles. Elevation and slope describe topographic structure, which contributes to habitat heterogeneity and potential barriers to species movement. NDVI reflects vegetation greenness and primary productivity, and serves as a proxy for food availability and habitat quality in ecosystems33,34. For future projections, elevation, slope, and NDVI were initially intended to be held constant across all scenarios, due to the lack of reliable, long-term global projections at comparable spatial resolution. This practice of combining static environmental layers with dynamic climate variables is commonly adopted in SDM studies and has been shown to outperform models that exclude static variables in several contexts35,36,37. However, NDVI was ultimately excluded from the final modeling framework, as explained in the Uncertainty section.
Climate and elevation data were obtained from WorldClim (v2.1), which provides 19 bioclimatic variables (Bio1–Bio19) and elevation data at a spatial resolution of 1/12°38. The dataset includes long-term historical climate averages for the period 1970–2000, as well as projected future climate averages at 20-year intervals from 2021 to 2100, covering four SSP–RCP scenarios and 10 GCMs. WorldClim is an online database that provides global climate data, with version 2.1 being its second major update. It is widely used in biodiversity conservation, ecological modeling, climate change impact assessments, and agricultural planning38. Slope data were derived from elevation using the ‘Slope Calculation’ function in ArcGIS Pro (v3.1.6). Global NDVI data were obtained from the PKU-GIMMS-NDVI dataset (v1.2) released by Peking University, at a resolution of 1/12° over the period 1982–202039. To align as closely as possible with the temporal coverage of the WorldClim climate data, we used the long-term mean for the period 1982–2000 as a proxy for average vegetation greenness during the baseline period. The PKU-GIMMS-NDVI dataset addresses uncertainties in existing long-term NDVI records, particularly those caused by NOAA satellite orbital drift and AVHRR sensor degradation, and has demonstrated high accuracy when evaluated against Landsat NDVI samples39.
Ecological niche modeling of representative rodents
Ecological niche refers to the role or function of a species within its environment, encompassing its position in the ecosystem, interactions with other species (e.g., predation, competition), ecological requirements (e.g., food, habitat, climatic conditions), and the way it utilizes resource endowments and environmental conditions40,41. This study selected habitat preferences and environmental factors, such as temperature, humidity, and food availability (vegetation coverage), as input variables for ecological niche modeling. We employed the Maximum Entropy (MaxEnt) algorithm within the species distribution modeling (SDM) framework to analyze the ecological niches of representative rodents, using Maximum Entropy Species Distribution Modeling software (v3.4.4).
MaxEnt is a machine-learning algorithm based on the maximum entropy method (see Eq. 1), integrating species occurrence data with environmental variables to predict potential species distributions42,43. The algorithm is advantageous due to its low data requirements, high predictive accuracy, and straightforward implementation, making it one of the most widely used SDMs in biogeography and macroecology44. The core principle of MaxEnt is to select the probability distribution with the highest entropy that satisfies the given constraints, thereby minimizing prediction risk. This ensures an accurate estimation of species distributions by producing the most uniform probability distribution possible in the absence of additional subjective assumptions, ultimately reducing predictive uncertainty. MaxEnt was trained using species occurrence records and randomly sampled background points. These background points are not considered true absences but are used to characterize the available environmental space, following standard practice in presence-only species distribution modeling42,44.
$$\begin{array}{ll}\mathrm{maximize} & -\sum_{x\in X}p(x)\log p(x)\\ \mathrm{subject\ to} & \sum_{x\in X}p(x)\,f_j(x)=\frac{1}{n}\sum_{i=1}^{n}f_j(x_i),\quad \forall\, j\\ & \sum_{x\in X}p(x)=1\\ & p(x)\ge 0,\quad \forall\, x\in X\end{array}\tag{1}$$
where \(X\) denotes the set of all spatial grid cells within the study area; \(x_i\) represents the \(i\)-th known occurrence location of the species, with \(i = 1, 2, \ldots, n\); \(f_j(x)\) is the \(j\)-th environmental feature function at location \(x\); and \(p(x)\) is the estimated spatial probability distribution of species presence.
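As a reading aid (a standard result for maximum entropy problems, not derived in the source), introducing one Lagrange multiplier \(\lambda_j\) per feature constraint in Eq. 1 yields a solution of Gibbs (exponential-family) form. The symbols \(\lambda_j\) and \(Z(\lambda)\) are introduced here for illustration only:

$$p(x)=\frac{\exp\left(\sum_{j}\lambda_j f_j(x)\right)}{Z(\lambda)},\qquad Z(\lambda)=\sum_{x'\in X}\exp\left(\sum_{j}\lambda_j f_j(x')\right)$$

where the \(\lambda_j\) are chosen so that the feature constraints hold and \(Z(\lambda)\) enforces \(\sum_{x\in X}p(x)=1\). In practice, to our understanding, MaxEnt relaxes the equality constraints through regularization rather than enforcing them exactly, and the regularization multiplier tuned later in this section scales the strength of that relaxation.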
We conducted both historical and future distribution modeling for ten representative rodent genera. Variable selection and parameter optimization were necessary when performing ecological niche modeling using MaxEnt. To ensure methodological rigor, we reviewed multiple studies on rodent distributions and followed a systematic process25,45,46,47,48:
First, to reduce multicollinearity among variables, the Pearson correlation coefficient (r) was computed for every pair of variables. In parallel, all preselected historical environmental variables were input into MaxEnt to obtain the Percent Contribution (PC) and Permutation Importance (PI) of each variable. The r, PC, and PI values obtained from the initial modeling of each genus are shown in Fig. 3.
Environmental variable correlations and variables’ percent contributions and permutation importance for each genus in initial modeling. Bottom-left: Network diagram showing the percent contribution (%) of each environmental variable (positioned diagonally) to each rodent genus (along the left and bottom axes). The thickness and color of the connecting lines represent the magnitude of contribution, with thicker and more yellow edges indicating higher contribution. Center: Pearson correlation matrix among the 22 environmental variables used in the modeling. Right: Permutation importance (%) of each variable in predicting genus-level habitat suitability.
Then, based on the r, PC, and PI values, we refined the environmental variable set by removing highly correlated variables (|r| ≥ 0.8) and retaining those with higher PC and PI values. The final selected environmental variables (see Table 2) were then re-entered into MaxEnt for parameter tuning using the ENMeval package in R. We tested regularization multiplier (RM) values ranging from 1.0 to 6.0 in steps of 0.5, while keeping the feature classes at MaxEnt’s default “Auto features” setting. Model performance for each RM setting was evaluated using Area Under the Curve (AUC) and True Skill Statistic (TSS). The RM value yielding the best balance between predictive accuracy and generalization was selected for each genus (see Fig. 4).
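The correlation screening described above can be sketched as a greedy filter. This is one plausible reading rather than the authors' exact procedure: the source does not state how PC and PI were combined, so the single `importance` score per variable, the function name `drop_correlated`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def drop_correlated(values, names, importance, threshold=0.8):
    """Greedy filter: visit variables from most to least important and keep
    one only if |r| < threshold against every variable already kept.

    values:     2-D array, one column per environmental variable.
    importance: dict mapping variable name to a score (hypothetical; for
                example the percent contribution from an initial run).
    """
    corr = np.corrcoef(values, rowvar=False)  # pairwise Pearson r between columns
    order = sorted(range(len(names)), key=lambda i: importance[names[i]], reverse=True)
    kept = []
    for i in order:
        if all(abs(corr[i, k]) < threshold for k in kept):
            kept.append(i)
    # Return the surviving names in their original column order.
    return [names[i] for i in sorted(kept)]


# Toy data: bio5 is an exact linear function of bio1; bio12 is only weakly related.
bio1 = np.arange(8.0)
bio5 = 2.0 * bio1 + 1.0
bio12 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
names = ["bio1", "bio5", "bio12"]
scores = {"bio1": 30.0, "bio5": 10.0, "bio12": 25.0}
selected = drop_correlated(np.column_stack([bio1, bio5, bio12]), names, scores)
```

Visiting variables in order of importance guarantees that, within any correlated pair, the higher-scoring variable is the one retained.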
Area Under the Curve (AUC) and True Skill Statistic (TSS) performance across varying regularization multipliers for (a) Akodon, (b) Apodemus, (c) Cricetulus, (d) Mastomys, (e) Microtus, (f) Mus, (g) Oligoryzomys, (h) Peromyscus, (i) Phyllotis, and (j) Rattus. All models were run with MaxEnt’s default “Auto features”. Each point represents the mean AUC and TSS across 25 replicate runs, with error bars indicating ± 1 standard deviation. Note: TSS was calculated using the Maximum Training Sensitivity Plus Specificity (MTSS) threshold applied to cloglog-transformed outputs.
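For readers unfamiliar with the metric, the sketch below shows how TSS can be computed at the threshold that maximizes sensitivity plus specificity. It is an illustration under stated assumptions, not the authors' evaluation code: background points are treated as pseudo-absences, and the function name `tss_at_max_sss` and the example scores are hypothetical.

```python
import numpy as np


def tss_at_max_sss(presence_scores, background_scores):
    """Return (TSS, threshold) at the threshold maximizing sensitivity + specificity.

    Scores are model suitability values (e.g. cloglog output in [0, 1]).
    Background points stand in for absences, which is an assumption of
    this sketch.
    """
    pres = np.asarray(presence_scores, dtype=float)
    back = np.asarray(background_scores, dtype=float)
    best_tss, best_thr = -1.0, None
    # Every observed score is a candidate threshold.
    for thr in np.unique(np.concatenate([pres, back])):
        sensitivity = np.mean(pres >= thr)  # presences predicted present
        specificity = np.mean(back < thr)   # background predicted absent
        tss = sensitivity + specificity - 1.0
        if tss > best_tss:
            best_tss, best_thr = tss, thr
    return float(best_tss), float(best_thr)


tss, threshold = tss_at_max_sss([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
```

TSS ranges from −1 to 1, with values near 0 indicating performance no better than random and values near 1 indicating near-perfect discrimination.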
Next, using the final set of environmental variables and the optimized parameters, we conducted 25 replicate runs for both historical and future distribution modeling of each species. Historical distribution modeling was based on the environmental variables for the historical period. Following the historical distribution modeling, we selected 10 GCMs and their four corresponding SSP–RCP scenarios for future projections (Table 3 and Table 4). The MaxEnt parameter settings were as follows: randomseed, randomtestpoints = 25, maximumbackground = 100000, replicates = 25, replicatetype = bootstrap, writebackgroundpredictions, and maximumiterations = 5000.
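The settings listed above map onto MaxEnt's command-line flags. The sketch below only assembles an argument list and does not launch Java. The flags `samplesfile`, `environmentallayers`, `outputdirectory`, `betamultiplier`, `outputformat`, and `autorun` are not named in the text and are included on the assumption that they match the MaxEnt 3.4.4 command-line interface; the jar location, file paths, and RM value are placeholders, and `maxent_args` is a hypothetical helper.

```python
def maxent_args(samples_csv, layers_dir, out_dir, rm, jar="maxent.jar"):
    """Build a MaxEnt command line mirroring the settings reported in the text.

    All paths and the jar location are placeholders; rm is the regularization
    multiplier chosen for the genus (passed as betamultiplier, an assumption).
    """
    return [
        "java", "-jar", jar,
        f"samplesfile={samples_csv}",
        f"environmentallayers={layers_dir}",
        f"outputdirectory={out_dir}",
        f"betamultiplier={rm}",
        "outputformat=cloglog",
        # Settings reported in the text:
        "randomseed",
        "randomtestpoints=25",
        "maximumbackground=100000",
        "replicates=25",
        "replicatetype=bootstrap",
        "writebackgroundpredictions",
        "maximumiterations=5000",
        "autorun",
    ]


# Hypothetical file names, for illustration only.
args = maxent_args("rattus.csv", "layers/", "out/rattus/", rm=2.5)
```

The resulting list could be passed to `subprocess.run(args, check=True)` once the paths point at real files.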
Finally, we generated a global probability distribution dataset for rodents from 2021 to 2100 at 20-year intervals under four SSP–RCP scenarios and ten GCMs. The probability estimates were generated using the cloglog transformation, which is considered to have stronger theoretical interpretability than the traditional logistic transformation28.
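To make the transformation concrete, the sketch below converts a raw MaxEnt value to cloglog output. It assumes the form 1 − exp(−exp(H) · raw), where H is the entropy of the fitted distribution, which is our understanding of how MaxEnt 3.4 defines this output format; the function name and the example values are hypothetical.

```python
import math


def cloglog_from_raw(raw, entropy):
    """Map a raw MaxEnt output value to the cloglog scale in [0, 1).

    Assumes cloglog = 1 - exp(-exp(entropy) * raw), where entropy is the
    entropy of the fitted MaxEnt distribution (an assumption of this sketch).
    """
    return 1.0 - math.exp(-math.exp(entropy) * raw)


# Hypothetical values: a raw output equal to exp(-entropy) maps to 1 - 1/e.
example_entropy = 9.0
typical_cell = cloglog_from_raw(math.exp(-example_entropy), example_entropy)
```

Under this form, a raw value of zero maps to zero and the output approaches one as the raw value grows, so the ranking of grid cells is preserved.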