The four target macro categories considered in this work (Mediterranean coastal dunes, grasslands, Alpine screes, and Mediterranean deciduous forests) comprise six habitat types listed in Annex I of the Habitats Directive. In Annex I of the Habitats Directive2, each habitat is identified by a four-digit code: the first digit denotes the broad type of environment, and the subsequent digits provide more specific information. In this dataset, we consider habitats 2110, 2120, 6210*, 8110, 8120, and 9210*. Of these, 2110 refers to Embryonic shifting dunes; 2120 to Shifting dunes along the shoreline with Ammophila arenaria (white dunes); 6210* to Semi-natural dry grasslands and scrubland facies on calcareous substrates (Festuco-Brometalia) (*important orchid sites); 8110 to Siliceous scree of the montane to snow levels (Androsacetalia alpinae and Galeopsetalia ladani); 8120 to Calcareous and calcshist screes of the montane to alpine levels (Thlaspietea rotundifolii); and 9210* to Apennine beech forests with Taxus and Ilex. Table 1 summarizes the information for each habitat.
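The coding scheme described above can be illustrated with a small parser. This is a minimal sketch, not part of the dataset tooling: the class and function names are hypothetical, the group labels cover only the four Annex I groups relevant here (2, 6, 8, and 9), and the asterisk is read as the Annex I marker for priority habitat types.

```python
from dataclasses import dataclass

# Annex I groups relevant to this dataset, keyed by the first digit of the code.
ANNEX_I_GROUPS = {
    "2": "Coastal sand dunes and inland dunes",
    "6": "Natural and semi-natural grassland formations",
    "8": "Rocky habitats and caves",
    "9": "Forests",
}


@dataclass(frozen=True)
class HabitatCode:
    digits: str      # the four-digit code without the asterisk, e.g. "6210"
    group: str       # Annex I group given by the first digit
    priority: bool   # True when the code carries the asterisk (priority habitat type)


def parse_habitat_code(raw: str) -> HabitatCode:
    """Split a code such as '9210*' into its digits, its group, and its priority flag."""
    raw = raw.strip()
    priority = raw.endswith("*")
    digits = raw.rstrip("*")
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"not a four-digit Annex I code: {raw!r}")
    group = ANNEX_I_GROUPS.get(digits[0], "group not covered by this sketch")
    return HabitatCode(digits, group, priority)


if __name__ == "__main__":
    for code in ["2110", "2120", "6210*", "8110", "8120", "9210*"]:
        h = parse_habitat_code(code)
        print(f"{h.digits}: {h.group} (priority: {h.priority})")
```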
Plant species with different ecological roles were selected for each macro category. Typical species (TS) serve as indicators for the “structure and functions” metrics, which determine how successfully the habitats in which NS live are being conserved. According to the monitoring manuals, TS must be selected so that they reflect the favorable structure and functions of the habitat type. This means that TS should simultaneously be effective indicators of favorable habitat quality, be either unique to the habitat or present across a significant portion of the habitat’s range, and be sensitive to changes in habitat condition. EWS are plant species that indicate ongoing processes altering the structure and function of their habitat. This concept is particularly applicable to non-typical taxa, whose presence can serve as an informative proxy for detecting the initial functional shifts of a community33. We also took IAS into account, as they pose a significant risk to biodiversity and worsen the conservation status of habitats of Community interest9. Table 2 summarizes the macro categories, the selected species, and their ecological roles, while details are given in the habitat-specific methods subsections that follow.
While standardizing conditions (e.g., controlled lighting and fixed camera settings) might yield a more homogeneous dataset, with potentially higher performance in laboratory tests, such uniformity would fail to capture the variability inherent in real-world monitoring scenarios. Our dataset therefore intentionally encompasses a wide range of environmental conditions, including varying lighting, exposure, viewing angles, and focus. To achieve this, we combined autonomous robotic missions, teleoperated missions, and human-operated cameras with different hardware, so that artificial intelligence algorithms trained on this dataset are better able to generalize across unpredictable field conditions.
Employed Hardware and Procedures
Most of the data were acquired in the spring and summer of 2022 and 2023 (from April 27th, 2022, to July 15th, 2023), when fieldwork was carried out to verify the occurrence and distribution of the selected target species in the study areas and to conduct exploratory vegetation plots. During this fieldwork, we also collected a large number of pictures and videos of the plants at different phenological stages for later labeling. The general guidelines, described in the Data Acquisition Pipeline subsection of the Technical Validation section, were the same for every habitat, whereas the specific procedure differed for each macro category. The indicator species differ between habitats, as do the data acquisition procedures, which must mimic those carried out by human operators; the fieldwork periods also differ, since they depend on the Directive and on the life cycle of the chosen species. For this reason, the method used for each habitat is described in a separate subsection for Mediterranean Dunes, Grasslands, Alpine Screes, and Mediterranean Deciduous Forests. The raw data used in the work presented in this data descriptor were acquired following three different procedures: robot in autonomous mission for around 7.6% of the images, teleoperated robot for around 27.9%, and human operators with different cameras for the remaining 64.5%. These proportions reflect the technical difficulty of each procedure: the more difficult the procedure, the less time we were able to operate it. While autonomous missions yield systematic, grid-based images with relatively uniform angles, teleoperated and human-operated images inherently introduce greater diversity in viewpoints and exposure conditions.
This variability, rather than being detrimental, enhances the robustness of the dataset by more closely simulating real-world conditions and enabling artificial intelligence models to generalize more effectively. Furthermore, the expert annotation process and the subsequent quality assurance protocols helped mitigate differences in image quality and species representation. It should be noted that acquiring raw data autonomously is more time-effective, but we chose not to label most of the data acquired in this mode, so that it could serve as a benchmark for the performance of the framework and of the models.
Data acquired employing robot in autonomous mission
The robot’s autonomous missions are conducted to mimic the habitat monitoring procedures carried out by human operators. First, for data acquisition, we selected sites accessible to both human operators and robots. The legged robot used in this work is ANYmal C34, developed by ANYbotics AG, Zurich, Switzerland. The robot measures 1.05 m by 0.52 m and weighs 50 kg. Each leg has three actuated joints, which enable hip abduction and adduction, hip flexion and extension, and knee flexion and extension. ANYmal C can be operated wirelessly through a dedicated remote controller or a PC, or it can operate autonomously in unstructured environments such as dunes. It is powered by a 932.4 Wh lithium-ion battery, which provides an operational window of 2–4 hours on a single charge. The robot is also equipped with an array of exteroceptive sensors that provide environmental perception: a Velodyne VLP-16 Puck LITE LiDAR, four Intel RealSense D435 RGB-D depth cameras, and two FLIR Blackfly BFS-GE-16S2C-BD2 wide-angle cameras. The LiDAR uses laser pulses to create a detailed three-dimensional representation of the surroundings, while the depth cameras use stereo vision to capture both color images and range information, as shown in Fig. 2.
During the mapping phase, the robot is teleoperated through the area along a non-predefined trajectory while reconstructing a three-dimensional map of the surrounding environment. Next, the robot performs an autonomous mission: it localizes itself in the previously created map of the operating area and moves through a sequence of waypoints, which vary by habitat according to the corresponding procedure. Throughout the autonomous mission, videos are acquired with the onboard cameras and data on the robot’s status are recorded. At each waypoint, the robot orients itself consistently before taking four photographs with its four RGB-D cameras. In the presence of obstacles, the robot is teleoperated for safety reasons; otherwise, it moves autonomously.
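The mission sequence described above (localize in the prebuilt map, visit each waypoint, orient, photograph with all four RGB-D cameras, and hand over to teleoperation near obstacles) can be sketched as follows. This is illustrative only: `StubRobot`, its methods, and the camera names are hypothetical placeholders that merely log the requested actions, not the ANYmal or ANYbotics API, and obstacle detection is reduced to a lookup of pre-flagged waypoints.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres in the map frame

# Hypothetical names for the four onboard RGB-D cameras.
CAMERAS = ("front", "rear", "left", "right")


@dataclass
class StubRobot:
    """Hypothetical stand-in for a robot interface; it only logs the requested actions."""
    obstacle_at: Tuple[Waypoint, ...] = ()
    log: List[str] = field(default_factory=list)

    def localize(self, map_name: str) -> None:
        self.log.append(f"localize:{map_name}")

    def path_blocked(self, wp: Waypoint) -> bool:
        return wp in self.obstacle_at

    def move_to(self, wp: Waypoint, teleoperated: bool) -> None:
        self.log.append(f"move:{'teleop' if teleoperated else 'auto'}:{wp}")

    def orient(self, heading_deg: float) -> None:
        self.log.append(f"orient:{heading_deg}")

    def capture(self, camera: str) -> str:
        self.log.append(f"capture:{camera}")
        return f"img_{len(self.log):05d}_{camera}.png"


def run_mission(robot: StubRobot, map_name: str, waypoints: List[Waypoint],
                heading_deg: float = 0.0) -> List[str]:
    """Localize in the prebuilt map, then visit each waypoint and photograph it."""
    robot.localize(map_name)
    images: List[str] = []
    for wp in waypoints:
        # Near obstacles, control is handed to the operator for safety.
        robot.move_to(wp, teleoperated=robot.path_blocked(wp))
        robot.orient(heading_deg)  # same orientation at every waypoint
        images.extend(robot.capture(cam) for cam in CAMERAS)  # four photographs
    return images


robot = StubRobot(obstacle_at=((1.0, 0.0),))
photos = run_mission(robot, "dune_map", [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
```

Separating the mission logic from the robot interface in this way keeps the waypoint and capture sequence testable without hardware; a real deployment would replace the stub with the vendor's own control stack.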
Data acquired employing robot in teleoperated mission
The robot’s teleoperated missions were conducted solely to acquire data on the target species for labeling. Consequently, for each habitat we selected locations within the study area where the botanists’ traditional monitoring plot could not be replicated through an autonomous mission, but where at least one target species was present. Because the data acquired via autonomous missions were used to benchmark the framework and the associated artificial intelligence models, it was essential that the teleoperated data be acquired in locations distinct from those of the autonomous missions, to avoid a positive bias. In this procedure, an operator teleoperates the robot so as to photograph the species identified by the plant scientists while varying the position, focal distance, number of individuals, and angle within the image plane.
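A simple way to check the spatial separation required above is to flag any teleoperated acquisition site that lies too close to an autonomous-mission site. The sketch below assumes planar coordinates in metres (e.g., from a projected reference system) and a hypothetical buffer distance `min_distance_m`; the text does not specify how the separation was enforced or how large it was.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]  # planar (easting, northing) in metres, an assumption


def sites_too_close(teleop_sites: Iterable[Point],
                    autonomous_sites: Sequence[Point],
                    min_distance_m: float) -> List[Point]:
    """Return the teleoperated sites closer than min_distance_m to any autonomous-mission site."""
    return [
        p for p in teleop_sites
        if any(math.dist(p, q) < min_distance_m for q in autonomous_sites)
    ]


# Hypothetical coordinates and buffer, for illustration only.
flagged = sites_too_close([(0.0, 0.0), (100.0, 0.0)], [(5.0, 0.0)], min_distance_m=20.0)
```

An empty result would indicate that no teleoperated site overlaps the benchmark areas under the chosen buffer.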
Data acquired employing human operators with multiple hardware
For the data acquired by human operators, different hardware was employed, including digital cameras (OLYMPUS E-M5, SONY ILCE-6000, NIKON D7200, NIKON D300, NIKON COOLPIX S3300, CANON EOS 80D) and smartphone cameras (HUAWEI FIG-LX1, SAMSUNG SM-J250Y, SAMSUNG SM-A536B, XIAOMI 21061119DG, XIAOMI 22101316G, APPLE iPhone SE). This procedure aimed to maximize the generalization capacity of the dataset, both by using diverse hardware to acquire the raw data and by diversifying the environmental conditions of the photographs (e.g., weather, location, time, exposure, brightness, number of individuals).
Mediterranean Coastal Dunes: methods for Annex I habitats 2110 and 2120
The data were primarily collected at two sites in northern Sardinia (Italy). The first is the dune system within the Special Area of Conservation (SAC) ITB010003 – Stagno and Ginepreto di Platamona, a 17-kilometer-long sandy beach with longitudinal and parabolic dunes aligned northwest-southeast. The second, Spiaggia del Relitto, is located in the La Maddalena Archipelago, which is both a SAC (ITB010008) and a National Park. At both locations, the vegetation of the emergent portion of the beach is characterized by a sequence of communities arranged in parallel bands from the shoreline to the most stable areas behind the dunes. Dune morphology varies with wind exposure: the dunes at Platamona vary in height and shape, whereas those at Spiaggia del Relitto are flat.
Fieldwork was conducted from May 16th to May 19th, 2022, during the first campaign, and from May 16th to May 20th, 2023, during the second campaign. All fieldwork took place at Platamona, except for one day of the second campaign spent at Spiaggia del Relitto. These periods were chosen because they offer ideal conditions for sampling dune vegetation35.
The zonation of coastal dunes is shaped by a combination of ecological factors such as salt spray, burial, stability, and nutrient availability. Almost all types of dune vegetation are recognized as habitats of Community interest in the European Union under Annex I of the Habitats Directive. We considered habitats 2110 and 2120.
Habitat 2110 – Embryonic shifting dunes
This habitat represents the initial stages of dune formation, found as a seaward fringe at the base of taller dunes or as ripples and elevated sand surfaces on the upper beach. We considered two TS, Thinopyrum junceum (L.) Á.Löve and Achillea maritima (L.) Ehrend. & Y.P.Guo, and two additional NS, Eryngium maritimum L. and Pancratium maritimum L.
Habitat 2120 – Shifting dunes along the shoreline with Ammophila arenaria (white dunes)
These are mobile dunes forming the seaward cordon, or the cordons, of coastal dune systems. In this habitat, we considered as TS the geophyte Ammophila arenaria (L.) Link subsp. arundinacea (Husn.) H. Lindb., a perennial plant that can reach 120 cm in height, with both horizontal and vertical rhizomes, and that plays a crucial role in the formation and maintenance of dunes. Other species commonly found in this habitat include Eryngium maritimum and Pancratium maritimum.
We also considered the IAS Carpobrotus acinaciformis (L.) L.Bolus36,37, which poses a significant threat to biodiversity and negatively affects the conservation status of these habitats. For reference, see Table 2.
TS and NS living in Mediterranean coastal dunes are psammophilous species that are adapted to live on sand in extreme conditions determined by high salinity, wind exposure, and scarcity of water and nutrients. They have rigid, hairy leaves with thickened cuticles and extensive, highly branched root systems resistant to mechanical pull38. The IAS C. acinaciformis has a facultative C3-CAM photosynthetic strategy, high phenotypic plasticity, resistance to high salinity concentrations, and intense vegetative clonality. It shows strong adaptability and dispersal ability in sandy dunes39.
Vegetation surveys are conducted within adjacent 1 m × 1 m plots along a transect perpendicular to the coastline, documenting changes in vegetation across environmental gradients for each of the two habitats. During its autonomous mission, the robot followed the linear transect, halting every meter to capture images of each contiguous 1 m² plot.
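The transect protocol above (one stop per contiguous 1 m × 1 m plot along a line perpendicular to the shore) can be expressed as a waypoint generator. This is a sketch under stated assumptions: the function name is hypothetical, coordinates are planar in metres, the bearing is measured clockwise from north, and the robot is assumed to stop at the centre of each plot along the transect.

```python
import math
from typing import List, Tuple


def transect_waypoints(start: Tuple[float, float], bearing_deg: float,
                       length_m: float, plot_m: float = 1.0) -> List[Tuple[float, float]]:
    """One stop per contiguous plot along a straight transect.

    start: transect origin at the shoreline, planar coordinates in metres.
    bearing_deg: direction of the transect, clockwise from north (+y), e.g. inland
    and perpendicular to the coastline.
    Each stop is placed at the centre of its plot along the transect line (an assumption).
    """
    n_plots = int(length_m // plot_m)
    dx = math.sin(math.radians(bearing_deg))
    dy = math.cos(math.radians(bearing_deg))
    return [
        (start[0] + (i + 0.5) * plot_m * dx, start[1] + (i + 0.5) * plot_m * dy)
        for i in range(n_plots)
    ]


# A hypothetical 10 m transect heading due east from the origin.
stops = transect_waypoints((0.0, 0.0), bearing_deg=90.0, length_m=10.0)
```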
Grasslands: methods for Annex I habitat 6210*
The data were primarily collected in the Valsorda area, on the Apennine ridge in central Italy, at around 1000 m a.s.l., within the municipality of Gualdo Tadino (province of Perugia). This location is part of the N2000N and is designated as the Special Area of Conservation (SAC) IT5210014 – Monti Maggio – Nero (sommità). The area comprises a mosaic of grasslands and sparse beech forests. The vegetation in Valsorda consists of diverse communities adapted to the varying altitudes and slopes of the terrain. Grassland habitats dominate the mountain tops, particularly Annex I habitat 6210*, which includes semi-natural dry grasslands and scrubland on calcareous substrates. These grasslands are interspersed with patches of beech forest (habitat 9210*), which are more prevalent on cooler, north-facing slopes. The geomorphology of the area is rugged, with sporadic steep slopes and rocky outcrops.
Orchid species play a particularly important role in this habitat type, and the timing of their flowering is crucial for monitoring their occurrence. For this reason, data collection was carried out in May, the ideal period for detecting both the target TS and the EWS in habitat 6210*, as they typically flower between April and June40. Moreover, orchid species are difficult to distinguish during the early stages of their life cycle, whereas they become easier to recognize at flowering. Fieldwork was conducted from May 10th to May 13th, 2022, during the first campaign, and from May 8th to May 10th, 2023, during the second campaign.
Habitat 6210* – Semi-natural dry grasslands and scrubland facies on calcareous substrates (Festuco-Brometalia) (* important orchid sites)
This habitat is typically found on calcareous soils in open, sunny locations, often on slopes and hillsides, and is characterized by species-rich semi-natural grasslands. The vegetation includes a variety of grasses and herbaceous plants well adapted to dry conditions. These grasslands are known for their high biodiversity and often support a range of orchids, including Anacamptis morio (L.) R.M.Bateman, Pridgeon & M.W.Chase and Dactylorhiza sambucina (L.) Soó. The habitat also provides important ecological functions, such as supporting pollinators, serving as a refuge for many invertebrate species, and providing feeding areas for herbivorous mammals. The diversity and structure of the vegetation can vary depending on factors such as grazing pressure, which influences the presence and abundance of species within the habitat.
There are no generally valid lists of Typical Species (TS) at the European or national level because of the variability of grassland flora. Official species lists are included in the reference physiognomic combinations at the national level41, but they are suitable for habitat identification, not for habitat monitoring. It is therefore advised to identify TS at a regional or even local scale42. The presence of orchid species is typically regarded as an indicator of favorable conservation status, and their abundance within a surveyed area is particularly significant as an indicator of the priority status of habitat 6210*2. In accordance with the guidelines and the recommendation to select TS at the local level for habitat 6210*, we chose two typical orchid species, Anacamptis morio (L.) R.M.Bateman, Pridgeon & M.W.Chase and Dactylorhiza sambucina (L.) Soó (in both its yellow and pink forms), which are among the most common orchids in habitat 6210* at the regional level. A. morio grows in different environments, predominantly in unimproved dry grasslands, often in large populations, in full sunlight or partial shade. D. sambucina mainly occurs in meadows and pastures in the mountains, although it can occasionally be found in clearings or bright woodlands43. Both species are rather indifferent to substrate type; A. morio generally prefers nutrient-poor soils, whereas D. sambucina prefers moister, nutrient-rich ones44. As Early Warning Species (EWS), we selected Asphodelus macrocarpus Parl. In the Apennines, this tall, rhizomatous geophyte is known for its vigorous vegetative growth in spring, typically spreading from forest edges into semi-natural grassland habitats. The presence and impacts of this species on habitat 6210* in the central Apennines are well documented, particularly in areas where traditional agropastoral activities have been reduced or abandoned. A. macrocarpus colonizes grasslands through direct invasion and rapid expansion, facilitated by its heliophilic nature and efficient vegetative propagation44,45,46,47. Its invasive character is further intensified by its unpalatability to grazing animals47,48,49. In its native range, A. macrocarpus grows in grasslands, woodlands, and open shrub areas on well-structured soils46.
In grassland monitoring, the surveyed area typically consists of a 4 m × 4 m plot or a transect of variable length. The objective during this phase is to gather habitat data using the four onboard cameras. Navigation is achieved through a series of waypoints that the robot determines autonomously in real time: upon reaching a waypoint, the robot sets the next one at a distance of 1 m. This approach was chosen for two main reasons. First, botanists generally divide the area into multiple 1 m × 1 m blocks for detailed surveys. Second, not pre-selecting the waypoints allows the robot to adjust to variations in terrain slope.
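The incremental waypoint rule above (the next waypoint is set 1 m away once the current one is reached) can be illustrated for a square plot divided into 1 m × 1 m blocks. This is a minimal sketch: the serpentine sweep order and the function names are assumptions, and unlike the real system it works on a flat grid, so it does not model how the robot adapts its step to terrain slope. The same function covers the 4 m × 4 m grassland plot and, with `plot_size=5`, the 5 m × 5 m scree plot.

```python
from typing import Iterator, Optional, Tuple

Cell = Tuple[int, int]  # (column, row) of a 1 m x 1 m block inside the plot


def next_block(current: Cell, plot_size: int) -> Optional[Cell]:
    """Return the block 1 m away in a serpentine sweep, or None when the plot is covered.

    Even rows are swept left to right and odd rows right to left (assumed order).
    """
    col, row = current
    step = 1 if row % 2 == 0 else -1
    if 0 <= col + step < plot_size:
        return (col + step, row)   # continue along the current row
    if row + 1 < plot_size:
        return (col, row + 1)      # shift 1 m to the next row
    return None


def sweep(plot_size: int) -> Iterator[Tuple[float, float]]:
    """Yield block centres in metres; each waypoint is decided only after the previous one."""
    cell: Optional[Cell] = (0, 0)
    while cell is not None:
        yield (cell[0] + 0.5, cell[1] + 0.5)
        cell = next_block(cell, plot_size)


grassland_plot = list(sweep(4))   # 4 m x 4 m plot
scree_plot = list(sweep(5))       # 5 m x 5 m plot
```

Computing each waypoint from the current one, rather than from a precomputed list, mirrors the on-the-fly placement described in the text and leaves room to substitute a terrain-aware step.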
Alpine Screes: methods for Annex I habitats 8110 and 8120
The data were primarily collected in the Valfurva area, within the Stelvio National Park, in the province of Sondrio, Italy. This location is part of the N2000N and is designated as the Special Protection Area (SPA) IT2040044 – Stelvio. Valfurva has a high-altitude alpine landscape in the Italian Alps, comprising a combination of rocky screes and sparse vegetation, with elevations ranging from the montane to the alpine zone. The scree habitats in this region include habitats 8110 and 8120. These habitats occur on steep, rocky slopes where the substrate consists of loose siliceous or calcareous rock debris. The vegetation is sparse because of the unstable and nutrient-poor conditions, but it includes specialized plant communities adapted to the harsh environmental conditions of these high-altitude areas. These scree habitats, classified as unfavorable-inadequate (U1) in the fourth EU report, are critical for the conservation of specialized alpine flora. The microclimate of Valfurva, influenced by elevation and exposure, fosters a distinctive combination of species that varies significantly with altitude, rock type, and exposure to sunlight and wind.
Fieldwork was conducted from July 19th to July 21st, 2022, during the first campaign, and from July 10th to July 15th, 2023, during the second campaign. These dates were chosen because, in high-altitude habitats, this period is ideal for observing the phenological stages of vascular plants, as it coincides with their peak flowering and development.
Habitat 8110 – Siliceous scree of the montane to snow levels (Androsacetalia alpinae and Galeopsetalia ladani)
This habitat consists of loose rock debris found on steep slopes at elevations ranging from montane to alpine levels. The scree is primarily composed of siliceous rock, and the vegetation is sparse due to the unstable substrate. Typical species in this habitat include pioneer plants that are adapted to the shifting conditions. These species often have deep root systems to colonize the loose, rocky environment. Notable species include Geum reptans and Ranunculus glacialis, both of which are well-adapted to the high altitudes and poor soil conditions. The habitat prevents soil erosion and acts as a refuge for specialized alpine flora and fauna.
Habitat 8120 – Calcareous and calcshist screes of the montane to alpine levels (Thlaspietea rotundifolii)
This habitat is found on slopes with calcareous or calcshist scree, typically at montane to alpine elevations. The substrate consists of loose, unstable rock fragments that support specialized vegetation. The flora is characterized by its ability to stabilize the scree, reducing rock movement and contributing to the gradual development of soil. This habitat is important for the conservation of alpine plant species adapted to extreme environmental conditions and for maintaining biodiversity.
The TS considered for the scree habitats are Cerastium spp. (including Cerastium uniflorum Clairv. and Cerastium pedunculatum Gaudin), Geum reptans L., Papaver alpinum L., Ranunculus glacialis L., and Saxifraga bryoides L. The only EWS considered is Luzula alpinopilosa (Chaix) Breistr.
All these species require cold, high-altitude environments with intense light and low nutrient content. Papaver alpinum, being linked to calcareous screes (habitat 8120), grows on soils with high pH, while, according to the literature50, all the other species require the acidic soils typical of habitat 8110. All the TS are strictly linked to the coarse soil characteristic of active scree slopes, whereas Luzula alpinopilosa, the only EWS, prefers substrates rich in fine fraction50, which indicates an ongoing stabilization of the scree slope.
In scree settings, the surveyed area typically consists of a 5 m × 5 m plot51. The objective during this phase is to gather habitat data using the four onboard cameras. Navigation is achieved through a series of waypoints that the robot determines autonomously in real time: upon reaching a waypoint, the robot sets the next one at a distance of 1 m. This approach was chosen for two main reasons. First, botanists generally divide the area into multiple 1 m × 1 m blocks for detailed surveys. Second, not pre-selecting the waypoints allows the robot to adjust to variations in debris size.
Mediterranean Deciduous Forests: methods for Annex I habitat 9210*
The data were primarily collected in the La Verna forest, located within the “Foreste Casentinesi” National Park, and part of the N2000N as the Special Area of Conservation (SAC) IT5180101 – “La Verna – Monte Penna.” This area is situated in the Apennine region of Italy and is characterized by its montane landscape, which includes dense beech forests. The forested area is primarily composed of habitat 9210*, known as Apennine beech forests with Taxus and Ilex, which are particularly significant due to their high species diversity and the presence of taxa with important conservation value. The vegetation in La Verna forest is dominated by beech (Fagus sylvatica) trees, with a well-developed understory that includes sporadic shrubby individuals of Abies alba and an almost continuous herbaceous layer rich in species. The fieldwork for this study was timed to coincide with the flowering season of target nemoral understory species, which serve as indicators of the conservation status of habitat 9210*, to ensure that the most relevant phenological stages of the indicator species were captured.
Fieldwork was conducted from April 27th to April 28th, 2022, during the first campaign, and from May 2nd to May 6th, 2023, during the second campaign. These dates were chosen to match the flowering season of the examined indicator species, which runs from March to May.
Habitat 9210* – Apennine beech forests with Taxus and Ilex
This habitat is of priority interest under the Habitats Directive and includes thermophilous beech forests, which occur in the submontane belt and extend into the montane belt under an oceanic bioclimate (from meso- to supratemperate), rich in spring-flowering geophytes. The vegetation is dominated by beech (Fagus sylvatica L. subsp. sylvatica) trees, forming dense forests with a well-developed understory. Characteristic species include Taxus baccata L. and Ilex aquifolium L. These forests are significant for their ecological functions, including soil stabilization, carbon sequestration, and providing habitat for a diverse range of flora and fauna. The structure and composition of the forest can vary depending on factors such as altitude, soil type, and forest management practices, which influence the presence and abundance of typical and associated species within the habitat52.
In the forest habitat 9210*, we concentrated on four species: Anemonoides nemorosa (L.) Holub, Corydalis cava (L.) Schweigg. & Körte, and Anemonoides ranunculoides (L.) Holub as Typical Species (TS), and Doronicum columnae Ten. as an Early Warning Species (EWS).
The TS belonging to the genera Anemonoides and Corydalis are vernal sciaphilous geophytes that share a high tolerance for dense summer shade and a preference for alkaline to slightly acidic substrates, typical of habitat 9210*53. Their presence is closely associated with unmanaged and ancient forests, where stable shade conditions support their growth and persistence54. In contrast, Doronicum columnae, the only EWS, tolerates light in the understory and also contributes to the formation of megaforb-rich forest edges55. Its presence may therefore indicate canopy openings, which can result from anthropogenic disturbances in the habitat.
For the study, two different kinds of plots were selected. Smaller plots were chosen in obstacle-free areas with flat, even terrain and no trees, to conduct preliminary floristic surveys that served as a testbed for evaluating the feasibility of robot deployment. Subsequently, larger plots were designated for comprehensive structural surveys and to gather data for the floristic survey, which is the focus of this data descriptor. The monitoring procedure was designed to replicate typical botanist fieldwork, using circular plots of approximately 200 m² (radius = 8 m), a size and shape commonly used for monitoring forest habitats56.
For the data acquisition during the autonomous mission phase, the robot followed a predefined grid pattern to capture images and videos of the indicator species. This grid was composed of waypoints arranged in a bottom-right to top-left configuration, with each waypoint spaced 1 m apart. This methodology was chosen for two primary reasons: first, botanists typically divide the survey area into multiple 1 m × 1 m blocks for detailed analysis, and second, allowing the robot to autonomously place waypoints ensures adaptability to any variations in terrain slope.
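The forest plot geometry above can be checked numerically: a circle of radius 8 m has an area of π·8² ≈ 201 m², consistent with the stated plot size of approximately 200 m². The sketch below also lays out 1 m-spaced grid waypoints inside such a plot, ordered from bottom-right to top-left. The function name, the plot-centred coordinate frame, and the row-by-row, right-to-left ordering are assumptions for illustration, not the authors’ mission planner.

```python
import math
from typing import List, Tuple

RADIUS_M = 8.0
plot_area_m2 = math.pi * RADIUS_M ** 2   # about 201.06 m², i.e. the "approximately 200 m²" plot


def circular_plot_grid(radius_m: float = RADIUS_M, spacing_m: float = 1.0) -> List[Tuple[float, float]]:
    """Grid waypoints inside a circular plot centred at (0, 0), ordered bottom-right to top-left.

    Rows are visited from bottom to top and, within each row, from right to left
    (an assumed reading of the bottom-right to top-left configuration).
    """
    n = int(radius_m // spacing_m)
    points = [
        (i * spacing_m, j * spacing_m)
        for j in range(-n, n + 1)
        for i in range(-n, n + 1)
        if math.hypot(i * spacing_m, j * spacing_m) <= radius_m
    ]
    # Lowest row first; within a row, largest x (rightmost) first.
    return sorted(points, key=lambda p: (p[1], -p[0]))


waypoints = circular_plot_grid()
```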
Data Annotation
To the best of the authors’ knowledge, no labeled object detection datasets are available for habitats 2110, 2120, 6210*, 8110, 8120, and 9210*. Once the necessary data had been acquired, a group of plant scientists with expertise in the target habitats carried out the labeling. Images were acquired by the robot during teleoperated missions and by human operators with multiple hardware; some images were also retrieved from botanical archives. The robot acquisitions were concentrated between late April 2022 and mid-July 2023, whereas acquisitions by operators were distributed over a longer period and also cover different areas. Expert botanists annotated all the human- and robot-collected images with bounding boxes, using online tools such as Labelbox and Roboflow and offline tools such as ModifiedOpenLabelling.
The dataset described in this data descriptor comprises four datasets, for dunes, grasslands, forests, and screes. In object detection, a class is a distinct category of object that the model is trained to recognize and differentiate; in this dataset, each class represents a specific plant species relevant to the labeled environment. The complete dataset encompasses 19 classes corresponding to the TS, NS, IAS, and EWS of each habitat. It contains a total of 10264 images with a resolution of 640 × 640 pixels and 61238 annotations, an average of 5.97 annotations per image. There is a significant class imbalance, which stems from the fact that the datasets were developed separately for each habitat. Images without bounding boxes were added for each habitat to act as true negatives, constituting around 7% of the total dataset. The number of images and annotations for each habitat, together with the corresponding classes and species, is as follows (see also Table 2):
- Mediterranean Coastal Dunes: 2891 labeled images, with 24378 annotations as shown in Fig. 3 and Fig. 4, across six classes representing Ammophila arenaria subsp. arundinacea, Thinopyrum junceum, Achillea maritima, Pancratium maritimum, Eryngium maritimum, and Carpobrotus acinaciformis.
Fig. 3 Overview of the labels in the Mediterranean Coastal Dunes dataset. Top left: bar chart of the number of instances per class. Top right: overlapping triangle plot showing the distribution of bounding box dimensions. Bottom left: heatmap of the distribution of detections across the image space. Bottom right: heatmap of the distribution of bounding box dimensions (width and height). In the .yaml file, the class Calamagrotis arenaria is associated with the species Ammophila arenaria subsp. arundinacea.
- Grasslands: 2091 labeled images, with 11397 annotations as shown in Fig. 5 and Fig. 6, across three classes representing Asphodelus macrocarpus, Anacamptis morio, and Dactylorhiza sambucina.
Fig. 5 Overview of the labels in the Grasslands dataset. Top left: bar chart of the number of instances per class. Top right: overlapping triangle plot showing the distribution of bounding box dimensions. Bottom left: heatmap of the distribution of detections across the image space. Bottom right: heatmap of the distribution of bounding box dimensions (width and height).
- Alpine Screes: 2725 labeled images, with 11988 annotations as shown in Fig. 7 and Fig. 8, across six classes representing Geum reptans, Ranunculus glacialis, Saxifraga bryoides, Cerastium spp., Papaver alpinum, and Luzula alpino-pilosa.
Fig. 7 Overview of the labels in the Alpine Screes dataset. Top left: bar chart of the number of instances per class. Top right: overlapping triangle plot showing the distribution of bounding box dimensions. Bottom left: heatmap of the distribution of detections across the image space. Bottom right: heatmap of the distribution of bounding box dimensions (width and height). In the .yaml file, the class Luzula alpino-pilosa corresponds to the species Luzula alpinopilosa, and the class Saxifraga corresponds to Saxifraga bryoides.
- Mediterranean Deciduous Forests: 2557 labeled images, with 13475 annotations as shown in Fig. 9 and Fig. 10, across four classes representing Anemonoides ranunculoides, Anemonoides nemorosa, Doronicum columnae, and Corydalis cava.
Fig. 9 Overview of the labels in the Mediterranean Deciduous Forests dataset. Top left: bar chart of the number of instances per class. Top right: overlapping triangle plot showing the distribution of bounding box dimensions. Bottom left: heatmap of the distribution of detections across the image space. Bottom right: heatmap of the distribution of bounding box dimensions (width and height).
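The per-habitat figures listed above can be cross-checked against the totals reported for the complete dataset (10264 images, 61238 annotations, 19 classes). The short script below only redoes that arithmetic on the numbers given in the text; it also computes the average number of annotations per image, overall and for each habitat.

```python
# Per-habitat counts as reported in the text: (images, annotations, classes).
SUBSETS = {
    "Mediterranean Coastal Dunes": (2891, 24378, 6),
    "Grasslands": (2091, 11397, 3),
    "Alpine Screes": (2725, 11988, 6),
    "Mediterranean Deciduous Forests": (2557, 13475, 4),
}

total_images = sum(images for images, _, _ in SUBSETS.values())
total_annotations = sum(annotations for _, annotations, _ in SUBSETS.values())
total_classes = sum(classes for _, _, classes in SUBSETS.values())

print(total_images, total_annotations, total_classes)  # 10264 61238 19
print(f"overall: {total_annotations / total_images:.2f} annotations per image")
for name, (images, annotations, _) in SUBSETS.items():
    print(f"{name}: {annotations / images:.2f} annotations per image")
```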
Because of the acquisition procedure, each labeled image contains at least one instance of a target species (TS, NS, IAS, or EWS) of the corresponding habitat and may also contain instances of other target species, whereas the true-negative images show only background.
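To reproduce statistics such as the instances-per-class bar charts, or to count the true-negative images described above, a sketch like the following could be used. It assumes, without confirmation from the text, that the labels follow the YOLO text format suggested by the .yaml class file (one file per image, one `class x_center y_center width height` row per bounding box, and an empty file for a background image); the function name and example contents are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def summarize_labels(label_files: Dict[str, str]) -> Tuple[Counter, int, int]:
    """Summarize YOLO-style label files given as {file_name: file_text}.

    Returns the number of instances per class id, the number of background
    (true-negative) images, and the number of images containing more than one class.
    """
    per_class: Counter = Counter()
    background = 0
    multi_class = 0
    for text in label_files.values():
        rows = [line.split() for line in text.splitlines() if line.strip()]
        if not rows:
            background += 1                      # no boxes: background-only image
            continue
        classes = [int(row[0]) for row in rows]  # first field of each row is the class id
        per_class.update(classes)
        if len(set(classes)) > 1:
            multi_class += 1                     # several target species in one image
    return per_class, background, multi_class


# Hypothetical label contents for three images.
example = {
    "img_a.txt": "0 0.50 0.50 0.10 0.20\n0 0.30 0.30 0.10 0.10\n",
    "img_b.txt": "",
    "img_c.txt": "2 0.50 0.50 0.20 0.20\n0 0.10 0.10 0.05 0.05\n",
}
counts, n_background, n_multi = summarize_labels(example)
```

In practice, `label_files` could be built from a label directory with `pathlib.Path`, e.g. `{p.name: p.read_text() for p in Path("labels").glob("*.txt")}`, where the directory name is hypothetical.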