Study area
The data presented here cover the entire Kingdom of Bavaria, then an independent state and now, with some territorial changes, the largest federal state of Germany. The area is extensive (76,770 km², of which 24,560 km², or 31.99%, was woodland), almost twice the size of the Netherlands, and encompasses a variety of ecosystems, representing 21 of the 24 landscape types currently described by the German Federal Agency for Nature Conservation (https://www.bfn.de/daten-und-fakten/biogeografische-regionen-und-naturraeumliche-haupteinheiten-deutschlands). It also included smaller parts of the Austrian Limestone Alps as well as regions on the left bank of the Rhine, such as the Palatinate Forest adjacent to the Vosges Mountains along the French-German border. In 1845, the area was predominantly agricultural and pre-industrial, with an overall population density of 57.8 inhabitants per km² (ref. 21).
Data processing (Overview)
Data processing (see Fig. 2) encompassed the following steps, which are described in more detail below:
- Source discovery and historical contextualization
- Digitization and creation of machine-readable texts
- Datafication
- Transformation and publication
Fig. 2 Data Processing Workflow (Overview). This diagram illustrates the workflow used to create the Historical Animal Observation Records by Bavarian Forestry Offices (1845) dataset, from the discovery of the archival sources to the publication of the final data in public repositories. The top row describes each processing step; the bottom row shows the resulting outputs.
Figure 3 shows a sample entry, in its original form, from the report of the Tegernsee (Salforste) office, describing the occurrence of the Eurasian lynx (Lynx lynx). The text is given below in the original language and in English translation:
Fig. 3 Sample Response. This image is part of the Tegernsee (Salforste) report (cf. Figure 4 for the full-page image) and documents the presence of the lynx. The transcription of the original text and its English translation are given in the main text. Source: BayHStA Zool. Staatssammlung, 217, p. 14.
Luchs: “Seitdem /: im Jahre 1826:/ ein ganzes aus den alten u. zwey jungen Luchsinnen bestehendes Gehecke am Hirschberge ausgerottet worden, sind nur noch Einzelne dieser Raubthiere diese in den hiesigen Gebirgen erschienen, früher zahlreich, jedoch immer nur periodisch”
Lynx: “Since /: in 1826:/ an entire litter consisting of the old and two young female lynxes was exterminated on the Hirschberg, only single individuals of these predators have appeared in the local mountains; formerly numerous, but always only periodically”
Source discovery and historical contextualization
This study is based on historical documents from 1845 containing animal observations recorded by Bavarian forestry offices. Using predefined, standardized forms sent to them, foresters documented the presence or absence of 44 listed animal species within their districts, as well as of fish species in local waters. In some cases, foresters also reported species not listed on the forms. The source is particularly valuable because it covers all of Bavaria at a single point in time, which is rare among archival sources. Observations were documented by a total of 119 forestry offices across the eight Bavarian government districts. The documents are organized into ten files: one for each of the eight government districts (Upper Bavaria, Lower Bavaria, Upper Franconia, Middle Franconia, Lower Franconia, Upper Palatinate, Palatinate, Swabia-Neuburg), one for the Salforste region (a forested area in the border region between Germany and Austria, historically managed as a shared resource between Bavaria and Austria), and a summary for all of Bavaria. Together, these documents comprise approximately 520 pages.
The documents are housed in the Bavarian State Archives as part of the record group for the Zoological State Collection (https://www.gda.bayern.de/service/findmitteldatenbank/Kapitel/0ea38d12-d425-4b3e-a497-7b6830f439e1). This record group, spanning the period from 1827 to 1990, consists of 1,442 archival units. These units include documents related to the organisation, administration, and personnel matters of the Zoological State Collection, along with records of acquisitions, inventory catalogues, professional staff correspondence, and scientific research. The animal occurrence records completed by the forestry offices were originally held in the Zoological State Collection in Munich and were transferred to the Bavarian State Archives in 2013.
The context of these documents is deeply intertwined with the complex history of the Zoological State Collection. Established in 1827 by order of King Ludwig I as an independent public research institution, the Zoological Collection was initially separated from the scientific collections of the Bavarian Academy of Sciences. However, it soon became dependent on Ludwig Maximilian University, which had to transfer its own zoological collection to the state cabinet. The collection’s director, or conservator, was typically the university’s Professor of Zoology. At the same time, the “General Conservatory of State Scientific Collections” was founded to secure the collections’ independence. Its effectiveness, however, was limited by the Academy’s influence, as the Academy’s president also served as General Conservator. As a result, the zoological cabinet existed in an ambiguous organizational and personnel structure: its scientific staff came primarily from the university but required the approval of the state collection administration, which in turn relied heavily on the Academy’s resources.
The dataset presented here originated from a commission by the Bavarian Crown Prince Maximilian (later King Maximilian II), who tasked the zoologist Johann Andreas Wagner with mapping the distribution of Bavaria’s most significant animal species22. Wagner, who had been Professor of Zoology at the University of Munich and Deputy Curator of the State Zoological Collection since 1836, worked diligently to expand the collection and made notable contributions to zoological taxonomy20. To fulfil this prestigious royal commission and obtain a comprehensive survey of Bavaria’s fauna, Wagner decided to involve the royal forestry offices22. These offices were probably chosen because foresters were expected not only to know the local wildlife thoroughly but also to be qualified experts by virtue of their academic training.
In 1845, the 119 royal forestry offices, then subordinate to the Bavarian Ministry of Finance, received a directive from the Ministry requiring them to record systematically the animal species present in their respective forestry districts. The motivation behind the survey is evident from the official letter, which states that the forms served to understand the geographical distribution of animal and tree species “from a scientific perspective” (cf. Figure 4). The initiative arose during a period of rapid scientific and economic development, accompanied by a growing awareness of the relationship between humans and nature and a particular interest in the systematic, descriptive documentation of the natural world. While the geographical distribution of plants had already been well studied, Wagner observed that zoology still lagged behind in this regard23.
Fig. 4 Royal Order and Tegernsee Response. Left: Royal order of 13 August 1845: “Zur Kenntniß der geographischen Verbreitung der Thier- und Baumarten in Bayern — vom wissenschaftlichen Standpunkt aus gesehen — wird gewünscht, daß die in hinreichender Anzahl beigefügten Schemata, Litera A. und B., von den K. Forstämtern ausgefüllt werden.” (For an understanding of the geographical distribution of animal and tree species in Bavaria, from a scientific perspective, it is desired that the attached forms, Litera A and B, supplied in sufficient number, be completed by the Royal Forestry Offices.) Source: BayHStA, Zoologische Staatssammlung, 208. Right: Completed first page of the survey form from the Salinenforstamt Tegernsee. Source: BayHStA, Zoologische Staatssammlung, 217.
The survey responses, comprising more than 5,400 short prose entries, offer insights not only into species occurrence but also into habitats, their changes, and the effects of human activity on wildlife populations. For example, the survey contains the last recorded sightings of the beaver (Castor fiber) in Bavaria before its extinction there around 1867 (ref. 24). Wagner anticipated the beaver’s impending extinction and attributed it to hunting that was officially prohibited but particularly lucrative22.
The survey results formed the basis for a map of animal distribution in Bavaria, as commissioned by the Crown Prince25. Andreas Wagner also presented the findings on 21 February 1846 in a lecture at the Bavarian Academy of Sciences, of which he had been a member since 1835. The lecture was subsequently published in Gelehrte Anzeigen, a journal issued by members of the Bavarian Academy of Sciences22.
Digitization and creation of machine-readable texts
As is typical of many historical sources held in archives, the raw data of the 1845 survey consist of paper files in which the information of interest is recorded as handwritten text. To enable further processing, we created digital images of the original documents following the recommendations of the German Research Foundation (DFG)26, namely uncompressed TIFF master files at a resolution of 300 ppi (pixels per inch), to allow future reuse and upcycling27. We then converted these image-based raw data into machine-processable text in two steps: first, layout recognition (identifying text regions and baselines); second, transcription of the content using AI-based Handwritten Text Recognition (HTR) with the pre-trained Transkribus model “The Text Titan”, without additional custom training28. This automated step produced an initial Character Error Rate (CER) of approximately 7%, which trained palaeographers then reduced substantially through manual post-correction. The texts were transcribed “as is”, preserving the original language, spelling errors, and unresolved abbreviations, to keep the transcriptions as faithful to the originals as possible.
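The CER mentioned above is conventionally defined as the character-level edit distance (substitutions, deletions and insertions) between an automatic transcription and a ground-truth transcription, divided by the number of characters in the ground truth. The sketch below illustrates only that standard definition; it is not the Transkribus implementation, and the function names and the example misreading are our own. Under this definition, a CER of about 7% corresponds to roughly seven edits per hundred reference characters.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning `hypothesis` into `reference`."""
    # `previous` holds the distances between the first i-1 reference characters
    # and every prefix of the hypothesis (dynamic programming, one row at a time).
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from hypothesis
                current[j - 1] + 1,      # extra character in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground-truth reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: the HTR output drops the umlaut in a 22-character line.
ground_truth = "Luchs: Kömmt nicht vor"
htr_output = "Luchs: Kommt nicht vor"
print(round(character_error_rate(ground_truth, htr_output), 3))  # prints 0.045
```

Because the denominator is the reference length, many spurious insertions can push the CER above 100%; it is therefore a rate of edits, not a proportion of misread characters.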
Datafication
These raw texts were organized by forestry office name and taxon name, yielding a total of 5,467 entries for the 44 predefined species and for additional species reported by the foresters. We interpret this conversion of a non-digital artefact into computable codes as datafication27: the data of interest were extracted from textual content that was unstructured and not formatted for direct use. The foresters wrote their responses as free text, following the instructions in Wagner’s questionnaire (presented here in English translation):
“List of animal species of whose existence and place of residence information is desired. Note: If the distribution is limited to certain localities only, the nearest village or locality should be included in parentheses. In the case of birds, only those which breed in the district itself or which spend the winter there, or which are currently the subject of hunting, should be listed. – It should also be stated whether the species is common or rare.”
We applied the following rules for datafication:
1. Metadata (per district): For each of the 119 forestry districts, the location of the office seat was researched and georeferenced. Example: “Forstverwaltung Deggendorf” → office seat Deggendorf, Lower Bavaria → longitude = 12.9603, latitude = 48.8348. Sources: survey headers, the State Directory of the Kingdom of Bavaria 1845 (ref. 29), and OpenStreetMap (OSM).
2. Species names: The 44 predefined species names and 91 additional, partly historical, names reported by individual offices were mapped to their scientific names and taxonomy (see Tables 1, 2). The vernacular and Latin names used in the historical dataset are the central point of integration with the service infrastructures of biodiversity informatics, e.g., for analysing historical species distributions or changes in biodiversity patterns over time. For the data to be interoperable, however, the species concepts used must be understood and linked to current concepts via stable persistent identifiers. Besides clarifying the concepts, this also avoids problems with orthographic variants of species names during data integration. We used the checklist infrastructure of the German Federal Agency for Nature Conservation (BfN) for the taxonomic annotation of the data (https://checklisten.rotelistezentrum.de/api/public/swagger-ui). First, it provides the taxonomic and geographical coverage needed to annotate the forestry data. Second, it allows taxonomic concepts to be referenced precisely through persistent identifiers, providing the basis for the semantic annotation of the taxon names used30. Example: “Halbente” → Knäkente → Anas querquedula. Sources: survey, checklist (e.g., https://checklisten.rotelistezentrum.de/api/public/1/taxon/24804); in rare cases, historical literature and dictionaries (e.g., https://www.dwds.de/wb/dwb/halbente).
Table 1 Species Overview (mammals).
Table 2 Species Overview (birds and reptiles).
3. Text: Text is transcribed character by character as it appears in the original source, including abbreviations and possible misspellings. Uncertain readings are marked as ‘#…#’. A considerable number of entries repeat the previous line by means of ‘ditto’ words or signs, or group several entries with curly brackets; in these cases, occurrence data were taken from the line to which the signs or brackets refer. Example: “Lynx: Kömmt nicht vor; Wild cat: dito; Wolf: dito” yields occurrence = 0 for all three species (Deggendorf, Lower Bavaria).
4. Occurrences: Occurrences were classified in binary form (presence or absence) per species and district. This basic classification also accounts for the fact that the responses of the forestry offices varied considerably in scope and quality. At the lower end of the scale of detail, an entry might simply read “[species] does not exist here” or consist only of the struck-through species name (both classified as ‘0’); at the upper end, detailed information about habitat or specific locations might be given. Where additional species were mentioned or the reporting was more comprehensive, for example with a specific time of observation, a new entry was created. Source: survey.
a. Classification as 1: The text indicates that the reporting office was certain, or at least gave good reason to believe, that a specimen of the species had recently (in 1845) been seen within the boundaries of its district. We did not distinguish the type of the animal’s presence (e.g., resident or migratory bird), nor have statements about quantity and abundance (e.g., rare occurrence, close to extinction, or regular) been datafied yet. Where such information is available, future research may extract it from the texts included in the dataset. Example: “The hamster is found only very rarely.” (Freysing, Lower Bavaria).
b. Classification as 0: There is no evidence, or no justifiable evidence, of an observation. Example: “The bear is not present.” (Freysing, Lower Bavaria). This also covers the few cases in which an occurrence is reported only outside a natural environment (in game parks or pheasantries). Example: “The pheasant can only be found in pheasantries” (Bayreuth, Upper Franconia).
5. Time: Unless otherwise stated, we assume that all reports refer to the year 1845 or to only a few years earlier. Where a different date is given, it has been coded as the time of observation, and a new entry has been created (see above). Example: “has not occurred since 1710, when the last one was seen in the former Rehau district and caught in the Sparneck district” → creates two entries, one for 1710 (classified ‘1’) and one for 1845 (classified ‘0’). Source: survey.
6. Location: We generally assume that all occurrences lie within a default radius of 20 km around the seat of the forestry office.
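The name mapping in rule 2 can be pictured as a concordance from verbatim names to accepted taxa. The following is a hypothetical sketch, not the project’s tooling: it normalises trivial orthographic variation (Unicode form, case, whitespace, trailing punctuation) before the lookup, so that variants such as “Halbente :” and “halbente” resolve to the same entry. The class, function names and table contents are illustrative; only the Halbente example is taken from rule 2, and the identifier field is deliberately left empty instead of guessing BfN checklist IDs.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TaxonMapping:
    modern_vernacular: str              # current German vernacular name
    scientific_name: str                # accepted scientific name
    checklist_id: Optional[str] = None  # BfN checklist identifier, to be filled in manually


def normalise(name: str) -> str:
    """Reduce trivial orthographic variation in a verbatim name before lookup."""
    name = unicodedata.normalize("NFC", name)   # unify composed/decomposed umlauts
    name = name.strip().rstrip(".:,;").strip()  # drop surrounding whitespace and trailing punctuation
    return " ".join(name.split()).casefold()    # collapse inner whitespace, ignore case


# Hypothetical concordance; real entries come from the survey forms and checklist research.
CONCORDANCE: Dict[str, TaxonMapping] = {
    normalise("Halbente"): TaxonMapping("Knäkente", "Anas querquedula"),
    normalise("Luchs"): TaxonMapping("Luchs", "Lynx lynx"),
    normalise("Biber"): TaxonMapping("Biber", "Castor fiber"),
}


def map_name(verbatim: str) -> Optional[TaxonMapping]:
    """Return the mapping for a verbatim name, or None if it still needs manual research."""
    return CONCORDANCE.get(normalise(verbatim))


print(map_name("Halbente :").scientific_name)  # Anas querquedula
print(map_name("Wolf"))                         # None: not yet in the table
```

Normalisation only removes purely typographic differences; genuinely historical names such as “Halbente” still require a curated entry, which is where the checklist and dictionary research described in rule 2 comes in.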
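Rules 3 to 5 combine into a simple record-building step. The sketch below is a hypothetical illustration under stated assumptions: the presence/absence code of each line is taken as already assigned by a human reader (the classification is interpretive and is not automated here), ditto words, signs and curly brackets are collapsed into a single placeholder marker that refers to the preceding line of the same office, and an explicitly dated earlier sighting is supplied as a year. The names `RawLine`, `Record`, `datafy` and `DITTO` are ours.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

DITTO = "ditto"  # hypothetical placeholder for 'dito' words, ditto signs or curly brackets


@dataclass
class RawLine:
    office: str
    species: str
    text: str
    code: Union[int, str]                 # 0 or 1 as assigned by a human reader, or DITTO
    dated_sighting: Optional[int] = None  # year of an explicitly dated earlier observation


@dataclass
class Record:
    office: str
    species: str
    year: int
    occurrence: int  # 1 = presence, 0 = absence
    text: str


def datafy(lines: List[RawLine], default_year: int = 1845) -> List[Record]:
    records: List[Record] = []
    last_code: Dict[str, int] = {}  # most recent resolved code per office
    for line in lines:
        if line.code == DITTO:
            # Rule 3: a ditto repeats the code of the line it refers to (here: the previous line).
            if line.office not in last_code:
                raise ValueError(f"ditto without a preceding line: {line.office}, {line.species}")
            code = last_code[line.office]
        else:
            code = int(line.code)
        last_code[line.office] = code
        # Rule 5: undated statements are taken to describe the situation in 1845.
        records.append(Record(line.office, line.species, default_year, code, line.text))
        if line.dated_sighting is not None:
            # Rule 5: an explicitly dated sighting yields an additional entry coded as present.
            records.append(Record(line.office, line.species, line.dated_sighting, 1, line.text))
    return records


deggendorf = [
    RawLine("Deggendorf", "Lynx lynx", "Kömmt nicht vor", 0),
    RawLine("Deggendorf", "Felis silvestris", "dito", DITTO),
    RawLine("Deggendorf", "Canis lupus", "dito", DITTO),
]
print([r.occurrence for r in datafy(deggendorf)])  # [0, 0, 0]

# Hypothetical office and species names; the text is the example quoted in rule 5.
dated = RawLine("Example office", "Example species",
                "has not occurred since 1710, when the last one was seen in the former "
                "Rehau district and caught in the Sparneck district", 0, dated_sighting=1710)
print([(r.year, r.occurrence) for r in datafy([dated])])  # [(1845, 0), (1710, 1)]
```

Keeping the verbatim text on every record, including the derived ones, mirrors the dataset’s design of letting later users re-examine the coding against the source wording.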
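Rule 6 amounts to a circle of 20 km around each georeferenced office seat. As an illustration only, assuming WGS84 coordinates in decimal degrees and a spherical Earth with a mean radius of about 6,371 km, the haversine formula can test whether a point falls within that default area. The function names are ours, and the circle is a modelling convention rather than a mapped district boundary.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
DEFAULT_RADIUS_KM = 20.0     # default radius around the office seat (rule 6)


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_default_radius(seat, point, radius_km: float = DEFAULT_RADIUS_KM) -> bool:
    """True if `point` (lon, lat) lies inside the circle assumed around `seat` (lon, lat)."""
    return haversine_km(*seat, *point) <= radius_km


deggendorf_seat = (12.9603, 48.8348)  # (longitude, latitude) from rule 1
print(within_default_radius(deggendorf_seat, (12.9603, 48.9348)))  # ~11.1 km north: True
print(within_default_radius(deggendorf_seat, (12.9603, 49.0348)))  # ~22.2 km north: False
```

A fixed radius keeps all districts comparable despite their differing sizes; the trade-off is that, for large districts, some genuine locations fall outside the circle, while for small ones it extends beyond the district.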
Transformation and publication
Finally, all data were transformed into a standardized terminology based on the Access to Biological Collection Data (ABCD) Schema (http://www.tdwg.org/standards/115). This transformation can also be seen as “valorisation”13: creating value and impact from the historical sources, the knowledge they contain, and the historian’s expertise required to mobilise them. We have published the data under a CC BY licence in two forms:
1. A dataset uploaded to the Global Biodiversity Information Facility (GBIF) and published via the GBIF data services and the GBIF-hosted portal Lebendiger Atlas der Natur Deutschlands (LAND, https://land.gbif.de). Here, the data can be used within this framework and its retrieval and visualization tools, as well as in the broader context of the entire GBIF database, allowing for spatial and temporal analyses and species comparisons15.
2. A dataset uploaded to the online repository Zenodo, available for download for in-depth analysis with statistical and text-mining tools and Geographic Information Systems (GIS). This format also supports cross-disciplinary applications14.
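For users of the downloadable dataset, a first analysis might tally what share of reporting offices coded each species as present. The sketch below assumes a CSV export with the hypothetical columns `office`, `species`, `year` and `occurrence`, and uses invented sample rows; the actual file names, column headers and encoding of the published files should be checked before adapting it.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Invented sample rows with hypothetical column names, for illustration only.
SAMPLE_CSV = """office,species,year,occurrence
Office A,Lynx lynx,1845,0
Office A,Canis lupus,1845,0
Office B,Lynx lynx,1845,1
Office B,Lynx lynx,1830,1
"""


def presence_share(rows: Iterable[Mapping[str, str]], year: str = "1845") -> Dict[str, float]:
    """Fraction of reporting offices that coded each species as present in `year`."""
    reporting = defaultdict(set)  # species -> offices with any entry for that year
    present = defaultdict(set)    # species -> offices coding the species as present
    for row in rows:
        if row["year"] != year:
            continue  # entries for other years (e.g. dated earlier sightings) are skipped
        reporting[row["species"]].add(row["office"])
        if row["occurrence"] == "1":
            present[row["species"]].add(row["office"])
    return {sp: len(present[sp]) / len(offices) for sp, offices in reporting.items()}


shares = presence_share(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(shares)  # {'Lynx lynx': 0.5, 'Canis lupus': 0.0}
```

Counting offices rather than rows avoids double-counting a species when one office contributed several entries for the same year.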