
ADDRESSING DATA DEFICIENCIES: LICHENS OF THE PALOUSE PRAIRIE (U.S.) AND THE POTENTIAL OF LARGE LANGUAGE MODELS FOR GEOCODING

Date Issued
December 1, 2025
Author(s)
Chandler, Amanda  
Advisor(s)
Brian C. O'Meara
Additional Advisor(s)
Charles Kwit
Jessica L. Allen
Permanent URI
https://trace.tennessee.edu/handle/20.500.14382/37220
Abstract

Data deficiency remains a barrier to conservation for many organismal groups, and declines in biodiversity and ecosystem health are predicted to continue. A historical focus on groups generally considered more charismatic has left too little data to assess extinction risk in lesser-known taxa, and baseline taxonomic knowledge is lacking across many ecosystems. Fungi are one example of an understudied, species-rich group that has recently begun to gain conservation attention. While efforts to digitize natural history collections continue to improve our overall understanding of biodiversity, digitization cannot directly correct the underlying sampling biases that skew organismal representation across physical collections. Additionally, many digitized collections accessed for conservation-related research consist of skeletal records that lack the latitude and longitude values indicating where the specimen was collected. To address some of these gaps in baseline taxonomic knowledge and physical collection holdings, 360 unidentified lichen specimens from an incomplete biodiversity inventory of Washington’s Palouse Prairie were identified to species. Digitized herbarium records were also analyzed, both for comparison with the current identifications and to synthesize historical information on Palouse lichens into a referenceable document. Digitizing the newly identified collections and submitting them to multiple herbaria increases the representation of dryland ecosystems in the northwestern U.S. and provides data for both local and state conservation efforts. To help resolve gaps in digital collections, the ability of Large Language Models (LLMs) to geocode locality strings held in digital occurrence data was tested, to gain insight into the potential of such tools for georeferencing tasks.
For the combinations of prompts and LLMs tested, model selection strongly influenced how accurately an LLM chose coordinates for 500 GBIF locality strings, whereas the specific prompt made no difference. Additionally, when the models performed this geocoding task while disconnected from the internet, their chain of reasoning described actions that would not be possible without an internet connection. Overall, this work adds to ongoing efforts to address data deficiencies in natural history collections and in our knowledge of biodiversity.

Subjects

digitization

lichens

Large Language Models...

biodiversity

herbaria

georeferencing

Disciplines
Other Ecology and Evolutionary Biology
Degree
Master of Science
Major
Ecology and Evolutionary Biology
File(s)
Name
Chandler_AM_MastersThesis_15Aug.docx
Size
890.06 KB
Format
Microsoft Word XML
Checksum (MD5)
10c95d2668ff9b43e6d5f67a953d0534

Name
auto_convert.pdf
Size
876.06 KB
Format
Adobe PDF
Checksum (MD5)
96f59d79ad84b8a83d345a3d51e48fa9

Libraries at University of Tennessee, Knoxville