trAIt: Species-by-Trait Data Retrieval using Large Language Models

Preprint Created on 25 Jun 2026 bioRxiv

Biological research often requires information about species' traits, but manual literature collation is time-consuming and can miss parts of the literature. To address this, we developed trAIt, publicly available software for retrieving species characteristics from the scientific literature catalogued in the Europe PubMed Central (Europe PMC) database. trAIt provides a graphical user interface in which users specify the species and characteristics of interest. Leveraging a large language model (LLM), trAIt retrieves relevant papers, combines their content through a consensus-based summarization model, and outputs a species-by-characteristic table. In a case study involving frog species, trAIt recovered 47.1% of trait-species combinations in 2.75 hours, whereas an expert curator independently recovered 62.4% over a period of months. Consensus-based summarization substantially improves accuracy compared with single-source extraction. Across three case studies of vertebrate taxa, an expert confirmed the accuracy of 70.9% of the trait-species entries recovered by trAIt. We observed considerable variation in trAIt's accuracy across taxa, possibly owing to heterogeneity in open-access literature availability and inconsistencies in species and trait terminology. In sum, our analysis suggests that LLM-based tools can accelerate biological data synthesis but should be used to support domain experts' research rather than replace their judgment.

Balaji, S., Martinson, K. A., Schellenberger, J. S., Koley, J., Inman, C. M., Hofmann, H. A., Young, R. L., Harpak, A.
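
The abstract describes combining per-paper extractions into a single species-by-characteristic table through a consensus step. The abstract does not specify how that step works, so the following is only a minimal sketch of one plausible approach: a strict-majority vote over values extracted from individual papers. The function name `consensus_table`, the `min_agreement` threshold, the tuple layout, and the toy rows are all assumptions made for illustration; they are not taken from trAIt and the values are placeholders, not real data.

```python
from collections import Counter, defaultdict

def consensus_table(extractions, min_agreement=0.5):
    """Build a species-by-trait table from per-paper extractions.

    extractions: iterable of (species, trait, value, paper_id) tuples
                 (hypothetical layout, not trAIt's actual format).
    A value is kept only if its share of the votes for that
    species-trait pair is strictly greater than min_agreement;
    otherwise the cell is set to None.
    """
    votes = defaultdict(Counter)
    for species, trait, value, _paper_id in extractions:
        # Normalize case and whitespace so trivially different spellings agree.
        votes[(species, trait)][value.strip().lower()] += 1

    table = defaultdict(dict)
    for (species, trait), counter in votes.items():
        value, count = counter.most_common(1)[0]
        total = sum(counter.values())
        table[species][trait] = value if count / total > min_agreement else None
    return {species: dict(traits) for species, traits in table.items()}

# Toy placeholder rows (invented for illustration only).
rows = [
    ("species_a", "diet", "Insectivore", "paper_1"),
    ("species_a", "diet", "insectivore", "paper_2"),
    ("species_a", "diet", "herbivore", "paper_3"),
    ("species_b", "diet", "herbivore", "paper_1"),
    ("species_b", "diet", "carnivore", "paper_2"),
]
table = consensus_table(rows)
```

In this sketch, a cell with no strict majority is left empty rather than guessed, which fits the abstract's recommendation that such output support, rather than replace, expert judgment.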
