Archaea have proven to be major players in biogeochemical cycles across diverse ecosystems, yet archaeal genomes remain underrepresented in the reference datasets used by popular computational biology tools. Here we present ArchaeaHQ, a quality-controlled, systematically curated reference database of 21,644 archaeal genomes, selected from an initial set of 35,993 NCBI assemblies spanning all four archaeal kingdoms: Methanobacteriati (Euryarchaeota), Thermoproteati (TACK), Nanobdellati (DPANN), and Promethearchaeati (Asgard). All genomes in the database passed standardized quality control, requiring ≥70% completeness and ≤10% contamination. Of the genomes in ArchaeaHQ, 44.2% achieved ≥90% completeness, while 93.1% exhibited ≤5% contamination. ArchaeaHQ comprises 16,199 metagenome-assembled genomes (MAGs; 74.8%) and 5,445 isolate genomes (25.2%). Approximately 75% of MAGs are assigned to 17 ecologically meaningful categories based on sampling origin, and around 65% of genomes include geographic metadata. ArchaeaHQ is available at https://doi.org/10.6084/m9.figshare.32266599 and provides an analysis-ready reference set for metagenomic classification, biogeochemical and ecological studies, comparative genomics, and development of archaeal-specific bioinformatic tools.
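The inclusion criterion stated above (≥70% completeness, ≤10% contamination) amounts to a simple per-genome filter, and the quality summaries (share of genomes at ≥90% completeness or ≤5% contamination, share of MAGs) are plain fractions of the retained set. The sketch below illustrates that logic under stated assumptions: it presumes per-genome completeness and contamination estimates (in percent) are already available from some estimator, which the abstract does not name. The `GenomeQC` record, its field names, and the toy values are hypothetical and are not part of ArchaeaHQ.

```python
from dataclasses import dataclass

# Thresholds taken from the stated ArchaeaHQ inclusion criterion.
MIN_COMPLETENESS = 70.0   # percent, inclusive
MAX_CONTAMINATION = 10.0  # percent, inclusive


@dataclass
class GenomeQC:
    """Hypothetical per-genome QC record (not an ArchaeaHQ data format)."""
    accession: str
    completeness: float   # percent
    contamination: float  # percent
    is_mag: bool          # True for metagenome-assembled genomes


def passes_qc(g: GenomeQC) -> bool:
    """Return True if a genome meets both inclusion thresholds."""
    return g.completeness >= MIN_COMPLETENESS and g.contamination <= MAX_CONTAMINATION


def summarize(genomes: list[GenomeQC]) -> dict[str, float]:
    """Filter genomes and report quality/composition percentages of the kept set."""
    kept = [g for g in genomes if passes_qc(g)]
    n = len(kept)
    if n == 0:
        return {"kept": 0}
    return {
        "kept": n,
        "pct_completeness_ge_90": 100 * sum(g.completeness >= 90.0 for g in kept) / n,
        "pct_contamination_le_5": 100 * sum(g.contamination <= 5.0 for g in kept) / n,
        "pct_mags": 100 * sum(g.is_mag for g in kept) / n,
    }


# Toy input for illustration only; these are not ArchaeaHQ genomes.
toy = [
    GenomeQC("G1", 95.0, 1.0, True),
    GenomeQC("G2", 72.0, 8.0, False),
    GenomeQC("G3", 65.0, 2.0, True),   # rejected: completeness below 70%
    GenomeQC("G4", 88.0, 12.0, True),  # rejected: contamination above 10%
]
print(summarize(toy))
```

Both thresholds are treated as inclusive, matching the "≥" and "≤" wording of the criterion, so a genome at exactly 70% completeness and 10% contamination is retained.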
Bespiatykh, D., Leao, P.
