Sponges (phylum Porifera) are early-diverging metazoans that play central ecological roles and serve as models for understanding animal evolution. However, their associations with diverse microbial communities increase the risk of contamination in publicly available datasets, potentially compromising downstream biological inference. Despite growing genomic resources, systematic assessments of contamination in sponge genome assemblies have been lacking. Here, we present a comprehensive contamination analysis of 30 publicly available sponge genome assemblies and introduce a reproducible and easily adoptable decontamination pipeline tailored to non-model organisms. Using this framework, we provide decontaminated versions of the analyzed assemblies. The pipeline integrates three complementary lines of evidence: compositional outlier detection based on k-mer profiles and GC content, protein-level taxonomic classification using DIAMOND, and nucleotide-level classification with Kraken2. Scaffolds are designated as contaminants when supported by at least two independent signals. Pipeline performance was validated using a realistic spike-in dataset composed of bona fide sponge sequences and representative contaminant genomes. The decontamination pipeline achieved 96.8% accuracy, 99.6% precision, and 90.8% recall, maintaining consistently strong recall across the vast majority of analyzed taxa. In addition, taxonomic assignments were accurately resolved to the genus level for 96.3% of identified contaminants. Application to public assemblies revealed variable levels of contamination. On average, 14.5% of scaffolds per assembly were classified as contaminants, although they represented a low fraction of the total genome length, indicating that contamination is concentrated in relatively short scaffolds.
Detected contaminants were dominated by bacterial phyla commonly associated with sponge microbiomes, including Pseudomonadota, Chloroflexota, and Poribacteria, with additional archaeal, protozoan, algal, and fungal sequences. Importantly, the number of complete BUSCO orthologs remained virtually unchanged following contamination removal, indicating minimal loss of genuine host scaffolds. Taken together, our study provides 30 curated sponge genome assemblies and a consensus-based decontamination framework tailored to non-model organisms, improving the reliability of genomic resources for evolutionary, ecological, and functional analyses.
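The consensus rule described in the abstract (a scaffold is flagged when at least two of the three evidence lines agree) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `ScaffoldEvidence` record, its field names, and the `split_assembly` helper are hypothetical, and the sketch assumes each signal has already been reduced to a per-scaffold boolean upstream.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ScaffoldEvidence:
    """Hypothetical per-scaffold summary of the three evidence lines."""
    scaffold_id: str
    compositional_outlier: bool  # k-mer profile / GC content signal
    diamond_non_host: bool       # protein-level classification signal
    kraken2_non_host: bool       # nucleotide-level classification signal


def is_contaminant(ev: ScaffoldEvidence, min_signals: int = 2) -> bool:
    """Flag a scaffold when at least `min_signals` evidence lines agree."""
    votes = sum(
        (ev.compositional_outlier, ev.diamond_non_host, ev.kraken2_non_host)
    )
    return votes >= min_signals


def split_assembly(
    evidence: Iterable[ScaffoldEvidence], min_signals: int = 2
) -> Tuple[List[str], List[str]]:
    """Return (retained scaffold IDs, flagged scaffold IDs)."""
    retained: List[str] = []
    flagged: List[str] = []
    for ev in evidence:
        target = flagged if is_contaminant(ev, min_signals) else retained
        target.append(ev.scaffold_id)
    return retained, flagged


# Illustrative, made-up inputs (not data from the study).
example = [
    ScaffoldEvidence("scaffold_1", False, False, False),
    ScaffoldEvidence("scaffold_2", True, False, False),
    ScaffoldEvidence("scaffold_3", True, True, False),
    ScaffoldEvidence("scaffold_4", True, True, True),
]
retained, flagged = split_assembly(example)
```

Requiring two agreeing signals rather than one trades some recall for precision, which is consistent with the reported pattern of very high precision and somewhat lower recall.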
Bodulic, K., Vlahovicek, K.
