A complete-genome view of phylum Omnitrophota and a multi-order capacity for very long proteins

Preprint Created on 09 Jun 2026 bioRxiv

Phylum Omnitrophota (formerly candidate division OP3) is represented in public databases almost entirely by metagenome-assembled genomes (MAGs); GTDB R232 contains two complete Omnitrophota assemblies. We present 176 complete and 53 high-quality Omnitrophota genomes from Oxford Nanopore metagenomes of Fennoscandian deep groundwater and the Baltic Sea water column, an 88-fold expansion of the complete Omnitrophota count. The 229 genomes resolve to 202 distinct species at 95% ANI; 162 of these have no conspecific match among the 714 NCBI high-quality (HQ) Omnitrophota MAGs. Phylogenomically, 171 of the 176 complete genomes fall in class Gorgyraeia, which contains the cultured episymbiont Velamenicoccus archaeovorus, and 5 fall in Omnitrophia, with multiple Gorgyraeia orders and families represented. Phylum Omnitrophota hosts many very long proteins, with the longest in our corpus reaching 147,155 amino acids (AA); the long end of the length distribution is concentrated on Gorgyraeia contigs across multiple Gorgyraeia orders. Between 24% and 28% of GTDB-Tk-classified Omnitrophota contigs in the deep-groundwater and Baltic samples host at least one protein of 10,000 AA or longer. Across the 916-protein long-protein domain-architecture catalog, 94% of proteins carry transmembrane (TM) helices or a signal peptide; the four complete-genome proteins above 100,000 AA all have inner-membrane-anchored architectures, and the 147,155-AA protein carries 147 TM helices. The 176 complete genomes share a uniform metabolic profile across the dominant orders: intact bacterial peptidoglycan biosynthesis alongside fragmentary TCA, incomplete electron transport, absent aerobic terminal oxidase, and partial cofactor and amino-acid biosynthesis. The profile matches the cultured V. archaeovorus phenotype and is consistent with a host-dependent episymbiotic lifestyle.
Hypervariable-region (HVR) calling across the 229 chromosomes returns 1,909 candidate loci, distributed across 223 of them; ribosomal-protein and EF-Tu/EF-G content sits inside called HVRs on 150 of those 223 (67%), recovering across the collection the housekeeping-cargo integrations documented in Nielsen (2026b). All genomes, the OrthoFinder supermatrix and its ML tree, the 916-protein giant-protein domain-architecture catalog, and per-step scripts are released as a community resource at Zenodo (DOI [TBD]).
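
The derived figures quoted in the abstract (the 229-genome total, the 88-fold expansion, and the 67% share of HVR-bearing chromosomes) follow directly from the stated counts. The short Python sketch below reproduces that arithmetic; the variable names are illustrative assumptions, not taken from the authors' scripts, and no data beyond the numbers in the abstract are used.

```python
# Counts quoted in the abstract; variable names are hypothetical.
complete = 176               # complete Omnitrophota genomes presented
high_quality = 53            # high-quality genomes presented
gtdb_complete = 2            # complete Omnitrophota assemblies in GTDB R232
gorgyraeia = 171             # complete genomes placed in class Gorgyraeia
omnitrophia = 5              # complete genomes placed in class Omnitrophia
hvr_chromosomes = 223        # chromosomes with at least one called HVR
hvr_with_housekeeping = 150  # of those, HVRs containing ribosomal-protein or EF-Tu/EF-G content

total_genomes = complete + high_quality
fold_expansion = complete / gtdb_complete
housekeeping_pct = round(100 * hvr_with_housekeeping / hvr_chromosomes)

# The class-level split should account for every complete genome.
assert gorgyraeia + omnitrophia == complete

print(f"total genomes: {total_genomes}")
print(f"fold expansion of complete genomes: {fold_expansion:.0f}")
print(f"HVR-bearing chromosomes with housekeeping content: {housekeeping_pct}%")
```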

Nielsen, T. N., Lui, L. M.
