KozakExplorer: an interactive framework for genome-wide Kozak sequence analysis

Preprint Created on 27 Jun 2026 bioRxiv

Translation initiation signals shape gene expression across all domains of life. In eukaryotes, nucleotide constraints surrounding the start codon are commonly described by the Kozak Consensus Sequence (KCS), whereas in bacteria and archaea, initiation frequently involves Shine–Dalgarno ribosome-binding motifs. Although these signals have been extensively characterized in model organisms, their large-scale diversity and evolutionary distribution remain incompletely explored. We present KozakExplorer, a reproducible framework for quantitative and comparative analysis of translation initiation contexts from genome assemblies and annotations. The software performs strand-aware extraction of start codon environments from FASTA and GFF3 files and applies information-theoretic metrics, including Kullback–Leibler (KL) divergence and information content (IC), to measure positional nucleotide constraints relative to a background model. Derived summary statistics (Kozak Strength Index [KSI], maximum information content, peak position) convert motif patterns into interpretable per-genome signatures suitable for cross-species comparison. Our primary analysis covers 2,282 eukaryotic reference genomes, producing a standardized dataset of translation initiation metrics. Dimensionality reduction via t-SNE on per-position KL divergence, information content, and motif nucleotide frequencies reveals a structured eukaryotic KCS landscape with kingdom-level clustering and continuous variation in signal strength. A dedicated case study of 216 Apicomplexa genomes shows genus-level structure consistent with host range and phylogeny. An extended analysis across 25,344 reference genomes (22,253 bacteria, 809 archaea) places eukaryotic patterns in a global comparative framework, revealing transitions between sharply localized Kozak motifs and distributed Shine–Dalgarno-type signatures.
Implemented within the open-source Sequana ecosystem, KozakExplorer is distributed as a Python module and an interactive web application that accepts local annotated assemblies, GenBank records, or NCBI RefSeq accessions, and exports all computed metrics, embeddings, and coordinates for downstream comparative and evolutionary genomics.
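
The two core steps named in the abstract, strand-aware extraction of start codon windows and per-position KL divergence against a background model, could be sketched roughly as follows. This is an illustrative sketch only and not the KozakExplorer or Sequana API: the function names (`extract_window`, `position_kl`), the `flank` and `pseudocount` parameters, the 0-based coordinate convention, and the uniform default background are all assumptions made for the example. The abstract does not state how the software defines its background model or its IC and KSI statistics.

```python
import math
from collections import Counter

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_window(genome, codon_start, strand, flank=6):
    """Return the start codon plus `flank` bases on each side, read 5'->3'.

    Assumption: `codon_start` is the 0-based index of the leftmost base of
    the start codon in plus-strand coordinates, for both strands.
    Returns None when the window would run off the sequence.
    """
    lo, hi = codon_start - flank, codon_start + 3 + flank
    if lo < 0 or hi > len(genome):
        return None
    window = genome[lo:hi].upper()
    # Minus-strand genes are read from the opposite strand.
    return reverse_complement(window) if strand == "-" else window


def position_kl(windows, background=None, pseudocount=0.5):
    """Per-position KL divergence (in bits) of observed base frequencies
    from a background distribution, over equal-length aligned windows.

    Assumption: a uniform background is used when none is given.
    Characters outside A, C, G, T (for example N) are ignored.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    result = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows if w[i] in BASES)
        total = sum(counts.values()) + 4 * pseudocount
        kl = 0.0
        if total > 0:
            for b in BASES:
                p = (counts[b] + pseudocount) / total
                if p > 0:  # a zero-frequency base contributes nothing
                    kl += p * math.log2(p / background[b])
        result.append(kl)
    return result
```

With a uniform background, the KL divergence at a position equals 2 minus the Shannon entropy of that position in bits, which is the classical sequence-logo information content; the preprint's own IC definition may differ. A fully conserved position scores 2 bits and a position with equal base frequencies scores 0.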

Cokelaer, T., Santi, A. M. M., Pipoli da Fonseca, J., Spaeth, G. F.
