fastder: fast, annotation-agnostic detection of expressed regions from RNA-seq coverage and splicing data

Preprint Created on 26 Jun 2026 bioRxiv

Features from RNA-seq experiments are usually counted using a curated annotation (GENCODE, Ensembl, etc.) as a reference. This approach constrains RNA-seq counts to what the annotation considers a valid gene, transcript or exon, and misses transcription outside those features. Moreover, reference annotations are known to be incomplete, and they are inadequate for some experiments that alter splicing and for some diseases. To circumvent these constraints, annotation-free methods aim to detect the regions actually expressed in one or more samples. For large-scale use, the recount3 resource holds uniformly processed coverage tracks and splice junctions for over 8,000 human and over 10,000 mouse RNA-seq studies, with several hundred thousand samples in total. This opens an opportunity for scalable, annotation-free tools that call expressed regions. We present fastder, a C++ tool that detects expressed regions directly from recount3 coverage and splice junction files, with no read alignment step. It can also run on local raw reads after they are aligned with STAR, making it suitable for species beyond human and mouse. fastder calls expressed regions by bump hunting on coverage files and stitches them into spliced multi-exon structures using the splice junction data. On simulated RNA-seq with unannotated expression features, fastder calls exons at a base-level accuracy on par with derfinder and the gold standard for the task. Exon precision improves with sequencing depth. We compare fastder's performance against a coverage-only baseline, derfinder and groHMM. fastder extends the functionality of these tools by assigning strands to the expressed regions and by producing multi-exon regions. It runs about 15 to 20 times faster than derfinder and groHMM, within bounded memory.
We showcase its use with two recount3-derived examples: the recovery of cryptic exons from a TDP-43 knockdown experiment, and the clustering of GTEx data based on the topology of the expressed regions alone, without taking expression levels into account. fastder makes annotation-agnostic detection of expressed regions fast enough to run at recount3 scale. As a limitation, it detects expressed regions and the junctions between them, but it does not distinguish different isoforms of the same gene.
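
The two steps named in the abstract, calling expressed regions from coverage and stitching them with splice junctions, can be illustrated with a toy sketch. This is not fastder's implementation: the abstract does not describe its bump-hunting procedure, so a plain coverage threshold stands in for it here. The names `Region`, `call_regions`, `stitch`, `min_cov`, `min_len` and `tol`, and the 0-based half-open coordinate convention, are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical expressed region, 0-based, half-open [start, end)."""
    start: int
    end: int


def call_regions(coverage: Sequence[float],
                 min_cov: float = 5.0,
                 min_len: int = 3) -> List[Region]:
    """Return contiguous runs with coverage >= min_cov and length >= min_len.

    A fixed threshold is a simplification standing in for bump hunting.
    """
    regions: List[Region] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_cov:
            if start is None:
                start = pos          # a run begins here
        elif start is not None:
            if pos - start >= min_len:
                regions.append(Region(start, pos))
            start = None             # the run ended at pos
    if start is not None and len(coverage) - start >= min_len:
        regions.append(Region(start, len(coverage)))
    return regions


def stitch(regions: List[Region],
           junctions: Sequence[Tuple[int, int]],
           tol: int = 0) -> List[List[Region]]:
    """Group regions joined by junctions into multi-exon structures.

    Each junction is (intron_start, intron_end), half-open, so it links a
    region ending at intron_start to a region starting at intron_end,
    within `tol` bases. Assumes `regions` is sorted by start.
    """
    parent = list(range(len(regions)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, left in enumerate(regions):
        for j in range(i + 1, len(regions)):
            right = regions[j]
            for intron_start, intron_end in junctions:
                if (abs(left.end - intron_start) <= tol
                        and abs(right.start - intron_end) <= tol):
                    parent[find(j)] = find(i)   # merge the two groups

    groups: Dict[int, List[Region]] = {}
    for idx, region in enumerate(regions):
        groups.setdefault(find(idx), []).append(region)
    return sorted(groups.values(), key=lambda g: g[0].start)
```

For example, with coverage `[0, 0, 8, 9, 10, 0, 0, 0, 7, 7, 7, 7, 0, 0, 6, 6, 6, 0]` and a single junction `(5, 8)`, the sketch calls three regions and links the first two into one two-exon structure, leaving the third on its own. Strand assignment, which the abstract lists as a fastder feature, is not modeled here.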

Lehmann, M., Kitak, T., Mallona, I.
