ELIXIR CZ Friday Coffee 2021 – RepeatExplorer Galaxy Server: Tools for Genome Assembly Annotation

Speaker: Petr Novák from the Biology Centre of the Czech Academy of Sciences

RepeatExplorer Galaxy Server provides several tools for the discovery and annotation of repetitive elements. Most notably, these include the RepeatExplorer2 and TAREAN pipelines, which can be used to characterize repetitive sequences directly from unassembled sequence reads. Several other tools on the RE server can also be used to annotate repeats in genome assemblies.

One tool suitable for annotating assemblies is DANTE. DANTE annotates and classifies transposable elements based on the similarity of their sequences to conserved protein domains in the REXdb database. Because REXdb’s lineage-based classification system is universally applicable to elements from phylogenetically distant taxa, it also allows comparative studies of LTR-retrotransposon composition across a wide range of species. In addition, applying DANTE to assembled genomic sequences can improve annotation of repetitive DNA and reveal lineage-specific distribution patterns of LTR-retrotransposons along assembled chromosomes.

The second tool that can be used for whole genome assembly annotation is the “Repeat Annotation Pipeline”. This tool uses the results of the RepeatExplorer2 pipeline as the basis for creating a species-specific library of repetitive sequences. This library can be further manually curated and then used to annotate whole genome assemblies. Comparison of genome annotation based on the “Repeat Annotation Pipeline” and annotation based on RepeatExplorer2 from unassembled reads can be used as an indirect criterion for completeness of genome assemblies.
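As a rough illustration of the comparison described above, the sketch below divides the genome proportion of each repeat class found in the assembly annotation by the proportion estimated from unassembled reads. It is not part of the RepeatExplorer tools: the function name, the repeat class labels, and all numbers are hypothetical and serve only to show the idea. A ratio well below 1 would suggest that a repeat class is under-represented in the assembly.

```python
def compare_repeat_proportions(read_based, assembly_based):
    """Return, per repeat class, assembly proportion divided by read-based proportion.

    Both arguments are dicts mapping a repeat class name to its genome
    proportion (a fraction between 0 and 1). Classes with a zero read-based
    proportion are skipped to avoid division by zero.
    """
    ratios = {}
    for repeat_class, read_fraction in read_based.items():
        if read_fraction <= 0:
            continue
        assembly_fraction = assembly_based.get(repeat_class, 0.0)
        ratios[repeat_class] = assembly_fraction / read_fraction
    return ratios


# Hypothetical proportions, invented for illustration only.
read_based = {"satellite": 0.10, "Ty3/gypsy": 0.20, "Ty1/copia": 0.05}
assembly_based = {"satellite": 0.02, "Ty3/gypsy": 0.19, "Ty1/copia": 0.05}

ratios = compare_repeat_proportions(read_based, assembly_based)

# List the classes from most to least under-represented in the assembly.
for repeat_class, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{repeat_class}\t{ratio:.2f}")
```

In this made-up example the satellite class stands out with the lowest ratio, the kind of signal that might point to collapsed or missing tandem repeats in an assembly.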

The results of DANTE and the “Repeat Annotation Pipeline” can be visualized directly on the RepeatExplorer Galaxy Server using the JBrowse genome browser. All tools are available at https://repeatexplorer-elixir.cerit-sc.cz/

Practical examples of how to use the discussed RepeatExplorer tools on the Galaxy server will be presented in a subsequent online workshop, “Genome annotation & Galaxy Large Data Handling”, on November 4, 2021, from 2:00 pm to 4:00 pm.

