Genomics

Genomics is one of the foundational pillars of ELIXIR CZ, supporting a broad spectrum of research areas including biodiversity, functional genomics, population genetics, pathogen surveillance, structural genome annotation, and multi-omics integration. ELIXIR CZ has a strong legacy of building domain-specific databases, developing analytical tools, and participating in major European initiatives such as ERGA, the European Genomes on a Tree (GoaT) project, single-cell and spatial omics communities, and collaborative repeat-annotation efforts with other ELIXIR nodes.

Over the past decade, genomics has expanded far beyond sequencing: researchers now require integrative workflows that combine genome assemblies with transcriptomics, epigenomics, chromatin structure, functional annotation (including prediction of the intracellular localisation of predicted proteins), image data, and phenotype metadata. At the same time, the scale of biological datasets has increased exponentially, requiring FAIR data management, advanced search capabilities, and AI-driven approaches to navigate complex biological landscapes.

ELIXIR CZ aims to strengthen its role as a national and European provider of high-quality genomic data resources, analytical tools, and FAIR-compliant services. In 2026–2030, the programme will focus on consolidating its existing assets, expanding interoperability, supporting AI-ready dataset creation, and providing advanced infrastructure for integrative genomics and multi-omics research.

Current Situation and Identified Strengths

ELIXIR CZ maintains a rich ecosystem of genomic databases that serve both national and international research communities. Notable resources include AmtDB, GlobalFungi, GlobalAMFungi, HERVd, REXdb, rPredictorDB, and the integrated REPETdb–REXdb pipeline. These databases provide curated domain-specific information ranging from fungal biodiversity to repetitive elements, viral integrations, ancient mitochondrial genomes, and RNA structure. They underpin numerous publications and are widely used for ecological modelling, evolutionary research, environmental genomics, and functional genomics.

During the COVID-19 pandemic, ELIXIR CZ played a critical role in supporting the Czech government by operating the CoG database of coronavirus genomes, which provided a trusted and authoritative source of information for public health decision-making. The infrastructure not only managed large-scale genomic data but also performed systematic analyses and delivered biweekly reports that informed the Ministry of Health about the evolving epidemiological situation. Building on this experience, ELIXIR CZ now aims to transform the CoG database into a broader pathogen-monitoring platform, capable of tracking diverse infectious agents and delivering actionable insights to the public health system. This evolution will ensure that genomic surveillance becomes a permanent and reliable component of national preparedness, strengthening resilience against future health crises. This work is carried out in collaboration with the ELIXIR Swiss Node and, locally, with the Ministry of Health and the Czech Army.

ELIXIR CZ develops and maintains a strong suite of computational tools for genomic analysis, including ASAFind, cpPredictor, DANTE, GOLEM, PredictSNP, rboAnalyzer, RepeatExplorer, and scdrake. These tools support repeat identification, chloroplast and mitochondrial genome annotation, prediction of bipartite chloroplast targeting presequences, RNA structure analysis, variant effect prediction, and single-cell omics workflows. Many of them are unique within the ELIXIR ecosystem and are recognised internationally.

Another strength is ELIXIR CZ’s involvement in high-profile European initiatives such as the European Reference Genome Atlas (ERGA), where it contributes to genome assembly, annotation standards, and FAIRification of biodiversity datasets. Similarly, Czech researchers are active in pathogen surveillance in collaboration with the ELIXIR Swiss Node, and in the rapidly evolving single-cell and spatial transcriptomics community.

The genomics domain benefits from strong computational support through e-INFRA CZ, allowing large-volume analyses such as genome assembly, repeat annotation, and integration of image or phenotype data. There is also strong synergy with the 2.5 Data Management & FAIRification area, particularly in metadata standardisation and semantic annotation, which are increasingly essential for AI-driven genomic analytics.

Together, these strengths give ELIXIR CZ a stable and influential position in European genomics, with capacity for further growth in AI integration, FAIR-compliant data sharing, and cross-domain analysis.

Challenges and New Directions

Despite the strong foundations mentioned above, the genomics ecosystem faces several challenges that must be addressed to enable long-term sustainability and European-level leadership.

  1. A major challenge is the preparation of genomic datasets for AI/ML applications, particularly the need for curated, standardised, and machine-actionable training datasets. Many existing databases were designed for human-driven research rather than automated inference, and require harmonisation of metadata, licensing, and semantic structures.
  2. A second key challenge is the FAIRification and interoperability of genomic resources. Although Czech databases are rich and well curated, they often use domain-specific schemas that need to be harmonised with European standards (e.g., EDAM, Bioschemas, Genome Context Ontologies). Interoperability becomes crucial when integrating genomic information with environmental data, phenotypes, chemical biology datasets, or structural predictions.
  3. The field faces an increasing need for access control and licensing frameworks suitable for AI/ML data use, especially when datasets may contain sensitive information or data derived from human-associated samples. Deciding what can be shared, how, and under what licence will be essential for integration with AI-based retrieval systems and federated data access networks.
  4. The exponential growth of sequencing technologies produces massive volumes of raw data, requiring efficient storage, scalable compute resources, and sustainable funding models. At the same time, repeat annotation and genome-structure interpretation remain computationally intensive tasks, placing pressure on infrastructure.
  5. Integrating diverse data types — genomics, transcriptomics, image data, phenotypes, and environmental metadata — represents another frontier. Researchers increasingly expect tools that can seamlessly combine these modalities and generate biologically meaningful outputs.
  6. The genomics community also requires specialised training to support the expanding methodological landscape, especially in areas like single-cell and spatial transcriptomics, biodiversity genomics, FAIR metadata preparation, and AI/ML in genomics.
  7. Finally, by developing a dedicated pathogen database, ELIXIR CZ will extend the reach of its infrastructure beyond the traditional domains of science and education to directly support governmental and public health functions. This resource will provide authorities with timely, curated, and AI-ready genomic data that can be used for surveillance, outbreak monitoring, and evidence-based decision-making. In doing so, ELIXIR CZ will bridge the gap between academic research and applied policy, ensuring that the same infrastructure which empowers scientists and students also serves as a trusted national platform for health security. This dual role will strengthen the resilience of the Czech Republic against future biological threats while reinforcing the societal relevance of ELIXIR CZ as a strategic national asset.

Addressing ELIXIR Priorities

Integrate AI/ML: Driving the adoption of cutting-edge AI (Agent-Genomix) for genomic data retrieval, functional annotation, and trait inference by connecting three structurally different databases. This directly responds to the AI-Ready Data challenge.

Enhance Interoperability: Achieving the objective of harmonisation with European standards by enforcing EDAM and Bioschemas for the AmtDB, HERVd, and Pathogen APIs, ensuring machine-actionable data for the AI Agent.
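
As an illustration of what machine-actionable metadata can look like in practice, the following minimal Python sketch builds a schema.org Dataset description of the kind that Bioschemas profiles build on and tags it with an EDAM topic. It is not the actual metadata or API of any ELIXIR CZ resource: the function names, the list of checked fields, and all values are assumptions made for this example, and the authoritative property requirements are those of the Bioschemas Dataset profile.

```python
import json

# The fields checked here are an assumption chosen for illustration; the
# authoritative list is defined by the Bioschemas Dataset profile itself.
REQUIRED_FIELDS = ("name", "description", "identifier", "keywords", "license", "url")


def build_dataset_record(name, description, identifier, url, license_url, keywords):
    """Return a schema.org Dataset description as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
        "license": license_url,
        "keywords": keywords,
    }


def missing_fields(record):
    """List the checked fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values only: not the metadata of any real ELIXIR CZ resource.
record = build_dataset_record(
    name="Example genomic database",
    description="Hypothetical curated collection used to illustrate the markup.",
    identifier="https://example.org/datasets/example-db",
    url="https://example.org/example-db",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=[
        {
            # EDAM topic used as a controlled-vocabulary keyword.
            "@type": "DefinedTerm",
            "name": "Genomics",
            "url": "http://edamontology.org/topic_0622",
        }
    ],
)

print(json.dumps(record, indent=2))
print("Missing fields:", missing_fields(record))
```

A record of this kind can be embedded in a database landing page or returned by an API, and a simple completeness check such as missing_fields could run as part of routine curation so that gaps in licensing or identifiers are caught before the data are exposed to automated agents.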

Expand and Curate Databases: Committing to the maintenance and modernisation of HERVd, AmtDB, and PathogenDB so that they become fully AI-ready and API-accessible resources.

Provide Advanced Training: Developing specialised training modules focused on interpretable AI and multi-omics integration using the Agent-Genomix platform, expanding expertise in a priority area.