Human Data

Human Data is a critical and rapidly expanding area within ELIXIR CZ, representing the convergence of genomics, clinical data, bioinformatics, data governance, and secure computing. As genomic and multi-modal health data increasingly shape modern medicine, research infrastructures must provide robust environments for secure storage, controlled access, federated analytics, and FAIR-compliant metadata management—while fully respecting ethical standards, European regulatory frameworks such as GDPR, the forthcoming European Health Data Space (EHDS), and GA4GH standards.

ELIXIR CZ has already evolved from supporting personalised medicine tools to addressing the broader challenge of how to securely store, manage, process, and share human genomic and clinical data at national scale. The next decade will bring further transformation, driven by large national cohorts, translational genomics, AI-driven diagnostics, and federated European data sharing networks. Human Data at ELIXIR CZ aims to ensure that Czech researchers and clinicians have access to secure, interoperable, and scalable environments at the national level that enable scientific discovery while safeguarding patient privacy and rights.

This strategic area is tightly interlinked with multiple European initiatives, especially the 1+ Million Genomes Initiative (1+MG), the Genome Data Infrastructure (GDI), and the Federated European Genome-phenome Archive (FEGA). ELIXIR CZ is well placed to align the Czech Republic with these continental infrastructures and to build a robust national ecosystem for responsible and secure use of human data in research and healthcare.

Current Situation and Identified Strengths

The Czech Republic has made significant progress in establishing the technical foundation for secure human genomic data management. In recent years, through strategic investments and active ELIXIR CZ participation, the groundwork has been laid for three national Trusted Research Environment (TRE) data repositories at Charles University (UK), Masaryk University (MUNI), and Palacký University Olomouc (UPOL). Most notably, the SensitiveCloud operated by CERIT-SC has already achieved certification and is operational, providing isolated environments, controlled data flows, comprehensive audit logging, and secure deployment capabilities for AI/ML models. This mature technical backbone is a strategic asset for building a comprehensive national human data infrastructure capable of privacy-preserving analytics and regulatory compliance.

At the same time, ELIXIR CZ has supported the development and maintenance of a robust portfolio of tools and resources specifically designed for clinical research and human data analysis. The portfolio includes tools for clinically relevant sequencing data analysis (scdrake, Gen Seeker, chEM), variant pathogenicity prediction (PredictSNP, PredictONCO), genomic databases (CzechGenome, ACGT), and patient tracking (MDRsim). In recent years, it has demonstrated ELIXIR CZ’s capability to develop analytical tools that can be applied to human data in clinical contexts. Importantly, active development teams continue to enhance these tools, with increasing focus on integrating AI/ML capabilities, while new AI-focused development teams are emerging across Czech institutions to expand these capabilities further.

A critical enabling layer for secure human data management is the Authentication and Authorization Infrastructure (AAI) developed and maintained within ELIXIR CZ. This well-established system provides secure user authentication, role-based access control, and seamless integration with European identity federations. The AAI serves as an essential component for managing access to sensitive datasets, supporting TRE governance frameworks, and enabling future interoperability with FEGA, GDI, and EHDS-compliant access procedures.

ELIXIR CZ’s active participation in major European initiatives has positioned the Czech Republic as a recognised contributor to federated genomic data infrastructure. Through involvement in the 1+ Million Genomes (1+MG) initiative and the Genome Data Infrastructure (GDI) project, Czech teams have demonstrated specific expertise in AAI and federated compute architectures. This expertise was formally recognised in 2024 when ELIXIR CZ was entrusted with leadership of the Task Force for Federated Analysis within GDI. Under this mandate, Czech teams successfully coordinated and delivered a demonstrator showcasing cross-border federated analysis of genomic data, a milestone that validated both the technical approach and Czech capacity to lead at the European level.

Leveraging the technical backbone of the SensitiveCloud and expertise accumulated through previous European collaborations, ELIXIR CZ has made remarkable progress toward establishing a Czech node within the Federated European Genome-phenome Archive (FEGA). With support from ELIXIR Staff Exchanges and knowledge transfer from established FEGA nodes, Czech teams rapidly advanced through implementation milestones, successfully completing an end-to-end demonstration of FEGA ingest and discovery workflows. This achievement has positioned the Czech Republic on the threshold of formally joining the FEGA consortium, enabling participation in Europe’s secure genomic data-sharing network.

In a decisive step toward long-term sustainability, ELIXIR CZ secured dedicated funding for the development and operation of a national genomics data repository through the EOSC-aligned OpenScience 2 project. This funding specifically supports the establishment of OmiCZ (Czech Omics Node), providing a stable financial foundation for building and maintaining the national platform for human genomic and multi-omics data management over the coming years.

Together, these elements (emerging TRE infrastructure, proven analytical tools, robust authentication systems, recognised European leadership in federated analytics, near-completion of FEGA integration, and secured operational funding) place ELIXIR CZ in a unique position to deliver comprehensive human data management for the Czech Republic. This foundation enables the Czech node not only to adopt and implement European solutions but also to actively influence and co-develop standards, architectures, and workflows that will define the future of federated health data infrastructure.

Challenges and New Directions

The Human Data area is uniquely challenging, combining scientific, technical, legal, organisational, and ethical dimensions. Creating a nationally coordinated platform for human genomic and clinical data requires addressing several interdependent challenges.

  1. A central challenge is the absence of a unified, TRE-based national platform for human genomic and other health data. Current activities are distributed across institutions with different policies, security practices, and infrastructure maturity levels. This limits the ability to perform federated analytics, to integrate datasets across hospitals, or to participate effectively in European infrastructures.
  2. A second challenge is the fragmentation of clinical data sources, inconsistent metadata annotation, lack of semantic harmonisation, and incompatible hospital IT systems. Until these issues are resolved, national-level genomic and clinical data integration cannot be achieved.
  3. The deployment of AI/ML in human data analysis brings new demands for secure execution environments. Models must operate inside TREs, with strong isolation, auditability, restricted networking, and reproducible pipelines. These requirements must be coordinated with national TRE efforts, the e-INFRA sensitive cloud, and emerging TRE solutions within the European AI Factories and GDI.
  4. The Human Data area must also prepare for the European Health Data Space (EHDS), which will reshape how health data is accessed, processed, and shared across EU member states. Czech infrastructures must align with EHDS requirements for trusted processors, data access request mechanisms, and federated analytics.
  5. A critical emerging challenge is the coordination and integration of AI-driven approaches across the national human data landscape. This requires mapping existing AI activities, identifying high-impact use cases, and establishing collaborative frameworks. AI methods are needed both for improved data management within TREs, enabling automated quality control and semantic enrichment, and for multimodal health data modelling that integrates genomics, imaging, and clinical records toward advanced diagnostics and precision medicine. Coordinating these efforts nationally while aligning with European initiatives will be essential to ensure Czech researchers and clinicians can fully participate in the AI-driven transformation of healthcare.
  6. Finally, sustaining a national human data ecosystem requires new governance models, dedicated personnel with expertise in sensitive data management, clear policies for consent and data sharing, and long-term financial support.

These challenges represent a decisive opportunity: the next decade will determine whether the Czech Republic becomes an active contributor to the European human data landscape—or remains on its periphery. ELIXIR CZ can play a central role in ensuring the former.