Chemical Biology within ELIXIR CZ integrates computational chemistry, structural biology, linked-data infrastructures, and AI-driven molecular modelling to support fundamental research, drug discovery, and biochemical innovation. It provides tools and databases that enable researchers to explore small molecules, ligands, metabolites, chemical reactions, and compound–protein interactions. As Europe moves toward large-scale interoperability, FAIRification of chemical data, and application of AI across the chemical space, this domain becomes indispensable for connecting molecular-level understanding with biological function.
Within the broader ELIXIR landscape, Chemical Biology contributes to the Data and Interoperability Platforms, supports the 3D-Beacons and ChEBI/ChEMBL ecosystems, and aligns with initiatives in cheminformatics, toxicology, reaction informatics, and computational design. ELIXIR CZ is well positioned to strengthen European chemical data infrastructures through its long-standing expertise in linked-data technologies, chemical semantics, ligand modelling, and virtual screening.
The next strategic period aims to establish ELIXIR CZ as a major contributor to chemical data interoperability, automated ligand-quality assurance, AI-ready molecular databases, AI-enabled molecular property prediction, and scalable screening pipelines.
Current Strengths
ELIXIR CZ hosts and develops high-value chemical and structural data resources, particularly through its linked-data ecosystem, which enables harmonised ontologies and cross-database semantic searches (IDSM). Tools and datasets developed in ELIXIR CZ support cheminformatics, ligand analysis, docking, partial charge prediction (ACC – Atomic Charge Calculator), protonation modelling, and annotation of chemical structures that are embedded in biomacromolecules or interact with biological membranes (MolMeDB).
The Chemical Biology domain benefits from a strong combination of experimental and computational expertise at IOCB. On the experimental side, advanced metabolomics and mass-spectrometry–driven discovery pipelines provide access to complex metabolic profiles and bioactive molecule identification, enriched by machine-learning–enabled interpretation of large spectral datasets. This creates a rich source of high-quality chemical and metabolic data that can be integrated into linked-data infrastructures and downstream computational workflows.
On the computational side, the infrastructure is strengthened by expertise in protein–ligand interactions, ligand-binding-site diversity, and apo–holo structural analysis. This includes the development of methods for detecting, comparing, and classifying ligand pockets across the PDB, which bridges structural bioinformatics and chemical data integration. These capabilities directly support ligand-quality assessment, binding-site annotation, and the design of AI-enhanced molecular-modelling pipelines.
Together, these competencies provide a unique and balanced foundation linking metabolomics, ligand modelling, structural analysis, and AI-driven molecular discovery within ELIXIR CZ.
ELIXIR CZ contributes to chemical data standardisation, controlled vocabularies, ontologies, and high-quality FAIR metadata. Existing tools provide automated ligand-quality checks and corrections, ensuring downstream modelling workflows (docking, scoring, molecular dynamics) are based on chemically valid structures. ELIXIR CZ also develops ACC2 for computing partial atomic charges using 20 different charge-calculation algorithms. The team has strong expertise in integrating small-molecule data with protein tunnel analysis (Caver, MOLEonline, ChannelsDB), solvent mapping, and structure-derived descriptors.
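The kind of automated ligand-quality check mentioned above can be illustrated with a minimal valence test. This is a sketch under simplifying assumptions, not the method used by any ELIXIR CZ tool: the `MAX_VALENCE` table, the function name `find_valence_errors`, and the atom/bond input format are all hypothetical, and only neutral atoms of a few common elements are covered.

```python
# Simplified maximum valences for neutral atoms (illustrative assumption,
# not a complete chemical model: no charges, no hypervalent S or P).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def find_valence_errors(atoms, bonds):
    """Flag atoms whose summed bond orders exceed the simplified maximum.

    atoms: list of element symbols, e.g. ["C", "H", "H"]
    bonds: list of (atom_index_i, atom_index_j, bond_order) tuples
    Returns a list of (atom_index, element, bond_order_sum, allowed) tuples.
    """
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order

    errors = []
    for index, element in enumerate(atoms):
        allowed = MAX_VALENCE.get(element)
        if allowed is None:
            continue  # element not in the table: skipped rather than guessed
        if totals[index] > allowed:
            errors.append((index, element, totals[index], allowed))
    return errors


# A carbon with five single bonds to hydrogen is chemically invalid.
bad_ligand_atoms = ["C", "H", "H", "H", "H", "H"]
bad_ligand_bonds = [(0, k, 1) for k in range(1, 6)]
print(find_valence_errors(bad_ligand_atoms, bad_ligand_bonds))
```

A production pipeline would additionally need formal charges, aromaticity, and stereochemistry handling; the sketch only shows where such a check sits before docking or molecular dynamics.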
The domain maintains vibrant collaborations with structural bioinformatics, metabolomics, computational chemistry, and European chemical data communities, including users and developers working with ChEBI, ChEMBL, PDBe, 3D-Beacons, SwissLipids, and other ELIXIR-relevant resources.
New Directions & Challenges
Chemical Biology is undergoing rapid transformation powered by AI, generative models, and massive chemical data integration. As chemical datasets expand to tens to hundreds of millions of structures, researchers require new tools for AI-ready molecular representations, interoperability between diverse chemical repositories, and automated error detection in deposited ligand structures.
- A central challenge is the interoperability of linked-data chemical databases, where heterogeneity of identifiers, protonation states, tautomeric forms, membrane environments, and stereochemistry still creates barriers to seamless queries. The current generation of SPARQL endpoints and APIs must evolve to support multi-database federation, harmonised ontologies, and robust cross-resource workflows that non-experts can easily use.
- Another key challenge is the quality of ligand structures deposited in the PDB, in diffusion-generated datasets produced by AlphaFold-style structure-prediction tools, and in other resources. Ligands often contain incorrect stereochemistry, inconsistent valence, or improbable protonation states. Automated correction pipelines are increasingly necessary to ensure clean inputs for downstream modelling, particularly for ML-driven virtual screening and compound prioritisation.
- A further challenge is the need for curated, balanced, machine-readable datasets for AI-driven chemical modelling, especially with graph neural networks, diffusion models, and transformer-based molecular encoders. ELIXIR CZ must prepare training-quality datasets with consistent chemical semantics, explicit treatment of charge states, and interoperable links to other databases, membranes, proteins, binding sites, and reaction contexts.
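The multi-database federation named in the first challenge can be sketched with the SPARQL 1.1 `SERVICE` keyword, which delegates part of a query to a remote endpoint. The example below only builds the query text; the endpoint URLs, the `ex:inchikey` predicate, and the function name `build_federated_query` are placeholders assumed for illustration and do not describe any existing ELIXIR CZ endpoint or vocabulary.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def build_federated_query(endpoints, inchikey):
    """Build a SPARQL 1.1 query that asks several endpoints for one compound.

    endpoints: list of SPARQL endpoint URLs (hypothetical placeholders here)
    inchikey:  InChIKey used as the shared cross-database identifier
    """
    if not INCHIKEY_PATTERN.match(inchikey):
        # Reject malformed keys so nothing unexpected is spliced into the query.
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")

    blocks = []
    for url in endpoints:
        blocks.append(
            "  {\n"
            f"    SERVICE <{url}> {{\n"
            f'      ?compound ex:inchikey "{inchikey}" .\n'
            "    }\n"
            f"    BIND(<{url}> AS ?source)\n"
            "  }"
        )
    body = "\n  UNION\n".join(blocks)
    return (
        "PREFIX ex: <https://example.org/vocab#>\n"
        "SELECT ?source ?compound WHERE {\n"
        f"{body}\n"
        "}"
    )


query = build_federated_query(
    ["https://example.org/sparql-a", "https://example.org/sparql-b"],
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",  # InChIKey of aspirin
)
print(query)
```

In practice each resource exposes its own predicates and identifier conventions, which is exactly the heterogeneity described above; a shared vocabulary or a mapping layer would replace the single assumed predicate.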
Finally, chemical space is now explored through ultra-large virtual screening, where millions to billions of generated or vendor-provided compounds must be rapidly scored, clustered, triaged, and stored. ELIXIR CZ must therefore build scalable workflows and integrate new AI models for fast property prediction, binding-site detection, and prioritisation of small molecules.
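The score, cluster, and triage steps described above can be sketched with a greedy, leader-style selection: compounds are ranked by score, and a compound is kept only if it is sufficiently dissimilar (by Tanimoto similarity) to every compound already kept. This is one simple diversity-selection technique chosen for illustration, not a description of an existing ELIXIR CZ pipeline. Fingerprints are represented as plain sets of integer feature indices, and the function names, the input tuple format, and the default threshold are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of feature indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar
    return len(fp_a & fp_b) / union


def triage(compounds, threshold=0.7, max_picks=None):
    """Pick high-scoring, mutually dissimilar compounds.

    compounds: list of (compound_id, score, fingerprint_set); higher score is better
    threshold: a compound is dropped if its similarity to any pick is >= threshold
    max_picks: optional cap on the number of compounds returned
    """
    ranked = sorted(compounds, key=lambda c: c[1], reverse=True)
    picks = []
    for compound_id, score, fingerprint in ranked:
        if max_picks is not None and len(picks) >= max_picks:
            break
        # Keep the compound only if it is unlike every previously kept one.
        if all(tanimoto(fingerprint, kept_fp) < threshold for _, _, kept_fp in picks):
            picks.append((compound_id, score, fingerprint))
    return [compound_id for compound_id, _, _ in picks]


library = [
    ("cmpd-a", 0.9, {1, 2, 3, 4}),
    ("cmpd-b", 0.8, {1, 2, 3, 5}),   # shares 3 of 5 features with cmpd-a
    ("cmpd-c", 0.7, {10, 11, 12}),   # unrelated scaffold
]
print(triage(library, threshold=0.5))
```

The comparison against all kept compounds is quadratic in the worst case, so at the scale of billions of compounds a real workflow would need indexed or approximate similarity search and batched, distributed scoring.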
These challenges represent a moment of strategic opportunity: ELIXIR CZ can become a European leader in chemical data interoperability, AI-driven molecular modelling, automated ligand curation, and the modelling of membrane accessibility.