The AlphaFold Protein Structure Database has for the first time added protein complex structures to its collection, marking the most significant expansion of the resource since it launched in 2021. The update went live on March 15, 2026 and includes 1.7 million high-confidence predictions of homodimers — complexes formed when two identical protein copies bind together. The structures were produced through a four-way collaboration between EMBL's European Bioinformatics Institute (EMBL-EBI), Google DeepMind, NVIDIA, and Seoul National University.
Overview
Since its public launch in 2021, the AlphaFold Protein Structure Database has become a core resource for structural biology, providing free, open access to predicted three-dimensional structures for more than 200 million individual proteins. Until now, however, the database contained only single-chain (monomer) predictions: snapshots of individual proteins folding in isolation.
Biology rarely works that way. Most proteins carry out their functions by physically associating with other proteins, forming complexes that range from simple pairs to enormous multi-subunit assemblies. The addition of 1.7 million homodimer structures on March 15, 2026 is the database's first step toward capturing that real-world complexity, and it signals a broader strategic shift toward modelling how proteins interact rather than how they fold alone.
The update was coordinated at database scale by EMBL-EBI, powered by Google DeepMind's AlphaFold modelling system, accelerated on NVIDIA hardware, and informed by computational expertise from Seoul National University.
What Changed
Prior to the March 15 update, every entry in the AlphaFold database represented a single protein chain. The new release introduces a separate, dedicated collection of complex structures, initially populated exclusively with homodimers — two copies of the same protein chain bound together.
The 1.7 million new entries were selected for high prediction confidence and biological relevance. Each entry follows the same open-access model as the existing database: freely downloadable coordinate files, per-residue confidence scores, and integrated links to other biological databases.
| Feature | Before March 15, 2026 | After March 15, 2026 |
|---|---|---|
| Structure types | Monomer (single chain) only | Monomer + protein complexes |
| Complex coverage | None | 1.7 million homodimers |
| Collaborating institutions | EMBL-EBI, DeepMind | EMBL-EBI, DeepMind, NVIDIA, Seoul National University |
| Access model | Free & open | Free & open (unchanged) |
The database interface at alphafold.ebi.ac.uk has been updated with new search and filtering tools specifically designed to query complex structures.
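As an illustration of how the per-residue confidence scores in a downloaded coordinate file might be read, here is a minimal Python sketch. It assumes the complex entries follow the convention of the existing monomer files, where the pLDDT score is stored in the B-factor column of each PDB-format ATOM record; the announcement does not confirm the file layout for complexes. The helper `format_atom_line` is hypothetical and exists only to build a small two-chain demo input with invented coordinates and scores.

```python
def format_atom_line(serial, name, resname, chain, resseq, x, y, z, bfactor):
    """Build one fixed-column PDB ATOM record (demo input only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


def per_residue_plddt(pdb_text):
    """Map (chain, residue number) -> score read from the B-factor column.

    Assumption: the B-factor column (columns 61-66) holds pLDDT, as in
    AlphaFold monomer files. Only C-alpha atoms are read, one per residue.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        resseq = int(line[22:26])
        scores[(chain, resseq)] = float(line[60:66])
    return scores


def mean_plddt_by_chain(scores):
    """Average the per-residue scores separately for each chain."""
    totals = {}
    for (chain, _), value in scores.items():
        total, count = totals.get(chain, (0.0, 0))
        totals[chain] = (total + value, count + 1)
    return {chain: total / count for chain, (total, count) in totals.items()}


# Invented two-chain example; not real database content.
demo = "\n".join([
    format_atom_line(1, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0, 90.0),
    format_atom_line(2, "CA", "GLY", "A", 2, 3.8, 0.0, 0.0, 70.0),
    format_atom_line(3, "CA", "ALA", "B", 1, 0.0, 5.0, 0.0, 80.0),
    format_atom_line(4, "CA", "GLY", "B", 2, 3.8, 5.0, 0.0, 60.0),
])
scores = per_residue_plddt(demo)
print(mean_plddt_by_chain(scores))
```

For real files, an established parser such as Biopython or gemmi would be more robust than fixed-column slicing, particularly for mmCIF downloads.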
Understanding Homodimers
A homodimer is a protein complex made up of two identical subunits (protomers) that typically associate non-covalently. Roughly one in three known protein structures exists natively as a homodimer, which makes homodimers the single most common type of protein complex in structural biology databases.
Homodimerisation is functionally significant across many biological contexts:
- Enzyme catalysis: Many enzymes only become catalytically active when dimerised, bringing catalytic residues from each subunit into proximity.
- Transcription regulation: Numerous DNA-binding proteins — including many zinc-finger and helix-loop-helix factors — bind their target sequences as homodimers.
- Receptor signalling: Receptor tyrosine kinases frequently homodimerise upon ligand binding as the first step in intracellular signal transduction.
- Structural roles: Intermediate filament proteins such as vimentin assemble from coiled-coil homodimers into higher-order filaments.
Because homodimerisation is so pervasive, reliable predictions of these structures cover a large share of protein function that monomer models alone cannot explain.
Protein complexes are at the heart of virtually every cellular process. Making these structures openly available at scale is the logical next step after solving the monomer folding problem.
Four-Way Collaboration
The expansion was produced through an unusually broad institutional partnership, with each partner contributing a distinct component:
EMBL-EBI
EMBL's European Bioinformatics Institute hosts and maintains the AlphaFold Protein Structure Database. EMBL-EBI curated the set of target sequences for complex prediction, managed data integration with its wider suite of biological databases (including UniProt, PDBe, and InterPro), and built the updated database interface.
Google DeepMind
DeepMind's AlphaFold system — the same AI that won the 2020 Critical Assessment of Structure Prediction (CASP14) challenge — was adapted to predict multimeric assemblies. The version used for this dataset builds on AlphaFold-Multimer methodology, extended and refined for database-scale production runs. DeepMind provided the core prediction pipeline and confidence scoring.
NVIDIA
Generating 1.7 million complex-structure predictions at publication quality required massive computational throughput. NVIDIA provided the GPU infrastructure and optimised inference software that made database-scale prediction economically and practically feasible. NVIDIA's involvement reflects the growing role of accelerated computing in large-scale bioinformatics production workloads.
Seoul National University
Researchers at Seoul National University contributed expertise in protein complex modelling and validation methodology. Their group provided independent assessment of prediction quality and helped define the confidence thresholds used to filter the 1.7 million high-confidence structures from a broader set of predictions.
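The announcement does not say which metrics or cutoffs were used for this filtering. As a rough sketch of what threshold-based filtering can look like, the Python example below ranks predictions by the weighted score reported for AlphaFold-Multimer (0.8 × ipTM + 0.2 × pTM) and keeps those above a cutoff. Whether this dataset used that score is an assumption, and the `Prediction` class, the accession labels, the score values and the 0.8 cutoff are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    accession: str  # hypothetical identifier, not a real database entry
    ptm: float      # predicted TM-score for the whole complex
    iptm: float     # interface predicted TM-score


def multimer_confidence(pred):
    """Weighted score as reported for AlphaFold-Multimer ranking.

    Assumption: the database pipeline uses a comparable score.
    """
    return 0.8 * pred.iptm + 0.2 * pred.ptm


def filter_high_confidence(predictions, cutoff=0.8):
    """Keep predictions at or above the cutoff (placeholder value)."""
    return [p for p in predictions if multimer_confidence(p) >= cutoff]


# Invented scores for illustration only.
candidates = [
    Prediction("EXAMPLE-1", ptm=0.85, iptm=0.90),
    Prediction("EXAMPLE-2", ptm=0.70, iptm=0.40),
]
kept = filter_high_confidence(candidates)
print([p.accession for p in kept])
```

Weighting the interface term more heavily reflects that, for a complex, a well-folded chain with a poorly placed partner is still a poor prediction.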
Scientific Significance
The AlphaFold database is already used by an estimated 2 million researchers worldwide. Adding complex structures expands the database's utility across several high-priority research areas:
Drug Discovery
Many drug targets are proteins that function as dimers. Anti-cancer therapies, antiviral drugs, and treatments for metabolic disease frequently aim to disrupt or modulate homodimeric interfaces. Accurate structures of these interfaces give the rational design of small-molecule inhibitors and biologics a concrete starting point.
Protein Engineering
Synthetic biology and protein design pipelines benefit directly from understanding how homodimers pack together. Researchers engineering novel enzymes, biosensors, or therapeutic proteins can now use predicted dimer geometries as starting templates.
Fundamental Biology
Thousands of proteins whose monomer structures are already in the database have unknown complex geometries. The new entries illuminate how these proteins assemble, which residues form the dimer interface, and how disease-associated mutations might disrupt that interface.
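One common way to identify which residues form a dimer interface is a simple distance cutoff between the two chains. The Python sketch below illustrates that general technique rather than the database's own method: it flags residues whose C-alpha atoms lie within 8 Å of any C-alpha atom in the partner chain. The 8 Å cutoff is a conventional choice assumed here, and the coordinates are invented for the demonstration.

```python
import math


def interface_residues(chain_a, chain_b, cutoff=8.0):
    """Return residue numbers in each chain that contact the other chain.

    chain_a, chain_b: dict mapping residue number -> (x, y, z) C-alpha
    coordinates in angstroms. A residue counts as interface if it lies
    within `cutoff` of any C-alpha atom in the partner chain.
    """
    iface_a, iface_b = set(), set()
    for res_a, xyz_a in chain_a.items():
        for res_b, xyz_b in chain_b.items():
            if math.dist(xyz_a, xyz_b) <= cutoff:
                iface_a.add(res_a)
                iface_b.add(res_b)
    return sorted(iface_a), sorted(iface_b)


# Invented coordinates: only the first residue of chain B sits near chain A.
chain_a = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0)}
chain_b = {1: (0.0, 6.0, 0.0), 2: (3.8, 20.0, 0.0), 3: (7.6, 30.0, 0.0)}

iface_a, iface_b = interface_residues(chain_a, chain_b)
print(iface_a, iface_b)
```

The all-pairs loop is quadratic in chain length, which is fine for a single dimer; scanning many entries would call for a spatial index or a vectorised distance matrix. Mapping disease-associated variants onto the returned residue numbers is then a straightforward lookup.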
| Research Area | Previous Limitation | New Capability |
|---|---|---|
| Drug discovery | Interface geometry unknown for most targets | 1.7M dimer interfaces now openly available |
| Structural biology | Monomer-only predictions | Complex structures included for the first time |
| Protein engineering | Manual experimental determination of dimer geometry | Predicted dimer templates free at scale |
| Computational biology | Limited training data for complex-aware models | Large open dataset of high-confidence complex structures |
What Comes Next
The addition of homodimers is explicitly framed as a first step. The AlphaFold database roadmap, as outlined in the March 15 announcement, points toward expanding complex coverage to include:
- Heterodimers: Complexes formed by two different protein chains, which account for a substantial portion of protein-protein interactions in the human proteome.
- Higher-order assemblies: Trimers, tetramers, and larger oligomers that underpin many enzymatic and structural functions.
- Protein-nucleic acid complexes: Structures in which proteins bind DNA or RNA — critical for understanding gene regulation and developing nucleic-acid-targeting therapies.
The collaboration also carries implications for NVIDIA's positioning in scientific AI. As biological datasets grow from millions to billions of structures, GPU-accelerated inference pipelines become essential infrastructure for every major data-producing institution in the life sciences.
For the structural biology community, the March 15 update represents a shift in what a protein structure database is expected to contain. The bar has moved from individual chains to functional assemblies — and the AlphaFold database, backed by some of the world's most capable computing and AI institutions, appears positioned to keep raising it.