AlphaFold Database Adds Protein Complex Structures for the First Time

A landmark expansion adds 1.7 million high-confidence homodimer predictions — produced by a four-way collaboration between EMBL-EBI, Google DeepMind, NVIDIA, and Seoul National University — to the world's most-used structural biology resource.

Last updated: March 18, 2026

The AlphaFold Protein Structure Database has added protein complex structures for the first time, the most significant expansion of the resource since its launch in 2021. The update went live on March 15, 2026, and includes 1.7 million high-confidence predictions of homodimers, complexes formed when two identical copies of a protein bind together. The structures were produced through a four-way collaboration between EMBL's European Bioinformatics Institute (EMBL-EBI), Google DeepMind, NVIDIA, and Seoul National University.

Overview

Since its public launch in 2021, the AlphaFold Protein Structure Database has become the backbone of modern structural biology, providing free, open access to predicted three-dimensional structures for hundreds of millions of individual proteins. Until now, however, the database contained only single-chain (monomer) predictions — snapshots of individual proteins folding in isolation.

Biology rarely works that way. Most proteins carry out their functions by physically associating with other proteins, forming complexes that range from simple pairs to enormous multi-subunit assemblies. The addition of 1.7 million homodimer structures on March 15, 2026 is the database's first step toward capturing that real-world complexity, and it signals a broader strategic shift toward modelling how proteins interact rather than how they fold alone.

The update was coordinated at database scale by EMBL-EBI, powered by Google DeepMind's AlphaFold modelling system, accelerated on NVIDIA hardware, and informed by computational expertise from Seoul National University.

What Changed

Prior to the March 15 update, every entry in the AlphaFold database represented a single protein chain. The new release introduces a separate, dedicated collection of complex structures, initially populated exclusively with homodimers — two copies of the same protein chain bound together.

The 1.7 million new entries were selected on the basis of high-confidence prediction scores and biological relevance. Each entry follows the same open-access model as the existing database: freely downloadable coordinate files, confidence scores per residue, and integrated links to other biological databases.
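
For readers who want to work with the per-residue confidence scores directly, the following is a minimal Python sketch rather than an official client. It assumes a coordinate file in PDB format with the confidence score (pLDDT) stored in the B-factor field, as in the database's existing monomer files; whether the new complex entries use exactly the same layout is an assumption. The function names and the two-chain demo data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def atom_line(serial, name, res_name, chain, res_seq, x, y, z, b_factor):
    """Format one ATOM record in PDB fixed-column layout (demo helper only)."""
    occupancy = 1.0
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
    )


def plddt_by_chain(pdb_text):
    """Return {chain_id: [score per residue]} read from C-alpha atoms.

    Assumption: the B-factor field (columns 61-66) holds the confidence score.
    """
    scores = defaultdict(list)
    for line in pdb_text.splitlines():
        # One value per residue: use only the C-alpha atom of each residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[line[21]].append(float(line[60:66]))
    return dict(scores)


# Made-up two-chain example; the values are illustrative, not real predictions.
demo = "\n".join([
    atom_line(1, " CA ", "MET", "A", 1, 0.0, 0.0, 0.0, 92.5),
    atom_line(2, " CA ", "LYS", "A", 2, 3.8, 0.0, 0.0, 87.5),
    atom_line(3, " CA ", "MET", "B", 1, 0.0, 5.0, 0.0, 90.0),
    atom_line(4, " CA ", "LYS", "B", 2, 3.8, 5.0, 0.0, 60.0),
])

scores = plddt_by_chain(demo)
for chain, values in scores.items():
    print(chain, round(mean(values), 1))
```

Comparing the two chains of a homodimer this way gives a quick check on whether both copies were modelled with similar confidence.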

Feature | Before March 15, 2026 | After March 15, 2026
Structure types | Monomer (single chain) only | Monomer + protein complexes
Complex coverage | None | 1.7 million homodimers
Collaborating institutions | EMBL-EBI, DeepMind | EMBL-EBI, DeepMind, NVIDIA, Seoul National University
Access model | Free & open | Free & open (unchanged)

The database interface at alphafold.ebi.ac.uk has been updated with new search and filtering tools specifically designed to query complex structures.

Understanding Homodimers

A homodimer is a protein complex made up of two identical subunits (protomers), usually held together by non-covalent interactions. Roughly one in three known protein structures exists natively as a homodimer, making them the single most common type of protein complex in structural biology databases.

Homodimerisation is functionally significant across many biological contexts:

  • Enzyme catalysis: Many enzymes only become catalytically active when dimerised, bringing catalytic residues from each subunit into proximity.
  • Transcription regulation: Numerous DNA-binding proteins — including many zinc-finger and helix-loop-helix factors — bind their target sequences as homodimers.
  • Receptor signalling: Receptor tyrosine kinases frequently homodimerise upon ligand binding as the first step in intracellular signal transduction.
  • Structural roles: Many cytoskeletal filaments are built from dimeric units; intermediate filament proteins such as vimentin, for example, first pair into coiled-coil homodimers before assembling further.

Because homodimerisation is so pervasive, predicting these structures reliably unlocks a large fraction of biology that was previously inaccessible from sequence alone.

"Protein complexes are at the heart of virtually every cellular process. Making these structures openly available at scale is the logical next step after solving the monomer folding problem."
EMBL-EBI, March 2026 announcement

Four-Way Collaboration

The expansion was produced through an unusually broad institutional partnership, with each partner contributing a distinct component:

EMBL-EBI

EMBL's European Bioinformatics Institute hosts and maintains the AlphaFold Protein Structure Database. EMBL-EBI curated the set of target sequences for complex prediction, managed data integration with its wider suite of biological databases (including UniProt, PDBe, and InterPro), and built the updated database interface.

Google DeepMind

DeepMind's AlphaFold system, whose second version topped the 2020 Critical Assessment of Structure Prediction (CASP14), was adapted to predict multimeric assemblies. The version used for this dataset builds on the AlphaFold-Multimer methodology, extended and refined for database-scale production runs. DeepMind provided the core prediction pipeline and confidence scoring.

NVIDIA

Generating 1.7 million complex-structure predictions at publication quality required massive computational throughput. NVIDIA provided the GPU infrastructure and optimised inference software that made database-scale prediction economically and practically feasible. NVIDIA's involvement reflects the growing role of accelerated computing in large-scale bioinformatics production workloads.

Seoul National University

Researchers at Seoul National University contributed expertise in protein complex modelling and validation methodology. Their group provided independent assessment of prediction quality and helped define the confidence thresholds used to filter the 1.7 million high-confidence structures from a broader set of predictions.
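
The filtering step described here can be pictured with a short Python sketch. The announcement does not state which metrics or cut-offs were used, so everything below is an assumption: the record fields, the choice of mean pLDDT and ipTM (the interface score reported by AlphaFold-Multimer) as criteria, the threshold values of 70 and 0.6, and the demo entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    entry_id: str      # hypothetical identifier
    mean_plddt: float  # average per-residue confidence, 0 to 100
    iptm: float        # interface confidence, 0 to 1


def high_confidence(predictions, min_plddt=70.0, min_iptm=0.6):
    """Keep predictions that clear both thresholds (illustrative values only)."""
    return [
        p for p in predictions
        if p.mean_plddt >= min_plddt and p.iptm >= min_iptm
    ]


# Made-up candidates, not real database entries.
candidates = [
    Prediction("demo-1", 91.2, 0.88),  # confident fold and interface
    Prediction("demo-2", 84.0, 0.41),  # confident fold, uncertain interface
    Prediction("demo-3", 55.3, 0.72),  # uncertain fold
]

kept = high_confidence(candidates)
print([p.entry_id for p in kept])
```

Requiring both scores reflects the idea that a useful dimer prediction needs a confident fold for each chain and a confident placement of the two chains relative to each other.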

Scientific Significance

The AlphaFold database is already used by an estimated 2 million researchers worldwide. Adding complex structures expands the database's utility across several high-priority research areas:

Drug Discovery

Many drug targets are proteins that function as dimers. Anti-cancer therapies, antiviral drugs, and treatments for metabolic disease frequently aim to disrupt or modulate homodimeric interfaces. Accurate structures of these interfaces can substantially speed up the rational design of small-molecule inhibitors and biologics.

Protein Engineering

Synthetic biology and protein design pipelines benefit directly from understanding how homodimers pack together. Researchers engineering novel enzymes, biosensors, or therapeutic proteins can now use predicted dimer geometries as starting templates.

Fundamental Biology

Thousands of proteins whose monomer structures are already in the database have unknown complex geometries. The new entries illuminate how these proteins assemble, which residues form the dimer interface, and how disease-associated mutations might disrupt that interface.
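
One simple way to ask which residues form the dimer interface is a distance cut-off between the two chains. The sketch below is illustrative only: it assumes C-alpha coordinates have already been read from a downloaded coordinate file, uses an 8 Å cut-off as a common rule of thumb rather than a value taken from the announcement, and runs on made-up coordinates.

```python
import math


def interface_residues(ca_atoms, cutoff=8.0):
    """Return sorted (chain, residue_number) pairs near a different chain.

    ca_atoms: iterable of (chain_id, residue_number, (x, y, z)) tuples,
    one C-alpha atom per residue. cutoff is in angstroms (assumed value).
    """
    atoms = list(ca_atoms)
    interface = set()
    for chain_i, res_i, xyz_i in atoms:
        for chain_j, _res_j, xyz_j in atoms:
            # Only count contacts between different chains.
            if chain_i != chain_j and math.dist(xyz_i, xyz_j) <= cutoff:
                interface.add((chain_i, res_i))
    return sorted(interface)


# Made-up coordinates for two short chains lying side by side, offset in x.
demo_atoms = [
    ("A", 1, (0.0, 0.0, 0.0)),
    ("A", 2, (3.8, 0.0, 0.0)),
    ("A", 3, (7.6, 0.0, 0.0)),
    ("B", 1, (7.6, 6.0, 0.0)),
    ("B", 2, (11.4, 6.0, 0.0)),
    ("B", 3, (15.2, 6.0, 0.0)),
]

contacts = interface_residues(demo_atoms)
print(contacts)

# A mutation at a listed position is a candidate for disrupting the interface.
print(("A", 3) in contacts, ("A", 1) in contacts)
```

The all-pairs loop is fine for a single dimer; a spatial index would be the natural replacement when scanning many large structures.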

Research Area | Previous Limitation | New Capability
Drug discovery | Interface geometry unknown for most targets | 1.7M dimer interfaces now openly available
Structural biology | Monomer-only predictions | Complex structures included for the first time
Protein engineering | Manual experimental determination of dimer geometry | Predicted dimer templates free at scale
Computational biology | Limited training data for complex-aware models | Large open dataset of high-confidence complex structures

What Comes Next

The addition of homodimers is explicitly framed as a first step. The AlphaFold database roadmap, as outlined in the March 15 announcement, points toward expanding complex coverage to include:

  • Heterodimers: Complexes formed by two different protein chains, which account for a substantial portion of protein-protein interactions in the human proteome.
  • Higher-order assemblies: Trimers, tetramers, and larger oligomers that underpin many enzymatic and structural functions.
  • Protein-nucleic acid complexes: Structures in which proteins bind DNA or RNA — critical for understanding gene regulation and developing nucleic-acid-targeting therapies.

The collaboration also carries implications for NVIDIA's positioning in scientific AI. As biological datasets grow from millions to billions of structures, GPU-accelerated inference pipelines become essential infrastructure for every major data-producing institution in the life sciences.

For the structural biology community, the March 15 update represents a shift in what a protein structure database is expected to contain. The bar has moved from individual chains to functional assemblies — and the AlphaFold database, backed by some of the world's most capable computing and AI institutions, appears positioned to keep raising it.
