Starting from 4HHB, could you predict which hemoglobin mutations would increase or decrease oxygen affinity?
A new Science paper explores a pretty wild idea: can life function with fewer than the standard 20 amino acids?
The authors targeted isoleucine, which is chemically similar to leucine and valine. They did not make a fully 19-amino-acid organism, but they did redesign one of the most essential systems in E. coli: the ribosome.
Using protein language models, structure prediction, and generative design tools, they removed all 382 isoleucines from E. coli ribosomal proteins. The engineered strain was viable and stable for hundreds of generations.
The caveat: the rest of the E. coli proteome still contains thousands of isoleucines, so this is a first proof of concept rather than a true 19-AA lifeform.
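To make the scale of the problem concrete, here is a minimal sketch of how one might tally a single residue across a set of protein sequences. The sequences and the `count_residue` helper are made up for illustration; they are not the paper's data or tooling.

```python
def count_residue(sequences, residue="I"):
    """Return per-protein counts of a one-letter amino acid code."""
    return {name: seq.upper().count(residue) for name, seq in sequences.items()}

# Toy sequences for illustration only, NOT real E. coli ribosomal proteins.
toy_proteins = {
    "protA": "MKIILVAI",
    "protB": "MLVVAG",
}

counts = count_residue(toy_proteins)
total = sum(counts.values())
print(counts, total)  # {'protA': 3, 'protB': 0} 3
```

Run on a real proteome FASTA, the same tally would show how many positions still need redesign outside the ribosome.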
In humans, purely theoretically, how many amino acids could we remove from the proteome with enough redesign?
Paper: Toward life with a 19–amino acid alphabet through generative artificial intelligence design
This matters because virtual screening often depends heavily on which protein structure is used. A ligand-bound holo structure, an apo structure, a homology model, or an AlphaFold-predicted model can all give different screening results.
PL-PatchSurfer3 tackles this by comparing local surface patches between the ligand and receptor pocket using 3D Zernike descriptors. The new version adds improved hydrogen-bond complementarity and a visibility feature that captures local curvature. The authors report that this improves performance while keeping the method robust across holo, apo, modeled, and AlphaFold-predicted receptor structures.
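A rough sketch of the patch-comparison idea, assuming each surface patch has already been reduced to a fixed-length, rotation-invariant descriptor vector (as 3D Zernike descriptors are). The scoring rule here (mean nearest-patch Euclidean distance) and the `patch_match_score` name are my own simplification, not the actual PL-PatchSurfer3 scoring function.

```python
import numpy as np

def patch_match_score(ligand_desc, pocket_desc):
    """Mean distance from each ligand patch to its closest pocket patch.

    Both inputs are arrays of shape (n_patches, n_features).
    Lower values mean the ligand patches find closer pocket counterparts.
    """
    lig = np.asarray(ligand_desc, dtype=float)
    poc = np.asarray(pocket_desc, dtype=float)
    # Pairwise Euclidean distances, shape (n_ligand_patches, n_pocket_patches).
    dists = np.linalg.norm(lig[:, None, :] - poc[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())

# Hypothetical 2-feature descriptors, just to show the call.
ligand = [[0.0, 0.0], [1.0, 0.0]]
pocket = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
print(patch_match_score(ligand, pocket))  # 0.0
```

Because the descriptors are compared patch by patch rather than as one global shape, small shifts in the pocket (apo vs. holo, or a predicted model) should perturb the score less than they would a rigid docking pose.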
For AI protein workflows, the key point is practical: predicted structures are increasingly used in drug discovery, but they are not always in the right binding conformation. Methods like PL-PatchSurfer3 may help make virtual screening more reliable when starting from imperfect or AI-predicted protein models.
Paper: Hidden structural states of proteins revealed by conformer selection
Repo: AISAR
The core idea is to combine AI-generated conformational sampling with NMR data. Instead of relying only on one predicted structure, AISAR generates realistic alternative conformers and then scores them against NOESY and other NMR observables.
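A toy version of the "generate conformers, then score against NMR" step might look like this. It assumes NOE restraints are given as upper distance bounds between atom pairs and sums the violations; the function names and the scoring rule are illustrative assumptions, not AISAR's actual scoring.

```python
import numpy as np

def noe_violation(coords, restraints):
    """Sum of upper-bound violations (in Å) for one conformer.

    coords: array of shape (n_atoms, 3)
    restraints: iterable of (atom_i, atom_j, upper_bound) tuples
    """
    coords = np.asarray(coords, dtype=float)
    total = 0.0
    for i, j, upper in restraints:
        dist = np.linalg.norm(coords[i] - coords[j])
        total += max(0.0, dist - upper)  # only distances beyond the bound count
    return total

def rank_conformers(conformers, restraints):
    """Return conformer indices sorted best-first, plus the raw scores."""
    scores = [noe_violation(c, restraints) for c in conformers]
    order = sorted(range(len(conformers)), key=lambda k: scores[k])
    return order, scores

# Two hypothetical 2-atom "conformers" and a single 5 Å restraint.
conf_a = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
conf_b = [[0.0, 0.0, 0.0], [8.0, 0.0, 0.0]]
order, scores = rank_conformers([conf_a, conf_b], [(0, 1, 5.0)])
print(order, scores)  # [0, 1] [0.0, 3.0]
```

In practice a single conformer rarely satisfies all restraints, which is the point: restraints that no one structure explains hint at multiple interconverting states.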
The key result is that AISAR revealed hidden structural states in multiple proteins. In Gaussia luciferase, the method identified two interconverting states involving major rearrangements of lids, binding pockets, and cryptic surface cavities. It also found two distinct conformational states in CDK2AP1, a human tumor suppressor protein.
The broader takeaway: AI structure prediction becomes more powerful when paired with experimental data. AISAR suggests a route for mapping dynamic protein states, including cryptic pockets that may matter for function or drug discovery.
Paper: RareFold: Structure prediction and design of proteins with noncanonical amino acids
Repo: RareFold
Most protein design models are still built around the 20 natural amino acids. RareFold pushes beyond that limit by supporting 49 amino acid types in total, including 29 rare/noncanonical residues. The key finding is that these expanded chemical building blocks can be handled directly by the model, opening the door to protein and peptide designs with more chemical diversity.
The authors also introduce EvoBindRare, a binder design framework that can generate both linear and cyclic peptide binders from a target protein sequence, without needing a predefined binding site. According to the project page, the designs were experimentally validated for both linear and cyclic binders.
This could be important for AI protein design because noncanonical amino acids can add properties that natural residues often lack, including improved stability, altered binding chemistry, and new therapeutic possibilities.
Paper: ODesign: A World Model for Biomolecular Interaction Design
Repo: ODesign
Wanted to discuss ODesign, especially in the context of models like BoltzGen and RFdiffusion3.
The key distinction is that ODesign is closer to an “all-to-all” biomolecular design model, while BoltzGen is more like a universal protein/peptide binder design model. ODesign tries to design across multiple molecular modalities: proteins, nucleic acids, and ligands.
BoltzGen, by contrast, mainly designs protein-like binders: miniproteins, peptides, cyclic peptides, nanobodies, and antibody-like binders, against many target types.
So the difference is roughly:
BoltzGen:
“Given a biomolecular target, design a protein/peptide binder.”
ODesign:
“Given a biomolecular target/interface, design the appropriate molecular partner, potentially protein, nucleic acid, or ligand.”
That makes ODesign broader in ambition, but BoltzGen currently looks stronger on experimental validation. BoltzGen reports validation across nanobodies, miniproteins, peptides, cyclic peptides, and challenging target classes, while ODesign’s wet-lab validation appears mainly focused on protein minibinders so far, with other modality validation still pending.
Technically, ODesign is interesting because it builds on an AlphaFold3-like structure-prediction backbone. It uses unified generative tokens for different chemical modalities, then performs conditional all-atom diffusion to generate coordinates. After that, an inverse-folding/type-design module assigns amino acids, nucleotides, or ligand atom types depending on the modality. The clever part is the masking system. ODesign can mask at different levels:
That lets it handle tasks like binder design, motif scaffolding, ligand-binding protein design, aptamer-like design, and ligand generation in one framework.
Compared with other models:
RFdiffusion3 is probably the closest “serious” competitor from the protein-design side. It is all-atom and can design proteins in the context of ligands, DNA/RNA, and other molecules, but it is still mostly about generating proteins, not freely switching between protein, nucleic acid, and ligand outputs.
I think BoltzGen feels closer to a practical wet-lab binder design tool today.
ODesign feels like the broader future direction: a unified model for programmable molecular interaction design across modalities.
The big question is whether ODesign’s cross-modality promise will translate experimentally beyond protein minibinders. If it can actually produce validated RNA/DNA binders, ligand designs, and non-protein interaction partners, that would be a major step beyond current protein-centric design workflows.
Curious what people think: are these “world models” actually becoming useful design engines, or are we still mostly benchmarking pretty structures until the wet-lab hit rates catch up?
Paper: PXDesign: Fast, Modular, and Accurate De Novo Design of Protein Binders
Repo: PXDesign
I know some people may be skeptical because PXDesign comes from ByteDance Seed, but for general de novo protein binder design, this is honestly one of the more impressive and practical papers I’ve seen while researching the field.
To be clear, I’m not talking about antibody-specific design or niche antibody engineering tasks. I mean general protein binder generation against protein targets like the red protein in the image above.
The main claim is strong: PXDesign reports 20–73% nanomolar binder hit rates across five of six tested targets, with wet-lab validation on IL-7RA, SARS-CoV-2 RBD, PD-L1, TrkA, VEGF-A, and TNF-α. It combines two parts:
The diffusion model seems to be the real workhorse. It is fast, generates structurally diverse binders, and appears better suited for large-scale exploratory campaigns than slower hallucination methods. They also put a lot of effort into filtering and ranking, comparing AF2-style filters with Protenix-based filters, and showing that Protenix often improves enrichment and ranking.
What I like most is that this is not just another “we generated nice-looking structures” paper. They actually test designs experimentally, report hit rates, compare against methods like AlphaProteo, RFdiffusion, Chai, and Latent-X, and release a benchmarking framework.
The important caveat is that this is not peer-reviewed. Also, TNF-α failed, and the authors are pretty open about limitations in filtering thresholds, dataset sparsity, and experimental throughput.
But overall, for de novo protein binder design, PXDesign looks strong. I would not treat it as a universal solution, and I would not use it as an antibody design tool, but for general binder generation it seems very reliable and worth paying attention to.
Paper: Origin-1: a generative AI platform for de novo antibody design against novel epitopes
Repo: Origin-1
The main idea is that Origin-1 designs antibodies against zero-prior epitopes, meaning target sites where there are no prior antibody-antigen complex structures or close structural templates available.
From the paper, Origin-1 is presented as a two-stage design-and-score system. The design side, AbsciGen, generates antibody-antigen complexes and designs paired heavy/light chain CDR sequences against a specified epitope. The scoring side, AbsciBind, then filters candidates using co-folding-based scoring and developability criteria before anything is tested experimentally.
In other words, the platform is not just generating antibody sequences. It is trying to generate an epitope-specific binding pose, design the antibody CDRs around that pose, and then rank/filter the designs before wet-lab validation.
In the paper, they tested the system across 10 human protein targets and report validated antibodies for 4 targets: COL6A3, AZGP1, CHI3L2, and IL36RA.
They also report structural validation for two of the designs. For COL6A3 and AZGP1, cryo-EM structures matched the designed binding modes at 3.0-3.1 Å resolution, with reported DockQ scores of 0.73-0.83.
For IL36RA, they went further and used AI-guided affinity maturation to improve the binder into a functional antagonist, reporting 104 nM potency.
The repo includes supporting study data, including in silico and in vitro results, SPR data, and generated computational models. But the model itself is not being released. This is an open data/results release around a proprietary antibody design platform, not an open-source model release with weights and inference code.
Repo: AFSample2
Paper: Improving AlphaFold2 Performance in Virtual Screens Targeting GPCRs by Enhancing Binding-Site Conformational Sampling
The AFSample2T paper is finally out, and it is a cool example of where AI protein modelling is heading.
Vanilla AlphaFold2 is great at predicting a likely protein structure, but it usually collapses toward one dominant conformation. For GPCR drug discovery, that can be limiting because the binding pocket is flexible, and small local changes can strongly affect docking and virtual screening results.
AFSample2T tackles this by using targeted MSA masking around the binding site. Instead of perturbing the whole protein, it selectively weakens the evolutionary signal near the pocket, pushing AF2 to sample more ligand-compatible GPCR conformations.
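Here is a minimal sketch of what targeted MSA masking could look like, assuming the MSA is a list of equal-length aligned strings with the query in the first row, and that "masking" means replacing characters in chosen columns with `X` while leaving the query untouched. The function name, the `mask_fraction` parameter, and these conventions are assumptions for illustration, not the AFSample2T implementation.

```python
import random

def mask_msa_columns(msa, pocket_columns, mask_fraction=1.0, seed=0, mask_char="X"):
    """Mask a fraction of binding-site columns in every non-query MSA row.

    msa: list of aligned sequences; msa[0] is the query and is kept intact.
    pocket_columns: 0-based column indices near the binding site.
    """
    rng = random.Random(seed)
    cols = sorted(pocket_columns)
    n_mask = round(mask_fraction * len(cols))
    chosen = set(rng.sample(cols, n_mask))  # columns to hide in this sample
    query, *rest = msa
    masked = [
        "".join(mask_char if k in chosen else ch for k, ch in enumerate(row))
        for row in rest
    ]
    return [query] + masked

# Toy alignment; columns 1 and 2 stand in for pocket residues.
toy_msa = ["MKTLV", "MRTLI", "MKSLV"]
print(mask_msa_columns(toy_msa, {1, 2}))  # ['MKTLV', 'MXXLI', 'MXXLV']
```

Calling this with different seeds and a `mask_fraction` below 1.0 would give a family of perturbed alignments, each of which can be fed to the structure predictor to get a different pocket conformation.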
So the difference is basically:
Vanilla AF2: predicts one likely structure.
AFSample2T: samples alternative binding-site conformations for better virtual screening.
That matters because drug discovery needs more than static structures. It needs useful receptor states.
Blog: OpenBind’s first release: A structure–affinity dataset for structure-based AI
Dataset: OpenBind
The release includes 925 crystallographic binding events from 699 compounds, with affinity measurements for 601 compounds, focused on EV-A71 / CVA16 2A protease.
What makes this interesting is that it is not just another protein-ligand benchmark scraped from public structures. It is a dense experimental campaign where structures and binding measurements are linked across a single target system.
That feels pretty valuable for AI protein-ligand modelling because a lot of current methods still struggle with real structure-based design problems: receptor state choice, cross-docking, affinity trends, and whether models actually understand local SAR instead of just memorising near-neighbour structures.
They are also teasing OpenBind-1, a predictive model trained using the UK’s Isambard-AI compute cluster. The whole project is open science/open access, which makes it much more useful for benchmarking, fine-tuning, and community testing.
Repo: Pro1
I feel like Pro-1 is pretty underrated, especially because it takes a different approach from a lot of protein design models.
Most tools in this space are either sequence-only, structure-prediction-first, or diffusion-based. Pro-1 is interesting because it uses an LLM-style reasoning loop with a physics-based reward signal, mainly around Rosetta stability scoring.
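The general shape of a propose-then-score loop can be sketched as below. Everything here is a stand-in: `propose_mutation` is a random point mutation where the real system would use the language model's reasoning, and `toy_score` is a meaningless placeholder where a Rosetta-style energy would go. It is not Pro-1's code.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose_mutation(seq, rng):
    """Stand-in for the model's proposal step: one random point mutation."""
    pos = rng.randrange(len(seq))
    new_aa = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return seq[:pos] + new_aa + seq[pos + 1:]

def toy_score(seq):
    """Placeholder 'energy' (lower is better). NOT a physical score."""
    return float(seq.count("G"))

def optimize(seq, score_fn, n_rounds=50, seed=0):
    """Greedy loop: keep a proposed mutant only if its score improves."""
    rng = random.Random(seed)
    best, best_score = seq, score_fn(seq)
    history = [best_score]
    for _ in range(n_rounds):
        candidate = propose_mutation(best, rng)
        score = score_fn(candidate)
        if score < best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history

start = "MGGKGLVGA"
best, best_score, history = optimize(start, toy_score)
print(start, "->", best, best_score)
```

The expensive part in a real run is the scoring call, since each candidate needs structure prediction or relaxation before it can be scored.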
We have been testing it for antibody design, and the useful part is that it can reason over the sequence, structural context, prior mutations, and design goals before suggesting changes. That makes it feel less like a black-box generator and more like a guided protein engineering assistant.
The downside is that it is computationally heavy, especially if you want to run the full loop properly with structure prediction/scoring. It is not a lightweight “generate 100 sequences instantly” kind of model.
Still, I think the direction is underrated: using language models to reason through protein engineering decisions, then grounding them with physics-based scoring.
Curious if anyone else has tested it, especially for antibodies or enzyme stability.