Two decades ago, the concept of engineering designer proteins seemed like a distant aspiration.
Today, thanks to artificial intelligence (AI), custom proteins are commonplace. These made-to-order proteins often have unique shapes or components that give them abilities not found in nature. From longer-lasting drugs and protein-based vaccines to greener biofuels and plastic-degrading proteins, the field is fast becoming a transformative technology.
The design of custom proteins relies on deep learning techniques. With large language models (the kind of AI behind OpenAI’s ChatGPT) now generating millions of structures beyond human imagination, the library of bioactive designer proteins is set to expand rapidly.
“It’s incredibly empowering,” Dr. Neil King of the University of Washington told Nature. “Tasks that seemed insurmountable just a year and a half ago are now achievable.”
However, this newfound power brings forth substantial responsibility. As newly crafted proteins gain momentum for applications in medicine and bioengineering, scientists are contemplating the potential consequences if these technologies are misused.
An essay published in Science underscores the necessity of biosecurity measures for designer proteins. Analogous to ongoing dialogues concerning AI safety, the authors emphasize the urgency of addressing biosecurity risks and establishing policies to prevent rogue deployment of custom proteins.
The essay was co-authored by two experts in the field. Dr. David Baker, head of the Institute for Protein Design at the University of Washington, led the development of RoseTTAFold, an algorithm that helped crack the longstanding problem of predicting protein structures from amino acid sequences alone. Dr. George Church of Harvard Medical School is a pioneer in genetic engineering and synthetic biology.
They propose the integration of unique barcodes into the genetic sequences of synthetic proteins. In the event that any designer protein poses a threat—potentially instigating a hazardous outbreak—its barcode would facilitate tracing it back to its origin.
Essentially, this system offers a form of “audit trail,” as articulated by the duo.
Convergence of Realities
Designer proteins are intricately intertwined with AI, as are the prospective biosecurity protocols.
Two decades ago, Baker’s laboratory used software to design and build a protein known as Top7. Proteins are made of building blocks called amino acids, each encoded in our DNA. Like beads on a string, amino acids fold into specific 3D shapes, often forming elaborate architectures that support the protein’s function.
Although Top7 couldn’t interact with natural cellular components, lacking any biological impact, the team concluded that designing novel proteins enables exploration of “vast regions of the protein universe yet unexplored in nature.”
Enter AI. Recent advances have made designing new proteins far faster than conventional laboratory methods allow.
One approach involves structure-based AI, akin to image-generating tools like DALL-E. These AI systems are trained on data with noise added and learn to strip that noise away to reveal realistic protein structures. Known as diffusion models, they gradually converge on protein structures compatible with biology.
Another strategy relies on large language models. Like ChatGPT, these algorithms rapidly find connections between protein “words” and distill them into a kind of biological grammar. The protein sequences these models generate are likely to fold into structures the body can interpret. One example is ProtGPT2, which can design functional proteins with shapes that could lead to new properties.
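To make the idea of a biological "grammar" concrete, here is a deliberately tiny sketch. It is not how ProtGPT2 or any large language model works internally: it is a simple bigram (Markov) model that only counts which amino acid tends to follow which, then samples new sequences from those counts. The training sequences and function names are hypothetical, chosen purely for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy training set (one-letter amino acid codes).
# Real protein language models learn from millions of natural sequences.
TRAINING_SEQUENCES = ["MKTAYIAKQR", "MKTLLILAVV", "MKKLLPTAAA"]


def train_bigram_model(sequences):
    """Count how often each amino acid is followed by each other one."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def generate(counts, start="M", length=10, seed=0):
    """Sample a sequence one residue at a time from the learned counts."""
    rng = random.Random(seed)
    sequence = start
    while len(sequence) < length:
        options = counts.get(sequence[-1])
        if not options:  # residue never seen with a successor: stop early
            break
        residues = list(options)
        weights = [options[r] for r in residues]
        sequence += rng.choices(residues, weights=weights)[0]
    return sequence


model = train_bigram_model(TRAINING_SEQUENCES)
print(generate(model))
```

Every adjacent pair of residues in the output was seen somewhere in the training data, which is the crudest possible version of "learning the grammar." Large language models capture far longer-range patterns, but the generate-one-token-at-a-time loop is the same basic idea.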
Transitioning from Virtual to Physical Realms
These AI-driven protein design programs have raised concerns. Proteins serve as the fundamental components of life—alterations could significantly influence cellular responses to drugs, viruses, or other pathogens.
Last year, governments worldwide unveiled plans to oversee AI safety. The technology wasn’t portrayed as a direct threat; instead, policymakers cautiously formulated regulations to ensure research compliance with privacy laws while bolstering the economy, public health, and national defense. Spearheading this initiative, the European Union ratified the AI Act to regulate the technology in specific domains.
Synthetic proteins weren’t explicitly addressed in these regulations. This omission bodes well for advancing designer proteins, which could be impeded by overly restrictive mandates, as noted by Baker and Church. Nevertheless, new AI-related legislation is in progress, with the United Nations’ AI advisory body slated to release international regulatory guidelines later this year.
Because the AI systems used to make designer proteins are highly specialized, they may still fly under the regulatory radar, provided the field unites in a concerted effort to self-regulate.
At the 2023 AI Safety Summit, which touched on AI-driven protein design, experts agreed that documenting the underlying DNA of each new protein is essential. Like their natural counterparts, designer proteins are built from genetic code. Logging all synthetic DNA sequences in a database could make it easier to spot red flags, such as proteins that resemble known pathogenic structures.
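As a rough illustration of what screening against such a database might look like, the sketch below compares a new DNA sequence with a list of flagged sequences using shared short substrings (k-mers). This is an assumption made for illustration, not the method described at the summit or by the authors. Real screening would rely on proper alignment tools and curated databases, and the registry entry, sequences, and threshold here are all hypothetical.

```python
def kmers(sequence, k=3):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two sequences' k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


# Hypothetical registry of sequences of concern (placeholder data only).
SEQUENCES_OF_CONCERN = {
    "flagged_example_A": "ATGGCTAGCTTGACCGGA",
}


def screen(candidate, registry, threshold=0.5):
    """Return (name, score) pairs for registry entries resembling candidate."""
    hits = []
    for name, reference in registry.items():
        score = jaccard(candidate, reference)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(screen("ATGGCTAGCTTGACCGGA", SEQUENCES_OF_CONCERN))
print(screen("CCCCCCCCCCCC", SEQUENCES_OF_CONCERN))
```

The first call flags the candidate because it is identical to a registry entry; the second shares no k-mers with it and returns an empty list. The threshold is a tunable guess, and a set-overlap score like this ignores where in the sequence the matches occur.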
Biosecurity measures should not hinder data sharing. Collaboration is indispensable for scientific progress, yet protecting trade secrets remains important. And as with AI, some designer proteins may be useful but too dangerous to share openly.
One viable approach to circumvent this dilemma involves integrating safety protocols into the synthesis process itself. For instance, the authors propose incorporating a barcode—comprising random DNA sequences—into each novel genetic code. During protein assembly, a synthesis apparatus scans the DNA sequence, commencing construction solely upon detecting the specific code.
In essence, the original protein designers retain control over sharing the synthesis process—with whom and to what extent—while still elucidating their findings in publications.
A barcode system that ties each new protein to a specific synthesis machine would tighten security, deterring malicious actors and making potentially harmful products harder to recreate.
“If a novel biological threat emerges globally, the associated DNA sequences could be traced back to their sources,” the authors emphasized.
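The barcode idea can be sketched in a few lines of code. What follows is a minimal illustration under stated assumptions, not the authors' actual scheme: the barcode length, its placement at the front of the sequence, and the `BarcodeRegistry` and `Synthesizer` classes are all hypothetical names invented here.

```python
import secrets

BASES = "ACGT"


def make_barcode(length=20):
    """Generate a random DNA barcode (length is an assumed parameter)."""
    return "".join(secrets.choice(BASES) for _ in range(length))


class BarcodeRegistry:
    """Hypothetical record of which designer was issued which barcode."""

    def __init__(self):
        self._owners = {}

    def register(self, designer, length=20):
        barcode = make_barcode(length)
        self._owners[barcode] = designer
        return barcode

    def trace(self, dna):
        """Return the designer whose barcode appears in dna, or None."""
        for barcode, designer in self._owners.items():
            if barcode in dna:
                return designer
        return None


class Synthesizer:
    """Hypothetical machine that only builds sequences it recognizes."""

    def __init__(self, authorized_barcodes):
        self.authorized = set(authorized_barcodes)

    def synthesize(self, dna):
        if not any(barcode in dna for barcode in self.authorized):
            raise PermissionError("no authorized barcode found; refusing to synthesize")
        return f"synthesized {len(dna)} bases"


registry = BarcodeRegistry()
barcode = registry.register("Lab A")
design = barcode + "ATGGCTAGCTTGACCGGA"  # barcode prepended to a placeholder gene

print(Synthesizer([barcode]).synthesize(design))  # authorized machine proceeds
print(registry.trace(design))                     # audit trail points to "Lab A"
```

A machine that was never given the barcode raises `PermissionError` on the same design, which is the control over sharing described above. A real system would also need to make barcodes hard to strip out or forge, which this sketch does not attempt.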
The journey ahead will be arduous. Ensuring the safety of designer proteins hinges on widespread support from scientists, research institutions, and governments, the authors contend. Nonetheless, past achievements offer encouragement. International bodies have established safety and collaboration guidelines in contentious domains such as stem cell research, genetic engineering, brain implants, and AI. While adherence isn’t universal—illustrated by the infamous CRISPR babies—these global directives have largely propelled cutting-edge research in a secure and equitable manner.
According to Baker and Church, fostering transparent discussions on biosecurity will not impede progress in the field. On the contrary, it can galvanize diverse sectors and foster public discourse to propel the advancement of custom protein design.