01 October 2025

“The data never moves. All training happens inside each company’s own IT environment, under their control.”

Robin Röhm is the CEO and Co-founder of Apheris, a company delivering enterprise-grade AI applications for drug discovery. These applications run directly within pharma companies’ IT systems, ensuring sensitive data never leaves their control. At the core is Apheris’ federated computing platform, which enables AI models to be trained across proprietary datasets from multiple organizations – without the data ever being shared. This makes collaborations possible on models and benchmarks more powerful than anything a single company could achieve alone.

One of the most prominent examples is the AI Structural Biology (AISB) Network, a collaboration of leading pharmaceutical companies powered by Apheris’ platform. Within this network, the Federated OpenFold3 Initiative improves OpenFold3, an open-source structure prediction system developed by the AlQuraishi Lab at Columbia University. By training it on one of the most diverse collections of protein–ligand data – information on how proteins interact with drug-like molecules – the initiative aims to make the model more accurate at predicting these interactions.

Earlier this autumn, Astex, Bristol Myers Squibb, and Takeda joined founding members AbbVie and Johnson & Johnson, creating one of the most ambitious cross-company AI efforts in life sciences. We spoke with Robin Röhm about the significance of this collaboration, the broader implications of federated learning, and Berlin’s evolving role in the global AI ecosystem.

Hello Robin, the AISB Network brings together global pharma companies to jointly train AI models on proprietary protein–ligand data. Until now, this kind of highly valuable information has always remained locked inside individual companies, making cross-company model training impossible. What made this collaboration feasible now, and why is federated learning the enabling factor?

Two things changed. First, federated computing matured. Instead of data moving across borders, the model goes to the data and returns only privacy-preserving updates. This ensures sensitive information never leaves a company’s environment, while advanced safeguards like attack testing and computational governance protect both data and model IP. Second, the AI field itself progressed. Algorithms are now powerful and widely accessible, but the public datasets available to train them fall short for industrial drug discovery. It became clear that no single pharma company has the data volume or diversity required to build highly accurate models. Federated learning solves both problems at once: it unlocks the collective strength of distributed, proprietary datasets while ensuring every participant retains full control of their assets.
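The "model goes to the data" pattern described above can be illustrated with a minimal federated averaging sketch. This is a toy example under stated assumptions, not the Apheris platform or its API: the function names, the one-dimensional linear model, and the three hypothetical sites with made-up data points are all illustrative. Each site computes an update on its own data and returns only that update; the coordinator never sees the data itself.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(weights: List[float], data: Dataset, lr: float) -> List[float]:
    """Runs inside one site: one gradient step of y ~ w*x + b on local data.

    Only the resulting weight delta is returned, never the data.
    """
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [-lr * grad_w, -lr * grad_b]


def federated_round(weights: List[float], sites: List[Dataset], lr: float) -> List[float]:
    """Coordinator: average the site updates, weighted by local dataset size."""
    updates = [local_update(weights, data, lr) for data in sites]
    sizes = [len(data) for data in sites]
    total = sum(sizes)
    averaged = [
        sum(update[i] * size for update, size in zip(updates, sizes)) / total
        for i in range(len(weights))
    ]
    return [w + delta for w, delta in zip(weights, averaged)]


# Hypothetical toy data: three sites, each holding points on the line y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (-1.0, -1.0), (0.5, 2.0)],
    [(1.5, 4.0)],
]

weights = [0.0, 0.0]  # shared model: [slope, intercept]
for _ in range(500):
    weights = federated_round(weights, sites, lr=0.05)

print(weights)  # approaches slope 2 and intercept 1
```

Weighting each update by local dataset size makes the averaged step equal to a gradient step on the pooled data, which is why the shared model recovers the common line even though no site ever reveals its points.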

One of the biggest breakthroughs in recent years was DeepMind’s AlphaFold, which used public data to predict the 3D structures of nearly all known proteins. Yet AlphaFold doesn’t capture how proteins interact with small molecules – a key step in drug discovery. How does the Federated OpenFold3 Initiative build on this milestone, and what new scientific ground does it aim to cover?

AlphaFold2 was a scientific breakthrough because it could predict how proteins fold into 3D shapes, a problem that had challenged researchers for decades. But in drug discovery, the real question is how those proteins interact with small molecules, antibodies, or antigens. AlphaFold3 began to address that by modelling protein–ligand interactions, but it was never released for open industrial use. OpenFold3 changed that by recreating the setup in an open format, giving the community access to this capability for the first time. We’re very fortunate to collaborate with the OpenFold Consortium and the AlQuraishi Lab at Columbia University, the developers behind OpenFold3. One key issue, however, remains: a lack of appropriate training data. The main training data source for OpenFold3 and similar models is the Protein Data Bank (PDB), a global archive of 3D protein structures. The PDB has ~200,000 entries, but only about 10% are protein–ligand complexes, most involving natural rather than drug-like molecules. If only drug-like molecules are selected, then only 2% of the whole PDB is relevant training data. The Federated OpenFold3 Initiative is really about fixing the data problem. We start with the open-source OpenFold3 model and then fine-tune it on proprietary datasets from five major pharma companies, all without the data ever leaving their systems. By doing this, we bring in about five times more drug-like protein–ligand structures than exist in the public Protein Data Bank. That gives us, for the first time, a credible path to making these models accurate enough for real drug discovery.
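To put the figures quoted above in absolute terms, here is a back-of-the-envelope calculation. The inputs are the approximate numbers from the interview; reading "about five times more" as a multiplier of five on the public drug-like count is an assumption, so the results are order-of-magnitude estimates rather than exact counts.

```python
# Approximate figures as quoted in the interview.
pdb_entries = 200_000          # "~200,000 entries"
ligand_percent = 10            # "about 10% are protein-ligand complexes"
drug_like_percent = 2          # "only 2% of the whole PDB"
federated_multiplier = 5       # assumption: "about five times more" read as 5x

ligand_complexes = pdb_entries * ligand_percent // 100
drug_like_public = pdb_entries * drug_like_percent // 100
federated_structures = drug_like_public * federated_multiplier

print(f"Protein-ligand complexes in the PDB: ~{ligand_complexes:,}")
print(f"Drug-like complexes in the PDB:      ~{drug_like_public:,}")
print(f"Structures added via federation:     ~{federated_structures:,}")
```

On these numbers, the public pool of drug-like complexes is on the order of 4,000 structures, and the federated datasets would add on the order of 20,000.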

The idea that sensitive data can be used collaboratively without ever leaving a company’s system sounds paradoxical to many. What technical safeguards and governance principles ensure that intellectual property remains protected in the AISB Network?

It does sound counterintuitive at first – but the key is that the data never moves. All training happens inside each company’s own IT environment, under their control. What gets shared back are only model updates, and even those are processed in a way that prevents anyone from reverse-engineering the original data. We also have a dedicated team that runs privacy attacks to identify potential risks like data reconstruction or reverse engineering before they could ever become an issue. Everything is logged, tested, and auditable. On top of that, we put as much work into governance as into the technology. Intellectual property rights are clearly defined from the start, and Apheris acts as the collaboration coordinator so companies don’t need to negotiate dozens of bilateral agreements. Taken together, it means pharma companies can contribute their most valuable datasets with confidence: they get stronger models without ever compromising ownership or compliance.
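The interview does not specify how model updates are processed before they are shared. One widely used generic technique, shown here purely as an illustration and not as a description of the AISB Network's actual safeguards, is to clip each update to a maximum norm and add random noise before it leaves the site. The function names and parameter values below are hypothetical.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize(
    update: List[float],
    max_norm: float = 1.0,
    noise_std: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Clip the update, then add Gaussian noise to every coordinate.

    Clipping bounds how much any one site's data can shift the model;
    the noise masks fine-grained detail in what is shared.
    """
    rng = rng or random.Random()
    clipped = clip_update(update, max_norm)
    return [u + rng.gauss(0.0, noise_std) for u in clipped]


raw_update = [3.0, 4.0]                      # hypothetical local update, norm 5
shared = privatize(raw_update, max_norm=1.0, noise_std=0.1, rng=random.Random(0))
print(shared)                                # noisy version of [0.6, 0.8]
```

Larger noise gives stronger protection but slows or degrades training, so the values of `max_norm` and `noise_std` are a trade-off that a real deployment would have to tune and audit.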

Apheris provides the technological backbone for this initiative. What has been the most challenging aspect of building a platform that some of the world’s largest pharma companies are willing to trust – and what does this say about Berlin’s ability to produce globally relevant deep-tech companies?

Earning trust from pharmaceutical companies meant going far beyond technical excellence. After all, data is among the industry’s most strategic and valuable assets. We had to prove that federated learning was not just scientifically valuable but also secure, compliant, and auditable. In the AISB Network, competitors work side by side, but they don’t need to trust each other. The technology and governance frameworks ensure that data and IP stay fully protected, with training happening locally and only privacy-preserving updates shared. Most importantly, we earned trust through years of working with demanding pharma customers, who taught us what it takes to meet InfoSec, Legal, and Compliance requirements. That combination of technology, compliance, and long-term partnership convinced global pharma to join the AISB Network. And AISB is only the beginning: we are now applying the same approach in the ADMET Network and in an Antibody Developability Consortium recently announced with Ginkgo Datapoints. While Apheris has talent across Europe, Berlin still plays an important role. It is where our company was founded, where our team meets in person, and where the broader ecosystem shows that the city can deliver not only creativity and talent but enterprise-grade tech that sets global standards.

Berlin is often described as a place where AI and life sciences increasingly converge. From your perspective, what makes the city attractive as a base for international initiatives like AISB – and what still needs to improve for Berlin to strengthen its position globally?

The AISB Network itself isn’t tied to a single location. It brings together five global pharmaceutical companies, and Apheris is remote-first. Still, Berlin plays an important role. The city attracts international talent, offers a strong academic base, and has a vibrant startup scene that fosters exchange between disciplines. For us, it’s also where our team gathers in person, which underlines the city’s energy and ambition. What Berlin still needs to improve is scaling conditions. Compared to the US, access to late-stage capital is more limited, regulatory hurdles are higher, and attracting specialized talent at speed can be challenging. If Berlin strengthens those areas, it can firmly establish itself as a global hub where AI and life sciences converge.

Beyond structural biology, where do you see federated data networks having the greatest impact next – whether in healthcare, public health, or even domains like mobility and urban intelligence?

Federated networks are most powerful when valuable data is distributed across many players but can’t be shared because of regulation or IP concerns. That makes them particularly relevant in highly regulated industries like healthcare, pharma, or finance. In practice, pharma and healthcare are furthest ahead. The reason is that success requires more than just the infrastructure. You need to understand the data, prepare it for collaborative use, design the right models, and make sure those models are usable afterwards. That’s why we deliberately focus on a few applications and solve them end-to-end. In drug discovery, we’re already expanding beyond structural biology into ADMET property prediction and, together with Ginkgo Datapoints, antibody developability. We also have customers like Roche building real-world evidence networks on top of our platform. Beyond healthcare, areas like public health or manufacturing could also benefit, though adoption there is still earlier in the journey.

Europe emphasizes trustworthy AI and digital sovereignty. How can Berlin-based initiatives like Apheris and the AISB Network help shape global standards for secure, collaborative AI?

As AI advances, data has become the most valuable asset. It’s the fuel that determines whether models improve. Protecting it is fundamental, and Europe has made trustworthy AI and digital sovereignty central to its strategy. That sets the bar very high for working with sensitive data. It can make access more complex, but when collaborations do happen under these conditions, they are secure and globally credible. With our drug discovery networks, we show that even competitors can work together on shared AI models without ever sharing their raw data. That kind of proof point helps set benchmarks for what secure, collaborative AI should look like worldwide. At the same time, Europe needs to strike a balance. Protection is vital, but if rules make it too hard for AI and tech companies to grow, innovation will move elsewhere. The opportunity is for Europe to show that it’s possible to do both.

Thanks for talking to us.