Written by Cameron Ferris, PhD, Co-Founder and COO, Inventia Life Science
Drug discovery depends on models. They determine what we test, how we measure outcomes, and how much confidence we place in early signals. Over time, those early modeling choices shape which programs progress and which ones stop.
I have seen programs move forward with a level of confidence that, in hindsight, was not well supported by human-relevant evidence. That is not a reflection on the teams involved. Most teams are thoughtful and rigorous. The issue is structural. Our default systems often reward forward momentum before we have strong confirmation that the underlying biology will translate.
New Approach Methodologies, or NAMs, include human-relevant in vitro and in silico systems such as advanced 3D models, organ-on-chip platforms, and computational approaches. Their value is not that they are new. Their value is that they can introduce human-relevant evidence earlier in the discovery process, when decisions still have leverage.
In The Scientist Research Roundtable, Rethinking the Role of Animals in Molecular Biology, I joined Joseph Wu (Stanford), Donald Ingber (Harvard Medical School and Boston Children’s Hospital), Shannon Mumenthaler (Ellison Medical Institute), and Xiling Shen (MD Anderson Cancer Center) to discuss the limitations of animal models and what it will take for NAMs to produce reliable, actionable evidence.
My key takeaway was this: the main challenge is not whether we can build sophisticated human models; it is whether we can generate decision-grade data early enough to meaningfully influence which programs move forward.
Why Animal Models Can Create Late Confidence
Drug discovery is a sequence of decisions made under uncertainty. That uncertainty is unavoidable.
Difficulties arise when uncertainty persists longer than we acknowledge. A target is selected. Screening begins. Chemistry advances. Over time, investment accumulates. By the time more definitive evidence is required, the conversation is often shaped by sunk cost as much as by biology.
In many organizations, animal studies function as a key governance checkpoint. They sit at a point in the pipeline where significant resources have already been committed. That positioning shapes how their results are interpreted.
Several points from the roundtable discussion highlighted these limitations.
Joseph Wu pointed out that what appears to be statistical power in animal studies may still represent a single genetic background. As he noted, “Even if we were to do a thousand mice with and without the drug, in reality, it’s N equals one (N=1) because you’re testing one single inbred strain.” That can be appropriate for certain mechanistic questions, but it does not reflect the diversity a therapy will face in patients.
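One way to make that concrete (my framing, not a formula from the discussion) is a simple random-effects sketch: each inbred strain contributes a strain-level deviation from the population-average drug effect, and that deviation is sampled exactly once no matter how many mice are used.

```latex
% Sketch: y_i = delta + b + e_i for mouse i of one strain, where
% b ~ N(0, sigma_B^2) is the strain effect (drawn once per strain)
% and e_i ~ N(0, sigma_W^2) is within-strain noise (drawn per mouse).
\[
\mathrm{Var}(\hat{\delta}) \;=\; \sigma_B^2 + \frac{\sigma_W^2}{n} \;\ge\; \sigma_B^2
\]
```

Adding mice shrinks only the within-strain term; the between-strain term, which is what translation to a genetically diverse population depends on, never shrinks when every animal comes from one strain.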
Xiling Shen highlighted throughput as a practical constraint. In discovery, testing large numbers of compounds, doses, and biological contexts in animals quickly becomes impractical. As he put it, once you try to capture diversity, “that’s going to be an astronomical amount.” In practice, this often leads to narrower studies than we would ideally design if throughput and cost were not limiting.
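The arithmetic behind that constraint is easy to sketch. The figures below are illustrative assumptions, not numbers from the roundtable:

```python
# Back-of-envelope study size if diversity were captured in vivo.
# Every input is an assumed, illustrative value.
compounds = 1_000      # compounds in an early discovery campaign
doses = 5              # doses per compound
backgrounds = 10       # genetic backgrounds to approximate patient diversity
replicates = 3         # animals per condition for minimal statistics

animals = compounds * doses * backgrounds * replicates
print(f"Animals required: {animals:,}")  # -> Animals required: 150,000
```

Even before cost and ethics enter the picture, a study of that size is operationally out of reach, which is why in vivo designs narrow long before they capture diversity.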
None of this means animal models lack value. It does mean we should be realistic about what kinds of decisions they are best suited to support.
There is strong interest in NAMs today. It is tempting to evaluate them based on how closely they resemble human tissue or how many cell types they include. In my view, that is not the most important question.
The more relevant question is whether the model supports decision-grade evidence for a defined context of use.
In practice, “decision-grade” means a few specific things: the data are reproducible from run to run, generated with appropriate controls and replication, and comparable across plates, timepoints, and teams, all within a defined context of use.
Many groups can build a complex model once. Fewer can run it routinely in a way that produces comparable data week after week. That operational reliability is what determines whether a model changes decision-making.
Shannon Mumenthaler made an important point about the diversity of NAMs. As she said, “These new approach methodologies really span the range of technologies that differ across throughput and biological fidelity.” That diversity is useful, but it requires discipline in selecting the right system for the question being asked.
For most discovery teams, the goal is not maximum complexity but sufficient complexity. The model should express the relevant phenotype while remaining scalable and reproducible.
Higher-fidelity systems may be appropriate for later-stage questions where the decision is narrow and the cost of error is high. Early discovery, however, depends on iteration. Models need to be complex enough to capture meaningful biology while still supporting replication and comparison.
If a system cannot be run with appropriate controls and replication, it will not generate durable confidence.
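It is worth being concrete about what appropriate controls and replication look like operationally. Below is a minimal sketch of plate-level QC, assuming a readout with positive and negative control wells; the Z′-factor and its conventional 0.5 threshold come from standard screening practice (Zhang et al., 1999), not from any specific NAM platform:

```python
import statistics

def z_prime(pos: list, neg: list) -> float:
    """Z'-factor, a standard plate-quality metric:
    Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|.
    Plates above ~0.5 are conventionally considered assay-quality."""
    sd_p, sd_n = statistics.stdev(pos), statistics.stdev(neg)
    mu_p, mu_n = statistics.mean(pos), statistics.mean(neg)
    return 1 - 3 * (sd_p + sd_n) / abs(mu_p - mu_n)

# Hypothetical control readouts from one plate (arbitrary units).
positive_controls = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92]
negative_controls = [0.11, 0.09, 0.13, 0.10, 0.12, 0.08]

score = z_prime(positive_controls, negative_controls)
print(f"Z' = {score:.2f}")  # a plate below 0.5 is rerun, not rescued
```

Running a check like this on every plate, every week, is the unglamorous work that separates a model that impresses in a publication from one that changes a portfolio decision.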
Donald Ingber clarified that the value of advanced systems is not simply that they resemble human organs more closely. As he noted, “You can control every parameter individually and get insight into mechanisms that you can’t do in vivo, in animals or humans.”
That experimental control can improve interpretability. Instead of observing an outcome in a complex organism, you can isolate specific variables and test causal relationships more directly.
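As a sketch of what that control can look like in practice (the parameters here are hypothetical, not a description of any specific organ-on-chip system), an engineered model lets you lay out a full-factorial design in which each variable is perturbed independently:

```python
from itertools import product

# Hypothetical, independently controllable parameters of an engineered model.
flow_rates = [0.5, 1.0, 2.0]        # media flow, uL/min
stiffnesses = ["soft", "stiff"]     # matrix stiffness condition
cocultures = [False, True]          # stromal co-culture present?

# A full-factorial grid isolates each variable's contribution:
# every level of each factor appears with every level of the others.
for flow, stiffness, coculture in product(flow_rates, stiffnesses, cocultures):
    condition = {"flow": flow, "stiffness": stiffness, "coculture": coculture}
    print("queue run:", condition)  # seed the model, run the assay, record readout
```

In vivo, those factors cannot be varied independently of one another; in an engineered system, the design matrix itself becomes the experiment.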
As programs move from ranking compounds to understanding mechanism or anticipating liabilities, this level of control becomes more important. The key is to apply complexity where it strengthens a decision, rather than assuming that more complexity is inherently better.
Resistance to NAMs is often described as philosophical. In practice, the barriers are usually operational.
Moving from 2D to 3D systems changes workflows, assay design, and data analysis. That retooling requires time and coordination. If downstream assays are not aligned, even a well-designed model will struggle to gain traction.
Xiling Shen also pointed to incentives. Established animal workflows are familiar and widely accepted. When a NAM produces results that differ from an animal study, acting on that information can feel risky within an organization. That dynamic is cultural as much as scientific.
For NAMs to move upstream, they must be more than scientifically compelling. They need standardized workflows, shared benchmarks, and clear context of use. Teams should not have to validate the entire system themselves before relying on it.
From my perspective at Inventia, the priority is turning advanced 3D biology into decision infrastructure. With RASTRUM, we aim to make 3D model generation reproducible at scale, so scientists can produce decision-grade data: replicated, benchmarkable, and comparable across plates, timepoints, and teams. That allows human-relevant models to move upstream in discovery, not because they’re more complex, but because they guide real prioritization decisions.
The broader question for the field is not whether a NAM can replace a specific animal study at the end of a pipeline. A more meaningful measure is how early reliable human-relevant evidence can be introduced into decision-making.
When these systems are used upstream, they make it easier to stop weaker programs earlier and to support stronger ones with greater confidence. That shift is less about technology preference and more about improving how we govern scientific risk.
If we can introduce better human evidence earlier, we improve the overall quality of decisions in biomedical R&D. That, in my view, is the opportunity.