Top-down biology

Laboratories are slow and expensive. A single iteration takes weeks and costs thousands of dollars. Data is poor. Noise is abundant. Signal is weak.

We desire some high-level outcome — a new phenotype, a treatment for some disease, a higher yield — and we push downwards into the exploding complexity of the underlying mechanisms to find a path.

This is top-down biology. But what if we instead did biology from the bottom up?

In silico biology

We now enter the world of in silico experiments — simulating biochemistry in software. But biochemistry exists in the layers between particle physics and medicine. It’s the messy middle where we aren’t quite sure of anything.

We have approximations and heuristics, but we’re mired in complexity and uncertainty. Without a clear model, simulating biochemistry is really hard.

With enough data, we can train models (e.g. AlphaFold), but we don’t usually have enough data. And the data we do have is heavily biased, noisy, and difficult to represent.

So we go lower.

Simulating biology from the bottom up

Let’s simulate life from the underlying physics. Laws of motion, thermodynamics, particle interactions. This is biology from the bottom up.

We have reduced the complexity by expanding the scale: trillions of tractable computations replacing uncertain, abstracted models. We follow every particle through every time step and track the state of the overall system.
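
To make "every particle, every time step" concrete, here is a minimal sketch of the loop at the heart of all-atom molecular dynamics: a velocity Verlet integrator over a Lennard-Jones pair potential. Everything in it (the particle count, the parameters, the reduced units) is an illustrative placeholder, not a calibrated model of any real system.

```python
import numpy as np

# Toy all-atom dynamics: velocity Verlet over a Lennard-Jones pair
# potential, in reduced units. Particle count, parameters, and box size
# are illustrative placeholders, not a calibrated biochemical model.
N, DT = 64, 1e-3                     # particles, time step
EPS, SIG = 1.0, 1.0                  # LJ well depth and radius

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, (N, 3))   # initial positions
vel = np.zeros((N, 3))                 # initial velocities

def forces(pos):
    """Pairwise LJ forces -- the O(N^2) inner loop whose cost explodes."""
    disp = pos[:, None, :] - pos[None, :, :]       # all pair displacements
    r2 = (disp ** 2).sum(-1) + np.eye(N)           # pad diagonal, no 0-division
    inv6 = (SIG ** 2 / r2) ** 3                    # (sigma/r)^6
    mag = 24.0 * EPS * (2.0 * inv6 ** 2 - inv6) / r2   # |F|/r per pair
    np.fill_diagonal(mag, 0.0)                     # no self-interaction
    return (mag[..., None] * disp).sum(axis=1)     # net force on each particle

f = forces(pos)
for _ in range(1_000):               # a second of cell time needs ~1e15 steps
    vel += 0.5 * DT * f              # velocity Verlet: half kick...
    pos += DT * vel                  # ...drift...
    f = forces(pos)
    vel += 0.5 * DT * f              # ...half kick with the new forces
```

The pairwise force evaluation inside the loop is the part that explodes: it scales with the square of the particle count, and it runs once per femtosecond-scale time step.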

But we’re limited by compute.

Even with the power of modern accelerated hardware, the simulation demands orders of magnitude more floating-point operations than we can perform in a reasonable time.
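
To put rough numbers on that gap, here is a back-of-envelope estimate. Every input is an order-of-magnitude assumption: roughly 10^11 atoms in a bacterium-sized cell (mostly water), femtosecond time steps, and a few hundred FLOPs per atom per step with neighbour lists.

```python
# Back-of-envelope FLOPs for all-atom MD of one small cell.
# Every input below is an order-of-magnitude assumption, not a measurement.
atoms = 1e11                 # ~1 femtolitre cell volume, mostly water
dt = 1e-15                   # all-atom MD needs femtosecond time steps
sim_seconds = 1.0            # one second of cell time
flops_per_atom_step = 500    # short-range forces with neighbour lists

steps = sim_seconds / dt                       # 1e15 steps
total = atoms * steps * flops_per_atom_step    # ~5e28 FLOPs

exaflop = 1e18                                 # FLOP/s, roughly frontier HPC
years = total / exaflop / 3.15e7
print(f"~{total:.0e} FLOPs -> ~{years:,.0f} years on an exaflop machine")
```

On these assumptions, a single simulated second of a single small cell sits three-plus orders of magnitude beyond an exaflop-year, before any sampling or repetition.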

But we’ve circumvented the uncertainty of dynamics and the imprecision of measurement with a gargantuan computing task. We’ve swapped base pairs for bits.

Now we can approximate and find structure. Then we can begin to optimise.

Transforming bioengineering into software engineering

The biggest breakthroughs in bioengineering and healthcare will come from exploiting the complex multi-level feedback loops of regulatory networks and electrochemical interactions.

In vitro experiments are simply too slow and expensive to explore that search space. This is partly why it takes hundreds of scientists, a decade or more, and billions of dollars to develop a new drug. And it’s why those drugs mostly suck.

We need drastically higher temporal and spatial precision to actually find signal in those systems. We probably need better measurement hardware for that.

But then we can use that high-precision data to model the cell itself.

Once we have somewhat reliable models of cellular biology, we can create a data flywheel between the in silico and the in vitro, decreasing the simulation error over time.
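
As a cartoon of that flywheel, here is a hypothetical calibration loop: run the simulator, measure the mismatch against fresh wet-lab data, nudge the simulator's parameters, repeat. The model, the assay, and the finite-difference update below are all stand-ins for whatever a real pipeline would use.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_PARAMS = np.array([2.0, -1.0])        # hidden ground truth ("the cell")

def run_simulation(params, x):
    """Stand-in for the in silico model: parameters -> predicted readout."""
    return params[0] * x + params[1] * x ** 2

def run_wet_lab_assay(x):
    """Stand-in for the in vitro experiment: noisy readout of the truth."""
    return run_simulation(TRUE_PARAMS, x) + rng.normal(0.0, 0.1, x.shape)

def flywheel(params, rounds=200, lr=0.1, h=1e-4):
    x = np.linspace(0.0, 1.0, 50)          # experimental conditions
    for _ in range(rounds):
        measured = run_wet_lab_assay(x)    # in reality: slow, expensive, sparse
        def error(p):
            return np.mean((run_simulation(p, x) - measured) ** 2)
        # Nudge parameters down the error surface (finite differences).
        grad = np.array([(error(params + h * e) - error(params)) / h
                         for e in np.eye(len(params))])
        params = params - lr * grad        # simulation error shrinks per round
    return params

print(flywheel(np.zeros(2)))               # drifts toward TRUE_PARAMS
```

The point isn't the fitting method; it's the loop structure: each round of wet-lab data shrinks the simulation error, and a better simulator makes the next round of experiments cheaper to choose.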

The computability of a cell

If there’s enough information in particle physics + DNA to produce all the cellular behaviour we observe, then the limit is really computation. As far as I can find, everyone considers simulating a cell from physics an impossible problem. That’s a strong signal that we’re missing something.

What bounds do the laws of physics (and Chaos Theory and Information Theory) place on the computability of cellular biology? What is an upper bound on the FLOPs needed to simulate the molecular dynamics of biological cells?
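
As a starting point, and under the same order-of-magnitude assumptions as the estimate above, the brute-force cost factorises into three terms, and chaos theory adds a separate, information-theoretic constraint:

```latex
% Brute-force all-atom cost: atoms x time steps x cost per atom-step.
F \;\approx\; N_{\text{atoms}} \cdot \frac{T}{\Delta t} \cdot c
  \;\approx\; 10^{11} \cdot \frac{1\ \text{s}}{10^{-15}\ \text{s}} \cdot 10^{2\text{--}3}
  \;\approx\; 10^{28\text{--}29}\ \text{FLOPs}

% Chaos bounds exact replay: with maximal Lyapunov exponent \lambda, an
% initial error grows as e^{\lambda t}, so reproducing one trajectory out
% to time T demands roughly
b(T) \;\approx\; \frac{\lambda T}{\ln 2}\ \text{extra bits of precision}
```

Which suggests the honest target is statistical fidelity (matching the distribution of cellular behaviour) rather than replaying any one trajectory exactly.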