Are you applying Design-Build-Test-Learn cycles? Then you will want to hear about self-driving laboratories!

Willi Gottstein
8 min read · Sep 30, 2022


While I no longer remember the first paper I ever read, I certainly remember the one that made me laugh the most, which is also the one that had the biggest impact on me: “Next-Generation Experimentation with Self-Driving Laboratories”

Ever since I read this article, I have been fascinated by this topic — the acceleration of scientific discovery through autonomous experimentation platforms — the opportunities this approach creates and the scientific and technical challenges that come with it. Here, I collect a couple of thoughts on this subject based on discussions I have had over the past years.

Self-driving Labs as the perfect implementation of a Design-Build-Test-Learn-cycle

In engineering disciplines, so-called Design-Build-Test-Learn cycles (DBTL cycles) are rather popular. An iterative approach is required to engineer a system with desired properties because predictive models are usually lacking. Added to this is the fact that the design space is large and the interactions between individual components are (typically) nonlinear (see Kevin Dunn’s article on “Why product development is difficult”).

One typically starts with an educated guess about the best (in-silico) design, which can be based on prior data, expert knowledge and/or simulations (Design phase). This initial design then needs to be physically assembled/implemented (Build phase) and evaluated with respect to predefined Key Performance Indicators (KPIs; Test phase). The generated data can be used to link the inputs back to the obtained outputs (Learn phase), which then serves as the basis for new designs in the subsequent round of experiments. Running such a cycle for several rounds will then (hopefully) lead to a product with the desired properties, provided the problem is feasible given the available inputs.
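
To make the loop concrete, here is a minimal, dependency-free sketch of a DBTL cycle in Python. The response surface, the search space and the perturbation-based design step are all invented for illustration; in a real SDL the Build/Test step would be an actual robotics run and the Learn/Design step a proper model.

```python
import random

def design(history, search_space):
    """Design phase: propose the next candidate. Here a simple
    perturb-the-best heuristic stands in for a model-driven step."""
    if not history:
        return random.choice(search_space)
    best = max(history, key=lambda h: h[1])[0]
    # Explore designs close to the best one seen so far.
    neighbors = [x for x in search_space if abs(x - best) <= 1.0]
    return random.choice(neighbors)

def build_and_test(x):
    """Build + Test phases: stand-in for the robotics platform.
    The 'KPI' is a toy response surface with its optimum at x = 3."""
    return -(x - 3.0) ** 2

def dbtl(rounds=20, seed=0):
    random.seed(seed)
    search_space = [i * 0.5 for i in range(13)]  # designs 0.0 .. 6.0
    history = []  # Learn phase: the accumulated (design, KPI) data
    for _ in range(rounds):
        x = design(history, search_space)
        kpi = build_and_test(x)
        history.append((x, kpi))
    return max(history, key=lambda h: h[1])
```

The point of the sketch is the shape of the loop, not the heuristic: swapping `design` for an ML model and `build_and_test` for a robot turns it into the SDL described below.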

Self-driving labs (SDLs) follow the same concept and aim to automate this cycle entirely. The Build and Test phases are facilitated by a robotics platform, while the Learn and Design phases are carried out by the AI/ML framework. If one agrees that there is currently no better concept for solving difficult engineering tasks than the DBTL cycle, SDLs are the logical way to implement it!

Key considerations when setting up a self-driving lab (SDL)

SDLs can be applied to any task for which a clear objective function exists, where the experiments can be executed on a robotics platform, and where the KPIs can be extracted in an automated manner. While the required experimental setup is heavily problem-specific, a few items should be considered for any problem: targets, measurements, levers, (meta)data, design creation and workflow orchestration.

1. What is the objective function, what are you trying to achieve?

Typically, one has to deal with a multi-objective function, meaning that several KPIs have to be optimized simultaneously. It is crucial to get clear guidelines on the desired values for each KPI; otherwise it will be hard to define an objective function that can be used in the optimization process. For certain problems these values can also lie within a range, e.g., KPI1 must be larger than a certain value while KPI2 must fall within a specific interval. This requires good communication within the team, to make sure that the requirements of product developers are properly translated into quantifiable KPIs that allow the construction of a mathematically sound objective function.
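
As a hedged sketch, such requirements can be turned into a single scalar score. The KPI names (yield, purity, viscosity), the weights and the thresholds below are made up for illustration; the pattern is simply "weighted sum for quantities to maximize, hard cut-offs for must-hold constraints":

```python
def objective(kpis, targets):
    """Scalarize several KPIs into one score to maximize.
    `targets` maps a KPI name to a (kind, spec) pair:
      ("max", w)             -> maximize, weighted by w
      ("at_least", lo)       -> hard constraint: KPI >= lo
      ("in_range", (lo, hi)) -> hard constraint: lo <= KPI <= hi
    Infeasible designs get -inf so the optimizer discards them."""
    score = 0.0
    for name, (kind, spec) in targets.items():
        value = kpis[name]
        if kind == "max":
            score += spec * value
        elif kind == "at_least" and value < spec:
            return float("-inf")
        elif kind == "in_range":
            lo, hi = spec
            if not (lo <= value <= hi):
                return float("-inf")
    return score

# Illustrative targets: maximize yield, purity must exceed 95 %,
# viscosity must stay within an acceptable window.
targets = {
    "yield": ("max", 1.0),
    "purity": ("at_least", 95.0),
    "viscosity": ("in_range", (10.0, 40.0)),
}
```

More sophisticated scalarizations (e.g. Chebyshev weighting or explicit Pareto handling) exist, but the team discussion described above is what decides which one is appropriate.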

2. How can the KPIs be extracted, how can the product quality be evaluated?

Once the KPIs and their desired values are clear, one has to think about how to evaluate the performance of the product, i.e., how to quantify the desired KPIs. Depending on the final application, one might have to mimic the actual production conditions on a smaller scale so that the system can be examined on a robotics platform; the advantage is that throughput is usually higher at smaller scale, and the costs of assembly and testing can be significantly lower than at production scale. The downside is that performance in the miniaturized setting is not necessarily representative of what one observes at larger scales. Creating small-scale conditions that are representative of production scale is usually not trivial; it is generally advisable to start with the end in mind: what are my production conditions, and how can they be mimicked on a smaller scale? That approach will be more successful than creating something at small scale and then trying to scale it up.

If one expects experiment-to-experiment variation, one should include controls that can be used to normalize datasets, both to make them comparable across runs and to track whether performance improves over time.
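
One simple, assumed scheme for such control-based normalization: divide every measurement in a run by the control signal of that same run, so that run-to-run drift (instrument gain, batch effects) cancels out. Sample names and values are illustrative:

```python
def normalize_run(measurements, control_id):
    """Express every measurement relative to the control sample taken
    along in the same run, making values comparable across runs."""
    control = measurements[control_id]
    if control == 0:
        raise ValueError("control signal is zero; cannot normalize")
    return {sample: value / control for sample, value in measurements.items()}

# Two runs of the same samples; the instrument drifted by a factor of 2,
# but the normalized values agree.
run_a = {"control": 2.0, "s1": 3.0, "s2": 1.0}
run_b = {"control": 4.0, "s1": 6.0, "s2": 2.0}
```

Ratio-to-control is only one option; subtracting a blank or z-scoring per plate may fit other assays better, which is again a decision for the experimentalists.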

3. Which inputs can be used, what are the actionable items that can be tweaked to reach my target?

If the objective function is clear and the desired KPIs can be read out, one also has to identify the inputs that can be tweaked: what are the actionable inputs that can be varied to reach the desired KPIs? Here it is important to get clarity on all constraints that might exist; e.g., in formulation applications, certain ingredients may have to be present in the final product at minimal or maximal quantities, the cost of raw materials may not exceed a certain threshold, and so on. The better the problem is defined, the easier it is to avoid running costly experiments that are not relevant for the final application.
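
As a sketch, such constraints can be encoded as a feasibility check that runs before a candidate design is ever sent to the robot. The ingredient names, bounds and prices below are invented for illustration:

```python
def is_feasible(design, bounds, prices, max_cost):
    """Check a candidate formulation against hard constraints:
    per-ingredient min/max amounts and a raw-material cost ceiling."""
    for ingredient, amount in design.items():
        lo, hi = bounds.get(ingredient, (0.0, float("inf")))
        if not (lo <= amount <= hi):
            return False
    # Total raw-material cost must stay below the ceiling.
    cost = sum(amt * prices[ing] for ing, amt in design.items())
    return cost <= max_cost

# Hypothetical constraints for a two-ingredient formulation.
bounds = {"surfactant": (0.5, 2.0), "enzyme": (0.0, 1.0)}
prices = {"surfactant": 3.0, "enzyme": 10.0}
```

Filtering candidates this way keeps the optimizer from wasting robot time on designs that could never ship, which is exactly the point made above.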

4. Which (meta)data do I need to develop my product?

Another key element of an SDL is a proper data infrastructure that ideally allows a programmatic, automated data transfer from the robotics platform into a database, which then serves as the primary source for the ML framework. It is important that not only the data but also all relevant metadata are stored; data without context are rather meaningless. A data model that can capture all data and metadata is essential, as a proper data infrastructure is the backbone of a self-driving lab. Furthermore, since data generated by machines typically cannot be stored directly in a database, custom parsers and data transformers will be required to convert the machine output into the formats defined by the data model. When setting up the data infrastructure, data scientists should already be heavily involved to make sure that all data are stored, and can also be retrieved in a straightforward manner, in order to build ML models. If done properly, the data can still be used years from now and will drastically speed up the development of similar products.
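
A minimal example of such a parser, assuming a hypothetical plate reader that exports CSV: each measurement row is converted into a record and enriched with the run's metadata, so the data keep their context when they land in the database. Column names, metadata fields and values are all made up:

```python
import csv
import io

def parse_reader_output(raw_text, metadata):
    """Convert a (hypothetical) plate reader's CSV export into records
    matching an assumed data model: one dict per measurement, with the
    run-level metadata attached to every record."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        records.append({
            "well": row["well"],
            "absorbance": float(row["abs"]),
            **metadata,  # run id, instrument, timestamp, protocol version, ...
        })
    return records

# Example machine output and the metadata captured alongside it.
raw = "well,abs\nA1,0.42\nA2,0.38\n"
meta = {"run_id": "R-017", "instrument": "reader-2",
        "captured_at": "2022-09-30T10:00:00"}
```

In practice each instrument tends to need its own parser, but they can all emit records in the same shape, which is what makes programmatic retrieval for ML straightforward.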

5. How can I create a new design automatically?

The ML framework should allow the generation of new designs based on existing data, whereby a multi-objective function is optimized over inputs that can be discrete and/or continuous. As ideally all relevant historical data are used to train the models, one needs to take potential experiment-to-experiment variation into account, typically through normalization with respect to a control that is taken along in each experiment.

As with all optimization problems, one also needs a strategy for avoiding local minima and for exploring the design space efficiently. A common approach in the SDL literature is Bayesian Optimization (BO), a method for the global optimization of black-box functions. Some literature on BO can be found below. The optimization algorithm suggests the experiments to run next, namely those with the highest probability of achieving the targets set in the objective function.
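
To illustrate the explore/exploit idea behind such acquisition-driven design generation, here is a deliberately crude, dependency-free stand-in: an upper-confidence-bound rule over a nearest-neighbour surrogate. A real SDL would use a Gaussian-process surrogate and a proper acquisition function (e.g. via BoTorch); everything below is a toy:

```python
def suggest(observed, candidates, kappa=2.0):
    """Pick the next experiment by an upper-confidence-bound rule.
    A nearest-neighbour predictor with distance-based uncertainty
    stands in for a Gaussian process, to keep the sketch dependency-free."""
    def ucb(x):
        nearest_x, nearest_y = min(observed, key=lambda o: abs(o[0] - x))
        mean = nearest_y             # crude surrogate mean
        sigma = abs(x - nearest_x)   # uncertainty grows with distance to data
        return mean + kappa * sigma  # explore/exploit trade-off
    return max(candidates, key=ucb)

def optimize(f, candidates, n_init=2, rounds=8):
    """Run a small acquisition-driven loop against a black-box f."""
    observed = [(x, f(x)) for x in candidates[:n_init]]  # initial designs
    for _ in range(rounds):
        x = suggest(observed, candidates)
        observed.append((x, f(x)))
    return max(observed, key=lambda o: o[1])

# Toy black-box "experiment" with its optimum at x = 1.3.
candidates = [i / 10 for i in range(21)]  # designs 0.0 .. 2.0
f = lambda x: -(x - 1.3) ** 2
```

Even this crude rule shows the essential behaviour: early suggestions jump to unexplored regions (large sigma), later ones concentrate around the best observations.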

6. Which processes do I need to orchestrate, how to set up a workflow manager?

If all of the above elements are in place, one needs a good workflow manager that orchestrates the data generation on the robots, the (meta)data capture, their processing and storage, the design generation by the ML framework, and the generation of the input required for the robots to start a new round of experiments. While full automation is desired, it is also essential to allow users to check the quality of generated data and design suggestions; manual intervention and revision should always be possible.
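
A workflow manager can be as elaborate as a full orchestration service, but the core idea, including the manual-intervention hook, fits in a few lines. The pipeline steps below are placeholders, not a real integration:

```python
def run_cycle(steps, review=None):
    """Minimal workflow-manager sketch: run the named pipeline steps in
    order, passing each step's output to the next. The optional `review`
    hook lets a human inspect (and revise) intermediate results, so full
    automation never precludes manual intervention."""
    payload = None
    for name, step in steps:
        payload = step(payload)
        if review is not None:
            payload = review(name, payload)  # may return a revised payload
    return payload

# Illustrative pipeline: robot run -> parse/store -> new design suggestions.
steps = [
    ("run_experiments", lambda _: {"raw": "well,abs\nA1,0.42\n"}),
    ("parse_and_store", lambda p: {"records": 1, **p}),
    ("suggest_designs", lambda p: {"designs": [{"x": 1.5}], **p}),
]
```

Wiring the `review` hook to a dashboard or an approval step is one simple way to keep humans in the loop without breaking the automated cycle.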

Multidisciplinary, well-communicating teams are key to success

Given the complexity of such a project, one needs a good and diverse team to tackle it. In a corporate setting, one needs people who can translate customer requests into quantifiable KPIs, a team of lab-automation experts who can set up the assays to read out the desired KPIs, and data engineers to develop a data model and store the generated data in a structured way so that data scientists can retrieve them programmatically to build the required ML models. Cloud and software architects are needed to set up the workflow manager and the cloud infrastructure to deploy the databases, the models and the workflow manager itself. Communication is, as for most teams, the key to success: understanding what exactly the customer is asking for, translating those requests into KPIs that can be quantified in the lab, setting up the required assays, making sure that the generated data can be used to build ML models, and discussing the design suggestions made by the ML framework. All of this will only work if people communicate well and often.

(Is there a) Future of lab work?

One of the first questions that come up when discussing the SDL concept is whether experimentalists will become redundant once SDLs are fully established. Personally, I don't think so; however, the type of lab work might change. One can only automate things that have been invented, developed and implemented in a robust way. It is, at this stage, difficult for me to imagine that an SDL would invent a method like CRISPR, decide which assays to choose for a specific use case, or develop a new assay by itself. But once a method or assay is developed and robust enough to be executed reliably, robots can take over the repetitive execution. What I expect to happen is that experimentalists will primarily work on developing new methods, making them robust and implementing them on a robotics platform. Lab work will therefore not disappear at all, but will become more focused on development and less on execution.

Further reading

As mentioned above, the starting point for me was the excellent review paper by Florian Häse, Loïc Roch and Alán Aspuru-Guzik. Other, partly more general, articles have been published, e.g., by Milad Abolhasani’s lab (“Universal self-driving laboratory for accelerated discovery of materials and molecules”), by Martin Seifried et al., and by Benji Maruyama’s lab. A literally self-driving lab was built in Andy Cooper’s lab; the paper can be found here, and a nice video showing the robot in action is available here. Other great applications can be found, e.g., in Christensen et al., MacLeod et al., and Caramelli et al., to name just a few.

Bayesian Optimization (BO) is frequently used in the papers related to autonomous experimentation. There are many resources out there to familiarize oneself with the topic; the Distill article by Agnihotri and Batra and Roman Garnett’s BO book are good starting points as general introductions. Javier Gonzalez’s website has links to a large number of useful resources on BO, and he has published fantastic papers on, e.g., Batch and Preferential Bayesian Optimization. There are several frameworks that can be used to implement BO, e.g., Phoenics, Gryffin (see also the article here) and BoTorch, which can be used with Ax.

Of course, this list is far from exhaustive, but it might serve as a starting point to dive into this exciting field of autonomous experimentation, which has also been adopted in corporate settings. Exciting times ahead of us!

Thanks a lot Kevin Dunn and Aschwin Schilperoort for reviewing the article and for the valuable suggestions on how to improve it!
