To develop a new drug that safely and effectively targets cancer cells or the processes that cause cancer, researchers typically rely on costly, time-consuming, and high-failure experimentation with different combinations of molecules. In an age when computing dominates scientific advancement, a novel consortium, Accelerating Therapeutics for Opportunities in Medicine (ATOM), is establishing a new paradigm for drug discovery. Lawrence Livermore National Laboratory (LLNL) joins Frederick National Laboratory for Cancer Research (FNLCR), the University of California at San Francisco (UCSF), and pharmaceutical company GlaxoSmithKline (GSK) in this effort to significantly reduce drug development time. (Image above is courtesy of the ATOM consortium.)

Jonathan Allen, informatics scientist and LLNL’s research and development manager for ATOM, explains, “Each institution brings complementary resources to the project.” GSK provides experimental data for more than 2 million compounds and deep expertise in small molecule drug development, while FNLCR and UCSF lend subject-matter expertise in oncology, chemistry, biology, computational biology, and cancer therapeutics. UCSF also provides dedicated physical space where consortium members can meet and work together. LLNL contributes supercomputing power as well as modeling, simulation, and data analysis capabilities. With these combined resources, the consortium aims to transform drug discovery into a rapid, integrated, and patient-focused process—and thus achieve better outcomes.

[Image: relationships between ATOM's web, application, and data services]
Figure 1. Data services allow the ATOM development team to organize raw data, curated and model-ready data sets, performance results, simulation output, and more. The Kubernetes-based, containerized architecture supports GPU resource allocation and integration of server- and client-side application programming interfaces (APIs).

For example, cancer drugs may be more successful if tailored to specific patient groups, but drug development for small patient populations is currently not cost-effective. According to Allen, “Designing pharmaceuticals that affect specific molecular processes of tumor growth is difficult. Our computation-driven approach will fast-track the discovery of drugs that modulate a targeted disease activity. ATOM’s research will open up new drug design possibilities.”

Specifically, ATOM leverages specialized high performance computing (HPC) hardware and software to accelerate molecular analysis and scale predictive machine learning (ML) algorithms. “Combining GSK’s data with publicly available drug data yields a huge number of possible variations of candidate compounds,” Allen states. “We’re developing new computational tools to evaluate properties of different molecular combinations, looking for new compounds that show promise before laboratory experimentation needs to take place.”

One year into the effort, the consortium has made significant progress in its core research model: the chemistry design loop. This approach optimizes the processes of exploring chemicals, rapidly evaluating molecules, and proposing new compounds. Allen leads the LLNL team responsible for implementing the loop, which includes characterizing and simulating the safety, efficacy, and pharmacokinetics of candidate compounds.

Among other activities, the team is building a library of ML models for automated comparison of hundreds of drug properties known to be important for drug design. This evaluation relies on a sophisticated ML pipeline that integrates traditional statistical models with deep learning algorithms. The process recommends new simulations to run and experiments to conduct by ranking both the top-scoring molecules and those with less confident predictions. Prediction quality is expected to improve as the algorithms learn from previous iterations, starting the loop anew.
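The selection step described above can be illustrated with a minimal Python sketch. It is not ATOM's code: the `Prediction` record, the `select_candidates` function, the compound identifiers, and the assumption that a higher score is better are all hypothetical, chosen only to show how one ranking can surface promising molecules while another surfaces the molecules the model is least sure about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One model prediction for a candidate compound (hypothetical record)."""
    compound_id: str
    score: float        # predicted property value; assumed higher is better
    uncertainty: float  # e.g. spread of predictions across an ensemble


def select_candidates(predictions, n_top=2, n_uncertain=2):
    """Return (top-scoring, most-uncertain) compounds for the next iteration."""
    # Rank by predicted score to find the most promising molecules.
    top = sorted(predictions, key=lambda p: p.score, reverse=True)[:n_top]
    chosen = {p.compound_id for p in top}
    # Among the rest, rank by uncertainty to find where new data helps most.
    remaining = [p for p in predictions if p.compound_id not in chosen]
    uncertain = sorted(remaining, key=lambda p: p.uncertainty, reverse=True)
    return top, uncertain[:n_uncertain]


# Made-up values for illustration only.
PREDICTIONS = [
    Prediction("c1", 0.9, 0.05),
    Prediction("c2", 0.4, 0.30),
    Prediction("c3", 0.8, 0.10),
    Prediction("c4", 0.2, 0.25),
    Prediction("c5", 0.6, 0.02),
]

TOP, UNCERTAIN = select_candidates(PREDICTIONS)
print("follow up:", [p.compound_id for p in TOP])
print("measure to improve model:", [p.compound_id for p in UNCERTAIN])
```

In this toy setup, the first list would go to simulation or experiment as likely leads, and the second would be measured mainly to give the models better training data in the next pass through the loop.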

As with any ML technique, reliability depends on characterizing uncertainty. Allen says, “We’re working toward understanding the criteria needed to make accurate ML predictions on candidate compounds and, crucially, when they will fail. We also must ensure that new data collected will be valuable in further algorithm training.” The ATOM team is exploring the use of uncertainty quantification analysis to guide active learning, characterize confidence in model predictions, and assign weight to model ensembles such as random forests and neural network committees.
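One common way to attach confidence to an ensemble's prediction is to treat the disagreement among its members as an uncertainty estimate. The sketch below assumes a toy "committee" of three hand-written functions standing in for trained models; the names `committee_predict` and `is_confident` and the threshold value are hypothetical and are not taken from ATOM's software.

```python
from statistics import mean, pstdev


def committee_predict(models, x):
    """Return (mean prediction, spread) across all committee members."""
    outputs = [model(x) for model in models]
    # The population standard deviation measures how much members disagree.
    return mean(outputs), pstdev(outputs)


def is_confident(spread, threshold=0.5):
    """Flag a prediction as trustworthy when members largely agree."""
    return spread <= threshold


# Toy committee: three stand-in "models" whose slopes differ slightly,
# so their disagreement grows as the input moves away from zero.
COMMITTEE = [
    lambda x: 1.0 * x,
    lambda x: 1.1 * x,
    lambda x: 0.9 * x,
]

NEAR_MEAN, NEAR_SPREAD = committee_predict(COMMITTEE, 1.0)
FAR_MEAN, FAR_SPREAD = committee_predict(COMMITTEE, 10.0)

print("near:", NEAR_MEAN, NEAR_SPREAD, is_confident(NEAR_SPREAD))
print("far: ", FAR_MEAN, FAR_SPREAD, is_confident(FAR_SPREAD))
```

The same idea applies to a random forest, where the individual trees play the role of committee members. A large spread marks an input where the prediction is likely to fail and where a new measurement would be most valuable for retraining.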

[Image: workflow arrows pointing right that show data ingestion and curation, featurization, model training and tuning, prediction generation, and visualization and analysis]
Figure 2. The consortium’s data-driven machine learning pipeline enables analysis of diverse data sets (data lake), integration of proprietary model repositories (model zoo), and rapid evaluation of model architecture (results database). The approach also includes data security and access control. The team plans to release the software framework as open source.
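The stages named in Figure 2 can be pictured with a deliberately tiny Python sketch. Everything here is an assumption made for illustration: the records, the character-counting featurizer, and the nearest-neighbor "model" are stand-ins and do not represent the consortium's actual framework, data, or chemistry-aware descriptors.

```python
from collections import Counter

# Made-up raw records: a molecule string and a measured property value.
RAW = [
    {"smiles": "CCO", "activity": 1.2},
    {"smiles": "CCO", "activity": 1.2},       # duplicate entry
    {"smiles": "CCN", "activity": None},      # missing measurement
    {"smiles": "CCCC", "activity": 2.0},
    {"smiles": "c1ccccc1", "activity": 3.1},
]


def curate(records):
    """Ingestion and curation: drop missing values and duplicates."""
    seen, clean = set(), []
    for rec in records:
        if rec["activity"] is None or rec["smiles"] in seen:
            continue
        seen.add(rec["smiles"])
        clean.append(rec)
    return clean


def featurize(smiles):
    """Featurization: crude letter counts (C, N, O), not real chemistry."""
    counts = Counter(smiles.upper())
    return (counts["C"], counts["N"], counts["O"])


def train(records):
    """Training: a 1-nearest-neighbor 'model' simply memorizes the data."""
    return [(featurize(r["smiles"]), r["activity"]) for r in records]


def predict(model, smiles):
    """Prediction: return the activity of the closest training molecule."""
    x = featurize(smiles)

    def sq_dist(features):
        return sum((a - b) ** 2 for a, b in zip(features, x))

    return min(model, key=lambda pair: sq_dist(pair[0]))[1]


CURATED = curate(RAW)
MODEL = train(CURATED)
print(len(CURATED), "curated records")
print("predicted activity for CCCO:", predict(MODEL, "CCCO"))
```

Keeping each stage as a separate function mirrors the figure's left-to-right flow: any one stage, such as the featurizer or the model, can be swapped out and the results compared without touching the others.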

The design loop runs on Livermore Computing’s graphics processing unit (GPU) clusters including Pascal, Surface, and Lassen. GPU architectures enable significant speedups in processing, a necessity for computationally expensive ML algorithms and analysis of large-scale data sets. Furthermore, Allen notes, “We have the infrastructure to work with protected data.” ATOM also requires reproducibility and traceability of data processing, so the project leverages flexible, lightweight software tools to build web applications and data services. LLNL’s HPC experts have created an environment where researchers and domain scientists can securely access project data, quickly spin up multi-node processes, and easily run many cycles.

In 2018, ATOM was given the Department of Energy’s (DOE’s) Technology Transfer Working Group’s “Best in Class” award for innovation in partnerships. Jim Brase, ATOM’s co-lead and head of technology, says, “ATOM is about developing a new way of designing molecules based on computation and large data sets. Although our current focus is on new medicines, the underlying capabilities have potential for a variety of purposes, such as in materials design.” The consortium’s rapid technical progress is beginning to attract new partners, including GPU manufacturer NVIDIA.

Along with staff from the Biosciences and Biotechnology Division, LLNL’s ATOM team includes Computing researchers Ryan Forsyth, Derek Jones, Kevin McLoughlin, Amanda Minnich, Marisa Torres, and Adam Zemla. ATOM research and results have been presented at numerous high-profile events, including Supercomputing ’18, NVIDIA’s GPU Technology Conference, and the Society for Industrial and Applied Mathematics Conference on Computational Science and Engineering ’19.