AstraZeneca

Pushing the boundaries of data to deliver life-changing medicines

For every dollar we spend on a data initiative, we are able to get $40 in return.

Andy McPhee

Accelerate research and development by establishing a single source of truth

3 minutes

to get 90% of data ready for analysis

99% time savings

bringing planning cycles down to 3 hours

$1 billion annual savings

by shaving just 1 month off each clinical trial


Biopharmaceutical company AstraZeneca focuses on the discovery, development, and commercialization of prescription medicines used by millions of patients worldwide. Like all pharmaceutical companies, AstraZeneca faces stiff competition, which puts pressure on it to bring drugs to market faster. As Andy McPhee, Data Engineering Director at AstraZeneca, points out: “We must balance this desire to speed the process with trusted data. If we do not have the quality in our data, our drugs will not be approved, and we will be affecting the lives of potential patients. Talend provides us the speed and trust we need.”

AstraZeneca decided to build a data lake to hold data from its wide range of source systems. The cloud was a key part of this architecture, providing scalability and flexibility. But as McPhee explains, the data must also be correct and protected. “Data without trust is useless,” he says. “Data Governance is critical to knowing that we can trust our data and ensuring that the data is well understood, well looked after, and only accessible to the right people.”

Shortening the drug development timeframe is important to AstraZeneca because it gets lifesaving and life-improving drugs into the hands of patients faster and enables the company to complete the process before the patent expires.

The search capability provided by Talend enables data scientists to exploit the massive amounts of data generated by drug discovery, imaging, and genome projects. For the imaging work, AstraZeneca has access to years of images of particular diseases, along with their associated metadata. With Talend, the company can take this metadata, digitize it, and push it through specific workloads and pipelines so that data scientists can write algorithms that learn what the images contain. Then, when a new image arrives, they can predict with a high degree of certainty whether it shows the disease.

Completing clinical trials is one of the most expensive parts of drug development, costing millions of dollars. Shaving just one month off each clinical trial would save AstraZeneca $1 billion per year. Stuart Charles, Clinical Control Tower Technical Lead, explains: “We have approximately 20 drugs in the pipeline at any one time. Clinical trials are complicated and regulations vary per country so that each drug will go through separate trials across 100 different countries. Every time you are dealing with data, there are real people behind the data. If there is a safety issue, speed is essential as we need to report it quickly to stop further patients from receiving that treatment.”