For pharmaceutical companies in the digital era, intense pressure to achieve medical miracles falls as much on the shoulders of CIOs as on lead scientists.
Modern laboratories routinely operate under rigid requirements to ensure the accuracy of data and the veracity of scientific formulas, machine learning algorithms, and data tools.
Before Bob McCowan was promoted to CIO at Regeneron Pharmaceuticals in 2018, he had run the data center infrastructure for the $81.5 billion company’s scientific, commercial, and manufacturing businesses since joining the company in 2014.
In that capacity, he knew that, in addition to having the right team and technical building blocks in place, data was the key to Regeneron’s future success.
“It is all about the data. Everything we do is data-driven, and at that time, we were very datacenter-driven but the technology had lots of limitations,” says McCowan. “It worked for us to keep the company successful, but it wasn’t giving us the scale and horsepower needed.”
To achieve what the company would need going forward, McCowan knew Regeneron would have to undergo a major transformation and build an enhanced data pipeline that could ingest data from up to 1,000 data sources in “analytical ready formats” for both the business and the scientists to consume, the CIO says.
And to do this, a move to the cloud was essential. “The only way to enable our scientists and scale up and grow in the future is to really embrace the cloud, and not just in terms of computational power and storage, but being able to deploy into different environments, different countries,” McCowan says. “If you are not on the cloud, you are going to be left behind.”
Empowering scientists through the cloud
McCowan set about migrating Regeneron to Amazon Web Services in late 2018. By 2020, IT had moved roughly 60% of all company data to the cloud — no minor task for an international firm that generated $16 billion in revenue in 2021, employs more than 10,000 people, and holds nine FDA- and EMA-approved drugs with an additional 30 in clinical trials.
The company’s multicloud infrastructure has since expanded to include Microsoft Azure for business applications and Google Cloud Platform to provide its scientists with a greater array of options for experimentation.
“Google created some very interesting algorithms and tools that are available in AWS,” McCowan says. “And some things [Regeneron’s scientists] can only try out in the Google cloud. So, we are using all three mainstream clouds, but really the core of it is around AWS.”
Because of the complexity of Regeneron’s experimentation and testing, the company uses a variety of standard SaaS tools for analysis. But the crown jewel of its analytics operations, McCowan says, is its cloud-based MetaBio Data Discovery Platform, which provides a wide array of data services, data management tools, and machine learning tools as “icing on the cake.”
MetaBio, which received a 2022 CIO 100 Award, provides a single source for datasets in a unified format, enabling researchers to quickly extract information about various therapeutic functions without having to worry about how to prepare or find the data.
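To make the idea of a "unified format" concrete, here is a minimal Python sketch of how records from two differently shaped sources might be normalized into one schema. It is an illustration only, not a description of how MetaBio works: the `AssayRecord` type, the field names, the two source layouts, and the unit conversion are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayRecord:
    """Hypothetical unified schema: one shape for every source."""
    sample_id: str
    analyte: str
    value_ng_ml: float


def from_lab_csv_row(row: dict) -> AssayRecord:
    # Hypothetical source A: CSV-style row, concentration already in ng/mL.
    return AssayRecord(
        sample_id=row["SampleID"].strip(),
        analyte=row["Analyte"].strip().lower(),
        value_ng_ml=float(row["Conc_ng_mL"]),
    )


def from_instrument_json(doc: dict) -> AssayRecord:
    # Hypothetical source B: nested JSON, concentration in ug/mL.
    # 1 ug/mL equals 1000 ng/mL, so convert on the way in.
    return AssayRecord(
        sample_id=doc["sample"]["id"].strip(),
        analyte=doc["target"].strip().lower(),
        value_ng_ml=float(doc["conc_ug_ml"]) * 1000.0,
    )


unified = [
    from_lab_csv_row({"SampleID": " S-001 ", "Analyte": "IL-6", "Conc_ng_mL": "12.5"}),
    from_instrument_json({"sample": {"id": "S-002"}, "target": "IL-6", "conc_ug_ml": 0.5}),
]

for record in unified:
    print(record)
```

Once every source passes through an adapter like these, downstream analysis can treat all records the same way, which is the benefit the article attributes to a single, unified dataset source.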
“Scientists come to us with white papers which may be identifying theoretical ways that you could analyze a scientific experiment,” McCowan says. “We’ll work with those scientists and actually build the computer models and go run it, and it can be anything from sub-visual particle imaging to protein folding,” he says. “In other cases, it’s more of a standard computational requirement and we help them provide the data in the right formats. Then the data is consumed by SaaS-based computational tools, but it still sits within our organization and sits within the controls of our cloud-based solutions.”
Much of Regeneron’s data, of course, is confidential. For that reason, many of its data tools — and even its data lake — were built in-house using AWS.
“We have our own data lakehouses in AWS,” says McCowan, who also led Regeneron IT to a 2020 CIO 100 Award for developing the Regeneron Deva Platform, a research computing platform built to simplify, scale, and accelerate the early discovery analytical experience. “By creating some small adjustments, we are allowing scientists to connect data in ways they were not able to before. Our vision for the data lake is that we want to be able to connect every group, from our genetic center through manufacturing through clinical safety and early research. That’s hard to do when you have 30 years of data.”
The data platform provides constant access to connected and contextualized data via data lakes, scalable clouds, data processing and AI services, the CIO says, adding that the company’s data lakes manage roughly 200 terabytes of data.
Fueling innovation with data
McCowan is careful not to restrict the use of external tools, particularly cloud-native tools, that help scientists dig for discoveries. At the infrastructure level, Regeneron scientists use AWS EMR and Cloudera. At the data pipeline level, scientists use Apigee, Airflow, NiFi, and Kafka. At the data warehouse level, scientists use Redshift. Further up the stack, different data analytics tools come into play, such as DataIQ. For programming, scientists use Python and Jupyter Notebooks.
For McCowan, the key is to give scientists any and all tools that allow them to explore their hypotheses and test theories. “One of the fantastic things about Regeneron is that we’re driven by curiosity,” the CIO says. “We’re driven by science, and by innovation, and we try to avoid putting hard boundaries around what we do because it tends to stifle innovation.”
Although Regeneron scientists have AI and ML tools at their disposal, data remains the key, McCowan says, and the power of the cloud and analytics alone may reveal the next big breakthrough in data that is 10 years old.
“I can’t tell you how many times I’ve read about these fantastic projects using AI and ML, but you never see the output because they fail,” McCowan says. “And the reason they are failing is that people are not putting enough thought into where the data is coming from. That is why we built our data infrastructure. So, by the time that data lands in the data lakes, and we start applying AI and ML, we know we are using it against high-quality data.”
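The principle McCowan describes, checking data before it lands in the lake rather than after models are applied, can be sketched as a simple validation gate. The Python below is a minimal illustration under stated assumptions, not Regeneron's implementation: the required fields (`sample_id`, `source`, `value`) and the `validate` and `partition` helpers are hypothetical.

```python
def validate(record, required=("sample_id", "source", "value")):
    """Return a list of problems found in one record (empty means clean)."""
    problems = []
    for field in required:
        # Treat absent, None, and empty-string values as missing.
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    value = record.get("value")
    if value not in (None, "") and not isinstance(value, (int, float)):
        problems.append("value is not numeric")
    return problems


def partition(records):
    """Split records into those fit to land and those held for review."""
    accepted, quarantined = [], []
    for record in records:
        problems = validate(record)
        if problems:
            quarantined.append((record, problems))
        else:
            accepted.append(record)
    return accepted, quarantined


incoming = [
    {"sample_id": "S-001", "source": "lab-a", "value": 12.5},
    {"sample_id": "S-002", "source": "", "value": 3.1},        # no provenance
    {"sample_id": "S-003", "source": "lab-b", "value": "n/a"},  # bad value
]

accepted, quarantined = partition(incoming)
print(len(accepted), "accepted;", len(quarantined), "quarantined")
```

Records with no recorded source are held back here because, as the quote suggests, knowing where data comes from is a precondition for trusting anything a model later derives from it.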
As the company’s chief technologist, McCowan is charged with digitizing everything and helping scientists make the best use of the data and metadata, regardless of how they are generated.
“It always comes back to the data and the insights that we can provide using different technologies and increasing the speed of decision-making,” McCowan says, adding that providing scientists with the ability to run experimentation mathematically through engines using AI and ML models speeds up discovery, but it will never replace the wet lab.
The combination of enhanced IT and science is what will drive maximum innovation at Regeneron, McCowan says. And here, the MetaBio data platform will play a key role in facilitating breakthrough discoveries far faster than previously possible.
“The level of detail there with us digitizing everything, we’re able to apply technology and tools to help scientists make connections that they were just not able to make before,” McCowan says. “If you look at it from a pure data perspective, what we can do is find ways to [enable scientists] to connect the data better and faster and make those insights and bring drugs to market down to a five-year or four-year [process], when before it was a 10-year process.”