Bridging the Data Gap: How R&D Can Drive Diversity in Clinical Research: An MIT Sloan Career Development Office Conversation with Dr. Brian H. Johnson, VP of R&D Technology at Takeda

The integration of AI in pharmaceutical research and development (R&D) holds immense potential to accelerate drug discovery, optimize clinical operations, and improve patient outcomes. However, a persistent challenge remains: the data gap. AI-driven models are only as strong as the datasets on which they are trained, and historically, these datasets have lacked diversity, resulting in biased algorithms that may exacerbate healthcare inequities. To address this pressing issue, industry leaders are advocating for more inclusive data strategies that ensure underrepresented populations are accurately reflected in clinical research.

In this exclusive interview, Dr. Brian H. Johnson, VP of R&D Technology at Takeda, explores the intersection of AI, real-world data (RWD), and regulatory frameworks in shaping the future of equitable clinical research. He discusses how data management, evidence-based medicine, and strategic innovation can bridge gaps in healthcare and drive meaningful change in the development of new drugs. Topics include ensuring inclusive AI in ClinOps, enhancing data diversity, leveraging RWD/RWE, and navigating regulatory challenges in pharma R&D.

You have previously expressed concerns that AI in Clinical Operations (ClinOps) may widen existing healthcare equity gaps due to biased datasets. What strategies do you believe can help mitigate this issue and ensure AI-driven decision-making is more inclusive and equitable?

Dr. Brian H. Johnson: It is a real issue, and the notion of the digital divide gets bigger and bigger as our technology moves forward. We lack adequate data representation and the ability to understand, manage, and monitor bias. So, there are three pieces to this:

  • First, from a strategic standpoint, it is about understanding and finding those sources of data. Certain companies have devoted their time to analyzing the data, integrating it into the larger ecosystem and environment, and helping inform how clinical trials can be made more diverse. That is one part—finding organizations that have been investing in this for a while, and Inside Edge is a notable example of that.
  • Second, it is identifying what data people willingly give up today. Can we explore other available data sources and use them to fill the gap? Maybe not as a valid substitute for health data, but as a proxy.
  • The third is what we just talked about—the notion of bias. As we begin to develop these AI tools, I am becoming increasingly interested—and concerned—about understanding the robustness of these models and their applicability, especially given that I do not expect to see explainability in these models soon.

Especially as you consider generative models and approaches, where you are now creating subsystems and capabilities that we do not fully understand, the onus is on validation, vetting, and understanding how to detect and manage bias. It will be in our best interest to start to develop those capabilities. It may require a different approach to thinking about statistics and analytics. It may require a unique perspective on our AI models. However, our ability to manage bias—detect it, understand its root cause, and address it—is crucial.

When talking about these things, I always think about them in two ways:

  • The problem statement
  • How to implement these kinds of solutions or create sustainable capabilities

When I think about this question, what it comes down to is that as we understand healthcare, both in the U.S. and globally, our minority populations are some of the sickest populations that we have. And in the U.S., that trend is only growing. The minority population continues to grow, while the Caucasian majority population continues to decline. It is based on the notion that as the majority-minority dynamic flips, our healthcare system is not set up for minority participation in an authentic way. If we can start crafting a narrative that speaks to our future, where more people will need to be treated, where more people from outside the U.S. will require care, then this becomes critical to our success, vital to our GDP, and essential in the workforce and education system.

So, one approach is to find and invest in capturing this data, breaking down barriers to future-proof healthcare. At the same time, we need to examine technology and ensure that we build in the necessary safeguards and approaches to leverage this powerful yet nascent technology.

Given that AI is only as good as the data it is trained on—and studies show that over eighty percent of the genomic data used in AI-driven healthcare comes from individuals of European ancestry—how can pharmaceutical companies and research organizations work toward improving diversity, accuracy, and completeness of datasets, particularly for underrepresented populations?

Everything else I said, let us take as foundational. One area where AI can significantly help is the generation of synthetic data. If I want to experiment, I can visit regions like India or Africa, where healthcare infrastructure is well-established, and collect data. For example, if I need 1,000 African men with heart disease, I can find that data. However, the real science begins when I layer this with attributes from the limited U.S. dataset—say, 50 Black men with heart issues—and create a synthetic dataset that represents real populations.

This requires deep scientific, clinical, and genomic expertise, which is often found in pharmaceutical companies. Additionally, breaking down the walls of distrust in healthcare is crucial. Pharma companies can create value streams, invest in trust brokers, and work with organizations that already have credibility in underserved communities.
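The attribute-layering idea above can be sketched in code. The following is a toy illustration only, assuming simple dictionary-shaped records and a hypothetical `pop_`-prefix convention for population-specific fields; it is not a validated synthetic-data method, which in practice would require the deep clinical and genomic expertise Dr. Johnson describes.

```python
import random

def synthesize_patients(reference, anchors, n, seed=0):
    """Toy sketch of attribute layering for synthetic records.

    reference: large record pool from a data-rich population
    anchors:   small record pool from the underrepresented population
    n:         number of synthetic records to produce

    Each synthetic record borrows broad clinical attributes from the
    reference pool and overlays population-specific attributes (here,
    keys starting with 'pop_') drawn from an anchor record.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        base = dict(rng.choice(reference))    # broad clinical profile
        anchor = rng.choice(anchors)          # population-specific overlay
        base.update({k: v for k, v in anchor.items() if k.startswith("pop_")})
        base["synthetic"] = True              # always flag generated records
        out.append(base)
    return out

# Illustrative inputs: 1,000 reference heart-disease records and
# 50 anchor records carrying hypothetical population-specific markers.
reference = [{"age": 40 + i % 30, "ef_pct": 35 + i % 20} for i in range(1000)]
anchors = [{"pop_ancestry": "AFR", "pop_marker": i} for i in range(50)]

cohort = synthesize_patients(reference, anchors, n=200)
print(len(cohort))  # 200
```

Flagging every generated record as synthetic matters: downstream analyses must be able to separate real from synthesized patients when validating conclusions.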

“If we don’t improve data diversity, the industry will lose impact and sustainability.”

  • Today, we might be treating 32% of the population, but if that drops to 20%, can we justify the costs and investments? The industry must fundamentally change its approach to patient populations and establish trust through intermediaries.

It is not a small task; it is not a slight tweak—it is a fundamental change in the way the industry views the patient population and builds these relationships through trust brokers.

Regulatory bodies, such as the FDA, are developing guidelines for AI, including software as a medical device. Pharma companies that take the lead in setting standards for synthetic data will have a first-mover advantage. If I were a Pfizer, J&J, or Takeda, I would think about how I can create an environment that is optimized for me in terms of these things.

That is imperative. However, if that value proposition is not clearly articulated, I can see many organizations within the industry holding back and waiting for someone else to make that investment.

With your expertise in evidence-based medicine and new product development, how do you see real-world data (RWD) and real-world evidence (RWE) shaping the future of drug development? Can these be leveraged to address healthcare disparities and improve access to innovative treatments?

I think in a couple of ways. First, real-world data (RWD) is a rich source of insight into the reality of the situation. It helps identify underserved communities and assess healthcare service disparities. Fundamental analysis can show where to invest to improve outcomes. But real-world evidence (RWE) can push the limits even further. Imagine a future where evidence informs safety profiles long before clinical trials begin. Discovery scientists could design for safety and equity from the start.

For instance, if we observe a disparity in the effectiveness of an existing drug, such as albuterol, among different populations, we can utilize RWE to inform drug design, thereby avoiding molecular structures that contribute to inequities.

Now, by integrating RWE into multimodal analysis, we gain entirely new dimensions of insight. Generative AI plays a crucial role in this context. It democratizes analytic capability—what used to take experts weeks or months, anyone can now do in minutes.

In the pharmaceutical industry, we are learning to harness new data sources, including real-world evidence (RWE), spatial genomics, and more. Once these datasets can be linked, all bets are off. We move from 10 to 100 dimensions of analysis, making precision medicine a reality.

Precision medicine involves tailoring treatments to an individual’s specific needs. Health equity is a subset of this—understanding population differences to maximize impact. The pharma industry’s goal should be to scale these methods, preparing for a future where every patient receives genuinely personalized care.

“By investing in these technologies safely and ethically, we drive business growth while benefiting communities—paving the way for the Holy Grail of medicine: true precision healthcare.”

Artificial intelligence (AI) is rapidly transforming research and development (R&D) in the pharmaceutical industry. However, regulatory bodies are still catching up with AI-driven innovations. What regulatory changes or industry-wide standards do you believe are necessary to ensure ethical, effective, and equitable AI adoption in pharmaceutical R&D?

I recall a period in 2005-06 when it was possible to fast-track molecules into FDA approval. However, many were later pulled from the market because they were not well understood. Regulatory bodies play a crucial role in ensuring patient safety. The pharmaceutical industry’s pressure, risk, and investment levels demand strong regulations, not as an innovation killer but as a driver. Unlike tech companies, we cannot afford errors that risk lives.

Regulatory bodies must rethink staffing by incorporating technologists and data scientists alongside clinicians and regulatory professionals. The fundamental challenge in AI adoption is the need for explainability. While traditional machine learning (ML) models offer some interpretability, AI-driven decisions in the pharmaceutical industry may remain opaque.

Regulatory bodies rely on understanding, so the key question is: What replaces explainability? Is empirical evidence from extensive patient studies enough? Should we maintain a continuous test set to challenge AI models? Can we actively identify and address bias and discrepancies in outcomes? These concerns require dialogue between regulators, industry leaders, and ethicists. We must redefine the concept of “proof” in AI-driven R&D.

Clinical trial designs evolved through evidence and refinement. AI regulations must follow a similar process, but at a much faster pace. We should explore orthogonal AI solutions, such as bias detection mechanisms that evaluate model reliability.
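One simple form of the orthogonal check described above is to measure model performance separately per patient subgroup rather than trying to explain the model's internals. The sketch below is an illustrative example, not an established regulatory test; the function name and toy data are the author's own.

```python
def subgroup_accuracy_gap(y_true, y_pred, groups):
    """Largest pairwise accuracy gap across subgroups.

    Instead of demanding explainability, measure whether the model's
    accuracy differs across subgroups; a large gap flags potential
    bias for human review.
    """
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        correct, total = by_group.get(g, (0, 0))
        by_group[g] = (correct + (t == p), total + 1)
    accs = [c / n for c, n in by_group.values()]
    return max(accs) - min(accs)

# Toy data: the model is right 9/10 times for group A, 6/10 for group B.
y_true = [1] * 20
y_pred = [1] * 9 + [0] + [1] * 6 + [0] * 4
groups = ["A"] * 10 + ["B"] * 10
print(round(subgroup_accuracy_gap(y_true, y_pred, groups), 2))  # 0.3
```

A check like this can run continuously against a held-out test set, matching the "continuous test set to challenge AI models" idea raised above.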

Stakeholder trust—among patients, physicians, and payers—matters as much as regulatory approval. My experience with digital diagnostics showed payers were eager to identify patients who would benefit rather than excluding those who wouldn’t. AI should initially focus on high-risk diseases where the benefits outweigh the risks, refining models over time to expand into lower-risk areas.

We need accelerated cycles of validation to establish AI's role in drug development, but the question remains: What will be "good enough" for adoption?

What knowledge and skills should professionals, both recent graduates and industry alumni, acquire to excel at the intersection of life sciences and technology? Should they focus more on domain knowledge or model building, and how can they effectively combine the two to drive innovation in this field?

I have had conversations with many leaders in R&D at Takeda. AI will transform the way we deliver IT capabilities and services. My need for technology-based people diminishes every day because I am looking at tools that can manage most of the development lifecycle. If I find a tool that does 60-70% of the work correctly, that is transformational.

My advice for those entering the industry is to become facile with AI/ML. You do not need to be a practitioner, but you do need to understand its principles, usability, and where it is effective or not. AI is evolving rapidly, so be prepared to relearn, unlearn, and continue learning over the next two to five years.

I train my teams by placing them in distinct functions—research, pharmaceutical sciences, clinical, regulatory—so they understand the value chain. But at some point, they must specialize. The future belongs to experts who understand both their domain and AI.

Envision cross-functional programs—not just IT or science but a fusion of both. Just as personal computing revolutionized education, AI will become integral to how we work. Embrace it rather than resist it. The sooner we incorporate AI in meaningful and safe ways, the better prepared we will be.

Wrapping Up

The discussion highlighted the transformative impact of AI in life sciences, emphasizing the need for professionals to embrace continuous learning, adaptability, and interdisciplinary expertise. The integration of AI in clinical trials, drug discovery, and regulatory processes is not just enhancing efficiency but also reshaping traditional roles in IT and life sciences.

Success in this evolving landscape requires a deep understanding of both AI and domain knowledge, as automation is reducing the need for purely technical roles. Collaboration between scientists, AI practitioners, and regulatory professionals will be key in driving innovation. Those who adopt AI-driven tools, specialize in their domain, and leverage cross-functional expertise will lead the future of life sciences and healthcare transformation.

Bios:

Dr. Brian H. Johnson

Dr. Johnson is a Ph.D.-trained molecular biologist with over 25 years of experience in R&D across the Pharmaceuticals, Consumer Packaged Goods, and Diagnostics industries. He is deeply passionate about mentoring and guiding professionals to advance their careers. His expertise spans data management and integration, new product development processes, evidence-based medicine, and addressing healthcare disparities. With a strong track record of driving innovation and optimizing research strategies, Dr. Johnson excels in translating scientific insights into real-world applications. He is committed to fostering cross-functional collaboration and bridging the gap between science and industry to create a meaningful and lasting impact.

Partha Anbil

Partha Anbil is a Contributing Writer for the MIT Sloan Career Development Office and an alumnus of MIT Sloan. Besides being the VP of Programs of the MIT Club of Delaware Valley, Partha is a long-time veteran in the life sciences consulting industry. He has held senior leadership roles at IBM, Booz & Company (now PWC Strategy&), IMS Health Management Consulting Group (now IQVIA), and KPMG. He can be reached at partha.anbil@alum.mit.edu.

Michael Wong 

Michael is a Contributing Writer for the MIT Sloan Career Development Office and an Emeritus Co-President and board member of the Harvard Business School Healthcare Alumni Association. Michael is a Part-time Lecturer for the Wharton Communication Program at the University of Pennsylvania, and his ideas have been shared in the MIT Sloan Management Review and Harvard Business Review.

By MIT Sloan CDO