Don’t feed Gizmo (AI) after midnight!

In the 1984 Joe Dante film Gremlins, a cute and gentle creature can turn into a nightmarish monster if you don’t precisely follow the care instructions. Training a healthcare AI algorithm, although not quite as dramatic, can give its creators cold sweats nonetheless. Feeding medical imaging data to an algorithm must be conducted as meticulously as caring for a mythical Hollywood character. The right data must be selected, carefully de-identified and professionally labeled in order to ensure its safe release to clinical production.

AI’s Appetite for Data

The topic of Artificial Intelligence (AI) in medical imaging and what modern technology can now achieve comes up often when speaking with other Silicon Valley tech executives and individual contributors at Big Unicorn companies and small startups. The technology has advanced exponentially over the past decade to accomplish many types of video recognition, de-identification, and image enhancement, yet most software used in production today was developed when machine learning and AI were still in their infancy. Machine learning requirements weren’t evident at the time the solutions were created and were therefore not taken into consideration. A staggering number of image processing algorithms have been developed over the past few years, and the technology used to develop them has evolved by leaps and bounds. In just the past five years we’ve seen exciting developments in machine learning tools and techniques.

The most challenging element of AI development in enterprise imaging is where and how to procure the necessary data – namely images – to accurately train and test algorithms at scale. The success of AI, a data-driven, self-learning algorithm, depends on constant learning from vast amounts of imaging data.

This topic is extremely relevant to us at Dicom Systems, given the volume of data normalization we routinely perform. Our flagship product Unifier with AI Conductor for PACS and EHR drives and conducts AI workflows to deliver the right information in the right place and in the right format. This allows us to effectively manage and resolve workflow bottlenecks that impact successful AI implementations.

More Data = Fewer Problems

The more volume and variety of data made available to an algorithm, the more accurate the learning, and the more accurate the outcomes will be. With enough volume and variety, the problem of data bias can be mitigated. A small percentage of “inconsistent” data, relative to a large data set, is acceptable. The outliers can easily be identified and separated to prevent the “contamination” of clean data. If you’ve ever made chicken stock from scratch, you know that skimming the bits that bubble up to the top will improve the clarity of your end product. The same applies to biased and inconsistent data, which can be easily identified as long as the data set is large enough. The key: more data, fewer problems, which is the opposite of what The Notorious B.I.G. famously rapped in his 1997 hit song, “Mo Money, Mo Problems.” In the case of data, more is definitely more.

As Samantha from the science fiction movie “Her” so famously said, “The DNA of who I am is based on the millions of personalities of all the programmers who wrote me. But what makes me me is my ability to grow through my experiences. So basically, in every moment I’m evolving, just like you.” We need data in order for algorithms to evolve and produce a desired result.

Joaquin Phoenix in “Her,” the first feature both written and directed by Spike Jonze. Image Credit Warner Bros.

Problems With Teleradiology Data

Training data is a resource used by engineers to develop machine learning models. It’s used to train algorithms by providing them with comprehensive, consistent information about a specific task. Training data is usually composed of a large number of data points, each formatted with labels and other useful and relevant metadata. Fewer data points mean a greater chance that an algorithm could ultimately be fooled, as in the “Goldendoodle or Fried Chicken” meme of 2020, and that could have dire consequences for a patient’s health if the algorithm is entrusted with a key diagnostic mission.

High-quality data: All data must be cleaned and organized before it’s used on a model. Duplicate, incorrect, or irrelevant content can wreak havoc on a project. Small mistakes, such as the incorrect categorization of an image, can negatively impact the performance of the model and its ability to make accurate predictions.

Useful tags: Without a range of annotations and labels, it can be tough for models to learn properly. In radiology, useful tags could include:

  • Body part
  • Study description
  • Patient index
  • Date of Birth (DOB) and age format, or annotation information.

These additional criteria provide the model with the information it needs to make more accurate predictions. Enriching the data set with more corollary information could also prove useful by correlating data that typically live in separate silos of information, such as family antecedents, pathology, genomics and prescription data. Ultimately, the correlation and plurality of data fed into an algorithm enhance our ability to make progress in precision medicine and population health.
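To make the idea of useful tags concrete, here is a minimal sketch of a completeness check that could run before a study is admitted to a training set. It assumes each study's metadata has already been extracted into a plain dictionary keyed by standard DICOM attribute keywords. The choice of required tags, the mapping of “patient index” to PatientID, and the function names are our own illustrative assumptions, not a description of any particular product.

```python
# Hypothetical list of required tags, loosely based on the bullets above.
# Mapping "patient index" to the PatientID keyword is an assumption.
REQUIRED_TAGS = (
    "BodyPartExamined",
    "StudyDescription",
    "PatientID",
    "PatientBirthDate",
)


def missing_tags(record: dict) -> list:
    """Return the required tags that are absent or blank in one metadata record."""
    return [
        tag for tag in REQUIRED_TAGS
        if not str(record.get(tag) or "").strip()
    ]


def split_by_readiness(records):
    """Separate records with all required tags from those with gaps.

    Returns (ready, incomplete), where incomplete is a list of
    (record, list_of_missing_tags) pairs for later review.
    """
    ready, incomplete = [], []
    for record in records:
        gaps = missing_tags(record)
        if gaps:
            incomplete.append((record, gaps))
        else:
            ready.append(record)
    return ready, incomplete
```

Records that land in the incomplete list are not discarded; they are simply set aside with the names of the missing tags, so that someone can decide whether to enrich them or leave them out.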

Error and Discrepancy in Radiology Data

A study by Adrian P. Brady, published in Insights into Imaging, found that errors and discrepancies in radiology practice are uncomfortably common, with an estimated day-to-day rate of 3–5% of studies reported and much higher rates reported in many targeted studies. Nonetheless, the meaning of the terms “error” and “discrepancy” and their relationship to medical negligence are frequently misunderstood. The article outlines the incidence of such events, the ways they can be categorized to aid understanding, and potential contributing factors, both human- and system-based. It considers possible strategies to minimise error, along with the means of dealing with perceived underperformance when identified. It explains the inevitability of imperfection while emphasising the importance of striving to minimise it.

Key learnings:

  • Discrepancies between radiology reports and subsequent patient outcomes are not inevitably errors.
  • Radiologist reporting performance cannot be perfect, and some errors are inevitable.
  • Error or discrepancy in radiology reporting does not equate to negligence.
  • Radiologist errors occur for many reasons, both human- and system-derived.
  • Strategies exist to minimise error causes and to learn from errors made.

AI can undoubtedly improve the overall accuracy of diagnostics, and will ultimately make life easier for diagnosticians. However, until imaging AI reaches critical mass in readiness and adoption, the need to source and prepare massive amounts of imaging data for machine learning will remain a priority. There is a general lack of thorough references available to educate radiologists on the importance of pre-processing as the first step.

For the data to be usable in an AI algorithm, there are several steps:
(1) Define a problem with regard to the capabilities of deep learning algorithms;
(2) Collect and label the relevant data consistent with the algorithm to address the problem;
(3) Perform the necessary pre-processing steps to prepare the input data; and
(4) Build an algorithm that will be trained on the provided data to solve the problem.
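
One pre-processing step that comes up throughout this article is de-identification. The sketch below shows the general shape of that step on a metadata dictionary: direct identifiers are dropped, the patient ID is replaced with a keyed pseudonym so studies from the same patient still group together, and the birth date is coarsened to a year. The list of identifiers, the PatientBirthYear field, and the function names are hypothetical and chosen for illustration. A production de-identification process covers far more attributes, as well as text burned into the pixels, and should follow an established confidentiality profile rather than this short list.

```python
import hashlib
import hmac

# Hypothetical, deliberately short list of direct identifiers to drop.
DIRECT_IDENTIFIERS = {"PatientName", "PatientAddress", "ReferringPhysicianName"}


def pseudonymize(patient_id: str, secret: str) -> str:
    """Map a patient ID to a stable pseudonym using keyed HMAC-SHA256."""
    digest = hmac.new(
        secret.encode("utf-8"), patient_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return "anon-" + digest[:16]


def deidentify(record: dict, secret: str) -> dict:
    """Return a de-identified copy of one metadata record (input is not modified)."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "PatientID" in cleaned:
        cleaned["PatientID"] = pseudonymize(str(cleaned["PatientID"]), secret)
    birth_date = cleaned.pop("PatientBirthDate", None)
    if birth_date:
        # DICOM dates are formatted YYYYMMDD; keep only the year.
        # "PatientBirthYear" is an illustrative field, not a DICOM keyword.
        cleaned["PatientBirthYear"] = str(birth_date)[:4]
    return cleaned
```

Because the pseudonym depends on a secret key, the same patient maps to the same pseudonym across studies, while anyone without the key cannot recompute the mapping from a known patient ID.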

It is necessary for both radiologists and data scientists to equally understand these pre-processing steps and DICOM standards, such as Structured Reports (SR), Key Object Selection (KOS), and other standard annotation formats, so they can work together to create effective solutions.

Most notably, however, before getting into the specifics of imaging annotations, there is a fundamental lack of incentive for diagnosticians to go the extra mile and annotate when they are already overworked and suffering from professional burnout.

Imaging data must be refined into usable fuel before it can effectively power machine learning. The crucial element, or enabler, necessary to unlock AI potential is labeling.

Labeling de-identified images represents a unique challenge not only because there is so much work to be done (petabytes of images to label), but primarily because it is a value-added activity that can legally and credibly be performed only by qualified physicians.

Radiologists are well-compensated for their reading services and diagnostic expertise in the normal course of business, and typically don’t have a lot of spare time – or willingness – to moonlight for free as image labelers for machine learning. Additionally, many physicians remain deeply suspicious of the impact AI could potentially have on their future, so helping to train neural networks does not rank very high on their list of priorities.

A few visionary healthcare AI research organizations understood this challenge early on and secured the services of luminaries in exchange for equity in their burgeoning startups, which constitutes a different type of compensation than radiologists typically earn in exchange for their expertise. There are only so many startups that could ultimately get away with this model. Labeling de-identified images is a tedious process that requires considerable effort by many, not few, contributors.

This equity compensation model ultimately isn’t sustainable if entrusted to a limited group of fully vested physician shareholders. At first, it may be exciting for a physician to anticipate a financial exit when the startup is either acquired or goes public, especially in a climate of declining reimbursements, and when AI is on the rise. Many physicians are looking for a way to shore up their future income possibilities. However, no matter how glamorous or lucrative AI will be for these few, there’s still a lot of work to do and not enough labor available to get the job done.

This approach also raises the problem of built-in bias in the training of machine learning algorithms. If a handful of physicians are tasked with labeling images for algorithms, then these algorithms will be inexorably limited by the biased diagnostic interpretations of the same handful of diagnosticians.

Since retrofitting and annotating billions of images is not an ideal scenario, the correct approach for the future of imaging AI training is to ensure AI readiness.

How Do We Normalize Data For AI?

Dicom Systems has been involved in a number of large-scale projects over the past 14 years that included transferring and de-identifying millions of studies while preparing each one to run through custom algorithms. Often, the imaging studies must be associated with their respective radiology report so that the algorithm may derive useful learning from the correlation of pixel data with actual diagnostic information. However, the reports are typically kept in a separate repository – a RIS or EHR – making it tricky to make the connection in the first place, let alone ensure the images are associated with the correct report.
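
As a rough illustration of the image-to-report association problem, the sketch below pairs study records with report records on a shared key and sets aside anything that cannot be matched cleanly. It assumes both sides carry an accession number, which is a common linking key between imaging and reporting systems but is not guaranteed to be present or unique in every environment. The record layout and function name are hypothetical.

```python
from collections import defaultdict


def link_studies_to_reports(studies, reports, key="AccessionNumber"):
    """Pair each study with its report using a shared key.

    Returns (linked, unmatched, ambiguous):
      linked    - (study, report) pairs with exactly one candidate report
      unmatched - studies with no key or no report for that key
      ambiguous - (study, candidate_reports) where several reports share a key
    """
    reports_by_key = defaultdict(list)
    for report in reports:
        value = str(report.get(key) or "").strip()
        if value:
            reports_by_key[value].append(report)

    linked, unmatched, ambiguous = [], [], []
    for study in studies:
        value = str(study.get(key) or "").strip()
        candidates = reports_by_key.get(value, []) if value else []
        if len(candidates) == 1:
            linked.append((study, candidates[0]))
        elif not candidates:
            unmatched.append(study)
        else:
            ambiguous.append((study, candidates))
    return linked, unmatched, ambiguous
```

Keeping the unmatched and ambiguous cases in separate lists, rather than guessing, reflects the point made above: an image paired with the wrong report is worse for training than an image with no report at all.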

This process would be much more straightforward if machine learning had been taken into account when PACS/MIMPS and EHR vendors started to develop their systems. A few years ago, in another article, we compared Radiology to an Archaeology Dig. Enterprise imaging workflows are powered by IT ecosystems that are often stratified with decades’ worth of deployed solutions, and their builders could not have predicted that these systems would one day be a source of truth for deep learning processes.

The same thought process can be applied to enterprise imaging workflows. If data is adequately and meticulously prepared for the AI algorithm at the same time the image is read by the diagnostician, the reports would not have to be reprocessed at a later date. A study by Young W. Kim and Liem T. Mansfield on delayed diagnoses in radiology, with emphasis on perpetuated errors, found that failure to consult prior imaging studies accounted for about 5% of errors and inaccurate or incomplete clinical history for about 2%. With the proper data normalization algorithm applied, those errors could be reduced to fractions of a percent.

If data standardization requirements could be articulated and adopted across the industry, ensuring imaging and report data are AI-compatible and provided in standard output formats, we would see within a few years an abundance of AI-ready data for future algorithms to leverage. The standard formats can be easily defined using technologies available on the market today, such as FHIR and DICOMweb, but until there are strong incentives or requirements for healthcare organizations to process data and ensure it is AI-ready, we will continue to face many challenges in deploying AI algorithms in live clinical production.
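
To give a sense of what a standard access pattern looks like, here is a small sketch that builds a DICOMweb study search (QIDO-RS) URL for a given accession number. The base URL is a placeholder and the function name is our own. The query parameters shown (an attribute keyword as a filter, plus includefield and limit) are part of the DICOMweb query service, though which attributes a given server supports for matching varies by implementation.

```python
from urllib.parse import urlencode


def qido_study_search_url(base_url: str, accession_number: str, limit: int = 10) -> str:
    """Build a QIDO-RS study search URL filtered by accession number.

    base_url is the root of a DICOMweb service (a placeholder in this sketch).
    """
    query = urlencode({
        "AccessionNumber": accession_number,   # match on this attribute
        "includefield": "StudyDescription",    # ask for an extra attribute in results
        "limit": limit,                        # cap the number of results
    })
    return f"{base_url.rstrip('/')}/studies?{query}"


# Example with a hypothetical server address:
url = qido_study_search_url("https://pacs.example.org/dicomweb", "A123")
```

A client would typically send a GET request to this URL with an Accept header of application/dicom+json and receive matching studies as JSON, which is far easier for a data pipeline to consume than a vendor-specific export.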

Long Term Solutions to Data Normalization For AI

Standardizing and normalizing data is an investment in the future. Just like any personal investment that takes time and effort, whether it’s saving for retirement or exercising regularly, the payoff comes in the long run. Adding tags today is an extra task for radiologists that will generate a return on investment much later. Skipping this step, as we know from personal routines such as brushing our teeth, will lead to extra work and possible problems in the future.

In the health IT world, we have a fairly recent example of the government providing incentives for the adoption of a new standard or process: the Meaningful Use initiative. In 2011, CMS established the Medicare and Medicaid EHR Incentive Programs (now known as the Promoting Interoperability Programs) to encourage eligible professionals, eligible hospitals, and CAHs to adopt, implement, upgrade, and demonstrate meaningful use of certified electronic health record technology (CEHRT). Under meaningful use, each measure had a performance threshold, and each eligible provider was separately responsible for meeting each measure. By the end of 2015, 56% of office-based physicians were participating in the meaningful use program. As of 2016, over 95% of hospitals eligible for the Medicare and Medicaid EHR Incentive Program had achieved meaningful use of certified health IT. Today, meaningful use is history, having been superseded by CMS’ Promoting Interoperability Program in 2018.

In 2020, the U.S. Government Accountability Office developed six policy options for addressing the slow adoption of machine learning in drug development. One of these policy options specifically addresses data standardization: “Policymakers could collaborate with relevant stakeholders to establish uniform standards for data and algorithms.” The opportunity is described as: “help efforts to ensure algorithms remain explainable and transparent, as well as aid data scientists with benchmarking.”

In the United Kingdom, the NHS has started offering incentives to providers who embrace AI, including a long-term plan to make use of AI and machine learning to help clinicians interpret scans.

Could a similar program be established as a way to incentivize radiologists?