Hey Siri…does Sue Smith have a family history of high blood pressure?

By Florent Saint-Clair

While speech recognition was a necessary building block that ultimately made Natural Language Processing (NLP) possible, equating it with NLP is like comparing a 1921 Ford Model T with a 2021 self-driving Tesla.

A Tesla Model S next to a Ford Model T – Source: DigitalTrends

From the mind of Arthur C. Clarke to the imagination and vision of Stanley Kubrick, scores of science fiction fans continue to rave about 2001: A Space Odyssey. In 1968, when the movie was first released, the year 2001 seemed distant enough that any date beyond the year 2000 could capture the imagination of people young and old. Would there be flying cars? What would fashion be like? (We’re still not wearing aluminum clothes, by the way, although with global warming that may be coming soon.) The simple switch of a digit from a 1 to a 2, one millennium to the next, held immeasurable omen and promise to an entire generation.

One of the most notable protagonists in the movie was not a human. It was HAL, the computer that took control of the spaceship and, therefore, also took control of the fate of the humans on board. HAL drew us in thanks to its (or his) convivial and relatable human-like personality. Apple’s Siri, Amazon’s Alexa, and any other AI created to be more relatable to humans are all descendants of HAL.

As end users become more and more comfortable with the idea of interacting with machines, user interfaces are also evolving beyond physical devices such as a mouse and keyboard – cue Scotty, the resourceful Chief Engineer in Star Trek IV: The Voyage Home, when he attempts to interrogate a 20th-century computer by speaking to it. Incredulous that the computer can’t understand him, he decides to speak into the mouse, then resigns himself to the reality that he will need to input his request by typing on a keyboard.

Star Trek writers and creators were immensely visionary in these highly entertaining scenes. Their basic premise was that in the future, human/computer interfaces would leverage NLP, although they didn’t yet refer to it as such. At the time these movies were written, the vast majority of the technologies depicted belonged to the realm of fiction.

Fast forward to 2021: we now speak into our smart watches and ask Alexa to turn lights on and off, choose a music playlist, remotely turn on the oven, and change the temperature settings in our dwellings.

In the near future, NLP will help us avoid countless medical errors. The kinds of things that NLP can achieve for the evolution of healthcare go far beyond what rudimentary Speech Recognition could ever achieve. There is an important distinction between speech recognition and speech understanding. Speech recognition is mechanical listening, while NLP is intelligent interpretation. 

Speech Recognition vs. Speech Understanding

Voice dictation today, although much more sophisticated and capable than a few years ago, relies heavily on the speaker’s deliberate dictation of punctuation such as “comma,” “period,” “new paragraph,” etc. This is voice recognition: not bad, but not great either. When voice processing is capable of understanding the speaker’s tone of voice, it is then able to place a period at the end of a sentence without the speaker having to dictate the word “period.” This incremental improvement inches toward true NLP, much more so than mechanical voice recognition.
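To make the “mechanical” side of this distinction concrete, here is a minimal Python sketch of how dictated punctuation commands could be turned into symbols after transcription. The command table and the function name are assumptions made for illustration; they are not taken from any particular dictation product.

```python
import re

# Hypothetical mapping from dictated commands to the text they produce.
SPOKEN_PUNCTUATION = {
    "new paragraph": "\n\n",
    "comma": ",",
    "period": ".",
}


def apply_dictated_punctuation(transcript: str) -> str:
    """Replace spoken punctuation commands in a raw transcript with symbols."""
    # Try longer commands first so multi-word commands are matched whole.
    commands = sorted(SPOKEN_PUNCTUATION, key=len, reverse=True)
    pattern = re.compile(
        r"\s*\b(" + "|".join(re.escape(c) for c in commands) + r")\b",
        re.IGNORECASE,
    )
    text = pattern.sub(lambda m: SPOKEN_PUNCTUATION[m.group(1).lower()], transcript)
    # Remove stray spaces left around paragraph breaks.
    text = re.sub(r"[ \t]*\n\n[ \t]*", "\n\n", text)
    return text.strip()


raw = ("the lungs are clear period no pleural effusion comma "
       "no pneumothorax period new paragraph impression normal chest period")
print(apply_dictated_punctuation(raw))
```

Because this is pure pattern matching, it has no idea when “period” is an ordinary word: “over a period of decades” would be mangled. Telling a command from a word requires context, which is the step from recognition toward understanding described above.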

One step further is for NLP to accurately render the intended message across many possible human accents and ethnicities. A medical term dictated by a native Bostonian radiologist may sound vastly different when uttered by a physician who grew up in Germany or India, then received their education in Britain. We once collaborated with a Saudi radiologist educated in Sweden, who perfectly leveraged NLP to dictate reports into his Sectra PACS. Navigating the complex array of ethnicities and accents is one of the many benefits of training NLP algorithms with a plurality of data, thus avoiding a built-in bias in the NLP’s ability to recognize and process human speech.

A physician using voice dictation on a mobile device.

So far, we’ve only analyzed NLP from the perspective of the human to computer interface. The true potential of NLP exists beyond the interaction of humans with computers. NLP applied to text and pixels has the potential to become a powerful analytics tool capable of scanning, analyzing, organizing, and exploiting vast amounts of data accumulated over a period of decades, and across many different sources of data. This plurality in the origin of medical data is where AI can truly uncover useful patterns that would otherwise have no chance of appearing, even if humans applied years of effort to the task.

NLP is a powerful building block toward precision medicine and effective population health. The art and science behind truly useful clinical decision support is in allowing NLP and AI to asynchronously access and correlate data across health systems, borders, patient populations, ethnicities, languages, patient charts, prescriptions, reports, medical images, genomics data, pathology, and beyond. 

Now, let’s combine all these elements and visualize the possibilities for physicians in our immediate future: physicians can use NLP to interrogate a patient’s health record, much as they would question a nursing assistant or another physician. “Hey Siri, show me Sue Smith’s lab results from last January” or “show me Patrick Durant’s last five brain MRIs.” How many severe allergic reactions to contrast could we avoid if an allergy red flag appeared when a physician places an order? The possibilities are quite literally endless.

Today, 85 percent of radiologists use a speech recognition system to record their reports. When it comes to systems integration, more than half (53 percent) of provider organizations are integrating speech recognition into their diagnostic and clinical workflows. Outside the healthcare world, speech recognition is becoming ubiquitous, particularly in the post-2011 Age of Siri. Last year over 150 million smart speakers were sold globally according to Strategy Analytics research. People all over the world have welcomed bots into their homes via smart speakers, freely sharing their private thoughts, questions, and information with tech giants that own the technology. In the upcoming year, game developers will be able to use sophisticated neural networks to mimic human voices. A natural interface that is inevitable, fast and intuitive, voice recognition has the irresistible pull of an emotional connection that we trust. 

Voice recognition technology powered by AI can now be used to not only transcribe speech and radiology reports, but also analyze and predict the outcome of a conversation based on its tone and the words used. Is radiology ready to embrace NLP? And, more importantly, is NLP ready for radiology?

Why Voice and Speech Recognition

Health systems collect vast amounts of data, from hospital records and patient medical records, to results of medical exams, and IoT (Internet of Things) devices. Biomedical research also generates a significant portion of data relevant to public healthcare. Much of the data in its present form, and given the amount of time and effort it would take humans to read and reformat, is difficult to exploit productively. It is not ready to be consumed by algorithms that would make it meaningful and useful for decision support. Meanwhile, as healthcare shifts from a fee-for-service model to value-based care, the stakes for data-driven decision support have never been higher.

This is where Natural Language Processing, a subcategory of Artificial Intelligence, comes into play. NLP-based chatbots already possess the ability to mimic human behavior and to execute a myriad of tasks. Chatbots can help solve issues while essentially holding a conversation in plain English. When it comes to implementing the same on a much larger scale, like a hospital, NLP can potentially be used to parse information and extract critical strings of data, offering an opportunity to finally make sense of the terabytes of unstructured data generated in the clinical setting.

Medical professionals, and diagnosticians in particular, have long adopted NLP in their daily practice. What used to be a voice recording that would ultimately be transcribed by a human has evolved into voice-recognition software, ultimately causing the near extinction of the medical transcriptionist profession. While there are a few old-school holdover physicians who refuse to adopt NLP, the vast majority of medical professionals come out of medical school fully trained on NLP and cannot fathom the idea of going back to voice recording and word processors (or typewriters if we go back even further).

Our NLP integration partner Nuance (now a part of Microsoft) has acquired the vast majority of NLP technologies, with the exception of Dolby Labs and M*Modal. Dragon Dictation software was the first technology that allowed physicians, at scale, to begin to trust voice recognition in their everyday professional lives. It took a substantial amount of time and effort to train Dragon to dictate accurately; however, the end result enabled physicians to speak into a microphone and immediately see the output of their dictation on a screen, in a Word document. It was at this point that medical transcription evolved from a transcriptionist converting a physician’s voice recording into text to an editor correcting voice recognition errors in a fully dictated report.

Recognizing Medical Terms vs. Everyday Speech

One of the key challenges that NLP needed to overcome was to accurately transcribe highly technical medical terms. Early on, M*Modal excelled because its NLP engine was fundamentally trained to specialize in medical vocabulary. Conversely, M*Modal’s engine could not, and should not be expected to, do a good job of transcribing a trivial grocery list – a task that was better suited to Dragon during that era.

Consumers are comfortable using a digital personal assistant such as Siri to transcribe a grocery shopping list.

It is a distinctive attribute of AI that it can continuously improve its performance with each incremental new input. PowerScribe 360 has continued to evolve, and so has M*Modal (now a part of 3M). As more refinement takes place in these NLP algorithms, the engines will naturally tend to become more versatile and will ultimately become indistinguishable from one another in their ability to understand and process human input, including visual input and touch input. 

One cannot refer to Natural Language Processing and exclude other, non-verbal modes of human communication – ultimately NLP needs to incorporate sign language (visual) and braille (touch) in order to bring its efficiency to all human modes of communication, and thereby make it universally possible for all patients, physically challenged or not, to communicate with their caregivers.


An interesting branch of NLP is the combination of optical character recognition (OCR) and NLP for the purpose of detecting language among pixels. A clear distinction should be made between unstructured text, which is still searchable text, versus words on a scanned document, which are pixel representations of words – not actual words.

In 1929 Belgian Surrealist René Magritte painted one of his most famous pieces, The Treachery of Images, which bears the caption “Ceci n’est pas une pipe” (“This is not a pipe”). In this piece Magritte points out that a word is an arbitrary construct that doesn’t necessarily represent actual meaning to the person viewing it. In this example, we are not looking at a pipe, but rather a visual representation of something called a pipe in some cultures.

Rene Magritte’s famous painting The Treachery Of Images

Recognizing objects and patterns in medical images or documents can be fraught with misinterpretation and errors. Recognizing the meaning of words that are represented by pixels can be even more challenging because words must be first identified, then processed, then understood with the right context and knowledge of that word.

This branch of NLP, combined with OCR, has become particularly important in the field of machine learning in medical imaging. As millions of images can be leveraged to unlock valuable knowledge of pathologies for AI, it’s also important to ensure no protected health information (PHI) is left over in the dataset being fed to the algorithm. De-identification of metadata is a relatively straightforward, albeit tedious, process. However, scanned documents, non-DICOM data, Secondary Capture images, and handwritten notes on documents can all constitute potential HIPAA violations if pixel-based PHI is still present in the dataset. Because of these potential HIPAA vulnerabilities, NLP cannot yet be 100% trusted to catch and de-identify PHI that exists at the pixel level.
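As a rough illustration of the second half of that pipeline, the Python sketch below scans text that is assumed to have already come out of an OCR step and flags PHI-like patterns. The pattern table, the function names, and the sample string are all hypothetical; a real de-identification policy covers far more identifier types than these three.

```python
import re

# Illustrative patterns only (assumptions, not a complete PHI definition).
PHI_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_phi(ocr_text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every PHI-like pattern found."""
    hits = []
    for label, pattern in PHI_PATTERNS.items():
        hits.extend((label, m.group(0)) for m in pattern.finditer(ocr_text))
    return hits


def redact(ocr_text: str) -> str:
    """Replace each PHI-like match with a bracketed placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        ocr_text = pattern.sub(f"[{label.upper()}]", ocr_text)
    return ocr_text


# Hypothetical text recovered from a scanned document.
scanned = "Patient: Sue Smith MRN: 00123456 DOB 01/02/1960 Call 555-123-4567"
print(find_phi(scanned))
print(redact(scanned))
```

Note that the patient name passes through untouched: fixed patterns catch well-formatted identifiers but not free-form ones, and OCR errors on handwriting make even the well-formatted ones unreliable. That gap is why pixel-level de-identification still needs human review.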

NLP Backgrounder

Speech recognition cannot be equated with NLP. NLP is a branch of Artificial Intelligence (AI) that studies how machines understand human language. Its goal is to build systems that can make sense of text and perform tasks like translation, grammar checking, or topic classification. NLP helps computers communicate with humans in their own language.

The two major types of NLP techniques include:

  1. Syntactic analysis ‒ or parsing ‒ analyzes text using basic grammar rules to identify sentence structure, how words are organized, and how words relate to each other.
  2. Semantic analysis focuses on capturing the meaning of text. First, it studies the meaning of each individual word (lexical semantics). Then, it looks at the combination of words and what they mean in context.
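The two techniques can be contrasted with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the hand-written part-of-speech lexicon, the two senses of “discharge,” and the cue words attached to each sense. Production NLP systems learn these from large corpora rather than from hard-coded tables.

```python
# Toy part-of-speech lexicon (illustrative only).
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "patient": "NOUN", "nurse": "NOUN", "wound": "NOUN",
    "discharge": "NOUN", "papers": "NOUN",
    "shows": "VERB", "signed": "VERB",
    "purulent": "ADJ",
}

# Toy sense inventory: each sense of an ambiguous word lists its cue words.
SENSES = {
    "discharge": {
        "fluid from a wound": {"wound", "purulent", "drainage", "infection"},
        "release from hospital": {"papers", "signed", "home", "nurse"},
    }
}


def tokenize(sentence):
    return sentence.lower().rstrip(".").split()


def tag(sentence):
    """Syntactic step: label each token with a part of speech."""
    return [(tok, POS_LEXICON.get(tok, "UNK")) for tok in tokenize(sentence)]


def noun_phrases(tagged):
    """Syntactic step: group runs of DET/ADJ/NOUN tokens into noun phrases."""
    phrases, current = [], []
    for word, pos in tagged:
        if pos in {"DET", "ADJ", "NOUN"}:
            current.append(word)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def disambiguate(word, sentence):
    """Semantic step: pick the sense whose cue words best overlap the context."""
    context = set(tokenize(sentence)) - {word}
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


for s in ("The wound shows purulent discharge.",
          "The nurse signed the discharge papers."):
    print(noun_phrases(tag(s)), "->", disambiguate("discharge", s))
```

The syntactic functions only describe structure: both sentences contain a noun phrase built around “discharge.” The semantic function is what tells the two uses apart, by looking at the surrounding words.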

Combating COVID-19 with NLP

As with a multitude of technologies, the COVID-19 pandemic dramatically accelerated the adoption of NLP in healthcare. AI-powered chatbots and virtual assistants were at the forefront of the fight against COVID: they helped screen and triage patients, conducted surveys, provided vital COVID-related information, and more. 2020 saw more conversational agents in telemedicine (from FAQ chatbots and virtual consultants to chatbot therapists) that made health services more accessible to people who couldn’t leave their homes. At Massachusetts General Hospital in Boston, physician researchers are exploring the use of AI-powered robots to obtain vital signs and deliver medicine in COVID-19 surge clinics, allowing healthcare staff to avoid potentially dangerous human contact. Beyond the hospital walls, NLP was shown to be effective at detecting fake news about COVID-19. The pandemic may have accelerated healthcare NLP, but unlike sourdough bread and dalgona coffee, this pandemic-induced trend is here to stay.

NLP has found applications in healthcare ranging from the most cutting-edge solutions in precision medicine applications to the simple job of coding a claim for reimbursement or billing.

The driving factors behind NLP in Healthcare are based on its potential to:

  1. Handle the surge in clinical data
  2. Support value-based care and population health management
  3. Empower patients with health literacy
  4. Improve patient-provider interactions with EHR
  5. Address the need for higher quality of healthcare
  6. Identify patients who need improved care

NLP in Radiology: The Free Text / Unstructured Data Challenge

Radiology reporting has generated large quantities of digital content, which is potentially a valuable source of information for improving clinical care and supporting research. There are various guidelines for effective reporting of diagnostic imaging, but overall, a typical report consists of free text organized in a number of standard sections. Because radiology reports are free text, converting them into a computer-manageable representation is a challenge. NLP can convert this unstructured text into a structured form, thus enabling automatic identification and extraction of information from radiology reports. For example, NLP could help classify patients by groups, or by codes from a clinical coding system.
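A minimal Python sketch of that conversion follows. It assumes reports use upper-case section headings followed by a colon; the heading list, the function names, the sample report, and the single-condition labeling rule are all hypothetical simplifications, not a description of any vendor’s engine.

```python
import re

# Assumed section headings; real reports vary by site and template.
SECTION_HEADINGS = ["INDICATION", "TECHNIQUE", "COMPARISON", "FINDINGS", "IMPRESSION"]
HEADING_PATTERN = re.compile(
    r"^(" + "|".join(SECTION_HEADINGS) + r"):\s*", re.MULTILINE
)


def split_sections(report: str) -> dict[str, str]:
    """Turn a free-text report into a {heading: body} dictionary."""
    matches = list(HEADING_PATTERN.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        # A section body runs until the next heading or the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1)] = " ".join(report[m.end():end].split())
    return sections


def pneumonia_label(sections: dict[str, str]) -> str:
    """Crude rule-based grouping based on the impression section only."""
    impression = sections.get("IMPRESSION", "").lower()
    if "pneumonia" not in impression:
        return "not mentioned"
    if re.search(r"\bno (evidence of )?pneumonia\b", impression):
        return "negative"
    return "positive"


report = """INDICATION: Cough and fever.
FINDINGS: Right lower lobe consolidation.
No pleural effusion.
IMPRESSION: Right lower lobe pneumonia.
"""
structured = split_sections(report)
print(structured)
print(pneumonia_label(structured))
```

Even this toy version shows why negation matters: a keyword search for “pneumonia” alone would place “No evidence of pneumonia” in the wrong patient group.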

Eventually, natural language processing tools might be able to bridge the gap between the insurmountable volume of data in healthcare generated every day and the limited cognitive capacity of the human brain.

The Promise of Healthcare NLP to Improve Outcomes

There are high hopes for NLP in the healthcare world. Healthcare NLP can improve function in a number of areas:

  1. EHR usability
  2. Predictive analytics
  3. Phenotyping
  4. Quality improvement

For instance, NLP can enable an EHR interface that makes patient encounter information easier for clinicians to find, surfacing buried data and supporting diagnoses they might otherwise have missed.

When it comes to predictive analytics, an often-cited case study is the use of NLP to predict suicide attempts by monitoring social media. Suicide is among the 10 most common causes of death, as assessed by the World Health Organization. The study shows that there are quantifiable signals present in the language used on social media that machine learning algorithms can use, with relatively high precision, to separate users who would attempt suicide from those who would not. The authors of the study found that the NLP algorithm’s ability to predict which users would attempt suicide is good enough to warrant considering how similar methodologies might fit into a clinical application. While the technology enables scalable screening for suicide risk, the potential benefits must be balanced against the ethics and privacy concerns that come with mining data from social media networks.

Phenotyping is another promising application of NLP, with the premise that it helps clinicians categorize patients to provide a deeper, more focused look into data (e.g., listing patients who share certain traits) and the ability to compare patient cohorts. NLP also allows for richer phenotypes. For example, pathology reports contain a lot of information, such as a patient’s condition, location of a growth, stage of a cancer, procedure(s), medications, and genetic status. A UK pilot study in myocardial infarction (MI) and death found that using free text, in addition to structured data, increased the recorded proportion of patients with chest pain in the week prior to MI from 19% to 27%, and differentiated between MI subtypes in a quarter more patients than structured data alone.

NLP will undoubtedly continue to evolve and improve in future years as it adapts and learns from its technology predecessors, similar to how Tesla is continuously evolving and learning. The evolution from the Model T to Tesla was one of gradual technological improvement, and NLP will follow the same path on an accelerated timeline thanks to AI.