This post outlines a framework for diagnosing diseases with artificial intelligence. It draws its inspiration heavily from a transcript of Henry Cohen’s excellent 1943 lecture, “The Nature, Method and Purpose of Diagnosis.” I liked reading Cohen’s lecture because it is clear and concise, and it seemed to fit well with an artificial intelligence approach to diagnosing diseases. My interest in this subject is in developing algorithms that make diagnoses.
There are no diseases, only disease.
This kind of sums the whole thing up and is a great place to start. This quote reinforces the idea that we’re not looking to exhaustively search innumerable avenues. We’re looking to find what, at its root, is bothering the patient.
Why use artificial intelligence to diagnose diseases? To provide a consistently high quality of patient care, one that is repeatable and reliable. Cohen argues that one of the main problems with patient care is inconsistency: the same patient could get different diagnoses from different doctors. Doctors are human, after all, and they clearly differ. How do doctors differ?
- Observational prowess
- Knowledge of symptoms, signs, and syndromes of disease
- Interpretive ability
- Choice of diagnostic labels
Creating and deploying an artificial-intelligence-based system with the same observational and interpretive abilities everywhere it is used, and a consistent taxonomy, would relieve the confusion of conflicting medical advice.
The literature on using artificial intelligence to make medical diagnoses algorithmically is surprisingly timid. Attempts at automatic diagnosis are usually couched in words like “assist” and “consult” and rarely, if ever, take full responsibility for making the diagnosis. One author suggested a reason for this: trepidation about encroaching on the doctor-patient relationship. Algorithmic diagnostic systems are also usually tailored to specific diseases. The task of this post is to outline a general framework for diagnosing disease.
The first stage in the diagnosing disease is the recognition of simple quantitative deviations from normal.
This sentence from Cohen provides a great start for our diagnostic engine. “First stage” implies that a simple state machine is an appropriate structure for the main diagnostic engine, with the first stage gathering information and comparing it to what is considered normal for that patient.
The inputs to the observation stage are the patient’s medical history, their testimony about why they are seeking a diagnosis, and a physical examination. Processing the patient’s testimony can sit anywhere on a spectrum: at one end, natural language processing (NLP) parses and comprehends the patient’s testimony; at the other, the patient chooses from a drop-down menu. The former is much more sophisticated and involved, but it serves the task better because it lets the patient consult the device even when they don’t know what is wrong, which makes the device far more usable. The latter essentially boils down to a choose-your-own-adventure approach: it is trivial to implement, but it does not move the needle significantly beyond a simple internet search, and it leaves the responsibility for the diagnosis, or for deciding whether to consult a doctor, in the hands of the patient. The NLP approach, or something close to it, is preferred.
The NLP component lies outside the main diagnostic engine so that different algorithms can be swapped in and out seamlessly; the diagnostic flow should not depend on the specifics of the NLP used to parse the patient’s testimony. The NLP outputs keywords gleaned from the patient’s testimony, ordered and weighted by “importance.” The NLP will clearly need to know what the observation stage considers important, but it does not need to be embedded in the diagnostic engine.
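A minimal Python sketch of that separation, assuming a plug-in interface: the engine depends only on a `KeywordExtractor` protocol that returns keywords ranked by weight, so any NLP back end can be swapped in. `WeightedKeyword`, `KeywordExtractor`, `SubstringExtractor`, and `observe` are hypothetical names, and the substring matcher is a deliberately naive stand-in for real NLP.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class WeightedKeyword:
    keyword: str
    weight: float  # higher = more important to the observation stage


class KeywordExtractor(Protocol):
    """Interface any NLP back end must satisfy (hypothetical)."""

    def extract(self, testimony: str) -> list[WeightedKeyword]: ...


class SubstringExtractor:
    """Naive stand-in for NLP: case-insensitive substring matching against a
    vocabulary supplied by the observation stage (keyword -> weight).
    Vocabulary keys are assumed to be lowercase."""

    def __init__(self, vocabulary: dict[str, float]) -> None:
        self.vocabulary = vocabulary

    def extract(self, testimony: str) -> list[WeightedKeyword]:
        text = testimony.lower()
        hits = [WeightedKeyword(k, w) for k, w in self.vocabulary.items() if k in text]
        # Most important first, as the observation stage expects.
        return sorted(hits, key=lambda h: h.weight, reverse=True)


def observe(extractor: KeywordExtractor, testimony: str) -> list[str]:
    # The engine depends only on the interface, never on a specific parser.
    return [h.keyword for h in extractor.extract(testimony)]
```

For example, `observe(SubstringExtractor({"chest pain": 0.9, "dizzy": 0.4}), "I get dizzy and have chest pain")` would return `["chest pain", "dizzy"]`. Replacing `SubstringExtractor` with a real NLP model requires no change to `observe`.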
The patient’s testimony, their medical record, and any recorded vital signs are collected in the observation stage and passed to the next stage. The observation stage may, and probably will, order or perform certain biometric tests on the patient and wait for the results before proceeding to the next stage, interpretation. For example, it may perform an electrocardiogram (ECG) and pass on the results when certain watchwords figure prominently in the patient’s testimony.
The observation stage compares physical exam measurements to the values expected given the patient’s data from their medical record (age, height, weight, gender, and so on). It also performs additional tests based on simple keyword matching against the patient’s testimony. The observation stage writes back to the patient’s file and passes everything on to the interpretation stage. The observation stage is like the nurse taking your vital signs; the interpretation stage is like the doctor who comes in to make a diagnosis.
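The two jobs of the observation stage, flagging quantitative deviations from normal and ordering tests on watchwords, might be sketched as follows. The expected ranges and the keyword-to-test table are hypothetical placeholders, not clinical reference values; only the idea of triggering an ECG on watchwords comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    age: int
    gender: str


def expected_range(measurement: str, patient: Patient) -> tuple[float, float]:
    # PLACEHOLDER values for illustration only, not clinical reference ranges.
    # A real system would derive these from the patient's record and
    # validated references.
    if measurement == "resting_heart_rate":
        return (70.0, 120.0) if patient.age < 10 else (60.0, 100.0)
    if measurement == "body_temperature_c":
        return (36.1, 37.5)
    raise KeyError(f"no expected range for {measurement!r}")


# Hypothetical watchword -> test table (the ECG entry mirrors the text's example).
TESTS_BY_KEYWORD = {"chest pain": "ECG", "palpitations": "ECG", "cough": "chest X-ray"}


def observation_stage(
    patient: Patient, vitals: dict[str, float], keywords: list[str]
) -> tuple[dict[str, float], list[str]]:
    """Return (signed deviations from the expected range, tests to order)."""
    deviations: dict[str, float] = {}
    for name, value in vitals.items():
        low, high = expected_range(name, patient)
        if value < low:
            deviations[name] = value - low  # negative: below expected
        elif value > high:
            deviations[name] = value - high  # positive: above expected
    tests: list[str] = []
    for kw in keywords:  # keywords arrive ordered by importance
        test = TESTS_BY_KEYWORD.get(kw)
        if test is not None and test not in tests:
            tests.append(test)  # order each test only once
    return deviations, tests
```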
The output of the observation stage is an up-to-date medical record of the patient. The ability to index the patient’s record temporally is important, as it allows the interpretation stage to analyze how a particular condition has changed over time.
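One way to make the record indexable by time, assuming each measurement is a timestamped number: keep a sorted series per measurement and slice it with binary search from the standard `bisect` module. `PatientRecord`, `history`, and `change` are illustrative names.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Optional


class PatientRecord:
    """Minimal time-indexed record: measurement name -> sorted (time, value)."""

    def __init__(self) -> None:
        self._series: dict[str, list[tuple[datetime, float]]] = defaultdict(list)

    def add(self, name: str, when: datetime, value: float) -> None:
        # insort keeps each series ordered by timestamp, whatever the arrival order.
        bisect.insort(self._series[name], (when, value))

    def history(self, name: str, start: datetime, end: datetime) -> list[tuple[datetime, float]]:
        """All readings of `name` with start <= time <= end, oldest first."""
        series = self._series.get(name, [])
        times = [t for t, _ in series]
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        return series[lo:hi]

    def change(self, name: str, start: datetime, end: datetime) -> Optional[float]:
        """Last minus first reading in the window; None if fewer than two."""
        window = self.history(name, start, end)
        if len(window) < 2:
            return None
        return window[-1][1] - window[0][1]
```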
The interpretation stage can ask the patient questions directly to obtain additional information. It is better to keep this ability in the interpretation stage than to go back to the observation stage, because the new information will probably drive branching within this stage of the diagnosis. To simplify things, the query/response format in the interpretation stage does not need to be as linguistically open-ended as in the observation stage. In the observation stage, natural language processing is needed because we may not yet know what we are trying to figure out, so we have to derive meaning from a very complex set of possibilities. In the interpretation stage, however, we are seeking specific, targeted information, so the response options should be limited. This fits well into a drop-down box; an example is below.
Have you felt chest pain?
The last three options, “I don’t know,” “I don’t understand the question,” and “It’s complicated,” improve the query/response routine’s chances of getting a useful answer out of the patient.
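A sketch of such a closed-form question, assuming “Yes” and “No” are among the other options (the full option list is not reproduced here). The three escape answers map to hypothetical follow-up actions; the action names are placeholders for whatever the interpretation stage does to get a usable answer.

```python
from enum import Enum


class Answer(Enum):
    # YES/NO are assumed; the last three are the options named in the text.
    YES = "Yes"
    NO = "No"
    DONT_KNOW = "I don't know"
    DONT_UNDERSTAND = "I don't understand the question"
    COMPLICATED = "It's complicated"


# Hypothetical follow-up strategies for the three escape answers.
FOLLOW_UP = {
    Answer.DONT_KNOW: "order_objective_test",
    Answer.DONT_UNDERSTAND: "rephrase_question",
    Answer.COMPLICATED: "split_into_narrower_questions",
}


def parse_choice(label: str) -> Answer:
    # A drop-down can only return one of the fixed labels; anything else raises.
    return Answer(label)


def handle(question: str, answer: Answer) -> tuple[str, str]:
    """Return (action, detail) for the interpretation stage to branch on."""
    if answer in (Answer.YES, Answer.NO):
        return ("branch", f"{question} -> {answer.value}")
    return (FOLLOW_UP[answer], question)
```

Because the response space is an enum, every branch of the interpretation stage can be enumerated and tested, which is the point of limiting the options.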
The AI for medical diagnosis will need to reason anatomically; that is, it will have to move from one part of the body to another in search of interpretations that fit the existing data. Cohen considered the “fundamental tripod of medicine” to be anatomy, physiology and pathology. Of these, anatomy lends itself well to being described as a connectivity graph. The AI could have different graphs for different systems in the body, such as the circulatory, respiratory, and endocrine systems, each describing how different parts of the body are connected. A simple 1 or 0 (connected or not connected) would probably suffice, because the AI is only looking for what to try next: once it traverses the graph from, say, the heart to the liver, it uses “liver” as a keyword to look up potential next steps.
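A sketch of the 1/0 connectivity idea as an adjacency matrix. The organ list and connections are a toy, hand-made subset for illustration, not an anatomical model, and the `NEXT_STEPS` table is hypothetical.

```python
# Toy subset of a circulatory-system graph (illustrative, not anatomically
# complete). 1 = connected, 0 = not connected.
ORGANS = ("heart", "lungs", "liver", "kidneys", "brain")
CIRCULATORY = (
    # heart lungs liver kidneys brain
    (0, 1, 1, 1, 1),  # heart
    (1, 0, 0, 0, 0),  # lungs
    (1, 0, 0, 0, 0),  # liver
    (1, 0, 0, 0, 0),  # kidneys
    (1, 0, 0, 0, 0),  # brain
)

# Hypothetical keyword -> next-step table consulted after each hop.
NEXT_STEPS = {
    "liver": ["liver function panel"],
    "kidneys": ["renal function panel"],
}


def neighbors(organ: str, names=ORGANS, matrix=CIRCULATORY) -> list[str]:
    """Organs directly connected to `organ` in the given system graph."""
    row = matrix[names.index(organ)]
    return [names[j] for j, connected in enumerate(row) if connected]


def next_steps(from_organ: str) -> dict[str, list[str]]:
    """Move one hop along the graph and use each neighbor as a lookup key."""
    return {n: NEXT_STEPS[n] for n in neighbors(from_organ) if n in NEXT_STEPS}
```

Each body system would get its own matrix; `neighbors` takes the names and matrix as parameters so the same lookup works for all of them.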
What about a Bayesian approach to interpretation? I would stay away from it because it relies on “models that are subjective, and the resulting inference depends greatly on the model selected.”  We are seeking a framework that can be used for diagnosing a wide range of diseases, not tuned to specific diseases. The framework must be general and its reasoning mathematical. The reasoning itself cannot have a subjective foundation.
The output of the interpretation stage is a provisional medical decision about which steps to take next. If the algorithm does not have enough information to reach a diagnosis, it does not need to force one. It can order more tests, or suggest a therapy to alleviate the condition and have the patient report back. When the patient reports back, the process starts again at the observation stage.
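The loop between observation and interpretation is what gives the simple state machine structure mentioned earlier its shape. A minimal sketch follows; the stage names other than observation and interpretation are illustrative labels (`SYMBOLIZATION` borrows the next section’s heading), and `run` fakes the interpretation stage’s confidence with a hypothetical visit counter.

```python
from enum import Enum, auto


class Stage(Enum):
    OBSERVATION = auto()
    INTERPRETATION = auto()
    AWAIT_PATIENT = auto()   # tests ordered or therapy suggested
    SYMBOLIZATION = auto()   # label the diagnosis, recommend corrective action
    DONE = auto()


def next_stage(stage: Stage, enough_information: bool = False) -> Stage:
    if stage is Stage.OBSERVATION:
        return Stage.INTERPRETATION
    if stage is Stage.INTERPRETATION:
        # Provisional decision: conclude, or gather more and come back.
        return Stage.SYMBOLIZATION if enough_information else Stage.AWAIT_PATIENT
    if stage is Stage.AWAIT_PATIENT:
        # When the patient reports back, start again at observation.
        return Stage.OBSERVATION
    if stage is Stage.SYMBOLIZATION:
        return Stage.DONE
    raise ValueError(f"no transition out of {stage}")


def run(visits_until_confident: int) -> list[Stage]:
    """Trace the stages visited, assuming (as a stand-in) that interpretation
    becomes confident after a fixed number of visits."""
    stage, visits, trace = Stage.OBSERVATION, 0, [Stage.OBSERVATION]
    while stage is not Stage.DONE:
        if stage is Stage.INTERPRETATION:
            visits += 1
        stage = next_stage(stage, visits >= visits_until_confident)
        trace.append(stage)
    return trace
```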
Symbolization, Corrective Action and Evaluating an Algorithm
Even after the diagnostic engine reaches the heart of the matter (what’s wrong with the patient), there is still much more work to do. First, it must encode the diagnosis in a way that allows it to treat the same disease the same way, every time, from patient to patient. The Oxford American Dictionary defines syndrome as:
syndrome n. 1) a group of concurrent symptoms of a disease
If the list of syndromes for a disease is complete enough, it will uniquely identify the disease. Cohen assesses syndromes in terms of the site, the functional disturbances, and the cause of disease. This should be enough information to encode the disease universally. Notice that we have not included any prescriptive remedy in the encoding: remedies vary from patient to patient, because patients with the same disease at the same site may need different courses of action based on age, gender, and so on.
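A sketch of such an encoding, assuming each of Cohen’s three elements can be reduced to normalized strings: a frozen dataclass is hashable and compares by value, so the same disease always produces the same key. `DiagnosisCode` is a hypothetical name, and the example values used with it are illustrative only.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisCode:
    """Hypothetical encoding built from Cohen's three elements.
    There is deliberately no treatment field: remedies vary by patient."""
    site: str
    functional_disturbances: frozenset[str]
    cause: str

    @classmethod
    def of(cls, site: str, disturbances: Iterable[str], cause: str) -> "DiagnosisCode":
        # Normalize so that ordering and letter case never change the code.
        return cls(
            site.strip().lower(),
            frozenset(d.strip().lower() for d in disturbances),
            cause.strip().lower(),
        )
```

Two clinicians (or two runs of the engine) that list the same disturbances in a different order still produce equal, identically hashed codes, which is what lets the same disease be handled the same way every time.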
Second, we must figure out the cause of the disease and its implications.
Too frequently we have been content with a diagnostic label without investigating its implications.
Causation implies a search for antecedents, and not for the ultimate — the final — cause of all things. This means not a single antecedent or even a chain of antecedents, but a whole interlacing network of them.
This points directly to graph theory for reasoning through the causes and implications of the disease. Somehow we’d need to map bodily function to a manifold and be able to traverse it. This is significantly more complicated than the simple graph traversal in the interpretation stage, where we are merely seeking clues to guide our decision-making tree. There, we have already mapped several of the body’s systems (circulatory, skeletal, respiratory) to graphs and are simply looking them up. In this stage, we would likely need to build the mapping on the fly from what we learned in the previous stage.
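A manifold is beyond a short example, but the “interlacing network” of antecedents can at least be sketched as a directed graph built on the fly, with reachability queries in both directions: upstream for causes, downstream for implications. This swaps a plain directed graph in for the manifold; `CausalNetwork` and its methods are illustrative assumptions.

```python
from collections import defaultdict


class CausalNetwork:
    """Directed 'antecedent -> consequence' graph, built as findings arrive."""

    def __init__(self) -> None:
        self._up: dict[str, set[str]] = defaultdict(set)    # consequence -> antecedents
        self._down: dict[str, set[str]] = defaultdict(set)  # antecedent -> consequences

    def add_link(self, antecedent: str, consequence: str) -> None:
        self._up[consequence].add(antecedent)
        self._down[antecedent].add(consequence)

    def antecedents(self, finding: str) -> set[str]:
        """Every node upstream of `finding`, across all paths, not one chain."""
        return _reach(finding, self._up)

    def implications(self, finding: str) -> set[str]:
        """Every node downstream of `finding`."""
        return _reach(finding, self._down)


def _reach(start: str, edges: dict[str, set[str]]) -> set[str]:
    # Iterative depth-first search; `seen` also guards against cycles.
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(start)  # a cycle back to the start is not its own cause
    return seen
```

For example, `antecedents(x)` returns every node upstream of `x` along all paths, which matches Cohen’s point that causation is a network rather than a single chain.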
In addition to affixing a label to the diagnosis, this stage outputs a recommended corrective action.
The main aim of diagnosis, that of providing the rational basis for treatment and prognosis…
The main implementation decision here is whether to spend more time and energy investigating causes and implications and keep the treatment recommendation and prognosis estimate simple, or vice versa. For instance, if the algorithm is good at figuring out causation and implication, the treatment and prognosis could perhaps be a simple lookup table. If causation/implication is simple, then we’ll want something more sophisticated for treatment/prognosis. I prefer the former. Because causation/implication and prognosis/treatment are so tightly coupled, I consider them part of the same stage of diagnosis, even though they may use separate artificial intelligence approaches.
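If causation is handled well, the treatment side can be as simple as the lookup table described above. A sketch, assuming the diagnosis has already been reduced to a `(site, cause)` key and that age is the only patient-specific factor; the table entries are placeholders, not clinical content.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified, hypothetical diagnosis key: (site, cause).
DiagnosisKey = tuple[str, str]


@dataclass(frozen=True)
class Recommendation:
    treatment: str
    prognosis: str


# Placeholder entries only; real content would come from clinical guidelines.
# Each diagnosis maps to variants checked in order: (minimum age, recommendation).
TREATMENT_TABLE: dict[DiagnosisKey, list[tuple[int, Recommendation]]] = {
    ("site_x", "cause_y"): [
        (65, Recommendation("plan_A_older_adult", "prognosis_A2")),
        (18, Recommendation("plan_A_adult", "prognosis_A1")),
        (0, Recommendation("plan_A_pediatric", "prognosis_A0")),
    ],
}


def recommend(key: DiagnosisKey, age: int) -> Optional[Recommendation]:
    """The diagnosis selects the row; patient data selects the variant."""
    for min_age, rec in TREATMENT_TABLE.get(key, []):
        if age >= min_age:
            return rec
    return None  # unknown diagnosis: e.g. escalate to a human clinician
```

Keeping patient-specific variation in the table, rather than in the diagnosis key, preserves the rule that the same disease is always encoded the same way.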
Evaluating a Medical Diagnosis Algorithm
‘…common sense pressed for time accepts and acts on acceptance.’ We physicians are often confronted by a situation in which we have to give a provisional verdict on the admittedly inadequate available evidence.
For any algorithm, execution time will be crucial. The algorithm will have to give the patient feedback quickly, even before it has a final diagnosis. Keeping the patient informed as the algorithm works through its reasoning is a user experience concern that will help the patient become comfortable seeking a diagnosis from a machine rather than a human doctor. Time is of the essence: the algorithm should not get bogged down spending clock cycles on getting every corner case right; it should trade that for reaching the most common conclusions quickly.
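One way to honor that trade-off is an anytime-style loop: check candidate conclusions in order of how common they are and, when the time budget runs out, return the best answer found so far, flagged as provisional. The prevalence figures and the scoring function are hypothetical inputs; the injectable `clock` exists only so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Optional


def provisional_diagnosis(
    candidates: list[tuple[str, float]],
    score: Callable[[str], float],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[Optional[str], bool]:
    """Score (name, prevalence) candidates, most common first, until the deadline.
    Returns (best candidate so far, whether every candidate was checked)."""
    deadline = clock() + budget_s
    best, best_score = None, float("-inf")
    # Most prevalent first, so an early stop still covers the common cases.
    for name, _prevalence in sorted(candidates, key=lambda c: c[1], reverse=True):
        if clock() >= deadline:
            return best, False  # out of time: answer is provisional
        s = score(name)
        if s > best_score:
            best, best_score = name, s
    return best, True
```

A `False` flag tells the interface to present the answer as provisional and keep the patient informed while any remaining work continues.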
The New England Journal of Medicine published the results of presenting case records to individual clinicians and to discussant groups, including whether they correctly diagnosed the disease in each case. Of the 43 case studies, individual clinicians were correct 65% of the time and the discussant groups 80% of the time. The study also asked the participants to assess their confidence in each diagnosis.
It seems that being right 80% of the time is good enough; at least, it was the state of the art in 1969. That raises another important question when testing medical diagnostic AI: what are we comparing it against? If medical diagnostic AI can approach an 80% success rate, it will be a viable alternative to seeing a doctor. Even at roughly a 50% success rate, it’s a reasonable alternative. For some purposes, such as the Tricorder X-Prize, this should be good enough, since the intended use of a Tricorder X-Prize device is to help people decide whether they should go see a doctor.
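Comparing an algorithm against those baselines is simple arithmetic. A sketch, assuming each case has a single correct diagnosis to match against; the baseline figures are the 65% and 80% reported above, and the function names are illustrative.

```python
# Baselines reported in the 1969 NEJM case-record study (43 cases).
BASELINES = {
    "individual clinicians": 0.65,
    "discussant groups": 0.80,
}


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of cases where the predicted diagnosis matches the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def compare_to_baselines(acc: float) -> dict[str, bool]:
    """For each baseline, whether the algorithm matches or beats it."""
    return {name: acc >= rate for name, rate in BASELINES.items()}
```

With only 43 cases, each case shifts accuracy by about 2.3 percentage points (1/43), so small differences from a baseline should be read with care.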
Cohen, Henry. “The Nature, Method and Purpose of Diagnosis.” The Skinner Lecture, 1943. Cambridge, UK: University Press, 1943.
Hogg, Robert, and Allen Craig. Introduction to Mathematical Statistics. 5th ed. NJ: Prentice-Hall, 1995.
“Case Records of the Massachusetts General Hospital (Case 30-1969).” New England Journal of Medicine 281 (1969): 206-213.