## The Artificial Intelligence of Diagnosing Diseases

March 7th, 2012

This post sketches a framework for diagnosing diseases with artificial intelligence.  It draws heavily on a transcript of Henry Cohen’s excellent 1943 lecture, “The Nature, Method and Purpose of Diagnosis.”  I liked reading Cohen’s lecture because it is clear and concise and fits well with an artificial intelligence approach to diagnosing diseases.  My interest in this subject is in developing algorithms to make diagnoses.

There are no diseases, only disease.

[1]

This quote sums the whole thing up and is a great place to start.  It reinforces the idea that we’re not looking to exhaustively search innumerable avenues; we’re looking to find what, at its root, is bothering the patient.

Why artificial intelligence to diagnose diseases?  The reason is to provide a consistently high quality of patient care, a quality of care that is repeatable and reliable.  Cohen argues that one of the main problems with patient care is consistency.  The same patient could get different diagnoses from different doctors.  Doctors are human, after all, and clearly differ.  How do doctors differ?

• Observational prowess
• Knowledge of symptoms, signs, and syndromes of disease
• Interpretive ability
• Choice of diagnostic labels

[1]

Creating and deploying an artificial-intelligence-based system with the same observational and interpretive abilities, and a consistent taxonomy, would relieve the confusion of conflicting medical advice.

The literature on using artificial intelligence to algorithmically make medical diagnoses is surprisingly timid.  Usually, attempts at automatically diagnosing diseases are couched in words like “assist” and “consult” and rarely, if ever, take full responsibility for making the diagnosis.  One author suggested a reason for this: trepidation about encroaching on the doctor-patient relationship.  Usually, algorithmic diagnostic systems are tailored to diagnosing specific diseases.  The task of this post is to outline a framework for diagnosing disease.

Observation

The first stage in the diagnosis of disease is the recognition of simple quantitative deviations from normal.

[1]

This sentence from Cohen provides a great start to our diagnostic engine.  “First stage” implies that a simple state machine structure for the main diagnostic engine is appropriate, with the first stage seeking to gather information and compare it to what is considered normal for that patient.
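
The stage-machine idea can be sketched concretely.  This is a minimal sketch; the stage names and transitions are my own, drawn from the sections that follow, not from Cohen:

```python
from enum import Enum, auto

class Stage(Enum):
    OBSERVATION = auto()     # gather history, testimony, vitals
    INTERPRETATION = auto()  # reason over the data, ask follow-ups
    SYMBOLIZATION = auto()   # encode the diagnosis, recommend action
    DONE = auto()

def run_engine(patient):
    """Advance a patient through the diagnostic stages in order.

    Each branch is a placeholder for the real stage logic; here it
    simply records the path taken."""
    stage = Stage.OBSERVATION
    trace = []
    while stage is not Stage.DONE:
        trace.append(stage.name)
        if stage is Stage.OBSERVATION:
            stage = Stage.INTERPRETATION   # data gathered; interpret it
        elif stage is Stage.INTERPRETATION:
            stage = Stage.SYMBOLIZATION    # provisional decision reached
        else:
            stage = Stage.DONE             # diagnosis encoded
    return trace

print(run_engine({"name": "example"}))
# ['OBSERVATION', 'INTERPRETATION', 'SYMBOLIZATION']
```

A state machine also makes the report-back loop described later natural: a returning patient simply re-enters at `OBSERVATION`.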

The inputs to the observation stage are the patient’s medical history, their testimony about why they are seeking a diagnosis, and a physical examination.  Processing the patient’s testimony can exist on a spectrum, with one end anchored by using natural language processing (NLP) to parse and comprehend the testimony and the other by letting the patient choose from a drop-down menu.  The former is far more sophisticated and involved, but it better serves the task: it allows the patient to consult the device when they don’t know what is wrong, which makes the device vastly more usable.  The latter essentially boils down to a choose-your-own-adventure approach; it is trivial to implement, but does not significantly move the needle beyond simple internet searches, leaving the responsibility of diagnosing, or of deciding whether or not to consult a doctor, in the hands of the patient.  The NLP approach, or something close to it, is preferred.

The NLP lies outside of the main diagnostic engine so that different algorithms can be swapped in and out seamlessly.  The diagnostic flow should not depend on the specifics of the NLP used to parse the patient’s testimony.  The NLP outputs certain key words gleaned from the patient’s testimony ordered and weighted in terms of “importance.”  Clearly, the NLP will need to know what the observation stage of the diagnostic engine considers important, but does not need to be embedded in the diagnostic engine.
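
One way to keep the NLP swappable is to pin down only the interface the diagnostic engine depends on: testimony in, weighted keywords out.  The sketch below is illustrative; `TestimonyParser` and the toy keyword-count parser are hypothetical stand-ins for a real NLP module:

```python
from typing import Protocol

class TestimonyParser(Protocol):
    """The only contract the diagnostic engine relies on."""
    def extract(self, testimony: str, vocabulary: set[str]) -> list[tuple[str, float]]:
        """Return (keyword, weight) pairs ordered by importance."""

class KeywordCountParser:
    """Trivial stand-in: weight = how often a vocabulary term appears."""
    def extract(self, testimony, vocabulary):
        words = testimony.lower().split()
        counts = {term: words.count(term) for term in vocabulary}
        ranked = [(t, float(c)) for t, c in counts.items() if c > 0]
        return sorted(ranked, key=lambda kw: -kw[1])

parser: TestimonyParser = KeywordCountParser()
print(parser.extract("my chest hurts and the chest pain is worse at night",
                     {"chest", "pain", "fever"}))
# [('chest', 2.0), ('pain', 1.0)]
```

The engine passes in the vocabulary it considers important, so the parser can be replaced by something far more capable without touching the diagnostic flow.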

The patient’s testimony, their medical record, and any vital signs recorded are collected in the observation stage and passed to the next stage.  The observation stage may, and probably will, order or perform certain biometric tests on the patient and wait for those results before proceeding to the next stage, Interpretation.  For example, it may perform and pass on the results of an electrocardiogram (ECG) based on certain watchwords figuring prominently in the patient’s testimony.

The observation stage looks at physical exam measurements and compares them to expected values given the patient’s data from their medical records (age, height, weight, gender, …).  It also performs additional tests based on simple keyword matching from the patient’s testimony.  The observation stage writes back to the patient’s file and passes everything on to the interpretation stage.  The observation stage is like the nurse taking your vital signs; the interpretation stage is like the doctor who comes in to make a medical diagnosis.
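
A minimal sketch of that logic, with made-up reference ranges and keyword-to-test mappings (the numbers and mappings are illustrative, not medical values):

```python
# Hypothetical reference ranges; real values depend on the patient's profile.
EXPECTED = {"systolic_bp": (90, 120), "heart_rate": (60, 100)}

# Hypothetical keyword -> test triggers for simple keyword matching.
TEST_TRIGGERS = {"chest": "ECG", "palpitations": "ECG", "thirst": "blood_glucose"}

def observe(vitals, testimony_keywords):
    """Flag measurements outside their expected range and order follow-up tests."""
    deviations = {name: value for name, value in vitals.items()
                  if name in EXPECTED
                  and not (EXPECTED[name][0] <= value <= EXPECTED[name][1])}
    ordered_tests = sorted({TEST_TRIGGERS[kw]
                            for kw in testimony_keywords if kw in TEST_TRIGGERS})
    return {"deviations": deviations, "ordered_tests": ordered_tests}

print(observe({"systolic_bp": 150, "heart_rate": 72}, ["chest", "pain"]))
# {'deviations': {'systolic_bp': 150}, 'ordered_tests': ['ECG']}
```

Everything returned here would be written back to the patient’s file before being handed to the interpretation stage.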

The output of the observation stage is an up-to-date medical record of the patient.  The ability to index the patient’s record temporally is important as this allows the Interpretation stage to analyze how a particular condition has changed over time.

Interpretation

The interpretation stage can ask the patient questions directly to obtain additional information.  It is better to keep this ability in the interpretation stage, instead of going back to the observation stage, because the new information will probably be directly linked to branching in this stage of the diagnosis.  To simplify things, the query/response format in the interpretation stage does not need to be as open-ended, linguistically, as in the observation stage.  In the observation stage, natural language processing is needed because we may not yet be sure what we are trying to figure out, so we have to derive meaning from a very complex set of possibilities.  In the interpretation stage, however, we are seeking specific, targeted information, so the response options can be limited.  This fits well into a drop-down box; an example is below.

Have you felt chest pain?

• Yes
• No
• I don’t know
• I don’t understand the question
• It’s complicated

The last three, “I don’t know,” “I don’t understand the question,” and “It’s complicated,” allow the query/response routine to improve its chances of getting a useful answer out of the patient.
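
A sketch of how the query/response routine might branch on those options; the routing rules here are my own illustration, not a specification:

```python
# The five drop-down options from the interpretation stage.
RESPONSES = ["Yes", "No", "I don't know",
             "I don't understand the question", "It's complicated"]

def handle_response(question, answer):
    """Route a drop-down answer to a next action (illustrative rules)."""
    assert answer in RESPONSES, "answer must be one of the drop-down options"
    if answer in ("Yes", "No"):
        return ("record", answer)      # usable answer: record it and branch
    if answer == "I don't understand the question":
        return ("rephrase", question)  # re-ask in simpler words
    return ("escalate", question)      # unsure/complicated: order a test instead

print(handle_response("Have you felt chest pain?", "I don't know"))
# ('escalate', 'Have you felt chest pain?')
```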

The AI for medical diagnosis will need to reason anatomically, that is, it will have to move from one part of the body to another in search of interpretations that fit the existing data.  Cohen considered the “fundamental tripod of medicine” to be anatomy, physiology, and pathology.  Of these, anatomy lends itself well to being described as a connectivity graph.  The AI could have different graphs for the different systems of the body (circulatory, respiratory, endocrine, …), each describing how parts of the body are connected together.  A simple 1/0 (connected, not connected) encoding would probably do, as the AI is simply looking for what to try next: once it traverses the graph from, say, the heart to the liver, it uses “liver” as a keyword to look up potential next steps.
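
A toy version of such a connectivity graph, with 1/0 adjacency as described; the organs and links below are illustrative, not anatomically complete:

```python
# Toy circulatory-system adjacency: 1 = connected, absent/0 = not connected.
ADJ = {
    "heart":   {"lungs": 1, "liver": 1, "kidneys": 1},
    "lungs":   {"heart": 1},
    "liver":   {"heart": 1, "kidneys": 1},
    "kidneys": {"heart": 1, "liver": 1},
}

def next_sites(current):
    """Sites to consider next: everything directly connected to the current site.

    Each returned name would then be used as a keyword to look up
    potential next steps, as described above."""
    return sorted(site for site, connected in ADJ[current].items() if connected)

print(next_sites("heart"))
# ['kidneys', 'liver', 'lungs']
```

Separate dictionaries of this shape could stand in for the other bodily systems, traversed with the same function.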

What about a Bayesian approach to interpretation?  I would stay away from it because it relies on “models that are subjective, and the resulting inference depends greatly on the model selected.” [2]  We are seeking a framework that can be used for diagnosing a wide range of diseases, not tuned to specific diseases.  The framework must be general and its reasoning mathematical.  The reasoning itself cannot have a subjective foundation.

The output of the interpretation stage is a provisional medical decision about which steps to take next.  If the algorithm does not have enough information to make a decision, it does not need to do so.  It can order more tests or suggest a therapy to alleviate the condition and have the patient report back.  When the patient reports back, they start again in the observation phase.

Symbolization, Corrective Action and Evaluating an Algorithm

Even after the diagnostic engine reaches the heart of the matter, namely what’s wrong with the patient, there is still much work to do.  First, it must encode the diagnosis in a manner that allows it to treat the same disease the same way, every time, from patient to patient.  The Oxford American Dictionary defines syndrome as

syndrome n. 1) a group of concurrent symptoms of a disease

If the list of syndromes for a disease is complete enough, it will uniquely identify the disease.  Cohen assesses syndromes by the site, functional disturbances, and cause of disease [1].  This should be enough information to universally encode the disease.  Notice we have not included any prescriptive remedy in the encoding, as this will vary from patient to patient: patients with the same disease at the same site may need different courses of action based on age, gender, and so on.
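
A sketch of such an encoding, using Cohen’s three components as the key; the example site/disturbance/cause strings are illustrative only:

```python
from collections import namedtuple

# Cohen assesses syndromes by site, functional disturbance, and cause [1].
# Treatment is deliberately NOT part of the key: it varies patient to patient.
Syndrome = namedtuple("Syndrome", ["site", "disturbance", "cause"])

def encode(site, disturbance, cause):
    """Normalize free-text fields so the same disease always yields the same key."""
    def norm(s):
        return " ".join(s.lower().split())
    return Syndrome(norm(site), norm(disturbance), norm(cause))

a = encode("Coronary arteries", "reduced blood flow", "atherosclerosis")
b = encode("coronary  arteries", "Reduced Blood Flow", "atherosclerosis")
print(a == b)
# True: the same disease encodes to the same label, every time
```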

Second, we must figure out the cause of the disease and its implications.

Too frequently we have been content with a diagnostic label without investigating its implications.

[1]

Causation implies a search for antecedents, and not for the ultimate — the final — cause of all things.  This means not a single antecedent or even a chain of antecedents, but a whole interlacing network of them.

[1]

This points directly to graph theory for reasoning through the causes and implications of the disease.  Somehow we’d need to map corporeal function to a manifold and be able to traverse it.  This is significantly more complicated than the simple graph traversal in the Interpretation stage; there, we are simply seeking clues to help us along our decision tree, and the several systems of the body (circulatory, skeletal, respiratory) are already mapped to graphs that we merely look up.  In this stage, we’d likely need to do the mapping on the fly, based on what we figured out in the previous stage.

In addition to affixing a label to the diagnosis, the output of this stage is to recommend a corrective action.

The main aim of diagnosis, that of providing the rational basis for treatment and prognosis…

[1]

The main implementation decision to make here is: do we spend more time and energy investigating causes and implications and keep the treatment recommendation and prognosis estimate simple, or vice versa?  For instance, if the algorithm is good at figuring out causation and implication, maybe treatment and prognosis can be a simple lookup table.  If causation/implication is simple, then we’ll want to do something more sophisticated for treatment/prognosis.  I prefer the former.  Because causation/implication and prognosis/treatment are so tightly coupled, I consider them part of the same stage of diagnosis, even though they may use separate artificial intelligence approaches.
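
Under the preferred option, where causation/implication does the heavy lifting, treatment and prognosis can indeed be a simple lookup keyed on the encoded syndrome (here a plain (site, disturbance, cause) tuple).  A sketch; the entries are illustrative only, not medical advice:

```python
# Lookup table keyed on the syndrome encoding (site, disturbance, cause).
# Entries are illustrative placeholders, not medical advice.
TREATMENT_TABLE = {
    ("coronary arteries", "reduced blood flow", "atherosclerosis"):
        {"treatment": "refer to cardiologist", "prognosis": "depends on staging"},
}

def recommend(syndrome_key):
    """Return a treatment/prognosis entry, or fall back to ordering more tests."""
    entry = TREATMENT_TABLE.get(syndrome_key)
    if entry is None:
        # Unknown syndrome: loop the patient back through observation.
        return {"treatment": "order further tests", "prognosis": "unknown"}
    return entry

print(recommend(("coronary arteries", "reduced blood flow", "atherosclerosis")))
# {'treatment': 'refer to cardiologist', 'prognosis': 'depends on staging'}
```

The fallback branch mirrors the earlier point that the algorithm need not decide when it lacks information; it can order tests and have the patient report back.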

Evaluating a Medical Diagnosis Algorithm

‘…common sense pressed for time accepts and acts on acceptance.’  We physicians are often confronted by a situation in which we have to give a provisional verdict on the admittedly inadequate available evidence.

[1]

For any algorithm, execution time will be crucial.  The algorithm will have to provide feedback to the patient quickly, even if it does not yet have a final diagnosis.  Keeping the patient informed as the algorithm works through its reasoning will help the patient become comfortable with seeking a diagnosis from a machine rather than a human doctor.  Time is of the essence; the algorithm should not get bogged down spending clock cycles on getting every corner case right when it can reach the most common conclusions quickly.

The New England Journal of Medicine published the results of case records presented to individual clinicians and discussant groups, recording whether or not each was able to correctly diagnose the disease in the case studies.  Across the 43 case studies, individual clinicians were correct 65% of the time and the discussant groups were correct 80% of the time.  The study also asked the participants to assess the confidence level of their diagnoses.

| | Clinicians | Discussants |
| --- | --- | --- |
| Correct, definite | 23 | 29 |
| Correct, tentative | 5 | 6 |
| Total | 43 | 43 |

[3]
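
The quoted percentages follow from the table, with 28 of 43 correct for individual clinicians and 35 of 43 for discussant groups:

```python
# Success rates from the NEJM case-record study [3].
clinicians_correct = 23 + 5       # definite + tentative
discussants_correct = 29 + 6
total_cases = 43

print(round(100 * clinicians_correct / total_cases))   # 65
print(round(100 * discussants_correct / total_cases))  # 81, roughly the 80% quoted
```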

It seems that being right 80% of the time is good enough; at least, it was the state of the art in 1969.  That’s another important thing to remember when testing medical diagnostic AI: what are we comparing it against?  If medical diagnostic AI can approach an 80% success rate, it will be a viable alternative to seeing a doctor.  Even at roughly a 50% success rate, it’s a reasonable alternative for some purposes, such as the Tricorder X-Prize, since the potential use of a Tricorder X-Prize device would be to help people decide whether they should go see a doctor.

References

[1] Cohen, Henry. “The Nature, Method and Purpose of Diagnosis,” The Skinner Lecture, 1943. Cambridge, UK: University Press, 1943.

[2] Hogg, Robert and Allen Craig.  Introduction to Mathematical Statistics.  5th ed, NJ: Prentice-Hall, 1995.

[3] Case Records of the Massachusetts General Hospital (Case 30-1969).  New England Journal of Medicine.  1969; 281: 206-213.


## Which Diseases to Diagnose for Tricorder X-Prize?

February 7th, 2012

The Tricorder X-Prize is a $10M competition to foster innovation in medical diagnostics. The goal of the competition is to create a medical device that can diagnose 15 diseases. The competition guidelines do not state which 15 diseases the tricorder will need to diagnose; however, it seems the guidelines will be refined in September. Maybe the diseases will be announced at that point. Maybe they won’t be announced before the competition. For now, it’s fun to speculate which diseases should be included.

In 2008, the Centers for Disease Control and Prevention did a survey of ambulatory care in the US. They summarized the most prevalent diagnoses at office visits for nearly a million participants. The most common of all diagnoses was essential hypertension. The fifth most common diagnosis was diabetes mellitus. Each of these medical conditions has a fairly well-understood decision tree for diagnosis.

| Primary Diagnosis | Number of Visits | Percentage |
| --- | --- | --- |
| Essential hypertension | 45,969 | 4.81% |
| Routine infant or child health check | 43,178 | 4.52% |
| Acute upper respiratory infections, excluding pharyngitis | 29,296 | 3.06% |
| Arthropathies and related disorders | 28,404 | 2.97% |
| Diabetes mellitus | 25,365 | 2.65% |
| Spinal disorders | 24,376 | 2.55% |
| Normal pregnancy | 22,140 | 2.32% |
| General medical examination | 20,913 | 2.19% |
| Malignant neoplasms | 19,770 | 2.07% |
| Rheumatism, excluding back | 18,757 | 1.96% |
| Specific procedures and aftercare | 18,372 | 1.92% |
| Follow-up examination | 17,652 | 1.85% |
| Heart disease, excluding ischemic | 17,017 | 1.78% |
| Gynecological examination | 16,140 | 1.69% |
| Otitis media and eustachian tube disorders | 15,812 | 1.65% |
| Disorders of lipoid metabolism | 15,274 | 1.60% |
| Ischemic heart disease | 14,448 | 1.51% |
| Chronic sinusitis | 12,506 | 1.31% |
| Acute pharyngitis | 11,729 | 1.23% |
| Allergic rhinitis | 9,966 | 1.04% |
| All other diagnoses | 528,885 | 55.32% |
| TOTAL | 955,969 | 100.00% |

Table 1: Primary Diagnosis Groups from NAMCS 2008 Survey [1]

My understanding (and I am not a doctor) is that hypertension is diagnosed primarily with a high blood pressure reading. You do have to make sure that the reading is repeatable and not unduly influenced by external factors, such as the presence of a doctor. Overall, it sounds like diagnosing hypertension boils down to getting consistently high blood pressure readings for the patient’s profile (gender, age, etc.). Blood pressure is not difficult to measure non-invasively; you see blood pressure monitoring machines in grocery stores. The main design consideration for the Tricorder competition would be: is there an even less invasive way to do it, one that does not require the patient to strap a band around themselves? Even using a traditional approach, for the price of a blood pressure monitor, a device could diagnose nearly 5% of all office visits in the US.

Diabetes mellitus is #5, with 2.7% of office visit diagnoses. Again, my understanding is that the decision tree is pretty simple: blood glucose readings outside of the norm for a patient’s profile. However, blood glucose is traditionally measured invasively, by taking a small blood sample. While the Tricorder X-Prize guidelines do not rule out devices that use invasive techniques, they strongly encourage noninvasive ones. In fact, a medical doctor on our board at Chesney Research described noninvasive blood glucose monitoring to me as one of the “holy grails” of medical device technology. Since one of the stated goals of the competition is to drive sensor technology, I think diagnosing diabetes has to be one of the diseases in the competition.

Another holy grail is distinguishing bacterial from viral upper respiratory tract infections. This is the third most prevalent diagnosis in office visits, according to the NAMCS survey. Right now, there’s no real way to tell the difference other than waiting; bacterial infections tend to last 7-10 days and viral ones only 2. However, the course of treatment is very different for each: antibiotics for bacterial infections, but not for viral ones, since viruses do not respond to antibiotics.

Further down the list is heart disease of the non-ischemic variety, that is, heart disease not caused by a restricted blood supply to the heart muscle. Heart disease is a pretty broad category. However, there are analog integrated circuits on the market aimed at measuring electrocardiogram (ECG) signals. For the price of such a chip (typically around $20) and the appropriate interface with the patient, a medical device could take a big step toward diagnosing heart disease. There is also a wealth of information on the links between heart disease and hypertension, and between heart disease and diabetes. With an ECG, a blood pressure monitor, a glucose meter, and some fancy AI, a team may be well on its way to gobbling up a significant portion of heart disease diagnoses. In fact, those three conditions, hypertension, diabetes, and heart disease, would account for nearly one out of every ten (9.24%) office visit diagnoses.

If we look at the NAMCS top 20 again and take out routine follow-ups, checkups, and pregnancy, we are left with 14 diseases.  They are given in Table 2.  They accounted for nearly one third of all office visits in 2008.

| Rank | Primary Diagnosis | Number of Visits | Percentage |
| --- | --- | --- | --- |
| 1 | Essential hypertension | 45,969 | 4.81% |
| 3 | Acute upper respiratory infections, excluding pharyngitis | 29,296 | 3.06% |
| 4 | Arthropathies and related disorders | 28,404 | 2.97% |
| 5 | Diabetes mellitus | 25,365 | 2.65% |
| 6 | Spinal disorders | 24,376 | 2.55% |
| 9 | Malignant neoplasms | 19,770 | 2.07% |
| 10 | Rheumatism, excluding back | 18,757 | 1.96% |
| 13 | Heart disease, excluding ischemic | 17,017 | 1.78% |
| 15 | Otitis media and eustachian tube disorders | 15,812 | 1.65% |
| 16 | Disorders of lipoid metabolism | 15,274 | 1.60% |
| 17 | Ischemic heart disease | 14,448 | 1.51% |
| 18 | Chronic sinusitis | 12,506 | 1.31% |
| 19 | Acute pharyngitis | 11,729 | 1.23% |
| 20 | Allergic rhinitis | 9,966 | 1.04% |
| | TABLE TOTAL | 288,689 | 30.20% |
| | TOTAL DIAGNOSES | 955,969 | 100.00% |

Table 2: Top 14 Diseases, Including Chronic Conditions from NAMCS 2008 Survey Data
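
The table’s totals can be checked with a few lines (visit counts copied from Table 2):

```python
# Visit counts for the 14 diseases in Table 2, in table order.
visits = [45969, 29296, 28404, 25365, 24376, 19770, 18757,
          17017, 15812, 15274, 14448, 12506, 11729, 9966]
total_diagnoses = 955969

print(sum(visits))                                    # 288689
print(round(100 * sum(visits) / total_diagnoses, 2))  # 30.2
```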

The competition’s 15 diseases will need to be diagnosed on 30 different patients, and the Tricorder will be evaluated for its effectiveness and ease of use by a panel of judges.  The device should be able to tell the patient whether or not they need to go see a doctor.  These 14 diseases are a good place to start.

References

[1] National Ambulatory Medical Care Survey: 2008 Summary Tables.  Centers for Disease Control and Prevention.  http://www.cdc.gov/nchs/ahcd.htm
