What Do Doctors Think of Watson for Oncology?

IBM's virtual doctor makes mistakes

The artificial intelligence Watson for Oncology recommends questionable and incorrect therapy options. Its manufacturer IBM, however, insists that this does not put patients at risk.

The doctor algorithm has suffered a setback. Watson for Oncology, an artificial intelligence (AI) developed by the American IT company IBM, advises oncologists in 230 hospitals worldwide in the search for the best therapy for each case. But as early as autumn 2017, Leif Jensen, head of the cancer department at Copenhagen's Rigshospitalet, sharply criticized the system: if doctors followed Watson's advice, patients could die instead of recovering. His department stopped the experiment. Now internal IBM documents show that executives knew much earlier, namely in the summer of 2017, that Watson often gave "unsafe and incorrect treatment recommendations" while they were praising the system in the media. This is reported by Statnews, a website that covers medical topics in cooperation with the newspaper "The Boston Globe" and has obtained the documents.

The advantages of the algorithms

Developers of similar algorithms firmly believe in the arrival of artificial intelligence in medicine. Their goal is to save time while maintaining high accuracy in diagnosis and therapy. "We want to relieve doctors of routine tasks such as evaluating image data," says Jaroslav Bláha. His Hamburg-based company Cellmatiq works with doctors to train AI systems, among other things to detect glaucoma in images of the ocular fundus. Researchers from Heidelberg recently showed that, on particularly hard-to-assess lesions, algorithms correctly identified melanomas as skin cancer more often than a group of dermatologists did.
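
How such an image classifier can be built is sketched below. This is a minimal illustration, not Cellmatiq's actual system: it assumes a hypothetical folder "fundus/" with images sorted into "glaucoma" and "healthy" subdirectories, and it fine-tunes a network pre-trained on everyday photos.

```python
# Minimal sketch of a binary image classifier of the kind described above.
# The directory layout is assumed; this is not Cellmatiq's model.
import tensorflow as tf

# Load labeled images from the (hypothetical) folders
# fundus/glaucoma/*.jpg and fundus/healthy/*.jpg.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "fundus", image_size=(224, 224), batch_size=32)

# Reuse a network pre-trained on everyday photos; train only a new head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # scale to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # estimated P(glaucoma)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

Reusing a pre-trained network ("transfer learning") is a common way to get by with relatively few medical images, since the network already knows generic visual features.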

Another use of AI is emerging in so-called personalized medicine, which is meant to take the peculiarities of each individual patient into account, because hardly any two cancer cases are alike. A chemotherapy that helps one patient fails in the next. Artificial intelligence sharpens the eye for subtle differences; for example, it can recognize "contrast differences in X-ray images that are invisible to the eye," explains Bláha. At the University Hospital Essen, an algorithm examines image data from lung cancer patients, reports Michael Forsting, a radiologist practicing there. "In this way, we can predict very precisely which therapy a patient will respond to and which not," says Forsting.
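
A common pattern behind such predictions is to reduce each scan to a handful of numeric features and fit a statistical model on known outcomes. The sketch below is purely illustrative, with synthetic data and hypothetical feature names; it is not the Essen pipeline.

```python
# Purely illustrative: predicting therapy response from image-derived
# features. All data and feature names are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical per-patient features extracted from CT scans,
# e.g. tumor volume, mean intensity, texture contrast.
X = rng.normal(size=(200, 3))
# Hypothetical outcome: 1 = patient responded to the therapy.
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression()
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())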

The ambitions of the developers of Watson for Oncology are, however, much greater. The algorithm is designed to suggest personalized therapies for thirteen common cancers. To do this, Watson for Oncology compares knowledge from hundreds of medical journals and textbooks with the case at hand. In addition, oncologists from the New York cancer center Memorial Sloan Kettering trained the system: the specialists showed Watson cancer cases and how they would treat them. On the basis of this training and its literature research, the system now makes suggestions reflecting the current state of knowledge. According to IBM, these may include therapy options that the doctor would not have come up with.
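
IBM has not published Watson's internals, but the underlying idea of learning from expert-annotated cases can be caricatured with a nearest-neighbor model: suggest for a new patient the therapy that the specialists chose for the most similar training case. This is a toy sketch with invented features and labels, not IBM's proprietary method.

```python
# Toy caricature of case-based therapy suggestion. Features, labels and
# the method itself are hypothetical; this is NOT IBM's pipeline.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical encoded cases: (age, tumor stage, biomarker level).
cases = np.array([[55, 2, 0.8], [70, 3, 0.1], [62, 1, 0.9], [48, 3, 0.2]])
# Therapy the specialists chose for each case (invented labels).
therapies = np.array(["therapy_A", "therapy_B", "therapy_A", "therapy_B"])

model = KNeighborsClassifier(n_neighbors=1).fit(cases, therapies)
new_patient = np.array([[60, 2, 0.7]])
print(model.predict(new_patient))  # suggests the nearest case's therapy
```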

Some users praise the system because it enables better-informed decisions. The background provided by Watson gave him certainty that the treatment he was considering was the right one for a lung cancer patient, a doctor from the Jupiter Medical Center in Florida told Statnews. Bláha, on the other hand, criticizes IBM's ambition as "extremely difficult to control". "We train our AI for very specific subtasks," explains the computer scientist. IBM, by contrast, is trying to simulate the entire doctor. But a doctor's intuition is essential, for instance when taking a patient's medical history.

The fact that some serious errors in Watson's therapy recommendations are now becoming public is grist to the critics' mill. Statnews cites the test case of a 65-year-old patient with bleeding to whom the AI recommended a drug that can be fatal in cases of bleeding. IBM replies that this was not a real patient.

Experts see several reasons for the failures of IBM's AI. First: too little training data. Algorithms are typically trained with thousands of cases per diagnosis; the New York doctors fed Watson only a few hundred cases per cancer type, and for ovarian cancer a mere 106. Second: doctors in different countries follow different guidelines. Partly because Watson was trained at an American clinic, it decided differently from Leif Jensen's team of doctors in Copenhagen in two out of three cases. Third: no AI is better than the data it is trained on. Training an AI requires data annotated with expertise, for example an X-ray image paired with the correct diagnosis. Obtaining such valid data is a huge problem, said Michael Forsting at a recent panel discussion; images are often labeled with the wrong diagnosis. This is another reason why experts doubt the quality of Watson's training data. Some of the training cases were not real but were constructed by the doctors, writes Statnews. Fictitious training examples are not uncommon in modeling, explains Steffen Konrath from the German federal AI association, at least when real data can only be obtained with immense effort. However, knowledge about the applicability of such synthetic data is still in its infancy.
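
The first and third points can be made concrete with a toy experiment: train the same simple classifier on more or fewer cases and with a varying share of wrong labels, then compare test accuracy. The data below are synthetic, not medical records.

```python
# Toy experiment: effect of training-set size and label noise on a
# classifier. All data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (100, 1000, 5000):        # few vs. many training cases
    for noise in (0.0, 0.2):       # fraction of wrongly labeled cases
        y_noisy = y_train[:n].copy()
        flip = np.random.default_rng(0).random(n) < noise
        y_noisy[flip] = 1 - y_noisy[flip]
        acc = LogisticRegression(max_iter=1000).fit(
            X_train[:n], y_noisy).score(X_test, y_test)
        print(f"n={n:5d}  label noise={noise:.0%}  accuracy={acc:.2f}")
```

Typically, accuracy climbs with more training cases and drops as labels get noisier, which is exactly the concern raised about Watson's few hundred, partly constructed cases.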

"Watson makes no diagnoses"

The head of IBM's Watson Health division, Deborah DiSanzo, defends the doctors' approach: the hypothetical cases are representative, she writes in a blog post, and such data can take changes in medical practice into account in a way that real but historical data cannot. "This tool does not make any diagnoses," DiSanzo emphasizes. Rather, it is meant to provide evidence for treatment options. IBM works closely with medical professionals "to ensure that Watson evolves with their needs and processes."