Every once in a while, a paper appears that is so good, so well written, and so packed with insights that I read it a few times and wished I had written it myself. That happened when I read this perspective paper by Jonathan Chen and Steven Asch.
Normally I’d tweet a link to it, or maybe even add a figure from it (so people can learn from it without spending much time), and that would be the end of it. Sometimes I might add a few more tweets, if I couldn’t pack it all into a single one. But this piece is so good that its two pages are well worth a read. I loved almost every sentence of it, so I decided to collect my favourite insights from the paper here, to avoid spamming my lovely followers with a ton of tweets.
If you wanted to learn only a single thing, it has already been given to you in the title, but it is explained better in this excerpt from the paper:
“With machine learning situated at the peak of inflated expectations, we can soften a subsequent crash into a “trough of disillusionment” by fostering a stronger appreciation of the technology’s capabilities and limitations.”
The terms “peak of inflated expectations” and “trough of disillusionment” come from Gartner’s 2016 report “Hype Cycle for Emerging Technologies”, which provides this packed and insightful illustration:
That would be the gist of the perspective paper. However, if you are seriously interested in the fields of healthcare and machine learning, I feel you would really appreciate the following insights (emphases are all mine):
- “Given that the practice of medicine is constantly evolving in response to new technology, epidemiology, and social phenomena, we will always be chasing a moving target” alludes to the point that prediction in the medical domain is even harder than in the typical ML studies you encounter in other domains.
- In reference to some fundamental limitations of ML:
- “Yet if the future will not necessarily resemble the past, simply accumulating mass data over time has diminishing returns”
- “no amount of algorithmic finesse or computing power can squeeze out information that is not present”
- “Machine-learning approaches are powered by identification of strong, but theory-free, associations in the data. Confounding makes it a substantial leap in causal inference to identify modifiable factors that will actually alter outcomes“
- On relevancy, recency and size of data:
- “Research into decision-support algorithms that automatically learn inpatient medical practice patterns from electronic health records reveals that accumulating multiple years of historical data is worse than simply using the most recent year of data”, implying that a bigger volume of data may not always be better!
- “When our goal is learning how medicine should be practiced in the future, the relevance of clinical data decays with an effective half-life of about 4 months“
- Choosing your goals and operating points:
- “Though no method can precisely predict the date you will die, for example, that level of precision is generally not necessary for predictions to be useful.”
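To make the 4-month half-life claim above concrete: under an exponential-decay reading of it, a training example’s relevance to future practice halves every 4 months. The weighting function below is my own illustrative sketch, not something from the paper.

```python
# Illustrative sketch (my own, not from the paper): exponential decay of
# clinical-data relevance, assuming a half-life of 4 months.

def relevance_weight(age_months: float, half_life_months: float = 4.0) -> float:
    """Relative relevance of a training example that is `age_months` old."""
    return 0.5 ** (age_months / half_life_months)

# A 4-month-old record retains half its relevance; a year-old one about 12.5%.
print(relevance_weight(4))   # 0.5
print(relevance_weight(12))  # 0.125
```

Under this reading, weighting recent records more heavily (or simply truncating to the most recent year, as the quoted study did) follows naturally.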
You may already hold these insights as gut feelings if you have been working at the intersection of machine learning and clinical data for a while, but seeing them so clearly put together in a broader perspective is what makes this a must-read. Add it to your reading list.
The full citation for the paper is: Chen, Jonathan H., and Steven M. Asch. “Machine Learning and Prediction in Medicine—Beyond the Peak of Inflated Expectations.” New England Journal of Medicine 376.26 (2017): 2507-2509.