Ragon Institute

Machine Learning: the Right Tool for the Right Diagnostic Test

Ragon faculty member Musie Ghebremichael, PhD, recently published a BMC paper entitled “A comparison of machine learning techniques for classification of HIV patients with antiretroviral therapy-induced mitochondrial toxicity from those without mitochondrial toxicity.” The additional authors are Jong Lee, Elijah Paintsil, and Vivek Gopalakrishnan, of the University of Massachusetts Lowell, Yale University, and The Johns Hopkins University, respectively.

Antiretroviral therapy (ART) is an effective way to control HIV. However, one potentially serious side effect is mitochondrial toxicity, in which mitochondria, the cell components often called the “powerhouses” of the cell, become damaged or reduced in number. This can cause symptoms ranging in severity from numbness in the fingers and toes to organ failure and death.

Currently, there is no clear diagnostic procedure for mitochondrial toxicity. Diagnosis is made through a combination of reported symptoms, assays, and tissue biopsies, of which the latter two are considered cost-prohibitive and invasive. Not all ART patients develop mitochondrial toxicity, and there is currently no way to predict who is at risk.

The study applies machine learning to a cohort of 50 HIV+ people on ART, 25 with mitochondrial toxicity and 25 without, to identify potential markers for diagnosing and predicting mitochondrial toxicity. In particular, it identifies intracellular ATP concentration as a potential diagnostic marker that can be measured with well-established assays that are already available.

This paper also serves as a case study in applying machine learning to biological datasets. Ghebremichael, the Ragon’s faculty statistician, describes machine learning algorithms commonly available in statistics packages and the process of deciding which algorithm best fits the dataset at hand. The open access paper was written for biologists interested in using machine learning to analyze their own data. It examines how each algorithm performs on datasets of varying sample sizes, distributions, and correlation structures, and it shows the authors applying the algorithms to their own data.
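
For readers who want a feel for what comparing algorithms on a small cohort can look like, here is a minimal Python sketch. It is not the authors' analysis. It assumes a made-up, single-marker dataset of 25 cases and 25 controls, two deliberately simple stand-in classifiers (nearest class mean and one-nearest-neighbor), and leave-one-out cross-validation as the comparison method. All function names, group means, and spreads are hypothetical choices made for illustration.

```python
import random
import statistics


def make_synthetic_cohort(n_per_group=25, seed=0):
    """Return (values, labels) for a made-up marker; label 1 = toxicity, 0 = none."""
    rng = random.Random(seed)
    # Hypothetical group means and spread, chosen only so the groups overlap a little.
    cases = [rng.gauss(1.0, 0.5) for _ in range(n_per_group)]
    controls = [rng.gauss(2.0, 0.5) for _ in range(n_per_group)]
    return cases + controls, [1] * n_per_group + [0] * n_per_group


def nearest_mean(train_x, train_y, x):
    """Predict the class whose training mean is closest to x."""
    means = {}
    for label in set(train_y):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = statistics.mean(members)
    return min(means, key=lambda label: abs(x - means[label]))


def one_nearest_neighbor(train_x, train_y, x):
    """Predict the label of the single closest training value."""
    nearest = min(range(len(train_x)), key=lambda i: abs(x - train_x[i]))
    return train_y[nearest]


def leave_one_out_accuracy(classifier, xs, ys):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if classifier(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def compare_classifiers(xs, ys):
    """Return a dict mapping each candidate classifier's name to its accuracy."""
    candidates = {
        "nearest_mean": nearest_mean,
        "one_nearest_neighbor": one_nearest_neighbor,
    }
    return {name: leave_one_out_accuracy(clf, xs, ys)
            for name, clf in candidates.items()}


if __name__ == "__main__":
    xs, ys = make_synthetic_cohort()
    for name, accuracy in compare_classifiers(xs, ys).items():
        print(f"{name}: {accuracy:.2f}")
```

Leave-one-out cross-validation is used here because it makes full use of a 50-person sample. With real data, the same loop could score any classifier that takes training values, training labels, and a new value.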