Predicting Active Tuberculosis with AI

Project Findings, Challenges, and Next Steps

In-Woo Park
Feb 9, 2024

The problem: blindly prescribing TB antibiotics

As with many health conditions, most deaths caused by tuberculosis (TB) are preventable. With antibiotic treatment, TB has a global treatment success rate of about 85% (World Health Organization, 2020). So why do more than a million people still die of TB every year?

Tuberculosis has two main stages: latent (asymptomatic) and active (symptomatic). About a quarter of the global population has a latent TB infection (LTBI), but only about 10% of LTBIs ever progress to active TB infection (ATBI).

However, physicians cannot accurately determine which LTBIs will progress to ATBI. If only ~10% of LTBIs progress, then treating every LTBI means that ~90% of treated patients receive antibiotics they did not need, often with severe side effects.
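The ~90% figure follows from simple arithmetic. A minimal sketch, assuming every diagnosed LTBI patient is treated and applying the ~10% progression rate to a hypothetical cohort of 1,000 patients (the cohort size and function name are illustrative only):

```python
def unnecessary_treatment_share(progression_rate: float) -> float:
    """Share of treated LTBI patients who would never have progressed to ATBI,
    assuming every LTBI patient receives antibiotics."""
    if not 0.0 <= progression_rate <= 1.0:
        raise ValueError("progression_rate must be between 0 and 1")
    return 1.0 - progression_rate


cohort = 1000            # hypothetical number of treated LTBI patients
progression_rate = 0.10  # ~10% of LTBIs progress to ATBI

would_progress = round(cohort * progression_rate)
treated_unnecessarily = cohort - would_progress
print(would_progress, treated_unnecessarily)                # 100 900
print(f"{unnecessary_treatment_share(progression_rate):.0%}")  # 90%
```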

Currently, a small number of antibiotics are available to treat LTBI by killing the “sleeping TB” in the patient before it can progress to ATBI. However, these drugs carry a risk of severe side effects (e.g., 10% of patients experience anemia or lymphopenia as a result of rifapentine treatment, a dangerously high incidence for a drug reaction), and, again, most patients who take them do not need them.

This is how we currently treat TB, and it is only breeding more antibiotic resistance while failing to address the global epidemic.

What are we currently doing about this?

Most current solutions can only detect TB; they cannot predict whether an LTBI will progress to ATBI. For example, Qiagen’s standard IGRA test and an AI algorithm that analyzes medical imaging are both great methods of detecting TB in a patient, but they still leave the physician to decide whether an antibiotic prescription is needed. In other words, these “solutions” do not actually solve the problem.

The only solution I found that attempts to predict TB progression is McGill University’s online interpreter, TSTin3D. It combines a broad set of TB biomarkers and variables into a risk assessment of how likely a person is to develop ATBI.

However, the interpreter follows a fixed formula and does not account for how certain variables affect other biomarkers (e.g., HIV co-infection affects TST thresholds). Further, some of the biomarkers and variables it requires are difficult to obtain. For example, the interpreter asks for a silicosis diagnosis, which requires X-ray and CT scans that physicians tend to avoid even in high-income countries like Canada. This makes the interpreter both inaccurate and difficult for healthcare providers to use.
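To make the interaction point concrete, here is a toy contrast between a rule that applies one TST cut-off to everyone and a rule whose cut-off depends on HIV status. It is not a description of how TSTin3D actually works; the `Patient` type, the function names, and the 5 mm / 10 mm cut-offs are illustrative assumptions (real guidelines use more risk categories).

```python
from dataclasses import dataclass


@dataclass
class Patient:
    tst_induration_mm: float  # size of the tuberculin skin test reaction
    hiv_positive: bool


# Illustrative cut-offs only; real guidelines distinguish more risk groups.
DEFAULT_CUTOFF_MM = 10.0
HIV_CUTOFF_MM = 5.0


def tst_positive_fixed(p: Patient) -> bool:
    """Fixed rule: one cut-off for everyone, ignoring HIV status."""
    return p.tst_induration_mm >= DEFAULT_CUTOFF_MM


def tst_positive_context_aware(p: Patient) -> bool:
    """Lower cut-off for HIV co-infection, which can blunt the TST reaction."""
    cutoff = HIV_CUTOFF_MM if p.hiv_positive else DEFAULT_CUTOFF_MM
    return p.tst_induration_mm >= cutoff


p = Patient(tst_induration_mm=7.0, hiv_positive=True)
print(tst_positive_fixed(p), tst_positive_context_aware(p))  # False True
```

The same 7 mm reaction is read differently once the rule knows about the co-infection, which is the kind of dependency a single fixed formula misses.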

My revised solution: AI-driven analysis

After closely analyzing existing solutions and discussing their gaps with physicians and nurses, it became clear that a different approach was needed.

My project aims to predict which LTBIs are most likely to progress to ATBIs by using AI to analyze multiple biomarkers and variables together. It will be delivered as Software as a Medical Device (SaMD), so practitioners can access the tool and enter biomarker data without first integrating any software into their existing systems.
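As a rough illustration of what analyzing biomarkers "together" could mean, the sketch below scores several inputs at once with a logistic function and includes an interaction term, so HIV status changes how much the TST result contributes. Logistic regression is used only because it is easy to read; the project does not specify a model type, and the feature names and coefficients are hypothetical placeholders, not values learned from data.

```python
import math

# Hypothetical placeholder coefficients; a real model would learn these
# from clinical data, which this project is still trying to collect.
INTERCEPT = -3.0
WEIGHTS = {
    "tst_mm": 0.08,       # TST induration in millimetres
    "hiv_positive": 1.5,  # 1.0 if HIV co-infected, else 0.0
    "silicosis": 1.0,     # 1.0 if silicosis diagnosed, else 0.0
}
# Interaction weight: lets HIV status change how much the TST result counts.
HIV_X_TST = 0.05


def progression_risk(features: dict[str, float]) -> float:
    """Combine all inputs into one score, then squash it into (0, 1)."""
    z = INTERCEPT
    for name, weight in WEIGHTS.items():
        z += weight * features.get(name, 0.0)
    z += HIV_X_TST * features.get("hiv_positive", 0.0) * features.get("tst_mm", 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Same TST result, different HIV status.
with_hiv = progression_risk({"tst_mm": 7.0, "hiv_positive": 1.0})
without_hiv = progression_risk({"tst_mm": 7.0, "hiv_positive": 0.0})
print(f"{with_hiv:.2f} vs {without_hiv:.2f}")
```

The interaction term is the piece a purely additive, fixed formula lacks.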

Points of impact:

  • Only patients in need of treatment will be prescribed medication
  • More patients will likely start and finish treatment once they know they actually need it (currently, only 36.0% of patients complete antibiotic treatment for LTBI)
  • Fewer hospital admissions due to ATBI
  • Risk levels for TB progression let CDCs and healthcare systems target resources and care strategies
  • Appeals to multiple stakeholders: pharmaceutical companies, medical laboratories, healthcare providers
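Turning a risk score into the risk levels mentioned above could be as simple as binning it. A minimal sketch; the 10% and 30% cut-offs and the tier names are hypothetical and would need clinical validation, and the scores are made-up example inputs, not patient data:

```python
from collections import Counter

# Hypothetical cut-offs; real tiers would need clinical validation.
HIGH_CUTOFF = 0.30
MODERATE_CUTOFF = 0.10


def risk_tier(probability: float) -> str:
    """Map a progression probability to a coarse risk level."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= HIGH_CUTOFF:
        return "high"
    if probability >= MODERATE_CUTOFF:
        return "moderate"
    return "low"


# Example scores (illustrative, not patient data).
scores = [0.05, 0.12, 0.40, 0.08, 0.33]
tier_counts = Counter(risk_tier(s) for s in scores)
print(dict(tier_counts))  # {'low': 2, 'moderate': 1, 'high': 2}
```

Counting patients per tier is the kind of summary a health system could use to decide where to direct follow-up and treatment capacity.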

My current challenge: insufficient data to train the AI algorithm

After identifying the biomarkers this project needs, I looked for clinical data to train the AI model. However, after searching public and private databases (including those of the CDC and WHO), it became clear that datasets combining all of the biomarkers and variables I need do not exist.

My next step: creating an online portal to collect data

Because conventional research initiatives require millions of dollars, personal certifications, and a certified lab, I will collect the data differently.

I will build an online portal where research institutions, physicians, nurses, and technicians can submit the biomarker data this project needs, and I will market it heavily to attract contributors. In doing so, I can collect the data needed to achieve AI-driven prediction of active tuberculosis.
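One way to keep crowd-sourced submissions usable for training is to validate each record at the point of entry. A minimal sketch, assuming a hypothetical `BiomarkerSubmission` record; the field names, allowed values, the 0 to 100 mm sanity bound, and the `validate` helper are illustrative, not the portal's actual design. The outcome field matters most: without knowing whether a patient progressed to ATBI, a record cannot be used to train a prediction model.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_ROLES = {"research institution", "physician", "nurse", "technician"}
ALLOWED_IGRA = {"positive", "negative", "indeterminate"}


@dataclass
class BiomarkerSubmission:
    submitter_role: str
    tst_induration_mm: Optional[float] = None  # TST reaction size, if measured
    igra_result: Optional[str] = None          # IGRA result, if performed
    hiv_positive: Optional[bool] = None
    silicosis: Optional[bool] = None
    progressed_to_atbi: Optional[bool] = None  # outcome label used for training


def validate(sub: BiomarkerSubmission) -> list[str]:
    """Return a list of problems; an empty list means the record is accepted."""
    errors = []
    if sub.submitter_role not in ALLOWED_ROLES:
        errors.append(f"unknown submitter role: {sub.submitter_role!r}")
    # Arbitrary sanity bound to catch typos such as 120 instead of 12.
    if sub.tst_induration_mm is not None and not 0 <= sub.tst_induration_mm <= 100:
        errors.append("TST induration must be between 0 and 100 mm")
    if sub.igra_result is not None and sub.igra_result not in ALLOWED_IGRA:
        errors.append(f"unexpected IGRA result: {sub.igra_result!r}")
    if sub.progressed_to_atbi is None:
        errors.append("missing outcome: did the patient progress to active TB?")
    return errors


ok = BiomarkerSubmission("nurse", tst_induration_mm=12.0, igra_result="positive",
                         hiv_positive=False, progressed_to_atbi=False)
bad = BiomarkerSubmission("student", igra_result="maybe")
print(validate(ok))  # []
print(validate(bad))
```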



In-Woo Park

17yo | Bio-Researcher | TKS Innovator | Pharmacy Assistant | Human Longevity