Joint Research Projects

Study In Thailand

Development of Machine Learning Algorithms to Predict Resistant Bacteria and Assist Empiric Antibiotic Prescribing among Patients with Suspected Infection





Antibiotic resistance is a major threat to public health. Substantial increases in antibiotic resistance rates have raised concerns and prompted bleak estimates about the future of effective antibiotic treatment. The emergence of antibiotic resistance is mainly shaped by the evolutionary forces of genetic variation (i.e., mutations and horizontal gene transfer) and selection exerted by antibiotic usage. Correspondingly, antibiotic consumption has been repeatedly correlated with increases in antibiotic resistance rates. However, decreases in antibiotic consumption can revert bacterial populations to antibiotic susceptibility, likely due to the fitness cost that antibiotic resistance incurs. Hence, a straightforward intervention to reduce the burden of antibiotic resistance is to decrease antibiotic consumption, for example, by reducing inappropriate antibiotic use during empiric therapy.

Two main types of errors occur during empiric therapy:

  • The Prescription of Ineffective Antibiotics

The prescribed antibiotics do not clear the bacterial pathogen because it is resistant to them.

  • The Prescription of Antibiotics with Coverage that is Too Broad

Antibiotics with narrower coverage would suffice to treat the infection.



Regularization is the process of adding information in order to solve an ill-posed problem or to prevent overfitting. The regularization term, or penalty, adds a cost to the objective function so that the optimal solution is unique. In practice, one usually tries a specific regularization and then identifies the probability density that corresponds to it in order to justify the choice. In machine learning, the data term corresponds to the training data, and the regularization is either the choice of the model or a modification to the algorithm. Its purpose is to reduce the generalization error, i.e., the error of the trained model on the evaluation set rather than on the training data. One of the earliest uses of regularization is related to the method of least squares.
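As a compact way to read the paragraph above, a regularized fit can be written as a data term plus a weighted penalty. The symbols here are assumptions introduced for illustration and do not come from the study: w is the parameter vector, \ell is a per-sample loss, f_w is the model, R is the penalty, and \lambda \ge 0 is the penalty weight.

```latex
\hat{w} \;=\; \arg\min_{w} \;
  \underbrace{\frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, f_w(x_i)\bigr)}_{\text{data term}}
  \;+\;
  \lambda \, \underbrace{R(w)}_{\text{penalty}},
\qquad
R(w) = \lVert w \rVert_1 = \sum_{j} \lvert w_j \rvert \ \text{ for an L1 penalty.}
```

Setting \lambda = 0 recovers the unregularized fit, and larger values of \lambda give the penalty more influence over the solution.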

  • L1-regression (least absolute shrinkage and selection operator; also Lasso or LASSO)

A regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model.
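The variable selection mentioned above comes from the way an L1 penalty acts on coefficients: small ones are set exactly to zero rather than merely shrunk. In the special case of an orthonormal design, the lasso estimate is the soft-thresholded least-squares estimate, and the same operator appears inside coordinate-descent and proximal-gradient solvers. The sketch below shows only that operator; the function name, the coefficient values, and the penalty value are hypothetical and chosen for illustration.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink `value` toward zero by `threshold`; values within the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


# Hypothetical unpenalized coefficients, chosen only for illustration.
coefficients = [2.0, -1.25, 0.25, -0.125, 0.0]
penalty = 0.5

shrunk = [soft_threshold(c, penalty) for c in coefficients]
print(shrunk)  # [1.5, -0.75, 0.0, 0.0, 0.0]
```

The two large coefficients survive with reduced magnitude, while the small ones are removed from the model entirely, which is what makes the resulting model easier to interpret.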



In this study, we use L1-regularized logistic regression. The model takes fifteen features as input and uses them to predict (1) which bacteria group the patient belongs to and (2) whether the patient is resistant to Amikacin, given that the bacteria group is one of the following: Acinetobacter spp., Escherichia coli, Gram negative, Klebsiella spp., Pseudomonas aeruginosa, Staphylococcus spp., or Staphylococcus aureus. Amikacin is an antibiotic commonly used for several bacterial infections. We therefore use machine learning, which is well suited to classification problems of this kind. Predicting the bacteria group is a multiclass classification task, while predicting Amikacin resistance is a binary classification task (resistant or susceptible).
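To make the binary task concrete, here is a minimal sketch of L1-regularized logistic regression fitted by proximal gradient descent on synthetic data. It is not the study's implementation: the solver, the data generator, the train/test split, and every name and constant below are assumptions for illustration. The only detail taken from the text is the use of fifteen input features. The `penalty` argument multiplies the L1 norm of the weights and need not match the parameterization behind the penalty weights reported in Table I.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def score(w, b, row):
    """Linear score w.x + b for one sample."""
    return sum(wj * xj for wj, xj in zip(w, row)) + b


def fit_l1_logistic(X, y, penalty, step=0.5, n_iter=300):
    """Minimize mean log-loss + penalty * ||w||_1; the intercept is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i drive the gradient of the mean log-loss.
        residuals = [sigmoid(score(w, b, row)) - yi for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        grad_b = sum(residuals) / n
        # Gradient step on the loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * penalty) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label (score > 0) matches y."""
    hits = sum((1 if score(w, b, row) > 0 else 0) == yi for row, yi in zip(X, y))
    return hits / len(y)


# Synthetic stand-in data: 15 features, of which only the first two carry signal.
random.seed(0)
N_FEATURES = 15
X = [[random.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(300)]
y = [1 if 2.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.5) > 0 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

w, b = fit_l1_logistic(X_train, y_train, penalty=0.1)
n_zero = sum(1 for wj in w if wj == 0.0)
print("zeroed coefficients:", n_zero, "of", N_FEATURES)
print("test accuracy:", accuracy(w, b, X_test, y_test))
```

On data like this, the penalty removes most of the uninformative features while keeping the two that carry signal. The multiclass task would need an extension such as one-vs-rest or a softmax loss, which this sketch does not cover.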

First, the experimental results for task 1 are shown in Table I. The results are similar regardless of the L1 penalty weight.

Table I

Predicting which bacteria group the patient belongs to

L1 Penalty weight    Training Accuracy    Test Accuracy
0.001                0.482                0.477
0.01                 0.482                0.477
0.1                  0.482                0.477
1                    0.482                0.477
10                   0.481                0.479
100                  0.482                0.481
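Accuracy that stays nearly constant across such a wide range of penalty weights is easier to interpret next to a simple reference point. One common choice is the majority-class baseline, i.e., the accuracy obtained by always predicting the most frequent bacteria group. The text does not report class frequencies, so whether the figures in Table I are close to such a baseline cannot be determined here. The sketch below only shows how the baseline would be computed; the function name and the label counts are invented for illustration and are not the study's data.

```python
from collections import Counter


def majority_class_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


# Hypothetical label counts, invented for illustration only (not the study's data).
labels = (
    ["Escherichia coli"] * 5
    + ["Klebsiella spp."] * 3
    + ["Staphylococcus aureus"] * 2
)
print(majority_class_accuracy(labels))  # 0.5
```

A model whose test accuracy is only marginally above this number has learned little beyond the class frequencies, whatever the penalty weight.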

In addition, the numerical results for task 2 are illustrated in Fig. 1, where the x-axis represents the bacteria group and the y-axis represents the prediction accuracy.

Fig. 1. Binary classification for predicting whether the patient is resistant to Amikacin



A major potential improvement in empiric therapy could come from using large medical datasets in conjunction with machine learning (ML) algorithms. This approach has been gaining traction lately and is recognized as likely being a part of future treatment in many medical fields. Various studies have identified risk factors for antibiotic-resistant infections based on patient comorbidities, demographics, previous treatments, and other patient characteristics.

When treating patients for infection, the physician must balance the survival benefit that may result from the prompt initiation of effective antibiotic therapy against the risk of adverse side effects, complications, potential for the development of resistance and the increased costs that may follow the use of unnecessary broad-spectrum agents. The modern health system has access to enormous amounts of information about each patient. Recent studies have demonstrated the ability of open-source machine-learning algorithms to use such data in the prediction of antibiotic resistance in different clinical settings.