The Covid-19 pandemic has spread quickly, making identification
of the virus critically important in assisting overburdened healthcare
systems. Numerous techniques have been used to identify Covid-19,
of which the polymerase chain reaction (PCR) test is the most common.
However, obtaining results from the PCR test can take up to two days.
An alternative is to use chest X-ray images of the subject as inputs
to a deep learning neural network. The two problems
with this approach are the choice of architecture and the method used
to deal with the imbalanced data. In this study, a comparative analysis of
a standard convolutional neural network (CNN) and a number of transfer
learning algorithms with a range of imbalanced data techniques was
conducted to detect Covid-19 from a data set of chest X-ray images. This
data set was an amalgamation of two data sets extracted from the Kaggle
Covid-19 open source data repository and non-Covid illnesses taken
from the National Institutes of Health. The resulting data set had
over 115k records and 15 different types of findings, ranging from no illness
to illnesses such as Covid-19, emphysema and lung cancer. This study
addresses the problem of class imbalance on the largest data set used
for X-ray detection of Covid-19 by combining undersampling and oversampling
methods. The results showed that a CNN model in conjunction
with these random oversampling and undersampling methods outperformed all
other candidates when identifying Covid-19, with an F1-score of 93%, a
precision of 90% and a recall of 91%.
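
As an illustration of combining random undersampling and oversampling, the following is a minimal sketch only, not the study's implementation. The function name `resample_to_target`, the per-class `target` count, and the toy label distribution are all assumptions introduced here; the abstract does not state the sampling ratios that were used.

```python
import numpy as np


def resample_to_target(labels, target, seed=0):
    """Return shuffled indices in which every class appears exactly `target` times.

    Hypothetical helper: classes larger than `target` are randomly
    undersampled (drawn without replacement); classes smaller than
    `target` are randomly oversampled (drawn with replacement).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        # Duplicates are only allowed when the class is too small.
        replace = len(idx) < target
        picked.append(rng.choice(idx, size=target, replace=replace))
    return rng.permutation(np.concatenate(picked))


# Toy, made-up label distribution: one majority class and two minority classes.
labels = np.array([0] * 1000 + [1] * 50 + [2] * 200)
balanced_idx = resample_to_target(labels, target=200)
counts = np.bincount(labels[balanced_idx])
print(counts)  # [200 200 200]
```

The returned indices can be used to select the corresponding images for training. Resampling would normally be applied to the training split only, so that evaluation metrics reflect the original class distribution.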
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: No
Uncontrolled Keywords: Covid-19; Oversampling; Undersampling; CNN; transfer learning; chest x-ray