Imbal-OL: Online Machine Learning from Imbalanced Data Streams in Real-world IoT
Sudharsan, Bharath (ORCID: 0000-0001-5906-113X), Breslin, John G. (ORCID: 0000-0001-5790-050X) and Ali, Muhammad Intizar (ORCID: 0000-0002-0674-2131)
(2021)
Imbal-OL: Online Machine Learning from Imbalanced Data Streams in Real-world IoT.
In: 2021 IEEE International Conference on Big Data (Big Data), 15-18 December 2021, Orlando, FL, USA.
Typically, a Neural Network (NN) is trained in data centers using historic datasets. A C source file of the trained model (the model stored as a char array) is then generated and flashed onto IoT devices. This standard process limits the flexibility of billions of deployed ML-powered devices: they cannot learn unseen or fresh data patterns (static intelligence) and cannot adapt to dynamic scenarios. To address this issue, Online Machine Learning (OL) algorithms are currently deployed on IoT devices. These algorithms give devices the ability to re-train locally by continuously updating the last few NN layers using unseen data patterns encountered after deployment.
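The abstract describes OL as updating the last few NN layers on the device. As a minimal sketch, assuming a frozen backbone has already mapped each sensor sample to a fixed-length feature vector, the Python below re-trains only a final softmax layer with one stochastic gradient descent step per incoming labelled sample. The class `OnlineSoftmaxHead` and its parameters (`n_features`, `n_classes`, `lr`, `seed`) are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


class OnlineSoftmaxHead:
    """Hypothetical final NN layer: a dense softmax classifier updated one sample at a time.

    Assumption: a frozen backbone has already converted each raw sample into a
    fixed-length feature vector, so only these weights and biases are re-trained.
    """

    def __init__(self, n_features, n_classes, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # Small random weights per class, zero biases.
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def probs(self, x):
        """Softmax class probabilities for one feature vector."""
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi
                  for row, bi in zip(self.w, self.b)]
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, x):
        """Index of the most probable class."""
        p = self.probs(x)
        return p.index(max(p))

    def update(self, x, y):
        """One SGD step on the cross-entropy loss for a single labelled sample (x, y)."""
        p = self.probs(x)
        for k in range(len(self.w)):
            grad = p[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
            for j, xj in enumerate(x):
                self.w[k][j] -= self.lr * grad * xj
            self.b[k] -= self.lr * grad
```

Because each update touches only the final layer's weights and biases, the per-sample cost grows with `n_features` times `n_classes`, which is what keeps last-layer re-training cheap compared with re-training the whole network.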
In OL, catastrophic forgetting is common when NNs are trained on non-stationary data distributions. The majority of recent work in the OL domain embraces the implicit assumption that the distribution of local training data is balanced. In practice, however, sensor data streams in real-world IoT are severely imbalanced and temporally correlated. This paper introduces Imbal-OL, a resource-friendly technique that can be used as an OL plugin to balance the size of classes in a range of data streams. When an Imbal-OL-processed stream is used for OL, models can adapt faster to changes in the stream while simultaneously preventing catastrophic forgetting. Experimental evaluation of Imbal-OL using CIFAR datasets over ResNet-18 demonstrates its ability to deal with imperfect data streams, as it produces high-quality models even under challenging learning settings.
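The abstract does not spell out how Imbal-OL balances class sizes internally. Purely as a point of comparison, the Python sketch below shows one generic way to keep a fixed-size, class-balanced memory over an imbalanced stream. When memory is full, a sample from an under-represented class evicts a random sample of the currently largest class; otherwise it is reservoir-sampled within its own class. This swapped-in technique is similar in spirit to class-balancing reservoir sampling from the continual-learning literature and is not the authors' method. The class `ClassBalancedBuffer` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


class ClassBalancedBuffer:
    """Hypothetical fixed-capacity memory that keeps class counts roughly equal.

    Not the Imbal-OL algorithm: a generic class-balancing reservoir scheme.
    """

    def __init__(self, capacity, seed=0):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.buffer = defaultdict(list)  # class label -> stored samples
        self.seen = defaultdict(int)     # class label -> samples observed so far

    def __len__(self):
        return sum(len(v) for v in self.buffer.values())

    def counts(self):
        """Number of stored samples per class."""
        return {c: len(v) for c, v in self.buffer.items()}

    def samples(self):
        """All stored (sample, label) pairs, e.g. for drawing OL mini-batches."""
        return [(x, c) for c, xs in self.buffer.items() for x in xs]

    def add(self, x, y):
        self.seen[y] += 1
        if len(self) < self.capacity:
            # Memory not full yet: store every sample.
            self.buffer[y].append(x)
            return
        largest = max(self.buffer, key=lambda c: len(self.buffer[c]))
        if y != largest and len(self.buffer[y]) < len(self.buffer[largest]):
            # Under-represented class: evict a random sample of the largest class.
            victims = self.buffer[largest]
            victims.pop(self.rng.randrange(len(victims)))
            self.buffer[y].append(x)
        else:
            # Class y already holds a largest share: reservoir-sampling step within y.
            stored = self.buffer[y]
            j = self.rng.randrange(self.seen[y])
            if j < len(stored):
                stored[j] = x
```

For example, an interleaved stream in which every tenth sample belongs to class 1 and the rest to class 0 (a 9:1 imbalance) leaves 50 samples of each class in a 100-slot buffer of this kind; mini-batches drawn from the buffer, rather than from the raw stream, could then feed the OL updates.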
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: IoT Devices; TinyML; Online Learning; Imbalanced Data; Class Balancing