Active Learning for Rare Event Detection with Cost-Sensitive Acquisition on Streaming Logs
DOI: https://doi.org/10.71465/mrcis160
Keywords: Active Learning, Anomaly Detection, Log Analysis, Streaming Data, Cost-Sensitive Learning
Abstract
The exponential growth of modern distributed systems has led to a corresponding explosion in system logs, which serve as the primary source of information for system observability and failure diagnosis. However, detecting rare events—such as critical security breaches or catastrophic system failures—within these massive, high-velocity data streams presents a formidable challenge. Traditional supervised learning approaches fail due to the scarcity of labeled anomalies, while unsupervised methods often suffer from high false-positive rates that induce alert fatigue among operators. Active Learning (AL) offers a promising avenue by querying human experts to label only the most informative instances; yet, existing AL frameworks typically assume a uniform cost of annotation and often neglect the temporal drift inherent in streaming data. This paper proposes a novel framework: Cost-Sensitive Streaming Active Learning (CS-StreamAL). Our approach integrates a dynamic budget management system with a hybrid query strategy that balances uncertainty, diversity, and annotation cost. We introduce a utility function that weighs the information gain of a query against the operational cost of interrupting a human analyst. Extensive experiments on large-scale public log datasets (HDFS, BGL, and Thunderbird) demonstrate that CS-StreamAL achieves superior detection performance for rare events while reducing the labeling budget by approximately 40% compared to state-of-the-art uncertainty sampling methods.
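The abstract describes the utility function only in words, so the following is a minimal illustrative sketch rather than the CS-StreamAL formulation itself. It assumes that information gain can be approximated by a weighted mix of predictive entropy and a novelty (diversity) score in [0, 1], that annotation cost is a single number per log entry, and that the budget is a fixed total rather than the dynamic scheme the paper proposes. The names `query_utility` and `BudgetedSelector`, and the parameters `alpha`, `lam`, and `threshold`, are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


def binary_entropy(p: float) -> float:
    """Uncertainty of an anomaly probability p, in bits (0 when p is 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def query_utility(p_anomaly: float, novelty: float, cost: float,
                  alpha: float = 0.7, lam: float = 0.5) -> float:
    """Assumed utility: estimated information gain minus a cost penalty.

    alpha trades uncertainty against novelty; lam scales the annotation cost.
    Both weights are illustrative defaults, not values from the paper.
    """
    gain = alpha * binary_entropy(p_anomaly) + (1.0 - alpha) * novelty
    return gain - lam * cost


@dataclass
class BudgetedSelector:
    """Decides, one streaming log entry at a time, whether to ask an analyst."""
    budget: float            # total annotation cost we are allowed to spend
    threshold: float = 0.2   # minimum utility that justifies an interruption
    spent: float = 0.0       # annotation cost consumed so far

    def should_query(self, p_anomaly: float, novelty: float, cost: float) -> bool:
        # Never exceed the remaining budget.
        if self.spent + cost > self.budget:
            return False
        # Skip entries whose expected benefit does not outweigh their cost.
        if query_utility(p_anomaly, novelty, cost) < self.threshold:
            return False
        self.spent += cost
        return True


if __name__ == "__main__":
    selector = BudgetedSelector(budget=1.0)
    # Each tuple is (model anomaly probability, novelty score, annotation cost).
    stream = [(0.5, 1.0, 0.4), (0.99, 0.0, 0.4), (0.5, 1.0, 0.4), (0.5, 1.0, 0.4)]
    for p, novelty, cost in stream:
        print(p, novelty, cost, selector.should_query(p, novelty, cost))
```

Under these assumptions, an entry the model is confident about scores low and is skipped even when budget remains, while an uncertain and novel entry is queried until the budget runs out. A subtractive cost penalty is only one option; dividing gain by cost would favor cheap queries more aggressively.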
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published in the Multidisciplinary Research in Computing Information Systems are licensed under an open-access model. Authors retain full copyright and grant the journal the right of first publication. The content can be freely accessed, distributed, and reused for non-commercial purposes, provided proper citation is given to the original work.
