Active Learning for Rare Event Detection with Cost-Sensitive Acquisition on Streaming Logs

Authors

  • Margaret Gonzalez, School of Computing and Information Systems, University of Melbourne, Melbourne VIC 3010, Australia
  • Jun Chen, School of Computing and Information Systems, University of Melbourne, Melbourne VIC 3010, Australia

DOI:

https://doi.org/10.71465/mrcis160

Keywords:

Active Learning, Anomaly Detection, Log Analysis, Streaming Data, Cost-Sensitive Learning

Abstract

The rapid growth of modern distributed systems has produced a corresponding surge in system logs, which serve as the primary source of information for system observability and failure diagnosis. However, detecting rare events, such as critical security breaches or catastrophic system failures, within these massive, high-velocity data streams remains a formidable challenge. Traditional supervised learning approaches struggle because labeled anomalies are scarce, while unsupervised methods often suffer from high false-positive rates that induce alert fatigue among operators. Active Learning (AL) offers a promising alternative by querying human experts to label only the most informative instances. Existing AL frameworks, however, typically assume a uniform annotation cost and often neglect the temporal drift inherent in streaming data. This paper proposes a novel framework, Cost-Sensitive Streaming Active Learning (CS-StreamAL). Our approach integrates a dynamic budget management system with a hybrid query strategy that balances uncertainty, diversity, and annotation cost. We introduce a utility function that weighs the information gain of a query against the operational cost of interrupting a human analyst. Extensive experiments on large-scale public log datasets (HDFS, BGL, and Thunderbird) demonstrate that CS-StreamAL achieves superior detection performance for rare events while reducing the labeling budget by approximately 40% compared to state-of-the-art uncertainty sampling methods.
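
The abstract does not give the form of the utility function, so the following is only an illustrative sketch of how a cost-aware query decision of this kind might look, not the CS-StreamAL implementation. It assumes a linear trade-off: uncertainty is measured as the binary entropy of the model's anomaly probability, diversity is a precomputed novelty score in [0, 1], and annotation cost is subtracted with a fixed weight. The names binary_entropy, query_utility, BudgetedSelector, the weights, and the threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; largest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def query_utility(p_anomaly: float, novelty: float, cost: float,
                  w_unc: float = 0.6, w_div: float = 0.4,
                  cost_weight: float = 0.5) -> float:
    """Assumed linear utility: weighted information gain minus weighted cost.

    p_anomaly: model's predicted probability that the log event is anomalous.
    novelty:   diversity score in [0, 1] (hypothetical, computed elsewhere).
    cost:      estimated cost of interrupting an analyst, in budget units.
    """
    gain = w_unc * binary_entropy(p_anomaly) + w_div * novelty
    return gain - cost_weight * cost


@dataclass
class BudgetedSelector:
    """Decides, one streaming event at a time, whether to request a label."""
    budget: float            # remaining labeling budget, in cost units
    threshold: float = 0.2   # hypothetical minimum utility worth a query

    def should_query(self, p_anomaly: float, novelty: float, cost: float) -> bool:
        if cost > self.budget:
            return False     # the query is not affordable
        if query_utility(p_anomaly, novelty, cost) < self.threshold:
            return False     # not informative enough for its cost
        self.budget -= cost  # spend budget only when a query is issued
        return True


selector = BudgetedSelector(budget=1.0)
first = selector.should_query(p_anomaly=0.5, novelty=1.0, cost=0.8)
second = selector.should_query(p_anomaly=0.5, novelty=1.0, cost=0.8)
```

In this sketch the first event is queried because it is maximally uncertain, novel, and affordable, while the identical second event is skipped because the remaining budget no longer covers its cost. A drift-aware system would presumably adapt the threshold or the budget over time; that part is omitted here.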

Published

2025-12-30