Abstract
Modern software systems produce vast amounts of logs, which are an essential resource for anomaly detection. Artificial Intelligence for IT Operations (AIOps) tools have been developed to automate log-based anomaly detection for software systems. Three practical challenges are widely recognized in this field: data labeling costs, evolving logs in dynamic systems, and adaptability across different systems. In this paper, we propose CroSysLog, an AIOps tool for log-event level anomaly detection that is designed to address these challenges. Following prior approaches, CroSysLog uses a neural representation approach to gain a nuanced understanding of logs and to generate a representation for each individual log event. CroSysLog can be trained on source systems that have sufficient labeled logs from open datasets to achieve robustness, and can then adapt efficiently to target systems using only a few labeled log events. We evaluate CroSysLog on open datasets from four large-scale distributed supercomputing systems: BGL, Thunderbird, Liberty, and Spirit. We train and evaluate CroSysLog on random log splits from these systems, each of which preserves the chronological order of its consecutive log events. These splits are widely distributed across the one- or two-year log collection period of each system, and so capture the evolving nature of the logs in each system. Our results show that, after training on Liberty and BGL as source systems, CroSysLog adapts efficiently to the target systems Thunderbird and Spirit using a few labeled log events from each, and performs anomaly detection effectively on both. The results demonstrate that CroSysLog is a practical, scalable, and adaptable tool for log-event level anomaly detection in the operation and maintenance of software systems.
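The abstract does not spell out how the random log splits are drawn, so the following is only a minimal sketch of one way to obtain splits that are randomly placed across a log yet keep consecutive events in chronological order. The function name `sample_chronological_splits`, its parameters, and the fixed-size block scheme are all assumptions made for illustration, not the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple


def sample_chronological_splits(
    events: Sequence[str],
    num_splits: int,
    split_len: int,
    seed: int = 0,
) -> List[Tuple[int, List[str]]]:
    """Pick `num_splits` non-overlapping windows of `split_len` consecutive events.

    `events` is assumed to be sorted by timestamp already. Each result is a
    (start_index, window) pair, and results are returned in timeline order.
    """
    # Hypothetical scheme: cut the log into fixed-size blocks and sample blocks.
    num_blocks = len(events) // split_len
    if num_splits > num_blocks:
        raise ValueError("not enough events for the requested number of splits")

    rng = random.Random(seed)
    # Sampling block indices without replacement guarantees no overlap;
    # sorting keeps the chosen splits in chronological order.
    block_ids = sorted(rng.sample(range(num_blocks), num_splits))

    splits = []
    for block in block_ids:
        start = block * split_len
        # Slicing preserves the original order of events inside the window.
        splits.append((start, list(events[start:start + split_len])))
    return splits
```

Under these assumptions, calling `sample_chronological_splits(events, num_splits=4, split_len=10)` on a time-ordered list of at least 40 events yields four windows scattered over the whole collection period, each an unshuffled run of ten consecutive events.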
Original language | English |
---|---|
Title of host publication | Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025 |
Number of pages | 11 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Publication date | 2025 |
Pages | 454-464 |
ISBN (Electronic) | 979-8-3315-3510-0 |
DOIs | |
Publication status | Published - 2025 |
MoE publication type | A4 Article in conference proceedings |
Event | IEEE International Conference on Software Analysis, Evolution and Reengineering, Montreal, Canada. Duration: 4 Mar 2025 → 7 Mar 2025. Conference number: 32 |
Publication series
Name | European Conference on Software Maintenance and Reengineering proceedings |
---|---|
ISSN (Electronic) | 2640-7574 |
Fields of Science
- aiops
- anomaly detection
- cross-system
- log analysis
- meta learning
- transfer learning
- 113 Computer and information sciences