Combining Rule-Based System and Machine Learning to Classify Semi-natural Language Data.

Zafar Hussain, Jukka K Nurminen, Tommi Mikkonen, Marcin Kowiel

Research output: Chapter in Book/Report/Conference proceeding › Conference article › Scientific › peer-reviewed

Abstract

Command-line commands form a special kind of semi-natural language. Analyzing their structure and classifying them is a useful approach in cyber security for detecting anomalous commands used by malicious actors. Without contextual knowledge, analyzing commands is difficult: similar-looking commands might perform different tasks, and commands with different aliases might perform the same task. To capture the structure of command-line commands and their syntactic and semantic meanings, we created a rule-based system based on expert opinions. Using this system, we classified command-line commands into similar and not-similar classes, transforming the command-line data into a binary-labeled form. We then trained three machine learning models (a logistic regression document classifier, a deep learning document classifier, and a deep learning sentence-pair classifier) to learn the set of rules encoded in the rule-based system. We compared the models' performance using the Matthews Correlation Coefficient (MCC). The logistic regression model achieved an MCC of 0.85, whereas both Deep Learning (DL) models scored above 0.98. On unseen data, the DL document classifier and the DL sentence-pair classifier achieved accuracies of 0.943 and 0.983, respectively. Our proposed hybrid approach solves the complex problem of classifying semi-natural language data. It can be used to create a domain-specific set of rules and to classify any semi-natural language data into multiple classes.
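The models above are compared by their Matthews Correlation Coefficient. As a rough illustration only (this is not the authors' evaluation code), the following sketch computes MCC for binary labels, assuming 1 = "similar" and 0 = "not similar"; the function names `confusion_counts`, `matthews_corrcoef`, and `accuracy` are hypothetical. It uses the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and returns 0.0 when any marginal count is zero, which is a common convention for the undefined case.

```python
from math import sqrt
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming labels 1 = 'similar', 0 = 'not similar'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC in [-1, 1]: 1 = perfect, 0 = no better than chance, -1 = total disagreement."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any row or column of the confusion matrix is empty, MCC is undefined;
    # return 0.0 by convention.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical labels for six command pairs.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    print(matthews_corrcoef(y_true, y_pred))  # 0.3333333333333333

    # Skewed toy set: nine 'not similar' pairs, one 'similar' pair,
    # and a model that always predicts 'not similar'.
    skewed_true = [0] * 9 + [1]
    always_neg = [0] * 10
    print(accuracy(skewed_true, always_neg), matthews_corrcoef(skewed_true, always_neg))  # 0.9 0.0
```

The skewed toy case shows a general property of the metric: a trivial majority-class predictor can score high accuracy while its MCC stays at 0.0, because MCC uses all four confusion-matrix cells. This says nothing about the class balance of the paper's own data.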
Original language: English
Title of host publication: Intelligent Systems and Applications. IntelliSys 2022
Editors: Kohei Arai
Number of pages: 18
Publisher: Springer, Cham
Publication date: 31 Aug 2022
Pages: 424–441
ISBN (Print): 978-3-031-16071-4
ISBN (Electronic): 978-3-031-16072-1
Publication status: Published - 31 Aug 2022
OKM publication type: A4 Article in conference proceedings
Event: Intelligent Systems Conference 2022 (IntelliSys 2022) - Amsterdam, Netherlands
Duration: 1 Sep 2022 – 2 Sep 2022

Publication series

Name: Lecture Notes in Networks and Systems (LNNS)
Publisher: Springer, Cham
Volume: 542
ISSN (Electronic): 2367-3389

Fields of Science

  • 113 Computer and information sciences
