HIKARI-2021 bridges the critical gap between lab-grade intrusion detection datasets and the encrypted reality of modern network attacks.
Here's the dirty secret of cybersecurity AI: many machine-learning intrusion detection systems are trained on datasets of mostly unencrypted network traffic. But in 2024, over 95% of web traffic is encrypted. The models powering your organization's defenses learned to spot threats in a world that no longer exists. HIKARI-2021 was built to fix this mismatch.
Researchers at a Japanese university captured real network traffic, the mundane hum of actual internet use, then added synthetic attacks carried over encrypted channels. The result is a dataset that mirrors what security systems actually face: threats hiding inside the same TLS tunnels as legitimate traffic. Brute-force logins, vulnerability probing, and cryptomining traffic, all encrypted.
The dataset has been downloaded over 21,000 times by security teams and researchers across 80+ countries. Its impact extends beyond academia — commercial IDS vendors have used HIKARI-2021 to benchmark their detection engines against encrypted threats, revealing blind spots that older datasets simply couldn't expose.
[Figure: Breakdown of malicious traffic types in the HIKARI-2021 dataset]
[Figure: ML model performance drops significantly when traffic is encrypted]
dataset · 2022 · CC BY 4.0
HIKARI-2021 enables a new generation of IDS models that can detect threats without decrypting traffic, preserving privacy while maintaining security. This is the direction enterprise security must move in.
By providing a realistic encrypted traffic benchmark, the dataset allows fair comparison between detection approaches. Papers using HIKARI-2021 are producing more reproducible, real-world-relevant results.
As regulations like GDPR make deep packet inspection legally fraught, security systems must learn to work with encrypted flows. This dataset trains models for that constrained reality.
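To make "detecting threats without decrypting traffic" concrete, here is a minimal sketch of the kind of flow-level summary such models learn from. It reads only packet timing, size, and direction, never the payload. The function name, the tuple layout, and the feature names are illustrative assumptions, not HIKARI-2021's actual column names or extraction tooling.

```python
from statistics import mean


def flow_features(packets):
    """Summarize one flow from packet metadata only (hypothetical helper).

    packets: list of (timestamp_seconds, size_bytes, direction) tuples,
    where direction is "out" (client to server) or "in".
    No payload bytes are read, so the same code applies to TLS traffic.
    """
    if not packets:
        raise ValueError("flow has no packets")

    # Order by time so duration and gaps are well defined.
    packets = sorted(packets, key=lambda p: p[0])
    times = [t for t, _, _ in packets]
    sizes = [s for _, s, _ in packets]

    # Inter-arrival gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    total_bytes = sum(sizes)
    out_bytes = sum(s for _, s, d in packets if d == "out")

    return {
        "duration": times[-1] - times[0],
        "packet_count": len(packets),
        "mean_size": mean(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "out_byte_ratio": out_bytes / total_bytes if total_bytes else 0.0,
    }


# A toy three-packet flow: request, response, follow-up request.
example = flow_features([(0.0, 100, "out"), (0.5, 300, "in"), (1.0, 200, "out")])
```

Feature vectors like `example` can then be fed to any standard classifier. A repetitive login attempt and a normal browsing session tend to differ in exactly these timing and size patterns, even when every byte of content is encrypted.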