This is what some of the world’s largest banks of malware look like stacked as hard drives

TL;DR

Researchers estimate that VirusTotal’s malware archive, at roughly 31 petabytes, would form a stack about 2,645 feet high if stored on 1-terabyte hard drives. The comparison gives a physical sense of how much malware data has been collected for cybersecurity research.

Cybersecurity researchers have estimated that VirusTotal’s malware sample archive, totaling approximately 31 petabytes, would stand about 2,645 feet tall if stored on 1-terabyte hard drives stacked on top of one another, illustrating the enormous scale of collected malware data.

Malware research group vx-underground reported in a post on X (formerly Twitter) that its collection of malware source code amounts to about 30 terabytes. Separately, VirusTotal, a widely used online malware scanning service, stated that its repository contains roughly 31 petabytes of malware samples contributed by users. Both figures are approximate, but they reflect the volume of data accumulated for cybersecurity analysis.

Stored on standard 1-terabyte hard drives, each assumed to be about 1 inch thick, vx-underground’s 30 terabytes would fill 30 drives, a stack about 30 inches (2.5 feet) high. VirusTotal’s 31 petabytes would need approximately 31,744 such drives (at 1,024 terabytes per petabyte), a stack about 2,645 feet high, slightly shorter than the tallest building in the world, the Burj Khalifa, at roughly 2,717 feet. The comparisons are meant to give a tangible sense of the scale of malware repositories, which are critical for training detection models and understanding evolving cyber threats.
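The arithmetic behind these figures can be reproduced in a few lines. The sketch below is illustrative only and is not the researchers' own calculation. It assumes each 1-terabyte drive is 1 inch thick (the ratio implied by the 30-drive, 30-inch figure) and that a petabyte is 1,024 terabytes (implied by the 31,744-drive figure). The function and constant names are hypothetical.

```python
import math

INCHES_PER_FOOT = 12
TB_PER_PB = 1024  # assumption: binary convention, implied by the 31,744-drive figure


def stack_height(capacity_tb, drive_tb=1.0, drive_thickness_in=1.0):
    """Return (number of drives, stack height in feet) for a given capacity.

    drive_thickness_in=1.0 is an assumption implied by the article's
    figures, not a measured drive dimension.
    """
    drives = math.ceil(capacity_tb / drive_tb)
    height_ft = drives * drive_thickness_in / INCHES_PER_FOOT
    return drives, height_ft


# vx-underground: about 30 terabytes
vx_drives, vx_feet = stack_height(30)

# VirusTotal: about 31 petabytes
vt_drives, vt_feet = stack_height(31 * TB_PER_PB)

print(f"vx-underground: {vx_drives:,} drives, {vx_feet:,.1f} ft")
print(f"VirusTotal:     {vt_drives:,} drives, {vt_feet:,.0f} ft")
```

Changing `drive_tb` or `drive_thickness_in` shows how sensitive the comparison is to the choice of drive: with larger-capacity drives, the same archive would form a much shorter stack.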

Why It Matters

The scale of this data highlights the resources cybersecurity organizations dedicate to threat detection and analysis. Managing and analyzing malware samples at this volume is a challenge in itself, yet it is essential for developing effective defenses against increasingly sophisticated cyberattacks. It also emphasizes the importance of data sharing and collaboration in cybersecurity, as larger repositories enable more comprehensive threat intelligence.

Background

Both vx-underground and VirusTotal are prominent sources of malware samples used by cybersecurity professionals and researchers, and for training AI models. vx-underground claims to have the largest collection of malware source code, while VirusTotal aggregates samples submitted by users worldwide. The size of these repositories has grown significantly over recent years, reflecting the increasing volume of malware and cyber threats. Prior efforts have focused on analyzing these datasets to improve detection algorithms and understand attack patterns, but the sheer scale remains a challenge for data management and security.

“The comparison of malware repositories to landmarks like the Eiffel Tower and Burj Khalifa helps visualize just how massive these datasets are, emphasizing the scale of modern cybersecurity efforts.”

— Zack Whittaker, TechCrunch security editor

“Our repository contains about 31 petabytes of malware samples contributed by users worldwide.”

— Bernardo Quintero, founder of VirusTotal

What Remains Unclear

It remains unclear how these estimates will change as malware repositories continue to grow. The figures are approximate, and the physical footprint of the data would vary with storage efficiency and compression. The practical value of such large datasets also depends on effective data management and analysis tools, which are still under development.

What’s Next

Next steps include improving data storage and analysis capabilities to handle these enormous datasets efficiently. Researchers and cybersecurity firms will likely continue expanding and sharing malware repositories, while developing more sophisticated AI tools for threat detection and response.

Key Questions

How do malware repositories grow so large?

They grow as cybersecurity organizations and researchers collect and share malware samples from various sources, including infected devices, honeypots, and user submissions, to better understand and defend against cyber threats.

Why compare malware datasets to landmarks?

The comparison helps visualize the enormous scale of these repositories in a tangible way, making it easier for the public and professionals to grasp the size of cyber threat data.

What are the challenges of managing such large datasets?

Handling, storing, and analyzing petabyte-scale data requires advanced infrastructure, significant computational resources, and efficient algorithms, which remain ongoing challenges in cybersecurity.

Will the size of malware datasets continue to grow?

Yes, as cyber threats evolve and new malware variants emerge, these repositories are expected to expand further, necessitating ongoing investments in data management and analysis tools.
