Technical Articles

Review Cloudmersive's technical library.

Watch Out for Archives: The Importance of Content Verification & Restriction
9/13/2023 - Brian O'Neill


When we think about the common document types used to host & disguise custom content threats, PDF and Office file formats (DOCX, XLSX, etc.) are probably the first to come to mind. The last few decades of cyber-attack history would support that idea: malicious links/objects, macros, and similar file-based threat types are frequently buried within PDF and OpenXML file structure, designed to quickly access and compromise our device the moment the document is downloaded. Attackers rely on our familiarity with these everyday formats to facilitate tricking us into accessing and downloading their contents.

document tree stemming from folder

As our cybersecurity policies have evolved over years of experience and research to detect & mitigate these expected threat vectors, however, even more insidious malware delivery methods have quietly grown in popularity. In recent years, archive file formats (predominantly ZIP or RAR) have become increasingly popular hosts for concealing virus and malware threats. According to threat research conducted over a three-month period in 2022, archive formats made up more than 40% of malware delivery attempts, exceeding the utilization of Office file formats during that timeframe.

Why Archives?

The benefits of using archive formats like ZIP or RAR to smuggle files infected with viruses and malware are clear cut. In fact, they aren’t all that different from the benefits we gain from using archive formats to consolidate and share legitimate files. Archive compression algorithms allow us to share dozens of large files across a network at once, and powerful encryption & password protection measures ensure unwanted third parties can’t access our sensitive material.

Threat actors effectively want the same things when they deliver infected files, with one key difference: instead of using encryption to hide their content from unwanted eyes, they use encryption to hide malware-infected files from signature-based scanning methods. Virus & malware scanning solutions can’t detect what they can’t decrypt, and by the time these compressed malicious files are decrypted and unzipped on a device, there may be no malware scanning policy in place to detect the infected documents. Unzipping the archive file itself won’t initiate a cyberattack, but eventually opening one of the compressed files will.

Further, malicious archives can themselves be stashed within other illegitimate (spoofed) files, making detection of the archive itself a considerable challenge. Using HTML, for example, a sophisticated attacker can create a very convincing spoof of a PDF document’s online viewer, and this document can be designed to retrieve & unzip a malicious archive from an external internet source once downloaded by the victim. Disguising a threat in this way creates all kinds of problems for traditional one-dimensional malware scanning policies, and it ultimately demands greater restriction of content types independent of malware signature detection. If these files successfully enter our networks undetected, our risk of falling victim to a dormant custom content cyber-attack greatly increases.

Detecting Custom Content Threats with Cloudmersive

The Advanced Scan iteration of the Cloudmersive Virus Scan API provides unique 360-degree protection against both infected files and hidden content threats, offering in-depth content verification and customizable file-type restriction policies. This means files entering a protected location can be scanned against a continuously updated list of more than 17 million virus & malware signatures, and the contents of those files can be simultaneously verified against the original file extensions, ensuring illegitimate files (such as the spoofed PDF example above) are immediately identified alongside infected ones.

In addition, dangerous file types – including HTML, Executables, Invalid Files, Scripts, Password-Protected Files, Macros, and more – can be blocked by setting custom threat detection policies in the API request body (or through your Cloudmersive Account page when using no-code solutions like Shield or Storage Protect). Unwanted file types – like ZIP and RAR – can be blocked by supplying a comma-separated list of acceptable file extensions in a plain text string. Files which fail to pass content verification checks against any custom restriction policies will receive the same CleanResult: False response as infected files.

For more information on how the Cloudmersive Virus Scan API can help protect your systems, please do not hesitate to reach out to a member of our sales team.

800 free API calls/month, with no expiration

Get started now! or Sign in with Google

Questions? We'll be your guide.

Contact Sales