Catching Risky Files: A Quick Guide to Subfiles

12/18/2024 - Brian O'Neill

Categorizing Risky Files

When we talk about “risky files” moving in and out of our system, we’re really referring to a broad category of file-related threats. In today’s fast-evolving threat landscape, risky files can bear malicious content ranging from known malware or virus signatures to intentional (or even accidental) file errors capable of exposing zero-day application vulnerabilities, and each can lead to equally devastating outcomes.

Files bearing viruses, for example, are well documented to replicate quickly throughout our system if left unchecked. A seemingly innocuous invalid file, on the other hand, could trigger an as-yet-unknown memory overflow vulnerability in one of our most heavily utilized file processing libraries, leading to a sudden, widespread Denial of Service attack against one of our critical applications.

Identifying risky files like these – which are, ultimately, as different as they are equally dangerous – starts with applying different security policies. Viruses, for example, can often be identified by comparing their file signature against known virus & malware “families”, or by utilizing advanced heuristics and other sandboxing methods to predict potentially malicious activity. Risky invalid files can typically only be identified when we verify the file contents to ensure they rigorously conform with their expected file format standards.
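The two policies described above can be sketched in a few lines. This is a minimal illustration rather than a production scanner: the hash blocklist and the function names are hypothetical, and a header check is only the first step of format validation, not a full conformance check.

```python
import hashlib

# Hypothetical blocklist for illustration; a real deployment would load
# known-bad hashes from a maintained threat-intelligence source.
KNOWN_BAD_SHA256 = {hashlib.sha256(b"demo-malware-sample").hexdigest()}

# Leading "magic" bytes for a few common formats. DOCX is a zip container,
# so it shares the zip local-file-header signature.
MAGIC_BYTES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
    ".zip": b"PK\x03\x04",
    ".docx": b"PK\x03\x04",
}


def matches_known_signature(data: bytes) -> bool:
    """Signature policy: is this exact file on the known-bad list?"""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256


def header_matches_extension(data: bytes, extension: str) -> bool:
    """Validation policy (first step only): do the leading bytes agree
    with the format the file claims to be?"""
    expected = MAGIC_BYTES.get(extension.lower())
    if expected is None:
        return False  # unknown type: treat as unverified
    return data.startswith(expected)
```

A file that fails the header check is not necessarily malicious, but it does not conform to the format it claims, which is exactly the kind of invalid input that should be rejected before it reaches a parsing library.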

Unfortunately, flagging risky files like those described above isn’t always as straightforward as matching a potential threat with an ideal file-scanning solution. Risky files are often layered and obfuscated in clever ways - particularly when they’re created by sophisticated threat actors carrying out targeted attacks (as opposed to spam attacks via email, for example). Obfuscated threats are designed specifically to bypass file scanning policies by tricking them into missing an underlying threat.

Obfuscation via Subfiles

One of the most common modern-day threat obfuscation methods involves the use of malicious subfiles. Subfiles are files stored within other files, and they pose as much threat to our network as their parent file – but they’re often overlooked by traditional threat scanning policies. It’s increasingly important to address subfile threats given the complexity of modern file structures, which can store embedded files or link document viewers to remote payloads.

Subfile Threat Example

We can imagine, for example, that an attacker has crafted an innocuous-looking Microsoft Word (DOCX) file. This file contains several embedded images, and the attacker has used steganographic techniques to hide malicious code within the pixels of one image. The attacker understands that most up-to-date file scanning technologies will look for embedded executable code hidden within image files, but they may never perform that scan if the image subfile isn’t addressed separately from the parent file.

In the structure of a DOCX file, images are stored as independent objects, and that means each image can be extracted and loaded independently from the original document. In this scenario, the attacker expects file scanning policies to check the parent (DOCX) file directly for threats without scanning the image subfile. They expect that the image payload will be executed automatically when the document is opened, or perhaps when the image itself is opened in an image reader by an unsuspecting client.
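To make the idea of addressing image subfiles separately more concrete, here is a minimal sketch that pulls each image out of a DOCX so it can be scanned on its own. It relies on the fact that a DOCX is a zip archive that conventionally stores embedded images under word/media/. The scan_image callback is a hypothetical stand-in for whatever image scanner is in use.

```python
import io
import zipfile


def iter_docx_media(docx_bytes: bytes):
    """Yield (name, data) for each file stored under word/media/ in a DOCX."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for info in archive.infolist():
            if info.filename.startswith("word/media/") and not info.is_dir():
                yield info.filename, archive.read(info)


def scan_docx_images(docx_bytes: bytes, scan_image):
    """Run scan_image (a hypothetical scanner callback) on every image
    subfile and return a mapping of subfile name to scan result."""
    return {name: scan_image(data) for name, data in iter_docx_media(docx_bytes)}
```

With this in place, a policy that only inspects the parent document can be extended to call scan_docx_images with the same scanner it would apply to a standalone image upload.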

Subfile Categories

OLE (Object Linking & Embedding)

Microsoft Office files in the Office Open XML (OOXML) format, such as DOCX, are structured as archives containing a series of XML specifications for document content, layout, and other details. The OLE feature allows document creators to store images, charts, media files, text frames, form controls, and other such content within these Office files.

OLE subfiles, including file objects (e.g., images, videos, and other Office documents) and file URLs stored within the OOXML file structure, can initiate malware downloads – or exploit zero-day application vulnerabilities – when the parent files are opened. Threat actors can hide malicious files and links within Office files to obfuscate them from file scanning policies.
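As a rough illustration of how stored URLs can be surfaced for scanning, the sketch below lists relationship targets that point outside the document package. In Office Open XML packages, relationship parts (files ending in .rels) mark such links with TargetMode="External". The function name is an assumption, and the standard-library XML parser is used only to keep the example self-contained; for untrusted input a hardened parser such as defusedxml is the safer choice.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship parts in Office Open XML packages.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_external_targets(office_bytes: bytes):
    """Return (rels_part, target) pairs for every relationship that is
    marked as pointing outside the package."""
    found = []
    with zipfile.ZipFile(io.BytesIO(office_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".rels"):
                continue
            # NOTE: ElementTree is not hardened against malicious XML;
            # this is acceptable for a sketch only.
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(RELS_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Target")))
    return found
```

Each returned target can then be handed to a URL reputation or download-and-scan policy, rather than being trusted because the parent document looked clean.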

Malicious files can also be stored within an Office Open XML (OOXML) archive without being directly embedded within the XML document specifications. Because these Office files are zip archives under the hood, it’s relatively straightforward to open one as a .zip archive, place malicious content within it, and then save it back under its Office file extension. This isn’t exactly OLE, and it typically renders the parent file invalid – but it is another way Office documents can be used in threat obfuscation.
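One way to spot content that was dropped into the archive by hand is to compare the archive members against the package's own manifest. In an Office Open XML package, [Content_Types].xml declares a content type for every legitimate part, either by file extension (Default) or by exact part name (Override). The sketch below flags members with no declaration. It is a heuristic under those assumptions, with a hypothetical function name, and an attacker who also edits the manifest would evade it, so every member should still be extracted and scanned.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
MANIFEST = "[Content_Types].xml"


def find_undeclared_members(office_bytes: bytes):
    """Return archive members that the content-types manifest does not cover."""
    with zipfile.ZipFile(io.BytesIO(office_bytes)) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
        # Raises KeyError if the manifest is missing, which is itself
        # a sign that the package is invalid.
        root = ET.fromstring(archive.read(MANIFEST))

    default_exts = {
        d.get("Extension").lower()
        for d in root.iter(CT_NS + "Default")
        if d.get("Extension")
    }
    overrides = {
        o.get("PartName").lstrip("/").lower()
        for o in root.iter(CT_NS + "Override")
        if o.get("PartName")
    }

    undeclared = []
    for name in names:
        if name == MANIFEST or name.lower() in overrides:
            continue
        base = name.rsplit("/", 1)[-1]
        ext = base.rsplit(".", 1)[1].lower() if "." in base else ""
        if ext not in default_exts:
            undeclared.append(name)
    return undeclared
```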

While OLE has been around for decades, modern file interconnectivity has greatly increased the scale and impact OLE attacks can have. In the last decade, a variety of successful OLE attacks (both targeted and spam email attacks) have been carried out, resulting in ransomware and spyware downloads on servers with extremely sensitive and valuable information.

MSG Files

MSG is a file format Outlook offers to enable the storage of individual email messages outside of Outlook (to be clear, it is NOT the format Outlook uses to send emails from one server to another). MSG files can carry message content, HTML formatting (where relevant), file attachments, links, and other metadata.

MSG files are often used to share email messages outside of Outlook. Because MSG files can store links and attachments, they represent a significant subfile obfuscation attack vector. Each subfile should be identified and scanned before an MSG file enters or leaves a network.
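The step of identifying each subfile can be sketched as building a scan queue from a message's links and attachments. This example assumes the HTML body and the attachment list have already been extracted from the MSG file by a dedicated parser, which is out of scope here. The function name and the shape of its inputs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML message body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("href", "src") and value:
                self.links.append(value)


def build_scan_queue(html_body: str, attachments):
    """Return a list of (kind, identifier) items to scan.

    attachments is assumed to be a list of (filename, data) pairs that a
    separate MSG parser has already extracted.
    """
    collector = LinkCollector()
    collector.feed(html_body)
    queue = [("url", link) for link in collector.links]
    queue += [("file", name) for name, _data in attachments]
    return queue
```

Attachments that are themselves containers, such as a DOCX, would then be expanded again so that their own subfiles join the queue.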

MSG files carrying malicious links (e.g., phishing links or remote malware downloads) and malicious documents (e.g., DOCX files with malicious macros) have been used in successful attacks within the last decade.

Cloudmersive Virus Scan

The Advanced iteration of the Cloudmersive Virus Scan API navigates subfiles and archive directories (e.g., zip files) to scan each layer of a potential threat. Subfiles including images, documents, and URLs are scanned independently of the parent file, and subfile attributes are returned in the Virus Scan API response body along with the parent file. This prevents threat actors from obfuscating malicious files by abstracting them away from the parent file scan.
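The general idea of scanning each layer can be illustrated locally with a short recursive walk over nested zip containers. This is not the Cloudmersive implementation or its API, only a sketch of the layering concept. The function name, the path separator, and the depth limit are assumptions made for the example.

```python
import io
import zipfile

# Assumed limit to stop runaway recursion on deliberately nested archives.
MAX_DEPTH = 5


def walk_layers(name: str, data: bytes, depth: int = 0):
    """Yield (path, data) for a file and, recursively, for every subfile
    found inside it when the file is a zip-based container."""
    yield name, data
    if depth >= MAX_DEPTH or not zipfile.is_zipfile(io.BytesIO(data)):
        return
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # A production scanner would also cap the decompressed size here.
            child = archive.read(info)
            yield from walk_layers(f"{name}!/{info.filename}", child, depth + 1)
```

Every yielded item, parent and subfile alike, would be passed to the same scanning policies, so a threat buried two or three containers deep receives the same scrutiny as a file uploaded on its own.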

For expert advice on properly scanning and detecting malicious subfile content, please reach out to a member of our team.
