What is an XLSX File
5/9/2025 - Brian O'Neill
Overview

In this article, we'll learn what XLSX files are, understand the Open XML file structure, and examine the risks associated with uploading and processing Excel XLSX spreadsheets in web applications. We'll highlight the most common ways attackers embed threats within them, and draw a distinction between XLSX and macro-enabled XLSM threats. At the end, we'll discuss how the Cloudmersive Advanced Virus Scan API mitigates XLSX-based threats.

Understanding XLSX File Structure

Since its introduction in 2007, XLSX has been the default format for spreadsheets created in Microsoft Excel. Unlike its predecessor XLS – a binary container format – the XLSX format is based on the same Open XML standard all modern MS Office documents use. It’s essentially a ZIP archive containing a structured collection of XML documents and related files, including worksheets, styles, shared strings, and relationships, which collectively define the document’s content and formatting.

Because they’re ZIP-based containers, XLSX files can be renamed with a .zip file extension and unzipped to manually inspect – or modify – their content. This architecture, while useful for extensibility and performance, also opens an enticing door to threat actors. By diving into the XLSX archive, threat actors can hide malware and even exploit critical flaws in the tools that parse Excel documents.

Weaponizing Excel

When we think about Excel-based cyberattacks, we might imagine a suspicious document emailed to us along with a message encouraging us to open the file quickly. We might envision this document packed with malicious macros ready to execute after a mistaken double-click. While macros remain a concern in certain contexts, they aren’t directly supported by files with the XLSX extension; macro-enabled Excel documents require an XLSM extension instead. Threats embedded in XLSX documents take different forms.
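
To make the ZIP-based structure described above concrete, here is a minimal Python sketch that lists the parts inside an XLSX package and checks whether its `[Content_Types].xml` declares a standard (non-macro) workbook. The function names `list_package_parts` and `declares_xlsx_workbook` are hypothetical helpers chosen for illustration; only the standard library `zipfile` module is used. It is a sketch for inspecting a file you already hold in memory, not a hardened validator.

```python
import io
import zipfile

# Content type that [Content_Types].xml declares for a standard XLSX workbook part.
XLSX_MAIN = (
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"
)


def list_package_parts(data: bytes) -> list:
    """Return (part name, uncompressed size) for every entry in the archive."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError("not a ZIP archive, so not a valid XLSX package")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def declares_xlsx_workbook(data: bytes) -> bool:
    """Check whether the package declares a standard XLSX workbook part."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        try:
            content_types = archive.read("[Content_Types].xml")
        except KeyError:
            # Hypothetical policy: a package without this part is rejected.
            return False
    return XLSX_MAIN.encode("ascii") in content_types
```

Reading the uncompressed sizes before extracting anything is a deliberate choice here: it gives an upload handler a chance to reject suspiciously large entries before they are decompressed.
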
Common attack tactics with XLSX files include the use of OLE embedded objects, external resource references, formula injections, hidden payloads in cells or sheets, and metadata abuse – all of which can lead to outcomes as severe as those of macro-malware-based attacks. Let’s break each of these down.

Embedded Threats via OLE Objects

While XLSX files don’t support macros like old XLS files used to, they do still support embedded objects through Object Linking and Embedding (OLE). That means threat actors can insert other MS Office files (e.g., .docx, .pptx, or even .xlsm) into the XLSX container. These files can carry malicious content of their own – including insecure external links and macro malware. As such, OLE is an effective method of threat obfuscation that harnesses the complexity of the XLSX container. To properly evaluate the risk of an XLSX file, a system must be capable of unpacking and scanning these embedded objects.

External Resource References

Linked cells or charts in an XLSX file can carry references to external resources – such as remote web URLs. Excel might attempt to access these external sources the moment the file is opened. If those links were placed there by a threat actor, this could mean inadvertently downloading a malicious payload or potentially exposing user data.

It’s possible for attacks of this nature to avoid user interaction entirely. If, for example, a client-side file upload system automatically parses XLSX files (e.g., to extract contents or preview data), the parser might inadvertently trigger the malicious connections, leading to server-side exposure or data leaks.

Formula Injection

While it’s less commonly seen in the wild, formula injection is still a noteworthy XLSX attack vector. Threat actors can embed malicious formulas in cells which are later exported to (or processed by) backend systems in the target environment.
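
The three tactics covered so far (embedded objects, external references, and formulas) each leave traces in predictable places inside the archive. The following Python sketch shows one way to surface them for review; `triage_xlsx` is a hypothetical helper name, and the checks rely on the Open XML conventions that embedded objects live under `xl/embeddings/`, that external targets are marked with `TargetMode="External"` in `.rels` parts, and that formulas sit in `<f>` elements inside worksheet cells. It uses the standard library XML parser for brevity; for untrusted uploads, a hardened parser (for example, the `defusedxml` package) would likely be a safer choice.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def triage_xlsx(data: bytes) -> dict:
    """Collect embedded parts, external relationship targets, and cell formulas."""
    findings = {"embedded_parts": [], "external_targets": [], "formulas": []}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.startswith("xl/embeddings/"):
                # OLE objects and nested Office documents are stored here.
                findings["embedded_parts"].append(name)
            elif name.endswith(".rels"):
                # Relationship parts record links to resources outside the package.
                root = ET.fromstring(archive.read(name))
                for rel in root.iter(REL_NS + "Relationship"):
                    if rel.get("TargetMode") == "External":
                        findings["external_targets"].append(
                            (name, rel.get("Target"))
                        )
            elif name.startswith("xl/worksheets/") and name.endswith(".xml"):
                # Each <c> element is a cell; an <f> child holds its formula text.
                root = ET.fromstring(archive.read(name))
                for cell in root.iter(SHEET_NS + "c"):
                    formula = cell.find(SHEET_NS + "f")
                    if formula is not None and formula.text:
                        findings["formulas"].append(
                            (name, cell.get("r"), formula.text)
                        )
    return findings
```

This sketch only reports what it finds; deciding which targets or formulas are actually malicious is a separate policy question, and embedded parts would still need to be unpacked and scanned in their own right.
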
Non-standard Excel functions like [...] This risk is significant if the Excel application exports or reprocesses this data into other environments (e.g., CSV or HTML). This scenario can trigger automatic actions when the file is opened by unsuspecting users.

Hidden Cells and Sheets

Threat actors can embed malicious scripts, encoded payloads, and even encoded phishing links into hidden areas within an Excel file – including hidden rows/columns and concealed cells. There’s a good chance these areas of the file will be overlooked by users, and a possibility that weakly configured scanners won’t dig deeply enough into the file.

This is another case where deep verification of the XLSX file content – including full cell parsing and validation – is critical to avert a major attack. Surface-level file validation or rudimentary signature scanning might miss dangerous and weakly obfuscated threats like these.

Metadata Abuse

The metadata layer of XLSX files stores document properties including the author, company, creation date, and custom tags. This part of the file is easily accessible when the XLSX archive is unzipped. Threat actors can hide links, command fragments, or misleading information within the metadata layer. In some cases, malware indicators – and even command-and-control (C2) URLs – have been discovered buried in Excel document properties.

A Quick Note on XLSM Files

As discussed earlier, while XLSX files have limitations on embedded macros, XLSM files are explicitly designed to support macros, and that makes them a completely different beast. Macro-enabled XLSM files allow for the embedding of VBA code directly into the document. This code runs with the same privileges as the Excel user; when macros are executed (or macro protections are bypassed through social engineering) under an account with high enough privileges, attackers can execute arbitrary system commands, drop secondary payloads to initiate follow-up attacks, or exfiltrate data.
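
The hidden-sheet, metadata, and macro concerns above can each be checked with a short pass over the unzipped package. The Python sketch below is illustrative only: the helper names (`find_hidden_sheets`, `find_urls_in_properties`, `find_macro_parts`) are hypothetical, the URL pattern is a deliberately simple assumption, and the checks rely on the Open XML conventions that sheet visibility is stored in the `state` attribute of `xl/workbook.xml`, that document properties live under `docProps/`, and that VBA code is stored in a `vbaProject.bin` part.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

SHEET_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
# Assumed, intentionally loose pattern; real scanners use stricter matching.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml", "docProps/custom.xml")


def find_hidden_sheets(archive: zipfile.ZipFile) -> list:
    """Return (sheet name, state) for sheets marked hidden or veryHidden."""
    root = ET.fromstring(archive.read("xl/workbook.xml"))
    hidden = []
    for sheet in root.iter(SHEET_NS + "sheet"):
        state = sheet.get("state", "visible")
        if state != "visible":
            hidden.append((sheet.get("name"), state))
    return hidden


def find_urls_in_properties(archive: zipfile.ZipFile) -> list:
    """Return (part, url) for URLs found in document property values."""
    present = set(archive.namelist())
    hits = []
    for part in PROPERTY_PARTS:
        if part not in present:
            continue
        root = ET.fromstring(archive.read(part))
        # itertext() yields element text only, so namespace URIs are skipped.
        for text in root.itertext():
            hits.extend((part, url) for url in URL_PATTERN.findall(text))
    return hits


def find_macro_parts(archive: zipfile.ZipFile) -> list:
    """Return archive entries that look like a VBA project."""
    return [
        name for name in archive.namelist()
        if name.lower().endswith("vbaproject.bin")
    ]
```

A caller would open the upload with `zipfile.ZipFile(io.BytesIO(data))` and pass the archive to each helper. A non-empty result from `find_macro_parts` on a file named `.xlsx` is a reasonable signal that the extension does not match the content.
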
It’s standard practice for file upload workflows to ban XLSM entirely or require strict sandboxing and inspection before processing. It’s still possible to obfuscate XLSM files by hiding them within other file types – including XLSX or any other file type in the MS Office suite.

Preventing XLSX Threats with the Cloudmersive Advanced Virus Scan API

The Cloudmersive Advanced Virus Scan API deeply inspects XLSX files to detect malicious content hidden within even the most obscure parts of the file structure. When processing XLSX uploads, the Advanced Virus Scan engine unpacks the archive and checks for OLE-embedded objects – including other MS Office documents or other nested files – ensuring that threats hidden within subfiles are detected even when the parent file does not contain direct threats. Additionally, other file types masquerading with the XLSX extension and icon are rooted out via deep content verification, which identifies content that fails to conform with strict XLSX formatting standards.

Further, external resource links are flagged: the API identifies embedded links to remote resources (such as URLs in cells or charts), preventing unwanted connections and data leaks during downstream processing. Each cell within the file is checked for scripts with hidden payloads, malicious formula patterns are flagged to prevent accidental execution, and risky metadata values render the file invalid.

Integrating the Advanced Virus Scan API into your upload flow provides strong, context-aware protection that goes beyond surface-level scanning. This is ideal for enterprise workflows that depend on safe ingestion of third-party XLSX files.

Conclusion

The Excel XLSX format is deceptively simple on the surface, but underneath lies a highly exploitable document structure. While XLSX files are ubiquitous in business workflows, treating them as “safe by default” is a major risk.
Attackers are counting on enterprise antivirus (AV) software and users to glance over spreadsheets without fully verifying what lives under the hood.

To learn more about defending against insecure XLSX files with the Cloudmersive Advanced Virus Scan API, please contact a member of our team.