What is an XLS File

5/12/2025 - Brian O'Neill

XLS files are the legacy format for Microsoft Excel spreadsheets. They’ve been around since the early 1990s, and they continue to show up in business workflows, public data repositories, and (unfortunately) various cyberattack campaigns.

But what exactly is an .xls file? How is it structured, and why does it still pose a security risk? These questions remain relevant – even 18 years after the introduction of the modern XLSX format.

In this article, we’ll break down .xls files, compare their structure to modern .xlsx files, and explain why they remain a surprisingly common attack vector. We’ll explore real-world XLS vulnerabilities and explain how Cloudmersive’s Advanced Virus Scan API prevents malicious XLS-based attacks.

Breaking Down Binary File Containers

Before we dive into XLS-specific formatting conventions, it’s important to first understand the base concept of a binary file container.

Unlike the plaintext formats (e.g., CSV or XML) we predominantly interact with today, binary containers store data in a tightly packed, non-human-readable format. This includes some familiar structural elements like metadata, content blocks, formatting instructions and (sometimes) embedded objects – only they’re all wrapped inside a single binary shell.

If there’s one noteworthy benefit to binary containers, it’s that they allow for efficient content storage and rich formatting. Unfortunately, however, they also make file inspection and validation dramatically harder than text-based file alternatives. Because binary data isn’t visible to users – and may contain multiple layers of encoded content – attackers can easily leverage binary file complexity to bury malicious payloads and improve the chances of bypassing basic antivirus (AV) scanning tools.

How XLS Binary Containers Are Structured (and How That Differs from XLSX)

XLS binary containers are built using the Binary Interchange File Format (BIFF) – a proprietary Microsoft format which predates XML-based standards. BIFF structures .xls files as a sequence of binary records, each one beginning with a type identifier and length field (and followed by data). In this structure, worksheets, formulas, formatting, and embedded objects alike are all encoded in compact, binary chunks.
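To make the record layout concrete, here is a minimal Python sketch that walks a BIFF byte stream one record at a time. It assumes the common layout of a 2-byte little-endian type identifier followed by a 2-byte little-endian length, and it assumes the workbook stream has already been extracted (in Excel 97–2003 files that stream sits inside an OLE compound file, which this sketch does not unpack). The function name iter_biff_records and the shortened sample bytes are illustrative only.

```python
import struct
from typing import Iterator, Tuple

# Each record starts with a type identifier and a data length (both little-endian).
RECORD_HEADER = struct.Struct("<HH")


def iter_biff_records(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_type, record_data) for each record in a BIFF stream."""
    offset = 0
    while offset < len(stream):
        if offset + RECORD_HEADER.size > len(stream):
            raise ValueError(f"truncated record header at offset {offset}")
        record_type, length = RECORD_HEADER.unpack_from(stream, offset)
        offset += RECORD_HEADER.size
        if offset + length > len(stream):
            raise ValueError(f"record 0x{record_type:04X} overruns the stream")
        yield record_type, stream[offset:offset + length]
        offset += length


# Hand-built, shortened sample: a BOF record (0x0809) carrying 4 data bytes,
# followed by an empty EOF record (0x000A).
sample = RECORD_HEADER.pack(0x0809, 4) + b"\x00\x06\x05\x00" + RECORD_HEADER.pack(0x000A, 0)
records = list(iter_biff_records(sample))
```

Raising an error when a length field points past the end of the stream is a deliberate choice: a declared length that disagrees with the actual data is exactly the kind of inconsistency a crafted file might rely on.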

This is remarkably opaque compared to the ZIP-archived Open XML standard that Microsoft now uses. The Excel .xlsx format – the modern equivalent to XLS, introduced in 2007 along with the other Open XML formats (DOCX, PPTX, etc.) – consists of multiple XML documents packed inside a structured, compressed folder system. Because XLSX is XML-based, it’s both human-readable and easier for most applications to parse. The attack surface for obscure exploits is much smaller this way, though it’s important to note that XLSX remains a potent attack vector in its own right.
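The difference between the two containers shows up in the very first bytes of a file. The sketch below is a rough illustration and not a complete file-type detector: it checks for the OLE compound file signature used by Excel 97–2003 .xls files and for the ZIP local file header used by .xlsx packages. The function name sniff_container is hypothetical. Both signatures are shared with other formats (DOC and PPT also use compound files, and any ZIP archive starts with the same bytes), and very old .xls files that store raw BIFF without the compound file wrapper would come back as unknown.

```python
OLE_CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file (legacy .xls, .doc, .ppt)
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header (.xlsx and other Open XML packages)


def sniff_container(head: bytes) -> str:
    """Classify a file by its leading bytes, ignoring the file extension."""
    if head.startswith(OLE_CFB_MAGIC):
        return "ole-compound-file"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

In practice the first 8 bytes of an upload would be passed in, and a mismatch between the result and the claimed extension (for example, a file named report.xlsx that is really a compound file) is worth flagging.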

The critical point here is that XLS is opaque and unpredictable by design. It was never meant to be dissected and sanitized, which makes it challenging to reliably inspect for threats. The binary format allows all kinds of embedded content – including OLE objects, macros, and encoded payloads – to hide quietly in the background. As such, XLS files can serve as innocuous-looking carriers for a wide range of attacks.

XLS as a Threat Vector: Attack Methods and Risks

XLS is still a popular threat vector for modern attackers thanks to its complexity and legacy status. We won’t cover every single possible XLS vector in existence here – but we will break threats down into three common categories:

1. Malicious .xls files sitting at rest
2. Specially crafted XLS files designed to exploit vulnerabilities in relevant parsers
3. Email-based social engineering attacks with XLS files

Threat-loaded XLS Files Sitting at Rest

When malicious XLS files are uploaded through a web form, API, or any external source and enter cloud (or on-prem server) storage, they can lie dormant indefinitely. At that point, they’ve bypassed perimeter security policies, and there’s no guarantee they’ll be detected at rest before they’re accessed and opened by someone with access to that storage.

These files can be downloaded and opened by unsuspecting employees or customers, triggering embedded macros, OLE objects, or shell commands to execute in a sensitive environment. If they’re executed with the right level of permissions within a network, the ensuing breach can set a business back years – or even put it out of business entirely. Many ransomware campaigns, for example, begin with insecure file uploads via XLS, XLSX, and other similarly common file types.
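One modest first step against dormant files is simply knowing where they are. The following sketch, which assumes storage is reachable as a local or mounted directory, walks a folder tree and reports every file whose leading bytes match the OLE compound file signature, whatever its extension claims. The function name find_ole_files is hypothetical, and the output is only an inventory to feed into a real scanner, not a verdict on any file.

```python
import os
from pathlib import Path
from typing import Iterator

OLE_CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole_files(root: str) -> Iterator[Path]:
    """Yield files under root whose first bytes match the OLE compound file signature."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                with path.open("rb") as handle:
                    head = handle.read(len(OLE_CFB_MAGIC))
            except OSError:
                continue  # unreadable file: skipped here, but a real sweep should log it
            if head == OLE_CFB_MAGIC:
                yield path
```

Matching on content rather than on the .xls extension matters because a renamed file (say, invoice.csv) would otherwise slip past the sweep.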

Specially Crafted XLS Files

Backend servers that process or preview XLS content – think data extractors or document converters, for example – typically rely on specific well-known libraries to handle their core processing logic. When it’s discovered that these libraries have unpatched vulnerabilities, threat actors are often the first to know – and it’s a short leap from there to crafting “special” XLS files that exploit these vulnerabilities. Successfully exploiting a parser vulnerability with a specially crafted XLS file can lead to memory corruption, remote code execution, and even data exfiltration.

As a side note on this topic: it’s important to bear in mind that threat actors aren’t beholden to strict regulations or corporate standards; they can generally move quickly and nimbly to initiate attacks. On the flip side, large enterprise technology teams are often bogged down in red tape. No matter how quickly a vulnerability is identified and patched, there’s always a chance that an attacker (or group of attackers) moved to exploit the vulnerability before it was fixed. As such, malformed files should be screened for rigorously – as rigorously as we might check for any virus or malware signature – before entering any backend parsing workflow.
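As a small illustration of what screening for malformed files can look like, the sketch below checks a handful of fixed fields in the 512-byte header of an OLE compound file, the wrapper used by Excel 97–2003 .xls files. The offsets and expected values follow the published compound file layout as understood here (major version 3 or 4, byte-order mark 0xFFFE, sector shift 9 or 12, mini sector shift 6). The function name check_cfb_header is hypothetical. Passing these checks does not make a file safe; failing them is simply a cheap reason to reject a file before it reaches a parser.

```python
import struct
from typing import List

OLE_CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
HEADER_SIZE = 512


def check_cfb_header(header: bytes) -> List[str]:
    """Return a list of problems found in a compound file header (empty if none)."""
    if len(header) < HEADER_SIZE:
        return ["header shorter than 512 bytes"]
    if header[:8] != OLE_CFB_MAGIC:
        return ["missing compound file signature"]

    # Fields at byte offsets 26, 28, 30, 32 (all 16-bit little-endian).
    major_version, byte_order, sector_shift, mini_sector_shift = struct.unpack_from(
        "<HHHH", header, 26
    )
    problems = []
    if major_version not in (3, 4):
        problems.append(f"unexpected major version {major_version}")
    if byte_order != 0xFFFE:
        problems.append(f"unexpected byte-order mark 0x{byte_order:04X}")
    expected_shift = {3: 9, 4: 12}.get(major_version)
    if expected_shift is not None and sector_shift != expected_shift:
        problems.append(f"sector shift {sector_shift} does not match version {major_version}")
    if mini_sector_shift != 6:
        problems.append(f"unexpected mini sector shift {mini_sector_shift}")
    return problems
```

Returning a list of findings rather than a single boolean keeps the reasons available for logging, which is useful when deciding whether a rejected file deserves closer investigation.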

Email-Based Social Engineering with XLS

Email campaigns remain the most common vector for file-based attacks today. Attackers distribute .xls files via phishing emails, disguising them as invoices, receipts, internal memos, and other seemingly routine data sets. Responsibility for mitigating this vector tends to fall on the network user these days, given the ubiquity of corporate anti-phishing training – but that doesn’t mean there shouldn’t be exhaustive network perimeter security policies screening email message containers for malicious XLS content.

The dangers of executing malicious XLS attachments are simply too great to gamble on. These files can run malicious scripts, launch payloads, and even connect directly to remote servers. In certain business departments – especially finance or operations, which might deal with thousands of legacy and modern spreadsheets each year – legacy file formats like XLS can still bypass user skepticism.

Historical Example: The 2017 Cobalt Strike XLS Campaign

Back in 2017, a few financial institutions in Eastern Europe were targeted in an XLS-based attack. Attackers used .xls files to distribute Cobalt Strike beacons; the payload was embedded using an OLE object which executed PowerShell scripts after the document was opened. The attack was highly effective, pairing clever social engineering tactics with a legitimate-looking spreadsheet format.

Security tools and awareness have improved since this attack occurred – but the underlying approach nevertheless remains highly relevant. Enterprises of all shapes and sizes can still expect to receive .xls files in legitimate workflows, especially from legacy systems or foreign partners. Any system that accepts and stores .xls files needs to treat them as potential threats, even if they look like ordinary spreadsheets.

How Cloudmersive Mitigates XLS-Based Threats

Cloudmersive’s Advanced Virus Scan API identifies threats hidden within .xls files before they can reach sensitive system locations. It looks beyond surface-level content, examining the internal structure of the file and catching threats that might’ve been buried in embedded components or in malformed areas designed to exploit parser vulnerabilities. Even the most seemingly harmless files are rigorously evaluated for signs of obfuscation – or any signatures and behavioral patterns linked to malware.

This deep verification approach to content inspection is especially valuable for opaque legacy formats like XLS – which were never designed with modern security expectations in mind.

Whether a file contains hidden payloads, suspicious objects, or extremely subtle indicators of tampering, the Advanced Virus Scan API offers a comprehensive front line of defense, one that’s highly practical for securing document-heavy enterprise environments without slowing down operations. The API can be integrated with an existing web application with minor code changes, or deployed strategically as a no-code product in defense of web infrastructure and cloud storage containers.
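For a sense of what a code-level integration might involve, here is a rough Python sketch that prepares an upload for scanning and interprets the answer. Everything specific to the service is an assumption to be verified against Cloudmersive’s API documentation before use: the endpoint URL, the Apikey header, the inputFile form field, and the CleanResult response field. The helper names build_scan_request and is_clean are hypothetical. The request is built but not sent, so the sketch runs without network access or a real key.

```python
import json
import urllib.request
import uuid

# ASSUMPTIONS (verify against the vendor's documentation): endpoint URL,
# "Apikey" header, "inputFile" form field, "CleanResult" response field.
SCAN_URL = "https://api.cloudmersive.com/virus/scan/file/advanced"


def build_scan_request(file_name: str, file_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying the file."""
    boundary = uuid.uuid4().hex
    # Strip characters that could break out of the quoted filename parameter.
    safe_name = "".join(ch for ch in file_name if ch not in '"\r\n')
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        f'Content-Disposition: form-data; name="inputFile"; filename="{safe_name}"\r\n'.encode("utf-8"),
        b"Content-Type: application/octet-stream\r\n\r\n",
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    headers = {
        "Apikey": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(SCAN_URL, data=body, headers=headers, method="POST")


def is_clean(response_body: bytes) -> bool:
    """Fail closed: only an explicit CleanResult of true counts as clean."""
    try:
        result = json.loads(response_body)
    except ValueError:
        return False
    return isinstance(result, dict) and result.get("CleanResult") is True
```

A caller would send the request with urllib.request.urlopen (with a timeout) and pass the response body to is_clean. Treating unparseable or incomplete responses as unsafe means a scanning outage blocks uploads rather than silently letting them through, which is usually the safer default for file intake.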

Conclusion

Despite being a legacy technology, .xls files are far from obsolete in the modern threat landscape. The binary file structure and support for embedded objects give them unique utility in concealing malware and exploiting backend system vulnerabilities. Modern infrastructure must continue to contend with old formats; ignoring this risk can lead to devastating consequences.

To learn more about protecting your system with Cloudmersive, please contact a member of our team.
