What is File Hashing?

11/10/2023 - Brian O'Neill

Every file on a system is ultimately a sequence of binary data that a computer can interpret. This shared foundation makes it possible to distinguish files from one another in a standardized way: by calculating fixed-length strings, called file hashes, that change whenever the file's contents change, even slightly. Because two different files are extremely unlikely to produce the same hash under a well-designed hashing algorithm, a hash is effectively unique to its file. The process of creating these identifying strings is called file hashing, and it can be accurately characterized as digital fingerprinting.
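To make the fingerprinting idea concrete, here is a minimal Python sketch using the standard library's hashlib module. SHA-256 is chosen purely for illustration (the article does not name a specific algorithm), and the sample byte strings and the sha256_hex helper are made up for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Two inputs that differ by a single trailing character (illustrative data).
original = b"quarterly report, final version"
modified = b"quarterly report, final version."

hash_original = sha256_hex(original)
hash_modified = sha256_hex(modified)

# Both digests are 64 hex characters long, regardless of input size.
print(len(hash_original), hash_original)
print(len(hash_modified), hash_modified)

# The one-character change produces a completely different digest.
print("hashes match:", hash_original == hash_modified)
```

The output length stays fixed no matter how large the input is, which is what makes hashes cheap to store and compare.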

How does file hashing work, and what is it commonly used for?

File hashes are typically computed from a file's contents using one of several hashing algorithms, such as MD5, SHA-1, or SHA-256. Once a hash is produced, it corresponds only to the exact version of the file it was computed from, making it possible to identify exact copies of that file by looking up the original file hash in a database.
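The lookup described above can be sketched roughly as follows. This is an illustrative example only: a temporary folder stands in for whatever storage or database a real system would search, and the hash_file and find_exact_copies helpers and the file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_copies(reference_hash: str, folder: Path) -> list[str]:
    """Return the names of files in `folder` whose hash equals `reference_hash`."""
    return sorted(
        path.name
        for path in folder.iterdir()
        if path.is_file() and hash_file(path) == reference_hash
    )


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "original.bin").write_bytes(b"protected content")
    (folder / "renamed_copy.bin").write_bytes(b"protected content")
    (folder / "unrelated.bin").write_bytes(b"something else")

    reference_hash = hash_file(folder / "original.bin")
    copies = find_exact_copies(reference_hash, folder)

print(copies)  # ['original.bin', 'renamed_copy.bin']
```

Note that the renamed copy is still found: the hash depends only on the file's contents, not on its name or location.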

When protected files are stolen or illegally shared across the internet, for example, file hashing can play a key role in rapidly tracking down copies of the stolen file. While more advanced file detection techniques also exist today, such as machine learning detection, file hashing remains a faster, more lightweight option because comparing short, fixed-length hash strings requires far less processing than analyzing full file contents.

What is file hashing in the context of malware threat detection?

In the context of malware threat detection, file hash detection can play a role similar to signature-based threat detection, albeit in a more limited and less categorical capacity. If exact duplicates of previously encountered malware-infected files pass through a scanner with access to an up-to-date file hash database, it’s possible to flag those files immediately based on their file hash.
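As a rough illustration of that lookup, the sketch below checks incoming file bytes against a small in-memory set of known-bad hashes. Everything here is an assumption made for the example: the set stands in for a real, regularly updated hash database, the "samples" are harmless placeholder strings, and this is not how any particular scanner (including the Cloudmersive Virus Scan API) is implemented.

```python
import hashlib

# Hypothetical stand-in for a malware hash database. The "samples" are
# harmless placeholder bytes used only to generate example hashes.
known_bad_samples = [b"FAKE-MALWARE-SAMPLE-A", b"FAKE-MALWARE-SAMPLE-B"]
KNOWN_BAD_HASHES = {hashlib.sha256(sample).hexdigest() for sample in known_bad_samples}


def is_known_malware(file_bytes: bytes) -> bool:
    """Return True if the file's hash appears in the known-bad set."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_HASHES


print(is_known_malware(b"FAKE-MALWARE-SAMPLE-A"))  # True: exact duplicate
print(is_known_malware(b"harmless document"))      # False: hash not in the set
```

A set lookup like this takes roughly constant time per file, which is why hash checks are commonly used as a fast first pass.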

Additionally, file hashes can be used to determine whether important existing files have been tampered with (e.g., changed by malicious code). The slightest change to a file’s contents will result in a new hash value; if an existing file changed without any action from a trusted user, it’s possible those changes were made by a malicious external actor.
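A tamper check of this kind might look like the sketch below: record a baseline hash for each important file, then later re-hash the files and report any whose hash no longer matches. The helper names (record_baseline, find_changed) and the sample files are hypothetical, and a real integrity monitor would also need to store the baseline somewhere an attacker cannot modify.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path) -> str:
    """Hash a whole file at once; fine for small files like these."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_baseline(paths: list[Path]) -> dict[Path, str]:
    """Map each file to its current hash."""
    return {path: hash_file(path) for path in paths}


def find_changed(baseline: dict[Path, str]) -> list[Path]:
    """Return the files whose current hash differs from the baseline."""
    return [path for path, old_hash in baseline.items() if hash_file(path) != old_hash]


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.ini"
    notes = Path(tmp) / "notes.txt"
    config.write_text("debug=false\n")
    notes.write_text("nothing to see here\n")

    baseline = record_baseline([config, notes])

    # Simulate an unexpected modification to one file.
    config.write_text("debug=true\n")
    changed_names = [path.name for path in find_changed(baseline)]

print(changed_names)  # ['config.ini']
```

The check only says that a file changed, not who changed it or why; deciding whether the change was legitimate still requires other context.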

While many earlier cybersecurity solutions relied significantly (in some cases exclusively) on file hashing and/or signature matching for malware detection, relying so heavily on these techniques is no longer considered a secure practice. The problem is that hundreds of thousands of unique malware iterations are introduced every day, a volume that file hash databases simply cannot keep up with. On top of that, experienced threat actors know they can circumvent basic hash detection by making even the slightest alteration to their malicious executable files. That said, file hashing remains a useful and effective threat detection tool when employed prudently as one cog in a much larger malware-scanning whole.
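The evasion problem is easy to demonstrate. In the sketch below, a blocklist contains the hash of one placeholder "sample"; an exact copy is caught, but a variant with a single padding byte appended is not. The sample bytes and variable names are invented for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Harmless placeholder standing in for a previously seen malicious file.
known_sample = b"FAKE-MALWARE-SAMPLE"
blocklist = {sha256_hex(known_sample)}

# The "attacker" appends one padding byte; the content is otherwise unchanged.
variant = known_sample + b"\x00"

exact_copy_flagged = sha256_hex(known_sample) in blocklist
variant_flagged = sha256_hex(variant) in blocklist

print("exact copy flagged:", exact_copy_flagged)  # True
print("variant flagged:", variant_flagged)        # False
```

This is why hash matching works best alongside techniques that look at what a file contains or does, rather than only at its exact bytes.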

File hashing with the Cloudmersive Virus Scan API

The basic version of the Cloudmersive Virus Scan API identifies viruses and malware within a sandbox scanning layer that combines file hashing, signature extraction, pattern matching, heuristics, whitelisting, bytecode analysis, and certificate analysis. This multi-dimensional approach to scanning ensures a wide variety of threats, including established threats from known malware families and zero-day threats with no referable record, are reliably identified in high-speed, in-memory scans.

In addition, the Advanced Scan version of the Virus Scan API makes it possible to incorporate in-depth content verification steps into the virus scan process. Through this heavily utilized version of the API, it’s possible to set custom rules against threatening file types, including invalid files, password-protected files, executables, script files, unsafe archives, and more, without writing any code. It’s also possible to limit file types by providing a custom comma-separated whitelist of acceptable file extensions.

For more information on the Cloudmersive Virus Scan API, please do not hesitate to reach out to a member of our sales team.
