API Spotlight: Image Recognition APIs


2/8/2023 - Brian O'Neill


From digital forensics to automatic photo captioning, our image recognition APIs make it possible for you to add powerful, intelligent features to your image processing applications at low cost. Below, we’ll take a closer look at a few of our most popular Image Recognition API iterations and examine how they can be used to impact your business.

Generate Perceptual Hash; Calculate Similarity Between Perceptual Hashes

Perceptual hashing is a reliable, lightweight method for comparing images against one another. It’s frequently used in digital forensics to track down illegally disseminated images, and it’s also regularly employed to identify cases of copyright infringement, such as when an unlicensed party copies a protected image and uses it online for commercial gain.

Creating a perceptual hash for an image involves calculating a hash value from that image’s visual properties, such as its color and texture. Unlike a cryptographic hash, which changes completely when a single pixel changes, a perceptual hash changes only slightly when an image is resized, recompressed or lightly edited. Once that hash is generated, it can be compared with the perceptual hash of a second image to measure how much the two images differ. That difference is expressed as a “Hamming Distance”: the number of bit positions at which the two hash values differ. The measure is named after Richard Hamming, the American mathematician who introduced it in his work on error-correcting codes.
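To make the Hamming Distance concrete, here is a minimal Python sketch of one simple perceptual hashing technique, the average hash (aHash), followed by a bit-counting distance function. This is an illustrative stand-in, not Cloudmersive’s algorithm. It also assumes the image has already been converted to grayscale and shrunk to an 8x8 grid of brightness values (0 to 255), a step real implementations perform first.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Compute a 64-bit average hash (aHash) from an 8x8 grid of grayscale values (0-255).

    Each bit is 1 if the corresponding pixel is brighter than the grid's mean, else 0.
    ASSUMPTION: grayscale conversion and downscaling to 8x8 were done beforehand.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid of pixel values")
    mean = sum(flat) / 64
    bits = 0
    for p in flat:
        # Shift the previous bits left and append this pixel's bit.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions at which two hashes differ (XOR, then count the 1 bits)."""
    return bin(hash_a ^ hash_b).count("1")
```

For example, adding the same brightness offset to every pixel (without clipping at 255) shifts the mean by that same amount, so every bit, and therefore the hash, stays unchanged and the distance is 0.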

The “Generate Perceptual Image Hash” API iteration accepts an image in a common format (such as PNG or JPG) and returns an image hash as a string. The “Calculate Similarity Between Two Perceptual Image Hashes” API iteration accepts two perceptual hash strings as input and returns an “ImageSimilarityScore” based on the Hamming Distance between them. Used together, the two API iterations can help you protect your online image content or identify duplicate images within your own systems.
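The snippet below sketches how a similarity score might be derived from two hash strings and used to flag near-duplicate images. It assumes the hashes are equal-length hexadecimal strings and normalizes the Hamming Distance as 1 - distance / total bits. Both choices are assumptions for illustration, since this post does not document the exact format of the hash string or how “ImageSimilarityScore” is calculated. The 0.9 threshold and the find_near_duplicates helper are likewise hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def similarity_score(hash_a: str, hash_b: str) -> float:
    """Similarity in [0.0, 1.0], computed as 1 - (Hamming distance / total bits).

    ASSUMPTION: hashes are equal-length hexadecimal strings. This normalization is
    illustrative and is not the documented ImageSimilarityScore formula.
    """
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    total_bits = len(hash_a) * 4  # each hex digit encodes 4 bits
    distance = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
    return 1.0 - distance / total_bits


def find_near_duplicates(
    hashes: Dict[str, str], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every image pair at or above the HYPOTHETICAL threshold."""
    pairs = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        score = similarity_score(hash_a, hash_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs
```

Comparing every pair is quadratic in the number of images, which is fine for small libraries; any threshold you adopt would need tuning against known duplicate and non-duplicate pairs from your own images.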

Describe an Image in Natural Language

There are dozens of exciting applications of Deep Learning Artificial Intelligence (AI) that have become increasingly democratized in recent years, and automatic image description is a worthy inclusion on that list. At a high level, this process involves preprocessing an image and extracting its key features, which are then encoded and interpreted by a model trained on a large dataset of captioned images to produce a description in natural language. The technology has a variety of applications, including automatic image captioning and indexing (making images easier to search by keyword).

This API accepts an image in a common format (such as PNG or JPG) and returns two candidate descriptions of that image: the “BestOutcome” and the “RunnerUpOutcome.” Each description comes with a confidence score ranging from 0.0 to 1.0, where lower values indicate lower confidence and higher values indicate higher confidence. The response also includes a separate “HighConfidence” Boolean, which gives a quick overall indication of whether the result is reliable enough to be useful.
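As a sketch of how an application might consume this response, the Python below decides whether a caption can be published automatically and collects keywords for search indexing. Only the “BestOutcome”, “RunnerUpOutcome” and “HighConfidence” field names come from this post; the inner “Description” and “ConfidenceScore” keys, the 0.5 threshold and the helper names are hypothetical, so check them against the actual API response before relying on them.

```python
import json
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Outcome:
    description: str
    confidence: float  # 0.0 = low confidence, 1.0 = high confidence


def parse_outcome(raw: dict) -> Outcome:
    # HYPOTHETICAL keys: "Description" and "ConfidenceScore" are assumed, not documented here.
    return Outcome(description=raw["Description"], confidence=float(raw["ConfidenceScore"]))


def pick_caption(response_text: str, min_confidence: float = 0.5) -> Optional[str]:
    """Return the best description if it looks safe to publish automatically, else None."""
    data = json.loads(response_text)
    best = parse_outcome(data["BestOutcome"])
    # Require both the overall HighConfidence flag and a per-description score threshold.
    if data.get("HighConfidence", False) and best.confidence >= min_confidence:
        return best.description
    return None  # the caller can route the image to manual review instead


def search_keywords(response_text: str) -> Set[str]:
    """Collect lowercase words from both descriptions to use as search keywords."""
    data = json.loads(response_text)
    words: Set[str] = set()
    for key in ("BestOutcome", "RunnerUpOutcome"):
        words.update(parse_outcome(data[key]).description.lower().split())
    return words
```

Indexing words from the runner-up description as well as the best one trades a little precision for better recall when users search by keyword.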

Detect People in an Image, Including their Location Within the Image

Whether you’re analyzing still frames from security camera footage or planning to crop photos automatically for your website, knowing whether people appear in an image, and where, is extremely important. Like the previous API described in this post, this operation relies on feature extraction, leveraging a vast training dataset to accurately distinguish human forms from non-human forms within the confines of a digital pixel matrix.

This API identifies the location and size of each person within an image, regardless of which direction that person is facing (i.e., it does not rely on facial recognition). The height, width and coordinate location of each detected person are expressed in pixels, and an overall “ObjectCount” summarizes the total number of people identified.
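For the automatic-cropping use case, a sketch like the following could turn the detection results into a single crop rectangle that keeps every detected person in frame. The per-person keys “X”, “Y”, “Width” and “Height” (with X and Y as the top-left corner, in pixels), the crop_region name and the 20-pixel padding are assumptions for illustration; only the idea of pixel-based size and location comes from this post.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def crop_region(
    detections: List[Dict[str, int]],
    image_width: int,
    image_height: int,
    padding: int = 20,
) -> Optional[Box]:
    """Smallest padded rectangle containing every detected person, clamped to the image bounds.

    ASSUMPTION: each detection is a dict with HYPOTHETICAL keys "X", "Y" (top-left corner),
    "Width" and "Height", all in pixels.
    """
    if not detections:
        return None  # nobody detected, so there is nothing to crop around
    left = min(d["X"] for d in detections) - padding
    top = min(d["Y"] for d in detections) - padding
    right = max(d["X"] + d["Width"] for d in detections) + padding
    bottom = max(d["Y"] + d["Height"] for d in detections) + padding
    # Clamp so the padded box never extends past the edges of the image.
    return (max(0, left), max(0, top), min(image_width, right), min(image_height, bottom))
```

Returning None for an empty result lets the caller fall back to the uncropped image rather than producing an arbitrary crop.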

Detect and Unskew a Photo of a Document

When we take photos of documents on our handheld cameras and smartphones, the images we take are rarely perfect, and in many cases these imperfections make subsequent OCR or document conversion operations difficult to perform with high fidelity. As a result, when we build application workflows designed to funnel images into OCR or document conversion operations, it’s important to first incorporate a step which automatically corrects our images’ imperfections, thus optimizing their contents’ machine readability.

The purpose of this API is specifically to prepare images for the downstream OCR and conversion operations mentioned above. It accepts images in common formats (such as PNG or JPG), automatically corrects natural image skewing, and returns a straightened, squared-up image of the document. When calling this API, you also have the option to apply a Black and White postprocessing effect to the image, which can further improve the results of subsequent OCR operations.
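This post does not say how the Black and White effect is implemented. One common way to binarize a grayscale document image is Otsu’s method, which picks the brightness threshold that best separates dark text from a light background. The Python sketch below shows that technique on a flat list of 0 to 255 grayscale values, as an illustrative stand-in rather than Cloudmersive’s method.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for p in pixels:
        histogram[p] += 1
    total = len(pixels)
    sum_all = sum(i * histogram[i] for i in range(256))

    sum_background = 0.0
    weight_background = 0
    best_threshold, best_variance = 0, -1.0
    for t in range(256):
        # Pixels with value <= t form the "background" (dark) class.
        weight_background += histogram[t]
        if weight_background == 0:
            continue
        weight_foreground = total - weight_background
        if weight_foreground == 0:
            break
        sum_background += t * histogram[t]
        mean_background = sum_background / weight_background
        mean_foreground = (sum_all - sum_background) / weight_foreground
        # Between-class variance (up to a constant factor of 1 / total^2).
        variance = weight_background * weight_foreground * (mean_background - mean_foreground) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


def binarize(pixels: List[int]) -> List[int]:
    """Map each pixel to pure black (0) or pure white (255) using Otsu's threshold."""
    threshold = otsu_threshold(pixels)
    return [255 if p > threshold else 0 for p in pixels]
```

A single global threshold works well for evenly lit scans; photos with shadows or uneven lighting often benefit from adaptive (local) thresholding instead.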

For more information on our Image Recognition APIs, please feel free to contact our sales team.
