API Spotlight: Image Optical Character Recognition APIs
Image OCR Overview
This week, we’re putting a spotlight on our Image Optical Character Recognition (Image OCR) APIs. These APIs extract text directly from scanned images and photographs of documents, and several are designed for specific use cases such as receipt expensing, business card information extraction, and business form processing. All of them recognize dozens of common international languages, with English as the default unless another language is specified. In addition, all use an advanced ‘recognition mode’ option by default, which increases fault tolerance for skewed document images and consumes around 28-30 API calls to produce the highest-quality result possible. Let’s take a closer look at how our Image OCR APIs can impact your business.
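As a rough illustration of the defaults described above, the sketch below assembles per-request options in Python. The option names `language` and `recognitionMode`, the values `ENG`, `FRA` and `Advanced`, and the helper `build_ocr_options` are all assumptions made for illustration, so check the API documentation for the exact names before relying on them.

```python
from typing import Dict, Optional

# Assumed defaults mirroring the text: English and the advanced recognition mode.
DEFAULT_LANGUAGE = "ENG"               # assumption: three-letter language code
DEFAULT_RECOGNITION_MODE = "Advanced"  # assumption: name of the advanced mode


def build_ocr_options(language: Optional[str] = None,
                      recognition_mode: Optional[str] = None) -> Dict[str, str]:
    """Return request options, falling back to the defaults when unspecified."""
    return {
        "language": language or DEFAULT_LANGUAGE,                        # hypothetical option name
        "recognitionMode": recognition_mode or DEFAULT_RECOGNITION_MODE,  # hypothetical option name
    }


if __name__ == "__main__":
    print(build_ocr_options())                # defaults only
    print(build_ocr_options(language="FRA"))  # override the language
```

Keeping the defaults in one place makes it easy to override the language for a single request while leaving the recognition mode untouched.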
Converting Scanned Documents into Text
There are three iterations of our OCR API designed specifically for images of documents that have been scanned rather than photographed with a phone or other handheld device. These iterations include:
- Convert Scanned Images to Text
- This iteration executes a vanilla OCR operation on scanned images, returning a text result with a “MeanConfidenceLevel” score indicating the degree to which the API believes the operation was successful.
- Convert Scanned Images to Words/Text with Location
- This iteration returns scanned image text with metadata about the words within the image. This includes a “WordText” string, line & word numbers, x/y coordinate locations, width, height, and more.
- Convert Scanned Images to Lines/Text with Location
- This iteration returns scanned image text with metadata about specific lines within the document. This includes information about each word within the specified line, including a “WordText” string and other metadata also returned by the Words/Text API iteration.
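To show how the word-level metadata described above might be used, here is a minimal Python sketch that rebuilds lines of text from a list of words. Only `WordText` is taken from the description above; the field names `LineNumber`, `WordNumber`, `XLeft` and `YTop`, the helper `words_to_lines`, and the sample data are assumptions for illustration, not the documented response model.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up sample data for illustration only; not real API output.
# "LineNumber", "WordNumber", "XLeft" and "YTop" are assumed field names.
sample_words = [
    {"WordText": "Total", "LineNumber": 2, "WordNumber": 1, "XLeft": 40, "YTop": 120},
    {"WordText": "Invoice", "LineNumber": 1, "WordNumber": 1, "XLeft": 40, "YTop": 60},
    {"WordText": "due", "LineNumber": 2, "WordNumber": 2, "XLeft": 110, "YTop": 120},
    {"WordText": "#1042", "LineNumber": 1, "WordNumber": 2, "XLeft": 150, "YTop": 60},
]


def words_to_lines(words: List[Dict]) -> List[str]:
    """Group words by line number and join them in word order."""
    by_line = defaultdict(list)
    for word in words:
        by_line[word["LineNumber"]].append(word)

    lines = []
    for line_number in sorted(by_line):
        # Order the words within each line by their word number.
        ordered = sorted(by_line[line_number], key=lambda w: w["WordNumber"])
        lines.append(" ".join(w["WordText"] for w in ordered))
    return lines


if __name__ == "__main__":
    for line in words_to_lines(sample_words):
        print(line)
```

The same grouping could be done on the x/y coordinates instead of line numbers when the layout matters more than reading order.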
Converting Photographs of Documents into Text
These iterations of the Image OCR API are intended only for images captured with a smartphone or other handheld device. They automatically attempt to compensate for problems commonly found in handheld photos of documents, such as crooked backgrounds and uneven lighting. These iterations include:
- Convert a Photo of a Document into Text
- This iteration returns text from the input photo along with a “MeanConfidenceLevel” score indicating the degree to which the API believes the operation was successful.
- Convert a Photo of a Document OR Receipt into Words with Location
- This iteration returns the text found in photos of generic documents or receipts, along with the location of each text element. It can be used effectively on receipts, though not with the same degree of accuracy as the Receipt OCR API. A diagnostics mode can be enabled for this API by setting the relevant Boolean parameter to ‘true’; it is set to ‘false’ by default to streamline performance.
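One way to act on the “MeanConfidenceLevel” score mentioned above is to route low-confidence results to a person for review. The sketch below assumes the score is a number between 0.0 and 1.0 and uses an arbitrary threshold of 0.8; the helper `needs_review` and the `TextResult` field in the usage example are hypothetical names chosen for illustration.

```python
from typing import Dict

# Assumption: the score is a float between 0.0 and 1.0; the threshold is arbitrary.
CONFIDENCE_THRESHOLD = 0.8


def needs_review(result: Dict, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True when the OCR result should be checked by a person."""
    # A missing score is treated as zero confidence.
    return float(result.get("MeanConfidenceLevel", 0.0)) < threshold


if __name__ == "__main__":
    # Made-up result for illustration; "TextResult" is an assumed field name.
    photo_result = {"MeanConfidenceLevel": 0.93, "TextResult": "Sample text"}
    print(needs_review(photo_result))
```

A sensible threshold depends on the documents involved, so it is worth tuning against a sample of your own photos.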
Converting Receipts, Business Cards, Generic Forms & Stored Template Forms into Structured Text
These last iterations of the Image OCR API are excellent at digitizing text from photographs (NOT scanned images) of documents with specific structures. Such documents include Receipts (most often used for expensing employee transactions), Business Cards (most often containing useful information for generating sales leads), and Business Forms (which may include basic new-employee handouts, invoices, and more). These iterations include:
- Recognize a Photo of a Receipt; Extract Key Business Information
- Along with the receipt total & subtotal, this iteration will extract key information from a receipt including the name, website, physical address & phone number of the business in question, as well as a complete list of items on the receipt and their respective prices.
- Recognize a Photo of a Business Card; Extract Key Business Information
- This iteration will extract key information from an input photo of a business card, including the name & title of the person on the card, the name of their business & their business address, and any phone number and email address available on the card.
- Recognize a Photo of a Form; Extract Key Fields & Business Information
- This iteration allows you to pull specific data from a photo of a Business Form by customizing the fields of data to be extracted. There are a wide variety of options available with this iteration; to view the entire response model, refer to this iteration’s documentation dropdown on the Cloudmersive API Console.
- Recognize a Photo of a Form, Extract Key Fields using Stored Templates
- This iteration allows you to extract information from a form using stored templates. These templates can be configured in a configuration bucket by logging into the Cloudmersive Management Portal and navigating to Settings > API Configuration > Create Bucket. The bucket ID and Secret Key are required to execute OCR based on the stored template.
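Returning to the receipt iteration in the list above, structured output makes simple consistency checks possible before an expense is filed. The following Python sketch compares the sum of the item prices with the subtotal. The field names `ReceiptItems`, `ItemPrice`, `ItemDescription` and `ReceiptSubTotal`, the helper `items_match_subtotal`, and the sample receipt are assumptions for illustration; refer to the documented response model for the actual structure.

```python
from decimal import Decimal
from typing import Dict

# Made-up sample data for illustration only; all field names are assumptions.
sample_receipt = {
    "ReceiptItems": [
        {"ItemDescription": "Coffee", "ItemPrice": 4.50},
        {"ItemDescription": "Sandwich", "ItemPrice": 12.25},
    ],
    "ReceiptSubTotal": 16.75,
}


def items_match_subtotal(receipt: Dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check that the item prices add up to the subtotal, within a tolerance."""
    # Convert through str so float inputs do not introduce binary rounding noise.
    items_total = sum(
        (Decimal(str(item["ItemPrice"])) for item in receipt["ReceiptItems"]),
        Decimal("0"),
    )
    subtotal = Decimal(str(receipt["ReceiptSubTotal"]))
    return abs(items_total - subtotal) <= tolerance


if __name__ == "__main__":
    print(items_match_subtotal(sample_receipt))
```

A mismatch does not necessarily mean the OCR failed (discounts or fees may not appear as line items), but it is a useful signal that a receipt deserves a second look.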