How to Detect Document Fraud using AI in Python

4/16/2026 - Brian O'Neill

Document fraud has always been a problem for enterprises, but the barrier to producing convincing fake documents has dropped considerably in recent years. AI generation tools that didn’t exist a few years ago can now produce fake invoices, fabricated ID scans, and even forged financial records. Some of these tools have already improved to the point where distinguishing between fake and legitimate files is almost impossible for human eyes. For any application that accepts uploaded documents as part of its workflow, this is a meaningful (and growing) risk.

The traditional response is manual review, which doesn’t scale, or rule-based detection, which tends to be brittle against anything it wasn’t explicitly designed to catch. What actually works is a solution that can reason about document content holistically, flag suspicious signals based on context, and factor in what it knows about the user submitting the document alongside what it finds in the document itself.

Detecting Document Fraud with the Cloudmersive AI Fraud Detection API

That’s what the Cloudmersive AI Fraud Detection API does. It accepts a document as input, allows for optional user context, runs a comprehensive fraud assessment, and returns a structured result complete with a risk score, a set of specific fraud signal flags, and a plain-language rationale explaining how the assessment was reached.

The supported input formats cover the ones you’re most likely to encounter in a real document intake workflow: PDF, DOC/DOCX, XLS/XLSX, HTML, EML/MSG, PNG, JPG, and WEBP.
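Since the format list is fixed, an intake workflow can reject unsupported uploads before spending an API call on them. The following is a minimal sketch of that pre-check; the helper name is hypothetical, and treating .jpeg as equivalent to .jpg is an assumption rather than something the API documentation states here.

```python
from pathlib import Path

# Extensions taken from the format list above.
# Including ".jpeg" alongside ".jpg" is an assumption.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx",
    ".html", ".eml", ".msg", ".png", ".jpg", ".jpeg", ".webp",
}


def is_supported_format(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_format("Invoice Example 2.PNG"))  # True
print(is_supported_format("archive.zip"))            # False
```

The check is case-insensitive and only looks at the extension, so it is a cheap first filter rather than a guarantee about the file's actual contents.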

Walking Through a Real Implementation

In this walkthrough, we’ll build an example API call in Python using Google Colab (Python 3) and walk through what the response looks like. Code examples come directly from the Cloudmersive Swagger page, which you can find here.

As a quick note, you’ll need a Cloudmersive API key to follow along with this walkthrough. You can get one by signing up for a free account on the Cloudmersive website; that gives you up to 800 API calls to play around with and no commitment once you use them up (the allowance simply resets the following month).

Installing the SDK

The first thing we’ll do is run a pip command to install the Fraud API SDK:

pip install cloudmersive-fraud-detection-api-client

Importing Resources

And right after that, we’ll pull in the resources we need for our request:

from __future__ import print_function
import time
import cloudmersive_fraud_detection_api_client
from cloudmersive_fraud_detection_api_client.rest import ApiException
from pprint import pprint

Structuring our Request

We’ll now use the raw example code from the Swagger page we mentioned above as our starting point.

# Configure API key authorization: Apikey
configuration = cloudmersive_fraud_detection_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'



# create an instance of the API class
api_instance = cloudmersive_fraud_detection_api_client.FraudDetectionApi(cloudmersive_fraud_detection_api_client.ApiClient(configuration))
preprocessing = 'preprocessing_example' # str | Optional: Set the level of image pre-processing to enhance accuracy.  Possible values are 'Auto' and 'None'.  Default is Auto. (optional)
result_cross_check = 'result_cross_check_example' # str | Optional: Set the level of output accuracy cross-checking to perform on the input.  Possible values are 'None' and 'Advanced'.  Default is None. (optional)
user_email_address = 'user_email_address_example' # str | User email address for context (optional) (optional)
user_email_address_verified = True # bool | True if the user's email address was verified (optional) (optional)
input_file = '/path/to/inputfile' # file | Input document, or photos of a document, to perform fraud detection on (optional)

try:
    # Advanced AI Fraud Detection for Documents
    api_response = api_instance.document_detect_fraud_advanced(preprocessing=preprocessing, result_cross_check=result_cross_check, user_email_address=user_email_address, user_email_address_verified=user_email_address_verified, input_file=input_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling FraudDetectionApi->document_detect_fraud_advanced: %s\n" % e)

Before this code does anything useful, we’ll first need to configure our API key and fill in the request parameters.

Let’s handle authentication first:

configuration.api_key['Apikey'] = userdata.get('freekey')
configuration.host = "api.cloudmersive.com"

Note that we’ve manually configured the host in this case to “api.cloudmersive.com”, which is the default host for free-tier subscriptions like the one we’re using today.
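The userdata.get call above is specific to Google Colab. Outside Colab, one common alternative is to read the key from an environment variable. This is a minimal sketch of that approach; the variable name CLOUDMERSIVE_API_KEY and the helper load_api_key are hypothetical choices, not something the SDK defines.

```python
import os


def load_api_key(env_var: str = "CLOUDMERSIVE_API_KEY") -> str:
    """Read the API key from an environment variable.

    The variable name is a hypothetical convention, not part of the SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Intended usage with the configuration object from this walkthrough:
# configuration.api_key['Apikey'] = load_api_key()
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the API.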

Most of the additional configuration for this API happens through request headers rather than a JSON body, which makes it a bit different from some other Cloudmersive API calls. There are a few parameters worth understanding before we set values.

preprocessing controls how aggressively the API enhances image quality before running its analysis. The default is Auto, which handles most real-world input well. Setting it to None skips that step, which can reduce latency for documents you already know are clean and high-resolution.

resultCrossCheck adds a second-pass verification on the output when set to Advanced. The default is None, but for high-stakes intake workflows, flipping this on is usually worth the added processing time.

UserEmailAddress and UserEmailAddressVerified are optional, but they're worth passing when available. Providing user context alongside the document allows the API to factor submission-level signals into the fraud risk score rather than evaluating the document in isolation. A document submitted by an unverified account carries different weight than the same document submitted through a fully verified one.

CustomPolicyID lets the request run against a saved policy configuration, which is useful if different parts of your workflow require different fraud detection thresholds.
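Because each of these parameters is optional, it can help to assemble them in one place and pass only the ones that were actually set. The sketch below does that and validates the two enumerated values described above. The helper name build_request_options is hypothetical; the keyword names mirror the ones used in the example call in this article, and CustomPolicyID is left out because its Python keyword name does not appear in that example.

```python
from typing import Any, Dict, Optional

# Allowed values as described in the parameter comments of the example code.
ALLOWED_PREPROCESSING = {"Auto", "None"}
ALLOWED_CROSS_CHECK = {"None", "Advanced"}


def build_request_options(
    preprocessing: Optional[str] = None,
    result_cross_check: Optional[str] = None,
    user_email_address: Optional[str] = None,
    user_email_address_verified: Optional[bool] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for the request, omitting anything unset."""
    if preprocessing is not None and preprocessing not in ALLOWED_PREPROCESSING:
        raise ValueError(f"preprocessing must be one of {sorted(ALLOWED_PREPROCESSING)}")
    if result_cross_check is not None and result_cross_check not in ALLOWED_CROSS_CHECK:
        raise ValueError(f"result_cross_check must be one of {sorted(ALLOWED_CROSS_CHECK)}")

    options = {
        "preprocessing": preprocessing,
        "result_cross_check": result_cross_check,
        "user_email_address": user_email_address,
        "user_email_address_verified": user_email_address_verified,
    }
    # Drop unset values so the API falls back to its own defaults.
    return {name: value for name, value in options.items() if value is not None}


print(build_request_options(result_cross_check="Advanced", user_email_address_verified=True))
```

The resulting dictionary can be unpacked into the API call with `**`, which keeps the call site short as more optional parameters are added.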

For our example, we'll submit a sample document with user context attached and set resultCrossCheck to Advanced to get the most thorough assessment. Here's the completed request:

from google.colab import userdata
import cloudmersive_fraud_detection_api_client
from cloudmersive_fraud_detection_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_fraud_detection_api_client.Configuration()
configuration.api_key['Apikey'] = userdata.get('freekey')
configuration.host = "api.cloudmersive.com"



# create an instance of the API class
api_instance = cloudmersive_fraud_detection_api_client.FraudDetectionApi(cloudmersive_fraud_detection_api_client.ApiClient(configuration))
preprocessing = 'Auto' # str | Optional: Set the level of image pre-processing to enhance accuracy.  Possible values are 'Auto' and 'None'.  Default is Auto. (optional)
result_cross_check = 'Advanced' # str | Optional: Set the level of output accuracy cross-checking to perform on the input.  Possible values are 'None' and 'Advanced'.  Default is None. (optional)
user_email_address = 'jdoe@somewhere.com' # str | User email address for context (optional) (optional)
user_email_address_verified = True # bool | True if the user's email address was verified (optional) (optional)
input_file = 'Invoice Example 2.PNG' # file | Input document, or photos of a document, to perform fraud detection on (optional)

try:
    # Advanced AI Fraud Detection for Documents
    api_response = api_instance.document_detect_fraud_advanced(preprocessing=preprocessing, result_cross_check=result_cross_check, user_email_address=user_email_address, user_email_address_verified=user_email_address_verified, input_file=input_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling FraudDetectionApi->document_detect_fraud_advanced: %s\n" % e)
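For reuse beyond a notebook cell, the call above can be wrapped in a small function that checks the input path before sending anything. This is a sketch under stated assumptions: scan_document is a hypothetical helper, and api_instance is expected to be the FraudDetectionApi object created earlier. Only the document_detect_fraud_advanced method and the input_file keyword come from the example code.

```python
import os


def scan_document(api_instance, input_file, **options):
    """Run the fraud check on a local file after confirming it exists.

    api_instance: the FraudDetectionApi object built in the walkthrough.
    options: optional keyword arguments such as result_cross_check.
    """
    if not os.path.isfile(input_file):
        raise FileNotFoundError(f"Input document not found: {input_file}")
    return api_instance.document_detect_fraud_advanced(input_file=input_file, **options)


# Intended usage, with the objects defined earlier in this walkthrough:
# api_response = scan_document(api_instance, 'Invoice Example 2.PNG',
#                              result_cross_check='Advanced')
```

Passing the API object in as an argument also makes the function easy to exercise with a stand-in object, without spending API calls.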

Interpreting the Response

After we run our code, we’ll get a response that looks like this:

{
    'analysis_rationale': None,
    'clean_result': True,
    'contains_ai_generated_content': False,
    'contains_asset_transfer': False,
    'contains_employment_agreement': False,
    'contains_expired_document': False,
    'contains_financial_liability': True,
    'contains_purchase_agreement': True,
    'contains_sensitive_information_collection': True,
    'document_class': 'Invoice',
    'fraud_risk_level': 0.3,
    'successful': True
}

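There’s a bit more to unpack here than a typical API response, so it’s worth going through each field carefully. Note that the Python SDK prints field names in snake_case (clean_result, fraud_risk_level, and so on), while the descriptions below use the API's own names for the same fields.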

Successful and CleanResult give us our top-level signals. Successful just confirms the request completed without error. CleanResult is the actual fraud assessment verdict, with True indicating the document passed and False indicating something was flagged.

FraudRiskLevel is the numeric score underneath that boolean, and it's where things get more useful in practice. Rather than treating fraud detection as a hard pass/fail gate, we can use this score to build tiered routing logic. Low-risk documents move through automatically, mid-range scores route to a review queue, and high scores get escalated or rejected outright.
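The tiered routing idea could be sketched as follows. The function name, the tier labels, and the two cut-off values are all hypothetical, and the sketch assumes the score falls between 0 and 1 (the sample response returned 0.3); real thresholds would need tuning against your own documents.

```python
def route_document(fraud_risk_level: float,
                   review_at: float = 0.4,
                   escalate_at: float = 0.8) -> str:
    """Map a fraud risk score to a handling tier.

    The default thresholds are illustrative placeholders, not recommended values.
    """
    if fraud_risk_level >= escalate_at:
        return "escalate"
    if fraud_risk_level >= review_at:
        return "review"
    return "approve"


print(route_document(0.3))  # approve
print(route_document(0.5))  # review
print(route_document(0.9))  # escalate
```

Keeping the thresholds as parameters makes it straightforward to use stricter values for higher-stakes parts of a workflow.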

Below that, we'll find a set of specific boolean flags that each surface a distinct category of risk. ContainsFinancialLiability and ContainsPurchaseAgreement are useful for catching documents whose content doesn't match their declared type. ContainsExpiredDocument catches anything submitted past its valid date. ContainsAiGeneratedContent is the flag most relevant to the current fraud landscape, identifying documents that appear to have been produced by a generative AI tool rather than a legitimate source.
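To work with these flags programmatically, one option is to collect every flag that came back True. The sketch below operates on a plain dictionary shaped like the sample response above; the helper name is hypothetical, and converting the SDK response object with to_dict() is an assumption based on how generated client models commonly behave.

```python
def tripped_flags(response: dict) -> list:
    """Return the names of all contains_* flags set to True, sorted by name."""
    return sorted(
        name for name, value in response.items()
        if name.startswith("contains_") and value is True
    )


# The sample response from this walkthrough, as a plain dictionary.
sample_response = {
    "analysis_rationale": None,
    "clean_result": True,
    "contains_ai_generated_content": False,
    "contains_asset_transfer": False,
    "contains_employment_agreement": False,
    "contains_expired_document": False,
    "contains_financial_liability": True,
    "contains_purchase_agreement": True,
    "contains_sensitive_information_collection": True,
    "document_class": "Invoice",
    "fraud_risk_level": 0.3,
    "successful": True,
}

print(tripped_flags(sample_response))
# ['contains_financial_liability', 'contains_purchase_agreement',
#  'contains_sensitive_information_collection']

# With the live SDK object, something like the following may work (assumption):
# tripped_flags(api_response.to_dict())
```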

AnalysisRationale returns a plain-language explanation of how the fraud assessment was reached. That's useful for audit trails and for giving human reviewers something actionable rather than just a number. In our sample response it came back as None, so code that consumes it should allow for an empty value. DocumentClass rounds out the response with the API's classification of what type of document was submitted in the first place.
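For the audit-trail use case, a minimal sketch might serialize the key response fields into one JSON line per document. The record layout, the helper name build_audit_record, and the placeholder text for a missing rationale are all hypothetical choices; the field names follow the snake_case keys in the sample response.

```python
import json
from datetime import datetime, timezone


def build_audit_record(response: dict, source_file: str) -> str:
    """Serialize the main assessment fields as a single JSON line."""
    record = {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "source_file": source_file,
        "document_class": response.get("document_class"),
        "fraud_risk_level": response.get("fraud_risk_level"),
        "clean_result": response.get("clean_result"),
        # The rationale can be empty, so fall back to a placeholder string.
        "analysis_rationale": response.get("analysis_rationale") or "(no rationale returned)",
    }
    return json.dumps(record, sort_keys=True)


line = build_audit_record(
    {"document_class": "Invoice", "fraud_risk_level": 0.3,
     "clean_result": True, "analysis_rationale": None},
    "Invoice Example 2.PNG",
)
print(line)
```

Each line can then be appended to a log file or sent to whatever audit store the workflow already uses.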

So if we submitted a document that turned out to be an AI-generated invoice with an expired date, we'd expect to see CleanResult come back False, a high FraudRiskLevel, and both ContainsAiGeneratedContent and ContainsExpiredDocument tripped to True, with AnalysisRationale explaining the reasoning behind each flag.

Conclusion

The Cloudmersive AI Fraud Detection API makes it practical for developers to add meaningful document fraud detection to a Python application without building a full detection pipeline from scratch. The combination of advanced AI processing and contextual inputs is well suited to a security landscape increasingly plagued by convincing AI-generated fakes.

If you’re looking for additional help fitting this API into a larger document intake & security workflow, feel free to contact the Cloudmersive team directly and they’ll be glad to assist you.
