This guide covers two features available on the three Advanced Redaction APIs in the Cloudmersive DLP API:
- Custom Fields, which let you detect and redact organization-specific data that is not covered by the built-in PII categories.
- Custom Policy ID, which lets you apply a centrally managed policy created in the Cloudmersive Management Portal.
Advanced Redaction APIs
- Advanced Text Redaction: POST /dlp/redact/text/advanced. Accepts plain text.
- Advanced Document Redaction: POST /dlp/redact/document/advanced. Accepts a document file (PDF, DOC, DOCX, XLS, XLSX, PPT, PPTX, HTML, EML, MSG, PNG, JPG, WEBP).
- Advanced Audio Redaction: POST /dlp/redact/audio/advanced. Accepts an audio file (WAV, MP3, M4A, FLAC, OGG, WMA).
All three accept the same Custom Fields and Custom Policy ID parameters described below.
Custom Fields
The Advanced Redaction APIs detect a large set of built-in categories such as email addresses, phone numbers, social security numbers, credit card numbers, and health information. Custom Fields extend that detection to data types that are specific to your organization, such as internal participant codes, case numbers, or proprietary identifiers.
How it works
Each custom field has two properties:
- Title: A short name for the field. It is also used to build the redaction tag that replaces matched content: the title in uppercase, with spaces and underscores converted to hyphens, wrapped in square brackets. For example, the title "internal participant code" produces the tag [INTERNAL-PARTICIPANT-CODE].
- Description: A specific description of what the data looks like. The more precise the description, the more reliably the field is detected. Include an example value where possible.
Custom fields are detected and redacted in addition to the built-in categories. They do not replace or disable any of the standard detection.
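The tag rule described above can be sketched in a few lines of Python, which is handy for predicting which tag a given Title will produce before you send a request. The function name derive_redaction_tag is hypothetical, and trimming surrounding whitespace is an assumption; this guide does not say how the service treats leading, trailing, or repeated spaces.

```python
def derive_redaction_tag(title: str) -> str:
    """Approximate the documented rule: uppercase the title, convert
    spaces and underscores to hyphens, and wrap it in square brackets."""
    # Assumption: surrounding whitespace is trimmed before the tag is built.
    normalized = title.strip().upper().replace(" ", "-").replace("_", "-")
    return f"[{normalized}]"


print(derive_redaction_tag("internal participant code"))
# prints [INTERNAL-PARTICIPANT-CODE]
```

This is a local approximation for planning titles only; the service itself produces the actual tag.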
Request structure
Add a CustomFields array to the request body. Each entry is an object with a Title and a Description.
Advanced Text Redaction
{
  "InputText": "Participant studyPTSD58147 was enrolled on the trial.",
  "RedactionMode": "SemanticTag",
  "CustomFields": [
    {
      "Title": "internal participant code",
      "Description": "alphanumeric study or research participant identifier code (e.g. studyPTSD58147)"
    }
  ]
}
With the request above, the matched value is replaced by the derived tag:
Participant [INTERNAL-PARTICIPANT-CODE] was enrolled on the trial.
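As a rough illustration, the text request above could be assembled with Python's standard library as follows. The base URL and the Apikey header name are assumptions that this guide does not state, and a Managed Instance or Private Cloud deployment would use its own host. The helper name build_text_redaction_request is hypothetical.

```python
import json
import urllib.request

# Assumptions (not stated in this guide): the public base URL and the
# "Apikey" authentication header. Replace both to match your deployment.
BASE_URL = "https://api.cloudmersive.com"
TEXT_ENDPOINT = "/dlp/redact/text/advanced"


def build_text_redaction_request(input_text, custom_fields, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request for Advanced Text Redaction."""
    body = {
        "InputText": input_text,
        "RedactionMode": "SemanticTag",
        "CustomFields": custom_fields,
    }
    return urllib.request.Request(
        url=base_url + TEXT_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Apikey": api_key},
        method="POST",
    )


request = build_text_redaction_request(
    "Participant studyPTSD58147 was enrolled on the trial.",
    [
        {
            "Title": "internal participant code",
            "Description": "alphanumeric study or research participant "
                           "identifier code (e.g. studyPTSD58147)",
        }
    ],
    api_key="YOUR-API-KEY",  # placeholder, not a real key
)

# Sending requires network access and a valid key, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect or log the exact JSON body before any data leaves your application.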
Advanced Document Redaction
For documents, the InputFile is the document file bytes. Matched custom field content is redacted visually on the page using the configured RedactionMode (for example BlackOut or Blur).
{
  "InputFile": "<base64-encoded document bytes>",
  "FileName": "input.pdf",
  "RedactionMode": "BlackOut",
  "CustomFields": [
    {
      "Title": "case number",
      "Description": "internal case reference number in the format CASE- followed by six digits (e.g. CASE-004821)"
    }
  ]
}
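The InputFile value in the body above is a base64 string, so the file bytes need to be encoded before the JSON is serialized. A minimal sketch, assuming the property names shown in this guide; the helper name build_document_redaction_body is hypothetical, and the placeholder bytes stand in for a real file read with open(path, "rb").

```python
import base64
import json


def build_document_redaction_body(file_bytes, file_name, custom_fields,
                                  redaction_mode="BlackOut"):
    """Return the JSON request body for Advanced Document Redaction."""
    body = {
        # base64 produces ASCII-safe text that can be embedded in JSON
        "InputFile": base64.b64encode(file_bytes).decode("ascii"),
        "FileName": file_name,
        "RedactionMode": redaction_mode,
        "CustomFields": custom_fields,
    }
    return json.dumps(body)


# Placeholder bytes; in practice use: open("input.pdf", "rb").read()
document_bytes = b"%PDF-1.7 placeholder content"

payload = build_document_redaction_body(
    document_bytes,
    "input.pdf",
    [
        {
            "Title": "case number",
            "Description": "internal case reference number in the format "
                           "CASE- followed by six digits (e.g. CASE-004821)",
        }
    ],
)
```

The same encoding step would apply to the audio endpoint, with the audio file bytes in InputFile.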
Advanced Audio Redaction
For audio, the InputFile is the audio file bytes. The audio is transcribed, the custom field content is detected in the transcript, and the corresponding audio segments are redacted.
{
  "InputFile": "<base64-encoded audio bytes>",
  "LanguageCode": "ENG",
  "CustomFields": [
    {
      "Title": "policy number",
      "Description": "membership policy number spoken as the letter P followed by eight digits"
    }
  ]
}
Tips for writing effective custom fields
- Write the Description so that someone unfamiliar with your data could recognize a match. Include the format, length, and a representative example.
- Use a concise Title. Remember it becomes the redaction tag, so keep it readable when uppercased and hyphenated.
- You can supply multiple custom fields in a single request. Each is detected independently and gets its own tag.
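Because each tag is derived from its Title, two different titles can normalize to the same tag (for example, "case number" and "Case_Number"). This guide does not say how the service handles that situation, so a cautious approach is to check for collisions before sending a request. The sketch below applies the documented tag rule locally; the function name find_tag_collisions is hypothetical.

```python
from collections import defaultdict


def find_tag_collisions(custom_fields):
    """Return {tag: [titles]} for every tag produced by more than one title."""
    titles_by_tag = defaultdict(list)
    for field in custom_fields:
        title = field["Title"]
        # Documented rule: uppercase, spaces/underscores to hyphens, brackets.
        tag = "[" + title.upper().replace(" ", "-").replace("_", "-") + "]"
        titles_by_tag[tag].append(title)
    return {tag: titles for tag, titles in titles_by_tag.items() if len(titles) > 1}


fields = [
    {"Title": "case number", "Description": "CASE- followed by six digits"},
    {"Title": "Case_Number", "Description": "legacy case reference"},
    {"Title": "policy number", "Description": "P followed by eight digits"},
]

print(find_tag_collisions(fields))
# prints {'[CASE-NUMBER]': ['case number', 'Case_Number']}
```

Renaming one of the colliding titles keeps each field's redactions distinguishable in the output.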
Custom Policy ID
Custom Fields are defined per request. A Custom Policy is defined once in the Cloudmersive Management Portal and referenced by ID across many requests. This gives you centralized policy management: you maintain the policy in one place, and any update applies everywhere the policy ID is used, with no change to your application code.
Custom Policies require a Managed Instance or Private Cloud deployment.
Applying a policy in a request
Set the CustomPolicyID property to the ID of the policy you created in the portal. This works on all three Advanced Redaction APIs.
{
  "InputText": "Customer record attached.",
  "CustomPolicyID": "550e8400-e29b-41d4-a716-446655440000"
}
Creating a Custom Policy in the Management Portal
Follow these steps to create a policy and obtain its ID.
- Log in to the Cloudmersive Management Portal.
- In the left navigation menu, select Custom Policies. This opens the Custom Policies page, which lists any policies you have already created with their Title, Type, and Version.
- Click Create Policy.
- On the Create Policy form:
  - Type: select Data Loss Prevention Detection Policy.
  - Title: enter a descriptive name, for example PII Protection Policy.
  - Version: leave the default of 1.0.0, or set your own value using semantic versioning (for example 1.2.0).
- Click Create Policy to save. You are returned to the Custom Policies list.
- On the list, click Manage next to your new policy. The Manage Policy page shows the policy Properties, including the ID field. This value is your Custom Policy ID.
- Copy the ID value. Use it as the CustomPolicyID in your Advanced Redaction API requests.
Adding rules to a policy
A policy contains one or more rules that define its behavior.
- On the Manage Policy page, click Create Rule.
- On the Add Rule form:
  - Rule Identifier: a short label for the rule, for example 1a.
  - Rule Type: choose Guideline, Allow, or Block.
  - Status: Active (default) or Disabled.
  - Rule Text: a natural language description of the rule, for example "Redact all internal case reference numbers."
- Click Create Rule to save. Repeat to add additional rules.
Each rule appears in the Rules table on the Manage Policy page, where you can review or remove it.
Centralized management and updates
Because requests reference the policy by ID rather than embedding the rules, you can update a policy in the portal at any time (for example, by adding a rule, disabling a rule, or revising rule text), and the change takes effect for every request that uses that policy ID. There is no need to redeploy or modify your integration when policy requirements change.
Combining Custom Fields and Custom Policy ID
The two features can be used together in a single request. Custom Fields add request-specific detection, and the Custom Policy ID applies your centrally managed rules at the same time.
{
  "InputText": "Participant studyPTSD58147 contacted support.",
  "RedactionMode": "SemanticTag",
  "CustomPolicyID": "550e8400-e29b-41d4-a716-446655440000",
  "CustomFields": [
    {
      "Title": "internal participant code",
      "Description": "alphanumeric study or research participant identifier code (e.g. studyPTSD58147)"
    }
  ]
}
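One way to compose such a body in application code is a small helper that adds CustomPolicyID and CustomFields only when they are supplied. This is a sketch under assumptions: the helper name build_redaction_body is hypothetical, and omitting RedactionMode when it is not set mirrors the policy-only example earlier in this guide rather than any documented default.

```python
import json


def build_redaction_body(input_text, custom_fields=None, custom_policy_id=None,
                         redaction_mode=None):
    """Compose an Advanced Text Redaction body from optional parts."""
    body = {"InputText": input_text}
    if redaction_mode:
        body["RedactionMode"] = redaction_mode
    if custom_policy_id:
        # The ID is copied from the Manage Policy page in the portal.
        body["CustomPolicyID"] = custom_policy_id
    if custom_fields:
        body["CustomFields"] = list(custom_fields)
    return body


combined = build_redaction_body(
    "Participant studyPTSD58147 contacted support.",
    custom_fields=[
        {
            "Title": "internal participant code",
            "Description": "alphanumeric study or research participant "
                           "identifier code (e.g. studyPTSD58147)",
        }
    ],
    custom_policy_id="550e8400-e29b-41d4-a716-446655440000",
    redaction_mode="SemanticTag",
)

print(json.dumps(combined, indent=2))
```

Keeping the policy ID in configuration rather than in code would let you switch policies without changing the application, which complements the portal-side updates described earlier.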