Kong provides several AI PII Anonymizer service Docker images. Each image has its own built-in NLP model and is tagged using the version-lang_code pattern. For example, ai-pii-service:v0.1.2-en is version 0.1.2 with the built-in English model, ai-pii-service:v0.1.2-it is version 0.1.2 for Italian, ai-pii-service:v0.1.2-fr is version 0.1.2 for French, and so on.
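The tag pattern above can be taken apart mechanically, which is handy when a deployment script needs to check which language model an image carries. The following is a minimal sketch; the helper name parse_image_tag is hypothetical and is not part of the service.

```python
def parse_image_tag(image):
    """Split 'name:vVERSION-LANG' into (name, version, lang_code)."""
    # Everything after the last ':' is the tag, e.g. 'v0.1.2-en'.
    name, _, tag = image.rpartition(":")
    # The language code follows the last '-' in the tag.
    version, _, lang_code = tag.rpartition("-")
    if not name or not version or not lang_code:
        raise ValueError(f"unexpected image reference: {image!r}")
    # Drop the leading 'v' so the version reads '0.1.2'.
    if version.startswith("v"):
        version = version[1:]
    return name, version, lang_code


print(parse_image_tag("ai-pii-service:v0.1.2-en"))
# prints ('ai-pii-service', '0.1.2', 'en')
```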
The image tagged with all contains the English, French, German, Italian, Japanese, Portuguese, and Spanish models. If you need to use other NLP models, customize ai_pii_service/nlp_engine_conf.yml, which you can do with any of the available images.
These Docker images are private. Contact Kong Support to get access to them.
The PII Anonymizer service loads NLP models into memory, with one model loaded by default. Ensure that you have at least 600 MB of free memory to run the image.
This service takes the following optional environment variables at startup:
- GUNICORN_WORKERS: Specifies the number of Gunicorn processes to run.
- PII_SERVICE_ENGINE_CONF: Specifies the natural language processing (NLP) engine configuration file.
- GUNICORN_LOG_LEVEL: Specifies the log level.
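As an illustration of how these variables might be passed at startup, the sketch below assembles a docker run command line that sets only the variables you supply. The helper name docker_run_command and the example values (2 workers, log level info) are assumptions. The image reference omits the private registry path, and port publishing is left out because the listening port is not stated here.

```python
import shlex


def docker_run_command(image, workers=None, engine_conf=None, log_level=None):
    """Build a `docker run` argument list, setting only the variables given."""
    env = {
        "GUNICORN_WORKERS": workers,
        "PII_SERVICE_ENGINE_CONF": engine_conf,
        "GUNICORN_LOG_LEVEL": log_level,
    }
    cmd = ["docker", "run", "--rm"]
    for name, value in env.items():
        # All three variables are optional, so unset ones are skipped.
        if value is not None:
            cmd += ["-e", f"{name}={value}"]
    cmd.append(image)
    return cmd


# Example values are illustrative only.
cmd = docker_run_command("ai-pii-service:v0.1.2-en", workers=2, log_level="info")
print(shlex.join(cmd))
```

The command is only printed, not executed, so you can review it before running it against your own registry path.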
The service exposes the following endpoints:
- POST /llm/v1/sanitize: Sanitizes the specified types of PII, including credentials and custom patterns.
- POST /llm/v1/sanitize_credentials: Sanitizes credentials only.
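To make the two endpoints concrete, here is a minimal sketch that builds (but does not send) a request for either one using the Python standard library. Several details are assumptions and should be checked against the service itself: the base URL http://localhost:8080, the body field name text, and the choice to send only text to the credentials endpoint. The anonymize array is the one described in this document.

```python
import json
import urllib.request

# Assumption: placeholder address for wherever the service is running.
BASE_URL = "http://localhost:8080"


def build_sanitize_request(text, anonymize=None, credentials_only=False):
    """Build a POST request for one of the two sanitize endpoints."""
    if credentials_only:
        path = "/llm/v1/sanitize_credentials"
    else:
        path = "/llm/v1/sanitize"
    # Assumption: the "text" field name is illustrative, not confirmed.
    body = {"text": text}
    if not credentials_only and anonymize is not None:
        body["anonymize"] = anonymize
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_sanitize_request("Call John at 555-0100", anonymize=["general", "phone"])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```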
You can anonymize data in requests using the following redact modes:
- placeholder: Replaces sensitive data with a fixed placeholder pattern, PLACEHOLDER{i}, where i is a sequence number. Identical original values receive the same placeholder. For example, the location New York City might be replaced with LOCATION.
- synthetic: Replaces sensitive data with a word of the same type. For example, the name John might be replaced with Amir.
  - Custom patterns are replaced with CUSTOM{i}.
  - Credentials are replaced with a string of # characters matching the original length.
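Two of the rules above are easy to picture in code: identical values map to the same numbered placeholder, and credentials become a run of # characters of the same length. The sketch below only illustrates those two rules and is not the service's implementation. The function names are hypothetical, and starting the sequence at 1 and rendering PLACEHOLDER{i} as the prefix followed by the number are assumptions.

```python
def make_placeholder_redactor(prefix="PLACEHOLDER"):
    """Return a function that maps each distinct value to a stable placeholder."""
    seen = {}

    def redact(value):
        if value not in seen:
            # Assumption: sequence numbers start at 1.
            seen[value] = f"{prefix}{len(seen) + 1}"
        return seen[value]

    return redact


def mask_credential(secret):
    """Replace a credential with '#' characters of the same length."""
    return "#" * len(secret)


redact = make_placeholder_redactor()
print(redact("New York City"))    # PLACEHOLDER1
print(redact("Paris"))            # PLACEHOLDER2
print(redact("New York City"))    # PLACEHOLDER1 again: same value, same placeholder
print(mask_credential("s3cr3t"))  # ######
```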
You can define an array of custom patterns on a per-request basis. Currently, only regex patterns are supported, and all fields are required: name, regex, and score. The name must be unique for each pattern.
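The constraints on custom patterns (three required fields, unique names) lend themselves to a client-side check before a request is sent. The following is a minimal sketch; the function name validate_custom_patterns and the example pattern are hypothetical. It compiles each regex with Python's re module as a sanity check, on the assumption that the service accepts a compatible regex dialect.

```python
import re

REQUIRED_FIELDS = ("name", "regex", "score")


def validate_custom_patterns(patterns):
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    seen_names = set()
    for index, pattern in enumerate(patterns):
        missing = [field for field in REQUIRED_FIELDS if field not in pattern]
        if missing:
            errors.append(f"pattern {index}: missing {', '.join(missing)}")
            continue
        # Names must be unique across the array.
        if pattern["name"] in seen_names:
            errors.append(f"pattern {index}: duplicate name {pattern['name']!r}")
        seen_names.add(pattern["name"])
        # Assumption: Python's regex dialect is close enough for a sanity check.
        try:
            re.compile(pattern["regex"])
        except re.error as exc:
            errors.append(f"pattern {index}: invalid regex ({exc})")
    return errors


# Hypothetical pattern for an internal employee ID.
patterns = [{"name": "employee_id", "regex": r"EMP-\d{6}", "score": 0.9}]
print(validate_custom_patterns(patterns))
# prints []
```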
You can use the following fields in the anonymize array:
- general: Anonymizes general PII entities such as person names, locations, and organizations.
- phone: Anonymizes phone numbers (for example, mobile, landline).
- email: Anonymizes email addresses.
- creditcard: Anonymizes credit card numbers.
- crypto: Anonymizes cryptocurrency addresses.
- date: Anonymizes dates and timestamps.
- ip: Anonymizes IP addresses (both IPv4 and IPv6).
- nrp: Anonymizes a person’s nationality, religious group, or political group.
- ssn: Anonymizes Social Security Numbers (SSN) and other related identifiers like ITIN, NIF, ABN, and more.
- domain: Anonymizes domain names.
- url: Anonymizes web URLs.
- medical: Anonymizes medical identifiers (for example, medical license numbers, NHS numbers, Medicare numbers).
- driverlicense: Anonymizes driver’s license numbers.
- passport: Anonymizes passport numbers.
- bank: Anonymizes bank account numbers and related banking identifiers (for example, VAT codes, IBAN).
- nationalid: Anonymizes various national identification numbers (for example, Aadhaar, PESEL, NRIC, social security, or voter IDs).
- custom: Anonymizes user-defined custom PII patterns using regular expressions. Applies only when custom patterns are provided.
- credentials: Anonymizes credentials, similar to /sanitize_credentials.
- all: Includes all the fields above, including custom ones.
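Because the anonymize array only accepts the field names listed above, a typo such as adress would otherwise surface only once the request reaches the service. The sketch below is a hypothetical client-side check (the function name unknown_anonymize_fields is an assumption) that reports any names not in the list from this document.

```python
# Field names exactly as listed in this document.
ANONYMIZE_FIELDS = {
    "general", "phone", "email", "creditcard", "crypto", "date", "ip",
    "nrp", "ssn", "domain", "url", "medical", "driverlicense", "passport",
    "bank", "nationalid", "custom", "credentials", "all",
}


def unknown_anonymize_fields(fields):
    """Return the requested field names that are not in the documented list."""
    return sorted(set(fields) - ANONYMIZE_FIELDS)


print(unknown_anonymize_fields(["email", "phone"]))   # []
print(unknown_anonymize_fields(["email", "adress"]))  # ['adress']
```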