PCI, SSN, PII Detection
VoiceBase allows you to detect sensitive data in your recordings and, optionally, redact it from the recordings, transcripts, and analytics.
PCI and related detectors are based on machine-learned models of real calls where both a caller and an agent are audible. This gives higher reliability and adaptability to real-world situations than deterministic models, but it also means that, for accurate results, the audio being processed for PCI, SSN, or PII detection must reflect a real transaction. For example, a phone order may contain some amount of conversation, followed by a product and quantity, the agent giving a total and then asking for the card type and number, the expiration date, and possibly the CVV code. The Number detector is rule-based and will detect any portion of the conversation containing numbers.
VoiceBase offers two options for PCI detection (and redaction). The 'PCI' model detects sensitive portions of the conversation and may mark some buffer before and after those portions for good measure. The 'pci-numbers-only' model returns ONLY the segments containing digits within the PCI portion of the conversation. Note that this second approach, while much more specific, relies on the speech being recognized as numbers or as words that sound sufficiently like numbers. An expiration date spoken as 08/2017 will be redacted in full, but for August 2017 only the year '2017' will be marked as PCI.
The API offers the following three detectors for sensitive data:
- Payment Card Information (PCI) Detector
  - Detects PCI sensitive numbers, including:
    - Credit Card, Debit Card, and Payment Card numbers
    - Card expiration dates
    - CVV validation codes
- Social Security Number (SSN) Detector
  - Detects Social Security numbers
- Number Detector
  - Detects numbers, to be used for Personally Identifiable Information (PII) numbers that do not fall into the above categories
Detected Regions
When detection for sensitive data is enabled, the API returns detected regions as part of analytics for the recording. For example, in a recording with two regions detected as PCI and one region detected as SSN, the analytics would contain:
{
  "prediction": {
    "detectors": [
      {
        "detectorId": "abcdefg-1f10-11f2-a085-ec48ab4fbb59",
        "detections": [
          {
            "detectorClass": 1,
            "detectorClassLabel": "PCI",
            "detectedSegments": [
              {
                "occurrences": [
                  { "s": 362000, "e": 410055 },
                  { "s": 575390, "e": 629607 }
                ]
              }
            ]
          }
        ]
      },
      {
        "detectorId": "e79c540f-0d47-484e-859e-30d1ae6e4009",
        "detections": [
          {
            "detectorClass": 1,
            "detectorClassLabel": "ssn",
            "detectedSegments": [
              {
                "occurrences": [
                  { "s": 202293, "e": 229835 }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}
For each detection, the API returns the following data points:
- detectorName and/or detectorId: The type of sensitive data detected
- detections: An array of the detected regions
- s: The start time of the detected region, in milliseconds
- e: The end time of the detected region, in milliseconds
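The nested response can be flattened into a simple list of time ranges before acting on it. The following is a minimal Python sketch, not part of the VoiceBase API: the helper names extract_regions and is_sensitive are hypothetical, and the sample data simply repeats the response shown above.

```python
# Sample analytics, copied from the example response in this section.
SAMPLE = {
    "prediction": {
        "detectors": [
            {
                "detectorId": "abcdefg-1f10-11f2-a085-ec48ab4fbb59",
                "detections": [{
                    "detectorClass": 1,
                    "detectorClassLabel": "PCI",
                    "detectedSegments": [{"occurrences": [
                        {"s": 362000, "e": 410055},
                        {"s": 575390, "e": 629607},
                    ]}],
                }],
            },
            {
                "detectorId": "e79c540f-0d47-484e-859e-30d1ae6e4009",
                "detections": [{
                    "detectorClass": 1,
                    "detectorClassLabel": "ssn",
                    "detectedSegments": [{"occurrences": [
                        {"s": 202293, "e": 229835},
                    ]}],
                }],
            },
        ]
    }
}


def extract_regions(analytics):
    """Flatten prediction.detectors into (label, start_ms, end_ms) tuples,
    sorted by start time. Hypothetical helper, not a VoiceBase function."""
    regions = []
    for detector in analytics.get("prediction", {}).get("detectors", []):
        for detection in detector.get("detections", []):
            label = detection.get("detectorClassLabel", "")
            for segment in detection.get("detectedSegments", []):
                for occurrence in segment.get("occurrences", []):
                    regions.append((label, occurrence["s"], occurrence["e"]))
    return sorted(regions, key=lambda region: region[1])


def is_sensitive(time_ms, regions):
    """Return True if time_ms falls inside any detected region."""
    return any(start <= time_ms <= end for _, start, end in regions)


if __name__ == "__main__":
    for label, start, end in extract_regions(SAMPLE):
        # s and e are in milliseconds; print them in seconds for readability.
        print(f"{label}: {start / 1000:.3f}s to {end / 1000:.3f}s")
```

A caller could use is_sensitive to decide, for instance, whether a given playback position should be muted in a custom player.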
PCI Detector
To enable it, add the PCI detector to your configuration when you make a POST request to the /v3/media resource.
IMPORTANT NOTE: Currently, the PCI detector requires number formatting to be disabled.
{
  "transcript": {
    "formatting": {
      "enableNumberFormatting": false
    }
  },
  "prediction": {
    "detectors": [
      { "detectorName": "PCI" }
    ]
  }
}
SSN Detector
To enable it, add the SSN detector to your configuration when you make a POST request to the /v3/media resource.
IMPORTANT NOTE: Currently, the SSN detector requires number formatting to be disabled.
{
  "transcript": {
    "formatting": {
      "enableNumberFormatting": false
    }
  },
  "prediction": {
    "detectors": [
      { "detectorName": "SSN" }
    ]
  }
}
Number Detector
To enable it, add the Number detector to your configuration when you make a POST request to the /v3/media resource.
IMPORTANT NOTE: Currently, the Number detector requires number formatting to be disabled.
{
  "transcript": {
    "formatting": {
      "enableNumberFormatting": false
    }
  },
  "prediction": {
    "detectors": [
      { "detectorName": "Number" }
    ]
  }
}
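The PCI, SSN, and Number configurations differ only in the detectorName value, so the configuration string can be generated rather than written by hand. The sketch below is one possible way to do that in Python; build_configuration is a hypothetical helper, not part of any VoiceBase client library, and it always disables number formatting because each detector currently requires it.

```python
import json


def build_configuration(detector_names):
    """Return a JSON configuration string enabling the given detectors.

    Hypothetical helper. Number formatting is always disabled, as the
    detectors described in this section require.
    """
    configuration = {
        "transcript": {
            "formatting": {"enableNumberFormatting": False}
        },
        "prediction": {
            "detectors": [{"detectorName": name} for name in detector_names]
        },
    }
    return json.dumps(configuration)


if __name__ == "__main__":
    # The resulting string can be passed as the 'configuration' form field.
    print(build_configuration(["PCI", "SSN", "Number"]))
```

Generating the string with json.dumps avoids hand-editing mistakes such as a missing comma between detector entries.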
Examples
Note: Export your API TOKEN prior to running the following example.
export TOKEN='Your Api Token'
Enabling the detectors
curl https://apis.voicebase.com/v3/media \
  --header "Authorization: Bearer ${TOKEN}" \
  --form media=@recording.mp3 \
  --form configuration='{
    "transcript": {
      "formatting": {
        "enableNumberFormatting": false
      }
    },
    "prediction": {
      "detectors": [
        { "detectorName": "PCI" },
        { "detectorName": "SSN" },
        { "detectorName": "Number" }
      ]
    }
  }'
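Because the configuration is passed as an inline JSON string, it can be worth checking it locally before uploading. The following Python sketch is an illustration only: check_configuration is a hypothetical helper, and the rules it applies are limited to what this section states (the string must be valid JSON, number formatting must be disabled, and each detector entry needs a detectorName or detectorId).

```python
import json


def check_configuration(raw):
    """Return a list of problems found in a configuration string.

    Hypothetical pre-flight check; an empty list means no problems found.
    """
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} at line {err.lineno}, column {err.colno}"]
    if not isinstance(config, dict):
        return ["configuration must be a JSON object"]

    problems = []
    formatting = config.get("transcript", {}).get("formatting", {})
    if formatting.get("enableNumberFormatting") is not False:
        problems.append("enableNumberFormatting must be set to false")

    detectors = config.get("prediction", {}).get("detectors")
    if not isinstance(detectors, list) or not detectors:
        problems.append("prediction.detectors must be a non-empty list")
    else:
        for index, detector in enumerate(detectors):
            named = isinstance(detector, dict) and (
                detector.get("detectorName") or detector.get("detectorId")
            )
            if not named:
                problems.append(f"detector {index} needs a detectorName or detectorId")
    return problems


if __name__ == "__main__":
    raw = (
        '{"transcript": {"formatting": {"enableNumberFormatting": false}},'
        ' "prediction": {"detectors": [{"detectorName": "PCI"}]}}'
    )
    print(check_configuration(raw) or "configuration looks fine")
```

This check only covers the fields discussed here; the service may accept or reject configurations for reasons this sketch does not model.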