feat: Updates fingerprinting and deduplication docs (#61)
dhudec authored Feb 7, 2023
1 parent 74ad517 commit b5edb07
Showing 4 changed files with 54 additions and 28 deletions.
18 changes: 9 additions & 9 deletions docs/api/tenants/tenants.mdx
@@ -278,7 +278,7 @@ func main() {
#### Request Parameters

| Attribute | Required | Type | Default | Description |
| ---------- | -------- | -------- | ------------------------------------------ | ----------------------------------------------------- |
|------------|----------|----------|--------------------------------------------|-------------------------------------------------------|
| `name` | true | _string_ | `null` | The name of the Tenant. Has a maximum length of `200` |
| `settings` | false | _string_ | [Tenant Settings](#tenant-settings-object) | The settings for the Tenant |

@@ -545,7 +545,7 @@ Returns a [Tenant Usage Report](#tenant-usage-report-object) for the provided `B
## Tenant Object

| Attribute | Type | Description |
| ------------- | ------------------------------------------ | ---------------------------------------------------------------------------------------------------- |
|---------------|--------------------------------------------|------------------------------------------------------------------------------------------------------|
| `id` | _uuid_ | Unique identifier of the Tenant |
| `owner_id` | _uuid_ | The user ID which owns the Tenant |
| `name` | _string_ | The name of the Tenant |
@@ -557,21 +557,21 @@ Returns a [Tenant Usage Report](#tenant-usage-report-object) for the provided `B

### Tenant Settings Object

| Attribute | Type | Description |
| -------------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `fingerprint_tokens` | _string_ | (Bool) Whether the [default fingerprint expression](/docs/api/tokens/token-types) is applied if no fingerprint expression is provided on the token |
| `deduplicate_tokens` | _string_ | (Bool) Whether tokens are deduplicated on creation and updates |
| Attribute | Type | Description |
|----------------------|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `fingerprint_tokens` | _string_ | (Bool) Whether all tokens will be fingerprinted using the [default fingerprint expression](/docs/api/tokens/token-types) for their token type. When disabled, fingerprinting can still be enabled per token by setting a [fingerprint expression](/docs/expressions/fingerprints) on the token. |
| `deduplicate_tokens` | _string_ | (Bool) Whether tokens are [deduplicated](/docs/concepts/what-are-tokens#deduplication) on creation and updates |
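Both settings are typed as _string_ but carry a boolean value, so client code reading a Tenant object has to interpret them. The Python sketch below assumes the values are serialized as `"true"`/`"false"` and treats a missing setting as disabled, matching the documented defaults (fingerprinting is off for new tenants, and every tokenization request creates a new token unless deduplication is enabled). The `setting_enabled` helper and the sample tenant are hypothetical and not part of any Basis Theory SDK.

```python
from typing import Mapping, Optional

# Setting names come from the Tenant Settings Object table. The string
# encoding ("true"/"false") is an ASSUMPTION based on the _string_ (Bool) type.
FINGERPRINT_TOKENS = "fingerprint_tokens"
DEDUPLICATE_TOKENS = "deduplicate_tokens"


def setting_enabled(settings: Optional[Mapping[str, str]], name: str) -> bool:
    """Interpret a string-typed boolean tenant setting (hypothetical helper).

    A missing settings map or a missing key is treated as disabled.
    """
    if not settings:
        return False
    value = settings.get(name)
    if value is None:
        return False
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"setting {name!r} has non-boolean value {value!r}")


# Hypothetical Tenant object, shaped like the Tenant Object table.
tenant = {
    "id": "00000000-0000-0000-0000-000000000000",
    "name": "Example Tenant",
    "settings": {FINGERPRINT_TOKENS: "false", DEDUPLICATE_TOKENS: "true"},
}

print(setting_enabled(tenant["settings"], FINGERPRINT_TOKENS))  # False
print(setting_enabled(tenant["settings"], DEDUPLICATE_TOKENS))  # True
```

Raising on unexpected values, rather than silently treating them as `false`, surfaces a misconfigured setting early.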

## Tenant Usage Report Object

| Attribute | Type | Description |
| -------------- | -------------------------------------------- | ----------------------------- |
|----------------|----------------------------------------------|-------------------------------|
| `token_report` | [Token Report](#tenants-token-report-object) | Token Usage Report for Tenant |

### Token Report Object

| Attribute | Type | Description |
| -------------------------------- | --------------------------------------------------------------- | --------------------------------------------------------------------------- |
|----------------------------------|-----------------------------------------------------------------|-----------------------------------------------------------------------------|
| `metrics_by_type` | _map\<string, [TokenTypeMetrics](#token-type-metrics-object)\>_ | Token Metrics by [TokenType](/docs/api/tokens/token-types) |
| `included_monthly_active_tokens` | _long_ | Number of included monthly active tokens for the billing plan |
| `monthly_active_tokens` | _long_ | Number of tokens that have been created, read, or used in the current month |
@@ -584,6 +584,6 @@ Returns a [Tenant Usage Report](#tenant-usage-report-object) for the provided `B
### Token Type Metrics Object

| Attribute | Type | Description |
| ----------------- | ------ | ----------------------------------------------- |
|-------------------|--------|-------------------------------------------------|
| `count` | _long_ | Number of tokens |
| `last_created_at` | _date_ | (Optional) Last created date in ISO 8601 format |
5 changes: 4 additions & 1 deletion docs/api/tokens/token-types.mdx
@@ -9,7 +9,10 @@ Token Types define the rules around a data type such as validation requirements,
[token containers](/docs/concepts/what-are-containers),
fingerprint expressions, and mask expressions.

When [creating a token](/docs/api/tokens#create-token) and no `fingerprint_expression` is provided, the [`fingerprint_tokens`](/docs/api/tenants/#tenant-settings-object) tenant setting must be set to `true` to automatically apply the Default Fingerprint Expression associated with the token type.
When [creating a token](/docs/api/tokens#create-token) without a `fingerprint_expression`, the
[`fingerprint_tokens`](/docs/api/tenants/#tenant-settings-object) tenant setting must be set to `true`
to automatically apply the Default Fingerprint Expression associated with the token type.
See our docs on [Fingerprints](/docs/expressions/fingerprints) for details.
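The rule above can be modeled as a small decision function. This Python sketch only models the documented behavior and is not Basis Theory code. `effective_fingerprint_expression` is a hypothetical name, and the generic `{{ data | stringify }}` expression stands in for whatever default a given token type actually defines.

```python
from typing import Optional

# Stand-in default expression (ASSUMPTION); each token type defines its own.
DEFAULT_EXPRESSION = "{{ data | stringify }}"


def effective_fingerprint_expression(
    request_expression: Optional[str],
    fingerprint_tokens_enabled: bool,
    type_default_expression: str = DEFAULT_EXPRESSION,
) -> Optional[str]:
    """Return the expression used to fingerprint a new token, or None if the
    token will not be fingerprinted (hypothetical model of the docs)."""
    if request_expression:
        # An expression provided on the create request is applied even when
        # the tenant-level fingerprint_tokens setting is disabled.
        return request_expression
    if fingerprint_tokens_enabled:
        # No expression provided: fall back to the token type's default,
        # but only when the tenant setting is on.
        return type_default_expression
    return None
```

When the tenant setting is enabled, this function never returns `None`, which mirrors the note that fingerprinting cannot then be disabled per token.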

## Token

33 changes: 19 additions & 14 deletions docs/concepts/what-are-tokens.mdx
@@ -69,9 +69,11 @@ Another example may be that you want to format your email and retain the domain

### Fingerprinting

Fingerprinting provides a way to correlate multiple tokens together that contain the same data without needing access to the underlying data. Creating multiple tokens with the same token type, data, and [fingerprint expression](/docs/expressions/fingerprints) will result in the same fingerprint. This can be useful for correlating purchases with the same credit card for multiple members of the same household or helping with master data management of multiple user accounts.

By default, all tokens are fingerprinted with the contents of the `data` property using the default fingerprint expression of `{% raw %}{{ data | stringify }}{% endraw %}`, however you can customize this fingerprint expression to meet your needs of what should uniquely identify a token.
Fingerprinting provides a way to correlate multiple tokens that contain the same data without needing access to the underlying data.
Creating multiple tokens in a tenant with the same token type, data, and [fingerprint expression](/docs/expressions/fingerprints) will result in the same fingerprint.
This can be useful for correlating purchases made with the same credit card by multiple members of the same household, or for helping with
master data management of multiple user accounts. Token fingerprints are also used for automatic [deduplication](#deduplication) of tokens within a tenant.
For more details about fingerprinting, see the [Fingerprints](/docs/expressions/fingerprints) docs.
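To make the "same tenant, token type, data, and expression give the same fingerprint" property concrete, here is a toy keyed fingerprint in Python. It is an illustration only and is not Basis Theory's algorithm. The per-tenant `tenant_secret` is a hypothetical stand-in for tenant scoping, and `rendered_value` is assumed to be the output of evaluating the fingerprint expression. Keying HMAC-SHA256 this way keeps the output deterministic within a tenant and token type, while anyone without the key cannot brute-force low-entropy inputs such as card numbers.

```python
import hashlib
import hmac


def toy_fingerprint(tenant_secret: bytes, token_type: str, rendered_value: str) -> str:
    """Illustrative keyed fingerprint; NOT Basis Theory's actual algorithm.

    tenant_secret: hypothetical per-tenant key (scopes fingerprints per tenant).
    token_type: mixed into the MAC input (scopes fingerprints per token type).
    rendered_value: assumed output of the fingerprint expression.
    """
    # A NUL separator keeps (type, value) pairs unambiguous in the MAC input.
    message = token_type.encode("utf-8") + b"\x00" + rendered_value.encode("utf-8")
    return hmac.new(tenant_secret, message, hashlib.sha256).hexdigest()
```

Calling it twice with the same inputs yields the same digest. Changing either the token type or the tenant secret yields a different one. The type names in the usage below are examples only.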

In the following example, we will create a token with user data, but we want to fingerprint on the email address:

@@ -230,19 +232,22 @@ In this example, we have a customer account we want to search over parts of the

In the above example, we can now perform a search with `john`, `doe`, `111-22-3333`, `3333`, `johndoe@basistheory.com` or `basistheory.com` and get back the token. For all additional search capabilities, see our [API documentation](/docs/api/tokens/search).

## Deduplication
### Deduplication

Duplicate data can be problematic for some systems. It can potentially lead to data consistency problems as multiple copies of the data
need to be kept in sync. For example, you may have an accounts payable system and an e-commerce system both accepting credit cards for customers,
and you want to ensure a single credit card is on file for that customer as the source of truth.
Deduplication may also be required within some business domains; for example, a system may collect sensitive user information and wish to
match or correlate user accounts having identical information.

Duplicate data can be problematic for some systems. This can create data integrity problems in some systems where unique
values are required. For example, you may have an accounts payable system and an e-commerce system both accepting
credit cards for customers, and you want to ensure duplicate credit cards are not on file for that customer.
Deduplication ensures tokens that have the same `fingerprint` return the same token when created.
Token deduplication can help solve these use cases by ensuring that only one copy of equivalent tokenized data can exist within a Basis Theory tenant.
By default, every tokenization request creates a new token, but with deduplication [enabled at the tenant](/docs/api/tenants#tenant-settings-object)
or on each [tokenization request](/docs/api/tokens#create-token), tokens will be deduplicated based
upon their fingerprint. This ensures that if multiple systems or the same system creates multiple tokens with the same
data, they do not create duplicate tokens.
or on each tokenization request, tokens will be deduplicated based
upon their [fingerprint](/docs/concepts/what-are-tokens#fingerprinting). This ensures that when multiple systems, or the
same system, tokenize the same data more than once, no duplicate tokens are created.
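The following in-memory Python model sketches fingerprint-based deduplication. `ToyTokenStore`, its `create` method, and the placeholder fingerprint strings are hypothetical. The sketch also assumes that a token without a fingerprint is never deduplicated, since there is nothing to match on.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Token:
    id: str
    type: str
    data: str
    fingerprint: Optional[str]


@dataclass
class ToyTokenStore:
    """Hypothetical in-memory stand-in for a tenant's tokens."""
    tokens_by_fingerprint: Dict[str, Token] = field(default_factory=dict)
    all_tokens: Dict[str, Token] = field(default_factory=dict)

    def create(self, token_type: str, data: str,
               fingerprint: Optional[str], deduplicate: bool) -> Token:
        # ASSUMPTION: tokens without a fingerprint are never deduplicated.
        if deduplicate and fingerprint is not None:
            existing = self.tokens_by_fingerprint.get(fingerprint)
            if existing is not None:
                # Return the existing token instead of storing a copy.
                return existing
        token = Token(id=str(uuid.uuid4()), type=token_type,
                      data=data, fingerprint=fingerprint)
        self.all_tokens[token.id] = token
        if fingerprint is not None:
            # Index the first token seen for each fingerprint.
            self.tokens_by_fingerprint.setdefault(fingerprint, token)
        return token
```

Indexing tokens by fingerprint turns the duplicate check into a single dictionary lookup. Note that the toy fingerprints here are supplied by the caller rather than computed.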

To deduplicate a token during the tokenization request, we pass the `deduplicate_token` flag to the create token request.
This will override the tenant-level deduplicate tokens setting:
To deduplicate a token during tokenization, pass the `deduplicate_token` flag to the [create token request](/docs/api/tokens#create-token).
This overrides the tenant-level `deduplicate_tokens` setting for this request:

```json showLineNumbers
{
@@ -252,7 +257,7 @@ This will override the tenant-level deduplicate tokens setting:
}
```
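The precedence between the per-request flag and the tenant setting can be modeled as below. This is a hypothetical Python helper, not an SDK call. `None` stands for a request that omits `deduplicate_token`, and the sketch assumes the override applies in both directions: an explicit `false` disables deduplication even when the tenant setting is on.

```python
from typing import Optional


def should_deduplicate(request_flag: Optional[bool], tenant_deduplicate_tokens: bool) -> bool:
    """Resolve whether a create-token request is deduplicated (hypothetical).

    request_flag mirrors the optional `deduplicate_token` request field
    (None means omitted); tenant_deduplicate_tokens is the tenant-level
    `deduplicate_tokens` setting, already parsed to a bool.
    """
    if request_flag is not None:
        # ASSUMPTION: an explicit per-request value wins in either direction.
        return request_flag
    return tenant_deduplicate_tokens
```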

In this scenario, if we detect an existing token with the same `fingerprint`, the existing token is returned instead of
If an existing token is found with the same `fingerprint`, the existing token is returned instead of
creating a new token. When an existing token is matched, its data and metadata will only be returned within the response
if the requester has `token:read` permission to the matched token. If the requesting Application does not have read
permission, then the `data`, `metadata`, and other potentially sensitive attributes will be redacted to prevent
26 changes: 22 additions & 4 deletions docs/expressions/fingerprints.mdx
@@ -2,15 +2,33 @@
title: Fingerprints
---

import {Alert} from "../../src/components/shared/Alert";

# Fingerprints

A fingerprint can be generated at the time a token is created, which can be used to uniquely identify the contents of a token for its type. You will get different fingerprints for the same content
across different token types, but the same fingerprints for the same content across the same token types. Fingerprints are cryptographically secure and cannot be reversed to recover the original token's data, so they are safe to store in your application and used to compare tokens without retrieving plaintext token data (e.g. for token de-duplication).
A fingerprint can be generated at the time a token is created and used to uniquely identify the contents of that token.
Fingerprints are cryptographically secure and cannot be reversed to recover the original token's data,
so they are safe to store in your application and can be used to compare tokens without retrieving plaintext token data (e.g. for token [deduplication](/docs/concepts/what-are-tokens#deduplication)).
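Because fingerprints can be compared without plaintext, an application that stores them alongside token IDs can find tokens that share the same underlying data. This Python sketch groups stored `(token_id, fingerprint)` pairs. The record shape and the `group_by_fingerprint` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_fingerprint(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group token IDs that share a fingerprint (hypothetical helper).

    records: (token_id, fingerprint) pairs your application stored when the
    tokens were created; no plaintext token data is needed.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for token_id, fingerprint in records:
        groups[fingerprint].append(token_id)
    # Keep only fingerprints shared by more than one token.
    return {fp: ids for fp, ids in groups.items() if len(ids) > 1}
```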

Fingerprinting can either be globally enabled via the `Fingerprint All Tokens` [tenant setting](/docs/api/tenants#tenant-settings-object),
or controlled on a per-token basis by disabling this setting. For new tenants, this setting defaults to `false`,
and its value can be modified either through the [Portal](https://portal.basistheory.com/settings)
or through the tenant [management API](/docs/api/tenants#update-tenant). Note that if this setting is enabled, there is no way
to disable fingerprinting on a per-token basis.

When [creating a token](/docs/api/tokens#create-token), fingerprint expression can be specified within the request.
You are able to reference the `data` and `metadata` variable within an [object](/docs/expressions#objects) expression -
<Alert>
Since fingerprinting performs a CPU-intensive cryptographic operation, creating tokens with fingerprints will increase
the latency of each request. For this reason, unless you need every new token to be fingerprinted, we recommend leaving the
<code>Fingerprint All Tokens</code> setting disabled and enabling fingerprinting only on the tokens that need it.
</Alert>

When specifying a [fingerprint expression](/docs/expressions/fingerprints) during token creation, you may provide an expression
that references the `data` and `metadata` variables within an [object](/docs/expressions#objects) expression;
`data` and `metadata` will be bound to the provided token data and metadata, respectively.
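To illustrate how `data` and `metadata` are bound, here is a toy renderer that supports only dotted lookups such as `{{ data.email }}` and a `stringify` filter. It is not the real expression engine. The filter's output format (compact JSON with sorted keys) and the helper names are assumptions made for this sketch. In this model, the rendered string is the value that would then be hashed into a fingerprint.

```python
import json
import re
from typing import Any, Dict, Optional

# Matches {{ path }} or {{ path | filter }}; a tiny subset, not the real syntax.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*(?:\|\s*(\w+)\s*)?\}\}")


def _lookup(scope: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path like 'data.email'; missing keys yield None."""
    value: Any = scope
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def render_toy_expression(expression: str, data: Any,
                          metadata: Optional[Dict[str, str]] = None) -> str:
    # Bind the two variables the docs describe.
    scope = {"data": data, "metadata": metadata or {}}

    def replace(match: "re.Match[str]") -> str:
        value = _lookup(scope, match.group(1))
        filter_name = match.group(2)
        if filter_name == "stringify":
            # ASSUMPTION: this sketch's stringify emits compact, key-sorted JSON.
            return json.dumps(value, sort_keys=True, separators=(",", ":"))
        if filter_name is not None:
            raise ValueError(f"unsupported filter: {filter_name}")
        return "" if value is None else str(value)

    return _PLACEHOLDER.sub(replace, expression)
```

For example, with `data = {"name": "John Doe", "email": "johndoe@basistheory.com"}`, the expression `{{ data.email }}` renders to `johndoe@basistheory.com`, so two tokens with different names but the same email would share a fingerprint.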

Token fingerprints are unique per tenant and [token type](/docs/api/tokens/token-types), so you will get different fingerprints when tokenizing the same data
across different token types, or across different tenants.

## Examples

### Fingerprinting Primitive Tokens
1 comment on commit b5edb07

@vercel vercel bot commented on b5edb07 Feb 7, 2023