# FortKnox Data Protection Service (FK)

FK provides the following core systems:

- Central library which performs tokenization / redemption
- Self-hosted HTTP+JSON endpoint for tokenization / redemption
- (Optional) SQL proxy mode.
  When a SQL statement contains `TOKENIZE(…)` or `REDEEM(…)`, the application strips out the wrapped values, performs the exchange on the server, and passes the statement upstream to the SQL datastore with only the replaced values, so only tokenized data travels to and from the SQL datastore.
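
For the `TOKENIZE` half of proxy mode, a minimal sketch of the statement rewrite is shown below. It assumes the argument is always a single-quoted SQL string literal (with `''` as an escaped quote), matches the keyword case-sensitively, and ignores `TOKENIZE(` appearing inside other literals or comments; a real proxy would more likely use a SQL parser. `rewrite_tokenize` is a hypothetical name, and the `tokenize` closure stands in for the central library. `REDEEM` acts on result rows coming back from the datastore and is not covered here.

```rust
/// Replaces every `TOKENIZE('<literal>')` in `sql` with a quoted token produced by
/// `tokenize`, so the plaintext literal never reaches the upstream datastore.
/// Hypothetical helper; assumes single-quoted literals and exact keyword spelling.
pub fn rewrite_tokenize<F>(sql: &str, mut tokenize: F) -> Result<String, String>
where
    F: FnMut(&str) -> String,
{
    const OPEN: &str = "TOKENIZE('";
    let mut out = String::with_capacity(sql.len());
    let mut rest = sql;
    while let Some(start) = rest.find(OPEN) {
        // Copy everything before the call through unchanged.
        out.push_str(&rest[..start]);
        let body = &rest[start + OPEN.len()..];

        // Read the literal, turning the SQL escape '' back into a single quote.
        let mut value = String::new();
        let mut end = None;
        let mut chars = body.char_indices().peekable();
        while let Some((i, c)) = chars.next() {
            if c != '\'' {
                value.push(c);
            } else if let Some(&(_, '\'')) = chars.peek() {
                value.push('\'');
                chars.next();
            } else {
                end = Some(i + 1); // byte offset just past the closing quote
                break;
            }
        }
        let end = end.ok_or("unterminated string literal in TOKENIZE(...)")?;
        rest = body[end..]
            .strip_prefix(')')
            .ok_or("expected ')' after the TOKENIZE argument")?;

        // Tokens use the URL-safe base64 alphabet, so they contain no quotes to escape.
        out.push('\'');
        out.push_str(&tokenize(&value));
        out.push('\'');
    }
    out.push_str(rest);
    Ok(out)
}
```

For example, `INSERT INTO cards (pan) VALUES (TOKENIZE('4111-1111'))` would reach the datastore as `INSERT INTO cards (pan) VALUES ('<token>')`.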

## Requirements

- Rust 1.74+
- git

## Token Specification

### Token Format

Tokens are 128-bit (16-byte) UUIDs, always encoded as base64 using the URL-safe alphabet without padding.
Because each base64 character carries 6 bits, 128 bits need ⌈128 / 6⌉ = 22 characters, so every token is a 22-character string drawn from the alphabet `[A-Za-z0-9_-]`.
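
A dependency-free sketch of the encoding follows; the function names are hypothetical, and in practice a base64 crate would likely be used instead. Each 3-byte group becomes four characters, and the final lone byte becomes two, giving 5 × 4 + 2 = 22 characters for 16 bytes.

```rust
/// URL-safe base64 alphabet from RFC 4648 section 5: A-Z, a-z, 0-9, '-', '_'.
const URL_SAFE: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encodes a 16-byte token as unpadded URL-safe base64 (always 22 characters).
pub fn encode_token(bytes: &[u8; 16]) -> String {
    let mut out = String::with_capacity(22);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32, most significant first.
        let mut n: u32 = 0;
        for (i, b) in chunk.iter().enumerate() {
            n |= (*b as u32) << (16 - 8 * i);
        }
        // A 3-byte chunk yields 4 characters, a 1-byte tail yields 2 (no '=' padding).
        for j in 0..chunk.len() + 1 {
            let idx = (n >> (18 - 6 * j)) & 0x3F;
            out.push(URL_SAFE[idx as usize] as char);
        }
    }
    out
}

/// Cheap shape check: exactly 22 characters, all from the URL-safe alphabet.
pub fn looks_like_token(s: &str) -> bool {
    s.len() == 22 && s.bytes().all(|c| c.is_ascii_alphanumeric() || c == b'-' || c == b'_')
}
```

Since only the top two bits of the last character carry data, a stricter validator could additionally require that character to be one of `A`, `Q`, `g`, or `w`.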

### Namespacing

Tokens MUST always be generated within a namespace.
If a namespace is not provided, the request is rejected.
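
A minimal sketch of that rejection rule, using a hypothetical request type (the struct, field, and error names are illustrative, not FK's API). Treating a blank namespace the same as a missing one is an assumption.

```rust
/// Hypothetical tokenization request; field names are illustrative only.
pub struct TokenizeRequest {
    pub namespace: Option<String>,
    pub value: String,
}

#[derive(Debug, PartialEq)]
pub enum RequestError {
    MissingNamespace,
}

/// Returns the namespace, or rejects the request when it is absent or blank.
pub fn require_namespace(req: &TokenizeRequest) -> Result<&str, RequestError> {
    match req.namespace.as_deref().map(str::trim) {
        Some(ns) if !ns.is_empty() => Ok(ns),
        _ => Err(RequestError::MissingNamespace),
    }
}
```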

### Prefixing

A prefix is an unsigned 16-bit value (hex: `0x0000`-`0xFFFF`) that uniquely identifies a token source.
A 16-bit value can represent 65,536 distinct prefixes (0-65535 inclusive), so the largest prefix is 65,535.
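
Mapping the prefix onto Rust's `u16` makes the range check automatic. The sketch below assumes a hypothetical configuration format in which the prefix is written as hex, with or without a `0x` prefix; out-of-range input such as `0x10000` fails to parse rather than wrapping.

```rust
use std::num::ParseIntError;

/// Number of distinct prefixes: u16::MAX (65,535) plus one for zero.
pub const PREFIX_COUNT: u32 = u16::MAX as u32 + 1;

/// Parses a prefix like "0x1A2B" or "1a2b" (hypothetical config syntax).
pub fn parse_prefix(raw: &str) -> Result<u16, ParseIntError> {
    let hex = raw
        .strip_prefix("0x")
        .or_else(|| raw.strip_prefix("0X"))
        .unwrap_or(raw);
    u16::from_str_radix(hex, 16)
}
```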

Prefixes MAY be set via a runtime configuration or defined in the datastore within a namespace. 
Once defined in the datastore, such prefixes MUST NOT be changed.
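
The "MUST NOT be changed" rule amounts to a write-once mapping from namespace to prefix. The sketch uses a hypothetical in-memory `PrefixRegistry` as a stand-in for the datastore table; a real datastore would presumably enforce the same check with a constraint or inside a transaction.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum PrefixError {
    /// The namespace already has a different prefix; stored prefixes are immutable.
    AlreadyDefined { existing: u16 },
}

/// Hypothetical stand-in for the datastore's namespace -> prefix table.
#[derive(Default)]
pub struct PrefixRegistry {
    prefixes: HashMap<String, u16>,
}

impl PrefixRegistry {
    /// Stores a prefix for a namespace. Re-defining the same value is accepted
    /// (idempotent); any attempt to change it is rejected.
    pub fn define(&mut self, namespace: &str, prefix: u16) -> Result<(), PrefixError> {
        if let Some(&existing) = self.prefixes.get(namespace) {
            return if existing == prefix {
                Ok(())
            } else {
                Err(PrefixError::AlreadyDefined { existing })
            };
        }
        self.prefixes.insert(namespace.to_owned(), prefix);
        Ok(())
    }

    pub fn get(&self, namespace: &str) -> Option<u16> {
        self.prefixes.get(namespace).copied()
    }
}
```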

## API Specification

Endpoints include:

- `/ping` - liveness check
- `/ready` - readiness check
- `/api/` - API related documentation, including OpenAPI spec
- `/health` - limited internal health data: backend DB type, latency to backend(s), cache usage, prefix (if enabled), signing pubkey (if enabled) 
- `/` - Tokenize or Redeem endpoint; which of the two a given instance serves depends on its deployment.
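
One way to read the endpoint list is as a path-to-handler table, with the meaning of `/` fixed per instance. The sketch below assumes a hypothetical per-deployment `Mode` setting and treats anything under `/api/` as documentation; it only classifies paths and does not use any HTTP framework.

```rust
/// Which operation this instance serves on `/` (hypothetical deployment setting).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Tokenize,
    Redeem,
}

#[derive(Debug, PartialEq)]
pub enum Route {
    Ping,    // liveness
    Ready,   // readiness
    ApiDocs, // documentation and OpenAPI spec
    Health,  // limited internal health data
    Tokenize,
    Redeem,
    NotFound,
}

/// Classifies a request path for an instance deployed in `mode`.
pub fn route(path: &str, mode: Mode) -> Route {
    match path {
        "/ping" => Route::Ping,
        "/ready" => Route::Ready,
        "/health" => Route::Health,
        "/" => match mode {
            Mode::Tokenize => Route::Tokenize,
            Mode::Redeem => Route::Redeem,
        },
        p if p.starts_with("/api/") => Route::ApiDocs,
        _ => Route::NotFound,
    }
}
```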

## Notes

- System should be self-contained / self-hosting.
  Extra "parts" should be separable and/or unnecessary for normal functioning up to a certain limit.
- Must use a SQLite datastore by default.
- Connect to PostgreSQL, Oracle, or other providers via an ODBC connector (?)

## Limitations

If operating with a remote database, FK must not try to operate in a peering / cluster mode.
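
This limitation can be checked once at startup. The sketch assumes a hypothetical configuration with a `Backend` enum (defaulting to SQLite, per the Notes) and a `cluster_mode` flag; it rejects only the remote-plus-cluster combination and makes no claim about clustering with SQLite.

```rust
/// Hypothetical backend selection; SQLite is the default.
#[derive(Debug, Default, Clone, PartialEq)]
pub enum Backend {
    #[default]
    Sqlite,
    /// A networked datastore, identified by an illustrative connection string.
    Remote(String),
}

#[derive(Debug, Default)]
pub struct Config {
    pub backend: Backend,
    pub cluster_mode: bool,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    ClusterWithRemoteDb,
}

/// Refuses to start in peering / cluster mode when the datastore is remote.
pub fn validate(cfg: &Config) -> Result<(), ConfigError> {
    match (&cfg.backend, cfg.cluster_mode) {
        (Backend::Remote(_), true) => Err(ConfigError::ClusterWithRemoteDb),
        _ => Ok(()),
    }
}
```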

### Namespace limits

Non-prefixed UUIDs will follow the UUIDv4-Variant1 specification in [RFC-4122](https://www.rfc-editor.org/rfc/rfc4122#section-4.4):

                0      0 0      1 1      2 2      3
                0      7 8      5 6      3 4      1
                -----------------------------------
     000-031    xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 
     032-063    xxxxxxxx xxxxxxxx 0100xxxx xxxxxxxx
     064-095    10xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
     096-127    xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
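
As a sketch, the fixed bits can be applied to 16 bytes from a cryptographically secure random source (not shown; the function names are hypothetical). Per RFC 4122, the version occupies the high nibble of byte 6 (`0100`) and the variant occupies the two most-significant bits of byte 8 (`10`), leaving 122 random bits. The `uuid` crate's `Uuid::new_v4()` performs the same bit-setting.

```rust
/// Sets the UUIDv4 version and RFC 4122 variant bits on 16 random bytes.
pub fn make_v4(mut bytes: [u8; 16]) -> [u8; 16] {
    bytes[6] = (bytes[6] & 0x0F) | 0x40; // bits 48-51: version 0100 (4)
    bytes[8] = (bytes[8] & 0x3F) | 0x80; // bits 64-65: variant 10
    bytes
}

/// Reads the version nibble (bits 48-51).
pub fn version(uuid: &[u8; 16]) -> u8 {
    uuid[6] >> 4
}

/// True when bits 64-65 hold the RFC 4122 variant `10`.
pub fn is_rfc4122_variant(uuid: &[u8; 16]) -> bool {
    uuid[8] >> 6 == 0b10
}
```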


Prefixed tokens place a 16-bit identifier in the most-significant 16 bits of the UUID (bits 0-15, which are random in a UUIDv4 and form the leading half of `time_low` in the RFC 4122 field layout).
To support this without compatibility problems with other UUID representations, FortKnox will [generate UUIDv8-based tokens](https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-uuid-version-8).

With `P` marking a prefix bit and bits numbered from zero, the bit-level structure is:

                0      0 0      1 1      2 2      3
                0      7 8      5 6      3 4      1
                -----------------------------------
     000-031    PPPPPPPP PPPPPPPP xxxxxxxx xxxxxxxx 
     032-063    xxxxxxxx xxxxxxxx 1000xxxx xxxxxxxx
     064-095    10xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
     096-127    xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
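
A matching sketch for prefixed tokens: the prefix is written big-endian into bytes 0-1 (bits 0-15), the version nibble is set to `1000`, and the variant to `10`, with the other 106 bits taken from the caller's random bytes. The function names are hypothetical.

```rust
/// Builds a prefixed UUIDv8 token from a 16-bit prefix and 16 random bytes.
pub fn make_prefixed_v8(prefix: u16, mut bytes: [u8; 16]) -> [u8; 16] {
    let [hi, lo] = prefix.to_be_bytes();
    bytes[0] = hi; // bits 0-7:  prefix, high byte
    bytes[1] = lo; // bits 8-15: prefix, low byte
    bytes[6] = (bytes[6] & 0x0F) | 0x80; // bits 48-51: version 1000 (8)
    bytes[8] = (bytes[8] & 0x3F) | 0x80; // bits 64-65: variant 10
    bytes
}

/// Recovers the token source prefix from bits 0-15.
pub fn prefix_of(token: &[u8; 16]) -> u16 {
    u16::from_be_bytes([token[0], token[1]])
}
```

Keeping the prefix in the leading bytes means a token's source can be read without any datastore lookup.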


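Reserving 16 prefix bits, 4 version bits, and 2 variant bits leaves 106 of the 128 bits, for a maximum token space of `2^106` (about 8.11 × 10^31) values. At 100 bytes per associated token, filling that space would take about 8.1 × 10^33 bytes, or roughly 8.1 × 10^15 exabytes, per regional namespace.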
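
The capacity arithmetic fits in 128-bit integers, so it can be checked exactly. The names are hypothetical, and exabytes here means 10^18 bytes (decimal).

```rust
/// Free bits in a prefixed UUIDv8 token: 128 minus 16 prefix, 4 version, and 2 variant bits.
pub const FREE_BITS: u32 = 128 - 16 - 4 - 2;

/// Distinct token values available under one prefix (2^106).
pub fn token_space() -> u128 {
    1u128 << FREE_BITS
}

/// Whole exabytes (10^18 bytes) needed to store `bytes_per_token` for every token.
/// Intended for small per-token sizes; large values would overflow u128.
pub fn storage_exabytes(bytes_per_token: u128) -> u128 {
    token_space() * bytes_per_token / 1_000_000_000_000_000_000
}
```

With 100 bytes per token this evaluates to 8,112,963,841,460,668 exabytes (rounded down).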