# Data Stores

## Scope

This standard prescribes the database and data storage technologies to use for a range of related data-retention concerns.
The solutions recommended below are designed to encourage deep expertise in a few stable and well-understood systems, rather than maximal "fit" for each distinct use case.

As such, a recommended solution may not be optimal for a given use case, but its performance, maintenance, optimization, and reliability requirements are understood and supported by the engineering community.

## Terms

- _Database_: provides long-term, durable storage for data whose loss or unavailability would mean violating an application's Availability or Business requirements.
- _Cache_: provides short-term, volatile storage that does not guarantee data is preserved.
  Caches are not in the scope of this standard.

## Capability Matrix

| Capabilities        | RDBMS | KV Store | File/Object |
|---------------------|-------|----------|-------------|
| [Relational]        | X     | O        |             |
| [Key-Value]         | X     | X        | X           |
| [Document-Oriented] | X     | X        | X           |
| [Object-Based]      | X     | X        | X           |

_O_: While KV-stores cannot store relational data, some KV-focused databases provide relational-like "tagging" and other attribute aggregations.
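
For tooling or review checklists, the matrix can be encoded as a lookup table. The following is a minimal sketch rather than part of the standard: the names `CAPABILITY_MATRIX` and `stores_supporting` are hypothetical, and the labels `"full"` and `"partial"` are assumed stand-ins for the _X_ and _O_ cells.

```python
# Capability matrix from this standard, encoded as a lookup table.
# "full" mirrors an X cell, "partial" mirrors the O cell, None mirrors a blank cell.
CAPABILITY_MATRIX = {
    "Relational":        {"RDBMS": "full", "KV Store": "partial", "File/Object": None},
    "Key-Value":         {"RDBMS": "full", "KV Store": "full", "File/Object": "full"},
    "Document-Oriented": {"RDBMS": "full", "KV Store": "full", "File/Object": "full"},
    "Object-Based":      {"RDBMS": "full", "KV Store": "full", "File/Object": "full"},
}


def stores_supporting(capability, include_partial=False):
    """Return the store categories that support a capability, in column order."""
    accepted = {"full", "partial"} if include_partial else {"full"}
    row = CAPABILITY_MATRIX[capability]
    return [store for store, level in row.items() if level in accepted]


print(stores_supporting("Relational"))                        # ['RDBMS']
print(stores_supporting("Relational", include_partial=True))  # ['RDBMS', 'KV Store']
```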

## Selection Criteria

### PostgreSQL

Applications SHOULD use PostgreSQL (Aurora in Cloud environments and the latest stable release in OnPrem environments).

In all environments, PostgreSQL supports all of the storage methods, including:

- Simple [Key-Value] stores (via [hstore])
- [Document-Oriented] storage and queries (via [jsonb])
- [Large-object] storage directly within the database

If the Application runs in the Cloud environment and needs a total data size over [64TB (RDS)][1] or [128TB (Aurora)][2], then it MUST use another approved option.
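
The size rule above can be checked mechanically during capacity planning. This is a sketch only: the ceilings are the figures quoted in this standard, while the function name `must_use_another_option` and the `"rds"` / `"aurora"` keys are hypothetical.

```python
# Size ceilings quoted in this standard, in terabytes.
MAX_TOTAL_SIZE_TB = {"rds": 64, "aurora": 128}


def must_use_another_option(total_size_tb, flavor):
    """True when the projected data size exceeds the ceiling for the given flavor."""
    return total_size_tb > MAX_TOTAL_SIZE_TB[flavor]


# A projected 100 TB data set exceeds the RDS ceiling but not the Aurora one.
print(must_use_another_option(100, "rds"))     # True
print(must_use_another_option(100, "aurora"))  # False
```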

### DynamoDB

If an Application is hosted in AWS and requires many of:

- Flexible, schemaless data model that will change in the future
- Read-heavy access model for items
- Single-digit millisecond response times
- Very fast caching for hot keys and values (< 10ms)
- Multi-region deployments

and does not require any of:

- Strongly consistent reads and writes in all situations
- Item/Row sizes over 400 KB
- A total data size over 1TB
- Joins between different stored values

then it MAY use DynamoDB.
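
The rule above combines a "many of" list with a "none of" list, which can be sketched as a set check. The requirement tags and the function name are hypothetical, and the standard does not define how many items count as "many"; the threshold of three below is an assumption to adjust.

```python
# Hypothetical tags for the two lists in this section.
WANTED = frozenset({
    "schemaless", "read_heavy", "single_ms_latency", "hot_key_caching", "multi_region",
})
DISQUALIFYING = frozenset({
    "strong_consistency", "items_over_400kb", "data_over_1tb", "joins",
})
MANY_THRESHOLD = 3  # assumption: the standard does not quantify "many"


def may_use_dynamodb(hosted_in_aws, needs):
    """Apply the rule: hosted in AWS, many wanted traits, no disqualifying trait."""
    needs = set(needs)
    if not hosted_in_aws or needs & DISQUALIFYING:
        return False
    return len(needs & WANTED) >= MANY_THRESHOLD


print(may_use_dynamodb(True, {"schemaless", "read_heavy", "multi_region"}))
print(may_use_dynamodb(True, {"schemaless", "read_heavy", "multi_region", "joins"}))
```

The first call meets the assumed threshold and has no disqualifying trait; the second is rejected because it requires joins.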

### S3 / File store

If the Application persists data with many of the following characteristics:

- Large files (>100MB each)
- Large volumes of data (>1 million records)
- Simple identification requirements (e.g., "filename as key")
- Simple relational requirements (e.g., folders and files)

and does not require any of the following:

- Low-latency access (< 200ms)
- Relational join operations
- Low response times (< 1000ms)

then it MAY use a filestore or S3.
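
Because several of these criteria are numeric, they can be evaluated directly against a workload profile. The sketch below is illustrative: the `Workload` fields and the function name are hypothetical, and the default of two matching criteria for "many" is an assumption, since the standard does not quantify it.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    typical_file_mb: float     # typical size of one stored file, in MB
    record_count: int          # number of stored records
    filename_is_key: bool      # simple identification, e.g. "filename as key"
    folder_style_layout: bool  # simple folder/file relationships
    needs_joins: bool          # requires relational join operations
    max_latency_ms: float      # slowest acceptable access latency
    max_response_ms: float     # slowest acceptable response time


def may_use_file_store(w, many=2):
    """Apply the rule: no disqualifying requirement, and at least `many` matches."""
    if w.needs_joins or w.max_latency_ms < 200 or w.max_response_ms < 1000:
        return False
    matches = sum([
        w.typical_file_mb > 100,
        w.record_count > 1_000_000,
        w.filename_is_key,
        w.folder_style_layout,
    ])
    return matches >= many


archive = Workload(500, 10, True, True, False, 5000, 60000)
print(may_use_file_store(archive))  # True
```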

[Aurora (PostgreSQL)]: https://aws.amazon.com/rds/aurora/postgresql-features/
[RDS (PostgreSQL)]: https://aws.amazon.com/rds/postgresql/
[DynamoDB]: https://aws.amazon.com/dynamodb/
[S3]: https://aws.amazon.com/s3/
[Relational]: https://en.wikipedia.org/wiki/Relational_database
[Key-Value]: https://en.wikipedia.org/wiki/Key-value_database
[Document-Oriented]: https://en.wikipedia.org/wiki/Document-oriented_database
[Object-Based]: https://en.wikipedia.org/wiki/Object_storage#Cloud_storage
[hstore]: https://www.postgresql.org/docs/13/hstore.html
[jsonb]: https://www.postgresql.org/docs/current/datatype-json.html
[Large-object]: https://www.postgresql.org/docs/13/largeobjects.html
[1]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html
[2]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_Limits.html