Diffstat (limited to 'docs/tiers.md')
-rw-r--r-- docs/tiers.md | 48
1 files changed, 48 insertions, 0 deletions
diff --git a/docs/tiers.md b/docs/tiers.md
new file mode 100644
index 0000000..60cc7b3
--- /dev/null
+++ b/docs/tiers.md
@@ -0,0 +1,48 @@
+# Application Tiers
+
+## Definition
+
+Platforms and services carry different expectations depending on the technologies they use, their support systems, and their customer impact.
+This document groups those expectations into four "tiers", from the most critical (Tier 1) to the least critical (Tier 4).
+
+### Base Requirements
+
+- Teams MUST plan for both course-of-business failures and disaster-level events.
+- Teams MUST assign a tier number for each application (service, platform, or system) they support.
+- Applications MUST meet the availability and resilience targets of their tier.
+
+### Tier 1
+
+Tier 1 applications are **core/critical systems upon which all else is built**.
+Examples include Active Directory, Kubernetes clusters (Dev, Integration, or Production), and Datacenter Firewalls.
+
+Tier 1 applications MUST provide at least **[99.95% availability](https://uptime.is/99.95)**, or less than four and a half hours of downtime in total per year.
+
+### Tier 2
+
+Tier 2 applications are **critical and/or time-sensitive**.
+Such applications could include a customer-facing billing system, an IAM gateway, or a central code-management platform (GitHub).
+
+Tier 2 applications MUST provide at least **[99.9% availability](https://uptime.is/99.9)**, or less than nine hours of downtime in total per year.
+
+### Tier 3
+
+Tier 3 applications are **important and not time-sensitive**.
+These include systems for end-of-month billing and internal (non-customer-impacting) metrics.
+
+Tier 3 applications MUST provide at least **[99% availability](https://uptime.is/99)**, or less than four days of downtime in total per year.
+
+### Tier 4
+
+Tier 4 applications have **low impact when delayed**.
+Tier 4 applications include everything that is not assigned to another tier.
+
+Tier 4 applications MUST provide at least **[97% availability](https://uptime.is/97)**, or less than eleven days of downtime in total per year.
+
+## Availability Calculations
+
+Availability is calculated based on whether a given service is responding *correctly* and is *communicating*.
+
+- If a service endpoint is not reachable, but is otherwise returning positive metrics (or no errors), the system is not Available.
+- If a service endpoint accepts connections, but only returns error messages, the system is not Available.
+- If a service has a combination of unreachability and erroneous responses for a total duration that exceeds the Availability limit of its tier, that service has broken its SLA guarantee.
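
The tier targets and availability rules in this file can be sketched as a small calculation. This is only an illustration, not part of the committed document: the function names, the `TIER_AVAILABILITY_PERCENT` table, and the choice of a 365-day year are assumptions made here. Under those assumptions, 99.95% allows about 4.38 hours of downtime per year, 99.9% about 8.76 hours, 99% about 3.65 days, and 97% about 10.95 days.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year

# Availability targets per tier, taken from the tier definitions above.
TIER_AVAILABILITY_PERCENT = {1: 99.95, 2: 99.9, 3: 99.0, 4: 97.0}


def downtime_budget_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


def is_available(reachable: bool, responding_correctly: bool) -> bool:
    """A service counts as Available only if it is both reachable and correct."""
    return reachable and responding_correctly


def breaks_sla(tier: int, unreachable_hours: float, erroring_hours: float) -> bool:
    """True if combined unreachable and erroring time exceeds the tier's budget."""
    budget = downtime_budget_hours(TIER_AVAILABILITY_PERCENT[tier])
    return unreachable_hours + erroring_hours > budget


if __name__ == "__main__":
    for tier, percent in TIER_AVAILABILITY_PERCENT.items():
        hours = downtime_budget_hours(percent)
        print(f"Tier {tier}: {percent}% -> {hours:.2f} h/year ({hours / 24:.2f} days)")
```

Summing unreachable and erroring time follows the third rule above, which treats both kinds of failure as counting against the same yearly allowance.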