path: root/docs/tiers.md
Diffstat (limited to 'docs/tiers.md')
-rw-r--r-- docs/tiers.md 64
1 files changed, 43 insertions, 21 deletions
diff --git a/docs/tiers.md b/docs/tiers.md
index 60cc7b3..14af482 100644
--- a/docs/tiers.md
+++ b/docs/tiers.md
@@ -2,47 +2,69 @@
## Definition
-Platforms and services can have different expectations depending on the technologies used, its support systems, and customer-impact.
-This document defines those expectations into four "tiers" from the most-critical (Tier 1) to the least-critical (Tier 4).
+Platforms and services can have different expectations depending on the
+technologies used, their support systems, and customer impact. This
+document groups those expectations into four "tiers" from the
+most-critical (Tier 1) to the least-critical (Tier 4).
### Base Requirements
-- Teams MUST plan for both course-of-business failures and disaster-level events.
-- Teams MUST assign a tier number for each application (service, platform, or system) they support.
-- Applications MUST meet the availability and resilience targets of their tier.
+- Teams MUST plan for both course-of-business failures and
+ disaster-level events.
+- Teams MUST assign a tier number for each application (service,
+ platform, or system) they support.
+- Applications MUST meet the availability and resilience targets of
+ their tier.
### Tier 1
-Tier 1 applications are **core/critical systems upon which all else is built**.
-Examples include Active Directory, Kubernetes clusters (Dev, Integration, or Production), and Datacenter Firewalls.
+Tier 1 applications are **core/critical systems upon which all else is
+built**. Examples include Active Directory, Kubernetes clusters (Dev,
+Integration, or Production), and Datacenter Firewalls.
-Tier 1 applications MUST provide at least **[%99.95 availability](https://uptime.is/99.95)** or less than four hours of downtime n-total per year.
+Tier 1 applications MUST provide at least **[99.95%
+availability](https://uptime.is/99.95)** or less than four hours of
+downtime in total per year.
### Tier 2
-Tier 2 applications are **critical and/or time-sensitive**.
-Such applications could include a customer-facing billing system, an IAM gateway, or a central code-management platform (Github).
+Tier 2 applications are **critical and/or time-sensitive**. Such
+applications could include a customer-facing billing system, an IAM
+gateway, or a central code-management platform (GitHub).
-Tier 2 applications MUST provide at least **[%99.9 availability](https://uptime.is/99.9)** or less than nine hours of downtime in-total per year.
+Tier 2 applications MUST provide at least **[99.9%
+availability](https://uptime.is/99.9)** or less than nine hours of
+downtime in total per year.
### Tier 3
-Tier 3 applications are **important and not time-sensitive**.
-These include systems for end-of-month billing, internal (non-customer-impacting) metrics,
+Tier 3 applications are **important and not time-sensitive**. These
+include systems for end-of-month billing and internal
+(non-customer-impacting) metrics.
-Tier 3 applications MUST provide at least **[%99 availability](https://uptime.is/99)** or less than four days of downtime in-total per year.
+Tier 3 applications MUST provide at least **[99%
+availability](https://uptime.is/99)** or less than four days of downtime
+in total per year.
### Tier 4
-Tier 4 applications have **low impact when delayed**.
-Tier 4 applications include everything which is not assigned to other tiers.
+Tier 4 applications have **low impact when delayed**. Tier 4
+applications include everything which is not assigned to other tiers.
-Tier 4 applications MUST provide at least **[%97 availability](https://uptime.is/97)** or less than ten days of downtime in-total per year.
+Tier 4 applications MUST provide at least **[97%
+availability](https://uptime.is/97)** or less than ten days of downtime
+in total per year.
## Availability Calculations
-Availability is calculated based on whether a given service is responding *correctly* and is *communicating*.
+Availability is calculated based on whether a given service is
+responding *correctly* and is *communicating*.
+
+- If a service endpoint is not reachable, but is otherwise returning
+ positive metrics (or no errors), the system is not Available.
+- If a service endpoint can be connected to, but only returns error
+ messages, the system is not Available.
+- If a service has a combination of unreachability and erroneous
+ responses for a duration which exceeds the Availability limit, that
+ service has broken its SLA guarantee.
-- If a service endpoint is not reachable, but is otherwise returning positive metrics (or no errors), the system is not Available.
-- If a service endpoint can be connected, but only returns error messages, the system is not Available.
-- If a service has a combination of unreachability and erroneous responses for a duration which exceeds the Availability limit, that service has broken its SLA guarantee.i
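
The tier targets and the availability rules in this change can be sketched in a few lines of Python. This is only an illustration, not part of the patch: the `Probe` type, the function names, and the 365-day year are assumptions made here, and the document does not say how health samples are actually collected. Under these assumptions the 99.95% target works out to about 4.38 hours of downtime per year and the 97% target to about 10.95 days, so the rounded figures in the tier descriptions are approximate.

```python
from dataclasses import dataclass
from typing import Sequence

HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year

# Availability targets (percent) per tier, taken from the tier definitions.
TIER_TARGETS = {1: 99.95, 2: 99.9, 3: 99.0, 4: 97.0}


def downtime_budget_hours(target_percent: float) -> float:
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (100.0 - target_percent) / 100.0


@dataclass(frozen=True)
class Probe:
    """One hypothetical health-check sample (not defined by the document)."""
    reachable: bool  # could the endpoint be connected to?
    correct: bool    # did it respond correctly rather than with errors?


def is_available(probe: Probe) -> bool:
    # Available only when the service is both communicating and correct.
    return probe.reachable and probe.correct


def availability_percent(probes: Sequence[Probe]) -> float:
    """Share of samples in which the service counted as Available."""
    if not probes:
        raise ValueError("at least one probe is required")
    good = sum(1 for p in probes if is_available(p))
    return 100.0 * good / len(probes)


def meets_tier(probes: Sequence[Probe], tier: int) -> bool:
    """True when measured availability reaches the tier's target."""
    return availability_percent(probes) >= TIER_TARGETS[tier]
```

For example, `meets_tier(samples, 2)` would report whether a list of evenly spaced samples satisfies the Tier 2 target. Counting samples rather than wall-clock time is a simplification that only holds when probes are taken at a fixed interval.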