Diffstat (limited to 'docs/tiers.md')
-rw-r--r-- | docs/tiers.md | 48 |
1 files changed, 48 insertions, 0 deletions
diff --git a/docs/tiers.md b/docs/tiers.md
new file mode 100644
index 0000000..60cc7b3
--- /dev/null
+++ b/docs/tiers.md
@@ -0,0 +1,48 @@
+# Application Tiers
+
+## Definition
+
+Platforms and services can have different expectations depending on the technologies used, their support systems, and their customer impact.
+This document groups those expectations into four "tiers", from the most critical (Tier 1) to the least critical (Tier 4).
+
+### Base Requirements
+
+- Teams MUST plan for both course-of-business failures and disaster-level events.
+- Teams MUST assign a tier number to each application (service, platform, or system) they support.
+- Applications MUST meet the availability and resilience targets of their tier.
+
+### Tier 1
+
+Tier 1 applications are **core/critical systems upon which all else is built**.
+Examples include Active Directory, Kubernetes clusters (Dev, Integration, or Production), and Datacenter Firewalls.
+
+Tier 1 applications MUST provide at least **[99.95% availability](https://uptime.is/99.95)**, which allows roughly 4.4 hours of downtime in total per year.
+
+### Tier 2
+
+Tier 2 applications are **critical and/or time-sensitive**.
+Such applications could include a customer-facing billing system, an IAM gateway, or a central code-management platform (GitHub).
+
+Tier 2 applications MUST provide at least **[99.9% availability](https://uptime.is/99.9)**, which allows roughly 8.8 hours of downtime in total per year.
+
+### Tier 3
+
+Tier 3 applications are **important and not time-sensitive**.
+These include systems for end-of-month billing and internal (non-customer-impacting) metrics.
+
+Tier 3 applications MUST provide at least **[99% availability](https://uptime.is/99)**, which allows roughly 3.7 days of downtime in total per year.
+
+### Tier 4
+
+Tier 4 applications have **low impact when delayed**.
+Tier 4 applications include everything that is not assigned to another tier.
+
+Tier 4 applications MUST provide at least **[97% availability](https://uptime.is/97)**, which allows roughly 11 days of downtime in total per year.
+
+## Availability Calculations
+
+Availability is calculated based on whether a given service is responding *correctly* and is *communicating*.
+
+- If a service endpoint is not reachable, but is otherwise returning positive metrics (or no errors), the system is not Available.
+- If a service endpoint can be reached, but only returns error messages, the system is not Available.
+- If the combined duration of a service's unreachability and erroneous responses exceeds the downtime allowed by its tier's availability target, that service has broken its SLA guarantee.
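
The per-tier downtime allowances and the availability rules in this file can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the change itself. It assumes a 365-day year, and the names `TIER_TARGETS`, `Interval`, `downtime_budget_hours`, and `meets_sla` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap, 365-day year

# Availability targets (percent) per tier, copied from the tier definitions.
TIER_TARGETS = {1: 99.95, 2: 99.9, 3: 99.0, 4: 97.0}


def downtime_budget_hours(tier: int) -> float:
    """Hours of downtime per year permitted by the tier's availability target."""
    return HOURS_PER_YEAR * (100.0 - TIER_TARGETS[tier]) / 100.0


@dataclass
class Interval:
    """A span of observed time (hypothetical monitoring record)."""
    hours: float
    reachable: bool  # the endpoint could be contacted
    correct: bool    # the endpoint returned non-error responses


def is_available(interval: Interval) -> bool:
    # A service counts as Available only when it is both
    # communicating (reachable) and responding correctly.
    return interval.reachable and interval.correct


def downtime_hours(intervals: list[Interval]) -> float:
    """Total hours in which the service was unreachable or erroring."""
    return sum(i.hours for i in intervals if not is_available(i))


def meets_sla(tier: int, intervals: list[Interval]) -> bool:
    """True if the combined downtime stays within the tier's yearly budget."""
    return downtime_hours(intervals) <= downtime_budget_hours(tier)


if __name__ == "__main__":
    for tier in sorted(TIER_TARGETS):
        print(f"Tier {tier}: {downtime_budget_hours(tier):.2f} h/year")

    # One year of observations: 3 h unreachable plus 2 h of error responses.
    year = [
        Interval(hours=8755, reachable=True, correct=True),
        Interval(hours=3, reachable=False, correct=True),
        Interval(hours=2, reachable=True, correct=False),
    ]
    print("Tier 1 met:", meets_sla(1, year))
    print("Tier 2 met:", meets_sla(2, year))
```

In this sketch, unreachable time and erroneous-response time are summed into a single downtime figure, mirroring the third rule under "Availability Calculations". A real implementation would derive the intervals from monitoring data and might use a rolling window rather than a calendar year.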