# Application Tiers

## Definition

Platforms and services can have different expectations depending on the technologies they use, their support systems, and their customer impact. This document groups those expectations into four "tiers", from the most critical (Tier 1) to the least critical (Tier 4).

### Base Requirements

- Teams MUST plan for both course-of-business failures and disaster-level events.
- Teams MUST assign a tier number to each application (service, platform, or system) they support.
- Applications MUST meet the availability and resilience targets of their tier.

### Tier 1

Tier 1 applications are **core/critical systems upon which all else is built**. Examples include Active Directory, Kubernetes clusters (Dev, Integration, or Production), and datacenter firewalls.

Tier 1 applications MUST provide at least **[99.95% availability](https://uptime.is/99.95)**, which allows no more than about 4.4 hours of downtime in total per year.

### Tier 2

Tier 2 applications are **critical and/or time-sensitive**. Examples include a customer-facing billing system, an IAM gateway, or a central code-management platform (such as GitHub).

Tier 2 applications MUST provide at least **[99.9% availability](https://uptime.is/99.9)**, which allows less than nine hours (about 8.8 hours) of downtime in total per year.

### Tier 3

Tier 3 applications are **important but not time-sensitive**. Examples include end-of-month billing systems and internal (non-customer-impacting) metrics.

Tier 3 applications MUST provide at least **[99% availability](https://uptime.is/99)**, which allows less than four days (about 3.7 days) of downtime in total per year.

### Tier 4

Tier 4 applications have **low impact when delayed**. Tier 4 includes every application that is not assigned to another tier.

Tier 4 applications MUST provide at least **[97% availability](https://uptime.is/97)**, which allows no more than about 11 days of downtime in total per year.
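Each tier's downtime allowance follows directly from its availability target: allowed downtime = (1 − availability) × period. The sketch below computes that allowance for each tier. It assumes a 365-day year; uptime.is may use a slightly longer average year, so its figures can differ by a few minutes. The names `TIER_TARGETS` and `allowed_downtime` are illustrative, not part of any existing tool.

```python
from datetime import timedelta

# Availability targets per tier, taken from the tier definitions above.
TIER_TARGETS = {1: 0.9995, 2: 0.999, 3: 0.99, 4: 0.97}

# Assumption: a 365-day year (no leap-year adjustment).
YEAR = timedelta(days=365)


def allowed_downtime(tier: int, period: timedelta = YEAR) -> timedelta:
    """Return the maximum downtime a tier may accumulate over `period`."""
    return period * (1 - TIER_TARGETS[tier])


if __name__ == "__main__":
    for tier in sorted(TIER_TARGETS):
        print(f"Tier {tier}: {allowed_downtime(tier)}")
```

The same function also handles shorter windows, such as `allowed_downtime(1, timedelta(days=30))` for a monthly budget, assuming the team chooses to track the allowance per month rather than per year.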
## Availability Calculations

Availability is determined by whether a given service is both *communicating* (reachable) and responding *correctly*.

- If a service endpoint is not reachable, the system is not Available, even if it is otherwise reporting healthy metrics (or no errors).
- If a service endpoint accepts connections but only returns error messages, the system is not Available.
- If a service's combined unreachability and erroneous responses exceed the downtime allowed by its tier, that service has broken its SLA guarantee.
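The rules above treat a sample as available only when the endpoint is reachable *and* its response is correct. This section does not specify how availability is measured. The sketch below therefore assumes fixed-interval health probes, where each sample represents an equal slice of time. It computes availability and checks it against a tier target. `ProbeResult`, `measured_availability`, and `breaches_sla` are hypothetical names. Comparing a short window against an annual target is also an approximation, because the SLA is defined per year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One health-check sample taken at a fixed interval (hypothetical schema)."""
    reachable: bool  # the endpoint accepted a connection
    correct: bool    # the response was a correct, non-error reply


def is_available(probe: ProbeResult) -> bool:
    # Both conditions are required: an unreachable endpoint counts as down
    # even if its metrics look healthy, and so does an error-only response.
    return probe.reachable and probe.correct


def measured_availability(probes: list[ProbeResult]) -> float:
    """Fraction of samples in which the service was available."""
    if not probes:
        raise ValueError("no probe results to evaluate")
    return sum(is_available(p) for p in probes) / len(probes)


def breaches_sla(probes: list[ProbeResult], target: float) -> bool:
    """True when combined unreachability and errors exceed the tier's allowance."""
    return measured_availability(probes) < target


# Example: 10,000 samples; 3 unreachable (metrics looked fine), 4 error-only.
probes = (
    [ProbeResult(reachable=False, correct=True)] * 3
    + [ProbeResult(reachable=True, correct=False)] * 4
    + [ProbeResult(reachable=True, correct=True)] * 9993
)
print(measured_availability(probes))  # 0.9993
print(breaches_sla(probes, 0.9995))   # True: misses the Tier 1 target
print(breaches_sla(probes, 0.999))    # False: meets the Tier 2 target
```

Downtime from unreachability and downtime from errors are pooled into one count. This follows the third rule, which judges the combined duration rather than each failure mode separately.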