# Application Tiers

## Definition

Platforms and services carry different expectations depending on the technologies they use, their supporting systems, and their customer impact.
This document groups those expectations into four "tiers", from the most critical (Tier 1) to the least critical (Tier 4).

### Base Requirements

- Teams MUST plan for both course-of-business failures and disaster-level events.
- Teams MUST assign a tier number for each application (service, platform, or system) they support.
- Applications MUST meet the availability and resilience targets of their tier.

### Tier 1

Tier 1 applications are **core/critical systems upon which all else is built**.
Examples include Active Directory, Kubernetes clusters (Dev, Integration, or Production), and Datacenter Firewalls.

Tier 1 applications MUST provide at least **[99.95% availability](https://uptime.is/99.95)**, that is, no more than roughly 4.4 hours of downtime in total per year.

### Tier 2

Tier 2 applications are **critical and/or time-sensitive**.
Such applications could include a customer-facing billing system, an IAM gateway, or a central code-management platform (such as GitHub).

Tier 2 applications MUST provide at least **[99.9% availability](https://uptime.is/99.9)**, that is, less than nine hours of downtime in total per year.

### Tier 3

Tier 3 applications are **important and not time-sensitive**.
These include systems for end-of-month billing and internal (non-customer-impacting) metrics.

Tier 3 applications MUST provide at least **[99% availability](https://uptime.is/99)**, that is, less than four days of downtime in total per year.

### Tier 4

Tier 4 applications have **low impact when delayed**.
Tier 4 applications include everything that is not assigned to another tier.

Tier 4 applications MUST provide at least **[97% availability](https://uptime.is/97)**, that is, less than eleven days of downtime in total per year.
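
The downtime figure quoted for each tier follows directly from its availability percentage. The following is a minimal sketch of that arithmetic, assuming a 365-day (8,760-hour) year; the names `TIER_TARGETS` and `annual_downtime_hours` are illustrative and not part of any existing tooling.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a non-leap year

# Availability targets (percent) per tier, taken from the tier definitions.
TIER_TARGETS = {1: 99.95, 2: 99.9, 3: 99.0, 4: 97.0}


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year that a target still permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for tier, target in TIER_TARGETS.items():
    hours = annual_downtime_hours(target)
    print(f"Tier {tier}: {target}% allows {hours:.2f} h/year ({hours / 24:.2f} days)")
```

Under these assumptions the budgets come out to about 4.4 hours (Tier 1), 8.8 hours (Tier 2), 3.7 days (Tier 3), and 11 days (Tier 4) per year.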

## Availability Calculations

Availability is calculated based on whether a given service is both *communicating* (reachable) and responding *correctly*.

- If a service endpoint is not reachable, but is otherwise reporting positive metrics (or no errors), the system is not Available.
- If a service endpoint accepts connections, but only returns error messages, the system is not Available.
- If a service's combined unreachability and erroneous responses last longer than the downtime allowed by its tier's Availability target, that service has broken its SLA guarantee.
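
The rules above can be sketched as a small check over monitoring samples. This is only an illustration under stated assumptions: probes are taken at evenly spaced intervals, so the share of good probes approximates the share of time the service was available, and the `Probe` type and function names are hypothetical rather than taken from any existing monitoring system.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One monitoring sample (hypothetical shape, for illustration only)."""
    reachable: bool  # the endpoint could be connected to
    correct: bool    # the response was not an error


def is_available(probe: Probe) -> bool:
    # A sample counts as available only if the service is both
    # communicating and responding correctly.
    return probe.reachable and probe.correct


def availability_percent(probes: list[Probe]) -> float:
    """Share of probes that were available, assuming evenly spaced probes."""
    if not probes:
        raise ValueError("at least one probe is required")
    good = sum(1 for p in probes if is_available(p))
    return 100 * good / len(probes)


def meets_target(probes: list[Probe], target_percent: float) -> bool:
    """True if the measured availability satisfies the tier's target."""
    return availability_percent(probes) >= target_percent


# Example: 1000 probes, one unreachable and one returning an error.
samples = [Probe(True, True)] * 998 + [Probe(False, True), Probe(True, False)]
print(f"measured availability: {availability_percent(samples):.2f}%")
print("meets Tier 2 (99.9%):", meets_target(samples, 99.9))
print("meets Tier 3 (99%):", meets_target(samples, 99.0))
```

Note that an unreachable probe counts against availability even when its `correct` flag is set, which mirrors the first rule: positive metrics do not make an unreachable service Available.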