Diffstat (limited to 'docs')
 -rw-r--r--  docs/alerting.md             |  15
 -rw-r--r--  docs/core.md                 |  25
 -rw-r--r--  docs/datastores.md           |  33
 -rw-r--r--  docs/git-recommendations.md  |  25
 -rw-r--r--  docs/languages/golang.md     |  31
 -rw-r--r--  docs/languages/java.md       |   3
 -rw-r--r--  docs/languages/nodejs.md     |  15
 -rw-r--r--  docs/monitoring.md           | 143
 -rw-r--r--  docs/observability.md        |  30
 -rw-r--r--  docs/tiers.md                |  64
10 files changed, 261 insertions, 123 deletions
diff --git a/docs/alerting.md b/docs/alerting.md
index 8f8573d..de8d6a9 100644
--- a/docs/alerting.md
+++ b/docs/alerting.md
@@ -1,12 +1,13 @@
# Alerting
-Alerts are signals from [Monitors][1] to perform actions.
-Alerts MUST be _meaningful_ and _actionable_.
+Alerts are signals from [Monitors][1] to perform actions. Alerts MUST
+be _meaningful_ and _actionable_.
-**Meaningful**: Alert only on montiors which indicate a problem.
-(See [Monitoring][1] subsection "Saturation".)
+**Meaningful**: Alert only on monitors which indicate a problem. (See
+[Monitoring][1] subsection "Saturation".)
-**Actionable**: Alerts MUST always include a corresponding action to resolve or investigate the underlying condition.
+**Actionable**: Alerts MUST always include a corresponding action to
+resolve or investigate the underlying condition.
## Requirements
@@ -14,8 +15,8 @@ Alerts MUST be _meaningful_ and _actionable_.
- Alerts MUST begin with [PagerDuty][2].
- Alert notifications MAY forward to Slack.
- Alert notifications MAY forward via Email.
-- Alerts MUST NOT deduplicate automatically.
- A human MAY aggregate alerts if unknown dependencies provide alerts during an incident.
+- Alerts MUST NOT deduplicate automatically. A human MAY aggregate
+ alerts if unknown dependencies provide alerts during an incident.
[1]: monitoring.md
[2]: https://www.pagerduty.com/
diff --git a/docs/core.md b/docs/core.md
index d73b122..1e16d54 100644
--- a/docs/core.md
+++ b/docs/core.md
@@ -1,23 +1,30 @@
# Core Rules and Guidelines for Standards
-The following contain both requirements and guidelines for producing Standards documents.
+The following contain both requirements and guidelines for producing
+Standards documents.
## Terms
-- Standards review group (*SRG*): assigned group which reviews and approves changes to the Standards.
+- Standards review group (*SRG*): assigned group which reviews and
+ approves changes to the Standards.
## Requirements
-- The SRG MUST review outstanding change requests (PRs) every two weeks (14 days).
-- PRs MUST remain open for comment no less than 5 business days (or one week).
- - PRs MAY remain open for more than 2 weeks if the discussion is active and ongoing.
-- The SRG MUST publish Standards no more frequently than once per week, nor less frequently than once every three months.
+- The SRG MUST review outstanding change requests (PRs) every two weeks
+ (14 days).
+- PRs MUST remain open for comment no less than 5 business days (or one
+ week).
+ - PRs MAY remain open for more than 2 weeks if the discussion is
+ active and ongoing.
+- The SRG MUST publish Standards no more frequently than once per week,
+ nor less frequently than once every three months.
Documents MUST:
- Follow [RFC-2119](rfc2119.txt).
- Be as specific as possible (e.g., no colloquial language).
-- Be concise and use [active voice](https://writing.wisc.edu/handbook/style/ccs_activevoice/).
+- Be concise and use [active
+ voice](https://writing.wisc.edu/handbook/style/ccs_activevoice/).
- Capitalize all words in headings.
- Use American English spellings and conventions.
@@ -25,4 +32,6 @@ Docuents MUST:
All Standards documents:
-- SHOULD follow the [Google Style Guide](https://developers.google.com/style/lists#capitalization-and-end-punctuation) for punctuation and capitalization.
+- SHOULD follow the [Google Style
+ Guide](https://developers.google.com/style/lists#capitalization-and-end-punctuation)
+ for punctuation and capitalization.
diff --git a/docs/datastores.md b/docs/datastores.md
index ef22c72..5d1955b 100644
--- a/docs/datastores.md
+++ b/docs/datastores.md
@@ -2,16 +2,23 @@
## Scope
-This standard prescribes database and data storage technologies used to solve many related data-retention concerns.
-The solutions recommended below are designed to encourage deep expertise in a few stable and well-understood systems, rather than maximal "fit" for each distinct use case.
+This standard prescribes database and data storage technologies used to
+solve many related data-retention concerns. The solutions recommended
+below are designed to encourage deep expertise in a few stable and
+well-understood systems, rather than maximal "fit" for each distinct use
+case.
-As such, the solutions may not be the most optimal but their performance, maintenance, optimizations, and reliability requirements are understood and supported by the engineering community.
+As such, the solutions may not be the most optimal but their
+performance, maintenance, optimizations, and reliability requirements
+are understood and supported by the engineering community.
## Terms
-- _Database_: provides long-term, durable storage for data whose loss or unavailability would mean violating an application's Availability or Business requirements.
-- _Cache_: provides short-term, volatile storage which does not preserve data.
- Caches are not in the scope of this standard.
+- _Database_: provides long-term, durable storage for data whose loss or
+ unavailability would mean violating an application's Availability or
+ Business requirements.
+- _Cache_: provides short-term, volatile storage which does not preserve
+ data. Caches are not in the scope of this standard.
## Capability Matrix
@@ -22,21 +29,27 @@ As such, the solutions may not be the most optimal but their performance, mainte
| [Document-Oriented] | X | X | X |
| [Object-Based] | X | X | X |
-_O_: While KV-stores cannot store relational data, some KV-focused databases provide relational-like "tagging" and other attribute aggregations.
+_O_: While KV-stores cannot store relational data, some KV-focused
+databases provide relational-like "tagging" and other attribute
+aggregations.
## Selection Criteria
### PostgreSQL
-Applications SHOULD use PostgreSQL (Aurora in Cloud environments and the latest stable release in OnPrem environments.)
+Applications SHOULD use PostgreSQL (Aurora in Cloud environments and the
+latest stable release in OnPrem environments).
-PostgreSQL across all environments supports all storage methods including:
+PostgreSQL across all environments supports all storage methods
+including:
- Simple [Key-Value] stores (via [hstore])
- [Document-Oriented] storage and queries (via [jsonb])
- [Large-object] storage directly within the database
-If the application is running in the Cloud environment and needs a total data size over [64TB (RDS)][1] or [128TB (Aurora)][2], then it MUST use another approved option.
+If the application is running in the Cloud environment and needs a total
+data size over [64TB (RDS)][1] or [128TB (Aurora)][2], then it MUST use
+another approved option.
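
As an illustration of the Document-Oriented capability called out above, the
sketch below stores and queries a JSON document in a `jsonb` column from Go.
The table, connection string, `Order` type, and the `lib/pq` driver are
assumptions for the example only; any `database/sql`-compatible PostgreSQL
driver works the same way.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
	"log"

	_ "github.com/lib/pq" // assumed driver; any database/sql PostgreSQL driver works
)

// Order is a hypothetical document stored as jsonb.
type Order struct {
	ID         string `json:"id"`
	TotalCents int    `json:"total_cents"`
}

func main() {
	// The connection string is illustrative only.
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A single jsonb column provides document-oriented storage inside PostgreSQL.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS orders (doc jsonb NOT NULL)`); err != nil {
		log.Fatal(err)
	}

	payload, err := json.Marshal(Order{ID: "o-123", TotalCents: 4200})
	if err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`INSERT INTO orders (doc) VALUES ($1)`, payload); err != nil {
		log.Fatal(err)
	}

	// Query a field inside the document with the ->> operator.
	var total int
	err = db.QueryRow(
		`SELECT (doc->>'total_cents')::int FROM orders WHERE doc->>'id' = $1`, "o-123",
	).Scan(&total)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("total cents:", total)
}
```

Simple Key-Value storage looks much the same with `hstore` in place of
`jsonb`.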
### DynamoDB
diff --git a/docs/git-recommendations.md b/docs/git-recommendations.md
index fcc19a0..d8c5d3d 100644
--- a/docs/git-recommendations.md
+++ b/docs/git-recommendations.md
@@ -1,25 +1,34 @@
# Git Recommendations
-The following are a collection of suggestions to best use Git as a source control system.
+The following are a collection of suggestions to best use Git as a
+source control system.
- Commits SHOULD represent a logical unit of work.
- Commit frequently
- Push your commits to a branch on your own fork, if possible.
- Write [descriptive commit messages].
-- Keep remote repository up-to-date by committing and pushing your work regularly.
-- Keep your local copies of repositories up-to-date by regularly pulling changes.
- - Frequently pull from upstream to the `main` branch, and rebase your changes on top (see the [rebase workflow]).
-- Coordinate with colleagues to avoid nasty merge conflicts (if you can).
+- Keep your remote repository up-to-date by committing and pushing your
+ work regularly.
+- Keep your local copies of repositories up-to-date by regularly pulling
+ changes.
+ - Frequently pull from upstream to the `main` branch, and rebase your
+ changes on top (see the [rebase workflow]).
+- Coordinate with colleagues to avoid nasty merge conflicts (if you
+ can).
- Try the git [rebase workflow].
- Use branches and merge requests.
- Pick a branching strategy that works for you and your team.
- Consider [GitHub Flow](https://guides.github.com/introduction/flow/index.html).
+ Consider [GitHub
+ Flow](https://guides.github.com/introduction/flow/index.html).
- Create a new branch for each feature or bugfix.
- The `main` branch SHOULD always contain releasable code.
- Protect your `main` branch; require merge requests to make changes.
- Configure a group of default reviewers for your pull requests.
- - For teams of 4 or more, require a minimum of 2 approvers for all merge requests.
-- Avoid force operations (`-f` or `--force` option), especially on `main` branch, as this is an indication you are probably doing something wrong.
+ - For teams of 4 or more, require a minimum of 2 approvers for all
+ merge requests.
+- Avoid force operations (`-f` or `--force` option), especially on
+ `main` branch, as this is an indication you are probably doing
+ something wrong.
- Keep your repository neat: always delete merged branches.
[descriptive commit messages]: https://cbea.ms/git-commit/#seven-rules
diff --git a/docs/languages/golang.md b/docs/languages/golang.md
index 6f1cf2f..1a054ec 100644
--- a/docs/languages/golang.md
+++ b/docs/languages/golang.md
@@ -2,16 +2,20 @@
## Definitions
-- A `Program` is a program, service, or application which is NOT a library.
+- A `Program` is a program, service, or application which is NOT a
+ library.
- A `Library` is code designed only for consumption by other programs.
## Requirements
-- Builds MUST use the [Go-provided compiler][1].
- Builds MUST NOT use the [gcc-go][2] compiler or other alternatives.
-- Programs MUST update and commit the `go.mod` and `go.sum` using `go mod tidy`.
-- Programs MUST [vendor dependency code][4] and commit the vendored code to their repository.
-- CI builds MUST use the [golangci-lint][5] linter as a first-stage validation step
+- Builds MUST use the [Go-provided compiler][1]. Builds MUST NOT use
+ the [gcc-go][2] compiler or other alternatives.
+- Programs MUST update and commit the `go.mod` and `go.sum` using `go
+ mod tidy`.
+- Programs MUST [vendor dependency code][4] and commit the vendored code
+ to their repository.
+- CI builds MUST use the [golangci-lint][5] linter as a first-stage
+ validation step.
- Programs SHOULD use the [standard project layout][3]
- Programs MUST NOT use CGO unless there is no pure-Go alternative.
Appropriate uses of CGO include Oracle DB drivers and GPGPU computation.
@@ -27,16 +31,19 @@
### Local Environment
-- Run `go build`, `golangci-lint run`, and `go test` before pushing code for your PR.
-- Fork the repository, then commit and push changes to the fork frequently.
- This avoids catastrophic data loss and enables Work In Progress (WIP) sharing.
+- Run `go build`, `golangci-lint run`, and `go test` before pushing code
+ for your PR.
+- Fork the repository, then commit and push changes to the fork
+ frequently. This avoids catastrophic data loss and enables Work In
+ Progress (WIP) sharing.
### Go language
- Errors MUST be handled
-- Programs and Libaries SHOULD NOT use third-party libraries.
- Prefer standard library packages.
-- Use [gofumports](https://github.com/mvdan/gofumpt) for formatting and automatic imports
+- Programs and Libraries SHOULD NOT use third-party libraries. Prefer
+ standard library packages.
+- Use [gofumports](https://github.com/mvdan/gofumpt) for formatting and
+ automatic imports.
- Test functions MUST check both the error result and the returned data.
[RFC2119]:https://www.rfc-editor.org/rfc/rfc2119.txt
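
A minimal sketch of the last two points in the Go language list above (handle
errors; have tests check both the error and the returned data), using only the
standard `testing` package. `ParseCount` is a hypothetical function invented
for the example.

```go
// In a real project, ParseCount would live in parse.go and the test in
// parse_test.go; they are shown together here for brevity.
package parse

import (
	"fmt"
	"strconv"
	"testing"
)

// ParseCount parses a non-negative count and returns an error otherwise.
func ParseCount(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse count %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("count must be non-negative, got %d", n)
	}
	return n, nil
}

func TestParseCount(t *testing.T) {
	// Check BOTH the error result and the returned data.
	got, err := ParseCount("42")
	if err != nil {
		t.Fatalf("ParseCount(%q) returned unexpected error: %v", "42", err)
	}
	if got != 42 {
		t.Errorf("ParseCount(%q) = %d, want 42", "42", got)
	}

	// The failure path should return a non-nil error and a zero value.
	got, err = ParseCount("-1")
	if err == nil {
		t.Fatalf("ParseCount(%q) expected an error, got nil", "-1")
	}
	if got != 0 {
		t.Errorf("ParseCount(%q) = %d, want 0 on error", "-1", got)
	}
}
```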
diff --git a/docs/languages/java.md b/docs/languages/java.md
index b0119f6..d2cca4b 100644
--- a/docs/languages/java.md
+++ b/docs/languages/java.md
@@ -7,7 +7,8 @@ TBD
## Guidelines
- Follow the [Google Style Guide][1] for formatting.
- - Use the automatic formatting rules for IntelliJ IDEA and Eclipse [available here][2].
+ - Use the automatic formatting rules for IntelliJ IDEA and Eclipse
+ [available here][2].
[1]: https://google.github.io/styleguide/javaguide.html
[2]: https://raw.githubusercontent.com/google/styleguide/gh-pages/intellij-java-google-style.xml
diff --git a/docs/languages/nodejs.md b/docs/languages/nodejs.md
index 9eaede6..56507f6 100644
--- a/docs/languages/nodejs.md
+++ b/docs/languages/nodejs.md
@@ -2,12 +2,15 @@
## Requirements
-- Applications MUST target the latest LTS release of NodeJS (currently v16).
+- Applications MUST target the latest LTS release of NodeJS (currently
+ v16).
- Applications MUST use ESLint per the _Linting_ guidelines below.
### Linting
-[ESLint](https://eslint.org/) is a linter used with Javascript to detect and enforce code and style guidelines.
-There are several shared, public configs that provide base rules.
-The config SHOULD extend the [standardjs](https://standardjs.com/) ESLint configuration.
-This is different from the general language
-standard to use [Google coding standards](https://github.com/google/eslint-config-google).
+
+[ESLint](https://eslint.org/) is a linter used with JavaScript to detect
+and enforce code and style guidelines. There are several shared, public
+configs that provide base rules. The config SHOULD extend the
+[standardjs](https://standardjs.com/) ESLint configuration. This
+differs from the general language standard, which is to use the [Google
+coding standards](https://github.com/google/eslint-config-google).
diff --git a/docs/monitoring.md b/docs/monitoring.md
index d868672..56e5008 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -1,78 +1,111 @@
-# Monitoring
+# Monitoring (or Observability-v1)
-This standard presents guidelines for providing operational montioring and alerting of applications and platforms for internal teams.
-This standard does not provide guidelines for alerting external Customers or Vendors.
+This standard presents guidelines for providing operational monitoring
+and alerting of applications and platforms for internal teams. This
+standard does not provide guidelines for alerting external Customers or
+Vendors.
-Terminology will be introduced throughout this document.
-Much of the terminology is drawn from [the Google SRE Book][2].
-If you are unfamiliar with monitoring and alerting, especially for distributed systems, you SHOULD read [this section][1] of the SRE book.
+Terminology will be introduced throughout this document. Much of the
+terminology is drawn from [the Google SRE Book][2]. If you are
+unfamiliar with monitoring and alerting, especially for distributed
+systems, you SHOULD read [this section][1] of the SRE book.
## Requirements
All Applications:
- MUST instrument their primary code paths with APM/tracing.
-- MUST instrument and [alert][7] on the four primary signals: Latency, Traffic, Errors, and Saturation.
-- MUST NOT add unique values to time-series metrics tagging (e.g., request IDs, UUIDs, other high-cardinality values).
+- MUST instrument and [alert][7] on the four primary signals: Latency,
+ Traffic, Errors, and Saturation.
+- MUST NOT add unique values to time-series metrics tagging (e.g.,
+ request IDs, UUIDs, other high-cardinality values).
- SHOULD NOT use logs to track data captured by metrics or APM.
-- SHOULD consolidate logging statements to a single line (e.g., report stacktraces on a single line, multi-line payloads rendered in single-line format, etc).
+- SHOULD consolidate logging statements to a single line (e.g., report
+ stacktraces on a single line, multi-line payloads rendered in
+ single-line format, etc).
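
As a rough sketch of instrumenting the four primary signals named above, the
example below uses the Prometheus Go client
(`github.com/prometheus/client_golang`) as the time-series library; the metric
names, routes, and handler are illustrative assumptions, and any equivalent
metrics or APM SDK satisfies the requirement. Note the labels stay
low-cardinality (route and status class, never request IDs).

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Traffic and Errors: a counter labeled by route and status class.
	requests = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "http_requests_total", Help: "Requests by route and status class."},
		[]string{"route", "status"},
	)
	// Latency: a histogram per route.
	latency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "http_request_duration_seconds", Help: "Request latency.", Buckets: prometheus.DefBuckets},
		[]string{"route"},
	)
	// Saturation: in-flight requests against known capacity.
	inFlight = prometheus.NewGauge(
		prometheus.GaugeOpts{Name: "http_in_flight_requests", Help: "Requests currently being served."},
	)
)

// instrument wraps a handler so every request feeds the four signals.
func instrument(route string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		inFlight.Inc()
		defer inFlight.Dec()
		start := time.Now()
		next(w, r)
		latency.WithLabelValues(route).Observe(time.Since(start).Seconds())
		requests.WithLabelValues(route, "2xx").Inc() // a real wrapper records the actual status class
	}
}

func main() {
	prometheus.MustRegister(requests, latency, inFlight)
	http.HandleFunc("/orders", instrument("/orders", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}))
	// Pull-style exposition; a push gateway or vendor agent is equally valid.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```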
## Observability Concepts
-Observability is about understanding how an application performs at runtime, with real-world use cases and data.
+Observability is about understanding how an application performs at
+runtime, with real-world use cases and data.
-There are two ways to gather observability information: instrumenting code which emits data to an aggregator (_Push_), or instrumenting code which exposes the information for an external system to query (_Pull_ / _Poll_).
+There are two ways to gather observability information: instrumenting
+code which emits data to an aggregator (_Push_), or instrumenting code
+which exposes the information for an external system to query (_Pull_ /
+_Poll_).
-Each method has its advantages, but _Push_ allows observability mechanisms to be incorporated within the application itself.
-Using Push mechanisms significantly [reduce the overall complexity][3] of the system.
+Each method has its advantages, but _Push_ allows observability
+mechanisms to be incorporated within the application itself. Using Push
+mechanisms significantly [reduces the overall complexity][3] of the
+system.
### Definitions
-Certain words have different definitions from their common use in conversation.
-For the purposes of monitoring and alerting, the following definitions apply:
-
-- **Observe**: to understand how an application behaves, with real world use cases.
-- **Monitoring**: the act of collecting information used to _Observe_ an application.
-- **Event**: a record of something which happened, produced by a Monitor.
- - A monitoring _event_ is not the same as events used in other systems such as Databases, Cloud Providers, Apache Kafka, etc.
- - Events are records, not signals, and MUST represent something that actually happened.
- Such occurrences represent decisions made by the Development, SRE, and
- Security teams to highlight _meaningful_ occurrences.
+Certain words have different definitions from their common use in
+conversation. For the purposes of monitoring and alerting, the
+following definitions apply:
+
+- **Observe**: to understand how an application behaves, with real world
+ use cases.
+- **Monitoring**: the act of collecting information used to _Observe_ an
+ application.
+- **Event**: a record of something which happened, produced by a
+ Monitor.
+ - A monitoring _event_ is not the same as events used in other systems
+ such as Databases, Cloud Providers, Apache Kafka, etc.
+ - Events are records, not signals, and MUST represent something that
+ actually happened. Such occurrences represent decisions made by the
+ Development, SRE, and Security teams to highlight _meaningful_
+ occurrences.
## Types of Monitoring
Also known as the _Pillars of Observability_.
-Monitoring follows three broad categories: _Time Series_ metrics, _Application Performance Monitoring_ (APM) or _Tracing_, and _Logging_.
-Each of the categories has its advantages and disadvantages, but APM tends to have the best results across all three categories, and often costs the least for a volume of data or events.
+Monitoring falls into three broad categories: _Time Series_ metrics,
+_Application Performance Monitoring_ (APM) or _Tracing_, and _Logging_.
+Each of the categories has its advantages and disadvantages, but APM
+tends to have the best results across all three categories, and often
+costs the least for a given volume of data or events.
### Time Series
-Time Series metrics are points of data, sometimes aggregated, which MAY answer very simple questions like:
+Time Series metrics are points of data, sometimes aggregated, which MAY
+answer very simple questions like:
- Is the application running?
-- How many requests / transactions per [time value] is the application processing?
-- Does the application need to scale (horizontally) or deploy with more resources?
+- How many requests / transactions per [time value] is the application
+ processing?
+- Does the application need to scale (horizontally) or deploy with more
+ resources?
-Time Series metrics are good at observing trends, handling large volumes of data with minimal infrastructure, and for tracking (normally) slow-changing statistics such as infrastructure and node details.
+Time Series metrics are good at observing trends, handling large volumes
+of data with minimal infrastructure, and tracking (normally)
+slow-changing statistics such as infrastructure and node details.
-Most Time Series implementations cannot associate data to events beyond key-value tagging.
-Such systems are often insufficient to track intermittent issues or view details regarding particular errors.
+Most Time Series implementations cannot associate data to events beyond
+key-value tagging. Such systems are often insufficient to track
+intermittent issues or view details regarding particular errors.
### Application Performance Monitoring (APM)
-APM data are instrumentation which can operate across code functions, external requests, and across applications.
-APM provides [distributed tracing][4], which can be enhanced on platforms like [NewRelic][5].
-APM can answer all the questions addressed by the Time-Series solution, and additionally:
+APM data come from instrumentation which can operate across code
+functions, external requests, and applications. APM provides
+[distributed tracing][4], which can be enhanced on platforms like
+[NewRelic][5]. APM can answer all the questions addressed by the
+Time-Series solution, and additionally:
- Am I operating within my SLO/SLA for clients?
- How long does a specific endpoint take to respond?
- How frequently is a specific function being called?
- What functions are taking the most time during a request / operation?
-- What were the exact contents of request and response data during a long or erroneous operation?
-- What [additional attributes][6] were present during an erroneous operation?
+- What were the exact contents of request and response data during a
+ long or erroneous operation?
+- What [additional attributes][6] were present during an erroneous
+ operation?
-APM SHOULD be used to track key indicators (KPIs) and other SLO-related values such as:
+APM SHOULD be used to track key performance indicators (KPIs) and other
+SLO-related values such as:
- Errors and Error Rate.
- Traffic / Throughput rates.
@@ -81,27 +114,37 @@ APM SHOULD be used to track key indicators (KPIs) and other SLO-related values s
### Logging
-Logging is an inherently flexible and robust method of generating event-related information.
-Logging metrics capture event data from a specific point in time during application execution and write to a file (or stream, for collection).
-Logging metrics MAY answer questions like:
+Logging is an inherently flexible and robust method of generating
+event-related information. Logging metrics capture event data from a
+specific point in time during application execution and write to a file
+(or stream, for collection). Logging metrics MAY answer questions like:
- What calculated values were generated by a specific function?
-- What input or payload information was provided for a specific transaction?
-
-Logs are often necessary for auditing purposes, which have security, legal, or regulatory requirements.
-Such requirements supercede declarations here.
-
-- Logging SHOULD be used to capture only events which cannot be captured by APM or Metrics.
- - Suggested events include process startup messages, signal-received hooks, shutdown messages, and reporting failures (such as failure to submit APM/Metrics).
- - Logging MAY _temporarily_ be used to track events which are captured by APM or Metrics during incidents or when debugging.
+- What input or payload information was provided for a specific
+ transaction?
+
+Logs are often necessary for auditing purposes, which have security,
+legal, or regulatory requirements. Such requirements supersede
+declarations here.
+
+- Logging SHOULD be used to capture only events which cannot be captured
+ by APM or Metrics.
+ - Suggested events include process startup messages, signal-received
+ hooks, shutdown messages, and reporting failures (such as failure to
+ submit APM/Metrics).
+ - Logging MAY _temporarily_ be used to track events which are captured
+ by APM or Metrics during incidents or when debugging.
- Logging SHOULD be in a structured format such as JSON.
-- Logging MUST NOT include sensitive data without masking/eliding said data.
+- Logging MUST NOT include sensitive data without masking/eliding said
+ data.
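
A small sketch of the structured-logging and masking rules above, using Go's
standard `log/slog` package (Go 1.21+); the field names and the `maskEmail`
helper are assumptions for illustration.

```go
package main

import (
	"log/slog"
	"os"
)

// maskEmail is a hypothetical helper that elides sensitive data before logging.
func maskEmail(email string) string {
	if len(email) < 3 {
		return "***"
	}
	return email[:2] + "***"
}

func main() {
	// Structured, single-line JSON output per log statement.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// Lifecycle events such as startup are appropriate for logs,
	// since they are not well captured by APM or metrics.
	logger.Info("service starting", "version", "1.4.2", "port", 8080)

	// Sensitive values are masked before they reach the log line.
	logger.Warn("payment declined",
		"order_id", "o-123",
		"customer_email", maskEmail("jane.doe@example.com"),
	)
}
```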
## Top Metrics
Also known as [the Golden Signals][1].
-Applications MUST have at least one dashboard which displays the following metrics for all services or components, as well as metrics for invoked dependencies.
+Applications MUST have at least one dashboard which displays the
+following metrics for all services or components, as well as metrics for
+invoked dependencies.
### Latency
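
To ground the APM section above, here is a minimal distributed-tracing sketch
using the OpenTelemetry Go SDK as a stand-in for whichever APM agent (for
example, NewRelic) is actually deployed; the tracer name, span names,
attributes, and `chargeCard` call are assumptions for the example.

```go
package main

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
)

// chargeCard stands in for a downstream call that the span will cover.
func chargeCard(ctx context.Context, orderID string) error {
	return errors.New("card declined")
}

// ProcessOrder shows span creation, span attributes, and error recording.
func ProcessOrder(ctx context.Context, orderID string) error {
	tracer := otel.Tracer("billing") // instrumentation scope name (illustrative)
	ctx, span := tracer.Start(ctx, "ProcessOrder")
	defer span.End()

	// Attributes add request context to the trace; high-cardinality values
	// belong here rather than on time-series metric tags.
	span.SetAttributes(attribute.String("order.id", orderID))

	if err := chargeCard(ctx, orderID); err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, "charge failed")
		return err
	}
	span.SetStatus(codes.Ok, "")
	return nil
}

func main() {
	// A real service would also configure an exporter and TracerProvider;
	// without one, the global tracer is effectively a no-op.
	_ = ProcessOrder(context.Background(), "o-123")
}
```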
diff --git a/docs/observability.md b/docs/observability.md
new file mode 100644
index 0000000..4efa297
--- /dev/null
+++ b/docs/observability.md
@@ -0,0 +1,30 @@
+# Observability
+
+"Observability v2" is considered an evolution of the [Observability-v1
+(or Monitoring)][1] paradigm. The goal of Observability (v1 or v2) is to
+provide understanding of how an application operates at runtime.
+
+As a common shorthand, "Observability" on its own refers to
+Observability-v2, while [Monitoring][1] refers to Observability-v1.
+
+### Definitions
+
+_Copied from the [Observability-v1 (Monitoring)][1] standard:_
+
+Certain words have different definitions from their common use in
+conversation. For the purposes of monitoring and alerting, the
+following definitions apply:
+
+- **Observe**: to understand how an application behaves, with real world
+ use cases.
+- **Monitoring**: the act of collecting information used to _Observe_ an
+ application.
+- **Event**: a record of something which happened, produced by a
+ Monitor.
+ - A monitoring _event_ is not the same as events used in other systems
+ such as Databases, Cloud Providers, Apache Kafka, etc.
+ - Events are records, not signals, and MUST represent something that
+ actually happened. Such occurrences represent decisions made by the
+ Development, SRE, and Security teams to highlight _meaningful_
+ occurrences.
+
+[1]: monitoring.md
diff --git a/docs/tiers.md b/docs/tiers.md
index 60cc7b3..14af482 100644
--- a/docs/tiers.md
+++ b/docs/tiers.md
@@ -2,47 +2,69 @@
## Definition
-Platforms and services can have different expectations depending on the technologies used, its support systems, and customer-impact.
-This document defines those expectations into four "tiers" from the most-critical (Tier 1) to the least-critical (Tier 4).
+Platforms and services can have different expectations depending on the
+technologies used, their support systems, and customer impact. This
+document divides those expectations into four "tiers" from the
+most-critical (Tier 1) to the least-critical (Tier 4).
### Base Requirements
-- Teams MUST plan for both course-of-business failures and disaster-level events.
-- Teams MUST assign a tier number for each application (service, platform, or system) they support.
-- Applications MUST meet the availability and resilience targets of their tier.
+- Teams MUST plan for both course-of-business failures and
+ disaster-level events.
+- Teams MUST assign a tier number for each application (service,
+ platform, or system) they support.
+- Applications MUST meet the availability and resilience targets of
+ their tier.
### Tier 1
-Tier 1 applications are **core/critical systems upon which all else is built**.
-Examples include Active Directory, Kubernetes clusters (Dev, Integration, or Production), and Datacenter Firewalls.
+Tier 1 applications are **core/critical systems upon which all else is
+built**. Examples include Active Directory, Kubernetes clusters (Dev,
+Integration, or Production), and Datacenter Firewalls.
-Tier 1 applications MUST provide at least **[%99.95 availability](https://uptime.is/99.95)** or less than four hours of downtime n-total per year.
+Tier 1 applications MUST provide at least **[99.95%
+availability](https://uptime.is/99.95)** or less than four hours of
+downtime in total per year.
### Tier 2
-Tier 2 applications are **critical and/or time-sensitive**.
-Such applications could include a customer-facing billing system, an IAM gateway, or a central code-management platform (Github).
+Tier 2 applications are **critical and/or time-sensitive**. Such
+applications could include a customer-facing billing system, an IAM
+gateway, or a central code-management platform (GitHub).
-Tier 2 applications MUST provide at least **[%99.9 availability](https://uptime.is/99.9)** or less than nine hours of downtime in-total per year.
+Tier 2 applications MUST provide at least **[99.9%
+availability](https://uptime.is/99.9)** or less than nine hours of
+downtime in total per year.
### Tier 3
-Tier 3 applications are **important and not time-sensitive**.
-These include systems for end-of-month billing, internal (non-customer-impacting) metrics,
+Tier 3 applications are **important and not time-sensitive**. These
+include systems such as end-of-month billing and internal
+(non-customer-impacting) metrics.
-Tier 3 applications MUST provide at least **[%99 availability](https://uptime.is/99)** or less than four days of downtime in-total per year.
+Tier 3 applications MUST provide at least **[99%
+availability](https://uptime.is/99)** or less than four days of downtime
+in total per year.
### Tier 4
-Tier 4 applications have **low impact when delayed**.
-Tier 4 applications include everything which is not assigned to other tiers.
+Tier 4 applications have **low impact when delayed**. Tier 4
+applications include everything which is not assigned to other tiers.
-Tier 4 applications MUST provide at least **[%97 availability](https://uptime.is/97)** or less than ten days of downtime in-total per year.
+Tier 4 applications MUST provide at least **[97%
+availability](https://uptime.is/97)** or less than ten days of downtime
+in total per year.
## Availability Calculations
-Availability is calculated based on whether a given service is responding *correctly* and is *communicating*.
+Availability is calculated based on whether a given service is
+responding *correctly* and is *communicating*.
+
+- If a service endpoint is not reachable, but is otherwise returning
+ positive metrics (or no errors), the system is not Available.
+- If a service endpoint can be connected, but only returns error
+ messages, the system is not Available.
+- If a service has a combination of unreachability and erroneous
+ responses for a duration which exceeds the Availability limit, that
+ service has broken its SLA guarantee.
-- If a service endpoint is not reachable, but is otherwise returning positive metrics (or no errors), the system is not Available.
-- If a service endpoint can be connected, but only returns error messages, the system is not Available.
-- If a service has a combination of unreachability and erroneous responses for a duration which exceeds the Availability limit, that service has broken its SLA guarantee.i
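
To make the downtime budgets above easy to check, here is a small sketch that
converts an availability target into the maximum allowed downtime per year
(assuming a 365-day year; the hour and day figures quoted in the tiers are
rounded approximations of these values).

```go
package main

import (
	"fmt"
	"time"
)

// allowedDowntime returns the maximum total downtime per year for an
// availability target expressed as a fraction (e.g. 0.9995 for 99.95%).
func allowedDowntime(availability float64) time.Duration {
	year := 365 * 24 * time.Hour
	return time.Duration((1 - availability) * float64(year))
}

func main() {
	tiers := []struct {
		name   string
		target float64
	}{
		{"Tier 1 (99.95%)", 0.9995},
		{"Tier 2 (99.9%)", 0.999},
		{"Tier 3 (99%)", 0.99},
		{"Tier 4 (97%)", 0.97},
	}
	for _, tier := range tiers {
		fmt.Printf("%s: at most %s of downtime per year\n",
			tier.name, allowedDowntime(tier.target).Round(time.Minute))
	}
}
```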