Documentation updates as per review [skip ci]

Commit 673f0625 authored Nov 06, 2018 by Andrew Newdigate

Showing with 68 additions and 35 deletions

doc/development/chaos_endpoints.md +66 -33

doc/development/performance.md +2 -2

--- a/doc/development/chaos_endpoints.md
+++ b/doc/development/chaos_endpoints.md
-# Generating Chaos in a test GitLab instance
+# Generating chaos in a test GitLab instance

 As [Werner Vogels](https://twitter.com/Werner), the CTO at Amazon Web Services, famously put it, **Everything fails, all the time**.

-As a developer, it's as important to consider the failure modes in which your software will operate as much as normal operation. Doing so can mean the difference between a minor hiccup leading to a scattering of 500 errors experienced by a tiny fraction of users and a full site outage affect all users for an extended period.
+As a developer, it's as important to consider the failure modes in which your software will operate as it is to consider normal operation. Doing so can mean the difference between a minor hiccup leading to a scattering of `500` errors experienced by a tiny fraction of users and a full site outage that affects all users for an extended period.

 To paraphrase [Tolstoy](https://en.wikipedia.org/wiki/Anna_Karenina_principle), _all happy servers are alike, but all failing servers are failing in their own way_. Luckily, there are ways we can attempt to simulate these failure modes, and the chaos endpoints are tools for assisting in this process.

-Currently, there are four endpoints for simulating the following conditions: slow requests, cpu-bound requests, memory leaks and unexpected process crashes.
+Currently, there are four endpoints for simulating the following conditions:

-## Enabling Chaos Endpoints
+- Slow requests.
+- CPU-bound requests.
+- Memory leaks.
+- Unexpected process crashes.

-For obvious reasons, these endpoints are not enabled by default. They can be enabled by setting the `GITLAB_ENABLE_CHAOS_ENDPOINTS` environment variable.
+## Enabling chaos endpoints
+
+Because they deliberately degrade or disrupt the application, these endpoints are not enabled by default. They can be enabled by setting the `GITLAB_ENABLE_CHAOS_ENDPOINTS` environment variable to `1`.

 For example, if you're using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) this can be done with the following command:

-```shell
+```bash
 GITLAB_ENABLE_CHAOS_ENDPOINTS=1 gdk run
 ```

-### Securing the Chaos Endpoints
+## Securing the chaos endpoints

-**It is highly recommended that you secure access to the Chaos endpoints using a secret token**. This is recommended when enabling these endpoints locally, and essential when running in a staging or other shared environment. _It goes without saying that you should not enable them in production unless you absolutely know what you're doing._
+DANGER: **Danger:**
+It is highly recommended that you secure access to the chaos endpoints using a secret token. This is recommended when enabling these endpoints locally and essential when running in a staging or other shared environment. You should not enable them in production unless you absolutely know what you're doing.

-A secret can be set through the `GITLAB_CHAOS_SECRET` environment variable. For example, when using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) this can be done with the following command line:
+A secret token can be set through the `GITLAB_CHAOS_SECRET` environment variable. For example, when using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) this can be done with the following command:

-```shell
+```bash
 GITLAB_ENABLE_CHAOS_ENDPOINTS=1 GITLAB_CHAOS_SECRET=secret gdk run
 ```

 Replace `secret` with your own secret token.
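If you don't have a token in mind, one way to produce a hard-to-guess value is to read random bytes from `/dev/urandom`. This is a minimal sketch, assuming a Unix-like system with `od` and `tr`; the function name `generate_chaos_secret` is hypothetical.

```bash
# Hypothetical helper: print a random 128-bit token as 32 lowercase hex
# characters, read from /dev/urandom with od(1); tr strips spaces/newlines.
generate_chaos_secret() {
  od -An -tx1 -N16 /dev/urandom | tr -d ' \n'
}

# Usage (not run here): start the GDK with a freshly generated token.
#   export GITLAB_CHAOS_SECRET="$(generate_chaos_secret)"
#   GITLAB_ENABLE_CHAOS_ENDPOINTS=1 gdk run
```

Keep the value at hand: every request to a chaos endpoint has to send it in the `X-Chaos-Secret` header.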

-## Invoking Chaos
+## Invoking chaos

-Once you have enabled the chaos endpoints and restarted the application you can start testing using the endpoints.
+Once you have enabled the chaos endpoints and restarted the application, you can start testing using the endpoints.

-### Memory Leaks
+## Memory leaks

 To simulate a memory leak in your application, use the `/-/chaos/leakmem` endpoint.

-For example, if your GitLab instance is listening at `localhost:3000`, you could `curl` the endpoint as follows:
+NOTE: **Note:**
+The memory is not retained after the request finishes. Once the request has completed, the Ruby garbage collector will attempt to recover the memory.

-```shell
-curl http://localhost:3000/-/chaos/leakmem?memory_mb=1024&duration_s=10 --header 'X-Chaos-Secret: secret'
+```
+GET /-/chaos/leakmem
+GET /-/chaos/leakmem?memory_mb=1024
+GET /-/chaos/leakmem?memory_mb=1024&duration_s=50
 ```

-The `memory_mb` parameter tells the application how much memory it should leak. The `duration_s` parameter will ensure the request retains
-the memory for this duration at a minimum (default 30s).
+| Attribute    | Type    | Required | Description                                                                        |
+| ------------ | ------- | -------- | ---------------------------------------------------------------------------------- |
+| `memory_mb`  | integer | no       | How much memory, in MB, should be leaked. Defaults to 100MB.                       |
+| `duration_s` | integer | no       | Minimum duration, in seconds, that the memory should be retained. Defaults to 30s. |

-Note: the memory is not retained after the request finishes. Once the request has completed, the Ruby garbage collector will attempt to recover the memory.
+```bash
+curl 'http://localhost:3000/-/chaos/leakmem?memory_mb=1024&duration_s=10' --header 'X-Chaos-Secret: secret'
+```
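Because `duration_s` is a minimum, timing the request is a simple way to confirm the leak ran for its full duration. The sketch below assumes Bash and the GDK address `http://localhost:3000`; the helper names `leakmem_url` and `leakmem`, and the `GITLAB_URL` variable, are hypothetical. Note that the URL must be quoted: unquoted, the shell treats `&` as a background operator and splits the command in two.

```bash
# Assumes the GDK default address; override by exporting GITLAB_URL.
GITLAB_URL="${GITLAB_URL:-http://localhost:3000}"

# Build the leakmem URL, using the documented defaults (100 MB, 30 s)
# when arguments are omitted or empty.
leakmem_url() {
  printf '%s/-/chaos/leakmem?memory_mb=%s&duration_s=%s\n' \
    "$GITLAB_URL" "${1:-100}" "${2:-30}"
}

# Trigger a leak and print the HTTP status and total request time.
# Usage: leakmem [MEMORY_MB] [DURATION_S]
leakmem() {
  curl --silent --output /dev/null \
    --write-out '%{http_code} %{time_total}s\n' \
    --header "X-Chaos-Secret: ${GITLAB_CHAOS_SECRET:?set GITLAB_CHAOS_SECRET}" \
    "$(leakmem_url "$1" "$2")"
}
```

For example, `leakmem 1024 10` should report a time of at least 10 seconds; a shorter time would suggest the request was cut off.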

-### CPU Spin
+## CPU spin

 This endpoint attempts to fully utilise a single core, at 100%, for the given period.

-```shell
+Depending on your Rack server setup, your request may time out after a predetermined period (normally 60 seconds).
+If you're using Unicorn, this is done by killing the worker process.
+
+```
+GET /-/chaos/cpuspin
+GET /-/chaos/cpuspin?duration_s=50
+```
+
+| Attribute    | Type    | Required | Description                                                           |
+| ------------ | ------- | -------- | --------------------------------------------------------------------- |
+| `duration_s` | integer | no       | Duration, in seconds, that the core will be utilised. Defaults to 30s. |
+
+```bash
 curl http://localhost:3000/-/chaos/cpuspin?duration_s=60 --header 'X-Chaos-Secret: secret'
 ```
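Since a spin may be cut off by a timeout, it can help to check how each run ended. The Bash sketch below compares the HTTP status and elapsed time with the requested duration. It assumes the GDK address and that a completed spin answers with HTTP `200` (an assumption; the endpoint's response is not documented here). The helpers `cpuspin_url`, `spin_completed`, and `cpuspin` are hypothetical.

```bash
# Assumes the GDK default address; override by exporting GITLAB_URL.
GITLAB_URL="${GITLAB_URL:-http://localhost:3000}"

# Build the cpuspin URL; duration_s defaults to 30, matching the endpoint.
cpuspin_url() {
  printf '%s/-/chaos/cpuspin?duration_s=%s\n' "$GITLAB_URL" "${1:-30}"
}

# Succeeds only for HTTP 200 (assumed success code) with an elapsed time
# of at least the requested duration.
# Arguments: HTTP_CODE ELAPSED_SECONDS REQUESTED_SECONDS
spin_completed() {
  [ "$1" = "200" ] && awk -v t="$2" -v d="$3" 'BEGIN { exit !(t >= d) }'
}

# Run one spin and report whether it appears to have completed.
# Usage: cpuspin [DURATION_S]
cpuspin() {
  local duration="${1:-30}" result code elapsed
  result="$(curl --silent --output /dev/null \
    --write-out '%{http_code} %{time_total}' \
    --header "X-Chaos-Secret: ${GITLAB_CHAOS_SECRET:?set GITLAB_CHAOS_SECRET}" \
    "$(cpuspin_url "$duration")")"
  code="${result% *}"
  elapsed="${result#* }"
  if spin_completed "$code" "$elapsed" "$duration"; then
    echo "completed: HTTP $code after ${elapsed}s"
  else
    echo "cut off or failed: HTTP $code after ${elapsed}s"
  fi
}
```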

-The `duration_s` parameter will configure how long the core is utilised.
+## Sleep

-Depending on your rack server setup, your request may timeout after a predermined period (normally 60 seconds). If you're using Unicorn, this is done by killing the worker process.
+This endpoint is similar to the CPU Spin endpoint but simulates off-processor activity, such as network calls to backend services. It will sleep for a given duration.

-### Sleep
+As with the CPU Spin endpoint, this may lead to your request timing out if the duration exceeds the configured limit.

-This endpoint is similar to the CPU Spin endpoint but simulates off-processor activity, such backend services of IO. It will sleep for a given duration.
+```
+GET /-/chaos/sleep
+GET /-/chaos/sleep?duration_s=50
+```

-```shell
+| Attribute    | Type    | Required | Description                                                            |
+| ------------ | ------- | -------- | ---------------------------------------------------------------------- |
+| `duration_s` | integer | no       | Duration, in seconds, that the request will sleep for. Defaults to 30s. |
+
+```bash
 curl http://localhost:3000/-/chaos/sleep?duration_s=60 --header 'X-Chaos-Secret: secret'
 ```
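Because a sleeping request stays open without using the CPU, sending several at once is one way to tie up request-handling capacity (with a single-threaded server such as Unicorn, each in-flight request occupies one worker process). A minimal Bash sketch, assuming the GDK address; `sleep_url` and `parallel_sleep` are hypothetical names.

```bash
# Assumes the GDK default address; override by exporting GITLAB_URL.
GITLAB_URL="${GITLAB_URL:-http://localhost:3000}"

# Build the sleep URL; duration_s defaults to 30, matching the endpoint.
sleep_url() {
  printf '%s/-/chaos/sleep?duration_s=%s\n' "$GITLAB_URL" "${1:-30}"
}

# Fire COUNT sleep requests in parallel, print each status and time,
# then wait for all of them to finish.
# Usage: parallel_sleep COUNT [DURATION_S]
parallel_sleep() {
  local count="${1:?usage: parallel_sleep COUNT [DURATION_S]}" duration="${2:-30}" i
  for ((i = 1; i <= count; i++)); do
    curl --silent --output /dev/null \
      --write-out "request $i: %{http_code} %{time_total}s\n" \
      --header "X-Chaos-Secret: ${GITLAB_CHAOS_SECRET:?set GITLAB_CHAOS_SECRET}" \
      "$(sleep_url "$duration")" &
  done
  wait
}
```

While `parallel_sleep 4 20` is running, loading a normal page in a browser gives a feel for how the instance responds when its workers are busy.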

-The `duration_s` parameter will configure how long the request will sleep for.
+## Kill

-As with the CPU Spin endpoint, this may lead to your request timing out if duration exceeds the configured limit.
+This endpoint will simulate the unexpected death of a worker process using the `KILL` signal.

-### Kill
+NOTE: **Note:**
+Since this endpoint uses the `KILL` signal, the worker is not given a chance to clean up or shut down.

-This endpoint will simulate the unexpected death of a worker process using a `kill` signal.
+```
+GET /-/chaos/kill
+```

-```shell
+```bash
 curl http://localhost:3000/-/chaos/kill --header 'X-Chaos-Secret: secret'
 ```
-
-Note: since this endpoint uses the `KILL` signal, the worker is not given a chance to cleanup or shutdown.
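After killing a worker, you may want to see how long the instance takes to answer again. This Bash sketch sends the kill request, then polls the instance root once a second until it returns any HTTP status (curl reports `000` when no response arrived). The helpers `got_response` and `kill_and_wait`, the choice of polling `/`, and the 30-attempt limit are all assumptions for illustration.

```bash
# Assumes the GDK default address; override by exporting GITLAB_URL.
GITLAB_URL="${GITLAB_URL:-http://localhost:3000}"

# Succeeds when CODE is a real HTTP status; curl prints 000 when no
# response was received.
got_response() {
  [ -n "$1" ] && [ "$1" != "000" ]
}

# Kill a worker, then poll GITLAB_URL once a second until it responds.
# Usage: kill_and_wait [MAX_TRIES]
kill_and_wait() {
  local max_tries="${1:-30}" code i
  # The kill request itself is expected to fail, so its result is ignored.
  curl --silent --output /dev/null \
    --header "X-Chaos-Secret: ${GITLAB_CHAOS_SECRET:?set GITLAB_CHAOS_SECRET}" \
    "$GITLAB_URL/-/chaos/kill"
  for ((i = 1; i <= max_tries; i++)); do
    code="$(curl --silent --output /dev/null --write-out '%{http_code}' "$GITLAB_URL/")"
    if got_response "$code"; then
      echo "responding again (HTTP $code) after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "no response after $max_tries attempts" >&2
  return 1
}
```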
--- a/doc/development/performance.md
+++ b/doc/development/performance.md
@@ -34,14 +34,14 @@ graphs/dashboards.

 ## Tooling

-GitLab provides built-in tools to aid the process of improving performance:
+GitLab provides built-in tools to help improve performance and availability:

 * [Profiling](profiling.md)
  * [Sherlock](profiling.md#sherlock)
 * [GitLab Performance Monitoring](../administration/monitoring/performance/index.md)
 * [Request Profiling](../administration/monitoring/performance/request_profiling.md)
 * [QueryRecorder](query_recorder.md) for preventing `N+1` regressions
-* [Chaos Endpoints](chaos_endpoints.md) less for performance, more for availability: tools for testing failure scenarios
+* [Chaos endpoints](chaos_endpoints.md) for testing failure scenarios. Intended mainly for testing availability.

 GitLab employees can use GitLab.com's performance monitoring systems located at
 <https://dashboards.gitlab.net>, this requires you to log in using your