object_storage.md 35.2 KB
Newer Older
1
---
2 3
stage: Enablement
group: Distribution
4
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
5 6 7 8 9 10
type: reference
---

# Object Storage

GitLab supports using an object storage service for holding numerous types of data.
Suzanne Selhorn's avatar
Suzanne Selhorn committed
11
It's recommended over NFS and
12 13 14 15 16
in general it's better in larger setups as object storage is
typically much more performant, reliable, and scalable.

## Options

17
GitLab has been tested on a number of object storage providers:
18

19 20
- [Amazon S3](https://aws.amazon.com/s3/)
- [Google Cloud Storage](https://cloud.google.com/storage)
21
- [Digital Ocean Spaces](https://www.digitalocean.com/products/spaces/)
22
- [Oracle Cloud Infrastructure](https://docs.cloud.oracle.com/en-us/iaas/Content/Object/Tasks/s3compatibleapi.htm)
23
- [OpenStack Swift](https://docs.openstack.org/swift/latest/s3_compat.html)
24
- [Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
25 26 27
- On-premises hardware and appliances from various storage vendors.
- MinIO. We have [a guide to deploying this](https://docs.gitlab.com/charts/advanced/external-object-storage/minio.html) within our Helm Chart documentation.

28 29 30 31 32 33
### Known compatibility issues

- Dell EMC ECS: Prior to GitLab 13.3, there is a [known bug in GitLab Workhorse that prevents
  HTTP Range Requests from working with CI job artifacts](https://gitlab.com/gitlab-org/gitlab/-/issues/223806).
  Be sure to upgrade to GitLab v13.3.0 or above if you use S3 storage with this hardware.

34 35
- Ceph S3 prior to [Kraken 11.0.2](https://ceph.com/releases/kraken-11-0-2-released/) does not support the [Upload Copy Part API](https://gitlab.com/gitlab-org/gitlab/-/issues/300604). You may need to [disable multi-threaded copying](#multi-threaded-copying).

36 37
## Configuration guides

38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
There are two ways of specifying object storage configuration in GitLab:

- [Consolidated form](#consolidated-object-storage-configuration): A single credential is
  shared by all supported object types.
- [Storage-specific form](#storage-specific-configuration): Every object defines its
  own object storage [connection and configuration](#connection-settings).

For more information on the differences and to transition from one form to another, see
[Transition to consolidated form](#transition-to-consolidated-form).

### Consolidated object storage configuration

> Introduced in [GitLab 13.2](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4368).

Using the consolidated object storage configuration has a number of advantages:

- It can simplify your GitLab configuration since the connection details are shared
  across object types.
- It enables the use of [encrypted S3 buckets](#encrypted-s3-buckets).
- It [uploads files to S3 with proper `Content-MD5` headers](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/222).

59 60 61 62 63 64 65
Because [direct upload mode](../development/uploads.md#direct-upload)
must be enabled, only the following providers can be used:

- [Amazon S3-compatible providers](#s3-compatible-connection-settings)
- [Google Cloud Storage](#google-cloud-storage-gcs)
- [Azure Blob storage](#azure-blob-storage)

66 67 68 69
When consolidated object storage is used, direct upload is enabled
automatically. Background upload is not supported. For storage-specific
configuration, [direct upload may become the default](https://gitlab.com/gitlab-org/gitlab/-/issues/27331)
because it does not require a shared folder.
70

Craig Norris's avatar
Craig Norris committed
71 72
Consolidated object storage configuration can't be used for backups or
Mattermost. See the [full table for a complete list](#storage-specific-configuration).
73

Craig Norris's avatar
Craig Norris committed
74 75 76
Enabling consolidated object storage enables object storage for all object
types. If you want to use local storage for specific object types, you can
[selectively disable object storages](#selectively-disabling-object-storage).
77

78 79
Most types of objects, such as CI artifacts, LFS files, upload
attachments, and so on can be saved in object storage by specifying a single
80
credential for object storage with multiple buckets.
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105

When the consolidated form is:

- Used with an S3-compatible object storage, Workhorse uses its internal S3 client to
  upload files.
- Not used with an S3-compatible object storage, Workhorse falls back to using
  pre-signed URLs.

See the section on [ETag mismatch errors](#etag-mismatch) for more details.

**In Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following lines, substituting
   the values you want:

    ```ruby
    # Consolidated object storage configuration
    gitlab_rails['object_store']['enabled'] = true
    gitlab_rails['object_store']['proxy_download'] = true
    gitlab_rails['object_store']['connection'] = {
      'provider' => 'AWS',
      'region' => '<eu-central-1>',
      'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
      'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
    }
106 107 108
    # OPTIONAL: The following lines are only needed if server side encryption is required
    gitlab_rails['object_store']['storage_options'] = {
      'server_side_encryption' => '<AES256 or aws:kms>',
109
      'server_side_encryption_kms_key_id' => '<arn:aws:kms:xxx>'
110
    }
111 112 113 114 115 116 117
    gitlab_rails['object_store']['objects']['artifacts']['bucket'] = '<artifacts>'
    gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = '<external-diffs>'
    gitlab_rails['object_store']['objects']['lfs']['bucket'] = '<lfs-objects>'
    gitlab_rails['object_store']['objects']['uploads']['bucket'] = '<uploads>'
    gitlab_rails['object_store']['objects']['packages']['bucket'] = '<packages>'
    gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = '<dependency-proxy>'
    gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = '<terraform-state>'
118
    gitlab_rails['object_store']['objects']['pages']['bucket'] = '<pages>'
119 120
    ```

121
   For GitLab 9.4 or later, if you're using AWS IAM profiles, be sure to omit the
122 123 124
   AWS access key and secret access key/value pairs. For example:

   ```ruby
125
   gitlab_rails['object_store']['connection'] = {
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146
     'provider' => 'AWS',
     'region' => '<eu-central-1>',
     'use_iam_profile' => true
   }
   ```

1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.

**In installations from source:**

1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following lines:

   ```yaml
   object_store:
     enabled: true
     proxy_download: true
     connection:
       provider: AWS
       aws_access_key_id: <AWS_ACCESS_KEY_ID>
       aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
       region: <eu-central-1>
147 148
     storage_options:
       server_side_encryption: <AES256 or aws:kms>
149
       server_side_encryption_key_kms_id: <arn:aws:kms:xxx>
150 151 152 153 154 155 156 157 158 159 160 161 162 163 164
     objects:
       artifacts:
         bucket: <artifacts>
       external_diffs:
         bucket: <external-diffs>
       lfs:
         bucket: <lfs-objects>
       uploads:
         bucket: <uploads>
       packages:
         bucket: <packages>
       dependency_proxy:
         bucket: <dependency_proxy>
       terraform_state:
         bucket: <terraform>
165 166
       pages:
         bucket: <pages>
167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215
   ```

1. Edit `/home/git/gitlab-workhorse/config.toml` and add or amend the following lines:

   ```toml
   [object_storage]
     provider = "AWS"

   [object_storage.s3]
     aws_access_key_id = "<AWS_ACCESS_KEY_ID>"
     aws_secret_access_key = "<AWS_SECRET_ACCESS_KEY>"
   ```

1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.

#### Common parameters

In the consolidated configuration, the `object_store` section defines a
common set of parameters. Here we use the YAML from the source
installation because it's easier to see the inheritance:

```yaml
    object_store:
      enabled: true
      proxy_download: true
      connection:
        provider: AWS
        aws_access_key_id: <AWS_ACCESS_KEY_ID>
        aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
      objects:
        ...
```

The Omnibus configuration maps directly to this:

```ruby
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['proxy_download'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'aws_access_key_id' => '<AWS_ACCESS_KEY_ID',
  'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
}
```

| Setting | Description |
|---------|-------------|
| `enabled` | Enable/disable object storage |
| `proxy_download` | Set to `true` to [enable proxying all files served](#proxy-download). Option allows to reduce egress traffic as this allows clients to download directly from remote storage instead of proxying all data |
216 217
| `connection` | Various [connection options](#connection-settings) described below |
| `storage_options` | Options to use when saving new objects, such as [server side encryption](#server-side-encryption-headers). Introduced in GitLab 13.3 |
218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235
| `objects` | [Object-specific configuration](#object-specific-configuration)

### Connection settings

Both consolidated configuration form and storage-specific configuration form must configure a connection. The following sections describe parameters that can be used
in the `connection` setting.

#### S3-compatible connection settings

The connection settings match those provided by [fog-aws](https://github.com/fog/fog-aws):

| Setting | Description | Default |
|---------|-------------|---------|
| `provider` | Always `AWS` for compatible hosts | `AWS` |
| `aws_access_key_id` | AWS credentials, or compatible | |
| `aws_secret_access_key` | AWS credentials, or compatible | |
| `aws_signature_version` | AWS signature version to use. `2` or `4` are valid options. Digital Ocean Spaces and other providers may need `2`. | `4` |
| `enable_signature_v4_streaming` | Set to `true` to enable HTTP chunked transfers with [AWS v4 signatures](https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html). Oracle Cloud S3 needs this to be `false`. | `true` |
Stan Hu's avatar
Stan Hu committed
236
| `region` | AWS region. | |
237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267
| `host` | S3 compatible host for when not using AWS, e.g. `localhost` or `storage.example.com`. HTTPS and port 443 is assumed. | `s3.amazonaws.com` |
| `endpoint` | Can be used when configuring an S3 compatible service such as [MinIO](https://min.io), by entering a URL such as `http://127.0.0.1:9000`. This takes precedence over `host`. | (optional) |
| `path_style` | Set to `true` to use `host/bucket_name/object` style paths instead of `bucket_name.host/object`. Leave as `false` for AWS S3. | `false` |
| `use_iam_profile` | Set to `true` to use IAM profile instead of access keys | `false`

#### Oracle Cloud S3 connection settings

Note that Oracle Cloud S3 must be sure to use the following settings:

| Setting | Value |
|---------|-------|
| `enable_signature_v4_streaming` | `false` |
| `path_style` | `true` |

If `enable_signature_v4_streaming` is set to `true`, you may see the
following error in `production.log`:

```plaintext
STREAMING-AWS4-HMAC-SHA256-PAYLOAD is not supported
```

#### Google Cloud Storage (GCS)

Here are the valid connection parameters for GCS:

| Setting | Description | example |
|---------|-------------|---------|
| `provider` | The provider name | `Google` |
| `google_project` | GCP project name | `gcp-project-12345` |
| `google_client_email` | The email address of the service account | `foo@gcp-project-12345.iam.gserviceaccount.com` |
| `google_json_key_location` | The JSON key path | `/path/to/gcp-project-12345-abcde.json` |
268
| `google_application_default` | Set to `true` to use [Google Cloud Application Default Credentials](https://cloud.google.com/docs/authentication/production#automatically) to locate service account credentials. |
269

270 271 272
The service account must have permission to access the bucket. Learn more
in Google's
[Cloud Storage authentication documentation](https://cloud.google.com/storage/docs/authentication).
273 274 275 276 277 278 279 280 281 282 283 284 285 286

##### Google example (consolidated form)

For Omnibus installations, this is an example of the `connection` setting:

```ruby
gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_client_email' => '<GOOGLE CLIENT EMAIL>',
  'google_json_key_location' => '<FILENAME>'
}
```

287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313
##### Google example with ADC (consolidated form)

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/275979) in GitLab 13.6.

Google Cloud Application Default Credentials (ADC) are typically
used with GitLab to use the default service account. This eliminates the
need to supply credentials for the instance. For example:

```ruby
gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_application_default' => true
}
```

If you use ADC, be sure that:

- The service account that you use has the
[`iam.serviceAccounts.signBlob` permission](https://cloud.google.com/iam/docs/reference/credentials/rest/v1/projects.serviceAccounts/signBlob).
  Typically this is done by granting the `Service Account Token Creator` role to the service account.
- Your virtual machines have the [correct access scopes to access Google Cloud APIs](https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances#changeserviceaccountandscopes). If the machines do not have the right scope, the error logs may show:

  ```markdown
  Google::Apis::ClientError (insufficientPermissions: Request had insufficient authentication scopes.)
  ```

314 315 316 317 318 319 320 321
#### Azure Blob storage

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/25877) in GitLab 13.4.

Although Azure uses the word `container` to denote a collection of
blobs, GitLab standardizes on the term `bucket`. Be sure to configure
Azure container names in the `bucket` settings.

322 323 324 325 326
Azure Blob storage can only be used with the [consolidated form](#consolidated-object-storage-configuration)
because a single set of credentials are used to access multiple
containers. The [storage-specific form](#storage-specific-configuration)
is not supported. For more details, see [how to transition to consolidated form](#transition-to-consolidated-form).

327 328 329 330 331 332 333 334
The following are the valid connection parameters for Azure. Read the
[Azure Blob storage documentation](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
to learn more.

| Setting | Description | Example |
|---------|-------------|---------|
| `provider` | Provider name | `AzureRM` |
| `azure_storage_account_name` | Name of the Azure Blob Storage account used to access the storage | `azuretest` |
335
| `azure_storage_access_key` | Storage account access key used to access the container. This is typically a secret, 512-bit encryption key encoded in base64. | `czV2OHkvQj9FKEgrTWJRZVRoV21ZcTN0Nnc5eiRDJkYpSkBOY1JmVWpYbjJy\nNHU3eCFBJUQqRy1LYVBkU2dWaw==\n` |
336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352
| `azure_storage_domain` | Domain name used to contact the Azure Blob Storage API (optional). Defaults to `blob.core.windows.net`. Set this if you are using Azure China, Azure Germany, Azure US Government, or some other custom Azure domain. | `blob.core.windows.net` |

##### Azure example (consolidated form)

For Omnibus installations, this is an example of the `connection` setting:

```ruby
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AzureRM',
  'azure_storage_account_name' => '<AZURE STORAGE ACCOUNT NAME>',
  'azure_storage_access_key' => '<AZURE STORAGE ACCESS KEY>',
  'azure_storage_domain' => '<AZURE STORAGE DOMAIN>',
}
```

###### Azure Workhorse settings (source installs only)

Craig Norris's avatar
Craig Norris committed
353 354 355
For source installations, Workhorse also needs to be configured with Azure
credentials. This isn't needed in Omnibus installs, because the Workhorse
settings are populated from the previous settings.
356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372

1. Edit `/home/git/gitlab-workhorse/config.toml` and add or amend the following lines:

   ```toml
   [object_storage]
     provider = "AzureRM"

   [object_storage.azurerm]
     azure_storage_account_name = "<AZURE STORAGE ACCOUNT NAME>"
     azure_storage_access_key = "<AZURE STORAGE ACCESS KEY>"
   ```

If you are using a custom Azure storage domain, note that
`azure_storage_domain` does **not** have to be set in the Workhorse
configuration. This information is exchanged in an API call between
GitLab Rails and Workhorse.

373 374
#### OpenStack-compatible connection settings

Craig Norris's avatar
Craig Norris committed
375 376
Although OpenStack Swift provides S3 compatibility, some users may want to use
the [Swift API](https://docs.openstack.org/swift/latest/api/object_api_v1_overview.html).
377

Craig Norris's avatar
Craig Norris committed
378 379 380 381 382
This isn't compatible with the consolidated object storage form. OpenStack Swift
is supported only with the storage-specific form. If you want to use the
consolidated form, see the [S3 settings](#s3-compatible-connection-settings).

Here are the valid connection settings for the Swift API, provided by
383 384 385 386 387 388 389 390 391 392 393 394 395 396
[fog-openstack](https://github.com/fog/fog-openstack):

| Setting | Description | Default |
|---------|-------------|---------|
| `provider` | Always `OpenStack` for compatible hosts | `OpenStack` |
| `openstack_username` | OpenStack username | |
| `openstack_api_key` | OpenStack API key  | |
| `openstack_temp_url_key` | OpenStack key for generating temporary URLs | |
| `openstack_auth_url` | OpenStack authentication endpoint | |
| `openstack_region` | OpenStack region | |
| `openstack_tenant` | OpenStack tenant ID |

#### Rackspace Cloud Files

Craig Norris's avatar
Craig Norris committed
397 398
The following table describes the valid connection parameters for
Rackspace Cloud, provided by [fog-rackspace](https://github.com/fog/fog-rackspace/).
399

Craig Norris's avatar
Craig Norris committed
400 401
This isn't compatible with the consolidated object storage form.
Rackspace Cloud is supported only with the storage-specific form.
402 403 404 405 406 407

| Setting | Description | example |
|---------|-------------|---------|
| `provider` | The provider name | `Rackspace` |
| `rackspace_username` | The username of the Rackspace account with access to the container | `joe.smith` |
| `rackspace_api_key` | The API key of the Rackspace account with access to the container | `ABC123DEF456ABC123DEF456ABC123DE` |
408 409
| `rackspace_region` | The Rackspace storage region to use, a three letter code from the [list of service access endpoints](https://docs.rackspace.com/docs/cloud-files/v1/general-api-info/service-access/) | `iad` |
| `rackspace_temp_url_key` | The private key you have set in the Rackspace API for [temporary URLs](https://docs.rackspace.com/docs/cloud-files/v1/use-cases/public-access-to-your-cloud-files-account/#tempurl). | `ABC123DEF456ABC123DEF456ABC123DE` |
410

Craig Norris's avatar
Craig Norris committed
411 412 413 414 415 416 417
Regardless of whether the container has public access enabled or disabled, Fog
uses the TempURL method to grant access to LFS objects. If you see error
messages in logs that refer to instantiating storage with a `temp-url-key`,
be sure you have set the key properly both in the Rackspace API and in
`gitlab.rb`. You can verify the value of the key Rackspace has set by sending a
GET request with token header to the service access endpoint URL and comparing
the output of the returned headers.
418 419 420 421 422

### Object-specific configuration

The following YAML shows how the `object_store` section defines
object-specific configuration block and how the `enabled` and
423
`proxy_download` flags can be overridden. The `bucket` is the only
424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446
required parameter within each type:

```yaml
  object_store:
      connection:
        ...
      objects:
        artifacts:
          bucket: artifacts
          proxy_download: false
        external_diffs:
          bucket: external-diffs
        lfs:
          bucket: lfs-objects
        uploads:
          bucket: uploads
        packages:
          bucket: packages
        dependency_proxy:
          enabled: false
          bucket: dependency_proxy
        terraform_state:
          bucket: terraform
447 448
        pages:
          bucket: pages
449 450 451 452 453 454 455 456 457 458 459 460 461 462
```

This maps to this Omnibus GitLab configuration:

```ruby
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'artifacts'
gitlab_rails['object_store']['objects']['artifacts']['proxy_download'] = false
gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = 'external-diffs'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'lfs-objects'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'uploads'
gitlab_rails['object_store']['objects']['packages']['bucket'] = 'packages'
gitlab_rails['object_store']['objects']['dependency_proxy']['enabled'] = false
gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = 'dependency-proxy'
gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = 'terraform-state'
463
gitlab_rails['object_store']['objects']['pages']['bucket'] = 'pages'
464 465 466 467 468 469 470 471 472 473 474 475 476
```

This is the list of valid `objects` that can be used:

|  Type              | Description |
|--------------------|---------------|
| `artifacts`        | [CI artifacts](job_artifacts.md) |
| `external_diffs`   | [Merge request diffs](merge_request_diffs.md) |
| `uploads`          | [User uploads](uploads.md) |
| `lfs`              | [Git Large File Storage objects](lfs/index.md) |
| `packages`         | [Project packages (e.g. PyPI, Maven, NuGet, etc.)](packages/index.md) |
| `dependency_proxy` | [GitLab Dependency Proxy](packages/dependency_proxy.md) |
| `terraform_state`  | [Terraform state files](terraform_state.md) |
477
| `pages`            | [GitLab Pages](pages/index.md) |
478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528

Within each object type, three parameters can be defined:

| Setting          | Required? | Description |
|------------------|-----------|-------------|
| `bucket`         | Yes       | The bucket name for the object storage. |
| `enabled`        | No        | Overrides the common parameter |
| `proxy_download` | No        | Overrides the common parameter |

#### Selectively disabling object storage

As seen above, object storage can be disabled for specific types by
setting the `enabled` flag to `false`. For example, to disable object
storage for CI artifacts:

```ruby
gitlab_rails['object_store']['objects']['artifacts']['enabled'] = false
```

A bucket is not needed if the feature is disabled entirely. For example,
no bucket is needed if CI artifacts are disabled with this setting:

```ruby
gitlab_rails['artifacts_enabled'] = false
```

### Transition to consolidated form

Prior to GitLab 13.2:

- Object storage configuration for all types of objects such as CI/CD artifacts, LFS
  files, upload attachments, and so on had to be configured independently.
- Object store connection parameters such as passwords and endpoint URLs had to be
  duplicated for each type.

For example, an Omnibus GitLab install might have the following configuration:

```ruby
# Original object storage configuration
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_direct_upload'] = true
gitlab_rails['artifacts_object_store_proxy_download'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = 'artifacts'
gitlab_rails['artifacts_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }
gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_direct_upload'] = true
gitlab_rails['uploads_object_store_proxy_download'] = true
gitlab_rails['uploads_object_store_remote_directory'] = 'uploads'
gitlab_rails['uploads_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }
```

Craig Norris's avatar
Craig Norris committed
529
Although this provides flexibility in that it makes it possible for GitLab
530 531 532 533 534
to store objects across different cloud providers, it also creates
additional complexity and unnecessary redundancy. Since both GitLab
Rails and Workhorse components need access to object storage, the
consolidated form avoids excessive duplication of credentials.

Craig Norris's avatar
Craig Norris committed
535 536 537 538
The consolidated object storage configuration is used _only_ if all lines from
the original form is omitted. To move to the consolidated form, remove the
original configuration (for example, `artifacts_object_store_enabled`, or
`uploads_object_store_connection`)
539 540 541 542 543 544 545 546

## Storage-specific configuration

For configuring object storage in GitLab 13.1 and earlier, or for storage types not
supported by consolidated configuration form, refer to the following guides:

|Object storage type|Supported by consolidated configuration?|
|-------------------|----------------------------------------|
547
| [Backups](../raketasks/backup_restore.md#uploading-backups-to-a-remote-cloud-storage) | No |
548
| [Job artifacts](job_artifacts.md#using-object-storage) including archived job logs | Yes |
549
| [LFS objects](lfs/index.md#storing-lfs-objects-in-remote-object-storage) | Yes |
550
| [Uploads](uploads.md#using-object-storage) | Yes |
551
| [Container Registry](packages/container_registry.md#use-object-storage) (optional feature) | No |
552 553
| [Merge request diffs](merge_request_diffs.md#using-object-storage) | Yes |
| [Mattermost](https://docs.mattermost.com/administration/config-settings.html#file-storage)| No |
554
| [Packages](packages/index.md#using-object-storage) (optional feature) | Yes |
555
| [Dependency Proxy](packages/dependency_proxy.md#using-object-storage) (optional feature) **(PREMIUM SELF)** | Yes |
556
| [Pseudonymizer](pseudonymizer.md#configuration) (optional feature) **(ULTIMATE SELF)** | No |
557
| [Autoscale runner caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching) (optional for improved performance) | No |
558
| [Terraform state files](terraform_state.md#using-object-storage) | Yes |
559
| [GitLab Pages content](pages/index.md#using-object-storage) | Yes |
560

561
### Other alternatives to file system storage
562

563
If you're working to [scale out](reference_architectures/index.md) your GitLab implementation,
Suzanne Selhorn's avatar
Suzanne Selhorn committed
564
or add fault tolerance and redundancy, you may be
565
looking at removing dependencies on block or network file systems.
566
See the following additional guides:
567 568 569 570

1. Make sure the [`git` user home directory](https://docs.gitlab.com/omnibus/settings/configuration.html#moving-the-home-directory-for-a-user) is on local disk.
1. Configure [database lookup of SSH keys](operations/fast_ssh_key_lookup.md)
   to eliminate the need for a shared `authorized_keys` file.
571
1. [Prevent local disk usage for job logs](job_logs.md#prevent-local-disk-usage).
572 573 574

## Warnings, limitations, and known issues

575
### Use separate buckets
576

577 578 579 580
Using separate buckets for each data type is the recommended approach for GitLab.
This ensures there are no collisions across the various types of data GitLab stores.
There are plans to [enable the use of a single bucket](https://gitlab.com/gitlab-org/gitlab/-/issues/292958)
in the future.
581

582 583
Helm-based installs require separate buckets to
[handle backup restorations](https://docs.gitlab.com/charts/advanced/external-object-storage/#lfs-artifacts-uploads-packages-external-diffs-pseudonymizer)
584

585
### S3 API compatibility issues
586 587

Not all S3 providers [are fully compatible](../raketasks/backup_restore.md#other-s3-providers)
588
with the Fog library that GitLab uses. Symptoms include an error in `production.log`:
589 590 591 592 593 594 595 596

```plaintext
411 Length Required
```

### Incremental logging is required for CI to use object storage

If you configure GitLab to use object storage for CI logs and artifacts,
597
you can avoid [local disk usage for job logs](job_logs.md#data-flow) by enabling
598
[beta incremental logging](job_logs.md#incremental-logging-architecture).
599 600 601

### Proxy Download

602 603 604 605 606
Clients can download files in object storage by receiving a pre-signed, time-limited URL,
or by GitLab proxying the data from object storage to the client.
Downloading files from object storage directly
helps reduce the amount of egress traffic GitLab
needs to process.
607 608

When the files are stored on local block storage or NFS, GitLab has to act as a proxy.
609
This is not the default behavior with object storage.
610

611
The `proxy_download` setting controls this behavior: the default is generally `false`.
612 613
Verify this in the documentation for each use case. Set it to `true` if you want
GitLab to proxy the files.
614 615 616 617 618 619 620

When not proxying files, GitLab returns an
[HTTP 302 redirect with a pre-signed, time-limited object storage URL](https://gitlab.com/gitlab-org/gitlab/-/issues/32117#note_218532298).
This can result in some of the following problems:

- If GitLab is using non-secure HTTP to access the object storage, clients may generate
`https->http` downgrade errors and refuse to process the redirect. The solution to this
621
is for GitLab to use HTTPS. LFS, for example, generates this error:
622 623 624 625 626

   ```plaintext
   LFS: lfsapi/client: refusing insecure redirect, https->http
   ```

627
- Clients need to trust the certificate authority that issued the object storage
628 629 630 631 632 633
certificate, or may return common TLS errors such as:

   ```plaintext
   x509: certificate signed by unknown authority
   ```

634
- Clients need network access to the object storage.
635 636
Network firewalls could block access.
Errors that might result
637 638 639 640 641 642 643 644 645 646
if this access is not in place include:

   ```plaintext
   Received status code 403 from server: Forbidden
   ```

Getting a `403 Forbidden` response is specifically called out on the
[package repository documentation](packages/index.md#using-object-storage)
as a side effect of how some build tools work.

647 648 649 650
Additionally for a short time period users could share pre-signed, time-limited object storage URLs
with others without authentication. Also bandwidth charges may be incurred
between the object storage provider and the client.

651 652 653 654 655 656 657
### ETag mismatch

Using the default GitLab settings, some object storage back-ends such as
[MinIO](https://gitlab.com/gitlab-org/gitlab/-/issues/23188)
and [Alibaba](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/1564)
might generate `ETag mismatch` errors.

658 659
If you are seeing this ETag mismatch error with Amazon Web Services S3,
it's likely this is due to [encryption settings on your bucket](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html).
660 661 662 663
To fix this issue, you have two options:

- [Use the consolidated object configuration](#consolidated-object-storage-configuration).
- [Use Amazon instance profiles](#using-amazon-instance-profiles).
664

665
The first option is recommended for MinIO. Otherwise, the
666 667 668
[workaround for MinIO](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/1564#note_244497658)
is to use the `--compat` parameter on the server.

669
Without consolidated object store configuration or instance profiles enabled,
670
GitLab Workhorse uploads files to S3 using pre-signed URLs that do
671 672 673 674 675 676 677 678 679
not have a `Content-MD5` HTTP header computed for them. To ensure data
is not corrupted, Workhorse checks that the MD5 hash of the data sent
equals the ETag header returned from the S3 server. When encryption is
enabled, this is not the case, which causes Workhorse to report an `ETag
mismatch` error during an upload.

With the consolidated object configuration and instance profile, Workhorse has
S3 credentials so that it can compute the `Content-MD5` header. This
eliminates the need to compare ETag headers returned from the S3 server.
680 681 682 683 684 685

### Using Amazon instance profiles

Instead of supplying AWS access and secret keys in object storage
configuration, GitLab can be configured to use IAM roles to set up an
[Amazon instance profile](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).
686
When this is used, GitLab fetches temporary credentials each time an
687 688 689 690 691
S3 bucket is accessed, so no hard-coded values are needed in the
configuration.

#### Encrypted S3 buckets

692 693
> - Introduced in [GitLab 13.1](https://gitlab.com/gitlab-org/gitlab-workhorse/-/merge_requests/466) for instance profiles only and [S3 default encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/bucket-encryption.html).
> - Introduced in [GitLab 13.2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34460) for static credentials when [consolidated object storage configuration](#consolidated-object-storage-configuration) and [S3 default encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/bucket-encryption.html) are used.
694

695
When configured either with an instance profile or with the consolidated
696 697
object configuration, GitLab Workhorse properly uploads files to S3
buckets that have [SSE-S3 or SSE-KMS encryption enabled by
698
default](https://docs.aws.amazon.com/kms/latest/developerguide/services-s3.html).
699 700 701 702 703 704 705 706 707 708 709 710 711 712 713
Note that customer master keys (CMKs) and SSE-C encryption are [not
supported since this requires sending the encryption keys in every request](https://gitlab.com/gitlab-org/gitlab/-/issues/226006).

##### Server-side encryption headers

> Introduced in [GitLab 13.3](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/38240).

Setting a default encryption on an S3 bucket is the easiest way to
enable encryption, but you may want to [set a bucket policy to ensure
only encrypted objects are uploaded](https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-store-kms-encrypted-objects/).
To do this, you must configure GitLab to send the proper encryption headers
in the `storage_options` configuration section:

|            Setting                  | Description |
|-------------------------------------|-------------|
714
| `server_side_encryption`            | Encryption mode (`AES256` or `aws:kms`) |
715 716 717 718 719 720 721 722 723
| `server_side_encryption_kms_key_id` | Amazon Resource Name. Only needed when `aws:kms` is used in `server_side_encryption`. See the [Amazon documentation on using KMS encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html) |

As with the case for default encryption, these options only work when
the Workhorse S3 client is enabled. One of the following two conditions
must be fulfilled:

- `use_iam_profile` is `true` in the connection settings.
- Consolidated object storage settings are in use.

724
[ETag mismatch errors](#etag-mismatch) occur if server side
725
encryption headers are used without enabling the Workhorse S3 client.
726

727
##### Disabling the feature
728

729
The Workhorse S3 client is enabled by default when the
730 731
[`use_iam_profile` configuration option](#iam-permissions) is set to `true` or consolidated
object storage settings are configured.
732

733 734 735
The feature can be disabled using the `:use_workhorse_s3_client` feature flag. To disable the
feature, ask a GitLab administrator with
[Rails console access](feature_flags.md#how-to-enable-and-disable-features-behind-flags) to run the
736 737 738 739 740
following command:

```ruby
Feature.disable(:use_workhorse_s3_client)
```
741

Evan Read's avatar
Evan Read committed
742
#### IAM Permissions
743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768

To set up an instance profile:

1. Create an Amazon Identity Access and Management (IAM) role with the necessary permissions. The
   following example is a role for an S3 bucket named `test-bucket`:

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "VisualEditor0",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:DeleteObject"
               ],
               "Resource": "arn:aws:s3:::test-bucket/*"
           }
       ]
   }
   ```

1. [Attach this role](https://aws.amazon.com/premiumsupport/knowledge-center/attach-replace-ec2-instance-profile/)
   to the EC2 instance hosting your GitLab instance.
769
1. Configure GitLab to use it via the `use_iam_profile` configuration option.
Evan Read's avatar
Evan Read committed
770 771 772 773 774 775 776 777 778 779 780 781 782 783 784

### Multi-threaded copying

GitLab uses the [S3 Upload Part Copy API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPartCopy.html)
to accelerate the copying of files within a bucket. Ceph S3 [prior to Kraken 11.0.2](https://ceph.com/releases/kraken-11-0-2-released/)
does not support this and [returns a 404 error when files are copied during the upload process](https://gitlab.com/gitlab-org/gitlab/-/issues/300604).

The feature can be disabled using the `:s3_multithreaded_uploads`
feature flag. To disable the feature, ask a GitLab administrator with
[Rails console access](feature_flags.md#how-to-enable-and-disable-features-behind-flags)
to run the following command:

```ruby
Feature.disable(:s3_multithreaded_uploads)
```