Commit 9c2f62cf authored by Stan Hu

Support Workhorse directly uploading files to S3

This supports the AWS S3 client that will be used by Workhorse in
https://gitlab.com/gitlab-org/gitlab-workhorse/-/merge_requests/466.

This makes it possible to use S3 buckets with default KMS encryption and
proper MD5 checksums.

The Workhorse S3 client is only enabled for instance profiles and V4
signatures.

Since instance profiles are an AWS-specific feature, we should be
reasonably confident that object storage will work with the Workhorse
AWS S3 client.
parent 9af9ec98
---
title: Support Workhorse directly uploading files to S3
merge_request: 29389
author:
type: added
@@ -141,10 +141,88 @@ Using the default GitLab settings, some object storage back-ends such as
and [Alibaba](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/1564)
might generate `ETag mismatch` errors.

If you are seeing this ETag mismatch error with Amazon Web Services S3,
it's likely this is due to [encryption settings on your bucket](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html).
See the section on [using Amazon instance profiles](#using-amazon-instance-profiles) for how to fix this issue.

When using GitLab direct upload, the
[workaround for MinIO](https://gitlab.com/gitlab-org/charts/gitlab/-/issues/1564#note_244497658)
is to use the `--compat` parameter on the server.

We are working on a fix to the [GitLab Workhorse
component](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/222).

### Using Amazon instance profiles

Instead of supplying AWS access and secret keys in object storage
configuration, GitLab can be configured to use IAM roles to set up an
[Amazon instance profile](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html).
When this is used, GitLab will fetch temporary credentials each time an
S3 bucket is accessed, so no hard-coded values are needed in the
configuration.
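
For illustration, the credential lookup works roughly like the following
Ruby sketch (an assumption based on the standard EC2 metadata endpoint;
GitLab performs this lookup through its storage client rather than with
this exact code):

```ruby
require 'net/http'
require 'json'

# EC2 instance metadata endpoint that serves instance-profile credentials.
METADATA_URL = 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'

# The endpoint first returns the name of the IAM role attached to the instance.
role = Net::HTTP.get(URI(METADATA_URL))

# Requesting that role name returns short-lived credentials and their expiry.
creds = JSON.parse(Net::HTTP.get(URI("#{METADATA_URL}#{role}")))

creds['AccessKeyId']     # temporary access key
creds['SecretAccessKey'] # temporary secret key
creds['Token']           # session token sent with each request
creds['Expiration']      # when fresh credentials must be fetched
```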
#### Encrypted S3 buckets

> Introduced in [GitLab 13.1](https://gitlab.com/gitlab-org/gitlab-workhorse/-/merge_requests/466) only for instance profiles.

When configured to use an instance profile, GitLab Workhorse
will properly upload files to S3 buckets that have [SSE-S3 or SSE-KMS
encryption enabled by default](https://docs.aws.amazon.com/kms/latest/developerguide/services-s3.html).
Note that customer master keys (CMKs) and SSE-C encryption are not yet
supported, since this would require supplying keys to the GitLab
configuration.
Without instance profiles enabled (or prior to GitLab 13.1), GitLab
Workhorse will upload files to S3 using pre-signed URLs that do not have
a `Content-MD5` HTTP header computed for them. To ensure data is not
corrupted, Workhorse checks that the MD5 hash of the data sent equals
the ETag header returned from the S3 server. When encryption is enabled,
this is not the case, which causes Workhorse to report an `ETag
mismatch` error during an upload.

With instance profiles enabled, GitLab Workhorse uses an AWS S3 client
that properly computes and sends the `Content-MD5` header to the server,
which eliminates the need for comparing ETag headers. If the data is
corrupted in transit, the S3 server will reject the file.
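
The difference between the two paths can be sketched in a few lines of
Ruby; the payload and ETag below are hypothetical placeholders:

```ruby
require 'digest'
require 'base64'

data = 'file contents'                       # hypothetical upload payload
etag = '"d41d8cd98f00b204e9800998ecf8427e"'  # hypothetical ETag from the S3 response

# Pre-signed URL path: Workhorse compares the MD5 of the data it sent
# with the returned ETag. With SSE-S3 or SSE-KMS enabled, the ETag is no
# longer a plain MD5 of the object, so this check fails with `ETag mismatch`.
etag_matches = Digest::MD5.hexdigest(data) == etag.delete('"')

# Workhorse S3 client path: a Content-MD5 header (Base64 of the raw MD5
# digest, per RFC 1864) accompanies the upload, so the server verifies
# integrity itself and rejects corrupted data; no ETag comparison is needed.
content_md5 = Base64.strict_encode64(Digest::MD5.digest(data))
```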
#### IAM Permissions

To set up an instance profile, create an Amazon Identity and Access
Management (IAM) role with the necessary permissions. The following
example policy grants access to an S3 bucket named `test-bucket`:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::test-bucket/*"
    }
  ]
}
```

Associate this role with your GitLab instance, and then configure GitLab
to use it via the `use_iam_profile` configuration option. For example,
when configuring uploads to use object storage, see the `AWS IAM profiles`
section in [S3 compatible connection settings](uploads.md#s3-compatible-connection-settings).
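
For example, on an Omnibus installation the uploads connection might look
like this in `/etc/gitlab/gitlab.rb` (a minimal sketch; the bucket name
and region are placeholders):

```ruby
gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_remote_directory'] = 'test-bucket'
gitlab_rails['uploads_object_store_connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  # No aws_access_key_id or aws_secret_access_key: with use_iam_profile
  # enabled, credentials come from the instance profile.
  'use_iam_profile' => true
}
```
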
#### Disabling the feature

The Workhorse S3 client is only enabled when the `use_iam_profile`
configuration flag is `true`.

To disable this feature, ask a GitLab administrator with [Rails console access](feature_flags.md#how-to-enable-and-disable-features-behind-flags) to run the
following command:
```ruby
Feature.disable(:use_workhorse_s3_client)
```
@@ -46,7 +46,7 @@ module ObjectStorage
        MultipartUpload: multipart_upload_hash,
        CustomPutHeaders: true,
        PutHeaders: upload_options
      }.merge(workhorse_client_hash).compact
    end

    def multipart_upload_hash
@@ -60,6 +60,32 @@ module ObjectStorage
      }
    end
    # Extra parameters passed to Workhorse so it can upload the file to S3
    # with its own client instead of a pre-signed URL.
    def workhorse_client_hash
      return {} unless aws?

      {
        UseWorkhorseClient: use_workhorse_s3_client?,
        RemoteTempObjectID: object_name,
        ObjectStorage: {
          Provider: 'AWS',
          S3Config: {
            Bucket: bucket_name,
            Region: credentials[:region],
            Endpoint: credentials[:endpoint],
            PathStyle: credentials.fetch(:path_style, false),
            UseIamProfile: credentials.fetch(:use_iam_profile, false)
          }
        }
      }
    end

    # The Workhorse client requires an instance profile and V4 signatures.
    def use_workhorse_s3_client?
      Feature.enabled?(:use_workhorse_s3_client, default_enabled: true) &&
        credentials.fetch(:use_iam_profile, false) &&
        # The Golang AWS SDK does not support V2 signatures
        credentials.fetch(:aws_signature_version, 4).to_i >= 4
    end
    def provider
      credentials[:provider].to_s
    end
...
@@ -3,11 +3,17 @@
require 'spec_helper'

describe ObjectStorage::DirectUpload do
  let(:region) { 'us-east-1' }
  let(:path_style) { false }
  let(:use_iam_profile) { false }
  let(:credentials) do
    {
      provider: 'AWS',
      aws_access_key_id: 'AWS_ACCESS_KEY_ID',
      aws_secret_access_key: 'AWS_SECRET_ACCESS_KEY',
      region: region,
      path_style: path_style,
      use_iam_profile: use_iam_profile
    }
  end
@@ -57,6 +63,62 @@ describe ObjectStorage::DirectUpload do
  describe '#to_hash' do
    subject { direct_upload.to_hash }

    shared_examples 'a valid S3 upload' do
      it_behaves_like 'a valid upload'

      it 'sets Workhorse client data' do
        expect(subject[:UseWorkhorseClient]).to eq(use_iam_profile)
        expect(subject[:RemoteTempObjectID]).to eq(object_name)

        object_store_config = subject[:ObjectStorage]
        expect(object_store_config[:Provider]).to eq 'AWS'

        s3_config = object_store_config[:S3Config]
        expect(s3_config[:Bucket]).to eq(bucket_name)
        expect(s3_config[:Region]).to eq(region)
        expect(s3_config[:PathStyle]).to eq(path_style)
        expect(s3_config[:UseIamProfile]).to eq(use_iam_profile)
      end

      context 'when feature flag is disabled' do
        before do
          stub_feature_flags(use_workhorse_s3_client: false)
        end

        it 'does not enable Workhorse client' do
          expect(subject[:UseWorkhorseClient]).to be false
        end
      end

      context 'when V2 signatures are used' do
        before do
          credentials[:aws_signature_version] = 2
        end

        it 'does not enable Workhorse client' do
          expect(subject[:UseWorkhorseClient]).to be false
        end
      end

      context 'when V4 signatures are used' do
        before do
          credentials[:aws_signature_version] = 4
        end

        it 'enables the Workhorse client for instance profiles' do
          expect(subject[:UseWorkhorseClient]).to eq(use_iam_profile)
        end
      end
    end

    shared_examples 'a valid Google upload' do
      it_behaves_like 'a valid upload'

      it 'does not set Workhorse client data' do
        expect(subject.keys).not_to include(:UseWorkhorseClient, :RemoteTempObjectID, :ObjectStorage)
      end
    end

    shared_examples 'a valid upload' do
      it "returns valid structure" do
        expect(subject).to have_key(:Timeout)
@@ -97,6 +159,16 @@ describe ObjectStorage::DirectUpload do
      end
    end

    shared_examples 'a valid S3 upload without multipart data' do
      it_behaves_like 'a valid S3 upload'
      it_behaves_like 'a valid upload without multipart data'
    end

    shared_examples 'a valid S3 upload with multipart data' do
      it_behaves_like 'a valid S3 upload'
      it_behaves_like 'a valid upload with multipart data'
    end
    shared_examples 'a valid upload without multipart data' do
      it_behaves_like 'a valid upload'
@@ -109,13 +181,50 @@ describe ObjectStorage::DirectUpload do
    context 'when length is known' do
      let(:has_length) { true }

      it_behaves_like 'a valid S3 upload without multipart data'

      context 'when path style is true' do
        let(:path_style) { true }
        let(:storage_url) { 'https://s3.amazonaws.com/uploads' }

        before do
          stub_object_storage_multipart_init(storage_url, "myUpload")
        end

        it_behaves_like 'a valid S3 upload without multipart data'
      end

      context 'when IAM profile is true' do
        let(:use_iam_profile) { true }
        let(:iam_credentials_url) { "http://169.254.169.254/latest/meta-data/iam/security-credentials/" }
        let(:iam_credentials) do
          {
            'AccessKeyId' => 'dummykey',
            'SecretAccessKey' => 'dummysecret',
            'Token' => 'dummytoken',
            'Expiration' => 1.day.from_now.xmlschema
          }
        end

        before do
          stub_request(:get, iam_credentials_url)
            .to_return(status: 200, body: "somerole", headers: {})
          stub_request(:get, "#{iam_credentials_url}somerole")
            .to_return(status: 200, body: iam_credentials.to_json, headers: {})
        end

        it_behaves_like 'a valid S3 upload without multipart data'
      end
    end
    context 'when length is unknown' do
      let(:has_length) { false }

      it_behaves_like 'a valid S3 upload with multipart data' do
        before do
          stub_object_storage_multipart_init(storage_url, "myUpload")
        end

        context 'when maximum upload size is 10MB' do
          let(:maximum_size) { 10.megabyte }
@@ -169,12 +278,14 @@ describe ObjectStorage::DirectUpload do
    context 'when length is known' do
      let(:has_length) { true }

      it_behaves_like 'a valid Google upload'
      it_behaves_like 'a valid upload without multipart data'
    end

    context 'when length is unknown' do
      let(:has_length) { false }

      it_behaves_like 'a valid Google upload'
      it_behaves_like 'a valid upload without multipart data'
    end
  end
...