Commit a65f0883 authored by Katie Hockman's avatar Katie Hockman

doc: add section for checksum database to module reference doc

Updates #33637

Change-Id: Ia782b3fdc5a8873606b96120a34c9bf194a1a346
Reviewed-on: https://go-review.googlesource.com/c/go/+/211197Reviewed-by: default avatarJay Conrod <jayconrod@google.com>
parent 73b657e9
......@@ -276,6 +276,10 @@ list. The `go` command does not fall back to later proxies in response to other
4xx and 5xx errors. This allows a proxy to act as a gatekeeper, for example, by
responding with error 403 (Forbidden) for modules not on an approved list.
<!-- TODO(katiehockman): why only fall back for 410/404? Either add the details
here, or write a blog post about how to build multiple types of proxies. e.g.
a "privacy preserving one" and an "authorization one" -->
The table below specifies queries that a module proxy must respond to. For each
path, `$base` is the path portion of a proxy URL,`$module` is a module path, and
`$version` is a version. For example, if the proxy URL is
......@@ -425,12 +429,170 @@ setting `GOPROXY` to `https://example.com/proxy`.
<a id="authenticating"></a>
## Authenticating modules
<!-- TODO: continue this section -->
When deciding whether to trust the source code for a module version just
fetched from a proxy or origin server, the `go` command first consults the
`go.sum` lines in the `go.sum` file of the current module. If the `go.sum` file
does not contain an entry for that module version, then it may consult the
checksum database.
<a id="go.sum-file-format"></a>
### go.sum file format
<a id="checksum-database"></a>
### Checksum database
The checksum database is a global source of `go.sum` lines. The `go` command can
use this in many situations to detect misbehavior by proxies or origin servers.
The checksum database allows for global consistency and reliability for all
publicly available module versions. It makes untrusted proxies possible since
they can't serve the wrong code without it going unnoticed. It also ensures
that the bits associated with a specific version do not change from one day to
the next, even if the module's author subsequently alters the tags in their
repository.
The checksum database is served by [sum.golang.org](https://sum.golang.org),
which is run by Google. It is a [Transparent
Log](https://research.swtch.com/tlog) (or “Merkle Tree”) of `go.sum` line
hashes, which is backed by [Trillian](https://github.com/google/trillian). The
main advantage of a Merkle tree is that independent auditors can verify that it
hasn't been tampered with, so it is more trustworthy than a simple database.
The `go` command interacts with the checksum database using the protocol
originally outlined in [Proposal: Secure the Public Go Module
Ecosystem](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md#checksum-database).
The table below specifies queries that the checksum database must respond to.
For each path, `$base` is the path portion of the checksum database URL,
`$module` is a module path, and `$version` is a version. For example, if the
checksum database URL is `https://sum.golang.org`, and the client is requesting
the record for the module `golang.org/x/text` at version `v0.3.2`, the client
would send a `GET` request for
`https://sum.golang.org/lookup/golang.org/x/text@v0.3.2`.
To avoid ambiguity when serving from case-insensitive file systems,
the `$module` and `$version` elements are
[case-encoded](https://pkg.go.dev/golang.org/x/mod/module#EscapePath)
by replacing every uppercase letter with an exclamation mark followed by the
corresponding lower-case letter. This allows modules `example.com/M` and
`example.com/m` to both be stored on disk, since the former is encoded as
`example.com/!m`.
Parts of the path surrounded by square brakets, like `[.p/$W]` denote optional
values.
<table>
<thead>
<tr>
<th>Path</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>$base/latest</code></td>
<td>
Returns a signed, encoded tree description for the latest log. This
signed description is in the form of a
<a href="https://pkg.go.dev/golang.org/x/mod/sumdb/note">note</a>,
which is text that has been signed by one or more server keys and can
be verified using the server's public key. The tree description
provides the size of the tree and the hash of the tree head at that
size. This encoding is described in
<code><a href="https://pkg.go.dev/golang.org/x/mod/sumdb/tlog#FormatTree">
golang.org/x/mod/sumdb/tlog#FormatTree</a></code>.
</td>
</tr>
<tr>
<tr>
<td><code>$base/lookup/$module@$version</code></td>
<td>
Returns the log record number for the entry about <code>$module</code>
at <code>$version</code>, followed by the data for the record (that is,
the <code>go.sum</code> lines for <code>$module</code> at
<code>$version</code>) and a signed, encoded tree description that
contains the record.
</td>
</tr>
<tr>
<tr>
<td><code>$base/tile/$H/$L/$K[.p/$W]</code></td>
<td>
Returns a [log tile](https://research.swtch.com/tlog#serving_tiles),
which is a set of hashes that make up a section of the log. Each tile
is defined in a two-dimensional coordinate at tile level
<code>$L</code>, <code>$K</code>th from the left, with a tile height of
<code>$H</code>. The optional <code>.p/$W</code> suffix indicates a
partial log tile with only <code>$W</code> hashes. Clients must fall
back to fetching the full tile if a partial tile is not found.
</td>
</tr>
<tr>
<tr>
<td><code>$base/tile/$H/data/$K[.p/$W]</code></td>
<td>
Returns the record data for the leaf hashes in
<code>/tile/$H/0/$K[.p/$W]</code> (with a literal <code>data</code> path
element).
</td>
</tr>
<tr>
</tbody>
</table>
If the `go` command consults the checksum database, then the first
step is to retrieve the record data through the `/lookup` endpoint. If the
module version is not yet recorded in the log, the checksum database will try
to fetch it from the origin server before replying. This `/lookup` data
provides the sum for this module version as well as its position in the log,
which informs the client of which tiles should be fetched to perform proofs.
The `go` command performs “inclusion” proofs (that a specific record exists in
the log) and “consistency” proofs (that the tree hasn’t been tampered with)
before adding new `go.sum` lines to the main module’s `go.sum` file. It's
important that the data from `/lookup` should never be used without first
authenticating it against the signed tree hash and authenticating the signed
tree hash against the client's timeline of signed tree hashes.
Signed tree hashes and new tiles served by the checksum database are stored
in the module cache, so the `go` command only needs to fetch tiles that are
missing.
The `go` command doesn't need to directly connect to the checksum database. It
can request module sums via a module proxy that
[mirrors the checksum database](https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md#proxying-a-checksum-database)
and supports the protocol above. This can be particularly helpful for private,
corporate proxies which block requests outside the organization.
The `GOSUMDB` environment variable identifies the name of checksum database to use
and optionally its public key and URL, as in:
```
GOSUMDB="sum.golang.org"
GOSUMDB="sum.golang.org+<publickey>"
GOSUMDB="sum.golang.org+<publickey> https://sum.golang.org"
```
The `go` command knows the public key of `sum.golang.org`, and also that the
name `sum.golang.google.cn` (available inside mainland China) connects to the
`sum.golang.org` checksum database; use of any other database requires giving
the public key explicitly. The URL defaults to `https://` followed by the
database name.
`GOSUMDB` defaults to `sum.golang.org`, the Go checksum database run by Google.
See https://sum.golang.org/privacy for the service's privacy policy.
If `GOSUMDB` is set to `off`, or if `go get` is invoked with the `-insecure`
flag, the checksum database is not consulted, and all unrecognized modules are
accepted, at the cost of giving up the security guarantee of verified
repeatable downloads for all modules. A better way to bypass the checksum
database for specific modules is to use the `GOPRIVATE` or `GONOSUMDB`
environment variables. See [Private Modules](#private-modules) for details.
The `go env -w` command can be used to
[set these variables](/pkg/cmd/go/#hdr-Print_Go_environment_information)
for future `go` command invocations.
<a id="privacy"></a>
## Privacy
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment