Commit 970d81e2 authored by Kirill Smelkov

*: Fix memory corruptions caused by improper git2go usage

Alain reports that lab.nexedi.com backup restoration sometimes fails with an error like

    ...
    # file gitlab/misc -> .../srv/backup/backup-gitlab.git/gitlab-backup.Pj0fpp/gitlab_backup/db/database.pgdump/7630.dat/7630.dat.ry main.cmd_restore: main.blob_to_file: write .../srv/backup/backup-gitlab.git/gitlab-backup.Pj0fpp/gitlab_backup/db/database.pgdump/7630.dat/7630.dat.ry: bad address

which means that the write system call invoked by writefile at the tail of blob_to_file returned EFAULT.

The blob_to_file function is organized approximately like this:

    blob_to_file(blob_sha1, path) {
        blob = ReadObject(blob_sha1, git.ObjectBlob)
        blob_content = blob.Data()
        writefile(path, blob_content)
    }

and getting EFAULT inside writefile means that blob_content points to
some unmapped memory.

How could that be?

The answer is that blob.Data(), as implemented by git2go, returns a []byte
that points to Cgo memory owned by the blob object, and the blob object
has a finalizer that frees that memory, which sometimes leads the libc
allocator to also return the freed region completely back to the OS via
munmap:

    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L345-L359
    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L162-L177
    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L322-L325

and if that happens we see the EFAULT, but if no munmap happens we may
be saving corrupt data to the restored file.
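
The aliasing hazard described above can be illustrated without cgo. The
following is only a minimal simulation: fakeBlob, its free method and the
0xDB poison byte are hypothetical stand-ins, and free is called explicitly
here, whereas in git2go the release is triggered by the finalizer at a
moment the caller does not control.

```go
package main

import "fmt"

// fakeBlob is a hypothetical stand-in for a git2go object that owns C memory.
type fakeBlob struct {
	buf []byte // plays the role of memory owned by libgit2
}

// Data returns a slice that aliases the blob's own memory, without copying.
func (b *fakeBlob) Data() []byte { return b.buf }

// free simulates the release of the memory: after it, the old content
// must be considered gone (here it is overwritten with a poison byte).
func (b *fakeBlob) free() {
	for i := range b.buf {
		b.buf[i] = 0xDB
	}
}

// aliasAfterFree returns what an alias and a copy, both taken before the
// release, look like after the release.
func aliasAfterFree() (alias, clone []byte) {
	blob := &fakeBlob{buf: []byte("hello")}
	alias = blob.Data()                   // shares memory with blob
	clone = append([]byte(nil), alias...) // independent copy
	blob.free()                           // simulated finalizer
	return alias, clone
}

func main() {
	alias, clone := aliasAfterFree()
	fmt.Printf("alias=%q clone=%q\n", alias, clone)
}
```

In this simulation the alias observes the poisoned bytes while the copy
keeps the original content, which mirrors the "corrupt data without
EFAULT" case.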

OdbObject.Data even has a comment about that, saying that one needs to
keep the object alive for as long as the retrieved data is used:

    // Data returns a slice pointing to the unmanaged object memory. You must make
    // sure the object is referenced for at least as long as the slice is used.
    func (object *OdbObject) Data() (data []byte) {

but this comment was added in 2017 in https://github.com/libgit2/git2go/commit/55a109614151
as part of https://github.com/libgit2/git2go/pull/393 while doing
"KeepAlive all the things" to fix segmentation faults and other
misbehaviours.

I missed all that because we switched blob_to_file from `git cat-file`
to git2go in 2016 in fbd72c02 (Switch file_to_blob() and blob_to_file()
to work without spawning Git subprocesses) and have not actively worked
on that part of the code since. For reference, git2go was introduced
to git-backup on that same day in 2016 in 624393db (Hook in
git2go  (cgo bindings to libgit2)).

The memory corruption inside blob_to_file can be reliably reproduced
by injecting the following patch:

    blob_to_file(blob_sha1, path) {
        blob = ReadObject(blob_sha1, git.ObjectBlob)
        blob_content = blob.Data()
   +    runtime.GC()
        writefile(path, blob_content)
    }

which leads to e.g. the following test failure at every test run:

    === RUN   TestPullRestore
    ...
    # file b1	-> /tmp/t-git-backup2575257088/1/symlink.file
        git-backup_test.go:109: git-backup_test.go:297: lab.nexedi.com/kirr/git-backup.cmd_restore: lab.nexedi.com/kirr/git-backup.blob_to_file: symlink ^D<80><8c>þ^@^@^2h space + α /tmp/t-git-backup2575257088/1/symlink.file: invalid argument

and the memory corruption can be fixed reliably by adding a proper
runtime.KeepAlive so that the blob object assuredly stays alive during
the writefile call:

    blob_to_file(blob_sha1, path) {
        blob = ReadObject(blob_sha1, git.ObjectBlob)
        blob_content = blob.Data()
        writefile(path, blob_content)
   +    runtime.KeepAlive(blob)
    }

However, going through the git2go code one can see that it is full of
Go <-> C interactions, and given that there is a track record of
many crashes due to not getting lifetime management right (see
e.g. https://github.com/libgit2/git2go/issues/352, https://github.com/libgit2/git2go/issues/334,
https://github.com/libgit2/git2go/issues/553, https://github.com/libgit2/git2go/issues/513,
https://github.com/libgit2/git2go/issues/373, https://github.com/libgit2/git2go/pull/387
and once again https://github.com/libgit2/git2go/pull/393), there is no
guarantee that no other similar issue exists anywhere else
besides OdbObject.Data().

With that we either need to put a lot of runtime.KeepAlive calls after
every interaction with git2go, and place them properly; or switch back
to `git cat-file` and similar things, reverting fbd72c02 and friends;
or do something else.

As fbd72c02 explains, switching back to `git cat-file` would slow down
file restoration by an order of magnitude. Putting runtime.KeepAlive
everywhere is also not practical because it is hard to see all the places
where we interact with git2go, even indirectly, and so it is easy to make
mistakes.

-> Thus let's keep the code that interacts with git2go well localized
   (done by the previous patch), and let's make a copy of every string or
   []byte object we receive from git2go, followed by a careful
   runtime.KeepAlive right after the copy.

This fixes the problem of blob_to_file data corruption and it should fix
all other potential memory corruption problems we might ever have with
git2go due to potentially improper usage on the git-backup side.

The copy cost is small compared to the cost of either spawning e.g. `git
cat-file` for every object, or interacting with a `git cat-file --batch`
server spawned once, which still spends context switches on every request
and still makes a copy on socket or pipe transfer. But most of all, the
copy cost is negligible compared to the cost of catching hard-to-reproduce
crashes or data corruptions in the production environment.

For reference, the time it takes to restore the "files" part of the
lab.nexedi.com backup was ~ 1m51s before this patch and became ~ 1m55s
after it, indicating a ~ 3.5% slowdown for that part. This is
noticeable but not big, and since most of the time is spent
on git pack restoration, which takes much longer than files, those
several seconds of slowdown become completely negligible.

/reported-by @alain.takoudjou, @tomo
/reported-at https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK/view?list_start=15&reset=1#2074747282
/cc @jerome, @rafael
parent 86f6afce
@@ -156,11 +156,16 @@ func file_to_blob(g *git.Repository, path string) (Sha1, uint32) {
}
// blob_sha1, mode -> file
var tblob_to_file_mid_hook func()
func blob_to_file(g *git.Repository, blob_sha1 Sha1, mode uint32, path string) {
blob, err := ReadObject(g, blob_sha1, git.ObjectBlob)
exc.Raiseif(err)
blob_content := blob.Data()
if tblob_to_file_mid_hook != nil {
tblob_to_file_mid_hook() // we used to corrupt memory if GC is invoked right here
}
err = os.MkdirAll(pathpkg.Dir(path), 0777)
exc.Raiseif(err)
@@ -27,6 +27,7 @@ import (
"os/exec"
"path/filepath"
"regexp"
"runtime"
"strings"
"syscall"
"testing"
@@ -447,3 +448,11 @@ func TestRepoRefSplit(t *testing.T) {
}
}
}
// blob_to_file used to corrupt memory if GC triggers inside it
func init() {
	tblob_to_file_mid_hook = func() {
		runtime.GC()
	}
}
// Copyright (C) 2025 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.

//go:build !go1.20
// +build !go1.20

package git

func bytesClone(b []byte) []byte {
	if b == nil {
		return nil
	}
	b2 := make([]byte, len(b))
	copy(b2, b)
	return b2
}
// Copyright (C) 2025 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.

//go:build go1.20
// +build go1.20

package git

import (
	"bytes"
)

func bytesClone(b []byte) []byte {
	return bytes.Clone(b)
}
@@ -17,13 +17,45 @@
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.
// Package internal/git wraps package git2go.
// Package internal/git wraps package git2go with providing unconditional safety.
//
// For example git2go.Object.Data() returns []byte that aliases unsafe memory
// that can go away from under []byte if original Object is garbage collected.
// The following code snippet is thus _not_ correct:
//
// obj = odb.Read(sha1)
// data = obj.Data()
// ... use data
//
// because obj can be garbage-collected right after `data = obj.Data()` but
// before `use data` leading to either crashes or memory corruption. A
// runtime.KeepAlive(obj) needs to be added to the end of the snippet - after
// `use data` - to make that code correct.
//
// Given that obj.Data() is not "speaking" by itself as unsafe, and that there
// are many similar methods, it is hard to see which places in the code needs
// special attention.
//
// For this reason git-backup took decision to localize git2go-related code in
// one small place here, and to expose only safe things to outside. That is we
// make data copies when reading object data and similar things to provide
// unconditional safety to the caller via that copy cost.
//
// The copy cost is smaller compared to the cost of either spawning e.g. `git
// cat-file` for every object, or interacting with `git cat-file --batch`
// server spawned once, but still spending context switches on every request
// and still making the copy on socket or pipe transfer. But most of all the
// copy cost is negligible to the cost of catching hard to reproduce crashes or
// data corruptions in the production environment.
package git
import (
"runtime"
git2go "github.com/libgit2/git2go/v31"
)
// constants are safe to propagate as is.
const (
ObjectAny = git2go.ObjectAny
ObjectInvalid = git2go.ObjectInvalid
@@ -34,39 +66,49 @@ const (
)
// types that are safe to propagate as is.
type (
ObjectType = git2go.ObjectType
Oid = git2go.Oid
Signature = git2go.Signature
TreeEntry = git2go.TreeEntry
ObjectType = git2go.ObjectType // int
Oid = git2go.Oid // [20]byte ; cloned when retrieved
Signature = git2go.Signature // struct with strings ; strings are cloned when retrieved
TreeEntry = git2go.TreeEntry // struct with string, Oid, ... ; strings and oids are cloned when retrieved
)
// types that we wrap to provide safety.
// Repository provides safe wrapper over git2go.Repository .
type Repository struct {
repo *git2go.Repository
References *ReferenceCollection
}
// ReferenceCollection provides safe wrapper over git2go.ReferenceCollection .
type ReferenceCollection struct {
r *Repository
}
// Reference provides safe wrapper over git2go.Reference .
type Reference struct {
ref *git2go.Reference
}
// Commit provides safe wrapper over git2go.Commit .
type Commit struct {
commit *git2go.Commit
}
// Tree provides safe wrapper over git2go.Tree .
type Tree struct {
tree *git2go.Tree
}
// Odb provides safe wrapper over git2go.Odb .
type Odb struct {
odb *git2go.Odb
}
// OdbObject provides safe wrapper over git2go.OdbObject .
type OdbObject struct {
obj *git2go.OdbObject
}
@@ -125,43 +167,89 @@ func (o *Odb) Read(oid *Oid) (*OdbObject, error) {
}
// wrappers over methods
// wrappers over safe methods
func (c *Commit) ParentCount() uint { return c.commit.ParentCount() }
func (o *OdbObject) Type() ObjectType { return o.obj.Type() }
// wrappers over unsafe, or potentially unsafe methods
func (r *Repository) Path() string {
return r.repo.Path()
path := stringsClone( r.repo.Path() )
runtime.KeepAlive(r)
return path
}
func (r *Repository) DefaultSignature() (*Signature, error) {
return r.repo.DefaultSignature()
s, err := r.repo.DefaultSignature()
if s != nil {
s = &Signature{
Name: stringsClone(s.Name),
Email: stringsClone(s.Email),
When: s.When,
}
}
runtime.KeepAlive(r)
return s, err
}
func (c *Commit) Message() string {
return c.commit.Message()
msg := stringsClone( c.commit.Message() )
runtime.KeepAlive(c)
return msg
}
func (c *Commit) ParentId(n uint) *Oid {
return c.commit.ParentId(n)
pid := oidClone( c.commit.ParentId(n) )
runtime.KeepAlive(c)
return pid
}
func (t *Tree) EntryByName(filename string) *TreeEntry {
return t.tree.EntryByName(filename)
e := t.tree.EntryByName(filename)
if e != nil {
e = &TreeEntry{
Name: stringsClone(e.Name),
Id: oidClone(e.Id),
Type: e.Type,
Filemode: e.Filemode,
}
}
runtime.KeepAlive(t)
return e
}
func (o *Odb) Write(data []byte, otype ObjectType) (*Oid, error) {
return o.odb.Write(data, otype)
oid, err := o.odb.Write(data, otype)
oid = oidClone(oid)
runtime.KeepAlive(o)
return oid, err
}
func (o *OdbObject) Id() *Oid {
return o.obj.Id()
id := oidClone( o.obj.Id() )
runtime.KeepAlive(o)
return id
}
func (o *OdbObject) Data() []byte {
return o.obj.Data()
data := bytesClone( o.obj.Data() )
runtime.KeepAlive(o)
return data
}
// misc
func oidClone(oid *Oid) *Oid {
var oid2 Oid
if oid == nil {
return nil
}
copy(oid2[:], oid[:])
return &oid2
}
// Copyright (C) 2025 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.

//go:build !go1.18
// +build !go1.18

package git

func stringsClone(s string) string {
	b := make([]byte, len(s))
	copy(b, s)
	return string(b)
}
// Copyright (C) 2025 Nexedi SA and Contributors.
// Kirill Smelkov <kirr@nexedi.com>
//
// This program is free software: you can Use, Study, Modify and Redistribute
// it under the terms of the GNU General Public License version 3, or (at your
// option) any later version, as published by the Free Software Foundation.
//
// You can also Link and Combine this program with other software covered by
// the terms of any of the Free Software licenses or any of the Open Source
// Initiative approved licenses and Convey the resulting work. Corresponding
// source of such a combination shall include the source code for all other
// software used.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the implied
// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
//
// See COPYING file for full licensing terms.
// See https://www.nexedi.com/licensing for rationale and options.

//go:build go1.18
// +build go1.18

package git

import (
	"strings"
)

func stringsClone(s string) string {
	return strings.Clone(s)
}