*: Fix memory corruptions caused by improper git2go usage

Alain reports that lab.nexedi.com backup restoration sometimes fails with
an error like

    ...
    # file gitlab/misc -> .../srv/backup/backup-gitlab.git/gitlab-backup.Pj0fpp/gitlab_backup/db/database.pgdump/7630.dat/7630.dat.ry
    main.cmd_restore: main.blob_to_file: write .../srv/backup/backup-gitlab.git/gitlab-backup.Pj0fpp/gitlab_backup/db/database.pgdump/7630.dat/7630.dat.ry: bad address

which means that the write system call invoked by writefile at the tail of
blob_to_file returned EFAULT. The blob_to_file function is organized
approximately like this:

    blob_to_file(blob_sha1, path) {
        blob = ReadObject(blob_sha1, git.ObjectBlob)
        blob_content = blob.Data()
        writefile(path, blob_content)
    }

and getting EFAULT inside writefile means that blob_content points to
unmapped memory. How could that be?

The answer is that blob.Data(), as implemented by git2go, returns a []byte
that points to Cgo memory owned by the blob object, and the blob object has
a finalizer that frees that memory, which sometimes leads the libc allocator
to also return the freed region completely back to the OS via munmap:

    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L345-L359
    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L162-L177
    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L322-L325

If that happens we see the EFAULT, but if no munmap happens we can be
saving corrupt data to the restored file.

OdbObject.Data even has a comment about that, saying that one needs to keep
the object alive for as long as the retrieved data is used:

    // Data returns a slice pointing to the unmanaged object memory. You must make
    // sure the object is referenced for at least as long as the slice is used.
    func (object *OdbObject) Data() (data []byte) {

but this comment was added in 2017 in
https://github.com/libgit2/git2go/commit/55a109614151 as part of
https://github.com/libgit2/git2go/pull/393 while doing "KeepAlive all the
things" to fix segmentation faults and other misbehaviours.

I missed all that because we switched blob_to_file from `git cat-file` to
git2go in 2016 in fbd72c02 (Switch file_to_blob() and blob_to_file() to work
without spawning Git subprocesses) and never actively worked on that part of
the code afterwards. For reference, git2go was introduced to git-backup on
that same day in 2016 in 624393db (Hook in git2go (cgo bindings to libgit2)).

The memory corruption inside blob_to_file can be reliably reproduced by
injecting the following patch

     blob_to_file(blob_sha1, path) {
         blob = ReadObject(blob_sha1, git.ObjectBlob)
         blob_content = blob.Data()
    +    runtime.GC()
         writefile(path, blob_content)
     }

which leads to e.g. the following test failure at every test run:

    === RUN   TestPullRestore
    ...
    # file b1 -> /tmp/t-git-backup2575257088/1/symlink.file
        git-backup_test.go:109: git-backup_test.go:297: lab.nexedi.com/kirr/git-backup.cmd_restore: lab.nexedi.com/kirr/git-backup.blob_to_file: symlink ^D<80><8c>þ^@^@^2h space + Î± /tmp/t-git-backup2575257088/1/symlink.file: invalid argument

and the memory corruption can be fixed reliably by adding a proper
runtime.KeepAlive so that the blob object assuredly stays alive during the
writefile call:

     blob_to_file(blob_sha1, path) {
         blob = ReadObject(blob_sha1, git.ObjectBlob)
         blob_content = blob.Data()
         writefile(path, blob_content)
    +    runtime.KeepAlive(blob)
     }

However, going through the git2go code one can see that it is full of
Go <-> C interactions, and given that there is a track record of many
crashes caused by not getting lifetime management right (see e.g.
https://github.com/libgit2/git2go/issues/352,
https://github.com/libgit2/git2go/issues/334,
https://github.com/libgit2/git2go/issues/553,
https://github.com/libgit2/git2go/issues/513,
https://github.com/libgit2/git2go/issues/373,
https://github.com/libgit2/git2go/pull/387 and once again
https://github.com/libgit2/git2go/pull/393) there is no guarantee that no
other similar issue exists anywhere else besides OdbObject.Data().

With that we need to either

- put runtime.KeepAlive after every interaction with git2go, and put it
  properly,
- switch back to `git cat-file` and similar things, reverting fbd72c02 and
  friends, or
- do something else.

As fbd72c02 explains, switching back to `git cat-file` would slow down file
restoration by an order of magnitude. Putting runtime.KeepAlive everywhere
is also not practical because it is hard to see all the places where we
interact with git2go, even indirectly, and so it is easy to make mistakes.

-> Thus let's keep the code that interacts with git2go well localized (done
by the previous patch), and let's make a copy of every string or []byte
object we receive from git2go, adding a careful runtime.KeepAlive right
after the copy.

This fixes the problem of blob_to_file data corruption and it should fix all
other potential memory corruption problems we might ever have with git2go
due to potentially improper usage on the git-backup side.

The copy cost is smaller compared to the cost of either spawning e.g. `git
cat-file` for every object, or interacting with a `git cat-file --batch`
server spawned once, but still spending context switches on every request
and still making the copy on socket or pipe transfer. But most of all the
copy cost is negligible compared to the cost of catching hard-to-reproduce
crashes or data corruptions in the production environment.

For reference, the time it takes to restore the "files" part of the
lab.nexedi.com backup was ~ 1m51s before this patch and became ~ 1m55s after
it, indicating a ~ 3.5% slowdown for that part. This is noticeable but not
big, and since most of the time is spent during git pack restoration, which
takes much more time than files, those several seconds of slowdown become
completely negligible.

/reported-by @alain.takoudjou, @tomo
/reported-at https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK/view?list_start=15&reset=1#2074747282
/cc @jerome, @rafael
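
For illustration only, the copy-then-KeepAlive idea described above can be sketched without git2go. Everything named here is hypothetical and not part of git-backup or git2go: fakeObject stands in for git2go.OdbObject, release models what the finalizer does to the C memory (it is called explicitly so that the demonstration is deterministic instead of depending on GC timing), and safeData plays the role of the safe wrapper.

```go
package main

import (
	"fmt"
	"runtime"
)

// fakeObject is a hypothetical stand-in for git2go.OdbObject.
// buf models the C memory that the real object owns.
type fakeObject struct {
	buf []byte
}

// Data returns a slice that aliases the object's memory, the way
// OdbObject.Data() is documented to do.
func (o *fakeObject) Data() []byte { return o.buf }

// release models the effect of the finalizer: the memory is given back and
// its content can no longer be trusted. Here it is simply overwritten.
func (o *fakeObject) release() {
	for i := range o.buf {
		o.buf[i] = 0xDD
	}
}

// safeData copies the data while the object is guaranteed to be alive.
func safeData(o *fakeObject) []byte {
	src := o.Data()
	data := make([]byte, len(src))
	copy(data, src)
	runtime.KeepAlive(o) // o must stay reachable until the copy is complete
	return data
}

func main() {
	obj := &fakeObject{buf: []byte("hello")}
	alias := obj.Data()     // aliases object memory
	copied := safeData(obj) // independent copy
	obj.release()           // what a finalizer could do at any GC
	fmt.Printf("alias=%q copied=%q\n", alias, copied)
}
```

After release the aliasing slice no longer holds "hello", while the copy does. In the real code the release step is not under the caller's control, which is why the copy has to be taken before runtime.KeepAlive.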

Commit 970d81e2 authored Feb 07, 2025 by Kirill Smelkov
7 changed files
--- a/git-backup.go
+++ b/git-backup.go
@@ -156,11 +156,16 @@ func file_to_blob(g *git.Repository, path string) (Sha1, uint32) {
 }

 // blob_sha1, mode -> file
+var tblob_to_file_mid_hook func()
 func blob_to_file(g *git.Repository, blob_sha1 Sha1, mode uint32, path string) {
 	blob, err := ReadObject(g, blob_sha1, git.ObjectBlob)
 	exc.Raiseif(err)
 	blob_content := blob.Data()

+	if tblob_to_file_mid_hook != nil {
+		tblob_to_file_mid_hook() // we used to corrupt memory if GC is invoked right here
+	}
+
 	err = os.MkdirAll(pathpkg.Dir(path), 0777)
 	exc.Raiseif(err)


--- a/git-backup_test.go
+++ b/git-backup_test.go
@@ -27,6 +27,7 @@ import (
 	"os/exec"
 	"path/filepath"
 	"regexp"
+	"runtime"
 	"strings"
 	"syscall"
 	"testing"
@@ -447,3 +448,11 @@ func TestRepoRefSplit(t *testing.T) {
 		}
 	}
 }
+
+
+// blob_to_file used to corrupt memory if GC triggers inside it
+func init() {
+	tblob_to_file_mid_hook = func() {
+		runtime.GC()
+	}
+}
--- a/internal/git/bytes_go1.19.go
+++ b/internal/git/bytes_go1.19.go
+// Copyright (C) 2025  Nexedi SA and Contributors.
+//                     Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+//go:build !go1.20
+// +build !go1.20
+
+package git
+
+
+func bytesClone(b []byte) []byte {
+	if b == nil {
+		return nil
+	}
+	b2 := make([]byte, len(b))
+	copy(b2, b)
+	return b2
+}
--- a/internal/git/bytes_go1.20.go
+++ b/internal/git/bytes_go1.20.go
+// Copyright (C) 2025  Nexedi SA and Contributors.
+//                     Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+//go:build go1.20
+// +build go1.20
+
+package git
+
+import (
+	"bytes"
+)
+
+
+func bytesClone(b []byte) []byte {
+	return bytes.Clone(b)
+}
--- a/internal/git/git.go
+++ b/internal/git/git.go
@@ -17,13 +17,45 @@
 // See COPYING file for full licensing terms.
 // See https://www.nexedi.com/licensing for rationale and options.

-// Package internal/git wraps package git2go.
+// Package internal/git wraps package git2go to provide unconditional safety.
+//
+// For example git2go.Object.Data() returns []byte that aliases unsafe memory
+// that can go away from under []byte if original Object is garbage collected.
+// The following code snippet is thus _not_ correct:
+//
+//	obj = odb.Read(sha1)
+//	data = obj.Data()
+//	... use data
+//
+// because obj can be garbage-collected right after `data = obj.Data()` but
+// before `use data` leading to either crashes or memory corruption. A
+// runtime.KeepAlive(obj) needs to be added to the end of the snippet - after
+// `use data` - to make that code correct.
+//
+// Given that obj.Data() does not "speak" for itself as being unsafe, and that
+// there are many similar methods, it is hard to see which places in the code
+// need special attention.
+//
+// For this reason git-backup took the decision to localize git2go-related code
+// in one small place here, and to expose only safe things to the outside. That
+// is, we make data copies when reading object data and similar things to
+// provide unconditional safety to the caller at the price of that copy.
+//
+// The copy cost is smaller compared to the cost of either spawning e.g. `git
+// cat-file` for every object, or interacting with `git cat-file --batch`
+// server spawned once, but still spending context switches on every request
+// and still making the copy on socket or pipe transfer. But most of all the
+// copy cost is negligible compared to the cost of catching hard-to-reproduce
+// crashes or data corruptions in the production environment.
 package git

 import (
+	"runtime"
+
 	git2go "github.com/libgit2/git2go/v31"
 )

+// constants are safe to propagate as is.
 const (
 	ObjectAny     = git2go.ObjectAny
 	ObjectInvalid = git2go.ObjectInvalid
@@ -34,39 +66,49 @@ const (
 )


+// types that are safe to propagate as is.
 type (
-	ObjectType = git2go.ObjectType
-	Oid        = git2go.Oid
-	Signature  = git2go.Signature
-	TreeEntry  = git2go.TreeEntry
+	ObjectType = git2go.ObjectType // int
+	Oid        = git2go.Oid        // [20]byte             ; cloned when retrieved
+	Signature  = git2go.Signature  // struct with strings  ; strings are cloned when retrieved
+	TreeEntry  = git2go.TreeEntry  // struct with string, Oid, ...  ; strings and oids are cloned when retrieved
 )


+// types that we wrap to provide safety.
+
+// Repository provides safe wrapper over git2go.Repository .
 type Repository struct {
 	repo       *git2go.Repository
 	References *ReferenceCollection
 }

+// ReferenceCollection provides safe wrapper over git2go.ReferenceCollection .
 type ReferenceCollection struct {
 	r *Repository
 }

+// Reference provides safe wrapper over git2go.Reference .
 type Reference struct {
 	ref *git2go.Reference
 }

+// Commit provides safe wrapper over git2go.Commit .
 type Commit struct {
 	commit *git2go.Commit
 }

+// Tree provides safe wrapper over git2go.Tree .
 type Tree struct {
 	tree *git2go.Tree
 }

+// Odb provides safe wrapper over git2go.Odb .
 type Odb struct {
 	odb *git2go.Odb
 }

+// OdbObject provides safe wrapper over git2go.OdbObject .
 type OdbObject struct {
 	obj *git2go.OdbObject
 }
@@ -125,43 +167,89 @@ func (o *Odb) Read(oid *Oid) (*OdbObject, error) {
 }


-// wrappers over methods
+// wrappers over safe methods

 func (c *Commit) ParentCount() uint	{ return c.commit.ParentCount() }
 func (o *OdbObject) Type() ObjectType	{ return o.obj.Type() }


+// wrappers over unsafe, or potentially unsafe methods
+
 func (r *Repository) Path() string {
-	return r.repo.Path()
+	path := stringsClone( r.repo.Path() )
+	runtime.KeepAlive(r)
+	return path
 }

 func (r *Repository) DefaultSignature() (*Signature, error) {
-	return r.repo.DefaultSignature()
+	s, err := r.repo.DefaultSignature()
+	if s != nil {
+		s = &Signature{
+			Name:  stringsClone(s.Name),
+			Email: stringsClone(s.Email),
+			When:  s.When,
+		}
+	}
+	runtime.KeepAlive(r)
+	return s, err
 }


 func (c *Commit) Message() string {
-	return c.commit.Message()
+	msg := stringsClone( c.commit.Message() )
+	runtime.KeepAlive(c)
+	return msg
 }

 func (c *Commit) ParentId(n uint) *Oid {
-	return c.commit.ParentId(n)
+	pid := oidClone( c.commit.ParentId(n) )
+	runtime.KeepAlive(c)
+	return pid
 }

 func (t *Tree) EntryByName(filename string) *TreeEntry {
-	return t.tree.EntryByName(filename)
+	e := t.tree.EntryByName(filename)
+	if e != nil {
+		e = &TreeEntry{
+			Name:     stringsClone(e.Name),
+			Id:       oidClone(e.Id),
+			Type:     e.Type,
+			Filemode: e.Filemode,
+		}
+	}
+	runtime.KeepAlive(t)
+	return e
 }


 func (o *Odb) Write(data []byte, otype ObjectType) (*Oid, error) {
-	return o.odb.Write(data, otype)
+	oid, err := o.odb.Write(data, otype)
+	oid = oidClone(oid)
+	runtime.KeepAlive(o)
+	return oid, err
 }


 func (o *OdbObject) Id() *Oid {
-	return o.obj.Id()
+	id := oidClone( o.obj.Id() )
+	runtime.KeepAlive(o)
+	return id
 }

 func (o *OdbObject) Data() []byte {
-	return o.obj.Data()
+	data := bytesClone( o.obj.Data() )
+	runtime.KeepAlive(o)
+	return data
+}
+
+
+// misc
+
+func oidClone(oid *Oid) *Oid {
+	var oid2 Oid
+	if oid == nil {
+		return nil
+	}
+	copy(oid2[:], oid[:])
+	return &oid2
 }
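
Since Oid is documented above as a [20]byte array, and arrays are values in Go, cloning one amounts to a plain value copy. The following standalone sketch uses a local Oid type as a stand-in for git2go.Oid and an equivalent, slightly shorter form of oidClone; it is an illustration under that assumption, not the code from the patch.

```go
package main

import "fmt"

// Oid is a local stand-in for git2go.Oid, assumed to be a [20]byte array.
type Oid [20]byte

// oidClone returns a pointer to an independent copy of *oid, or nil for nil.
func oidClone(oid *Oid) *Oid {
	if oid == nil {
		return nil
	}
	oid2 := *oid // assigning an array copies all of its elements
	return &oid2
}

func main() {
	orig := &Oid{1, 2, 3}
	clone := oidClone(orig)

	orig[0] = 0xff // later changes to the original do not reach the clone

	fmt.Println(clone[0], orig[0], oidClone(nil) == nil) // prints "1 255 true"
}
```

The same reasoning explains why Signature and TreeEntry are rebuilt field by field in the wrappers: their string and Oid fields are cloned individually, while plain value fields such as When, Type and Filemode are copied by assignment.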
--- a/internal/git/strings_go1.17.go
+++ b/internal/git/strings_go1.17.go
+// Copyright (C) 2025  Nexedi SA and Contributors.
+//                     Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+//go:build !go1.18
+// +build !go1.18
+
+package git
+
+
+func stringsClone(s string) string {
+	b := make([]byte, len(s))
+	copy(b, s)
+	return string(b)
+}
--- a/internal/git/strings_go1.18.go
+++ b/internal/git/strings_go1.18.go
+// Copyright (C) 2025  Nexedi SA and Contributors.
+//                     Kirill Smelkov <kirr@nexedi.com>
+//
+// This program is free software: you can Use, Study, Modify and Redistribute
+// it under the terms of the GNU General Public License version 3, or (at your
+// option) any later version, as published by the Free Software Foundation.
+//
+// You can also Link and Combine this program with other software covered by
+// the terms of any of the Free Software licenses or any of the Open Source
+// Initiative approved licenses and Convey the resulting work. Corresponding
+// source of such a combination shall include the source code for all other
+// software used.
+//
+// This program is distributed WITHOUT ANY WARRANTY; without even the implied
+// warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+//
+// See COPYING file for full licensing terms.
+// See https://www.nexedi.com/licensing for rationale and options.
+
+//go:build go1.18
+// +build go1.18
+
+package git
+
+import (
+	"strings"
+)
+
+
+func stringsClone(s string) string {
+	return strings.Clone(s)
+}