    Fix MR commits with missing committers/authors · 9b553e50
    Yorick Peterse authored
    In MR https://gitlab.com/gitlab-org/gitlab/-/merge_requests/63669 we
    introduced a new data format for storing merge request diff commit
    authors and committers. As part of this work we made changes to the
    import/export code to support this new format, and added a set of
    migrations to migrate existing data to this new format. At that time we
    supported reading and writing data in both the old and new formats,
    allowing us to gradually migrate data over to the new format.
    
    In https://gitlab.com/gitlab-org/gitlab/-/merge_requests/72219 we
    ensured all migrations are done, stopped using the old data format, and
    removed the columns storing this data.
    
    Unfortunately, this chain of events uncovered a bug in our import/export
    logic. Consider the following timeline of events:
    
    1. You export project "Cooking Recipes" from a GitLab instance running a
       version earlier than 14.1 (e.g. 14.0).
    2. The instance you intend to import this project into is running 14.1
       or newer. Existing data has been fully migrated already.
    3. You import the project into this new instance.
    
    At this point, the imported data is using the old format, not the new
    format. This is because we forgot to take into account users importing
    exports generated using GitLab 14.0 or older, instead only covering
    exports generated using GitLab 14.1 or newer. Because the background
    migrations had already finished, or the imported data would fall in a
    "bucket" (= a chunk of rows to migrate) that had already been migrated,
    the data would never be updated to the new format.
    
    In this commit we resolve this problem in two steps. First, we change
    the import/export logic to support importing data in both the old and
    new format. Exports still use the new format. Second, we include a
    background migration that processes all projects created using a GitLab
    import/export since the first mentioned merge request was introduced.
    For each such project we scan over the merge request diff commits and
    fix any that are missing the commit author or committer details.
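
    As a rough illustration of the first step, the sketch below accepts a
    commit row in either format and always returns the new one. It is not
    the actual import/export code: the field names (author_name,
    author_email, committer_name, committer_email, commit_author_id,
    committer_id) and the CommitUsers helper are assumptions made for this
    example only.

```python
from typing import Dict, Tuple


class CommitUsers:
    """Hypothetical in-memory stand-in for a table of unique commit users."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], int] = {}

    def find_or_create(self, name: str, email: str) -> int:
        # Reuse the ID of an existing (name, email) pair, else allocate one.
        key = (name, email)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


def normalize_commit(row: dict, users: CommitUsers) -> dict:
    """Return a copy of `row` in the assumed new format.

    Old-format rows carry inline name/email fields; new-format rows carry
    user IDs. Inline fields are always dropped, and an ID is filled in only
    when it is missing and inline details are available.
    """
    out = dict(row)
    roles = (("author", "commit_author_id"), ("committer", "committer_id"))
    for role, id_key in roles:
        name = out.pop(f"{role}_name", None)
        email = out.pop(f"{role}_email", None)
        if out.get(id_key) is None and (name or email):
            out[id_key] = users.find_or_create(name or "", email or "")
    return out
```

    Under the same assumptions, the second step would amount to running a
    function like this over the rows of every imported project and writing
    back only the rows that changed.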
    
    For small self-hosted instances this process is unlikely to take more
    than a few minutes. On GitLab.com, however, we expect this process to
    take a few days, as we have to process around 200 000 projects
    imported since July. This means we'll likely need additional manual
    intervention similar to the manual work needed for
    https://gitlab.com/gitlab-org/gitlab/-/issues/334394.
    
    See https://gitlab.com/gitlab-org/gitlab/-/issues/344080 for additional
    details.
    
    Changelog: fixed