Stan Hu authored
We were seeing a high number of transient failures in the migration jobs because `with_lock_retries` leaked a short, non-zero `lock_timeout` (e.g. 100 ms) when used inside a Rails `change` method.

If the PostgreSQL autovacuum process happened to be running, it would hold a lock on the table it was vacuuming. During the migration rollback, a DDL operation that needed a lock on that table would run with the leaked short `lock_timeout`, time out waiting for the autovacuum lock, and fail. Even though `SET LOCAL` was used to keep `lock_timeout` from leaking outside the current transaction, the parent transaction still retained that value: a `SET LOCAL` issued inside a nested transaction (savepoint) stays in effect for the parent until the outer transaction ends.

To avoid this issue, we should define separate `up` and `down` methods so that we don't rely on Rails to reverse the migration automatically. This ensures lock retries are used properly in both directions and prevents `lock_timeout` from leaking during a migration rollback.

Closes https://gitlab.com/gitlab-org/gitlab/-/issues/207088
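The `SET LOCAL` scoping behind this leak can be illustrated with a small Ruby toy model. It is not GitLab's `with_lock_retries` code and does not talk to PostgreSQL. The `ToyTransaction` class and its methods are hypothetical, and the sketch assumes that `with_lock_retries` sets `lock_timeout` inside a nested transaction (savepoint) opened within the migration's own DDL transaction.

```ruby
# Toy model, not GitLab or PostgreSQL source: how PostgreSQL scopes
# SET LOCAL values inside a transaction that uses savepoints.
class ToyTransaction
  def initialize(session_settings)
    @session = session_settings.dup # values outside any transaction
    @current = @session.dup         # values visible right now
    @savepoints = []                # settings snapshot per open savepoint
    @open = false
  end

  def setting(name)
    @current.fetch(name)
  end

  def begin_transaction
    @open = true
  end

  # SET LOCAL lasts until the top-level transaction ends; PostgreSQL
  # ignores it (with a warning) outside a transaction block.
  def set_local(name, value)
    @current[name] = value if @open
  end

  def savepoint
    @savepoints.push(@current.dup)
  end

  # RELEASE SAVEPOINT: the parent keeps everything set inside the savepoint.
  def release_savepoint
    raise "no open savepoint" if @savepoints.empty?
    @savepoints.pop
  end

  # ROLLBACK TO SAVEPOINT: settings revert to the snapshot.
  # (Simplified: PostgreSQL keeps the savepoint open; here it is discarded.)
  def rollback_to_savepoint
    raise "no open savepoint" if @savepoints.empty?
    @current = @savepoints.pop
  end

  # COMMIT or ROLLBACK of the top-level transaction drops all SET LOCAL values.
  def finish_transaction
    @savepoints.clear
    @current = @session.dup
    @open = false
  end
end

tx = ToyTransaction.new("lock_timeout" => "0") # 0 disables the timeout
tx.begin_transaction                  # the migration's DDL transaction
tx.savepoint                          # assumed: with_lock_retries' inner transaction
tx.set_local("lock_timeout", "100ms")
tx.release_savepoint                  # the retried block succeeded
puts tx.setting("lock_timeout")       # prints "100ms": still set in the parent
tx.finish_transaction
puts tx.setting("lock_timeout")       # prints "0"
```

Releasing the savepoint keeps the 100 ms value for the rest of the outer transaction, so any later DDL in that transaction inherits it; only rolling back to the savepoint or ending the outer transaction restores the default. With explicit `up` and `down` methods, each direction runs its DDL through `with_lock_retries` deliberately instead of depending on how Rails reverses a `change` block.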
60227626