Commit 8ca3f7a7 authored by Ido Schimmel's avatar Ido Schimmel Committed by Jakub Kicinski

mlxsw: spectrum_acl_tcam: Fix memory leak during rehash

The rehash delayed work migrates filters from one region to another.
This is done by iterating over all chunks (all the filters with the same
priority) in the region and in each chunk iterating over all the
filters.

If the migration fails, the code tries to migrate the filters back to
the old region. However, the rollback itself can also fail in which case
another migration will be erroneously performed. Besides the fact that
this ping pong is not a very good idea, it also creates a problem.

Each virtual chunk references two chunks: The currently used one
('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the
first holds the chunk we want to migrate filters to and the second holds
the chunk we are migrating filters from.

The code currently assumes - but does not verify - that the backup chunk
does not exist (NULL) if the currently used chunk does not reference the
target region. This assumption breaks when we are trying to rollback a
rollback, resulting in the backup chunk being overwritten and leaked
[1].

Fix by not rolling back a failed rollback and add a warning to avoid
future cases.

[1]
WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20
Modules linked in:
CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G        W          6.9.0-rc2-custom-00784-gc6a05c468a0b #14
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:parman_destroy+0x17/0x20
[...]
Call Trace:
 <TASK>
 mlxsw_sp_acl_atcam_region_fini+0x19/0x60
 mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470
 process_one_work+0x151/0x370
 worker_thread+0x2cb/0x3e0
 kthread+0xd0/0x100
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Fixes: 84350051 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()")
Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
Tested-by: default avatarAlexander Zubkov <green@qrator.net>
Reviewed-by: default avatarPetr Machata <petrm@nvidia.com>
Signed-off-by: default avatarPetr Machata <petrm@nvidia.com>
Reviewed-by: default avatarSimon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
parent 5bcf9255
...@@ -1200,6 +1200,8 @@ mlxsw_sp_acl_tcam_vchunk_migrate_start(struct mlxsw_sp *mlxsw_sp, ...@@ -1200,6 +1200,8 @@ mlxsw_sp_acl_tcam_vchunk_migrate_start(struct mlxsw_sp *mlxsw_sp,
{ {
struct mlxsw_sp_acl_tcam_chunk *new_chunk; struct mlxsw_sp_acl_tcam_chunk *new_chunk;
WARN_ON(vchunk->chunk2);
new_chunk = mlxsw_sp_acl_tcam_chunk_create(mlxsw_sp, vchunk, region); new_chunk = mlxsw_sp_acl_tcam_chunk_create(mlxsw_sp, vchunk, region);
if (IS_ERR(new_chunk)) if (IS_ERR(new_chunk))
return PTR_ERR(new_chunk); return PTR_ERR(new_chunk);
...@@ -1334,6 +1336,8 @@ mlxsw_sp_acl_tcam_vregion_migrate(struct mlxsw_sp *mlxsw_sp, ...@@ -1334,6 +1336,8 @@ mlxsw_sp_acl_tcam_vregion_migrate(struct mlxsw_sp *mlxsw_sp,
err = mlxsw_sp_acl_tcam_vchunk_migrate_all(mlxsw_sp, vregion, err = mlxsw_sp_acl_tcam_vchunk_migrate_all(mlxsw_sp, vregion,
ctx, credits); ctx, credits);
if (err) { if (err) {
if (ctx->this_is_rollback)
return err;
/* In case migration was not successful, we need to swap /* In case migration was not successful, we need to swap
* so the original region pointer is assigned again * so the original region pointer is assigned again
* to vregion->region. * to vregion->region.
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment