WIP: Add garbage collection for ZBigArray

nexedi/wendelin · Merge request !160 · Open · 3 commits · 6 changed files
Created Feb 12, 2024 by Levin Zimmermann (@levin.zimmermann), Maintainer

Hello,

these patches aim to support manual garbage collection of ZBigArrays, so that their data can be removed when the storage is packed. The idea is to support GC of ZBigArray before a more general implementation within NEO exists.
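To make the mechanism concrete, here is a minimal sketch (not the patch itself) of how such a manual deletion could look, assuming ZODB >= 5 and a storage that implements ZODB's external-GC interface (ZODB.interfaces.IExternalGC; FileStorage does). `oids_for_zbigarray` is a hypothetical helper that would collect the oids of the array object, its ZBigFile and all its ZBlk data blocks:

```python
from ZODB import DB
from ZODB.Connection import TransactionMetaData
from ZODB.FileStorage import FileStorage

def delete_objects(storage, oids):
    """Mark oids as deleted so that a subsequent pack() drops their records."""
    txn = TransactionMetaData()
    storage.tpc_begin(txn)
    for oid in oids:
        _, serial = storage.load(oid)           # current serial of the object
        storage.deleteObject(oid, serial, txn)  # ZODB.interfaces.IExternalGC
    storage.tpc_vote(txn)
    storage.tpc_finish(txn)

storage = FileStorage('Data.fs')
db = DB(storage)
# hypothetical usage:
#   delete_objects(storage, oids_for_zbigarray(zarray))
#   db.pack()   # pack then removes the deleted records
```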

I tested this approach with NEO and with FileStorage (with GC deactivated). With FileStorage I could see that the size of Data.fs shrank after calling .pack(). With NEO I could see that the row count of the data table shrank after calling .pack().
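For reference, the FileStorage observation can be reproduced along these lines (a sketch; the Data.fs path and the open `db` are from the setup above):

```python
import os

before = os.path.getsize('Data.fs')
db.pack()   # pack to now; records of deleted ZBigArray objects go away
after = os.path.getsize('Data.fs')
assert after < before
```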

What still seems non-ideal to me in this patch is how the history of a given object is retrieved. The .history() function may be fine, but with NEO we can't be sure that it really returns, and thus lets us remove, all revisions of an object. I see two approaches (a sketch of both follows the list):

  1. We support float('inf') as a size argument in NEO. The MySQL query could skip the second LIMIT argument when size == float('inf'). Maybe we would need a mechanism to avoid overly large packets (or such a mechanism already exists).
  2. We accept the current approximation as sufficient; at least for refreshed Data Array documents there shouldn't be that many transactions.
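Here is the sketch of how the two options could look from the client side. The query fragment for option 1 is pseudo-SQL from memory; table and column names are assumptions, not necessarily NEO's actual schema:

```python
# Option 1: ask for the complete history (would need NEO support for an
# unbounded size argument):
revisions = storage.history(oid, size=float('inf'))

# On the NEO/MySQL side the query could then be built roughly like:
#   limit = '' if size == float('inf') else ' LIMIT %d' % size
#   query = 'SELECT tid FROM obj WHERE oid=%s ORDER BY tid DESC' + limit

# Option 2: accept a finite cap as an approximation (1000 is arbitrary):
revisions = storage.history(oid, size=1000)

for rev in revisions:
    tid = rev['tid']   # one dict per revision, per ZODB's IStorage.history
    ...
```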

Any other ideas?

The test is a sketch: it succeeds, but the coding-style test fails (useless .assertTrue).
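Presumably the flagged pattern is a comparison wrapped in assertTrue, i.e. something like:

```python
self.assertTrue(value == expected)   # flagged as useless assertTrue
self.assertEqual(value, expected)    # preferred form
```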

If we can agree on the general direction of the underlying mechanism, I'll add patches that apply the ZBigArray GC when the refresh workflow is called.

Best, Levin

/cc @jm @klaus

Source branch: gc-zbigarray