nexedi / neoppod · Merge request !15

Closed · Created Aug 25, 2020 by Julien Muchembled (@jm), Owner

WIP: Scalable implementation to delete data of dropped partitions


This was implemented a long time ago and, at that time, it was inefficient for the same reason that led to commit f4dd4bab. Today, now that the 'data' table is mostly split by partitions and records are somewhat ordered by tid within each "partition", it should be possible to purge 'data' mostly sequentially.
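
As a hedged illustration of that access pattern (the schema here is an assumption, not NEO's actual one: suppose 'data' rows carry a partition column and are stored mostly clustered by it), the purge can proceed in small, mostly sequential batches:

```python
import sqlite3

BATCH = 1000  # small batches keep each write transaction short

def purge_partition(conn, partition):
    """Delete all 'data' rows of one dropped partition, batch by batch."""
    while True:
        n = conn.execute(
            "DELETE FROM data WHERE rowid IN"
            " (SELECT rowid FROM data WHERE partition=? LIMIT ?)",
            (partition, BATCH)).rowcount
        conn.commit()
        if n < BATCH:
            return  # nothing (or not enough) left: partition fully purged
```

Because a partition's rows are mostly contiguous, each batch reads and deletes a narrow range of the table instead of seeking all over it.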

This new code is scalable in that it can handle databases of arbitrary size, but it may not be fast enough: its processing is interleaved with normal requests (mainly from client nodes), so it may slow down too much. Ideally, on backends that can perform several operations on the underlying storage in parallel (like MySQL, but not SQLite), dropping partitions via a secondary connection should have much less impact on performance. With the current design, however, the only thing I can do with a secondary connection is read-only analysis whose result can be quickly verified within the main thread, which does not apply here. Doing more with a secondary connection would require protecting against a lot of race conditions.
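
To make the interleaving concrete, here is a minimal sketch (hypothetical names, not NEO's actual task API): the purge is written as a generator that deletes one small batch per resume, so the single-threaded storage loop can serve client requests between steps instead of blocking on one huge DELETE:

```python
def purge_task(delete_batch, partitions, batch=1000):
    """delete_batch(partition, n) is assumed to delete up to n rows of
    that partition and return how many were actually deleted."""
    for p in partitions:
        while delete_batch(p, batch) == batch:
            yield  # resume point: client requests are handled here

def idle_step(task):
    """Run one purge step when the node is idle; False once finished."""
    try:
        next(task)
        return True
    except StopIteration:
        return False
```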

I expect the background task that actually deletes data to be extended to process other kinds of backend housekeeping. For example, pack is being reimplemented, and all these tasks should not be processed in parallel. IOW, in neo/storage/database, app.newTask would only be used in a single place, as sketched below. So I will do any further work on pack on top of these commits.
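
A hedged sketch of that single-consumer idea (hypothetical, not the actual neo/storage/database code): every maintenance job (dropping partitions, pack, ...) is queued as a generator and drained by one consumer, so no two such jobs ever run in parallel:

```python
from collections import deque

class Housekeeping:
    def __init__(self):
        self._jobs = deque()

    def add(self, job):
        """Queue a job, e.g. purge_task(...) or a future pack task."""
        self._jobs.append(job)

    def step(self):
        """Advance the current job by one step; jobs run strictly in order."""
        while self._jobs:
            try:
                next(self._jobs[0])
                return True
            except StopIteration:
                self._jobs.popleft()  # job finished, start the next one
        return False
```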

Source branch: drop