Commit 8a417197 authored Dec 06, 2019 by Kirill Smelkov

.

parent c4c25753

Showing 1 changed file with 37 additions and 34 deletions

bigfile/file_zodb.py    +37 -34
...
@@ -78,22 +78,26 @@ Data format
 Due to weakness of current ZODB storage servers, wendelin.core cannot provide
 at the same time both fast reads and small database size growth on small data
-changes. "Small" here means something like 1-10000 bytes as larger changes
-become comparable to 2M block size and are handled efficiently out of the box.
-Until the problem is fixed on ZODB server side, users have to explicitly
-indicate via environment variable that their workload is "small changes" if
-they prefer to prioritize database size over access speed::
+changes. "Small" here means something like 1-10000 bytes per transaction as
+larger changes become comparable to 2M block size and are handled efficiently
+out of the box. Until the problem is fixed on ZODB server side, wendelin.core
+provides on-client workaround in the form of specialized block format, and
+users have to explicitly indicate via environment variable that their workload
+is "small changes" if they prefer to prioritize database size over access
+speed::
 
     $WENDELIN_CORE_ZBLK_FMT
         ZBlk0   fast reads      (default)
         ZBlk1   small changes
 
 Description of block formats follow:
 
 To represent BigFile as ZODB objects, each file block is represented separately
 either as
 
     1) one ZODB object, or          (ZBlk0)
-    2) group of ZODB objects        (ZBlk1)
+    2) group of ZODB objects        (ZBlk1)
+       XXX wcfs loads in parallel
 
 with top-level BTree directory #blk -> objects representing block.
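
As a usage illustration of the format switch quoted above, the following is a
minimal sketch of selecting the small-changes block format. It assumes that
$WENDELIN_CORE_ZBLK_FMT is read when blocks are written out, and that
ZBigFile(blksize) from bigfile/file_zodb.py is the entry point for storing a
BigFile in ZODB; treat the setup details as assumptions, not a prescription::

    # Sketch: prioritize database size over read speed by selecting ZBlk1.
    # Assumption: the variable must be set before wendelin.core writes blocks.
    import os
    os.environ["WENDELIN_CORE_ZBLK_FMT"] = "ZBlk1"    # default is "ZBlk0" (fast reads)

    from wendelin.bigfile.file_zodb import ZBigFile

    # one ZBigFile covers the whole file; data is stored in 2M blocks
    zf = ZBigFile(blksize=2*1024*1024)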
...
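The "top-level BTree directory #blk -> objects" mentioned in the hunk above can
be pictured with an LOBTree keyed by block number. The Block class below is a
hypothetical stand-in for the real ZBlk0/ZBlk1 classes and does not reflect
their actual storage format::

    # Illustration only: a sparse directory mapping block number -> one
    # persistent object holding that 2M block (ZBlk0-style layout).
    from persistent import Persistent
    from BTrees.LOBTree import LOBTree

    class Block(Persistent):            # hypothetical stand-in for ZBlk0/ZBlk1
        def __init__(self, data=b''):
            self.data = data            # raw block content

    blktab = LOBTree()                  # #blk -> object(s) representing block
    blktab[0] = Block(b'\0' * (2*1024*1024))
    blktab[7] = Block(b'hello')         # only blocks that were ever written exist
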
@@ -118,36 +122,35 @@ On the other hand, if object management is moved to DB *server* side, it is
 possible to deduplicate them there and this way have low-overhead for both
 access-time and DB size with just client storing 1 object per file block. This
 will be our future approach after we teach NEO about object deduplication.
 
-~~~~
-
-As file pages are changed in RAM with changes being managed by virtmem
-subsystem, we need to propagate the changes to ZODB objects back at some time.
-
-Two approaches exist:
-
-    1) on every RAM page dirty, in a callback invoked by virtmem, mark
-       corresponding ZODB object as dirty, and at commit time, in
-       obj.__getstate__ retrieve memory content.
-
-    2) hook into commit process, and before committing, synchronize RAM page
-       state to ZODB objects state, propagating all dirtied pages to ZODB objects
-       and then do the commit process as usual.
-
-"1" is more natural to how ZODB works, but requires tight integration between
-virtmem subsystem and ZODB (to be able to receive callback on a page dirtying).
-
-"2" is less natural to how ZODB works, but requires less-tight integration
-between virtmem subsystem and ZODB, and virtmem->ZODB propagation happens only
-at commit time.
-
-Since, for performance reasons, virtmem subsystem is going away and BigFiles
-will be represented by real FUSE-based filesystem with virtual memory being
-done by kernel, where we cannot get callback on a page-dirtying, it is more
-natural to also use "2" here.
 """
+# FIXME ^^^ doc is horrible - add top-level up->down overview.
+
+# file_zodb organization
+#
+# As file pages are changed in RAM with changes being managed by virtmem
+# subsystem, we need to propagate the changes to ZODB objects back at some time.
+#
+# Two approaches exist:
+#
+#   1) on every RAM page dirty, in a callback invoked by virtmem, mark
+#      corresponding ZODB object as dirty, and at commit time, in
+#      obj.__getstate__ retrieve memory content.
+#
+#   2) hook into commit process, and before committing, synchronize RAM page
+#      state to ZODB objects state, propagating all dirtied pages to ZODB objects
+#      and then do the commit process as usual.
+#
+# "1" is more natural to how ZODB works, but requires tight integration between
+# virtmem subsystem and ZODB (to be able to receive callback on a page dirtying).
+#
+# "2" is less natural to how ZODB works, but requires less-tight integration
+# between virtmem subsystem and ZODB, and virtmem->ZODB propagation happens only
+# at commit time.
+#
+# Since, for performance reasons, virtmem subsystem is going away and BigFiles
+# will be represented by real FUSE-based filesystem with virtual memory being
+# done by kernel, where we cannot get callback on a page-dirtying, it is more
+# natural to also use "2" here.
 
 from wendelin.bigfile import BigFile, WRITEOUT_STORE, WRITEOUT_MARKSTORED
 from wendelin import wcfs
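
For approach "2" from the comment block above, the following sketch shows how
dirtied RAM pages could be pushed into ZODB objects from a before-commit hook
of the transaction package. DirtyPageTracker and its methods are invented
names for illustration and are not wendelin.core's actual integration; only
transaction.get().addBeforeCommitHook() is the real library API::

    # Sketch of approach "2": propagate dirtied pages to ZODB objects right
    # before commit, then let the normal commit machinery run.
    import transaction

    class DirtyPageTracker(object):     # hypothetical helper, not wendelin.core API
        def __init__(self):
            self.dirty = set()          # numbers of blocks whose RAM pages changed

        def mark_dirty(self, blk):
            self.dirty.add(blk)

        def store_block(self, blk):
            # the real code would serialize the RAM block into its ZBlk* object(s)
            print("storing block #%d into ZODB" % blk)

        def flush(self):
            # virtmem -> ZODB propagation happens only here, at commit time
            for blk in sorted(self.dirty):
                self.store_block(blk)
            self.dirty.clear()

    tracker = DirtyPageTracker()
    tracker.mark_dirty(3)

    txn = transaction.get()
    txn.addBeforeCommitHook(tracker.flush)      # "hook into commit process"
    txn.commit()                                # pages flushed, then commit proceeds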
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment