Commit 03806d21 authored by Vincent Pelletier

Add incpozo, the incremental repozo restorer

parent f28417ff
@@ -12,3 +12,4 @@ __ https://github.com/zopefoundation/ZODB/pull/128#issuecomment-260970932
 - `zodb cmp` - compare content of two ZODB databases bit-to-bit.
 - `zodb dump` - dump content of a ZODB database.
 - `zodb info` - print general information about a ZODB database.
+- `incpozo` - incrementally restore a ZODB database.
@@ -21,7 +21,7 @@ setup(
 packages = find_packages(),
 install_requires = ['ZODB', 'zodburi', 'six'],
-entry_points= {'console_scripts': ['zodb = zodbtools.zodb:main']},
+entry_points= {'console_scripts': ['zodb = zodbtools.zodb:main', 'incpozo = zodbtools.incpozo:main']},
 classifiers = [_.strip() for _ in """\
 Development Status :: 3 - Alpha
...
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (C) 2002-2017 Zope Foundation + Nexedi + Contributors
# See LICENSE-ZPL.txt for full licensing terms.
# Based on ZODB's repozo do_recover and main functions.
"""
Incremental repozo restore.
Locates the first incremental to start restoring based on output file size.
Checks the previous chunk to detect mismatched backup & destinations.
Restores increments from that point on, following repozo arguments.
"""
import os
import shutil
import sys
from ZODB.scripts.repozo import NoFiles, checksum, find_files, parseargs, log, concat, RECOVER


def do_inc_recover(options):
    repofiles = find_files(options)
    if not repofiles:
        if options.date:
            raise NoFiles('No files in repository before %s', options.date)
        else:
            raise NoFiles('No files in repository')

    datfile = os.path.splitext(repofiles[0])[0] + '.dat'
    log('Recovering file to %s', options.output)
    with open(datfile) as fp, open(options.output, 'r+b') as outfp:
        outfp.seek(0, 2)
        initial_length = outfp.tell()
        previous_chunk = None
        for line in fp:
            fn, startpos, endpos, _ = chunk = line.split()
            startpos = int(startpos)
            endpos = int(endpos)
            if endpos > initial_length:
                break
            previous_chunk = chunk
        else:
            # XXX: log + return so exit status is zero ?
            raise NoFiles('Target file is longer than or as large at latest backup, doing nothing')
        if previous_chunk is None:
            # XXX: trigger a normal restore ?
            raise NoFiles('Target file shorter than full backup, doing nothing')
        check_start = int(previous_chunk[1])
        check_end = int(previous_chunk[2])
        outfp.seek(check_start, 0)
        if previous_chunk[3] != checksum(outfp, check_end - check_start):
            raise NoFiles('Last whole common chunk checksum did not match with backup, doing nothing')
        assert outfp.tell() == startpos, (outfp.tell(), startpos)
        if startpos < initial_length:
            log('Truncating target file %i bytes before its end', initial_length - startpos)
        filename = os.path.join(options.repository,
                                os.path.basename(fn))
        first_file_to_restore = repofiles.index(filename)
        assert first_file_to_restore > 0, (first_file_to_restore, options.repository, fn, filename, repofiles)
        reposz, reposum = concat(repofiles[first_file_to_restore:], outfp)
        log('Recovered %s bytes, md5: %s', reposz, reposum)

    if options.output is not None:
        last_base = os.path.splitext(repofiles[-1])[0]
        source_index = '%s.index' % last_base
        target_index = '%s.index' % options.output
        if os.path.exists(source_index):
            log('Restoring index file %s to %s', source_index, target_index)
            shutil.copyfile(source_index, target_index)
        else:
            log('No index file to restore: %s', source_index)


def main(argv=None):
    if argv is None:
        argv = sys.argv[1:]
    options = parseargs(argv)
    assert options.mode == RECOVER, 'This tool only supports "recover" (-R|--recover) mode'
    assert options.output is not None, 'This tool cannot recover to stdout'
    try:
        do_inc_recover(options)
    except NoFiles as e:
        sys.exit(str(e))


if __name__ == '__main__':
    main()
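
The restart-point search in do_inc_recover can be tried without a ZODB installation. The following is a minimal sketch, not the committed code: `md5_of` and `find_restart_chunk` are hypothetical helpers, the file names are made up, and it assumes each .dat line has the form "filename startpos endpos md5", as the parsing above implies.

```python
import hashlib
import io


def md5_of(fp, length):
    """Return the hex MD5 of the next `length` bytes read from fp."""
    digest = hashlib.md5()
    remaining = length
    while remaining > 0:
        data = fp.read(min(remaining, 65536))
        if not data:
            break
        digest.update(data)
        remaining -= len(data)
    return digest.hexdigest()


def find_restart_chunk(dat_lines, outfp):
    """Return (filename, startpos) of the first chunk that must be restored.

    dat_lines: iterable of "filename startpos endpos md5" strings (assumed format).
    outfp: seekable binary file holding the partially restored database.
    Raises ValueError when an incremental restore is not possible.
    """
    outfp.seek(0, 2)
    current_size = outfp.tell()
    previous = None
    for line in dat_lines:
        fn, startpos, endpos, md5sum = line.split()
        startpos, endpos = int(startpos), int(endpos)
        if endpos > current_size:
            break  # first chunk the target does not fully contain
        previous = (startpos, endpos, md5sum)
    else:
        raise ValueError('target is already as long as the latest backup')
    if previous is None:
        raise ValueError('target is shorter than the full backup')
    # Verify the last chunk that is entirely present in the target.
    check_start, check_end, expected = previous
    outfp.seek(check_start)
    if md5_of(outfp, check_end - check_start) != expected:
        raise ValueError('last common chunk does not match the backup')
    return fn, startpos


# In-memory example: a full backup followed by two increments.
full, inc1, inc2 = b'a' * 10, b'b' * 5, b'c' * 7
dat_lines = [
    'full.fs 0 10 %s' % hashlib.md5(full).hexdigest(),
    'inc1.deltafs 10 15 %s' % hashlib.md5(inc1).hexdigest(),
    'inc2.deltafs 15 22 %s' % hashlib.md5(inc2).hexdigest(),
]
target = io.BytesIO(full + inc1)  # already restored up to the first increment
restart = find_restart_chunk(dat_lines, target)
```

Here `restart` is `('inc2.deltafs', 15)`: the target already holds the full backup and the first increment, so restoring resumes with the second increment at offset 15.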
  • @kirr: What do you think of this change? Should I try to push it upstream (meaning ZODB's repozo)? If so, do you have ideas about the XXX comments left in the code (or about this code in general)?

    I'm hesitant to include it in the main "zodb" command, but mainly because keeping it standalone lets me copy just one file to the server I need to run it on and get a working executable (modulo a working Python interpreter with the ZODB egg), instead of having to install it as an egg. So it's not strong opposition anyway (more a side effect of how I developed and bootstrapped it for first use).

    Edited by Vincent Pelletier
  • @vpelletier thanks for the heads-up on a useful change.

    Since this is an improvement to repozo, I would suggest trying to push it to ZODB upstream first (with tests), as IMHO that way it would be the most useful for everyone. Repozo, however, seems to have been without a maintainer for a long time, so if upstreaming does not work we can keep the tool under the zodbtools umbrella. In that case, however, I suggest we don't add just incpozo but fork the full repozo into zodbtools and add our patches on top, the same way we did for analyze (see nexedi/zodbtools!1 (closed), ab17cf2d, 1e506a81).

    If we go the zodbtools way, my preference would be not to create another top-level command but to have only zodb and keep everything under this toolbox as subcommands. By the way, for tools that do not work with general ZODB databases (unlike e.g. zodb dump and zodb cmp, which do), I would prefer we create a subcommand covering the particular topic, e.g. zodb fs1 repozo. We already have the zodbtools egg installed as part of the ERP5 SR, so from a deployment point of view it should not be a problem to depend on zodbtools being available:

    https://lab.nexedi.com/nexedi/slapos/blob/04c27ca1/software/neoppod/software-common.cfg#L51

    About the code itself, and in particular the places marked XXX: I suggest that by default recover --incremental treats situations where an inconsistency is detected as errors. However, with e.g. --increment-full-fallback, upon hitting such an error it should fall back to recovering the file from scratch.

    By the way, I suggest that on recovery repozo automatically verifies the checksums of the backup parts. I quickly skimmed the repozo code, and while the checksums are recorded in the .dat files, they are not checked. The files are read while recovering anyway, so checking should be cheap.

    /cc @kazuhiko, @jm
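
The error-by-default policy with an opt-in fallback, as suggested in the comment above, could look roughly like this. It is only a sketch under assumptions: `InconsistentTarget`, `recover`, and the two callables are hypothetical names, and the keyword arguments merely mirror the proposed --incremental and --increment-full-fallback options, which do not exist in repozo.

```python
class InconsistentTarget(Exception):
    """Hypothetical error: the existing output file does not match the backup."""


def recover(incremental_restore, full_restore, incremental=False, full_fallback=False):
    """Run a restore and report which kind was performed.

    incremental_restore, full_restore: zero-argument callables (hypothetical);
    incremental_restore raises InconsistentTarget when it detects a mismatch.
    """
    if not incremental:
        full_restore()
        return 'full'
    try:
        incremental_restore()
        return 'incremental'
    except InconsistentTarget:
        if not full_fallback:
            raise  # default: an inconsistency is an error
        full_restore()  # opt-in: recover the file from scratch
        return 'full'
```

With this shape, both XXX cases in do_inc_recover would raise the inconsistency error and the caller would decide whether to stop or restart from the full backup.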

  • BTW, I don't agree with Jim on https://github.com/zopefoundation/ZODB/pull/128

    If such tool is maintained by ZODB developers, and I think it should be the case, then the ZODB repository is the right place. I see no reason to split the repository. It's important for ZODB and FileStorage to come with maintenance tools (similarly, who'd use a FS without a fsck tool?). And the lack of tests is irrelevant (splitting the repository won't create them magically, or help creating them).

    Scripts that have no place in ZODB.git would be those that are big and rarely used.

  • @jm that was my thinking when I filed the above-mentioned pull request. Jim's rejection

    https://github.com/zopefoundation/ZODB/pull/128#issuecomment-260970932

    was the only reason to start zodbtools.

  • mentioned in merge request nexedi/zodbtools!4
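
The suggestion earlier in this thread, to verify the checksums of backup parts while they are being restored, could be sketched as follows. This is not repozo code: `copy_verified` and `ChecksumMismatch` are hypothetical names, and the sketch assumes the recorded MD5 covers exactly the bytes that are copied.

```python
import hashlib
import io


class ChecksumMismatch(Exception):
    """Hypothetical error: a backup part does not match its recorded MD5."""


def copy_verified(parts, outfp, blocksize=65536):
    """Append each part to outfp, hashing the bytes as they are copied.

    parts: iterable of (name, readable binary file, expected hex MD5).
    Returns the total number of bytes written.
    """
    total = 0
    for name, infp, expected in parts:
        digest = hashlib.md5()
        while True:
            data = infp.read(blocksize)
            if not data:
                break
            digest.update(data)  # hash the same bytes that are written
            outfp.write(data)
            total += len(data)
        actual = digest.hexdigest()
        if actual != expected:
            raise ChecksumMismatch('%s: expected %s, got %s' % (name, expected, actual))
    return total
```

Because each part is hashed in the same pass that copies it, the check adds no extra read. A mismatch is only detected after the bad part has been written, so the caller should treat the output file as invalid when `ChecksumMismatch` is raised.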
