Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zodbcmp - Tool to compare two ZODB databases #128

Closed
wants to merge 1 commit into from

Conversation

navytux
Copy link
Contributor

@navytux navytux commented Nov 16, 2016

This is a tool to compare two ZODB databases in between tidmin..tidmax
transaction range with default range being -∞..+∞ - (whole database).

For comparision both databases are scanned at storage layer and every
transaction content is compared bit-to-bit between the two. The program stops
either at first difference found, or when whole requested transaction range is
scanned with no difference detected.

Database storages are specified in files with ZConfig-based storage definition, e.g.

%import neo.client
<NEOStorage>
    master_nodes    ...
    name            ...
</NEOStorage>

Please see https://lab.nexedi.com/nexedi/neoppod/merge_requests/4 for
one of possible contexts.

The tool is generic though and is not NEO-specific. It should be able to
even check two different storages like ZEO & NEO, or FileStorage and NEO
etc and thus can be handy.

@coveralls
Copy link

coveralls commented Nov 16, 2016

Coverage Status

Coverage decreased (-1.08%) to 73.624% when pulling 4fa5107 on navytux:y/zodbcmp into cc1f922 on zopefoundation:master.

@coveralls
Copy link

coveralls commented Nov 16, 2016

Coverage Status

Coverage decreased (-1.09%) to 73.615% when pulling 7a9d595 on navytux:y/zodbcmp into cc1f922 on zopefoundation:master.

mintid, minstor, maxstor = l[0]
print("not-equal: tid %s present in stor%i but not in stor%i" % (
ashex(mintid), minstor, maxstor))
return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        txn1 = next(iter1, None)
        txn2 = next(iter2, None)
        if txn1 != txn2:
            if verbose:
                mintid, minstor, maxstor = min(
                    (txn1.tid if txn1 else inf, 1, 2),
                    (txn2.tid if txn2 else inf, 2, 1))
                print(...)
            return 1
        if not txn1:
            if verbose:
                print("equal")
            return 0

MAX_TID (instead of inf) may also be enough.

Copy link
Contributor Author

@navytux navytux Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about MAX_TID too but:

  • there is no MAX_TID in ZODB (it is in NEO) - in ZODB it is utils.maxtid
  • in both ZODB and NEO it is 7fff... instead of ffff... for signedness. Still having 7f vs ff looks there can be a risk of skipping transactions.
  • I wanted code here to be generic, not depending on particular MAX value a tid can be. This can be achieved with inf approach.


# actual txn comparison
tcmp = txncmp(txn1, txn2)
if tcmp:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wonder why you like temp variables so much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the code more structured and clear to me.

if not tidmax:
tidmax = None

return tidmin, tidmax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    return (tidmin.decode('hex') or None,
            tidmax.decode('hex') or None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the or None hint. However since .split("..") and .decode("hex") raise different exceptions, for proper error reporting I decided to wrap both & hex-decode actions and reaise dedicated exception so users will get e.g.

E: invalid tidrange: q..

instead of

Traceback (most recent call last):
  File "./zodbcmp.py", line 205, in <module>
    main()
  File "./zodbcmp.py", line 192, in main
    tidmin, tidmax = parse_tidrange(argv[2])
  File "./zodbcmp.py", line 153, in parse_tidrange
    tidmin = tidmin.decode('hex')
  File ".../lib/python2.7/encodings/hex_codec.py", line 42, in hex_decode
    output = binascii.a2b_hex(input)
TypeError: Odd-length string

See the change in interdiff for next patch revision.

if exc_type is SystemExit:
raise # this was sys.exit() call, not an error
traceback.print_exc()
sys.exit(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    try:
        main2()
    except SystemExit:
        raise
    except:
        traceback.print_exc()
        sys.exit(2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, this was a thinko on my side, thanks for correction.

This is a tool to compare two ZODB databases in between tidmin..tidmax
transaction range with default range being -∞..+∞ - (whole database).

For comparision both databases are scanned at storage layer and every
transaction content is compared bit-to-bit between the two. The program stops
either at first difference found, or when whole requested transaction range is
scanned with no difference detected.

Database storages are specified in files with ZConfig-based storage definition, e.g.

    %import neo.client
    <NEOStorage>
        master_nodes    ...
        name            ...
    </NEOStorage>

Please see https://lab.nexedi.com/nexedi/neoppod/merge_requests/4 for
one of possible contexts.

The tool is generic though and is not NEO-specific. It should be able to
even check two different storages like ZEO & NEO, or FileStorage and NEO
etc and thus can be handy.
@navytux
Copy link
Contributor Author

navytux commented Nov 16, 2016

@jmuchemb thanks for feedback. I've adjusted the patch. The interdiff is below:

diff --git a/src/ZODB/scripts/zodbcmp.py b/src/ZODB/scripts/zodbcmp.py
index f70739e73..bc16a6e24 100755
--- a/src/ZODB/scripts/zodbcmp.py
+++ b/src/ZODB/scripts/zodbcmp.py
@@ -147,19 +147,24 @@ def usage(out):
 """, file=out)

 # tidmin..tidmax -> (tidmin, tidmax)
+class TidRangeInvalid(Exception):
+    pass
+
 def parse_tidrange(tidrange):
-    tidmin, tidmax = tidrange.split("..")
-    tidmin = tidmin.decode('hex')
-    tidmax = tidmax.decode('hex')
+    try:
+        tidmin, tidmax = tidrange.split("..")
+    except ValueError:  # not exactly 2 parts in between ".."
+        raise TidRangeInvalid(tidrange)
+
+    try:
+        tidmin = tidmin.decode("hex")
+        tidmax = tidmax.decode("hex")
+    except TypeError:   # hex decoding error
+        raise TidRangeInvalid(tidrange)

     # empty tid means -inf / +inf respectively
     # ( which is None in IStorage.iterator() )
-    if not tidmin:
-        tidmin = None
-    if not tidmax:
-        tidmax = None
-
-    return tidmin, tidmax
+    return (tidmin or None, tidmax or None)

 def main2():
     verbose = False
@@ -189,8 +194,8 @@ def main2():
     if len(argv) > 2:
         try:
             tidmin, tidmax = parse_tidrange(argv[2])
-        except ValueError:
-            usage(sys.stderr)
+        except TidRangeInvalid as e:
+            print("E: invalid tidrange: %s" % e, file=sys.stderr)
             sys.exit(2)

     stor1 = ZODB.config.storageFromFile(open(storconf1, 'r'))
@@ -202,10 +207,9 @@ def main2():
 def main():
     try:
         main2()
+    except SystemExit:
+        raise   # this was sys.exit() call, not an error
     except:
-        exc_type, _, _ = sys.exc_info()
-        if exc_type is SystemExit:
-            raise   # this was sys.exit() call, not an error
         traceback.print_exc()
         sys.exit(2)

@coveralls
Copy link

coveralls commented Nov 16, 2016

Coverage Status

Coverage decreased (-1.1%) to 73.59% when pulling ffa0352 on navytux:y/zodbcmp into cc1f922 on zopefoundation:master.

@jimfulton
Copy link
Member

This sounds like a useful tool, but I'd rather it not be included in ZODB. This has nothing to do with the tool, which I haven't studied :), but with the weight of ZODB maintenance. This weight has been historically been onerous in the case of utility scripts, which like this one, have lacked tests and tended to rot.

If it were up to me, the scripts directory would go away.

FWIW, I hope that soon there will be some sort of namespace package for the ZODB ecosystem. This could be a tool to release separately within that namespace. (Although I would prefer it have tests.)

@jimfulton jimfulton closed this Nov 16, 2016
@navytux
Copy link
Contributor Author

navytux commented Nov 16, 2016

@jimfulton thanks for feedback. It could eventually have tests like checking database against itself is equal and aginst another db a bit modified - as not equal. It's a pity you do not want core (imho) ZODB-related things to be in ZODB repository itself. Just for the reference scripts/fstest.py and scripts/repozo.py have tests and zodbcmp could have too.

If it were up to me, the scripts directory would go away.

but because of this I assume retrying the pull-request with tests for zodbcmp which are run on every python setup.py test is not the way to go. Or am I maybe wrong?

@jimfulton
Copy link
Member

At a high level, it would be better to release this separately. At the same time, we should create a page on zodb.org (via a ZODB/docs pr :)) that makes it easy for people to find tools like this.

@navytux
Copy link
Contributor Author

navytux commented Nov 16, 2016

ok I'm moving this to https://lab.nexedi.com/kirr/zodbtools . I hesitate to change ZODB/docs about the tool for now.

Thanks once again for feedback,
Kirill

@navytux navytux deleted the y/zodbcmp branch November 16, 2016 17:24
NexediGitlab pushed a commit to Nexedi/zodbtools that referenced this pull request Nov 17, 2016
Project to be a place for various ZODB utilitieis. Jim Fulton suggested
to have it separate from ZODB/scripts/ [1] and now we have it here.

[1] zopefoundation/ZODB#128 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants