Checksum mismatch in db migration #1517
Coming from a running v0.34.0 and changing to v0.35.0 results in:
flyway_fixup_history:
Deleting the 1.33.0 entry results in:
flyway_fixup_history:
For reference, latest five entries in flyway_schema_history:
Going back to v0.34.0 results in:
Deleting the 1.33.0 entry from flyway_fixup_history results in a new entry after startup, now without any errors, and Docspell works fine as before with v0.34.0. flyway_fixup_history:
|
Thank you very much for these good details! I'll take a deeper look; currently I admit I'm quite puzzled. It seems that the fix for the I found one issue flyway/flyway#3431 that seems to describe this problem. |
This also happens with an "empty" database i used going from postgres 13 to 14. If needed i can provide a dump of it alongside my docker-compose configuration. |
Thanks for all your efforts! For me this is far from urgent, as i was able to go back to v0.34.0 without issues. So I'm fine for now. |
Thanks @enkol ! I'll come back if I need more data, thanks for the proposal. If that happens even to an empty database, there is something very spooky going on :)! One more thing: How/which system are you running this on? Is it the docker arm images? |
I'm running with Docker in an Ubuntu 20.04 VM on an Intel i3 box with Proxmox 7. Postgres is also with Docker |
Just to clarify, i meant 'empty' in terms of no documents uploaded. The database was used to test setting up Docspell with Postgres 14 in February. I can reproduce the checksum issues with this db switching between 0.34.0 and 0.35.0. |
Same symptoms here. The flyway ticket mentioned shows the problem appears after moving from JDK 11 to JDK 17... did you change the JRE in the containers? If yes, then reverting to the old version might be a temporary fix. My wild guess would be some strange encoding behaviour when reading text files (like changing the line endings)... |
I didn't change anything in that regard. Could be related to a new flyway version in combination with the alpine jdk. But have to reproduce this first. I don't use docker, but I tested the docker image and that worked here 🤷🏼 |
So, unfortunately I can't reproduce it :( I even installed an Ubuntu 20.04 VM (via virtualbox) to run the docker containers (using From what I can see, flyway uses a CRC32 checksum and I don't see why this would change when upgrading java or anything else in the environment. It should only depend on its input, no? Their code didn't change as well. The docker images didn't change, besides from upgraded packages due to new image creation. Not sure how to proceed :/ Maybe we can try to find common characteristics of the systems where this happens? Are you all on a VM, for example? or all using proxmox? Maybe a different openjdk is also worth a try, there could be a problem with openjdk11 and alpine… (but not for me :)). But since it happens for you only with 0.35.0 a change in the image that is triggered in your environment seems most likely to me for now. Flyway just had another patch release, it could also be worth a try - you could run the nightly docker images that have this version. |
Thanks for having a deeper look into this and spending so much effort to reproduce it yourself! |
Another option is what @Skyr said: charset/encoding changes. This would affect the input to the checksum algo, because flyway converts the bytes to characters and then converts characters back to bytes (line by line) using UTF-8. That means if the encoding to read the migration somehow changed, it could produce a different checksum. From what I saw, flyway is by default using UTF-8 (unless it is set explicitly). You could try to change the default encoding for java using the system property
The migrations are in ascii, it doesn't matter so much (except for some esoteric and non-8-bit charsets) how they are read. But worth a try. |
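To illustrate the encoding idea, here is a minimal sketch, not flyway's actual code. It assumes the scheme described above (decode the bytes with some charset, then feed each line back into CRC32 as UTF-8 bytes); the class and method names are made up for this example. It shows why pure-ASCII migrations should be unaffected by the read charset, while a file containing a non-ASCII character could be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch; loosely follows the description in this thread,
// it is NOT flyway's ChecksumCalculator.
public class CharsetChecksumSketch {

    // Decode the raw bytes with readCharset, then update a CRC32 with the
    // UTF-8 bytes of every line (String.lines() drops the line terminators).
    static long checksum(byte[] raw, Charset readCharset) {
        CRC32 crc = new CRC32();
        new String(raw, readCharset).lines()
            .forEach(line -> crc.update(line.getBytes(StandardCharsets.UTF_8)));
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Pure ASCII: decodes to the same string in UTF-8 and ISO-8859-1.
        byte[] ascii = "CREATE TABLE t (id INT);\n".getBytes(StandardCharsets.US_ASCII);
        // Contains U+00C4, which is two bytes in UTF-8 and is therefore
        // decoded differently when read as ISO-8859-1.
        byte[] nonAscii = "-- \u00c4nderung\n".getBytes(StandardCharsets.UTF_8);

        System.out.println("ascii equal:     "
            + (checksum(ascii, StandardCharsets.UTF_8)
               == checksum(ascii, StandardCharsets.ISO_8859_1)));
        System.out.println("non-ascii equal: "
            + (checksum(nonAscii, StandardCharsets.UTF_8)
               == checksum(nonAscii, StandardCharsets.ISO_8859_1)));
    }
}
```

If the first line already printed false on an affected host, that would point away from encoding and towards the CRC32 implementation itself.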
Nightly build produces the same error, i'll try the encoding thingy next.
|
I've prepared a minimal docker-compose to reproduce the issue: |
It is so strange! On all my machines, the checksum for the fixup migration is Thank you for the example. I tried it as described - but it didn't produce the error for me. What I can do is provide a "repair" option in the config file. This must be used with care, ofc. But it would allow to "fix" all checksums automatically. I would still love to know the cause of the problem. Edit: I also tried out of curiosity to calculate checksums for every available charset on my system. There were a few that produced a different checksum, but none produced 1776159438. |
Indeed, this is very strange. For me it is reproducible with the given example; as it isn't for you, it must be something docker host related on my side. I didn't change anything apart from the ordinary Ubuntu system security updates in the VM.
Yes, if it failed with 0.35.0 and i delete the fixup_history entry it runs without issues with 0.34.0 again (creates new fixup_history entry). Deleting fixup_history doesn't fix 0.35.0, only works for 0.34.0. |
Yes, and there must be also something in the 0.35.0 package, since it manages to create a different checksum on your host. The only idea I have is that flyway or its dependencies (including the jvm) reads in something from the environment which it didn't before. So on your system, 0.34.0 creates
It includes the java version for example, but not env variables. These could be obtained by running |
Java is not installed on my Docker host. The Docspell-Api system info outputs: {
"id": "rest1",
"pidHost": "1@5b601d9e1e0e",
"ncpu": 2,
"inputArgs": [
"-Dconfig.file=/opt/docspell-restserver-0.34.0/bin/../conf/docspell-server.conf",
"-XX:+UseG1GC"
],
"libraryPath": "/usr/lib/jvm/java-11-openjdk/lib/server:/usr/lib/jvm/java-11-openjdk/lib:/usr/lib/jvm/java-11-openjdk/../lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib",
"specVendor": "Oracle Corporation",
"specVersion": "11",
"startTime": 1650820718629,
"uptime": 667327,
"vmName": "OpenJDK 64-Bit Server VM",
"vmVendor": "Alpine",
"vmVersion": "11.0.14+9-alpine-r0",
"heapUsage": {
"init": 98566144,
"used": 26075648,
"comitted": 142606336,
"max": 1549795328,
"free": 1523719680,
"description": "init=94.00M, used=24.87M, comitted=136.00M, max=1.44G, free=1.42G"
},
"props": {
"java.io.tmpdir": "/tmp",
"line.separator": "\n",
"path.separator": ":",
"user.home": "/root",
"com.zaxxer.hikari.pool_number": "1",
"sun.os.patch.level": "unknown",
"user.country": "US",
"jna.loaded": "true",
"os.name": "Linux",
"sun.management.compiler": "HotSpot 64-Bit Tiered Compilers",
"sun.cpu.endian": "little",
"java.specification.version": "11",
"java.vm.specification.name": "Java Virtual Machine Specification",
"java.vendor": "Alpine",
"java.vm.specification.version": "11",
"sun.arch.data.model": "64",
"sun.boot.library.path": "/usr/lib/jvm/java-11-openjdk/lib",
"user.dir": "/opt",
"java.library.path": "/usr/lib/jvm/java-11-openjdk/lib/server:/usr/lib/jvm/java-11-openjdk/lib:/usr/lib/jvm/java-11-openjdk/../lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib",
"sun.cpu.isalist": "",
"os.arch": "amd64",
"java.vm.version": "11.0.14+9-alpine-r0",
"java.runtime.version": "11.0.14+9-alpine-r0",
"java.vm.info": "mixed mode",
"java.runtime.name": "OpenJDK Runtime Environment",
"java.version.date": "2022-01-18",
"mail.mime.uudecode.ignoreerrors": "true",
"mail.mime.encodefilename": "true",
"file.separator": "/",
"java.class.version": "55.0",
"mail.mime.parameters.strict": "false",
"java.specification.name": "Java Platform API Specification",
"file.encoding": "UTF-8",
"user.timezone": "Europe/Berlin",
"jna.platform.library.path": "/usr/lib:/lib",
"java.specification.vendor": "Oracle Corporation",
"jnidispatch.path": "/root/.cache/JNA/temp/jna10563872938398853065.tmp",
"sun.java.launcher": "SUN_STANDARD",
"java.vm.compressedOopsMode": "32-bit",
"os.version": "5.4.0-109-generic",
"config.file": "/opt/docspell.conf",
"java.vm.specification.vendor": "Oracle Corporation",
"sun.jnu.encoding": "UTF-8",
"user.language": "en",
"mail.mime.splitlongparameters": "false",
"mail.mime.uudecode.ignoremissingbeginend": "true",
"java.vendor.url": "https://alpinelinux.org/",
"java.awt.printerjob": "sun.print.PSPrinterJob",
"java.awt.graphicsenv": "sun.awt.X11GraphicsEnvironment",
"awt.toolkit": "sun.awt.X11.XToolkit",
"java.class.path": "/opt/docspell-restserver-0.34.0/lib/com.github.eikek.docspell-restserver-0.34.0-classpath.jar",
"java.vm.vendor": "Alpine",
"jdk.debug": "release",
"java.vendor.url.bug": "https://gitlab.alpinelinux.org/alpine/aports/issues",
"user.name": "root",
"mail.mime.decodefilename": "true",
"java.vm.name": "OpenJDK 64-Bit Server VM",
"sun.java.command": "docspell.restserver.Main /opt/docspell.conf",
"java.home": "/usr/lib/jvm/java-11-openjdk",
"mail.mime.setcontenttypefilename": "false",
"java.version": "11.0.14",
"sun.io.unicode.encoding": "UnicodeLittle"
}
} Environment in the restserver container: HOSTNAME=5b601d9e1e0e
PWD=/opt
TZ=Europe/Berlin
HOME=/root
TERM=xterm
SHLVL=1
DOCSPELL_HEADER_VALUE=**********
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/env |
I tried to compare checksum generation within For the same file, I get different results:
Tested with: import java.util.*;
import org.flywaydb.core.internal.resource.filesystem.*;
import org.flywaydb.core.internal.resolver.*;
import org.flywaydb.core.internal.resource.*;
import org.flywaydb.core.api.Location;
import java.nio.charset.*;
public class Program {
public static void main( String[] args ) throws Exception{
Location loc = null;
String filepath = args[0];
FileSystemResource r = new FileSystemResource(loc, filepath,Charset.forName("UTF-8"), false);
int cs = ChecksumCalculator.calculate(r);
System.out.println(cs);
}
}
I've created the above file as test.java on my docker host, mounted it in the container and changed the container's entrypoint to just run this file to calculate its checksum:
Running Maybe this can help? Is there anything else to test or compare? |
Both 0.34.0 and 0.35.0 seem to have the same Java version. Printing all Java system properties generates the exact same output: awt.toolkit: sun.awt.X11.XToolkit
java.specification.version: 11
sun.cpu.isalist:
sun.jnu.encoding: UTF-8
java.class.path: /test.java
java.vm.vendor: Alpine
sun.arch.data.model: 64
java.vendor.url: https://alpinelinux.org/
user.timezone:
os.name: Linux
java.vm.specification.version: 11
sun.java.launcher: SUN_STANDARD
user.country: US
sun.boot.library.path: /usr/lib/jvm/java-11-openjdk/lib
sun.java.command: jdk.compiler/com.sun.tools.javac.launcher.Main /test.java
jdk.debug: release
sun.cpu.endian: little
user.home: /root
user.language: en
java.specification.vendor: Oracle Corporation
java.version.date: 2022-01-18
java.home: /usr/lib/jvm/java-11-openjdk
file.separator: /
java.vm.compressedOopsMode: 32-bit
line.separator:
java.specification.name: Java Platform API Specification
java.vm.specification.vendor: Oracle Corporation
java.awt.graphicsenv: sun.awt.X11GraphicsEnvironment
jdk.module.main.class: com.sun.tools.javac.launcher.Main
sun.management.compiler: HotSpot 64-Bit Tiered Compilers
java.runtime.version: 11.0.14+9-alpine-r0
user.name: root
path.separator: :
os.version: 5.4.0-109-generic
java.runtime.name: OpenJDK Runtime Environment
file.encoding: UTF-8
java.vm.name: OpenJDK 64-Bit Server VM
java.vendor.url.bug: https://gitlab.alpinelinux.org/alpine/aports/issues
java.io.tmpdir: /tmp
java.version: 11.0.14
user.dir: /opt
os.arch: amd64
java.vm.specification.name: Java Virtual Machine Specification
java.awt.printerjob: sun.print.PSPrinterJob
sun.os.patch.level: unknown
java.library.path: /usr/lib/jvm/java-11-openjdk/lib/server:/usr/lib/jvm/java-11-openjdk/lib:/usr/lib/jvm/java-11-openjdk/../lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
java.vm.info: mixed mode
java.vendor: Alpine
java.vm.version: 11.0.14+9-alpine-r0
sun.io.unicode.encoding: UnicodeLittle
java.class.version: 55.0 Generated with: import java.util.*;
public class Program {
public static void main( String[] args ) throws Exception{
Properties p = System.getProperties();
Enumeration keys = p.keys();
while (keys.hasMoreElements()) {
String key = (String)keys.nextElement();
String value = (String)p.get(key);
System.out.println(key + ": " + value);
}
}
} |
Found another recent mention about Flyway checksum issues: madler/zlib#618 Comparing Flyway 8.5.4 used in Docspell 0.34.0 with 8.5.8 used in 0.35.0 doesn't give more insights. Maybe the updated AWS S3 dependencies brought in a new zlib version which relates to the above mentioned issue? |
Thank you so much @enkol ! Your test with the java program was great. It's easier to work with an isolated program and it shows clearly that the checksum calculator produces different results for the same input. On my system, it produces the same output. I think the issue you found sounds very much like the cause. I can't imagine anything else than a jvm or native change… which is quite rare in my experience! I didn't know that Java's CRC32 class reaches out to a C library - now I know :). It would also explain why it is so hard to reproduce. On my ubuntu vm that I created to reproduce this, there is zlib 1.2.11 installed. |
On my Docker host (Ubuntu VM) zlib is version As mentioned in madler/zlib#618 (comment), it's maybe also cpu dependent, if it happens or not. My system runs on an Intel i3-1005G1, but it could be affected also by Proxmox/QEMU |
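For anyone who wants to check their own container without involving flyway at all, a small self-test along these lines might help. It is only a sketch: it compares java.util.zip.CRC32 against the standard CRC-32 check value 0xCBF43926 for the ASCII string "123456789", once in a single call and once in small chunks. The class and method names are invented for this example, and whether such a short input actually triggers the zlib problem discussed here is not established in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical self-test; names are illustrative only.
public class Crc32SelfTest {

    // Standard CRC-32 check value for the ASCII bytes of "123456789".
    static final long EXPECTED = 0xCBF43926L;

    // CRC32 over the whole array in one update call.
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // CRC32 over the same data, fed in chunks of at most chunkSize bytes,
    // so that the running checksum is carried across several update calls.
    static long crc32Chunked(byte[] data, int chunkSize) {
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += chunkSize) {
            crc.update(data, off, Math.min(chunkSize, data.length - off));
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        long oneShot = crc32Of(input);
        long chunked = crc32Chunked(input, 2);
        System.out.printf("one-shot: 0x%08X, chunked: 0x%08X, expected: 0x%08X%n",
            oneShot, chunked, EXPECTED);
        if (oneShot != EXPECTED || chunked != EXPECTED) {
            System.err.println("CRC32 mismatch: the JVM or its native zlib looks suspicious");
            System.exit(1);
        }
    }
}
```

Like test.java above, this could be mounted into the container and run with the single-file source launcher (java Crc32SelfTest.java) on the Java 11 images.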
Yes, that sounds also interesting to try out. I tried here on AMD Ryzen7, ARM on RPi4 and Intel Atom x5. |
Ok, i can confirm it's definitely zlib related. I've modified the So this issue should be gone, once the base docker alpine image contains the fixed zlib version. Checksum testing now succeeds: docspell34:
image: docspell/restserver:v0.34.0
entrypoint: java -cp /opt/docspell-restserver-0.34.0/lib/org.flywaydb.flyway-core-8.5.4.jar /test.java /test.java
volumes:
- ./test.java:/test.java
docspell35:
image: docspell/restserver:v0.35.0
entrypoint: java -cp /opt/docspell-restserver-0.35.0/lib/org.flywaydb.flyway-core-8.5.8.jar /test.java /test.java
volumes:
- ./test.java:/test.java
docspell35b:
image: docspell/restserver:v0.35.0
entrypoint: /entry.sh
volumes:
- ./test.java:/test.java
- ./entry.sh:/entry.sh test.java from #1517 (comment) and entry.sh is: #!/bin/bash
echo 'http://dl-cdn.alpinelinux.org/alpine/edge/main' >> /etc/apk/repositories
apk add 'zlib=1.2.12-r1'
java -cp /opt/docspell-restserver-0.35.0/lib/org.flywaydb.flyway-core-8.5.8.jar /test.java /test.java Running This means, for the meantime I can run Docspell docker-compose: service:
docspell:
image: docspell/restserver:v0.35.0
entrypoint: /entry.sh
volumes:
- ./entry.sh:/entry.sh entry.sh: #!/bin/bash
echo 'http://dl-cdn.alpinelinux.org/alpine/edge/main' >> /etc/apk/repositories
apk add 'zlib=1.2.12-r1'
/opt/docspell-restserver/bin/docspell-restserver -J-XX:+UseG1GC /opt/docspell.conf And similar for |
@enkol 🤯 excellent work, Sherlock! |
Amazing @enkol thank you for this work! I think I can add your patch to the current dockerfiles, wdyt? I have to think about the future of the images anyways. The problem is that wkhtmltopdf was removed from alpine, but the replacement didn't show as good results. I'd like to keep wkhtmltopdf for now. So maybe I need to use latest alpine, install wkhtmltopdf from a previous alpine and zlib from edge :) |
I don't know what happens if alpine/edge moves on, if it could then break the fix. I'm also not familiar with alpine and its package manager apk. The current fix was just a quick try based on some google search. It may have side effects or there is a better way to do it. |
Yes, these are valid points. It wouldn't be such a problem, I had hoped, because the images are fixed once built, and the next time they are built it would hopefully fail when the packages are not compatible anymore. But it's a risk and not necessary as you said. The images can be upgraded to 3.15 (when the new zlib arrives) with finding a solution for the wkhtmltopdf problem. |
Alpine 3.14 now also has zlib 1.2.12-r1 - so I think the next image should not have this issue. 🤞🏼 |
Closing it; I'm going to make a release soon. If it still persists, we can reopen :) |
With $ apk list zlib
zlib-1.2.12-r0 x86_64 {zlib} (Zlib) [installed]
zlib-1.2.12-r1 x86_64 {zlib} (Zlib) [upgradable from: zlib-1.2.12-r0] |
Edit: Hm, I tried the same test on the latest images. It looks different for me 🤔
|
I had only tested with restserver (smaller download 😺), joex works for me too. $ docker run --rm --entrypoint '' docspell/joex:nightly bash -c 'apk list zlib'
zlib-1.2.12-r1 x86_64 {zlib} (Zlib) [installed]
$ docker run --rm --entrypoint '' docspell/restserver:nightly bash -c 'apk list zlib'
zlib-1.2.12-r0 x86_64 {zlib} (Zlib) [installed]
zlib-1.2.12-r1 x86_64 {zlib} (Zlib) [upgradable from: zlib-1.2.12-r0] |
Indeed, the restserver has a different zlib. What a mess, not sure if I should make a new image. Another way is to not run the migrations from the restserver, only from joex. |
zlib 1.2.12-r0 is not working with openjdk, it affects the checksum calculation of the db migrations. It must be at least 1.2.12-r1. For some reason joex has this newer version, but the restserver image does not. They are installed explicitly now on both images. That's why the migration is now disabled on rest-server in the docker-compose file. It is ok if this is run on one server. It can now happen that on first start joex is migrating the db and the restserver tries to do things that don't work yet - it is a corner case. This is removed with the next version. Refs: #1517
So… I decided to take the pragmatic approach. I changed the current docker-compose file so that the db migration is only executed for joex. There the r1 zlib is installed. I'll remove this for the next release. Then I now explicitly added this zlib version to both images. I find it strange that, although it is available in the repos, the old version is present on the restserver image (but not on the joex image). Could be some transitive dependency thing. So the nightly images should work fine now - I hope. If someone could check this, would be great! |
I can confirm, that it works to run current I stopped joex and restserver containers running with |
Thank you very much @enkol ! I'll close this again. The exceptions on start for joex can be ignored. It is as you said - it can't find the restserver to publish some job-done messages. But these are not important usually. |
This was done to avoid running them on restserver as the image contained a broken zlib package. (#1517)
The explicit install was added earlier due to a broken zlib package (see issue #1517). This has now been fixed for a while in alpine and can be removed.
Spin off from issue #1488 it's better to have a separate issue. It seems to be a longer journey sadly. Context: #1488 (comment) and following comments.
@enkol Can you post me the full logs (from the start)? What is irritating is that 1.34.0 is not part of the fixup changesets. So changing the table
flyway_fixup_history
can only have an effect on the single fixup migration that is included.