Importing fails if a WARC file misses some records. #12
Comments
The proper way to do this is to ship the package itself with a predefined SQL schema, which would require a new version every time the schema changes. This is fine, but the WARC --> relational transformation as-is is just a personal preference, and I'm not convinced it's good enough to nail down. Then again, we don't want a failed import every time an incomplete WARC file is supplied.
I wonder if a stub warcinfo record could be generated if one isn't found when encountering the first record? But I see your point that we probably need to define a schema first? Perhaps we could use the wget schema as canonical?

It looks like sqlite-utils lets you create tables without inserting: https://sqlite-utils.datasette.io/en/stable/python-api.html#explicitly-creating-a-table

Would it be weird to require users to do a

It looks like it's pretty new but I wonder if @simonw's https://github.com/simonw/sqlite-migrate could be useful here?
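A rough sketch of the two ideas in this comment: create the tables up front, and fall back to a stub `warcinfo` row when a file has none. It uses the standard-library `sqlite3` module instead of sqlite-utils so that it runs without dependencies. Only the `warcinfo` table and its `WARC-Record-ID` column come from the error in this issue; the `response` table, the `warcinfo_id` and `WARC-Filename` columns, the record dicts, and the helper names `init_db` and `import_records` are assumptions made for illustration, not warcdb's actual schema or API.

```python
import sqlite3
import uuid

# Hypothetical schema. Only warcinfo."WARC-Record-ID" is taken from the
# error message in this issue; every other name here is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS warcinfo (
    "WARC-Record-ID" TEXT PRIMARY KEY,
    "WARC-Filename"  TEXT
);
CREATE TABLE IF NOT EXISTS response (
    "WARC-Record-ID"  TEXT PRIMARY KEY,
    "WARC-Target-URI" TEXT,
    warcinfo_id       TEXT REFERENCES warcinfo("WARC-Record-ID")
);
"""


def init_db(conn):
    """Create every table before any record is seen."""
    conn.executescript(SCHEMA)


def import_records(conn, records, filename):
    """Insert records (dicts with 'type', 'id', optional 'uri').

    Returns the warcinfo ID the responses were attached to.
    """
    warcinfo_id = None
    for rec in records:
        if rec["type"] == "warcinfo":
            warcinfo_id = rec["id"]
            conn.execute(
                "INSERT INTO warcinfo VALUES (?, ?)", (warcinfo_id, filename)
            )
        elif rec["type"] == "response":
            if warcinfo_id is None:
                # No warcinfo record seen yet: generate a stub so the
                # foreign key below always has something to point at.
                warcinfo_id = f"<urn:uuid:{uuid.uuid4()}>"
                conn.execute(
                    "INSERT INTO warcinfo VALUES (?, ?)", (warcinfo_id, filename)
                )
            conn.execute(
                "INSERT INTO response VALUES (?, ?, ?)",
                (rec["id"], rec.get("uri"), warcinfo_id),
            )
    conn.commit()
    return warcinfo_id
```

With the schema fixed in advance, a file that contains only response records still imports, and its rows point at the generated stub rather than at a column that was never created.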
This commit adds a new `init` command which initializes a SQLite database with the schema. Initializing the database is required prior to running `import`. The database schema is managed with sqlite-migrate, which is kind of new but seems to work well.

So the process for working with warcdb is:

```bash
$ warcdb init warc.db
$ warcdb import warc.db google.warc.gz
```

Then, if you want to update warcdb and apply migrations, you can:

```
$ pip install --upgrade warcdb
$ warcdb migrate warc.db
```

Closes Florents-Tselai#12
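To illustrate the init-then-migrate workflow from the commit message, here is a minimal sketch of a versioned schema using only the standard-library `sqlite3` module and SQLite's `PRAGMA user_version`. This is not how sqlite-migrate or warcdb implement it; the `MIGRATIONS` list, the `WARC-Filename` column, and the `migrate` helper are hypothetical and only show the general idea of applying each schema change exactly once.

```python
import sqlite3

# Ordered, hypothetical migrations. Each statement is applied exactly once.
MIGRATIONS = [
    'CREATE TABLE warcinfo ("WARC-Record-ID" TEXT PRIMARY KEY)',
    'ALTER TABLE warcinfo ADD COLUMN "WARC-Filename" TEXT',
]


def migrate(conn, migrations=MIGRATIONS):
    """Apply migrations not yet recorded in PRAGMA user_version.

    Returns the number of migrations that were run.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    pending = migrations[current:]
    for offset, statement in enumerate(pending, start=1):
        conn.execute(statement)
        # Record progress after each step so a rerun skips what is done.
        conn.execute(f"PRAGMA user_version = {current + offset}")
    conn.commit()
    return len(pending)
```

Running `migrate` on a fresh database plays the role of `init`; running it again after an upgrade that appends entries to the list applies only the new ones.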
Assuming a newly created `archive.db`, `warcdb import archive.db ./tests/apod.warc.gz` fails with a `sqlite_utils.db.AlterError: No such column: warcinfo.WARC-Record-ID`.

If, however, one does it like this, it works fine

That is because google.warc is a "complete", ideal WARC file, so the db schema is created appropriately.