Add a unique DvObjects/Storageidentifier constraint to legacy databases #7451

Closed
landreev opened this issue Dec 3, 2020 · 0 comments · Fixed by #7741

landreev commented Dec 3, 2020

Opening a new issue for the unfinished work in #6522.
In that issue the following constraint ("storageidentifiers should be unique within parent datasets") was added to DvObject.java:

uniqueConstraints = {...,@UniqueConstraint(columnNames = {"owner_id,storageidentifier"})}

The plan was to add this constraint to existing databases using our normal Flyway script method. We scrapped that plan once we realized that our database, and likely others, contains a number of legacy harvested datafiles with remote URLs stored in the storageidentifier field, some of which are not unique within their datasets.
It would still be safer to actually have this constraint in older databases such as ours.
These URLs are not currently used for any known purpose. We still don't want to lose them completely (?), but we should think of a different place to move them to.
Then we could reuse the flyway script originally created for the issue above:
cd5cf39
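
To illustrate the kind of data that would block the constraint, here is a minimal Java sketch that groups rows by (owner_id, storageidentifier) and reports any pair shared by more than one row. It is not the diagnostics script or the flyway script mentioned in this issue; the class `StorageIdentifierDupes`, the `Row` type, the `findDuplicates` method, and the sample values are all hypothetical and exist only for this example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StorageIdentifierDupes {

    /** Hypothetical stand-in for a dvobject row; only the columns the constraint looks at. */
    static final class Row {
        final long id;
        final long ownerId;
        final String storageIdentifier;

        Row(long id, long ownerId, String storageIdentifier) {
            this.id = id;
            this.ownerId = ownerId;
            this.storageIdentifier = storageIdentifier;
        }
    }

    /**
     * Groups row ids by (ownerId, storageIdentifier) and keeps only the groups
     * with more than one row, i.e. the rows a unique constraint would reject.
     */
    static Map<Map.Entry<Long, String>, List<Long>> findDuplicates(List<Row> rows) {
        Map<Map.Entry<Long, String>, List<Long>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            // Assumption: rows without a storage identifier are skipped, since
            // SQL unique constraints treat NULLs as distinct by default.
            if (r.storageIdentifier == null) {
                continue;
            }
            Map.Entry<Long, String> key =
                    new SimpleImmutableEntry<Long, String>(r.ownerId, r.storageIdentifier);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(r.id);
        }
        // Drop every group that is already unique.
        groups.values().removeIf(ids -> ids.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, 10, "file://17a2b3c4d5e"),           // local file, unique
                new Row(2, 10, "http://example.org/data.tab"),  // harvested, remote URL
                new Row(3, 10, "http://example.org/data.tab"),  // same URL, same dataset
                new Row(4, 11, "http://example.org/data.tab")); // same URL, other dataset
        System.out.println(findDuplicates(rows));
    }
}
```

With the sample rows above, only rows 2 and 3 are reported: they share both the owner and the identifier, while row 4 reuses the URL under a different owner and so would not violate the constraint.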

@djbrooke djbrooke added the Medium label Jan 6, 2021
@landreev landreev self-assigned this Mar 17, 2021
landreev added a commit that referenced this issue Mar 30, 2021
…geidentifiers, and re-check the local ones for any new dupes, just in case. (#7451)
landreev added a commit that referenced this issue Mar 30, 2021
landreev added a commit that referenced this issue Mar 30, 2021
landreev added a commit that referenced this issue Jun 25, 2021
…onstraint

being enforced on existing databases in the next release. (#7451)
landreev added a commit that referenced this issue Jun 25, 2021
pdurbin added a commit that referenced this issue Jun 30, 2021
landreev added a commit that referenced this issue Jul 6, 2021
Extremely unlikely to be encountered anywhere else; but need to be
included to be able to QA on a copy of the prod. db.
Plus some extra diagnostics. (#7451)
janvanmansum added a commit to DANS-KNAW/dataverse that referenced this issue Jul 14, 2021
* initial semantic API endpoint

* merge new fields with existing ones

* differences from IQSS/develop that break compilation

* Add jsonld lib to compact to local context

* use expand/compact, refactor, add :startmigration endpoint

* try fix for parse error

* log value

* return dataset

* manage versionState, add debug output

* move debug ore generation after configuring dataset

* set versionstate, simplify, move terms init outside loop

* parse version number

* fix toStrings

* debug null pointer in DataverseFieldTypeInputLevel

* add support for fields with their own formal URI

* allow non-published to support debugging and future use

* refactor, use expanded version directly

* add modification time

* expanded has array with 1 val - handle it

* log compound values to start

* compact with no context for decontextualize

* handle appending and compound fields

* sort compound field children by display order

* parse date/time correctly

* Revert "sort compound field children by display order"

This reverts commit 8596ac8.

* typo

* now use Uri instead of label when matching terms

* set dsfield of dsfvalue

* additional debug, always set display order

* generate URIs for child types to match current ore maps

* allow oremap to work w/o modified date for debug

* null check on date itself

* fix compound value iteration

don't replace existing value - always add a new value, but, if not
appending, clear the list of values to start

* fix ttype map for terms with no uri - use title not name

as is done currently in generating the ORE map

* handle date format variations, including DV internal ones

see note in Dataverses - using Date() versus Timestamp() causes a
difference in precision and, perhaps surprisingly, a difference in the
response from version.getLastUpdateTime().toString() in creating the
OREmap.

* and the format in current published bags

* initial endpoint to release a migrated dataset

* create metadataOnOrig field

* add metadataOnOrig to solr

* use Finalize Publication command

Curate is for cases with an existing published version and migrated
datasets only have 1 version

Also - don't want to go through Publish command since it creates new
version numbers, etc.

* add debug, allow more details in 400 responses

* fix date-time issue

* typos

* create transfer bag type with orig files

handle no checksums on orig files

* missing tab

* add type param

* add semantic metadata api call only

* remove OREMap parameter

* fix error handling

FWIW: We have an error handler for the
edu.harvard.iq.dataverse.util.json.JsonParseException class but not for
javax.json.stream.JsonParsingException which was getting caught by the
Throwable handler and returned as a 500 error with json message {}

* append to current terms

* add replace param

* handle append on terms - fix cut/paste errors

* fix logic

* specify default

* make replace still append for multiple val fields

* add migrating switch

* expose uri in datasetField api

* track defined namespaces

and avoid having contexts with specific entries for terms that are in a
namespace already

* define equals, avoid duplicates in list

* replace string with const

* constant for CC0_URI

* GET/DELETE endpoints

* 7130-handle missing contact name

* Fix multiple description logic for info file

* put is always for :draft version

* don't cast to String[]

* add more logging

* handle unpublished versions

* add method that can return JsonObjectBuilder

which can be used with existing AbstractApiBean.ok()

* log details on failure

* multiple updates/fixes, added logging

* fix terms retrieval

* date test fixes for locale

* Java 11 update and test fixes inc. for different exception mesg

* update pom for v11 and running tests under 11

* fix for edu.harvard.iq.dataverse.api.AdminIT test fail in Java 11

The DV code tested in testLoadMetadataBlock_ErrorHandling assumed it
could parse the message of an ArrayOutOfBounds exception as an int to
determine the column that fails. This message is now a String. Rather
than parse it (and fail if it changes), I modified the code so that the
length of the values array is visible in the catch and can be sent
directly (the first out of bounds index is if/when the index is
values.length).

* flyway script adding the new constraint (IQSS#7451)

* A diagnostics script, to check and fix any duplicated harvested storageidentifiers, and re-check the local ones for any new dupes, just in case. (IQSS#7451)

* A pre-release text for the new diagnostics script (will discuss the approach in the PR/dv-tech) (IQSS#7451)

* Arming the script bomb... (IQSS#7451)

* switched to a conditional constraint. (IQSS#7451)

* Update PRE-RELEASE-INFO.txt

* update StringUtils package

* Do not count thumbnails and prep downloads, when redirecting to S3 (similarly to how these downloads are treated when done internally, without redirecting to the remote bucket, in line 457). IQSS#7924

* Implement usage of user supplied handle for authentication. This is an optional parameter.

* Fix missing ";" coding error.

* move metadataOnOrig out of citation block

which makes it optional

* IQSS#7431 remove XML prolog from the individual records of OAI-PMH ListRecords response

* IQSS#7431 adding integration tests

* Bump httpclient from 4.5.5 to 4.5.13

Bumps httpclient from 4.5.5 to 4.5.13.

---
updated-dependencies:
- dependency-name: org.apache.httpcomponents:httpclient
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* More forceful language in the "pre-release note" about the dvobject constraint
being enforced on existing databases in the next release. (IQSS#7451)

* renamed the flyway script for the dvobject constraint (since it didn't make it into 5.5) (IQSS#7451)

* Better check HandleAuthHandle for default value of null. If unequal to null then use it.

* Change "string" to "String" with uppercase first character

* build(ci): attempt to fix coveralls report. IQSS#7977

* Change parameter "HandleAuthHandle" so it start with a lower case character when it is being assigned.

* Update 5.3-release-notes.md

Removed incorrect statement: (If you are using a PostgreSQL server on `localhost:5432`, you can omit `dataverse.db.host` and `dataverse.db.port`.)

* sync with migration api branch (tests, docs, bug fixes)

* rename SQL update script IQSS#7451

* prevent page from blowing up if no remind msg in bundle IQSS#7975

* get "create dataset" working again IQSS#7986

* remove TODO IQSS#7986

Per comment from Jim: "This works and doesn't break the anonymized
access functionality as I thought it might."

* add anonymized access methods

* fix test

* Update doc/release-notes/6497-semantic-api.md

Co-authored-by: Philip Durbin <philipdurbin@gmail.com>

* Update doc/sphinx-guides/source/developers/dataset-semantic-metadata-api.rst

Co-authored-by: Philip Durbin <philipdurbin@gmail.com>

* add create example, remove solr schema copies file

* removed debug logging

* missing header

* Added an extra clause for some IQSS-specific harvested identifiers.
Extremely unlikely to be encountered anywhere else; but need to be
included to be able to QA on a copy of the prod. db.
Plus some extra diagnostics. (IQSS#7451)

* IQSS#7893 link Rserve documentation to necessary files in Dataverse repo

* IQSS#7893 remove redundant script mention per feedback from Leonid

* Adding -H + API token to curl commands

Without ``-H`` and the API token in these curl commands, the native API rejects the user's requests on the ground that they are a 'guest'.

* remove metadataOnOrig per review

* IQSS#7893 use fixedwidthplain text instead, clone master instead of develop

* fixes the small formatting issue with the link (IQSS#7893)

* Update doc/sphinx-guides/source/api/native-api.rst

Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>

* Update doc/sphinx-guides/source/api/native-api.rst

Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>

* Update doc/sphinx-guides/source/api/native-api.rst

Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>

* Update doc/sphinx-guides/source/api/native-api.rst

Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>

* Update documentation to be more concise about the handle and give a better example.

* IQSS#7936 append deaccessionDialog.reasons with periods for proper display

* add missing create method (in migrate PR)

* No "@id" npe fix

* avoid npe in logging

* only require "@id" when migrating

* fix logging in create case

* WIP

* Temporary conflict solution

* Added marker comments where code must be revised/fixed later '// TODO: FIX FOR MULTI-LICENSE'

Co-authored-by: qqmyers <qqmyers@hotmail.com>
Co-authored-by: Leonid Andreev <leonid@hmdc.harvard.edu>
Co-authored-by: Gustavo Durand <scolapasta+github@gmail.com>
Co-authored-by: Robert Verkerk <robert.verkerk@surfsara.nl>
Co-authored-by: pkiraly <pkiraly@gwdg.de>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Oliver Bertuch <o.bertuch@fz-juelich.de>
Co-authored-by: Kevin Condon <kcondon@hmdc.harvard.edu>
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
Co-authored-by: Philip Durbin <philipdurbin@gmail.com>
Co-authored-by: Don Sizemore <don.sizemore@gmail.com>
Co-authored-by: Benjamin Peuch <benjamin.peuch@gmail.com>
Co-authored-by: Jan van Mansum <janvanmansum@users.noreply.github.com>