Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harvesting : Add granularity to ListRecords when using from parameter #11038

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

luddaniel
Copy link
Contributor

What this PR does / why we need it:

It performs an Identify request before a ListRecords request with from parameter to add the correct granularity.

Which issue(s) this PR closes:

Suggestions on how to test this:

Test with YYYY-MM-DD granularity, without PR it will FAIL in 2nd run, with PR it will work :

{
  "nickName": "nakala",
  "dataverseAlias": "root",
  "type": "oai",
  "style": "default",
  "harvestUrl": "https://api.nakala.fr/oai2",
  "archiveUrl": "https://api.nakala.fr",
  "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
  "metadataFormat": "oai_dc",
  "set": "doi_10.34847_nkl.002f8m36",
  "schedule": "none",
  "allowHarvestingMissingCVV": true
}
curl -H "X-Dataverse-key: $API_KEY" -H "Content-Type: application/json" -X POST  "http://localhost:8080/api/harvest/clients/nakala" --upload-file "client.json"

Test with YYYY-MM-DDThh:mm:ssZ granularity, non regression test, it work still work with the PR :

{
  "nickName": "ird",
  "dataverseAlias": "root",
  "type": "oai",
  "style": "dataverse",
  "harvestUrl": "https://dataverse.ird.fr/oai",
  "archiveUrl": "https://dataverse.ird.fr",
  "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
  "metadataFormat": "dataverse_json",
  "schedule": "none",
  "allowHarvestingMissingCVV": true
}
curl -H "X-Dataverse-key: $API_KEY" -H "Content-Type: application/json" -X POST  "http://localhost:8080/api/harvest/clients/ird" --upload-file "client.json"

Is there a release notes update needed for this change?:
Yes

@pdurbin pdurbin added Type: Bug a defect Size: 3 A percentage of a sprint. 2.1 hours. labels Nov 20, 2024
@ofahimIQSS ofahimIQSS added the FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) label Nov 26, 2024
@cmbz cmbz added the FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) label Dec 5, 2024
@pdurbin pdurbin self-assigned this Dec 16, 2024
@pdurbin
Copy link
Member

pdurbin commented Dec 16, 2024

@luddaniel thanks for discussing this originally at https://dataverse.zulipchat.com/#narrow/channel/375707-community/topic/Harvesting.20with.20from.3D.20parameter/near/481979177 and for making this pull request.

I didn't run the code but it looks reasonable to me. I'll go ahead and request a review from @landreev in case he has time to take a look.

@luddaniel one question for you. Under "how to test" you mention "client.json". I this how you're varying the timestamp format? Would it be helpful for us to have those client.json files when we are testing?

@luddaniel
Copy link
Contributor Author

Hi @pdurbin

@luddaniel one question for you. Under "how to test" you mention "client.json". I this how you're varying the timestamp format? Would it be helpful for us to have those client.json files when we are testing?

I added the json to prove I tested and to help you to test.
Both json should be tested without the PR (develop) and with the PR, expecting to see a improve in nakala with the PR and a still working status for ird. Note that nakala will fail without the PR in the 2nd run for the very reason of this PR because on the 2nd run from string query is used.

@pdurbin
Copy link
Member

pdurbin commented Dec 17, 2024

@luddaniel sorry, I think I confused myself. I understand now that you've put the client.json content inline above the curl command.

I'd say this is ready for some QA. I'll approve it. Thanks!

@ofahimIQSS ofahimIQSS self-assigned this Dec 19, 2024
@ofahimIQSS
Copy link
Contributor

Can we please update the 6.4 to 6.5 in the dataverse/modules/dataverse-parent
/pom.xml

@pdurbin
Copy link
Member

pdurbin commented Dec 20, 2024

@ofahimIQSS sure, I went ahead and merged the latest from develop into this PR.

@coveralls
Copy link

Coverage Status

coverage: 22.687% (-0.002%) from 22.689%
when pulling 9ced7d1 on Recherche-Data-Gouv:11020-unconsidered-harvesting-granularity
into 2ae639d on IQSS:develop.

@cmbz cmbz added the FY25 Sprint 14 FY25 Sprint 14 (2025-01-02 - 2025-01-15) label Jan 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) FY25 Sprint 14 FY25 Sprint 14 (2025-01-02 - 2025-01-15) Size: 3 A percentage of a sprint. 2.1 hours. Type: Bug a defect
Projects
Status: QA ✅
Status: 🙏 Wanted for next version
Development

Successfully merging this pull request may close these issues.

Unconsidered harvesting granularity
5 participants