Export Errors: DDI, HTML DDI have issues #8036

Closed
kcondon opened this issue Jul 29, 2021 · 10 comments · Fixed by #8039

@kcondon
Contributor

kcondon commented Jul 29, 2021

Discovered in 5.6 dev but confirmed on 5.5.

DDI export shows an error at the top of the export:
This page contains the following errors:
error on line 1 at column 30220: Extra content at the end of the document
Below is a rendering of the page up to the first error.

HTML DDI Codebook export just shows a blank page.

@landreev
Contributor

This is a bit confusing... The DDI error does appear to be happening on demo.dataverse.org ("confirmed on 5.5"); example: https://demo.dataverse.org/api/datasets/export?exporter=ddi&persistentId=doi%3A10.70122/FK2/BRJLGV - the outer <codeBook> element isn't closed at the end.
But this doesn't seem to be universally happening everywhere. For example, this dataset in prod. that has just been published has a well-formatted DDI: https://dataverse.harvard.edu/api/datasets/export?exporter=ddi&persistentId=doi%3A10.7910/DVN/LPCEXG.

It could be caused by specific metadata of course... but the above dataset on demo doesn't have much metadata at all - it's a bare bones dataset with 3 files.

@landreev
Contributor

P.S. Not only the closing </codeBook> is missing, but the closing </dataDscr> element is in the wrong place (after the last </otherMat>, for some reason...)

Hmmmm.
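
A quick way to confirm that an export like the one described above is malformed, without relying on a browser's error page, is to run it through a strict XML parser. The following is a minimal sketch using only the JDK's built-in parser; the class name WellFormedCheck and the method isWellFormed are made up for illustration and are not part of Dataverse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not Dataverse code: reports whether a string parses as XML.
final class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Unclosed or misplaced elements end up here.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the body of a DDI export to isWellFormed would return false for a document whose outer codeBook element is never closed.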

@poikilotherm
Contributor

poikilotherm commented Jul 29, 2021

For the record: I refactored lots of tests and XML stuff in #8000. This might have a (future) influence on this one.

Please also note #7127 and #3648 (+others). There's a good chance we'll see more errors from the same source as those, and they might be related to this one.

@landreev
Contributor

landreev commented Jul 29, 2021

Kevin, check out this dataset I just created and published on internal:
Same files, same amount of metadata (or lack thereof) - but valid DDI export:
https://dataverse-internal.iq.harvard.edu/dataset.xhtml?persistentId=doi%3A10.70122%2FFK2%2FBS7KXQ

I'm having a bad feeling that it's working w/ S3... but not when the export lives on a filesystem. This would explain the fact that it's working in prod.
Can you see anything else that's different, between my test dataset, above, and yours??

(It's not impossible that we flush some buffers in that export in a way that fails differently depending on the physical driver... but that would still be quite weird!)

@landreev
Contributor

@poikilotherm Thanks!

I'll take a look, but... When you say "refactored"... how broken were they?

@landreev
Contributor

But, I CANNOT reproduce this on my own build (develop branch); these DDI exports are just working for me, regardless of the type of dataset or where they are written.

Is there a chance that both demo and dataverse-internal are broken in some similar way, outside of the application itself? Some metadata block update that was applied to both of them? - idk.

OK, I've already spent more time looking into this than I could afford...

@kcondon
Contributor Author

kcondon commented Jul 30, 2021

OK, I've identified the missing step, still trying to isolate issue: I've restricted 2 of 3 files. Doesn't happen with only 1 or 2 restricted files. Doesn't need terms of access.

@landreev So, the special sauce is this: 1 unrestricted tabular file, 1 restricted non tabular file, with or without request access/terms of access. This causes the problem.

Note that I've tried many combinations and if the unrestricted file is non tabular it works.

@landreev
Contributor

Thanks! I feel bad now, having missed this part last night.
So I guess our export code tries to skip that restricted file in the DDI, but messes up the formatting in the process.

@qqmyers
Member

qqmyers commented Aug 1, 2021

Yep -

should be a continue; not a return; Otherwise, the restricted file causes you to exit the loop through the files, and if you've already had a tabular file, you never write the closing "dataDscr" element. The writeEndElement() call that is supposed to close the overall codeBook element then just ends the dataDscr element and leaves the codeBook unclosed.
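
The effect described above can be reproduced in isolation with the JDK's XMLStreamWriter. This is only a sketch of the pattern, not the actual Dataverse export code: the class DdiLoopSketch, the FileInfo type, the var element written per file, and the skipWithContinue switch are all invented for illustration.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustration only; names and structure are assumptions, not Dataverse code.
final class DdiLoopSketch {

    // Hypothetical stand-in for a dataset file.
    static final class FileInfo {
        final String name;
        final boolean tabular;
        final boolean restricted;

        FileInfo(String name, boolean tabular, boolean restricted) {
            this.name = name;
            this.tabular = tabular;
            this.restricted = restricted;
        }
    }

    // Writes <dataDscr> with one <var> per unrestricted tabular file.
    // skipWithContinue=false mimics the early return; true mimics the fix.
    static void writeDataDscr(XMLStreamWriter xml, List<FileInfo> files, boolean skipWithContinue)
            throws XMLStreamException {
        boolean opened = false;
        for (FileInfo f : files) {
            if (f.restricted) {
                if (skipWithContinue) {
                    continue; // skip only this file, keep looping
                }
                return; // leaves the method; an open <dataDscr> is never closed
            }
            if (f.tabular) {
                if (!opened) {
                    xml.writeStartElement("dataDscr");
                    opened = true;
                }
                xml.writeStartElement("var");
                xml.writeAttribute("name", f.name);
                xml.writeEndElement(); // closes <var>
            }
        }
        if (opened) {
            xml.writeEndElement(); // closes <dataDscr>
        }
    }

    static String export(List<FileInfo> files, boolean skipWithContinue) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("codeBook");
        writeDataDscr(xml, files, skipWithContinue);
        // Meant to close <codeBook>; after the early return it closes <dataDscr> instead.
        xml.writeEndElement();
        xml.flush();
        return out.toString();
    }
}
```

Calling export with an unrestricted tabular file followed by a restricted non-tabular file, as in the reproduction case from this thread, should yield output that ends with the closing codeBook tag only when skipWithContinue is true.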

@pdurbin
Member

pdurbin commented Aug 10, 2021

Shoot. I added the return in pull request #7642. (So this bug should only be in 5.4 and 5.5). Thanks for fixing, @qqmyers !
