Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion #8531

Merged
merged 4 commits into from
Mar 9, 2022

Conversation

addak
Copy link
Contributor

@addak addak commented Feb 26, 2022

Changes:-

  • Add missing setters/extractors for COVERAGE, LANGUAGE, RIGHTS, SOURCE, TYPE wherever possible.
  • Support Day in Date Field during extraction

Fixes #8491

Tasks To Complete:-

  • Add Missing Extractor/ Filler functions
  • Add Day Support for Date Field
  • Add some additional test post initial discussion
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Squash Commits
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

.ifPresent(month -> bibEntry.setMonth(month));

if (Objects.nonNull(calender)) {
if (StringUtils.isNotEmpty(dateFormat)) {
Copy link
Member

@Siedlerchr Siedlerchr Feb 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the method "StringUtil" class from JabRef (note the absence of the "s")

@Siedlerchr Siedlerchr requested a review from btut February 26, 2022 18:22
String date = dates.get(0).trim();

String dateString = dates.get(0).trim();
var dateFormat = ((DublinCoreSchemaCustom) dcSchema).getDateAttributeFormatQualifier();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why it's necessary to add an additional attribute and modify the existing schema. This will probably break things.
Have you looked at JabRefs' Date.parse() it can already treat date ranges

Copy link
Contributor Author

@addak addak Feb 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it did break stuff in the Adobe Acrobat DC viewer( It couldn't find the seq entry for dc:date even though it was present )

As for Date.parse(), yeah you're right. I'll go with that then.
Edit : As was mentioned by the original method, the reason I wanted the format is due to the attached image ( while the rawValue is indeed 2000-05, the getElementsasString returns the date time value )
image

One way I can think of to get the rawValue is to override the dcSchema object's getUnqualifiedSequenceValueList ( and since the dcSchema object is already created and not instantiated by us, we can't use an anonymous class instantiation, I assume )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah now I see the issue. You need to adjust the Date parser https://www.baeldung.com/java-datetimeformatter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to add a pattern like uuuu-MM-dd'T'HH:mm:ssxxx not sure about the xx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently testing this if this could work

Copy link
Contributor Author

@addak addak Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I get the heauristic approach part but I would like to know about whether the creation of the custom DcSchema class is fine or not. It uses FieldUtils because there aren't any setters for those fields.
I use this to get the raw value of the xmp date:dc li field, and not the getElementsAsString version ( which, as you said, provides the ISO8601 converted date, hence the defaults). While getElementsAsStrijg provides us the iso8601 date time string, the raw value is as is from the xmp data. Could be iso8601, yyyy-mm, yyyy, yyyy-mm-dd, etc. Since we're handling parsing anyway, we might as well disallow the standard parsing done by dcSchema.
In this case, we wouldn't need to assume/go about a heuristic approach. The only reason we needed that was because during fetching of date value from dcSchema, it would apply a DateConverter.toISO8601 on the rawValue and provide it to us, resulting in defaults for non-available parts of the date.

An Example:-
say we did (refer to XmpUtilWriterTest.testWriteXmp)

bibEntry.setDate(new Date(2000, 5));

when we convert this into xmp , this is how it looks

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Vladimir N. Vapnik</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:relation>
        <rdf:Bag>
          <rdf:li>bibtex/citationkey/WriteXMPTest</rdf:li>
          <rdf:li>bibtex/owner/Ich</rdf:li>
        </rdf:Bag>
      </dc:relation>
      <dc:coverage>coverageField</dc:coverage>
      <dc:identifier>10.1007/978-1-4757-3264-1</dc:identifier>
      <dc:language>
        <rdf:Bag>
          <rdf:li>English</rdf:li>
          <rdf:li> Japanese</rdf:li>
        </rdf:Bag>
      </dc:language>
      <dc:publisher>
        <rdf:Bag>
          <rdf:li>Springer Science + Business Media</rdf:li>
        </rdf:Bag>
      </dc:publisher>
      <dc:rights>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Right To X</rdf:li>
        </rdf:Alt>
      </dc:rights>
      <dc:source>JabRef</dc:source>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">The Nature of Statistical Learning Theory</rdf:li>
        </rdf:Alt>
      </dc:title>
      <dc:date>
        <rdf:Seq>
          <rdf:li>2000-05</rdf:li>
        </rdf:Seq>
      </dc:date>
      <dc:format>application/pdf</dc:format>
      <dc:type>
        <rdf:Bag>
          <rdf:li>Book</rdf:li>
        </rdf:Bag>
      </dc:type>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta><?xpacket end="w"?>

Now, after parsing the xmp into the dsSchema, the raw value is 2005-05, as shown in the screenshot
image
The DateType Object ( which dc:date is ) , when a getElementsAsString() call is made, uses this function

@Override
    public String getStringValue()
    {
        return DateConverter.toISO8601(dateValue);
    }

and that's the value we get usually. After getting the raw value, this is what the parsing looks like
image

This is the raw output of the xml before any interpretation is happening.

If that is indeed the rawValue that's present as is in the xmp data, then with respect to the above example I've shown, I think we should parse it as is, without making assumptions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I just checked that as well. So your approach looks good with the custom schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I just checked that as well. So your approach looks good with the custom schema

Okay then, if everything else is good to go, I'll begin squashing commits, adding the "Support Date conversion from XMP to bibTex" in changelog. I don't think I need to add more tests honestly, since modifications to current Test Cases should cover that. There are no UI changes present. I guess the only thing remaining would be to run the Jabref application

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add one test for the date thing now. Maybe in the Writer Tests
You also don't need to squash your commits. We do that usually on merge

Copy link
Contributor Author

@addak addak Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh...I already squashed them...
Anyways here's a screenshot of it working
image
image

Sure, I'll add another test for that

@@ -299,7 +351,19 @@ private void fillCreator(String creators) {
*/
private void fillDate() {
bibEntry.getFieldOrAlias(StandardField.DATE)
.ifPresent(publicationDate -> dcSchema.addUnqualifiedSequenceValue("date", publicationDate));
.ifPresent(publicationDate -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Jabref's inbuilt Date parsing should be sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static Optional<Date> parse(String dateString) {

@addak addak marked this pull request as ready for review February 28, 2022 04:53
@addak addak changed the title Add Missing Fillers/Extractors for Supported Fields and Supoort Day Conversion Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion Feb 28, 2022
dateValue.getYear().ifPresent(year -> bibEntry.setField(StandardField.YEAR, Integer.toString(year)));
});
} catch (RuntimeException e) {
LOGGER.error("Failed To Parse Date\n {}", ExceptionUtils.getStackTrace(e));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to catch a RuntimeEception here? You have already optionals that will evaluate to empty if they cannot parse a date.
A Runtime Exception is an unchecked exception that should never be caught. An exception should always be as precise as possible.
https://www.baeldung.com/java-exceptions

Second, every logger takes an exception/throwable as second parameter. Stack trace etc are handled automatically

FieldUtils.writeField(dublinCoreSchemaCustom, "container", dcSchema.getContainer(), true);
FieldUtils.writeField(dublinCoreSchemaCustom, "attributes", FieldUtils.readField(dcSchema, "attributes", true), true);
} catch (Exception e) {
LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackTrace etc is handled automatically by the logger. Every Logger takes an exception/trowable as last argument

Suggested change
LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e));
LOGGER.error("Error making custom DC Schema", e);

@Siedlerchr
Copy link
Member

In general looks already good, just some code style improvements and it's okay from my side! 👍

Btw, as you are now into the XMP stuff, would you mind taking a look at this issue as well? #7925
But a fix should be a follow up PR then.

@ThiloteE
Copy link
Member

ThiloteE commented Mar 1, 2022

If you need me for testing, just say when and what i have to do.

} catch (Exception e) {
LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e));
LOGGER.error("Error making custom DC Schema. Using the default", e);
return dcSchema;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it will never end up here. but just in case, I've left it here

@addak
Copy link
Contributor Author

addak commented Mar 3, 2022

Please Review!

Btw, as you are now into the XMP stuff, would you mind taking a look at this issue as well? #7925
But a fix should be a follow up PR then.

Sure, I'll look into it once I wrap this one up.

@Siedlerchr I forgot to mention but since dc:Type is mapped to the entityType ( say article), It's already implemented and hence I haven't covered it.

@ThiloteE Please read the Issue description. Verify that those fields don't end up as bibtex/{fieldname}/{value} in XMP and that partial dates don't have some default value when XMP dc:date is converted to bibTex and that day is present if it is present in the XMP.
I've kinda tested it in the UI but I'm not really familiar with the application in terms of functionality. Would be quite helpful if you could confirm it works as intended.

@@ -78,25 +81,17 @@ private void extractAuthor() {
* property filled with jan will override this behavior and no data is lost. In the cases, where xmp is written by
* another service, the assumption is, that the 1st January is not a publication date at all.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this, I guess the javadoc needs to be updated to reflect the new behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right, and it looks I've forgotten to change the description. Will change that as part of the last task remaining

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xmp one should be the only relevant one. The findunlinked files is a different feature,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/JabRef/user-documentation/blob/main/en/advanced/xmp.md

Okay I looked into the doc and there's no warning note, or any in particular, regarding JabRef's previous inability to convert day from XMP to bibtex

Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Mar 3, 2022
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general: looks good. I have small nitpick comments 😇

src/main/java/org/jabref/model/entry/Date.java Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@Siedlerchr Siedlerchr merged commit 7bc6766 into JabRef:main Mar 9, 2022
@Siedlerchr
Copy link
Member

Thanks again!

Siedlerchr added a commit that referenced this pull request Mar 9, 2022
* upstream/main:
  Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion (#8531)
  Bump checkstyle from 9.3 to 10.0 (#8544)
  Fix online link detection in entry editor (#8514)
  Add some JavaDoc to Fetchers
  Support two argument form of \abx@aux@cite macro in DefaultAuxParser (#8549)
  Bump guava from 31.0.1-jre to 31.1-jre (#8543)
  Bump org.beryx.jlink from 2.24.4 to 2.25.0 (#8548)
  Bump postgresql from 42.3.2 to 42.3.3 (#8546)
  Bump richtextfx from 0.10.7 to 0.10.9 (#8547)
  Bump archunit-junit5-engine from 0.22.0 to 0.23.1 (#8545)
  Bump actions/checkout from 2 to 3 (#8542)
  Squashed 'buildres/csl/csl-styles/' changes from eb97405..8f69d4e
  Bump classgraph from 4.8.139 to 4.8.141 (#8535)
  Bump archunit-junit5-api from 0.22.0 to 0.23.1 (#8536)

# Conflicts:
#	src/test/java/org/jabref/logic/xmp/XmpUtilReaderTest.java
Siedlerchr added a commit that referenced this pull request Mar 13, 2022
* upstream/main:
  Update RemoteSetupTest.java, adding eq() function from mockito (#8561)
  readd encoding after merge
  Open office refactor finalization (formerly OObranch J cleanup) (#7795)
  Revert "Refine documentation (#8551)" (#8560)
  Refine documentation on logging (#8550)
  Revert "Refine documentation (#8551)" (#8559)
  Refine documentation (#8551)
  New Crowdin updates (#8557)
  New Crowdin updates (#8553)
  Fix missing metadata in BibDatabaseContext (#8556)
  Add encoding detection (and pin export to UTF-8) (#8506)
  Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion (#8531)
  Bump checkstyle from 9.3 to 10.0 (#8544)
  Fix online link detection in entry editor (#8514)
  Add some JavaDoc to Fetchers
  Support two argument form of \abx@aux@cite macro in DefaultAuxParser (#8549)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XMP Writer incomplete
4 participants