IEEE Xplore title em dash cleanup #11078
Oops, sorry. The test should be in the field format test, but not in
src/main/java/org/jabref/logic/bibtex/FieldContentFormatter.java
I've created a new formatter, but I can't work out where to add it so that fields get formatted. Currently, the title for the article https://doi.org/10.1109/PERCOMW.2015.7133989 is not formatted, although I've written the class and tests. Maybe I shouldn't have added it to the
There should be a method doPostCleanup where you can add the formatter.
I think, running https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/formatter/bibtexfields/HtmlToUnicodeFormatter.java twice should be tested instead of introducing a new cleaning
src/main/java/org/jabref/logic/formatter/bibtexfields/NormalizeEmDashesIEEEFormatter.java
@Override
public String format(String value) {
    return value.replaceAll("—", "—");
Seeing the input, isn't that a double-encoded Unicode character?
Should the importer run https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/formatter/bibtexfields/HtmlToUnicodeFormatter.java twice for the title field?
But that will leave the ampersand.
I've tried adding an additional HtmlToUnicodeFormatter.java to DoiFetcher.doPostCleanup. Is that the right place?
It replaced the &amp; with an ampersand, but #x2014; was left as is. So, probably there should be a separate formatter?
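To illustrate the double-encoding question above, here is a minimal, self-contained sketch. It assumes a toy decoder that only understands `&amp;` and hexadecimal numeric references; the class `DoubleEncodedEntityDemo` and the method `decodeOnce` are hypothetical names for illustration and are not JabRef's HtmlToUnicodeFormatter, whose internals may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoubleEncodedEntityDemo {

    // Matches hexadecimal numeric character references such as &#x2014;
    private static final Pattern HEX_ENTITY = Pattern.compile("&#x([0-9A-Fa-f]{1,6});");

    /**
     * One decoding pass (hypothetical helper, not a JabRef API).
     * Numeric references are decoded first, then "&amp;" is turned into "&",
     * so a reference hidden behind "&amp;" only becomes visible to the next pass.
     */
    public static String decodeOnce(String value) {
        Matcher matcher = HEX_ENTITY.matcher(value);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            int codePoint = Integer.parseInt(matcher.group(1), 16);
            // Leave the reference untouched if it is not a valid code point
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString().replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String raw = "Title &amp;#x2014; subtitle"; // double-encoded em dash
        String once = decodeOnce(raw);              // "&amp;" decoded, reference still encoded
        String twice = decodeOnce(once);            // reference decoded to U+2014
        System.out.println(once);
        System.out.println(twice);
    }
}
```

Under these assumptions, one pass turns the double-encoded input into `&#x2014;` and only the second pass yields the actual em dash, which is why a single extra formatter call may appear to "leave" part of the entity behind.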
I read the original issue, where you provided a sketch of the solution: add a new cleanup class and integrate it with the IEEE class. But how do I add the cleanup to IEEE? There seems to be no doPostCleanup method. What if we just change parseJsonResponse and clean up directly from there?
For the IEEE doPostCleanup method, it's just a single method that is called manually, because only the EntryBasedParserFetcher has it defined as an interface method.
See the DoiFetcher for an example; it defines it manually.
It seems that JabRef doesn't use the IEEE class when importing the DOI https://doi.org/10.1109/PERCOMW.2015.7133989
I've changed the class for the formatter. Calling And it seems I've added the formatter to
# Conflicts: # src/main/resources/l10n/JabRef_en.properties
@Override
public String format(String value) {
    return value.replace("—", "—");
Is this Unicode or plain ASCII? I think we fetch plain BibTeX, so it should be --, shouldn't it?
The source code is in Unicode.
In NormalizeEnDashesFormatter, the en dash was replaced with --. Should the em dash be replaced with that too, or with something longer?
Let's keep --.
Normally, it would be ---, but this is very strange in paper titles. I have never seen it. Therefore, I proposed --.
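A minimal sketch of the replacement agreed on above, assuming the title already contains a real Unicode em dash (U+2014). The class name `EmDashToDoubleHyphen` is hypothetical and this is not the code that landed in JabRef; it only shows the mapping to `--`.

```java
public class EmDashToDoubleHyphen {

    // Written as an escape so the source file's encoding does not matter
    private static final String EM_DASH = "\u2014";

    /** Replaces every Unicode em dash with the two-hyphen form discussed above. */
    public static String format(String value) {
        if (value == null) {
            return null;
        }
        // String.replace does literal replacement; no regex is needed here
        return value.replace(EM_DASH, "--");
    }

    public static void main(String[] args) {
        System.out.println(format("Title \u2014 subtitle"));
    }
}
```

Using `replace` rather than `replaceAll` avoids regex interpretation of the search string, which matters little for a single dash but keeps the intent explicit.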
There needed to be a small fix. I coded it in #11091. Since the new code is much smaller than the code proposed in this PR (and also handles more cases), I am closing this PR. Thank you for digging out where to call the formatter! 👍
Closes JabRef#286.
I added a test to DoiFetcherTest for doi https://doi.org/10.1109/PERCOMW.2015.7133989.
The parsed title of the article should be:
But previously it was:
To perform the cleanup, I added some code to the format method of the FieldContentFormatter. (Is that the right file for cleanup?)