convert latex encoding - code in entry that crashes jabref #6399

Open
ilippert opened this issue May 3, 2020 · 23 comments
Labels
bug Confirmed bugs or reports that are very likely to be bugs unicode unicode related issues

Comments

@ilippert
Contributor

ilippert commented May 3, 2020

JabRef 5.1--2020-05-02--1d9957b
Linux 5.6.8-200.fc31.x86_64 amd64
Java 14.0.1

In a test database with more than 10,500 entries, adding an entry (Ctrl+N or via the button) crashes JabRef. There is no log message.

In a smaller database, adding a new entry works without problems, and I can copy and paste it into the larger database.

@AEgit

AEgit commented May 5, 2020

JabRef 5.1--2020-05-04--b5599c9
Windows 10 10.0 amd64
Java 14.0.1

AND

JabRef 5.1--2020-05-04--b5599c9
Linux 5.3.0-51-generic amd64
Java 14.0.1

using a database with >19,000 entries

Cannot reproduce this issue. Might it be related to the specific database, preferences, or hardware? @ilippert: Can you reliably reproduce this problem, or does it appear only sometimes?

@ilippert
Contributor Author

ilippert commented May 5, 2020

JabRef 5.1--2020-05-04--7bb1e24
Linux 5.6.8-200.fc31.x86_64 amd64
Java 14.0.1

Yes, I can still reproduce this reliably.

@calixtus
Member

calixtus commented May 5, 2020

Can you tell us what JabRef's RAM usage is when this happens?

@ilippert
Contributor Author

ilippert commented May 5, 2020

Before adding a new entry:
Memory 1.7 GB
Virtual memory 107.1 GB
Resident memory 1.9 GB
Shared memory mb

When adding: shared memory goes up to 250 MB; memory and resident memory each go up by 100 MB.

@calixtus
Member

calixtus commented May 5, 2020

I don't know much about Java memory usage, but a 250 MB rise in RAM usage when adding one entry does not seem normal...
@tobiasdiez @koppor @Siedlerchr ?
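Note that the figures above are OS-level numbers from a system monitor. The JVM reserves address space for its heap up front and collects garbage lazily, so these numbers can differ considerably from the heap actually in use. A minimal sketch for reading heap usage from inside a Java process via `Runtime` (the class name is hypothetical):

```java
// Sketch: JVM heap usage as seen from inside the process.
// These numbers are not the OS-level virtual/resident/shared figures.
class HeapUsage {
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory(): heap currently committed by the JVM; freeMemory(): unused part of it
        return rt.totalMemory() - rt.freeMemory();
    }

    static long maxHeapBytes() {
        // Upper limit the heap may grow to (e.g. set via -Xmx)
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.printf("used heap: %d MB, max heap: %d MB%n",
                usedHeapBytes() / mb, maxHeapBytes() / mb);
    }
}
```

Since garbage collection runs at unpredictable times, a single before/after reading is only a rough indication.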

@ilippert
Contributor Author

ilippert commented May 5, 2020

Sorry, I just checked something else: I created an entirely new bib file and copied in the 10,500 entries. Adding an entry then succeeded.

@ilippert
Contributor Author

ilippert commented May 5, 2020

OK, comparing the original database file and the newly created one, which contain the same entries, I see a file-size difference of 1.5 MB.
I tried to compare both files, but the entries are stored in an entirely different order.

I have now copied the group structure from the original bib file into the new bib file. The new bib file is still 1.5 MB smaller than the original file, and I can still add entries to it.

Oops, the new JabRef file has, of course, changed all my timestamps. That's not good; I need the old timestamps...
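Because the entries are stored in a different order, a plain line diff is not much help. One option is to split both files into entries and compare them as sets. A rough sketch (class and method names are hypothetical), assuming every entry starts on a line whose first non-blank character is `@` and that no such line occurs inside a field value; text before the first entry (such as a `% Encoding` header) is ignored, and entries whose timestamps changed will also show up as different:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class BibOrderInsensitiveDiff {
    // Rough split: a new entry starts at any line whose first non-blank char is '@'.
    static List<String> splitEntries(String bib) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : bib.split("\\R")) {
            if (line.stripLeading().startsWith("@")) {
                if (current != null) {
                    entries.add(current.toString().strip());
                }
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            entries.add(current.toString().strip());
        }
        return entries;
    }

    // Entries present in file a but not in file b, compared as text, independent of order.
    static TreeSet<String> onlyIn(String a, String b) {
        TreeSet<String> result = new TreeSet<>(splitEntries(a));
        result.removeAll(splitEntries(b));
        return result;
    }
}
```

Running `onlyIn` in both directions lists the entries that differ between the two files.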

@ilippert
Contributor Author

ilippert commented May 5, 2020

So maybe we can close the issue at this point, since it was an "artefact" of that original database.

However, the original database is simply one that has grown over the years and across JabRef versions. Other users may also have such naturally grown biblatex databases.

@ilippert
Contributor Author

ilippert commented May 5, 2020

JabRef 5.1--2020-05-04--7bb1e24
Linux 5.6.8-200.fc31.x86_64 amd64
Java 14.0.1

Wait: with timestamp updating deactivated, creating a new file and pasting the entries in (tested in two instances) results in a new database of the same size as the original bib file.

And now adding a new entry crashes JabRef.

@ilippert
Contributor Author

ilippert commented May 5, 2020

This bug is quite new; it started to appear around last weekend. Before that, I was able to add entries to the original database file without a crash.

@koppor
Member

koppor commented May 6, 2020

I think we need your database to be able to reproduce the issue. Would you be able to share it? Only the core developers will have access to the file; it won't be published.

@ilippert
Contributor Author

ilippert commented May 6, 2020

Yes, I am happy to share it. Please advise how you would like to receive the file.

@koppor
Member

koppor commented May 7, 2020

You'll see my email address at my GitHub profile. Could you try sending it there?

@ilippert
Contributor Author

I have now noticed that on my Intel® Core™ i7-6700HQ CPU @ 2.60GHz × 8 system, JabRef regularly uses 40-60% of my CPU while that file is open. If I close the file, JabRef uses only about 2%.

@ilippert
Contributor Author

I investigated the database and identified one entry that reliably breaks JabRef. However, I cannot see what is wrong with it.

@Article{Kolb2003,
  Title                    = {Protest, \"{O}ffentlichkeitsarbeit und {L}obbying schlie{\ss}en sich nicht aus. {D}ie {M}assen als {S}chl\"{u}ssel zur {M}acht. {F}elix {K}olb vergleicht die politischen {S}trategien von {U}mweltbewegung und {G}lobalisierungskritikern, extract from `\textit{politische \"{o}kologie}' (85) 2003},
  Author                   = {Felix Kolb},
  Year                     = {2003},
  Month                    = {14. Aug.},
  Number                   = {188},
  Pages                    = {7},

  Journal                  = {Frankfurter Rundschau}
}

Without this entry, JabRef seems to run more smoothly...
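Tracking down a single culprit in a 10,500-entry database can be sped up by bisection: test one half, keep the half that still fails, and repeat. For 10,500 entries that takes at most 14 rounds. A generic sketch (names are hypothetical; `fails` stands for "save this subset as a database and try adding an entry"), assuming exactly one entry triggers the crash and the failure depends only on whether that entry is present:

```java
import java.util.List;
import java.util.function.Predicate;

class CulpritBisect {
    // Halves the candidate list until one entry remains.
    // Assumes exactly one culprit and no interactions between entries.
    static <T> T findCulprit(List<T> entries, Predicate<List<T>> fails) {
        if (entries.isEmpty()) {
            throw new IllegalArgumentException("no entries");
        }
        List<T> candidates = entries;
        while (candidates.size() > 1) {
            int mid = candidates.size() / 2;
            List<T> firstHalf = candidates.subList(0, mid);
            // If the first half still fails, the culprit is there; otherwise it is in the rest.
            candidates = fails.test(firstHalf)
                    ? firstHalf
                    : candidates.subList(mid, candidates.size());
        }
        return candidates.get(0);
    }
}
```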

@AEgit

AEgit commented May 25, 2020

Just an idea and possible workaround (?). Have you tried making the following changes:

\"{O}ffentlichkeitsarbeit to {\"{O}}ffentlichkeitsarbeit
{S}chl\"{u}ssel to {S}chl{\"{u}}ssel
\"{o}kologie} to {\"{o}}kologie} (on a side note: {\"{O}}kologie} should be upper case)

Does that make any difference?

@calixtus
Member

Could this be a parsing error in the month field? Or is there a maximum length for the title?

@AEgit

AEgit commented May 25, 2020

Actually, it might be best to write the umlauts differently (see https://tex.stackexchange.com/questions/366546/jabref-cant-read-bib-file-created-by-jabref-3-0/434268#434268

and

https://tex.stackexchange.com/questions/57743/how-to-write-%c3%a4-and-other-umlauts-and-accented-letters-in-bibliography):

So change \"{O} to {\"O}
and
\"{u} to {\"u}
and
\"{o} to {\"o} (or {\"O} if you are allowed to correct the capitalization)
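If many entries need this change, it could be scripted. A minimal sketch (class and method names are hypothetical) that rewrites `\"{X}` to `{\"X}`, assuming only the diaeresis form with a single ASCII letter occurs. An accent that is already wrapped as `{\"{X}}` would end up double-braced, so such fields should be checked first:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UmlautRewriter {
    // Matches \"{X} where X is a single ASCII letter, e.g. \"{O} or \"{u}.
    private static final Pattern BRACED_UMLAUT =
            Pattern.compile("\\\\\"\\{([A-Za-z])\\}");

    static String rewrite(String field) {
        // Emit {\"X}: the whole accent command in its own brace group.
        // quoteReplacement keeps the backslash literal in the output.
        return BRACED_UMLAUT.matcher(field)
                .replaceAll(m -> Matcher.quoteReplacement("{\\\"" + m.group(1) + "}"));
    }
}
```

For example, `Schl\"{u}ssel` becomes `Schl{\"u}ssel`.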

@ilippert
Contributor Author

The problem is inserting \"{o} inside

`\textit{}'

This breaks JabRef.
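To find other entries with the same pattern, one option is to scan field values for an accent command inside `\textit{...}`. A rough sketch (names are hypothetical), assuming braces in the field are balanced and never escaped as `\{`, and checking only for the diaeresis command `\"`:

```java
import java.util.ArrayList;
import java.util.List;

class TextitAccentScanner {
    // Returns the bodies of \textit{...} groups that contain \" (e.g. \"{o}).
    // Braces are matched by counting depth, so nested groups like {o} are handled.
    static List<String> findAccentedTextit(String field) {
        List<String> hits = new ArrayList<>();
        String cmd = "\\textit{";
        int start = field.indexOf(cmd);
        while (start >= 0) {
            int bodyStart = start + cmd.length();
            int depth = 1;
            int i = bodyStart;
            while (i < field.length() && depth > 0) {
                char c = field.charAt(i);
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                }
                i++;
            }
            if (depth != 0) {
                break; // unbalanced braces: stop scanning this field
            }
            String body = field.substring(bodyStart, i - 1);
            if (body.contains("\\\"")) {
                hits.add(body);
            }
            start = field.indexOf(cmd, i);
        }
        return hits;
    }
}
```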

@calixtus
Copy link
Member

calixtus commented May 26, 2020

Thank you for narrowing this down.
Internally, JabRef uses an external library (latex2unicode) to convert LaTeX encoding. Unfortunately, this library no longer seems to be in active development, so we have already started thinking about a replacement. But that is going to be a larger project.
I don't know yet whether a quick fix is possible.

Refs #5547
Refs #6155
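For context on what such a conversion involves: a LaTeX accent like `\"{o}` can be turned into Unicode by emitting the base letter followed by a combining mark and then applying NFC normalization. The sketch below illustrates only that general idea; it is not how latex2unicode is implemented, it covers just four accents, and it leaves surrounding braces in place (names are hypothetical):

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccentToUnicode {
    // LaTeX accent command character -> Unicode combining mark (illustrative subset).
    private static final Map<Character, Character> COMBINING = Map.of(
            '"', '\u0308',  // diaeresis
            '\'', '\u0301', // acute
            '`', '\u0300',  // grave
            '^', '\u0302'); // circumflex

    // An accent command followed by a braced letter or a bare letter.
    private static final Pattern ACCENT =
            Pattern.compile("\\\\([\"'`^])(?:\\{([A-Za-z])\\}|([A-Za-z]))");

    static String convert(String latex) {
        Matcher m = ACCENT.matcher(latex);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String letter = m.group(2) != null ? m.group(2) : m.group(3);
            char mark = COMBINING.get(m.group(1).charAt(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(letter + mark));
        }
        m.appendTail(sb);
        // NFC composes letter + combining mark into one precomposed character where one exists.
        return Normalizer.normalize(sb.toString(), Normalizer.Form.NFC);
    }
}
```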

@calixtus calixtus added the bug Confirmed bugs or reports that are very likely to be bugs label May 26, 2020
@ilippert
Contributor Author

Regarding the issue topic:
Now I am on
JabRef 5.1--2020-05-25--6f34de3
Linux 5.6.13-300.fc32.x86_64 amd64
Java 14.0.1

I have moved all my old entries from my 15-year-old database to a new database (and in the process found the entry in #6399 (comment) that triggers the bug described in #6399 (comment)).
I no longer get the crash when adding a new entry to a database with 10,500 entries, so I am changing the title of this issue. Please feel free to change it if it does not fit.

@ilippert ilippert changed the title crash when adding new entry in database with 10500 entries convert latex encoding - code in entry that crashes jabref May 26, 2020
@k3KAW8Pnf7mkmdSMPHz27
Member

I can't shed much light on the underlying issue, but I don't think the latex2unicode converter is the cause. The following test case, added to LatexToUnicodeFormatterTest.java, passes for me:

@Test
void formatUmlautsInTextit() {
    assertEquals("\uD835\uDC5D\uD835\uDC5C\uD835\uDC59\uD835\uDC56\uD835\uDC61\uD835\uDC56\uD835\uDC60\uD835\uDC50ℎ\uD835\uDC52 \uD835\uDC5C̈\uD835\uDC58\uD835\uDC5C\uD835\uDC59\uD835\uDC5C\uD835\uDC54\uD835\uDC56\uD835\uDC52",
            formatter.format("\\textit{politische \\\"{o}kologie}"));
}

where the expected Unicode string on the left comes from yaytext.com.
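The expected string in that test is "mathematical italic": each lowercase ASCII letter maps to MATHEMATICAL ITALIC SMALL A (U+1D44E) plus an offset, except `h`, whose slot (U+1D455) is unassigned in Unicode and is covered by U+210E PLANCK CONSTANT (the `ℎ` visible in the string). The `ö` appears to be encoded as italic `o` followed by U+0308 COMBINING DIAERESIS. A minimal sketch that rebuilds the string, assuming only lowercase ASCII letters need mapping (class and method names are hypothetical):

```java
class MathItalic {
    private static final int MATH_ITALIC_SMALL_A = 0x1D44E;
    // U+1D455 is unassigned; Unicode uses PLANCK CONSTANT for italic h.
    private static final int PLANCK_CONSTANT = 0x210E;

    // Expected value copied from the test above, with the combining diaeresis written as an escape.
    static final String EXPECTED =
            "\uD835\uDC5D\uD835\uDC5C\uD835\uDC59\uD835\uDC56\uD835\uDC61\uD835\uDC56\uD835\uDC60\uD835\uDC50\u210E\uD835\uDC52 "
            + "\uD835\uDC5C\u0308\uD835\uDC58\uD835\uDC5C\uD835\uDC59\uD835\uDC5C\uD835\uDC54\uD835\uDC56\uD835\uDC52";

    static String toMathItalic(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp == 'h') {
                sb.appendCodePoint(PLANCK_CONSTANT);
            } else if (cp >= 'a' && cp <= 'z') {
                sb.appendCodePoint(MATH_ITALIC_SMALL_A + (cp - 'a'));
            } else {
                sb.appendCodePoint(cp); // spaces and combining marks pass through unchanged
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "o" followed by U+0308, i.e. a decomposed "ö"
        String input = "politische o\u0308kologie";
        System.out.println(toMathItalic(input).equals(EXPECTED));
    }
}
```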

@github-actions
Contributor

github-actions bot commented Mar 2, 2021

This issue has been inactive for half a year. Since JabRef is constantly evolving this issue may not be relevant any longer and it will be closed in two weeks if no further activity occurs.

As part of an effort to ensure that the JabRef team is focusing on important and valid issues, we would like to ask if you could update the issue if it still persists. This could be in the following form:

  • If there has been a longer discussion, add a short summary of the most important points as a new comment (if not yet existing).
  • Provide further steps or information on how to reproduce this issue.
  • Upvote the initial post if you would like to see it implemented soon. Votes are not the only metric that we use to determine the requests that are implemented; however, they do factor into our decision-making process.
  • If all information is provided and still up-to-date, then just add a short comment that the issue is still relevant.

Thank you for your contribution!

@ThiloteE ThiloteE added the unicode unicode related issues label Apr 18, 2022