Add support to parsing large xml inputs (https://github.com/RDFLib/rdflib/issues/749) #750

Merged
merged 2 commits into from
May 14, 2018

@artreven
Contributor

artreven commented Jun 7, 2017

No description provided.

@gromgull
Member

gromgull commented Nov 3, 2017

This fails with "huge_tree is not a valid keyword argument" on all Python versions.

We are quite lax with where we get elementtree from: https://github.com/RDFLib/rdflib/blob/master/rdflib/compat.py#L16-L36

I guess the default version doesn't support this?
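
For illustration, a lenient import chain of the kind described above might look like the following. This is a minimal sketch, not the actual contents of compat.py; the name ETREE_SOURCE is a hypothetical helper added only to show which implementation was picked up.

```python
# Hypothetical sketch of a lenient etree import; not the real compat.py.
try:
    # lxml is an optional third-party package and accepts extra
    # parser options such as huge_tree.
    from lxml import etree
    ETREE_SOURCE = "lxml"
except ImportError:
    # Fall back to the standard library, which has no huge_tree option.
    import xml.etree.ElementTree as etree
    ETREE_SOURCE = "stdlib"

print(ETREE_SOURCE)
```

Whichever branch wins, the rest of the code sees a module named etree, which is why a keyword that only one implementation accepts can fail on some installations.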

@artreven
Contributor Author

I just checked what happens on my end: I use Python 3.5.2 and lxml 3.5.0. I get etree from lxml, and the line etree.XMLParser(huge_tree=True) returns the parser without any exceptions.
What could be the issue?

@niklasl
Member

niklasl commented Nov 13, 2017

As @gromgull says, the default etree from the standard library does not support this option:

import xml.etree.ElementTree as etree

etree.XMLParser(huge_tree=True)

On Python 3, this fails with:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: 'huge_tree' is an invalid keyword argument for this function

Since lxml is an optional package for RDFLib, you cannot rely on it being available to support the non-standard keyword (huge_tree).

@coveralls

coveralls commented Nov 14, 2017

Coverage increased (+1.2%) to 65.296% when pulling 85309a5 on artreven:master into ba4c542 on RDFLib:master.

@artreven
Contributor Author

Sorry, I misunderstood the original comment from @gromgull at first; I thought lxml was the default version. I have added an except clause that provides a fallback when the default (standard library) version is in use.
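
A fallback of this kind could be sketched roughly as follows. This is an assumption about the general shape, not the merged code; make_parser is a hypothetical helper name, and the sketch relies on the standard library raising TypeError for the unknown keyword, as shown in the traceback earlier in this thread.

```python
# Minimal sketch of a huge_tree fallback; make_parser is a hypothetical name.
try:
    from lxml import etree  # optional; supports huge_tree
except ImportError:
    import xml.etree.ElementTree as etree  # standard library

def make_parser():
    """Return an XMLParser, enabling huge_tree only where it is supported."""
    try:
        # Accepted by lxml; lifts its limits on very large documents.
        return etree.XMLParser(huge_tree=True)
    except TypeError:
        # The standard library parser rejects the unknown keyword.
        return etree.XMLParser()

parser = make_parser()
parser.feed("<root><child/></root>")
root = parser.close()
print(root.tag)
```

Catching TypeError keeps the lxml-specific option out of the way on installations that only have the standard library, at the cost of silently parsing without huge_tree there.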

@gromgull gromgull merged commit 6cb0dab into RDFLib:master May 14, 2018
@gromgull
Member

Thanks! (and sorry about the delay)

@joernhees added the enhancement, parsing, and performance labels on Oct 27, 2018
@joernhees added this to the rdflib 5.0.0 milestone on Oct 27, 2018