Support JMESPath now #181

Status: Merged on Apr 11, 2023 (126 commits). The diff below shows changes from 34 commits.

Commits:
63af8c5
Support jpath now
EchoShoot Jan 2, 2020
ddc8e73
Update test_selector_jpath.py
EchoShoot Jan 2, 2020
17f14d5
Update selector.py
EchoShoot Jan 2, 2020
4590544
Update test_selector_jpath.py
EchoShoot Jan 2, 2020
6151e3b
Update test_selector_jpath.py
EchoShoot Jan 2, 2020
546fcc9
Update test_selector_jpath.py
EchoShoot Jan 2, 2020
0a70390
Update selector.py
EchoShoot Jan 2, 2020
85c7b58
rename
EchoShoot Jan 3, 2020
3846ea7
rename
EchoShoot Jan 3, 2020
2741727
Merge branch 'master' into master
Gallaecio Jun 2, 2020
9235304
Update parsel/selector.py
Gallaecio Jun 2, 2020
03289a6
Restore import separation line
Gallaecio Jun 2, 2020
ecf37b8
Improve the API documentation of Selector.jmespath
Gallaecio Jun 2, 2020
c731c35
Remove pointless exception handling
Gallaecio Jun 2, 2020
24cf915
Do not ignore jmespath-found None values when possible
Gallaecio Jun 2, 2020
9508f0e
Improve jmespath documentation
Gallaecio Jun 2, 2020
fe44a3d
Revert unrelated setup.py changes
Gallaecio Jun 2, 2020
d84e957
Remove unorthodox attribution text
Gallaecio Jun 2, 2020
5a57a3c
Style changes to tests
Gallaecio Jun 2, 2020
8ad0330
datas → data
Gallaecio Jun 2, 2020
b01f5a1
Simplify jmespath implementation
Gallaecio Jun 2, 2020
ca3bc23
Fix or silence Pylint issues
Gallaecio Jun 2, 2020
15c9118
Fix test expectation
Gallaecio Jun 2, 2020
344d4df
Refactor jmespath support
Gallaecio Jun 3, 2020
fa7c32e
Simplify Selector.__init__
Gallaecio Jun 3, 2020
7fe4d5e
Return None in case of invalid JSON
Gallaecio Jun 17, 2020
3f9b779
Complete test coverage
Gallaecio Jun 17, 2020
f7cd122
Fix backward compatibility
Gallaecio Jun 17, 2020
58f5b77
Fix Python 2 support in new tests
Gallaecio Jun 17, 2020
a01ebfb
Fix the documentation build
Gallaecio Jun 17, 2020
4e3ce0f
Complete test coverage
Gallaecio Jun 17, 2020
aba3c40
Fix tests in Python 2, again
Gallaecio Jun 17, 2020
ed9e2b5
Do not set a minimum jmespath version
Gallaecio Jun 18, 2020
9cceac2
Merge branch 'master' into master
Gallaecio Mar 20, 2021
bb4f994
Merge remote-tracking branch 'upstream/master' into jmespath
Gallaecio Mar 14, 2022
d0c98b7
Apply Black
Gallaecio Mar 14, 2022
50e18e7
Remove six usage
Gallaecio Mar 14, 2022
ccf6b63
Apply Black
Gallaecio Mar 14, 2022
8a10f23
format → f-string
Gallaecio Mar 14, 2022
9e22492
Apply Black
Gallaecio Mar 14, 2022
2da08ea
Add tests for jmespath functions
felipeboffnunes Jun 14, 2022
5f03cea
Merge branch 'master' into master
felipeboffnunes Jun 14, 2022
fa25f75
Merge branch 'master' into master
felipeboffnunes Jun 14, 2022
06ffdc3
Test for jmespath functions.
felipeboffnunes Jun 14, 2022
f854cc0
Docs for JMESPath Selector
felipeboffnunes Jun 17, 2022
ca14d63
end of line
felipeboffnunes Jun 17, 2022
2465f4e
black
felipeboffnunes Jun 17, 2022
918089c
instantiate jmespath_selector on usage doc
felipeboffnunes Jun 17, 2022
f0d42d3
instantiate jmespath_selector on usage doc
felipeboffnunes Jun 17, 2022
72e7b3c
black, again
felipeboffnunes Jun 17, 2022
51042ca
remove jmes function test
felipeboffnunes Jun 17, 2022
f6a316c
bring jmespath closer to other selectors
felipeboffnunes Jun 17, 2022
5954e79
black
felipeboffnunes Jun 17, 2022
f6e59ce
list format for jmespath selector example
felipeboffnunes Jun 17, 2022
c1c98f0
small adjust on wording for jmespath introduction
felipeboffnunes Jun 17, 2022
b374384
Readme example
felipeboffnunes Jun 20, 2022
6b60a8b
Readme separated examples
felipeboffnunes Jun 20, 2022
98daa08
usage adjusts from feedback
felipeboffnunes Jun 20, 2022
84751de
missing ! char
felipeboffnunes Jun 20, 2022
ef5a6ea
adjustments
felipeboffnunes Jun 20, 2022
475bdf5
adjustments
felipeboffnunes Jun 20, 2022
33e8d38
adjustments
felipeboffnunes Jun 20, 2022
2e3b633
adjustments
felipeboffnunes Jun 20, 2022
654fb26
adjustments
felipeboffnunes Jun 20, 2022
6226397
adjustments
felipeboffnunes Jun 20, 2022
6780a06
Update docs/usage.rst
felipeboffnunes Jun 20, 2022
e1c6838
Update docs/usage.rst
felipeboffnunes Jun 20, 2022
aa803c7
Update docs/usage.rst
felipeboffnunes Jun 20, 2022
2efc71d
adjustments from revisions
felipeboffnunes Jun 20, 2022
3b74896
Merge remote-tracking branch 'felipeboffnunes/jmespath' into jmespath
felipeboffnunes Jun 20, 2022
b6f1e3a
change type on __str__ to `query` for all cases
felipeboffnunes Jun 25, 2022
5034df0
adjusting usage.rst tests to reflect changes
felipeboffnunes Jun 25, 2022
835458c
black selector.py
felipeboffnunes Jun 25, 2022
6b1729b
black selector.py
felipeboffnunes Jun 25, 2022
54065be
missing adjustments
felipeboffnunes Jun 25, 2022
f6f3bf9
removing logic for previous solution
felipeboffnunes Jun 25, 2022
497ecba
black selector.py
felipeboffnunes Jun 25, 2022
045ad12
test_selector update xpath to query
felipeboffnunes Jun 25, 2022
3c81e3f
black selector.py
felipeboffnunes Jun 25, 2022
b424483
go back to original usage.rst get elements examples
felipeboffnunes Jun 25, 2022
276cf9f
black selector.py???
felipeboffnunes Jun 25, 2022
8fe06ce
html_selector
felipeboffnunes Jun 25, 2022
ccc7abb
Readme feedback
felipeboffnunes Jun 25, 2022
a86ca2b
Cover JMESPath support in the documentation
felipeboffnunes Jun 27, 2022
be8dbb3
Merge remote-tracking branch 'upstream/master'
Gallaecio Jun 27, 2022
fa2085e
Remove deprecated-method from pylintsrc
felipeboffnunes Jul 1, 2022
a68fecf
Simpler approach to examples for extracting with selectors
felipeboffnunes Jul 1, 2022
05cec44
Removing type arg for jmespath
felipeboffnunes Jul 1, 2022
b53e71f
JSON encoding jmespath selector results + unit test
felipeboffnunes Jul 1, 2022
4349ce4
Merge branch 'echoshoot-master' into jmespath
felipeboffnunes Jul 1, 2022
14d17fd
merge
felipeboffnunes Jul 1, 2022
470a31b
Merge branches 'master' and 'master' of https://github.com/EchoShoot/…
felipeboffnunes Jul 1, 2022
5f98857
Merge branch 'echoshoot-master' into jmespath
felipeboffnunes Jul 1, 2022
305306f
adjust json encoding to reg with type json
felipeboffnunes Jul 1, 2022
01a4fb2
black
felipeboffnunes Jul 1, 2022
1c519d9
prepending r for re
felipeboffnunes Jul 1, 2022
3c00325
typo
felipeboffnunes Jul 1, 2022
cf8a693
black: shippuden
felipeboffnunes Jul 1, 2022
ecce282
raise ValueError if data from selector is not str when using re
felipeboffnunes Jul 20, 2022
58dedc2
tests for raiseValue and to_string
felipeboffnunes Jul 20, 2022
8895fd7
remove if treatment
felipeboffnunes Aug 4, 2022
f4d19de
Adjust test
felipeboffnunes Aug 4, 2022
599cc79
test for None on data
felipeboffnunes Aug 4, 2022
c75723b
black
felipeboffnunes Aug 4, 2022
39cdad3
Remove lazy loading... I guess?
felipeboffnunes Sep 12, 2022
69a8517
merge isinstance
felipeboffnunes Sep 12, 2022
d000b54
Merge remote-tracking branch 'scrapy/master'
Gallaecio Oct 27, 2022
7536aeb
README: add missing link
Gallaecio Oct 28, 2022
321a3d1
README: simplify the code example
Gallaecio Oct 28, 2022
bddbc6d
Usage: mention cssselect CSS support
Gallaecio Oct 28, 2022
d20b4ba
JMESPATH → JMESPath
Gallaecio Oct 28, 2022
e9d5ada
Update the Selector docstring
Gallaecio Oct 28, 2022
812b2bf
Selector.css: do not murate state
Gallaecio Oct 28, 2022
c7f53d4
Address coverage warning about mixing --include and --source
Gallaecio Oct 28, 2022
746a0da
Warn about passing both text and root
Gallaecio Oct 28, 2022
ff8e946
Reorganize Selector.__init__
Gallaecio Oct 28, 2022
bd0ad80
Raise ValueError if root=etree and type in {'json', 'text'}
Gallaecio Oct 28, 2022
0fcff9a
Make the start of the jmespath() implementation more readable
Gallaecio Oct 28, 2022
10b78a8
Address issues reported by CI
Gallaecio Oct 28, 2022
dd0a4c9
Optimize _load_json_or_none usage
Gallaecio Oct 28, 2022
c69e0c6
Support text="null" as JSON input
Gallaecio Oct 28, 2022
64569a7
Fix scenario of invalid JSON
Gallaecio Oct 28, 2022
7232241
Solve issues reported by CI
Gallaecio Oct 28, 2022
3b3ec90
Merge remote-tracking branch 'scrapy/master' into jmespath
Gallaecio Apr 10, 2023
9f18a37
Merge remote-tracking branch 'scrapy/master' into jmespath
Gallaecio Apr 10, 2023
53d5146
Add a missing period
Gallaecio Apr 11, 2023
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -127,6 +127,8 @@

# nitpicky = True # https://github.com/scrapy/cssselect/pull/110
nitpick_ignore = [
('py:class', 'ExpressionError'),
('py:class', 'SelectorSyntaxError'),
('py:class', 'cssselect.xpath.GenericTranslator'),
('py:class', 'cssselect.xpath.HTMLTranslator'),
('py:class', 'cssselect.xpath.XPathExpr'),

171 changes: 134 additions & 37 deletions parsel/selector.py
@@ -1,9 +1,11 @@
"""
XPath selectors based on lxml
XPath and JMESPath selectors based on lxml and jmespath
"""

import json
import sys

import jmespath
import six
from lxml import etree, html

@@ -35,15 +37,6 @@ def __init__(self, *args, **kwargs):
}


def _st(st):
if st is None:
return 'html'
elif st in _ctgroup:
return st
else:
raise ValueError('Invalid type: %s' % st)


def create_root_node(text, parser_cls, base_url=None):
"""Create root node for text using given parser class.
"""
@@ -73,12 +66,26 @@ def __getitem__(self, pos):
def __getstate__(self):
raise TypeError("can't pickle SelectorList objects")

def jmespath(self, query, **kwargs):
"""
Call the ``.jmespath()`` method for each element in this list and return
their results flattened as another :class:`SelectorList`.

``query`` is the same argument as the one in :meth:`Selector.jmespath`

Any additional named arguments are passed to the underlying
``jmespath.search`` call, e.g.::

selector.jmespath('author.name', options=jmespath.Options(dict_cls=collections.OrderedDict))
"""
return self.__class__(flatten([x.jmespath(query, **kwargs) for x in self]))
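The list-level `jmespath()` above delegates to each element and flattens the per-element results into one list. A minimal, dependency-free sketch of that behavior follows; `flatten()` is a hypothetical recursive stand-in for parsel's own helper, and plain dictionaries plus a lookup function stand in for selectors and their `jmespath()` calls.

```python
from typing import Any, Callable, Iterable, List


def flatten(items: Iterable[Any]) -> List[Any]:
    """Hypothetical stand-in for parsel's flatten(): expand nested lists recursively."""
    result: List[Any] = []
    for item in items:
        if isinstance(item, list):  # a SelectorList is a list subclass, so it matches too
            result.extend(flatten(item))
        else:
            result.append(item)
    return result


def list_jmespath(elements: Iterable[Any], query_fn: Callable[[Any], List[Any]]) -> List[Any]:
    # Mirrors SelectorList.jmespath():
    #     self.__class__(flatten([x.jmespath(query, **kwargs) for x in self]))
    # query_fn plays the role of x.jmespath(query): one list of matches per element.
    return flatten([query_fn(x) for x in elements])


docs = [{"tags": ["a", "b"]}, {"tags": ["c"]}, {}]
print(list_jmespath(docs, lambda d: d.get("tags", [])))  # ['a', 'b', 'c']
```

An element whose query matches nothing contributes an empty list, so it simply drops out of the combined result.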

def xpath(self, xpath, namespaces=None, **kwargs):
"""
Call the ``.xpath()`` method for each element in this list and return
their results flattened as another :class:`SelectorList`.

``query`` is the same argument as the one in :meth:`Selector.xpath`
``xpath`` is the same argument as the one in :meth:`Selector.xpath`

``namespaces`` is an optional ``prefix: namespace-uri`` mapping (dict)
for additional prefixes to those registered with ``register_namespace(prefix, uri)``.
@@ -135,6 +142,7 @@ def getall(self):
their results flattened, as a list of unicode strings.
"""
return [x.get() for x in self]

extract = getall

def get(self, default=None):
@@ -145,6 +153,7 @@ def get(self, default=None):
for x in self:
return x.get()
return default

extract_first = get

@property
@@ -164,24 +173,32 @@ def remove(self):
x.remove()


_NOTSET = object()


def _load_json_or_none(text):
try:
return json.loads(text)
except ValueError:
return None


class Selector(object):
"""
:class:`Selector` allows you to select parts of an XML or HTML text using CSS
or XPath expressions and extract data from it.

``text`` is a ``unicode`` object in Python 2 or a ``str`` object in Python 3

``type`` defines the selector type, it can be ``"html"``, ``"xml"`` or ``None`` (default).
If ``type`` is ``None``, the selector defaults to ``"html"``.
``type`` defines the selector type. It can be ``"html"`` (default),
``"json"``, or ``"xml"``.

Review comment (Member):

"text" is also supported; is it intentional that we don't document it? I also wonder if there is a better terminology available than "selector type", though no suggestions :)

Reply (Member):

I believe it is on purpose. "text" is meant to be an interim value until a more specific value can be determined. If you are setting the type parameter to begin with, you should set it to the right, more specific value, or not set it at all. Maybe we could let self.type be None instead of "text"; I don't recall whether there was a technical reason to do otherwise.

Reply (Member):

I still don't get the benefit of the type as a user argument. The selector is a wrapper whose logic is determined by which methods are called and which data each query finds. Letting the user set the type (other than for backward compatibility) only seems to make it possible to break the selector by configuring it poorly. Without it, the Selector either works or not depending purely on the methods used (unless I am missing something).

Having said that, I don't find the "text" type worth documenting, and if @Gallaecio's idea of just letting it be None has no pending technical issues, we should remove "text" altogether.

Reply (Member):

After fiddling a bit with the code again, I believe "text" is needed, because None needs to imply one of "html" or "xml" for backward compatibility. But I still would not document it for the reasons I explained in my earlier comment, i.e. it is not meant for users to use.
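To make the backward-compatibility argument concrete, here is a sketch of how the `__init__` in this diff settles on `self.type`. `resolve_type()` is a hypothetical helper written for illustration, and `xml.etree.ElementTree.Element` stands in for lxml's `etree._Element`; the sketch skips the actual parsing and only reproduces the branching.

```python
import xml.etree.ElementTree as ET

# Sentinel default: lets an explicit root=None through instead of treating it as missing.
_NOTSET = object()


def resolve_type(text=None, type=None, root=_NOTSET):
    """Return the value the diff's Selector.__init__ would store in self.type."""
    if type not in ('html', 'json', 'text', 'xml', None):
        raise ValueError('Invalid type: %s' % type)
    if text is None and root is _NOTSET:
        raise ValueError('Selector needs either text or root argument')
    if text is not None:
        # type=None keeps its historical meaning ("html"); "json" would parse
        # the text, and "text" keeps the raw string until a method decides.
        return type or 'html'
    # Only a pre-parsed root was given: infer the type from the object itself.
    if type is None and isinstance(root, ET.Element):  # stand-in for etree._Element
        type = 'html'
    return type or 'json'


print(resolve_type('foo'))                    # html
print(resolve_type('1', type='json'))         # json
print(resolve_type(root=ET.Element('html')))  # html
print(resolve_type(root=1))                   # json
```

Note that `text` is never sniffed for its format: with `type=None` it is treated as HTML, exactly as before this change.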


``base_url`` allows setting a URL for the document. This is needed when looking up external entities with relative paths.
See [`lxml` documentation](https://lxml.de/api/index.html) ``lxml.etree.fromstring`` for more information.
"""

__slots__ = ['text', 'namespaces', 'type', '_expr', 'root',
'__weakref__', '_parser', '_csstranslator', '_tostring_method']
__slots__ = ['namespaces', 'type', '_expr', 'root', '_text', '__weakref__']

_default_type = None
_default_namespaces = {
"re": "http://exslt.org/regular-expressions",

@@ -196,33 +213,95 @@ class Selector(object):
_lxml_smart_strings = False
selectorlist_cls = SelectorList

def __init__(self, text=None, type=None, namespaces=None, root=None,
def __init__(self, text=None, type=None, namespaces=None, root=_NOTSET,
base_url=None, _expr=None):
self.type = st = _st(type or self._default_type)
self._parser = _ctgroup[st]['_parser']
self._csstranslator = _ctgroup[st]['_csstranslator']
self._tostring_method = _ctgroup[st]['_tostring_method']
if type not in ('html', 'json', 'text', 'xml', None):
raise ValueError('Invalid type: %s' % type)

if text is not None:
if not isinstance(text, six.text_type):
msg = "text argument should be of type %s, got %s" % (
six.text_type, text.__class__)
raise TypeError(msg)
root = self._get_root(text, base_url)
elif root is None:
self._text = text

if text is None and root is _NOTSET:
raise ValueError("Selector needs either text or root argument")

if text is not None and not isinstance(text, six.text_type):
msg = "text argument should be of type %s, got %s" % (
six.text_type, text.__class__)
raise TypeError(msg)

if text is not None:
if type in ('html', 'xml', None):
self._load_lxml_root(text, type=type or 'html', base_url=base_url)
elif type == 'json':
self.root = _load_json_or_none(text)

Review comment (Member):

Should we preserve base_url somewhere, so that if html is loaded from jmespath, base_url is applied?

TBH, I'm not exactly sure why base_url is needed, but if it's needed, it'd be better to have it consistent.

Reply (kmike, Member, Aug 6, 2022):

I'm fine with not fixing it in this PR though.

Reply (Gallaecio, Member, Oct 28, 2022):

> I'm fine with not fixing it in this PR though.

Yes, since it looks like a problem (if it is one) in the original code as well, it may be best to address it separately.

Also, from the lxml docs:

> The base_url keyword argument allows to set the original base URL of the document to support relative Paths when looking up external entities (DTD, XInclude, ...).

self.type = type
else:
self.root = text
self.type = type
else:
self.root = root
if type is None and isinstance(self.root, etree._Element):
type = 'html'
self.type = type or 'json'

self._expr = _expr
self.namespaces = dict(self._default_namespaces)
if namespaces is not None:
self.namespaces.update(namespaces)
self.root = root
self._expr = _expr

def _load_lxml_root(self, text, type, base_url=None):
self.type = type
self.root = self._get_root(text, base_url)

def __getstate__(self):
raise TypeError("can't pickle Selector objects")

def _get_root(self, text, base_url=None):
return create_root_node(text, self._parser, base_url=base_url)
return create_root_node(
text,
_ctgroup[self.type]['_parser'],
base_url=base_url,
)

def jmespath(self, query, type=None, **kwargs):
"""
Find objects matching the JMESPath ``query`` and return the result as a
:class:`SelectorList` instance with all elements flattened. List
elements implement :class:`Selector` interface too.

``query`` is a string containing the `JMESPath
<https://jmespath.org/>`_ query to apply.

``type`` is a string that allows the same values as the matching
argument of the ``__init__`` method. If not specified, it defaults to
``"json"``.

Any additional named arguments are passed to the underlying
``jmespath.search`` call, e.g.::

selector.jmespath('author.name', options=jmespath.Options(dict_cls=collections.OrderedDict))
"""
if self.type == 'json':
data = self.root

Review comment (Member):

What do you think about renaming the type from "json" to something like "data"? JSON is an encoding/decoding format; after seeing type="json", I expected self.root to be a JSON-encoded string. But actually "json" means that self.root contains an arbitrary Python object.

Reply (Member):

To me it would be a bit like renaming "html" and "xml" to "etree". self.type does not represent the format after parsing, but the format before parsing, even if the input data is provided pre-parsed. And arbitrary data is not supported either: if your data has non-JSON structures, chances are you will get an error at some point, e.g. when using JMESPath.

elif isinstance(self.root, six.string_types):
data = _load_json_or_none(self.root)
elif self.root.text is None:
data = _load_json_or_none(self._text)
else:
data = _load_json_or_none(self.root.text)
result = jmespath.search(query, data, **kwargs)
if result is None:
result = []
elif not isinstance(result, list):
result = [result]

def make_selector(x): # closure function
if isinstance(x, six.text_type):
return self.__class__(text=x, _expr=query, type=type or 'text')
else:
return self.__class__(root=x, _expr=query, type=type)

result = [make_selector(x) for x in result]
return self.selectorlist_cls(result)
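The tail of `jmespath()` normalizes whatever `jmespath.search` returns and then wraps each item. JMESPath yields null (Python `None`) when a query matches nothing, which the method turns into an empty list. The stdlib-only sketch below reproduces both steps; `normalize_result()` and `make_selector_args()` are hypothetical names, and instead of building `Selector` objects the second one just returns the keyword arguments the `make_selector` closure would pass.

```python
def normalize_result(result):
    # None (no match) -> [], a single value -> [value], a list -> unchanged.
    if result is None:
        return []
    if not isinstance(result, list):
        return [result]
    return result


def make_selector_args(value, query, type=None):
    # Strings are re-wrapped as text=..., defaulting to the interim "text" type;
    # anything else (dict, list, number, bool) becomes root=..., and __init__
    # then infers the type from that object.
    if isinstance(value, str):
        return {'text': value, '_expr': query, 'type': type or 'text'}
    return {'root': value, '_expr': query, 'type': type}


# Pretend jmespath.search('items[*].name', data) returned this list:
found = ['Ana', {'first': 'Bo'}]
print([make_selector_args(x, 'items[*].name') for x in normalize_result(found)])
```

Because the check is `is None` rather than a truthiness test, falsy matches such as `0`, `False` or `""` are kept as results.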

def xpath(self, query, namespaces=None, **kwargs):
"""
@@ -242,6 +321,11 @@ def xpath(self, query, namespaces=None, **kwargs):

selector.xpath('//a[href=$url]', url="http://www.example.com")
"""
if self.type == 'text':
self._load_lxml_root(self.root, type='html')
elif self.type not in ('html', 'xml'):
raise ValueError('Cannot use xpath on a Selector of type {}'
.format(repr(self.type)))
try:
xpathev = self.root.xpath
except AttributeError:
@@ -279,10 +363,15 @@ def css(self, query):

.. _cssselect: https://pypi.python.org/pypi/cssselect/
"""
if self.type == 'text':
self._load_lxml_root(self.root, type='html')
elif self.type not in ('html', 'xml'):
raise ValueError('Cannot use css on a Selector of type {}'
.format(repr(self.type)))
return self.xpath(self._css2xpath(query))
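Both `xpath()` and `css()` now start with the same guard: a selector still in the interim "text" state is parsed as HTML on first use, while a "json" selector is rejected. The class below is a hypothetical stand-in that reproduces only that guard; its `_load_lxml_root()` records the switch instead of parsing with lxml.

```python
class GuardSketch:
    """Hypothetical stand-in for the type checks at the top of xpath()/css()."""

    def __init__(self, root, type):
        self.root = root
        self.type = type

    def _load_lxml_root(self, text, type):
        # The real method parses `text` with lxml; here we only mark the change.
        self.type = type
        self.root = ('parsed', text)

    def ensure_markup(self, method_name):
        if self.type == 'text':
            self._load_lxml_root(self.root, type='html')
        elif self.type not in ('html', 'xml'):
            raise ValueError('Cannot use {} on a Selector of type {}'
                             .format(method_name, repr(self.type)))


text_sel = GuardSketch('<p>hi</p>', 'text')
text_sel.ensure_markup('xpath')
print(text_sel.type)  # html

try:
    GuardSketch(1, 'json').ensure_markup('css')
except ValueError as exc:
    print(exc)  # Cannot use css on a Selector of type 'json'
```

The conversion mutates the selector in place, so a "text" selector stays an HTML selector after its first `xpath()` or `css()` call. The later commit titled "Selector.css: do not murate state" may be related.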

def _css2xpath(self, query):
return self._csstranslator.css_to_xpath(query)
return _ctgroup[self.type]['_csstranslator'].css_to_xpath(query)

def re(self, regex, replace_entities=True):
"""
@@ -317,18 +406,23 @@ def get(self):
Serialize and return the matched nodes in a single unicode string.
Percent encoded content is unquoted.
"""
if self.type in ('text', 'json'):
return self.root

Review comment (Member):

I wonder if the code would be simpler if we removed type="text" and used type="json" for these cases as well. The check self.type == "text" would be replaced with self.type == "json" and isinstance(self.root, str). I'm not 100% sure about it, but there are signs that we can do it.

try:
return etree.tostring(self.root,
method=self._tostring_method,
encoding='unicode',
with_tail=False)
return etree.tostring(
self.root,
method=_ctgroup[self.type]['_tostring_method'],
encoding='unicode',
with_tail=False,
)
except (AttributeError, TypeError):
if self.root is True:
return u'1'
elif self.root is False:
return u'0'
else:
return six.text_type(self.root)

extract = get
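In this version of `get()`, selectors of type "text" or "json" return `self.root` unchanged, so a JSON number comes back as a number rather than a string. The `except` branch exists because an XPath expression can evaluate to a non-element value (lxml returns Python booleans, floats or strings for expressions such as `count(//a)`), which `etree.tostring` cannot serialize. A stdlib-only sketch of those two non-element paths, using a hypothetical function name:

```python
def get_non_element(root, type):
    """Sketch of Selector.get() in this diff for values that are not lxml elements."""
    if type in ('text', 'json'):
        return root  # returned as-is: may be a str, int, dict, list, ...
    # Fallback used when etree.tostring() fails on an XPath result:
    if root is True:
        return '1'
    elif root is False:
        return '0'
    return str(root)


print(repr(get_non_element(1, 'json')))  # 1  (an int, not the string '1')
print(get_non_element(True, 'html'))     # 1
print(get_non_element(3.0, 'html'))      # 3.0
```

The later commit titled "JSON encoding jmespath selector results + unit test" appears to revisit the pass-through for JSON values.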

def getall(self):
@@ -397,9 +491,12 @@ def __bool__(self):
given by the contents it selects.
"""
return bool(self.get())

__nonzero__ = __bool__

def __str__(self):
data = repr(shorten(self.get(), width=40))
return "<%s xpath=%r data=%s>" % (type(self).__name__, self._expr, data)
expr_field = 'jmespath' if self.type == 'json' else 'xpath'
return "<%s %s=%r data=%s>" % (type(self).__name__, expr_field, self._expr, data)

__repr__ = __str__
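The new `__str__` picks its label from `self.type`. One consequence: a string produced by a JMESPath query is wrapped with the interim "text" type, so its repr still says `xpath=` even though `_expr` holds a JMESPath query. Commit b6f1e3a in the list above ("change type on __str__ to `query` for all cases") appears to address this. The sketch below reproduces the formatting without parsel's `shorten()` helper; `selector_repr()` is a hypothetical name.

```python
def selector_repr(cls_name, type, expr, data):
    # Same format string as the diff's __str__, minus the shorten() call.
    expr_field = 'jmespath' if type == 'json' else 'xpath'
    return "<%s %s=%r data=%s>" % (cls_name, expr_field, expr, repr(data))


print(selector_repr('Selector', 'json', 'author', {'name': 'Ana'}))
# <Selector jmespath='author' data={'name': 'Ana'}>
print(selector_repr('Selector', 'text', 'author.name', 'Ana'))
# <Selector xpath='author.name' data='Ana'>
```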
4 changes: 2 additions & 2 deletions pylintrc
@@ -4,7 +4,7 @@ persistent=no
[MESSAGES CONTROL]
disable=bad-continuation,
c-extension-no-member,
deprecated-method,
deprecated-method, # Required for Python 2 support
fixme,
import-error,
import-outside-toplevel,
@@ -27,4 +27,4 @@ disable=bad-continuation,
unused-argument,
useless-object-inheritance, # Required for Python 2 support
wrong-import-order,
wrong-import-position
wrong-import-position,
1 change: 1 addition & 0 deletions pytest.ini
@@ -7,5 +7,6 @@ flake8-ignore =
parsel/xpathfuncs.py E501
tests/test_selector.py E501
tests/test_selector_csstranslator.py E501
tests/test_selector_jmespath.py E501
tests/test_utils.py E501
tests/test_xpathfuncs.py E501
3 changes: 2 additions & 1 deletion setup.py
@@ -29,7 +29,8 @@ def has_environment_marker_platform_impl_support():
'w3lib>=1.19.0',
'lxml',
'six>=1.6.0',
'cssselect>=0.9'
'cssselect>=0.9',
'jmespath',
]
extras_require = {}


47 changes: 47 additions & 0 deletions tests/test_selector.py
@@ -5,6 +5,8 @@
import unittest
import pickle

import lxml.etree

from parsel import Selector
from parsel.selector import (
CannotRemoveElementWithoutRoot,
@@ -814,6 +816,51 @@ def test_remove_root_element_selector(self):
sel.css('body').remove()
self.assertEqual(sel.get(), '<html></html>')

def test_invalid_type(self):
with self.assertRaises(ValueError):
self.sscls(u'', type='xhtml')

def test_default_type(self):
text = u'foo'
selector = self.sscls(text)
self.assertEqual(selector.type, 'html')

def test_json_type(self):
obj = 1
selector = self.sscls(six.text_type(obj), type='json')
self.assertEqual(selector.root, obj)
self.assertEqual(selector.type, 'json')

def test_html_root(self):
root = lxml.etree.fromstring('<html/>')
selector = self.sscls(root=root)
self.assertEqual(selector.root, root)
self.assertEqual(selector.type, 'html')

def test_json_root(self):
obj = 1
selector = self.sscls(root=obj)
self.assertEqual(selector.root, obj)
self.assertEqual(selector.type, 'json')

def test_json_xpath(self):
obj = 1
selector = self.sscls(root=obj)
with self.assertRaises(ValueError):
selector.xpath('//*')

def test_json_css(self):
obj = 1
selector = self.sscls(root=obj)
with self.assertRaises(ValueError):
selector.css('*')

def test_invalid_json(self):
text = u'<html/>'
selector = self.sscls(text, type='json')
self.assertEqual(selector.root, None)
self.assertEqual(selector.type, 'json')


class ExsltTestCase(unittest.TestCase):
