Wrong price in JSON-LD (ld+json) data #55

Closed
3id0 opened this issue Feb 25, 2021 · 1 comment · Fixed by #56
Labels
bug Something isn't working
Comments

3id0 commented Feb 25, 2021

Hey, I wanted to use Price Tracker with an Unreal Engine Marketplace URL, but the discount price is ignored in favor of the (higher) base price, which makes the tracker pretty much useless for this domain.
So I thought, "OK, I'll just add unrealengine.com with an XPath to parser_configuration.json and submit a pull request." But ParserXPath never actually gets called in scraper.dart: because there's a JSON-LD object in the page, ParserSD is called instead.

The problem is: the price attribute in the JSON-LD object on unrealengine.com/marketplace is for the base price (which rarely changes), not the discount price (which changes a lot).
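To illustrate the mismatch, here is a minimal Python sketch (the project itself is written in Dart) of how a structured-data parser would end up with the base price. The HTML fragment, both prices, the `price-discounted` class, and the function names are made up for illustration; the real markup on unrealengine.com/marketplace may differ.

```python
import json
import re

# Hypothetical page fragment: the JSON-LD block carries the base price,
# while the visible markup shows the discounted price.
PAGE = """
<script type="application/ld+json">
{"@type": "Product", "name": "Example Asset",
 "offers": {"@type": "Offer", "price": "29.99", "priceCurrency": "USD"}}
</script>
<span class="price-discounted">14.99</span>
"""

LD_JSON_RE = re.compile(
    r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL
)


def price_from_json_ld(html):
    """Return offers.price from the first JSON-LD block, or None."""
    match = LD_JSON_RE.search(html)
    if match is None:
        return None
    data = json.loads(match.group(1))
    offers = data.get("offers") or {}
    price = offers.get("price")
    return float(price) if price is not None else None


def price_from_markup(html):
    """Stand-in for an XPath lookup: read the visible discounted price."""
    match = re.search(r'class="price-discounted">([\d.]+)<', html)
    return float(match.group(1)) if match else None
```

With this fragment, `price_from_json_ld(PAGE)` yields the base price and `price_from_markup(PAGE)` yields the discounted one, which is the gap described above.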

My question: What would be your advice if I want to add unrealengine.com to the parser_configuration.json + ignore the JSON-LD for this domain + submit a pull request?

3id0 added the bug label Feb 25, 2021
lucafluri (Owner) commented Feb 26, 2021

Hi, thanks for the bug report @3id0.
The scraper favors structured data over any configured XPaths. This problem never occurred before, and it puzzles me why the Unreal Engine Marketplace isn't updating its JSON-LD when it has it available.

To fix this case, the priority of the parsers has to be changed. The problem is that several XPaths already exist in the configuration as fallback alternatives, so changing the priority for every domain would require a lot of manual testing.

As a temporary fix I would suggest adding a boolean property called "favorXPath" or something to the configuration JSON for each domain and checking this property before using the structured data parser (in scraper.dart).

The parser priority list is then as follows:

  1. XPath/Selector if available and favored in configuration
  2. Default Structured Data parser
  3. Fallback XPath/Selector

This should work without having to test all domains in the configuration manually.
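The priority list above can be sketched as follows. This is a Python illustration only (the project's scraper.dart is Dart); the function `pick_price`, the parser callables, and the shape of the per-domain configuration are assumptions, and only the "favorXPath" property name comes from this thread.

```python
from typing import Callable, Optional

Parser = Callable[[str], Optional[float]]


def pick_price(
    html: str,
    domain_config: dict,
    parse_structured_data: Parser,
    parse_xpath: Optional[Parser],
) -> Optional[float]:
    """Try the parsers in the proposed priority order."""
    # A missing "favorXPath" counts as False, so existing domains
    # keep their current behaviour without being retested.
    favor_xpath = bool(domain_config.get("favorXPath", False))

    # 1. XPath/selector, if available and favored in the configuration.
    if favor_xpath and parse_xpath is not None:
        price = parse_xpath(html)
        if price is not None:
            return price

    # 2. Default structured data parser.
    price = parse_structured_data(html)
    if price is not None:
        return price

    # 3. Fallback XPath/selector (skipped if it was already tried in step 1).
    if parse_xpath is not None and not favor_xpath:
        return parse_xpath(html)
    return None
```

Because the flag defaults to false, only domains that explicitly opt in change behaviour, which is why the other configured domains should not need manual retesting.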

lucafluri added this to the v1.0.0 milestone Feb 28, 2021
lucafluri pushed a commit that referenced this issue Feb 28, 2021
Add new boolean property "favorXPath" in parser_configuration.json

In some cases, a page's content may include JSON-LD (sdJSON) but
the data (e.g. price, name) is outdated/wrong compared to what can
be scraped with ParserXPath (this is the case with product pages
on unrealengine.com/marketplace for example).
In this case, the property "favorXPath" can be set to "true" for
a specific domain in order to ignore the problematic JSON-LD data.

Closes #55