Release v2.0.0 · ruippeixotog/scala-scraper

Breaking changes
- Using a CSS query string as an extractor now extracts elements instead of text. This makes it easier to chain
  extractors and CSS selectors and fits the current extractor model better. The old behavior can be
  recovered by wrapping the CSS query string in the texts content extractor, e.g. doc >> texts("myQuery");
- HtmlExtractor, HtmlValidator and ElementQuery now have an additional type parameter for the type of Element
  they work on. If you have custom instances of one of those classes, filling the missing parameter with Element
  (which is a superclass of all elements) should be enough for them to work with all source code using
  scala-scraper 1.x;
- Methods for loading extractors and validators from a config were extracted to a separate module. To use
  them, users must add scala-scraper-config to their SBT dependencies and import
  net.ruippeixotog.scalascraper.config.dsl.DSL._;
- The implicit conversion of Validated/Either to a RightProjection in order to expose foreach, map and
  flatMap in for comprehensions was moved to a separate object that is not imported together with the DSL. Either
  upgrade to Scala 2.12 (in which Either is already right-biased) or import the new
  net.ruippeixotog.scalascraper.util.EitherRightBias support object;
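
As a rough illustration of the first breaking change, the sketch below shows one way 1.x code that relied on a CSS query string returning text might be migrated. It assumes the Jsoup-backed browser (JsoupBrowser) and its parseString method; the HTML snippet, the .note selector and the object name are made up for the example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object CssQueryMigration {
  // Hypothetical document, used only for illustration.
  val browser = JsoupBrowser()
  val doc = browser.parseString(
    """<p class="note">first</p><p class="note">second</p>""")

  // In 1.x, `doc >> ".note"` yielded the text of the matched elements.
  // In 2.0.0 a bare query string yields elements instead, so the query is
  // wrapped in the `texts` content extractor to get strings back.
  val notes = doc >> texts(".note")

  def main(args: Array[String]): Unit =
    notes.foreach(println)
}
```

Call sites that actually want elements can keep the bare query string and chain further extractors on the result.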

Deprecations
- SimpleExtractor and SimpleValidator are now deprecated. The classes remain available for the time being, but DSL
  methods that returned those classes now return only HtmlExtractor and HtmlValidator instances;
- The Validated type alias is now deprecated. Users should now use Either, Right and Left directly;
- The asDate content parser was deprecated in favor of asLocalDate and asDateTime;
- The DSL validation operator ~/~ was renamed to >/~ in order to have the same precedence as the extraction
  operators >> and >?>;
- The DSL operator "and" is deprecated and will be removed in a future version.
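
With Validated deprecated, validation results can be handled as plain Either values. The following is a minimal sketch using only the standard library, assuming Scala 2.12 or later, where Either is right-biased and works in for comprehensions without extra imports. The parsePort and checkRange helpers are hypothetical and are not part of scala-scraper.

```scala
object EitherRightBiasExample {
  // Hypothetical validation step: parse a string into an Int, or fail with a message.
  def parsePort(s: String): Either[String, Int] =
    try Right(s.toInt)
    catch { case _: NumberFormatException => Left(s"not a number: $s") }

  // Hypothetical validation step: accept only valid TCP port numbers.
  def checkRange(p: Int): Either[String, Int] =
    if (p > 0 && p < 65536) Right(p) else Left(s"out of range: $p")

  // On Scala 2.12+, Either exposes map and flatMap directly, so the
  // for comprehension short-circuits on the first Left.
  def validate(s: String): Either[String, Int] =
    for {
      parsed <- parsePort(s)
      port <- checkRange(parsed)
    } yield port

  def main(args: Array[String]): Unit = {
    println(validate("8080")) // Right(8080)
    println(validate("http")) // Left(not a number: http)
  }
}
```

On Scala 2.11 the same for comprehension does not compile against the standard Either alone, which is where the EitherRightBias support object mentioned in the breaking changes comes in.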

New features
- The concrete type of the models in scala-scraper is now passed down from the Browser to Element instances
  extracted from documents. This allows users to use features unique to each browser (such as modifying or interacting
  with elements) while still using the scala-scraper DSL to extract and query them;
- HtmlExtractor[E, A] is now a proper instance of ElementQuery[E] => A and has map and mapQuery methods to
  map the extraction results and the preceding query, respectively;
- Content extractors, which were previously just functions, are now full-fledged HtmlExtractor instances and can be
  used by themselves, e.g. doc >> elements, doc >> elementList("myQuery") >> formData;
- A new PolyHtmlExtractor class was created, allowing the implementation of extractors whose return type depends on
  the type of the element or document being extracted;
- Overall code cleanup and simplification of some concepts.
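
The extractor-related features above might be combined as in the sketch below. It assumes the Jsoup-backed browser and its parseString method, and that map can be called directly on the extractor returned by texts("..."). The HTML, the .item selector and the names itemCount, count and items are invented for the example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object ExtractorMapExample {
  // Hypothetical document, used only for illustration.
  val browser = JsoupBrowser()
  val doc = browser.parseString(
    """<ul><li class="item">a</li><li class="item">bb</li></ul>""")

  // Extractors are ordinary values: derive a new one by mapping over
  // the result of `texts` (here, counting the matched elements).
  val itemCount = texts(".item").map(_.size)

  // Apply the derived extractor to the document.
  val count = doc >> itemCount

  // Extract the matching elements themselves and read their text.
  val items = (doc >> elementList(".item")).map(_.text)

  def main(args: Array[String]): Unit = {
    println(count)
    println(items)
  }
}
```

Defining itemCount once and reusing it keeps the query and the post-processing together, rather than repeating the mapping at every call site.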
