v2.0.0

@ruippeixotog ruippeixotog released this 21 Jul 21:40
· 509 commits to master since this release
  • Breaking changes

    • Using a CSS query string as an extractor now extracts elements instead of text. This allows easier
      chaining of extractors and CSS selectors and fits the current extractor model better. The old behavior can be
      recovered by wrapping the CSS query string in the texts content extractor, e.g. doc >> texts("myQuery");
    • HtmlExtractor, HtmlValidator and ElementQuery now have an additional type parameter for the type of Element
      they work on. If you have custom instances of one of those classes, filling the missing parameter with Element
      (which is a superclass of all elements) should be enough for them to work with all source code using
      scala-scraper 1.x;
    • Methods for loading extractors and validators from a config were extracted to a separate module. To use
      them, users must add scala-scraper-config to their SBT dependencies and import
      net.ruippeixotog.scalascraper.config.dsl.DSL._;
    • The implicit conversion of Validated/Either to a RightProjection in order to expose foreach, map and
      flatMap in for comprehensions was moved to a separate object that is not imported together with the DSL. Either
      upgrade to Scala 2.12 (in which Either is already right-biased) or import the new
      net.ruippeixotog.scalascraper.util.EitherRightBias support object;
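
The first breaking change above can be illustrated with a minimal sketch. It assumes scala-scraper 2.x with the jsoup-backed browser on the classpath; the object name and the HTML snippet are hypothetical and exist only for illustration.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object QueryStringExample {
  val browser = JsoupBrowser()

  // Hypothetical document, used only for illustration.
  val doc = browser.parseString("<ul><li>one</li><li>two</li></ul>")

  // 2.x: a bare CSS query string yields the matching elements, not their text.
  val items = doc >> "li"

  // The 1.x behavior (the text of each match) is recovered with the
  // texts content extractor.
  val itemTexts = (doc >> texts("li")).toList

  def main(args: Array[String]): Unit =
    println(itemTexts)
}
```

Code that relied on the 1.x behavior should only need the query string wrapped in texts, as in the second extraction.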
  • Deprecations

    • SimpleExtractor and SimpleValidator are now deprecated. The classes remain available for the time being, but DSL
      methods that returned those classes now return only HtmlExtractor and HtmlValidator instances;
    • The Validated type alias is now deprecated. Users should now use Either, Right and Left directly;
    • The asDate content parser was deprecated in favor of asLocalDate and asDateTime;
    • The DSL validation operator ~/~ was renamed to >/~ in order to have the same precedence as the extraction
      operators >> and >?>;
    • The and DSL operator is deprecated and will be removed in future versions;
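
As a rough sketch of what using Either, Right and Left directly can look like, the following uses only the Scala standard library and assumes Scala 2.12 or later, where Either is right-biased. The validation functions are hypothetical and are not part of scala-scraper.

```scala
object EitherExample {
  // Hypothetical validation steps, for illustration only.
  def nonEmpty(s: String): Either[String, String] =
    if (s.nonEmpty) Right(s) else Left("empty input")

  def parsePort(s: String): Either[String, Int] =
    try {
      val n = s.toInt
      if (n > 0 && n < 65536) Right(n) else Left(s"port out of range: $n")
    } catch {
      case _: NumberFormatException => Left(s"not a number: $s")
    }

  // Right-biased Either exposes map and flatMap, so it can be used in a
  // for comprehension without a RightProjection.
  def validate(raw: String): Either[String, Int] =
    for {
      s <- nonEmpty(raw)
      port <- parsePort(s)
    } yield port

  def main(args: Array[String]): Unit = {
    println(validate("8080")) // Right(8080)
    println(validate(""))     // Left(empty input)
    println(validate("http")) // Left(not a number: http)
  }
}
```

On Scala 2.11 the same for comprehension would need the EitherRightBias import mentioned under the breaking changes.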
  • New features

    • The concrete type of the models in scala-scraper is now passed down from the Browser to Element instances
      extracted from documents. This allows users to use features unique to each browser (such as modifying or interacting
      with elements) while still using the scala-scraper DSL to extract and query them;
    • HtmlExtractor[E, A] is now a proper instance of ElementQuery[E] => A and has map and mapQuery methods to
      map the extraction results and the preceding query, respectively;
    • Content extractors, which were previously just functions, are now full-fledged HtmlExtractor instances and can be
      used by themselves, e.g. doc >> elements, doc >> elementList("myQuery") >> formData;
    • A new PolyHtmlExtractor class was created, allowing the implementation of extractors whose return type depends on
      the type of the element or document being extracted;
    • Overall code cleanup and simplification of some concepts.
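
The map method on extractors could be used along these lines. This is a sketch that assumes map takes a function over the extraction result, as described above; the object name, the HTML snippet and the itemCount extractor are hypothetical.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

object ExtractorMapExample {
  val browser = JsoupBrowser()

  // Hypothetical document, used only for illustration.
  val doc = browser.parseString("<ul><li>one</li><li>two</li><li>three</li></ul>")

  // Derive a new extractor from texts("li") by mapping over its result.
  // Assumption: map accepts a function from the extracted value to a new value.
  val itemCount = texts("li").map(_.size)

  // The derived extractor is applied like any other.
  val count = doc >> itemCount

  def main(args: Array[String]): Unit =
    println(count)
}
```

Building the extractor once and reusing it keeps the query and the post-processing in one place, instead of repeating the mapping after every extraction.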