(www/R-rvest) Updated 0.3.5 to 1.0.1
# rvest 1.0.1

* `html_table()` correctly handles tables with cells that contain blank values
  for `rowspan` and/or `colspan`, so that e.g. `<td rowspan="">` is parsed as
  `<td rowspan=1>` (@epiben, #323).

* Fix broken example

# rvest 1.0.0

## New features

* New `html_text2()` provides a more natural rendering of HTML nodes into text,
  converting `<br>` into "\n", and removing non-significant whitespace (#175).
  By default, it also converts `&nbsp;` into regular spaces, which you can
  suppress with `preserve_nbsp = TRUE` (#284).

* `html_table()` has been re-written from scratch to more closely mimic the
  algorithm that browsers use for parsing tables. This should mean that there
  are far fewer tables for which it fails to produce some output (#63, #204,
  #215). The `fill` argument has been deprecated since it is no longer needed.
  `html_table()` now returns a tibble rather than a data frame to be compatible
  with the rest of the tidyverse (#199). Its performance has been considerably
  improved (#237). It also gains a `na.strings` argument to control what values
  are converted to `NA` (#107), and a `convert` argument to control whether to
  run the conversion (#311).

* New `html_form_submit()` allows you to submit a form directly, without
  needing to create a session (#300).

* rvest is now licensed as MIT (#287).
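The new features above can be tried on a small in-memory document. The following is a minimal sketch, assuming rvest >= 1.0.0 is installed; the HTML snippet, the `"n/a"` marker, and the variable names are made up for illustration and do not come from the rvest documentation.

```r
library(rvest)

# A small in-memory document, so no network access is needed.
page <- minimal_html('
  <p>First line<br>second line</p>
  <table>
    <tr><th>id</th><th>value</th></tr>
    <tr><td>1</td><td>n/a</td></tr>
    <tr><td>2</td><td>b</td></tr>
  </table>
')

para <- html_element(page, "p")

# html_text() returns the raw text content and ignores <br>.
text_raw <- html_text(para)

# html_text2() renders <br> as a newline.
text_rendered <- html_text2(para)

# html_table() now returns a tibble; na.strings lists the cell values
# that should be read as NA ("n/a" is an illustrative choice).
tbl <- html_table(html_element(page, "table"), na.strings = c("NA", "n/a"))
```

Here `text_rendered` contains a line break where the `<br>` was, and the `value` column of `tbl` holds `NA` in its first row.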

## API changes

Since this is the 1.0.0 release, I included a large number of API changes to make rvest more compatible with current tidyverse conventions. Older functions have been deprecated, so existing code will continue to work (albeit with a few new warnings).

* rvest now imports xml2 rather than depending on it. This is cleaner because
  it avoids attaching all the xml2 functions that you're less likely to use.
  To reduce the chance of breakages, rvest re-exports xml2 functions
  `read_html()` and `url_absolute()`, but your code may now need an explicit
  `library(xml2)`.

* `html_form()` now returns an object with class `rvest_form` (instead of `form`).
  Fields within a form now have class `rvest_field`, instead of a
  variety of classes that were lacking the `rvest_` prefix. All functions for
  working with forms have a common `html_form_` prefix: `set_values()` became
  `html_form_set()`. `submit_form()` was renamed to `session_submit()` because
  it returns a session.

* `html_node()` and `html_nodes()` have been superseded in favor of
  `html_element()`  and `html_elements()` since they (almost) always return
  elements, not nodes (#298).

* `html_session()` is now `session()` and returns an object of class
  `rvest_session` (instead of `session`). All functions that work with session
  objects now have a common `session_` prefix.

* Long deprecated `html()`, `html_tag()`, `xml()` functions have been removed.

* `minimal_html()` (which doesn't appear to be used by any other package)
  has had its arguments flipped to make it more intuitive.

* `guess_encoding()` has been renamed to `html_encoding_guess()` to avoid
  a clash with `stringr::guess_encoding()` (#209). `repair_encoding()` has
  been deprecated because it doesn't appear to work.

* `pluck()` is no longer exported to avoid a clash with `purrr::pluck()`;
  if you need it use `purrr::map_chr()` and friends instead (#209).

* `xml_tag()`, `xml_node()`, and `xml_nodes()` have been formally deprecated
  in favor of their `html_` equivalents.
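For code written against 0.3.x, the renames listed above mostly come down to swapping function names. A minimal migration sketch, assuming rvest >= 1.0.0 and xml2 are installed; the HTML snippet and variable names are illustrative only.

```r
library(rvest)

page <- minimal_html('
  <ul>
    <li class="item">alpha</li>
    <li class="item">beta</li>
  </ul>
')

# Before 1.0.0 this would have been html_nodes(page, "li.item").
items <- html_elements(page, "li.item")

# Before 1.0.0 this would have been html_node(page, "li.item").
first <- html_element(page, "li.item")

item_text <- html_text2(items)

# rvest no longer attaches xml2, so xml2 functions other than the
# re-exported ones need library(xml2) or an explicit namespace prefix.
first_tag <- xml2::xml_name(first)
```

The superseded `html_node()` and `html_nodes()` are still available, so this change can be made incrementally.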

## Minor improvements and bug fixes

* The "harvesting the web" vignette has been rewritten to focus more on the
  basics of rvest, eliminating the screenshots to keep the installed package as
  svelte as possible. It's also been renamed to `vignette("rvest")` since it's
  the vignette that you should read first.

* The SelectorGadget vignette is now a web-only article,
  <https://rvest.tidyverse.org/articles/articles/selectorgadget.html>,
  so we can be more generous with screenshots since they're no longer bundled
  with every install of the package. Together with the rewrite of the other
  vignette, this means that rvest is now ~90 Kb instead of ~1.1 Mb.

* All uses of IMDB have been eliminated since the site explicitly prohibits
  scraping (#195).

* `session_submit()` errors if `form` doesn't have a `url` (#288).

* New `session_forward()` function to complement `session_back()`.
  It now allows you to pick the submission button by position (#156).
  The `...` argument is deprecated; please use `config` instead.

* `html_form_set()` can now accept character vectors allowing you to select
  multiple checkboxes in a set or select multiple values from a multi-`<select>`
  (#127, with help from @juba). It also uses dynamic dots so that you can use
  `!!!` if you have a list of values (#189).

# rvest 0.3.6

* Remove failing example
mef committed Aug 24, 2021
1 parent af71c6b commit 21a5006
Showing 2 changed files with 10 additions and 7 deletions.
7 changes: 5 additions & 2 deletions www/R-rvest/Makefile
@@ -1,7 +1,7 @@
-# $NetBSD: Makefile,v 1.1 2020/08/07 01:57:55 brook Exp $
+# $NetBSD: Makefile,v 1.2 2021/08/24 11:27:50 mef Exp $

 R_PKGNAME= rvest
-R_PKGVER= 0.3.5
+R_PKGVER= 1.0.1
 CATEGORIES= www

 MAINTAINER= pkgsrc-users@NetBSD.org
@@ -13,6 +13,9 @@ DEPENDS+= R-selectr>=0.4.2:../../textproc/R-selectr
 DEPENDS+= R-xml2>=1.2.2nb1:../../textproc/R-xml2
 DEPENDS+= R-httr>=0.5:../../www/R-httr

+# Packages suggested but not available: 'repurrrsive', 'webfakes'
+TEST_DEPENDS+= R-readr-[0-9]*:../../textproc/R-readr
+
 USE_LANGUAGES= # none

 .include "../../math/R/Makefile.extension"
10 changes: 5 additions & 5 deletions www/R-rvest/distinfo
@@ -1,6 +1,6 @@
-$NetBSD: distinfo,v 1.1 2020/08/07 01:57:55 brook Exp $
+$NetBSD: distinfo,v 1.2 2021/08/24 11:27:50 mef Exp $

-SHA1 (R/rvest_0.3.5.tar.gz) = ceb863b5a8dbb8192b8d2743ca53dd174709b1e4
-RMD160 (R/rvest_0.3.5.tar.gz) = 4164ffb4abcadeaeb502151bcf9a89232ff60bee
-SHA512 (R/rvest_0.3.5.tar.gz) = 0e33cbe7287e44c8516acd2dbeaa72ecf63fb1c0253b3b3af7f6ddc3f126894e5f2a1b70e0a03b6f15981c69665bd554bd2ade634a8c2ce16dcac61b3161086b
-Size (R/rvest_0.3.5.tar.gz) = 1129355 bytes
+SHA1 (R/rvest_1.0.1.tar.gz) = e5ff023885de0e699048f967f0530782a11b92e4
+RMD160 (R/rvest_1.0.1.tar.gz) = ff90dc8a32abf530a1e187e1c06791ea009053a7
+SHA512 (R/rvest_1.0.1.tar.gz) = 246cba48246a3d0697ed673e37afdca5d063ee56b382fe33083dd587c16035cbb37ae93f54863954dcdcdc3596e103553f0986aaeab21bb37d33346534c3fda3
+Size (R/rvest_1.0.1.tar.gz) = 94307 bytes
