The API reference for this module is below.
A scraping module for humans.
- String|Object `url`: The page URL or request options.
- Object `opts`: The options passed to the `scrapeHTML` method.
- Function `cb`: The callback function.
- Promise: A promise object resolving with:
  - `data` (Object): The scraped data.
  - `$` (Function): The Cheerio function. This may be handy for further manipulation of the DOM, if needed.
  - `response` (Object): The response object.
  - `body` (String): The raw body as a string.
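As a rough sketch of how the call and its resolved value fit together, the helper below takes the module's exported function as an argument (so it can be swapped for a stub) and reads `data` and `body` from the result. The helper name `scrapeHeader` and the selectors are illustrative assumptions, not part of the module.

```javascript
// Schema describing which fields to pull from the page (selectors are made up).
const headerSchema = {
    title: ".header h1",
    avatar: { selector: ".header img", attr: "src" }
};

// `scrapeIt` is expected to be the function exported by this module;
// it is passed in so the helper does not hard-code how it is loaded.
async function scrapeHeader(scrapeIt, url) {
    // The promise resolves with { data, $, response, body }.
    const { data, body } = await scrapeIt(url, headerSchema);
    return { title: data.title, avatar: data.avatar, bodyLength: body.length };
}
```

Assuming the package is installed under the name `scrape-it`, this could be called as `scrapeHeader(require("scrape-it"), "https://example.com").then(console.log)`.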
Scrapes the data in the provided element.
For the format of the selector, please refer to the Selectors section of the Cheerio library documentation.
- Cheerio `$`: The input element.
- Object `opts`: An object containing the scraping information. If you want to scrape a list, you have to use the `listItem` selector:
  - `listItem` (String): The list item selector.
  - `data` (Object): The fields to include in the list objects:
    - `<fieldName>` (Object|String): The selector or an object containing:
      - `selector` (String): The selector.
      - `convert` (Function): An optional function to change the value.
      - `how` (Function|String): A function or function name to access the value.
      - `attr` (String): If provided, the value will be taken based on the attribute name.
      - `trim` (Boolean): If `false`, the value will not be trimmed (default: `true`).
      - `closest` (String): If provided, returns the first ancestor of the selected element that matches this selector.
      - `eq` (Number): If provided, it will select the nth element.
      - `texteq` (Number): If provided, it will select the nth direct text child. Deep text child selection is not possible yet. Overwrites the `how` key.
      - `listItem` (Object): An object, keeping the recursive schema of the `listItem` object. This can be used to create nested lists.
Example:

    {
        articles: {
            listItem: ".article",
            data: {
                createdAt: {
                    selector: ".date",
                    convert: x => new Date(x)
                },
                title: "a.article-title",
                tags: {
                    listItem: ".tags > span"
                },
                content: {
                    selector: ".article-content",
                    how: "html"
                },
                traverseOtherNode: {
                    selector: ".upperNode",
                    closest: "div",
                    convert: x => x.length
                }
            }
        }
    }
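The remaining field options can be combined in the same way. The schema below is only a sketch: the selectors and field names are invented, and it assumes `eq` and `texteq` count from zero, as Cheerio's own `.eq()` does.

```javascript
const productSchema = {
    // Take the second matching element (assuming zero-based `eq`).
    secondPrice: { selector: ".price", eq: 1 },
    // Read an attribute instead of the element text.
    link: { selector: "a.details", attr: "href" },
    // Keep surrounding whitespace by disabling the default trimming.
    rawNote: { selector: ".note", trim: false },
    // First direct text node of the element (assuming zero-based `texteq`).
    label: { selector: ".label", texteq: 0 },
    // Convert the extracted string into a number.
    stock: { selector: ".stock", convert: x => parseInt(x, 10) }
};
```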
If you want to collect specific data from the page, just use the same schema used for the `data` field.

Example:

    {
        title: ".header h1",
        desc: ".header h2",
        avatar: {
            selector: ".header img",
            attr: "src"
        }
    }
- Object: The scraped data.
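To tie the parameters and return value together, here is a minimal sketch that scrapes an HTML string rather than a URL. It assumes that `scrapeHTML` is exposed as a property of the module's export and that the `cheerio` package is used to build `$`. Both are passed in as arguments, and `scrapeArticles` is a hypothetical helper name.

```javascript
const articleSchema = {
    articles: {
        listItem: ".article",
        data: {
            title: "a.article-title",
            tags: { listItem: ".tags > span" }
        }
    }
};

// `scrapeIt` is this module's export and `cheerio` is the Cheerio library.
function scrapeArticles(scrapeIt, cheerio, html) {
    // Build the Cheerio function for the document, then apply the schema to it.
    const $ = cheerio.load(html);
    return scrapeIt.scrapeHTML($, articleSchema);
}
```

With both packages installed, `scrapeArticles(require("scrape-it"), require("cheerio"), html)` would be expected to return an object with an `articles` array, one entry per `.article` element.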