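On some pages (e.g. https://accessgroup.my.site.com/Support/s/article/Dimensions-AOI-Error-Category-number-cannot-be-changed), calling the `parseWithCheerio` Playwright helper removes some of the content from the page. This does not happen when the `ignoreShadowRoots` constructor option is set to `true`.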
Repro:
```ts
import { PlaywrightCrawler } from 'crawlee';
import { setTimeout } from 'timers/promises';

const startUrls = ['https://accessgroup.my.site.com/Support/s/article/Dimensions-AOI-Error-Category-number-cannot-be-changed'];

const crawler = new PlaywrightCrawler({
    requestHandler: async ({ parseWithCheerio, page }) => {
        // wait till the content loads
        await page.waitForSelector('.cKnowledge_Articles');
        // parse with Cheerio, expanding the shadow roots (and removing the content).
        await parseWithCheerio();
        // keep the browser window open for 10 s so the result can be observed
        await setTimeout(10e3);
    },
    headless: false,
    ignoreShadowRoots: false, // set to `true` to make it work correctly
});

await crawler.run(startUrls);
```
With `ignoreShadowRoots: false`:
bad.mp4
With `ignoreShadowRoots: true`:
good.mp4
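For anyone who cannot play the videos, the same difference could probably be checked in code by comparing the page's visible text length before and after the parse call. The following is only a rough sketch: `measureTextLoss`, `PageLike` and `TextLengthDiff` are hypothetical names that are not part of Crawlee, and it assumes that the removed content is visible through `document.body.innerText`.

```ts
// Minimal stand-in for the one Playwright `Page` method this sketch relies on.
interface PageLike {
    evaluate<T>(fn: () => T): Promise<T>;
}

interface TextLengthDiff {
    before: number;
    after: number;
    lost: number;
}

// Hypothetical helper (not a Crawlee API): reads the visible text length of the
// page, runs `parse`, reads the length again and reports how much text disappeared.
async function measureTextLoss(
    page: PageLike,
    parse: () => Promise<unknown>,
): Promise<TextLengthDiff> {
    // The callback runs in the browser context; `globalThis as any` only avoids
    // depending on the DOM typings when this file is compiled.
    const readLength = () =>
        page.evaluate<number>(() => (globalThis as any).document.body.innerText.length);

    const before = await readLength();
    await parse();
    const after = await readLength();

    return { before, after, lost: Math.max(0, before - after) };
}
```

Inside the `requestHandler` from the repro, a call such as `await measureTextLoss(page, () => parseWithCheerio())` should then report a non-zero `lost` value with `ignoreShadowRoots: false` if content really is being removed, and `0` with `ignoreShadowRoots: true`.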
CC @B4nan as the author of the original PR adding the shadow root expansion.
barjin added the bug and t-tooling labels on Jul 17, 2024