Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add har recorder #181

Merged
merged 7 commits into from
Dec 17, 2024
Merged

Add har recorder #181

merged 7 commits into from
Dec 17, 2024

Conversation

thypon
Copy link
Member

@thypon thypon commented Dec 16, 2024

Continues #180

@thypon thypon requested a review from pes10k as a code owner December 16, 2024 19:32
Copy link

The following commits were not verified:
a39a942 (unsigned)

@thypon thypon mentioned this pull request Dec 16, 2024
@FHantke
Copy link
Contributor

FHantke commented Dec 16, 2024

@pes10k Can confirm, branch still works as expected.

@pes10k pes10k force-pushed the add-har-recorder branch 2 times, most recently from fc59bfd to 2e89553 Compare December 16, 2024 20:31
@FHantke
Copy link
Contributor

FHantke commented Dec 17, 2024

@pes10k I don't have the access rights to push here. Added tests to my fork https://github.com/FHantke/pagegraph-crawl/tree/add-har-recorder

Copy link

[puLL-Merge] - brave/pagegraph-crawl@181

Description

This PR introduces several significant changes to the pagegraph-crawl project, including the addition of HAR (HTTP Archive) generation functionality, improvements to the testing process, and updates to dependencies and documentation.

Changes

Changes

  1. .github/workflows/pr.yml:

    • Removed manual sed command for configuring the test binary path
    • Added environment variables for debugging and specifying the test binary path
  2. .gitignore:

    • Added test/debug_output/ and test/output/ to ignore list
  3. README.md:

    • Updated node version requirement
    • Minor text improvements
  4. built/brave/crawl.js:

    • Added HAR generation functionality
    • Implemented prepareHARGenerator function to set up HAR recording
    • Modified doCrawl function to include HAR generation
  5. built/brave/files.js:

    • Added writeHAR function
    • Replaced fs-extra dependency with native node:fs/promises
  6. built/brave/puppeteer.js:

    • Updated profile path handling
    • Removed commented-out stealth mode code
  7. built/brave/validate.js:

    • Added storeHar and storeHarBody options
    • Replaced hasbin with which for binary detection
  8. built/run.js:

    • Added command-line arguments for HAR generation
  9. changelog.md:

    • Updated with new features and changes for version 1.1.3
  10. eslint.config.js:

    • Added rule to ignore camelcase for properties
  11. package.json:

    • Updated version to 1.1.3
    • Added chrome-har and which dependencies
    • Removed fs-extra, hasbin, and chai dependencies
    • Updated scripts and dev dependencies
  12. src/ directory:

    • Updated TypeScript source files to reflect changes in built JavaScript files
  13. test/:

    • Removed config.js
    • Added new test pages and resources
    • Significantly refactored test.js:
      • Replaced chai with native assert
      • Implemented HAR validation
      • Added tests for HAR generation
    • Updated test_server.js (changes not shown in the diff)
sequenceDiagram
    participant User
    participant CLI
    participant Puppeteer
    participant Browser
    participant HARGenerator
    participant FileSystem

    User->>CLI: Run crawl command
    CLI->>Puppeteer: Launch browser
    Puppeteer->>Browser: Create new page
    CLI->>HARGenerator: Prepare HAR recording
    Browser->>HARGenerator: Send network events
    Browser->>FileSystem: Generate PageGraph
    HARGenerator->>FileSystem: Generate HAR file
    FileSystem->>User: Save output files
Loading

Possible Issues

  1. The removal of the fs-extra dependency might impact file operations that relied on its extended functionality.
  2. The switch from hasbin to which for binary detection could potentially behave differently in edge cases.

Security Hotspots

  1. The new HAR generation feature stores response bodies, which could potentially include sensitive information if not handled carefully.

Copy link

socket-security bot commented Dec 17, 2024

🚨 Potential security issues detected. Learn more about Socket for GitHub ↗︎

To accept the risk, merge this PR and you will not be notified again.

Alert Package NoteSourceCI
Deprecated npm/rimraf@3.0.2
  • Reason: Rimraf versions prior to v4 are no longer supported
⚠︎
Deprecated npm/@humanwhocodes/object-schema@2.0.3
  • Reason: Use @eslint/object-schema instead
⚠︎
Deprecated npm/@humanwhocodes/config-array@0.13.0
  • Reason: Use @eslint/config-array instead
⚠︎
Deprecated npm/eslint@8.57.1
  • Reason: This version is no longer supported. Please see https://eslint.org/version-support for other options.
⚠︎

View full report↗︎

Next steps

What is a deprecated package?

The maintainer of the package marked it as deprecated. This could indicate that a single version should not be used, or that the package is no longer maintained and any new vulnerabilities will not be fixed.

Research the state of the package and determine if there are non-deprecated versions that can be used, or if it should be replaced with a new, supported solution.

Take a deeper look at the dependency

Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support [AT] socket [DOT] dev.

Remove the package

If you happen to install a dependency that Socket reports as Known Malware you should immediately remove it and select a different dependency. For other alert types, you may may wish to investigate alternative packages or consider if there are other ways to mitigate the specific risk posed by the dependency.

Mark a package as acceptable risk

To ignore an alert, reply with a comment starting with @SocketSecurity ignore followed by a space separated list of ecosystem/package-name@version specifiers. e.g. @SocketSecurity ignore npm/foo@1.0.0 or ignore all packages with @SocketSecurity ignore-all

  • @SocketSecurity ignore npm/rimraf@3.0.2
  • @SocketSecurity ignore npm/@humanwhocodes/object-schema@2.0.3
  • @SocketSecurity ignore npm/@humanwhocodes/config-array@0.13.0
  • @SocketSecurity ignore npm/eslint@8.57.1

@@ -0,0 +1,4 @@
<svg height="100" width="100" xmlns="http://www.w3.org/2000/svg">

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[safesvg] element svg: validity error : Element svg content does not follow the DTD, expecting (desc | title | metadata | animate | set | animateMotion | animateColor | animateTransform | svg | g | defs | symbol | use | switch | image | style | path | rect | circle | line | ellipse | polyline | polygon | text | altGlyphDef | marker | color-profile | linearGradient | radialGradient | pattern | clipPath | mask | filter | cursor | a | view | script | font | font-face | foreignObject)*, got (circle CDATA)

Run SVGO on your assets


Cc @thypon @kdenhartog

@pes10k pes10k force-pushed the add-har-recorder branch 2 times, most recently from b32d9d3 to ae8b73a Compare December 17, 2024 17:51
@diracdeltas
Copy link
Member

this repo has some deprecated dependencies that should be addressed (see the socket comment above)

@pes10k
Copy link
Collaborator

pes10k commented Dec 17, 2024

@diracdeltas these are just dev dependencies, and all pulled in from standardjs. Since they're just dev dependencies i was hoping they could be ignored.

If not, is there an alternative security recommends other than standardjs? IIRC it was either suggested or required for all code for a while. I can search for something kept more up to date, but if any suggestions greatly appreciated

@diracdeltas
Copy link
Member

@diracdeltas these are just dev dependencies, and all pulled in from standardjs. Since they're just dev dependencies i was hoping they could be ignored.

If not, is there an alternative security recommends other than standardjs? IIRC it was either suggested or required for all code for a while. I can search for something kept more up to date, but if any suggestions greatly appreciated

we do use standard in basically all js repos but i haven't seen these issues. they might be false positives then.

package.json Outdated
"eslint": "9.16.0",
"express": "4.21.1",
"express": "^4.21.2",
"standard": "^17.1.2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these updated deps should be pinned to exact versions instead of using ^

@pes10k
Copy link
Collaborator

pes10k commented Dec 17, 2024

@diracdeltas these are just dev dependencies, and all pulled in from standardjs. Since they're just dev dependencies i was hoping they could be ignored.

If not, is there an alternative security recommends other than standardjs? IIRC it was either suggested or required for all code for a while. I can search for something kept more up to date, but if any suggestions greatly appreciated

we do use standard in basically all js repos but i haven't seen these issues. they might be false positives then.

Hmm, odd, I don't think they're false positives. The current version of standard really does depend versions of eslint socket.io is complaining about (see https://github.com/standard/standard/blob/master/package.json#L18 for example, which is even older than the [8.57.1] version i've pulled in).

Are those other projects configured differently? Would really like to not need to remove standard here (its very handy) if possible

@pes10k pes10k force-pushed the add-har-recorder branch 3 times, most recently from 5605d9c to ce01a2a Compare December 17, 2024 18:38
Copy link
Collaborator

@pes10k pes10k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Discussing what to do with socket.io warnings in other channels

@pes10k pes10k merged commit 4594616 into main Dec 17, 2024
5 checks passed
@pes10k pes10k deleted the add-har-recorder branch December 17, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants