Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provide a way to avoid re2 installation #630

Closed
gajus opened this issue Mar 31, 2023 · 23 comments · Fixed by #656
Closed

provide a way to avoid re2 installation #630

gajus opened this issue Mar 31, 2023 · 23 comments · Fixed by #656

Comments

@gajus
Copy link

gajus commented Mar 31, 2023

Unless I am missing something, re2 is added as a dependency but it is not actually being used anywhere.

@gajus
Copy link
Author

gajus commented Mar 31, 2023

It is a peer dependency of https://github.com/spamscanner/url-regex-safe, turns out.

@gajus
Copy link
Author

gajus commented Mar 31, 2023

Is it worth evaluating just using a more basic URL regex?

url-regex-safe is an overkill; and copiling re2 is taking a ton of time.

@gajus
Copy link
Author

gajus commented Mar 31, 2023

It looks like metascraper could get away with simple using new URL()

@Kikobeats
Copy link
Member

Kikobeats commented Apr 1, 2023

Yes, it's used 🙂

If r2 installation fails, still url-regex-safe can work
https://github.com/spamscanner/url-regex-safe/blob/master/src/index.js#L5

@gajus
Copy link
Author

gajus commented Apr 3, 2023

@Kikobeats But is it needed given that you can achieve the same check using new URL()?

@Kikobeats
Copy link
Member

Unfortunately it's needed. Maybe we can find a mechanism to skip re2 installation, like using an environment variable.

@gajus
Copy link
Author

gajus commented Apr 3, 2023

Could we update metascraper API such that users can provide their own RegExp constructor and instruct them to use re2 if possible?, i.e.

const SafeRegExp = (() => {
  try {
    const RE2 = require('re2');
    return typeof RE2 === 'function' ? RE2 : RegExp;
  } catch {
    return RegExp;
  }
})();

const metascraper = createMetascraper([
    createMetascraperTitleRule(),
    createMetascraperDescriptionRule(),
    createMetascraperImageRule(),
], {RegExp: SafeRegExp});

@Kikobeats
Copy link
Member

Can you paste the installation error you are getting?

Theoretically it should be work even re2 installation failed.

@gajus
Copy link
Author

gajus commented Apr 3, 2023

It is not the installation error, it is the

  1. compile time and
  2. not being able to bundle code due to native dependency

RE 1:

If you create an Dockerfile with:

FROM node:19-bullseye

RUN npm install --global pnpm@^7.30.3 turbo@^1

RUN pnpm install re2

then just the RUN pnpm install re2 step takes 2 minutes and 5 seconds, which adds up to a lot of build time overhead. For context, removing metascraper (and therefore re2) from our dependency tree reduces the install time down to less than 30 seconds for all dependencies.

RE 2:

We use esbuild to bundle our Node.js services, and currently re2 is the only dependency that cannot be bundled (being that it is a native dependency), which requires to pnpm install, which otherwise we wouldn't need to do at all in production build because we already have all the other code bundled.

@Kikobeats Kikobeats changed the title re2 is an unused dependency? provide a way to avoid re2 installation Apr 3, 2023
@Kikobeats Kikobeats reopened this Apr 3, 2023
@Kikobeats
Copy link
Member

Kikobeats commented Apr 3, 2023

what if you use pnpm override to avoid install it? https://pnpm.io/package_json#pnpmoverrides

I don't feel comfortable removing re2 since it's there for a good reason, but I have to explore a mechanism to opt-out if that's the thing the user wants.

To me, something like RE2_SKIP_INSTALL=true sounds ideal

@gajus
Copy link
Author

gajus commented Apr 3, 2023

what if you use pnpm override to avoid install it? https://pnpm.io/package_json#pnpmoverrides

It is not ideal, but we can.

I say it is not ideal because overriding internal dependencies tends to introduce very hard to debug issues long-term. Been there a few times.

@gajus
Copy link
Author

gajus commented Apr 3, 2023

I don't feel comfortable removing re2 since it's there for a good reason, but I have to explore a mechanism to opt-out if that's the thing the user wants.

To me something like RE2_SKIP_INSTALL=true sounds ideal

I would make this more explicit, like METASCRAPER_RE2_SKIP_INSTALL=true

@gajus
Copy link
Author

gajus commented Apr 13, 2023

@Kikobeats I was confused why we are still installing it, but it looks like re2 is actually a hard dependency of @metascraper/helpers

  /@metascraper/helpers@5.33.5:
    resolution: {integrity: sha512-gcULKpM00CNxlf7iWRTi4hQQIXWQUjeFal0V5U60C4P4YyfLXfjuQVBk6mmKSYENSRh7oBQhAR+YVnMalVWBcw==}
    engines: {node: '>= 12'}
    dependencies:
      audio-extensions: 0.0.0
      chrono-node: 2.5.0
      condense-whitespace: 2.0.0
      entities: 4.4.0
      file-extension: 4.0.5
      has-values: 2.0.1
      image-extensions: 1.1.0
      is-relative-url: 3.0.0
      is-uri: 1.2.4
      iso-639-3: 2.2.0
      isostring: 0.0.1
      jsdom: 21.1.1
      lodash: 4.17.21
      memoize-one: 6.0.0
      microsoft-capitalize: 1.0.5
      mime: 3.0.0
      normalize-url: 6.1.0
      re2: 1.18.0
      smartquotes: 2.3.2
      tldts: 5.7.103
      url-regex-safe: 3.0.0(re2@1.18.0)
      video-extensions: 1.2.0
    transitivePeerDependencies:
      - bluebird
      - bufferutil
      - canvas
      - supports-color
      - utf-8-validate
    dev: true

If this can at least be made a peer dependency, then we can just skip installing it.

@gajus
Copy link
Author

gajus commented Apr 13, 2023

This was painful but hopefully helps others uhop/node-re2#163 (comment)

@Kikobeats
Copy link
Member

Yes, it's a hard dependency. I want to keep it unless you explicit opt-out passing METASCRAPER_RE2_SKIP_INSTALL

@fieztazica
Copy link

is there a proper way to fix re2? im using nextjs 13 and get re2.node error when compiled

@loteoo
Copy link

loteoo commented Jul 12, 2023

Looking forward to that METASCRAPER_RE2_SKIP_INSTALL here 💪 :) - it's preventing me from using this in a project

@JaleelB
Copy link

JaleelB commented Jul 28, 2023

is there a proper way to fix re2? im using nextjs 13 and get re2.node error when compiled

I was having the same issue with next 13. i tried excluding re2 from the webpack bundle but that didn't help. I just built my own solution using Cheerio. Just grab the html from your request url, load it with cheerio, and search for the specific metadata you need. Ofc, add in the appropriate fallbacks where necessary to improve accuracy. It's pretty simple and you wont have to deal with that re2 node error.

Here is an example of what i did:

"use server"
import cheerio from 'cheerio';

export const getMetaImage = (html: string): string => {
  const $ = cheerio.load(html);
  return $('meta[property="og:image"]').attr('content')!
    ?? $('meta[name="twitter:image"]').attr('content')
    ?? $('.post-image').attr('src')
    ?? $('.entry-image').attr('src');
};

@zaosoula
Copy link

+1

@jcurlier
Copy link

jcurlier commented Aug 15, 2023

+1

just started having a related issue building a node backend service that uses metascraper

#!/bin/bash -eo pipefail
docker build --rm=false -t gcr.io/${GCP_PROJECT}/${SERVICE_NAME}:$CIRCLE_TAG .
Sending build context to Docker daemon  13.47MB
Step 1/8 : FROM node:18.17.0-alpine
18.17.0-alpine: Pulling from library/node

a8db6415: Pulling fs layer 
2f1a5d31: Pulling fs layer 
b7606c1a: Pulling fs layer 
Digest: sha256:58878e9e1ed3911bdd675d576331ed8838fc851607aed3bb91e25dfaffab3267
Status: Downloaded newer image for node:18.17.0-alpine
 ---> f1fac320ae0c
Step 2/8 : WORKDIR /home/app
 ---> Running in 048cc5001d44
 ---> 8fcfa90cc1b8
Step 3/8 : COPY package.json package-lock.json /home/app/
 ---> d8292bd4b24a
Step 4/8 : RUN npm config set update-notifier false
 ---> Running in 709aca441111
 ---> 0ffe21789570
Step 5/8 : RUN npm install --quiet --production
 ---> Running in 81143fdafca6
npm WARN config production Use `--omit=dev` instead.
npm WARN deprecated uglify-es@3.3.9: support for ECMAScript is superseded by `uglify-js` as of v3.13.0
npm ERR! code 1
npm ERR! path /home/app/node_modules/re2
npm ERR! command failed
npm ERR! command sh -c install-from-cache --artifact build/Release/re2.node --host-var RE2_DOWNLOAD_MIRROR --skip-path-var RE2_DOWNLOAD_SKIP_PATH --skip-ver-var RE2_DOWNLOAD_SKIP_VER || npm run rebuild
npm ERR! Trying https://github.com/uhop/node-re2/releases/download/1.20.1/linux-musl-x64-108.br ...
npm ERR! Writing to build/Release/re2.node ...
npm ERR! The verification has failed: building from sources ...
npm ERR! Building locally ...
npm ERR! 
npm ERR! > re2@1.20.1 rebuild
npm ERR! > node-gyp rebuild
npm ERR! 
npm ERR! 
npm ERR! > re2@1.20.1 rebuild
npm ERR! > node-gyp rebuild
npm ERR! gyp ERR! find Python 
npm ERR! gyp ERR! find Python Python is not set from command line or npm configuration

it seems re2 is failing and it is trying to build locally (the build worked before).

@jcurlier
Copy link

for info... seems the issue above with re2 and node alpine started with 1.20.1 - see uhop/node-re2#180

@Kikobeats
Copy link
Member

hope this can help #656
although your issue is related with installation time rather than execution time

@aldenquimby
Copy link
Contributor

Did anyone here find a workable solution to exclude re2 from the build? #656 only helps at runtime, so it doesn't fix the issue for compiled apps (next, webpack, esbuild, vite, etc). Is this patch the best option?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants