For our scraping needs, we found it useful to split the crawl and the scrape into separate libraries. This repository is just the crawler: given a website, it returns a deduped array of the URLs it discovers.
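The README doesn't document the crawler's internals, but the core idea behind a deduped link crawl (a breadth-first walk of same-origin pages, with a Set guarding against revisits) can be sketched in a few lines of Node. This is an illustration of the technique, not the library's actual code; it assumes Node 18+ for the global fetch API and should be saved as an .mjs file for top-level await.

```js
// crawl-sketch.mjs -- illustrative only, not link-discoverer's implementation.
// Breadth-first crawl of same-origin pages; the Set guarantees each URL
// appears in the result exactly once. Requires Node 18+ (global fetch).
const start = new URL('https://example.com');
const seen = new Set([start.href]);
const queue = [start.href];

while (queue.length > 0) {
  const page = queue.shift();
  let html;
  try {
    html = await (await fetch(page)).text();
  } catch {
    continue; // skip pages that fail to load
  }
  // Naive href extraction; a production crawler would use a real HTML parser.
  for (const [, href] of html.matchAll(/href="([^"#]+)"/g)) {
    let url;
    try {
      url = new URL(href, page); // resolve relative links against the page
    } catch {
      continue; // ignore malformed hrefs
    }
    if (url.origin === start.origin && !seen.has(url.href)) {
      seen.add(url.href);
      queue.push(url.href);
    }
  }
}

console.log([...seen]); // the deduped array of discovered URLs
```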
git clone https://github.com/tylrhas/link-discoverer.git
cd link-discoverer
npm i
# TO RUN
npm run dev
# OR
node index.js
You need an authenticated gcloud account with a project that has billing configured and access to the IAM and Cloud Run services. You also need the project ID, which is referenced as $GCP_PROJECT_ID below.
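A convenient way to populate that variable, assuming gcloud is already pointed at the right project:

```bash
# Copy the active project ID into the variable used by the commands below.
export GCP_PROJECT_ID=$(gcloud config get-value project)
```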
gcloud iam service-accounts create link-discoverer-identity
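The deploy command below references a container image at gcr.io/$GCP_PROJECT_ID/link-discoverer, but no build step is shown here. Assuming the repo contains a Dockerfile, Cloud Build can build and push that image in one command:

```bash
# Build the image and push it to Container Registry
# (assumes a Dockerfile at the repo root).
gcloud builds submit --tag gcr.io/$GCP_PROJECT_ID/link-discoverer
```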
gcloud run deploy link-discoverer \
--image gcr.io/$GCP_PROJECT_ID/link-discoverer \
--service-account link-discoverer-identity \
--no-allow-unauthenticated
When prompted, select [1] for Cloud Run (fully managed), then [21] for the us-west1 region.
Record the Service URL from the response and the service-account name you provided; both will be needed by the service that invokes this deployment.
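Because the service is deployed with --no-allow-unauthenticated, any caller must hold the roles/run.invoker role on it. A sketch of the follow-up steps, where caller-identity is a hypothetical service account for whatever will invoke the crawler:

```bash
# Grant a (hypothetical) calling service account permission to invoke the service.
gcloud run services add-iam-policy-binding link-discoverer \
  --member="serviceAccount:caller-identity@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/run.invoker" \
  --region=us-west1 \
  --platform=managed

# Smoke-test with your own credentials; SERVICE_URL is the URL recorded above.
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" "$SERVICE_URL"
```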