
Instead of scraping directly from mnonboard, write sync_content.sh file #55

Open
iannesbitt opened this issue Jan 10, 2024 · 0 comments
For repositories that contain many records, the scrape must be launched with `nohup ./sync_content.sh &` so that an inadvertent SSH hangup does not kill it. Therefore, mnonboard should write the `sync_content.sh` file rather than running scrapy directly.

`sync_content.sh` should look like the following, where the only dynamic content is the node name on line 3 (variable references are quoted so paths with unusual characters don't break the crawl, and the `cd` exits on failure rather than crawling from the wrong directory):

```bash
#!/bin/bash

NODE="mnTestDVNO"

HOME_DIR="/home/mnlite"
MNLITE_DIR="${HOME_DIR}/WORK/mnlite"
NODE_DIR="${MNLITE_DIR}/instance/nodes/${NODE}"

ENV_DIR="${HOME_DIR}/.virtualenvs/mnlite"
LOG_DIR="/var/log/mnlite"

cd "${MNLITE_DIR}" || exit 1
source "${ENV_DIR}/bin/activate"
LOG_FILE="${LOG_DIR}/${NODE}-crawl.log"
logger "Start crawl on: ${NODE} logfile: ${LOG_FILE}"
scrapy crawl --logfile="${LOG_FILE}" JsonldSpider -s STORE_PATH="${NODE_DIR}"
logger "End crawl on ${NODE}"
```
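Since the node name is the only dynamic value, mnonboard could generate the script by substituting it into a fixed template and marking the result executable. A minimal sketch of that approach, assuming a hypothetical `write_sync_script()` helper (the function name and destination argument are illustrative, not part of the actual mnonboard API):

```python
import os

# Fixed script body; __NODE__ is a sentinel replaced at write time.
# Using str.replace() avoids clashing with bash's own ${VAR} syntax.
SYNC_SCRIPT = '''#!/bin/bash

NODE="__NODE__"

HOME_DIR="/home/mnlite"
MNLITE_DIR="${HOME_DIR}/WORK/mnlite"
NODE_DIR="${MNLITE_DIR}/instance/nodes/${NODE}"

ENV_DIR="${HOME_DIR}/.virtualenvs/mnlite"
LOG_DIR="/var/log/mnlite"

cd "${MNLITE_DIR}" || exit 1
source "${ENV_DIR}/bin/activate"
LOG_FILE="${LOG_DIR}/${NODE}-crawl.log"
logger "Start crawl on: ${NODE} logfile: ${LOG_FILE}"
scrapy crawl --logfile="${LOG_FILE}" JsonldSpider -s STORE_PATH="${NODE_DIR}"
logger "End crawl on ${NODE}"
'''


def write_sync_script(node_name: str, dest_dir: str = ".") -> str:
    """Render sync_content.sh for the given node and mark it executable."""
    path = os.path.join(dest_dir, "sync_content.sh")
    with open(path, "w") as f:
        f.write(SYNC_SCRIPT.replace("__NODE__", node_name))
    # rwxr-xr-x so the operator can run `nohup ./sync_content.sh &` directly
    os.chmod(path, 0o755)
    return path
```

The sentinel substitution keeps the template byte-for-byte identical to the script above except for the node name, so mnonboard never needs to re-quote or escape anything at generation time.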
@iannesbitt iannesbitt added enhancement New feature or request v0.1.2 Version 0.1.2 item labels Jan 10, 2024
@iannesbitt iannesbitt added this to the 0.1.2 milestone Jan 10, 2024
@iannesbitt iannesbitt self-assigned this Jan 10, 2024