Instead of scraping directly from `mnonboard`, write `sync_content.sh` file #55

iannesbitt · 2024-01-10T21:31:46Z

For repositories that contain many records, the crawl needs to be started with `nohup ./sync_content.sh &` so that an inadvertent SSH hangup does not cancel the scrape. Therefore, `mnonboard` should write the `sync_content.sh` file rather than run `scrapy` directly.

`sync_content.sh` should look like this, where the only dynamic content is the node name on line 3:

#!/bin/bash

NODE="mnTestDVNO"

HOME_DIR="/home/mnlite"
MNLITE_DIR="${HOME_DIR}/WORK/mnlite"
NODE_DIR="${MNLITE_DIR}/instance/nodes/${NODE}"

ENV_DIR="${HOME_DIR}/.virtualenvs/mnlite"
LOG_DIR="/var/log/mnlite"

cd "${MNLITE_DIR}"
source "${ENV_DIR}/bin/activate"
LOG_FILE="${LOG_DIR}/${NODE}-crawl.log"
logger "Start crawl on: ${NODE} logfile: ${LOG_FILE}"
scrapy crawl --logfile="${LOG_FILE}" JsonldSpider -s STORE_PATH="${NODE_DIR}"
logger "End crawl on ${NODE}"
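
A minimal sketch of how `mnonboard` could write this file, assuming it is implemented in Python. The function name `write_sync_script`, the `__NODE__` placeholder token, and the `dest_dir` parameter are hypothetical and not part of the existing code base; the template follows the script above, with the crawl arguments quoted.

```python
from pathlib import Path

# Script body taken from this issue. "__NODE__" is a hypothetical placeholder
# that is swapped for the node name; everything else is static.
SYNC_TEMPLATE = """#!/bin/bash

NODE="__NODE__"

HOME_DIR="/home/mnlite"
MNLITE_DIR="${HOME_DIR}/WORK/mnlite"
NODE_DIR="${MNLITE_DIR}/instance/nodes/${NODE}"

ENV_DIR="${HOME_DIR}/.virtualenvs/mnlite"
LOG_DIR="/var/log/mnlite"

cd "${MNLITE_DIR}"
source "${ENV_DIR}/bin/activate"
LOG_FILE="${LOG_DIR}/${NODE}-crawl.log"
logger "Start crawl on: ${NODE} logfile: ${LOG_FILE}"
scrapy crawl --logfile="${LOG_FILE}" JsonldSpider -s STORE_PATH="${NODE_DIR}"
logger "End crawl on ${NODE}"
"""


def write_sync_script(node: str, dest_dir: Path) -> Path:
    """Write sync_content.sh for `node` into `dest_dir` and make it executable."""
    script = Path(dest_dir) / "sync_content.sh"
    # Plain string replacement avoids clashing with the shell's own ${...} syntax.
    script.write_text(SYNC_TEMPLATE.replace("__NODE__", node))
    script.chmod(0o755)  # rwxr-xr-x, so it can be started as ./sync_content.sh
    return script
```

After the file is written, the operator would start the crawl from that directory with `nohup ./sync_content.sh &`, as described above. Where the file should live is left open here, which is why `dest_dir` is a parameter.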

iannesbitt added the enhancement (New feature or request) and v0.1.2 (Version 0.1.2 item) labels Jan 10, 2024

iannesbitt added this to the 0.1.2 milestone Jan 10, 2024

iannesbitt self-assigned this Jan 10, 2024
