stormliucong/eMERGE-Columbia-Data-Sync-Service

Introduction

  • This is an eMERGE 4 support program.
  • Set up a local REDCap database that merges the R4 project with the local project.
  • Pull R4 data periodically to populate the local REDCap database.
  • Send automatic reminders for users to complete the survey in R4.

Known Issues

  • Since R4 is constantly changing, a better mechanism is needed to report and monitor the changes, especially to avoid errors like ERROR:root:b'{"error":"The following fields were not found in the project as real data fields: your_or_your_childs_3"}'
  • File sync is not currently supported.
  • If a field contains an @CALC annotation, the sync might fail.
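One possible way to catch the "fields were not found" error before an import is to compare the field names in the pulled R4 records against the local data dictionary. The sketch below is only an illustration, not part of this repo: the function name `find_unknown_fields` and the sample records are hypothetical.

```python
def find_unknown_fields(r4_records, local_field_names):
    """Return the sorted field names that appear in R4 records but not locally."""
    known = set(local_field_names)
    unknown = set()
    for record in r4_records:
        unknown.update(name for name in record if name not in known)
    return sorted(unknown)


# Hypothetical sample data, for illustration only.
local_fields = ["record_id", "first_name", "last_name"]
r4_records = [
    {"record_id": "1", "first_name": "A", "your_or_your_childs_3": "x"},
    {"record_id": "2", "last_name": "B"},
]
print(find_unknown_fields(r4_records, local_fields))
```

Note that REDCap exports checkbox fields as `field___code` and adds an `<instrument>_complete` field per instrument, so a real check would likely need to normalize those names before comparing.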

To do list

  • Automate project setup when the R4 design changes.
  • Send an error email if the sync fails.

How to use

  1. Create API token and endpoint

    • Create a file named api_tokens.json in the root directory of this repo. An example of this JSON file is shown below.
    {
        "api_key_local": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "api_key_r4": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "local_endpoint": "https://emerge4.dbmi.columbia.edu/redcap/api/",
        "r4_api_endpoint" : "https://redcap.vanderbilt.edu/api/"
    } 
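As a minimal sketch of how a script might read this file, the snippet below loads api_tokens.json and checks that the four keys from the example are present. The helper names `validate_tokens` and `load_api_tokens` are hypothetical and are not taken from this repo.

```python
import json
from pathlib import Path

# Keys taken from the example api_tokens.json above.
REQUIRED_KEYS = ("api_key_local", "api_key_r4", "local_endpoint", "r4_api_endpoint")


def validate_tokens(config):
    """Raise ValueError if any required key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("api_tokens.json is missing: " + ", ".join(missing))
    return config


def load_api_tokens(path="api_tokens.json"):
    """Read the JSON file at `path` and validate its contents."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return validate_tokens(config)
```

Failing early on a missing key gives a clearer message than a later `KeyError` or an HTTP 403 from the API.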
  2. Run connection_check/get_redcap_version.py to make sure the connection and API tokens work

    # Columbia uses CAS to authenticate the local REDCap, so the IPs of the
    # hosts running this program must be added to the whitelist.
    # aqua has a dynamic IP, so all of its addresses have to be added.
    # Check the IPs by viewing the Apache logs.
    sudo tail -100 /var/log/apache2/access.log | grep /redcap/api/
    sudo vi /etc/apache2/sites-available/redcap-cas.conf
        #### Open up the API without auth
        <DirectoryMatch "^/var/www/html/redcap/api/">
            # Exempted IPs. Can also use CIDR notation.
            Require ip 127.0.0.1 172.xx.xx.xx 172.xx.xx.xx
        </DirectoryMatch>
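For reference, a connection check against the REDCap API can be as small as a POST with `content=version`, which returns the server version as plain text. The following is a sketch using only the standard library; it is not necessarily how connection_check/get_redcap_version.py is written, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_version_request(endpoint, token):
    """Build the POST request for REDCap's 'Export REDCap Version' method."""
    body = urllib.parse.urlencode({"token": token, "content": "version"})
    return urllib.request.Request(endpoint, data=body.encode("ascii"), method="POST")


def get_redcap_version(endpoint, token, timeout=30):
    """Return the version string, or raise urllib.error.URLError on failure."""
    request = build_version_request(endpoint, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `get_redcap_version` once with `local_endpoint` and `api_key_local`, and once with `r4_api_endpoint` and `api_key_r4`, would exercise both connections; an HTTP 403 typically points to a bad token or to an IP that is not whitelisted.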
  3. Make a copy of the local customized REDCap build

    • Make sure the field names have been adjusted to avoid conflicts when merging with R4.
    • Obtain new API tokens and change api_tokens.json accordingly.
  4. Use project_setup.py to dump the data dictionaries from both the R4 and local REDCap projects, and then load them into the local REDCap project created in step 3.

    • This program has to be re-executed every time there is a change to R4 data fields.
    • If there is an error like HTTP Status: {"error":"This method cannot be used while the project is in Production status."}, move the project back to Development status.
    • Because R4 changes constantly, we decided to ignore some of its fields, which reduces how often the project has to be reset.
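The merge described in this step can be sketched roughly as follows, assuming each data dictionary is the list of field definitions returned by REDCap's metadata export (each with a `field_name` key). The function name, the "R4 first" ordering, and the policy of skipping and reporting conflicting local fields are assumptions made for illustration, not the behavior of project_setup.py.

```python
def merge_dictionaries(r4_metadata, local_metadata, ignored_fields=()):
    """Merge two data dictionaries; return (merged_fields, conflicting_names)."""
    ignored = set(ignored_fields)
    merged, conflicts, seen = [], [], set()
    for source in (r4_metadata, local_metadata):  # R4 fields take precedence
        for field in source:
            name = field["field_name"]
            if name in ignored:
                continue  # field deliberately left out of the local project
            if name in seen:
                conflicts.append(name)  # same name in both projects
                continue
            seen.add(name)
            merged.append(field)
    return merged, conflicts


# Hypothetical field definitions, for illustration only.
r4 = [{"field_name": "record_id"}, {"field_name": "age"}, {"field_name": "new_r4_item"}]
local = [{"field_name": "cumc_id"}, {"field_name": "age"}]
merged, conflicts = merge_dictionaries(r4, local, ignored_fields=["new_r4_item"])
print([f["field_name"] for f in merged], conflicts)
```

A non-empty `conflicts` list indicates local field names that still need to be renamed before the merged dictionary is loaded.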
  5. Modify data_pull_from_r4.py to pull R4 data periodically into the local REDCap

    • cumc_id is created by adding 1 to the current largest number in the local REDCap. Numbers larger than 10000 are reserved for records imported from Epic.
    • The current program is optimized for memory and time. It takes about 5 minutes to sync about 2500 records.
    • Pull R4 data and match it to local data by, in order:
      1. record_id in R4 (an R4 record_id can be assigned to a local record to force a manual match);
      2. participant IDs;
      3. first_name, last_name & DOB, if this is an adult patient and the local record has no child portion filled in;
      4. child_names & DOB, if this is a child patient.
      5. If no match is found in the local data, create a new record with an incremented CUIMC_ID. [Figure: workflow]
    • Also pull R4 surveyQueueLink via API for each record and store in [r4_survey_queue_link]
    • The ignore_R4_fields.json file lists recently updated R4 fields to ignore.
    • Read and write functions are vectorized to speed up the process.
      • It should take less than 15 minutes to complete the data pull for 4000 records.
    • Set up a cron job for the daily pull (cron_job.sh). An example is shown below.
      # m h  dom mon dow   command
      0 0 * * * sh /phi_home/cl3720/phi/eMERGE/eIV-recruitement-support-redcap/cron_job.sh
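The matching order and ID assignment described in this step can be sketched as below. This is a simplified, non-vectorized illustration and not the code in data_pull_from_r4.py. The local field names (`r4_record_id`, `participant_id`, `child_name`, `child_dob`, `dob`), the rule that a non-empty `child_name` marks a child record, and the exclusion of Epic IDs when computing the next ID are all assumptions.

```python
EPIC_ID_FLOOR = 10000  # IDs above this are reserved for Epic imports


def _eq(a, key_a, b, key_b):
    """True when a[key_a] is non-empty and equals b[key_b]."""
    value = a.get(key_a)
    return value not in (None, "") and value == b.get(key_b)


def find_local_match(r4, local_records):
    """Return the cumc_id of the first local record matching `r4`, else None."""
    rules = [
        # 1. R4 record_id stored on the local record (allows a forced match)
        lambda loc: _eq(r4, "record_id", loc, "r4_record_id"),
        # 2. participant ID
        lambda loc: _eq(r4, "participant_id", loc, "participant_id"),
    ]
    if r4.get("child_name"):  # assumed marker of a child record
        # 4. child name and DOB
        rules.append(lambda loc: all(
            _eq(r4, k, loc, k) for k in ("child_name", "child_dob")))
    else:
        # 3. adult name and DOB, only if the local record has no child portion
        rules.append(lambda loc: not loc.get("child_name") and all(
            _eq(r4, k, loc, k) for k in ("first_name", "last_name", "dob")))
    for rule in rules:  # rules are tried in priority order
        for loc in local_records:
            if rule(loc):
                return loc["cumc_id"]
    return None


def next_cumc_id(existing_ids):
    """Return 1 + the largest non-Epic ID (5. no match: create a new ID)."""
    regular = [int(i) for i in existing_ids if int(i) <= EPIC_ID_FLOOR]
    candidate = max(regular, default=0) + 1
    if candidate > EPIC_ID_FLOOR:
        raise ValueError("regular cumc_id range is exhausted")
    return candidate
```

A caller would use `find_local_match(...)` first and fall back to `next_cumc_id(...)` only when it returns `None`.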
  6. Set up an alert mechanism to send out automatic reminders.

    • See create_survey_alert.md for more details
    • Alert condition: [previous_survey_complete] = '2' AND [reminder_survey_complete] != '2'
    • When the condition is met, send out [r4_survey_queue_link].
    • A data fetch will trigger the alert only once.
    • In addition, if you want to send out email via a valid SMTP server, please see redcap_send_out_email.md for more details.
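To make the alert condition concrete, here is the same logic written as a Python predicate over an exported record. In REDCap, a form status of '2' means Complete. The field names are the placeholders used above, and the function itself is only an illustration; the actual alert is configured inside REDCap as described in create_survey_alert.md.

```python
def needs_reminder(record):
    """True when the previous survey is complete but the reminded one is not."""
    return (
        record.get("previous_survey_complete") == "2"
        and record.get("reminder_survey_complete") != "2"
    )


# Hypothetical records, for illustration only.
records = [
    {"cumc_id": "1", "previous_survey_complete": "2", "reminder_survey_complete": "0"},
    {"cumc_id": "2", "previous_survey_complete": "2", "reminder_survey_complete": "2"},
    {"cumc_id": "3", "previous_survey_complete": "0", "reminder_survey_complete": "0"},
]
to_remind = [r["cumc_id"] for r in records if needs_reminder(r)]
print(to_remind)
```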
  7. Execute extract_id_mapping.py to check for incorrectly mapped IDs.
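One kind of incorrect mapping that such a check could look for is a single R4 record_id linked to more than one local cumc_id. The sketch below illustrates that idea only; it is not necessarily what extract_id_mapping.py does, and the function name and input shape are hypothetical.

```python
from collections import defaultdict


def find_duplicate_mappings(pairs):
    """Given (cumc_id, r4_record_id) pairs, return R4 IDs mapped to several cumc_ids."""
    by_r4 = defaultdict(set)
    for cumc_id, r4_id in pairs:
        if r4_id:  # skip local records with no R4 link yet
            by_r4[r4_id].add(cumc_id)
    return {r4_id: sorted(ids) for r4_id, ids in by_r4.items() if len(ids) > 1}


# Hypothetical mapping, for illustration only.
pairs = [(1, "10"), (2, "11"), (3, "10"), (4, "")]
print(find_duplicate_mappings(pairs))
```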
