DSEGOG-255 EPAC Simulated Data #66
- This directory wasn't being checked for formatting and linting issues in the Nox sessions before this commit, so this commit also includes some changes to files outside `util/realistic_data/`.
- Files are no longer stored in per-day directories. This fits the format used by the echo ingestion script.
- This is now possible because EPAC Data Sim can be specified as a dependency in Poetry (previously this wasn't possible because it's a GitHub repo, and we had some issues with that).
- This is so we can exclude it if we don't have the SSH permissions needed to access/clone the repo. For example, the CI cannot clone the repo.
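Here is a minimal sketch of how a generation script could skip the simulator step when the optional dependency isn't installed (e.g. in CI, which can't clone the repo). The module name `epac_data_sim` and both function names are hypothetical. The real package may be imported under a different name.

```python
import importlib.util


def simulator_available(module_name: str = "epac_data_sim") -> bool:
    """Return True if the (hypothetically named) simulator package is importable."""
    # find_spec returns None for a missing top-level module without importing it.
    return importlib.util.find_spec(module_name) is not None


def maybe_generate_data(module_name: str = "epac_data_sim") -> str:
    """Skip data generation instead of crashing when the optional dependency is absent."""
    if not simulator_available(module_name):
        return "skipped: simulator not installed"
    return "generating"
```

With Poetry 1.2+ dependency groups, a dependency placed in an optional group can be left out with `poetry install --without <group>`. The group name is whatever `pyproject.toml` defines.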
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
- Coverage   83.18%   83.00%   -0.18%
==========================================
  Files          45       45
  Lines        2153     2160       +7
  Branches      164      164
==========================================
+ Hits         1791     1793       +2
- Misses        329      335       +6
+ Partials       33       32       -1
```
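This snippet is a quick check of the headline numbers. It assumes Codecov's default behaviour: partially covered lines count as not covered, and results are rounded down to two decimal places. That would explain why 1791/2153 ≈ 83.186% is shown as 83.18%. The +7 new lines break down as +2 hits, +6 misses and -1 partial.

```python
def coverage_hundredths(hits: int, misses: int, partials: int) -> int:
    """Line coverage in hundredths of a percent, rounded down to 2 decimal places.

    Partial lines count as not covered, so the denominator is
    hits + misses + partials (which equals the "Lines" row).
    """
    total = hits + misses + partials
    # Integer floor division truncates instead of rounding to nearest.
    return (hits * 10_000) // total


def fmt(hundredths: int, signed: bool = False) -> str:
    """Render hundredths of a percent as e.g. '83.18%' or '-0.18%'."""
    sign = "-" if hundredths < 0 else ("+" if signed and hundredths > 0 else "")
    value = abs(hundredths)
    return f"{sign}{value // 100}.{value % 100:02d}%"


main_cov = coverage_hundredths(hits=1791, misses=329, partials=33)  # main
pr_cov = coverage_hundredths(hits=1793, misses=335, partials=32)  # #66
print(fmt(main_cov), fmt(pr_cov), fmt(pr_cov - main_cov, signed=True))
# prints: 83.18% 83.00% -0.18%
```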
- This is so the old Gemini test data still works
The test coverage in this PR is down because there are no specific tests for the HDF ingestion code, which is where the source code modifications are. Currently, this code is effectively tested by the ingestion script. Would improving test coverage on the ingestion code be a good task for Will (i.e. completing DSEGOG-262)?
- This reduces the storage space needed for them, using the simulated data we now have access to.
- This will be useful when the script is run on machines where SSH is an issue (e.g. GitHub Actions). Also included a couple of linting fixes in this PR.
- Experiments are stored using ObjectIDs, not custom identifiers.
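Here is a small sketch of telling ObjectId-based experiment IDs apart from custom identifiers, using only the standard library. It relies on two properties of the ObjectId string form: it is 24 hex characters (12 bytes), and its first 4 bytes are a big-endian Unix timestamp in seconds. The function names are hypothetical. If `pymongo` is installed, `bson.ObjectId.is_valid()` and `ObjectId(...).generation_time` cover the same ground.

```python
import re
from datetime import datetime, timezone

# 24 hexadecimal characters, i.e. the string form of a 12-byte ObjectId.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_object_id(value: str) -> bool:
    """True if value looks like a MongoDB ObjectId string rather than a custom ID."""
    return _OBJECT_ID_RE.fullmatch(value) is not None


def object_id_timestamp(value: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    if not is_object_id(value):
        raise ValueError(f"not an ObjectId: {value!r}")
    return datetime.fromtimestamp(int(value[:8], 16), tz=timezone.utc)
```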
```yaml
endpoint_url: https://s3.echo.stfc.ac.uk
access_key: access_key
secret_key: secret_key
simulated_data_bucket: og-realistic-data
```
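Here is a minimal sketch of loading the Echo settings above and turning them into S3 client arguments. It assumes two things: the four keys sit in one flat mapping (e.g. a section parsed from the YAML file), and an S3 client such as `boto3` is used. `EchoConfig` and `make_s3_client` are hypothetical names, not the scripts' actual API.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class EchoConfig:
    """Hypothetical container for the Echo S3 settings in the example config."""

    endpoint_url: str
    access_key: str
    secret_key: str
    simulated_data_bucket: str

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "EchoConfig":
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in raw]
        if missing:
            raise KeyError(f"missing config keys: {missing}")
        # Extra keys in the mapping are ignored.
        return cls(**{name: str(raw[name]) for name in names})

    def client_kwargs(self) -> dict:
        """Keyword arguments in the form accepted by boto3.client("s3", ...)."""
        return {
            "endpoint_url": self.endpoint_url,
            "aws_access_key_id": self.access_key,
            "aws_secret_access_key": self.secret_key,
        }


def make_s3_client(config: EchoConfig):
    # Imported lazily so the config code works without boto3 installed.
    import boto3

    return boto3.client("s3", **config.client_kwargs())
```

The bucket name is passed separately, e.g. `client.list_objects_v2(Bucket=cfg.simulated_data_bucket)`. That keeps a switch to `og-ci-simulated-data` a config-only change.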
og-ci-simulated-data should work better for the users
These caused an error where none of the database collections were dropped when running `poetry run python util/realistic_data/ingest_echo_data.py`.
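The comment doesn't say what "these" were. Here is a hedged sketch of a wipe step that fails loudly instead of silently leaving collections in place. It assumes pymongo's `Database.list_collection_names()` and `Database.drop_collection()`. The helper name and the example collection names in the usage note are hypothetical.

```python
from typing import Iterable, List, Optional, Protocol


class DatabaseLike(Protocol):
    """The two pymongo Database methods this sketch relies on."""

    def list_collection_names(self) -> List[str]: ...

    def drop_collection(self, name: str) -> object: ...


def drop_collections(db: DatabaseLike, names: Optional[Iterable[str]] = None) -> List[str]:
    """Drop the named collections (or every collection) and return what was dropped."""
    existing = set(db.list_collection_names())
    targets = existing if names is None else existing.intersection(names)
    dropped = sorted(targets)
    for name in dropped:
        db.drop_collection(name)
    # Verify the drop actually happened rather than trusting it silently.
    remaining = set(db.list_collection_names()) & set(dropped)
    if remaining:
        raise RuntimeError(f"collections not dropped: {sorted(remaining)}")
    return dropped
```

Against a real server this would be called as e.g. `drop_collections(MongoClient(uri)[db_name], ["records", "experiments"])`. Here `uri`, `db_name` and the collection names are placeholders.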
Looks good to me! Just need to resolve the merge conflicts and decide whether to change `simulated_data_bucket` in the example config to `og-ci-simulated-data`.
Ready to merge. I had to merge DSEGOG-300-mongoimport-ci into this branch to get the CI to pass, so #97 needs to be reviewed and merged before I can merge this.
This PR adds a number of scripts in `util/realistic_data/` that generate simulated data and ingest it into an instance of the API. I've documented them in `util/realistic_data/README.md`, so I won't go into much detail here; please refer to that file.

There are also a small number of changes to the API source code to support the change in how timestamps will be stored in HDF files.
https://github.com/ral-facilities/operationsgateway-ansible/pull/10 shows how this will be set up and used for the dev server.