DSEGOG-255 EPAC Simulated Data #66

Merged
merged 74 commits into from
Feb 27, 2024
Changes from 64 commits
ecb5693
DSEGOG-255 Edit ingestion of timestamp to work with new format
MRichards99 Jun 26, 2023
5c6e866
DSEGOG-255 First attempt at generating test data and storing in Echo
MRichards99 Jun 26, 2023
bc72782
DSEGOG-255 Add Echo ingestion script
MRichards99 Jul 4, 2023
67cd2fe
DSEGOG-255 Fix linting for `util/`
MRichards99 Jul 5, 2023
b6539c8
DSEGOG-255 Clear buffers before each request to `/submit/hdf`
MRichards99 Jul 5, 2023
f96240b
DSEGOG-255 Clean up data generation script
MRichards99 Jul 5, 2023
994ea24
DSEGOG-255 Improve `APIClient` to deal with cases when connection can…
MRichards99 Jul 6, 2023
79ef756
DSEGOG-255 Add script that simulates incoming HDF files from CLF
MRichards99 Jul 6, 2023
17d87e4
DSEGOG-255 Fix timestamp for further change in EPAC-DataSim
MRichards99 Jul 6, 2023
0cd307c
DSEGOG-255 Change prefix to `data/` for new data storage
MRichards99 Jul 18, 2023
a8fe557
DSEGOG-255 Add dev dependencies for simulated data
MRichards99 Jul 21, 2023
9a0ce28
DSEGOG-255 Refresh token after each 'page' of files are ingested
MRichards99 Jul 25, 2023
ce032d5
DSEGOG-255 Change directory structure of data generation
MRichards99 Jul 26, 2023
f8d5145
Use Poetry in generation script instead of external virtualenv
MRichards99 Jul 26, 2023
482b324
DSEGOG-255 Add ability to convert multiple schedule files into one mo…
MRichards99 Jul 26, 2023
104b418
DSEGOG-255 Linting fixes
MRichards99 Jul 26, 2023
86b5e35
Merge branch 'DSEGOG-243-image-storage-echo' into DSEGOG-255-realisti…
MRichards99 Jul 28, 2023
23b644d
DSEGOG-255 Implement removal of images from Echo in new ingestion script
MRichards99 Jul 28, 2023
f736317
DSEGOG-255 Fix database wipe functionality
MRichards99 Jul 28, 2023
f4c0a86
DSEGOG-255 Add documentation for simulated data
MRichards99 Jul 31, 2023
0362ea0
DSEGOG-255 Change daily ingestor script to compare filenames against …
MRichards99 Jul 31, 2023
1879686
Merge branch 'DSEGOG-243-image-storage-echo' into DSEGOG-255-realisti…
MRichards99 Jul 31, 2023
702e7e8
Merge branch 'DSEGOG-243-image-storage-echo' into DSEGOG-255-realisti…
MRichards99 Aug 4, 2023
9d31baa
DSEGOG-255 Move EPAC DataSim dependency to its own group
MRichards99 Aug 4, 2023
b08e40e
DSEGOG-255 Support both timestamp formats for HDF file ingestion
MRichards99 Aug 4, 2023
923df2a
DSEGOG-255 Correct TODO
MRichards99 Aug 14, 2023
c0cf5e4
DSEGOG-255 Remove `pass` from script
MRichards99 Aug 15, 2023
793f8b2
DSEGOG-255 Remove thread pools for uploading multiple images
MRichards99 Aug 22, 2023
1af4a92
DSEGOG-255 Add small details to bullet point process
MRichards99 Sep 19, 2023
d9ed87f
DSEGOG-255 Upgrade EPAC-DataSim version
MRichards99 Sep 19, 2023
9c8d260
DSEGOG-255 Make calendar conversion script support multi-part experim…
MRichards99 Sep 21, 2023
26aacf9
DSEGOG-255 Swap times for start and end dates in conversion script
MRichards99 Sep 22, 2023
dc1fec1
DSEGOG-255 Allow number of `s4cmd` threads to be tweaked
MRichards99 Sep 22, 2023
bc91a81
DSEGOG-255 Add instructions for generating & uploading simulated data
MRichards99 Sep 22, 2023
7137cb5
DSEGOG-255 Flip API shutdown if statement
MRichards99 Sep 29, 2023
1378e4a
DSEGOG-255 Add `--timeout` to API startup command
MRichards99 Sep 29, 2023
f4816db
Merge branch 'DSEGOG-243-image-storage-echo' into DSEGOG-255-realisti…
MRichards99 Sep 29, 2023
8cb926b
Merge branch 'DSEGOG-258-multiple-instrument-support' into DSEGOG-255…
MRichards99 Sep 29, 2023
195a8d2
DSEGOG-255 Add instructions for ingesting data from Echo
MRichards99 Sep 29, 2023
241f773
DSEGOG-255 Remove `--log-config` from example command
MRichards99 Oct 3, 2023
16645e4
DSEGOG-255 Make simulated data documentation a little clearer
MRichards99 Oct 3, 2023
f365b00
DSEGOG-255 Add script to check file counts on Echo
MRichards99 Oct 5, 2023
7320092
DSEGOG-255 Get simulated data ingestion working on CI
MRichards99 Oct 11, 2023
4b4edd3
DSEGOG-255 Disable SSH functionality in script
MRichards99 Oct 11, 2023
5dffde9
DSEGOG-255 Fix tests to work with simulated data
MRichards99 Oct 11, 2023
0abcb83
Merge branch 'DSEGOG-243-image-storage-echo' into DSEGOG-255-realisti…
MRichards99 Oct 11, 2023
9324670
DSEGOG-255 Update get image tests
MRichards99 Oct 11, 2023
84aab4a
DSEGOG-255 Fix linting errors
MRichards99 Oct 11, 2023
23b96cd
DSEGOG-255 Fix MD5 checksums
MRichards99 Oct 12, 2023
7cecf1f
DSEGOG-255 Go back to multi-threading for uploading images
MRichards99 Oct 17, 2023
f15987d
DSEGOG-255 Improve error handling on `EchoInterface`
MRichards99 Oct 19, 2023
29083ae
DSEGOG-255 Store waveforms as a list of floats
MRichards99 Nov 3, 2023
f79d50f
DSEGOG-255 Add local command handler
MRichards99 Nov 3, 2023
a846e05
DSEGOG-255 Fix waveform test
MRichards99 Nov 3, 2023
2460e31
Merge branch 'DSEGOG-243-image-storage-echo' into DSEGOG-255-realisti…
MRichards99 Nov 3, 2023
71baa73
DSEGOG-255 Update input config type for Pydantic v2
MRichards99 Nov 3, 2023
be31f36
Merge branch 'DSEGOG-243-image-storage-echo' into DSEGOG-255-realisti…
MRichards99 Nov 3, 2023
c9da190
Merge branch 'main' into DSEGOG-255-realistic-test-data
MRichards99 Nov 10, 2023
e0f4010
DSEGOG-255 Remove `_id` being created in script
MRichards99 Nov 10, 2023
6c83d96
DSEGOG-255 Revert location of experiment import file
MRichards99 Nov 10, 2023
d2bf832
DSEGOG-255 Edit tests to ensure they pass with simulated data
MRichards99 Nov 15, 2023
e2900c3
DSEGOG-255 Edit ingestion script and config for Actions
MRichards99 Nov 15, 2023
f1215e8
Merge branch 'main' into DSEGOG-255-realistic-test-data
MRichards99 Nov 17, 2023
99afd88
Merge branch 'main' into DSEGOG-255-realistic-test-data
MRichards99 Feb 6, 2024
2d5c543
DSEGOG-255 removed excess quotation marks
Will-Cross1 Feb 22, 2024
3c76355
Merge branch 'main' into DSEGOG-255-realistic-test-data
MRichards99 Feb 23, 2024
6bb85f0
DSEGOG-255 Make suggested review changes
MRichards99 Feb 23, 2024
bc07c71
DSEGOG-255 Add test users import code to new ingestion script
MRichards99 Feb 23, 2024
a7876cc
Merge branch 'main' into DSEGOG-255-realistic-test-data
MRichards99 Feb 23, 2024
505436a
DSEGOG-255 Add config for CI
MRichards99 Feb 23, 2024
4df661e
Merge branch 'DSEGOG-300-mongoimport-ci' into DSEGOG-255-realistic-te…
MRichards99 Feb 26, 2024
55b2437
DSEGOG-255 Fix linting
MRichards99 Feb 26, 2024
a263a18
DSEGOG-255 Edit example config
MRichards99 Feb 26, 2024
bf90113
Merge branch 'main' into DSEGOG-255-realistic-test-data
MRichards99 Feb 27, 2024
1 change: 1 addition & 0 deletions .github/ci_config.yml
@@ -13,6 +13,7 @@ images:
   echo_access_key: access_key
   echo_secret_key: secret_key
   image_bucket_name: test-bucket
+  upload_image_threads: 4
   preferred_colour_map_pref_name: PREFERRED_COLOUR_MAP
 mongodb:
   mongodb_url: mongodb://localhost:27017
27 changes: 27 additions & 0 deletions .github/ci_ingest_echo_config.yml
@@ -0,0 +1,27 @@
+script_options:
+  wipe_database: false
+  delete_images: false
+  launch_api: true
+ssh:
+  enabled: false
+  ssh_connection_url: 127.0.0.1
+database:
+  hostname: localhost
+  port: 27017
+  name: opsgateway
+  remote_experiments_file_path: /tmp/experiments_for_mongoimport.json
+echo:
+  endpoint_url: https://s3.echo.stfc.ac.uk
+  access_key: access_key
+  secret_key: secret_key
+  simulated_data_bucket: og-ci-simulated-data
+  images_bucket: og-bucket
+  page_size: 1
+api:
+  host: 127.0.0.1
+  port: 8000
+  username: backend
+  password: back
+  log_config_path: /home/runner/work/operationsgateway-api/operationsgateway-api/operationsgateway_api/logging.ini
+  gunicorn_num_workers: "1"
+  timeout_seconds: 1000
27 changes: 17 additions & 10 deletions .github/workflows/ci.yml
@@ -58,7 +58,7 @@ jobs:
           path: ~/.cache/pypoetry/virtualenvs
           key: ${{ runner.os }}-poetry-${{ matrix.python-version }}-${{ hashFiles('poetry.lock') }}
       - name: Install dependencies
-        run: poetry install
+        run: poetry install --without simulated-data

       # Configure s4cmd
       - name: Add keys to s4cmd config
@@ -107,15 +107,22 @@ jobs:
         env:
           SSH_KEY_PUBLIC: ${{secrets.SSH_PUBLIC_KEY_FOR_AUTH_OPENSSH}}

-      # Clone repo containing test data and use script to ingest the data
-      - name: Checkout OperationsGateway Test Data
-        uses: actions/checkout@v3
-        with:
-          repository: ral-facilities/operationsgateway-test-data
-          path: operationsgateway-test-data
-          ssh-key: ${{ secrets.SSH_PRIV_OG_TEST_DATA_ACTIONS }}
-      - name: Run ingestion script
-        run: poetry run python util/ingest_hdf.py -p operationsgateway-test-data/dev_server -U backend -P back
+      # Setup steps for Echo ingestion script
+      - name: Configure echo access key
+        run: yq -i ".echo.access_key = \"$ECHO_S3_ACCESS_KEY\"" .github/ci_ingest_echo_config.yml
+        env:
+          ECHO_S3_ACCESS_KEY: ${{secrets.ECHO_S3_ACCESS_KEY}}
+      - name: Configure echo secret key
+        run: yq -i ".echo.secret_key = \"$ECHO_S3_SECRET_KEY\"" .github/ci_ingest_echo_config.yml
+        env:
+          ECHO_S3_SECRET_KEY: ${{secrets.ECHO_S3_SECRET_KEY}}
+      - name: Configure bucket name for current run
+        run: yq -i '.echo.images_bucket = "og-actions-${{ github.sha }}-${{ github.run_id }}-${{ matrix.python-version }}"' .github/ci_ingest_echo_config.yml
+      - name: Copy config for Echo Ingest script to correct place
+        run: cp .github/ci_ingest_echo_config.yml util/realistic_data/config.yml
+
+      - name: Run Echo Ingest script
+        run: poetry run python util/realistic_data/ingest_echo_data.py

       - name: Load Poetry cache for Nox tests session
         uses: actions/cache@v3
2 changes: 2 additions & 0 deletions .gitignore
@@ -15,3 +15,5 @@ id_rsa
 id_rsa.pub
 logging.ini*
 !logging.ini.example
+util/realistic_data/data
+util/realistic_data/resources
4 changes: 2 additions & 2 deletions noxfile.py
@@ -6,7 +6,7 @@

 # Separating Black away from the rest of the sessions
 nox.options.sessions = "lint", "safety", "tests"
-code_locations = "operationsgateway_api", "test", "noxfile.py"
+code_locations = "operationsgateway_api", "test", "noxfile.py", "util"


 def install_with_constraints(session, *args, **kwargs):
@@ -110,5 +110,5 @@ def safety(session):
 @nox.session(python=["3.8", "3.9", "3.10"], reuse_venv=True)
 def tests(session):
     args = session.posargs
-    session.run("poetry", "install", external=True)
+    session.run("poetry", "install", "--without", "simulated-data", external=True)
     session.run("pytest", *args)
1 change: 1 addition & 0 deletions operationsgateway_api/config.yml.example
@@ -17,6 +17,7 @@ images:
   echo_access_key: access_key
   echo_secret_key: secret_key
   image_bucket_name: test-bucket
+  upload_image_threads: 4
   preferred_colour_map_pref_name: PREFERRED_COLOUR_MAP
 mongodb:
   mongodb_url: mongodb://localhost:27017
1 change: 1 addition & 0 deletions operationsgateway_api/src/config.py
@@ -37,6 +37,7 @@ class ImagesConfig(BaseModel):
     echo_access_key: StrictStr
     echo_secret_key: StrictStr
     image_bucket_name: StrictStr
+    upload_image_threads: StrictInt
     preferred_colour_map_pref_name: StrictStr


1 change: 1 addition & 0 deletions operationsgateway_api/src/main.py
@@ -93,6 +93,7 @@ def setup_logger():
         "s3transfer",
         "matplotlib.font_manager",
         "zeep",
+        "multipart",
     ]
     for name in libraries_info_logging:
         logging.getLogger(name).setLevel(logging.INFO)
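
The hunk above adds `multipart` to the libraries whose loggers are raised to INFO, presumably to silence its DEBUG output during uploads. A minimal sketch of that pattern using only the standard library; the helper name `quieten_libraries` is hypothetical and does not appear in the PR:

```python
import logging


def quieten_libraries(names, level=logging.INFO):
    """Raise the threshold of the named loggers so their DEBUG records are dropped."""
    for name in names:
        logging.getLogger(name).setLevel(level)


# Library names are taken from the diff; the helper itself is illustrative only
quieten_libraries(["s3transfer", "matplotlib.font_manager", "zeep", "multipart"])
```

After the call, `logging.getLogger("multipart").isEnabledFor(logging.DEBUG)` is false, while INFO and above still pass through.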
11 changes: 6 additions & 5 deletions operationsgateway_api/src/models.py
@@ -38,16 +38,17 @@ class ImageModel(BaseModel):

class WaveformModel(BaseModel):
id_: str = Field(alias="_id")
x: str
y: str
x: List[float]
y: List[float]

class Config:
arbitrary_types_allowed = True

@field_validator("x", "y", mode="before")
def encode_values(cls, value): # noqa: N805
if isinstance(value, np.ndarray):
return str(list(value))
return list(value)
else:
# Typically will be a string when putting waveform data into the model from
# results of a MongoDB query
return value


6 changes: 3 additions & 3 deletions operationsgateway_api/src/records/echo_interface.py
@@ -38,7 +38,7 @@ def __init__(self) -> None:
             log.error(
                 "%s: %s. This error happened with bucket '%s'",
                 exc.response["Error"]["Code"],
-                exc.response["Error"]["Message"],
+                exc.response["Error"].get("Message"),
                 Config.config.images.image_bucket_name,
             )
             raise EchoS3Error(
@@ -70,7 +70,7 @@ def download_file_object(self, image_path: str) -> BytesIO:
             log.error(
                 "%s: %s",
                 exc.response["Error"]["Code"],
-                exc.response["Error"]["Message"],
+                exc.response["Error"].get("Message"),
             )
             raise EchoS3Error(
                 f"{exc.response['Error']['Code']} when downloading file at"
@@ -96,7 +96,7 @@ def upload_file_object(self, image_object: BytesIO, image_path: str) -> None:
             log.error(
                 "%s: %s",
                 exc.response["Error"]["Code"],
-                exc.response["Error"]["Message"],
+                exc.response["Error"].get("Message"),
             )
             raise EchoS3Error(
                 f"{exc.response['Error']['Code']} when uploading file at"
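
All three hunks swap `["Message"]` for `.get("Message")`, so a response whose `Error` entry lacks a `Message` key is logged as `None` rather than raising `KeyError` inside the error handler. A small sketch of the difference, using a plain dict as a stand-in for the `exc.response` of a botocore `ClientError`; the function name `describe_error` is hypothetical:

```python
def describe_error(response):
    """Format an S3-style error response without assuming "Message" is present."""
    error = response["Error"]
    # "Code" is still indexed directly, as in the diff; only "Message" is optional
    return "%s: %s" % (error["Code"], error.get("Message"))


# Stand-in responses (assumed shapes, not captured from Echo)
with_message = {"Error": {"Code": "NoSuchKey", "Message": "missing"}}
without_message = {"Error": {"Code": "404"}}

print(describe_error(with_message))
print(describe_error(without_message))
```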
28 changes: 20 additions & 8 deletions operationsgateway_api/src/records/hdf_handler.py
@@ -46,18 +46,30 @@ def extract_data(self) -> Tuple[RecordModel, List[WaveformModel], List[ImageMode
         log.debug("Extracting data from HDF files")

         metadata_hdf = dict(self.hdf_file.attrs)
+        timestamp_format = "%Y-%m-%dT%H:%M:%S%z"
         try:
             metadata_hdf["timestamp"] = datetime.strptime(
                 metadata_hdf["timestamp"],
-                DATA_DATETIME_FORMAT,
+                timestamp_format,
             )
-            self.record_id = metadata_hdf["timestamp"].strftime(ID_DATETIME_FORMAT)
-        except ValueError as exc:
-            raise HDFDataExtractionError(
-                "Incorrect timestamp format for metadata timestamp. Use"
-                f" {ID_DATETIME_FORMAT} instead",
-            ) from exc
-
+        except ValueError:
+            # Try using alternative timestamp format that might be used for older HDF
+            # files such as old Gemini test data
+            # TODO - when Gemini test data is no longer needed, remove this try/except
+            # block that attempts to convert the timestamp for a second time. Go
+            # straight to raising the exception instead
+            try:
+                metadata_hdf["timestamp"] = datetime.strptime(
+                    metadata_hdf["timestamp"],
+                    DATA_DATETIME_FORMAT,
+                )
+            except ValueError as exc:
+                raise HDFDataExtractionError(
+                    "Incorrect timestamp format for metadata timestamp. Use"
+                    f" {timestamp_format} instead",
+                ) from exc
+
+        self.record_id = metadata_hdf["timestamp"].strftime(ID_DATETIME_FORMAT)
         self.extract_channels()

         try:
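
The new logic tries the ISO-style format first and only falls back to the older format if that fails. A standalone sketch of the same idea, written as a loop over candidate formats rather than nested `try` blocks; `parse_timestamp` is a hypothetical helper, and `FALLBACK_FORMAT` is an assumed value because `DATA_DATETIME_FORMAT` is not shown in this diff:

```python
from datetime import datetime

# Primary format copied from the diff
PRIMARY_FORMAT = "%Y-%m-%dT%H:%M:%S%z"
# Assumption: placeholder for DATA_DATETIME_FORMAT, whose value is not in this hunk
FALLBACK_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(raw, formats=(PRIMARY_FORMAT, FALLBACK_FORMAT)):
    """Return the first successful parse, trying each format in order."""
    for fmt in formats:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    # Mirror the diff's message by pointing the caller at the preferred format
    raise ValueError(f"Incorrect timestamp format. Use {formats[0]} instead")


new_style = parse_timestamp("2023-06-26T10:15:00+0000")  # timezone-aware
old_style = parse_timestamp("2023-06-26 10:15:00")  # naive, via the fallback
```

A loop keeps the fallback easy to delete later, which is what the TODO in the diff asks for once the old Gemini test data is retired.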
5 changes: 2 additions & 3 deletions operationsgateway_api/src/records/waveform.py
@@ -1,6 +1,5 @@
 import base64
 from io import BytesIO
-import json
 import logging

 import matplotlib.pyplot as plt
@@ -68,8 +67,8 @@ def _create_plot(self, buffer) -> None:
         plt.xticks([])
         plt.yticks([])
         plt.plot(
-            json.loads(self.waveform.x),
-            json.loads(self.waveform.y),
+            self.waveform.x,
+            self.waveform.y,
             linewidth=0.5,
         )
         plt.axis("off")
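
Together with the `WaveformModel` change in `models.py`, this hunk drops the `json.loads` round trip: the axes are stored as lists of floats, so they can go to `plt.plot` unchanged. A rough sketch of why the two representations carry the same data; both function names are hypothetical, and a tuple stands in for the NumPy array so the example needs only the standard library:

```python
import json


def encode_as_string(values):
    """Earlier approach: keep the axis as the string form of a list."""
    return str(list(values))


def encode_as_list(values):
    """Approach in this PR: keep the axis as a plain list of numbers."""
    return list(values)


samples = (0.0, 0.5, 1.0)  # stand-in for one waveform axis

# The string form has to be decoded before every plot
decoded = json.loads(encode_as_string(samples))
# The list form can be handed to the plotting call directly
direct = encode_as_list(samples)

assert decoded == direct
```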
5 changes: 2 additions & 3 deletions operationsgateway_api/src/routes/ingest_data.py
@@ -7,6 +7,7 @@

 from operationsgateway_api.src.auth.authorisation import authorise_route
 from operationsgateway_api.src.channels.channel_manifest import ChannelManifest
+from operationsgateway_api.src.config import Config
 from operationsgateway_api.src.error_handling import endpoint_error_handling
 from operationsgateway_api.src.records.hdf_handler import HDFDataHandler
 from operationsgateway_api.src.records.image import Image
@@ -64,10 +65,8 @@ async def submit_hdf(
         image.create_thumbnail()
         record.store_thumbnail(image)

-    # Upload images to Echo S3 using a thread pool to be more efficent when uploading
-    # multiple images from the same HDF file
     if len(image_instances) > 0:
-        pool = ThreadPool(processes=len(image_instances))
+        pool = ThreadPool(processes=Config.config.images.upload_image_threads)
         pool.map(Image.upload_image, image_instances)

     if stored_record:
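
The pool size is now read from `upload_image_threads` rather than growing with the number of images in the HDF file. A sketch of a bounded pool along those lines; `upload_all` and its parameters are hypothetical, and the `with` block (which shuts the pool down once `map` returns) is an addition of this sketch, not something the diff does:

```python
from multiprocessing.pool import ThreadPool


def upload_all(items, upload, max_threads=4):
    """Run `upload` over `items` using at most `max_threads` worker threads."""
    if not items:
        return []
    # The worker count is fixed by configuration, not by len(items)
    with ThreadPool(processes=max_threads) as pool:
        # map() blocks until every item is processed and preserves input order
        return pool.map(upload, items)


# Stand-in for Image.upload_image: doubles a number instead of calling S3
results = upload_all([1, 2, 3], lambda n: n * 2, max_threads=2)
```

Capping the pool keeps a file with many images from opening one connection per image at the same time, at the cost of some uploads waiting for a free worker.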