
Review Role of local_run_obj in export_tf #181

Closed

kkappler opened this issue May 30, 2022 · 2 comments

kkappler (Collaborator) commented May 30, 2022

In process_mth5.py there is a note that says to do this review.

Basically, the following snippet of code is executed after the pipeline, to create the mt_metadata TF object:

        tf_cls = export_tf(
            tf_collection,
            station_metadata_dict=station_metadata.to_dict(),
            survey_dict=survey_dict
        )
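For context on what happens to tf_cls afterwards: it is an mt_metadata TF object, which can then be written to disk. A minimal, hedged sketch, assuming the write_tf_file method is available and treating the filename and file_type as placeholders:

# Hedged sketch: persist the TF returned by export_tf.
# Assumes mt_metadata's TF exposes write_tf_file; the path and
# file_type values below are illustrative placeholders.
tf_cls.write_tf_file(fn="local_station_tf.xml", file_type="emtfxml")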

The tf_collection is an aurora data structure that tracks the TF values per decimation level. These TFs can be estimated from many runs.
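To make that concrete, here is a small hypothetical sketch of the shape being described; the class and attribute names below are illustrative, not aurora's actual API:

from dataclasses import dataclass, field

# Hypothetical sketch of the structure described above; these names
# are illustrative, not aurora's actual tf_collection API.
@dataclass
class TFCollectionSketch:
    # decimation level (int) -> TF estimate for that level,
    # e.g. an array of transfer function values per frequency band
    tf_dict: dict = field(default_factory=dict)
    # ids of the runs whose data contributed to the estimates
    run_ids: list = field(default_factory=list)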

So, we need to make sure that the TF knows which runs were used to generate it.

In the current code, we are doing this:

# NOTE: only the first run in the dataset dataframe is consulted here
local_run_obj = dataset_df["run"].iloc[0]
station_metadata = local_run_obj.station_group.metadata
station_metadata._runs = []
run_metadata = local_run_obj.metadata
station_metadata.add_run(run_metadata)

This means that only the first run is being scraped for metadata here, but it looks like there is already a facility for adding the other runs' metadata.
Here is a comment from the code in this area:

# There is a container that can handle storage of multiple runs in xml, Anna made something like this.
# N.B. Currently, only the last run makes it into the tf object,
# but we can simply iterate over the run list here, getting run metadata
# station_metadata.add_run(run_metadata)

So I will try implementing this iterator.

kkappler added a commit that referenced this issue May 30, 2022
kkappler (Collaborator, Author) commented:

The code should look something like this, and be made a method of DatasetDefinition, called something like:

dataset_definition.get_station_metadata_for_tf_archive()

        # get a list of local runs
        cond1 = dataset_df["station_id"] == processing_config.stations.local.id
        sub_df = dataset_df[cond1]
        # sanity check: expect exactly one row per run
        run_ids = sub_df.run_id.unique()
        assert len(run_ids) == len(sub_df)
        # iterate over these runs, packing run metadata into station_metadata
        station_metadata = None
        for _, row in sub_df.iterrows():
            local_run_obj = row.run
            if station_metadata is None:
                station_metadata = local_run_obj.station_group.metadata
                station_metadata._runs = []
            run_metadata = local_run_obj.metadata
            station_metadata.add_run(run_metadata)

That will replace this block of code:

        station_metadata = local_run_obj.station_group.metadata
        station_metadata._runs = []
        run_metadata = local_run_obj.metadata
        station_metadata.add_run(run_metadata)

However, testing this method is premature, since we first need to add a test that processes multiple runs.
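For reference, a hedged sketch of what that method could look like on DatasetDefinition, folding the loop above into the suggested name (self.df, the processing_config attributes, and the row.run objects are assumptions carried over from the snippets above, not confirmed API):

class DatasetDefinition:
    # ... existing attributes; assume the dataset dataframe is stored as self.df ...

    def get_station_metadata_for_tf_archive(self, processing_config):
        """Pack metadata from every local run into station_metadata.

        Sketch only: self.df, processing_config.stations.local.id, and
        the row.run objects are assumptions carried over from the
        snippets above.
        """
        cond1 = self.df["station_id"] == processing_config.stations.local.id
        sub_df = self.df[cond1]

        # sanity check: expect exactly one row per run
        assert len(sub_df.run_id.unique()) == len(sub_df)

        station_metadata = None
        for _, row in sub_df.iterrows():
            local_run_obj = row.run
            if station_metadata is None:
                station_metadata = local_run_obj.station_group.metadata
                station_metadata._runs = []
            station_metadata.add_run(local_run_obj.metadata)
        return station_metadata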

kkappler added a commit that referenced this issue May 30, 2022
kkappler added a commit that referenced this issue Jun 25, 2022
Add a method to KernelDataset to extract run info, looping over runs.
Also noticed that some synthetic tests were commented out; fixed this.
Also tidied some code in process_mth5.

[Issue(s): #181]
kkappler (Collaborator, Author) commented:

PR #187 solves this issue.
