LinAlgError encountered in processing MT data #233

Closed
jiajiasun opened this issue Oct 19, 2022 · 2 comments

@jiajiasun

Hello,

I got the error message 'LinAlgError: Singular matrix' when processing MT data from stations "NMY23" and "NMY24" in survey "CONUS South". Below is the code I ran and two screenshots that might help explain the error.

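```python
# Import paths as in aurora around the time of this issue; they may differ in other versions
from aurora.pipelines.run_summary import RunSummary
from aurora.transfer_function.kernel_dataset import KernelDataset

# mth5_path: path to the MTH5 file containing stations NMY23 and NMY24 (defined elsewhere)
mth5_run_summary = RunSummary()
mth5_run_summary.from_mth5s([mth5_path,])
run_summary = mth5_run_summary.clone()
run_summary.mini_summary

# Local station NMY23, remote reference station NMY24
kernel_dataset = KernelDataset()
kernel_dataset.from_run_summary(run_summary, "NMY23", "NMY24")
kernel_dataset.drop_runs_shorter_than(10000)
kernel_dataset.mini_summary
```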
[Screenshot: Capture]

[Screenshot: Capture2]

@kkappler
Collaborator

The last runs at both "NMY23" and "NMY24" are identically zero-valued time series.

These zero-valued runs make up the majority of the data going into the processing. The EDF (effective degrees of freedom) weighting method seems to end up down-weighting all of the real data after about 8 iterations.
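
A toy numpy example shows how this can end in the reported error. It uses synthetic data and a plain weighted least-squares solve, not aurora's actual EDF estimator: once every nonzero row carries zero weight, only the all-zero rows remain, the matrix X^H W X becomes the zero matrix, and np.linalg.solve raises LinAlgError.

```python
import numpy as np


def solve_weighted_ls(X, y, w):
    """Solve the weighted normal equations (X^H W X) b = X^H W y."""
    W = np.diag(w)
    gram = X.conj().T @ W @ X
    rhs = X.conj().T @ W @ y
    return np.linalg.solve(gram, rhs)


rng = np.random.default_rng(0)
X_real = rng.standard_normal((5, 2))  # a short run of nonzero data
X_zero = np.zeros((20, 2))            # a long, identically zero run
X = np.vstack([X_real, X_zero])
y = X @ np.array([1.0, -2.0])         # synthetic outputs with known b

# With uniform weights the nonzero rows keep the system well posed.
print(solve_weighted_ls(X, y, np.ones(len(y))))

# Hypothetical end state of the reweighting: real rows weighted to zero,
# zero-valued rows still weighted one, so X^H W X is the zero matrix.
w = np.concatenate([np.zeros(len(X_real)), np.ones(len(X_zero))])
try:
    solve_weighted_ls(X, y, w)
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)  # numpy reports the matrix as singular
```

The weight vector w is set by hand to mimic the end state described above; how the real iterative reweighting arrives there is not modeled.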

There are several possible fixes; the most preferable is not to admit runs that are all zero into the processing in the first place. I will see what can be done about tagging all-zero runs as invalid in the processing_summary dataframe.
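
A minimal sketch of tagging all-zero runs as invalid and optionally dropping them before processing. Everything here is hypothetical: tag_zero_runs, load_run, and the station_id/run_id/valid keys are illustrative names, and a list of dicts stands in for the run summary dataframe. Note that each run has to be read in full to be checked, which is where the time and processing cost of a non-lazy check comes from.

```python
import numpy as np


def is_all_zero(data):
    """Return True if every sample of every channel is exactly zero."""
    return not np.any(data)


def tag_zero_runs(run_rows, load_run, drop=False):
    """Add a 'valid' flag to each run row; optionally drop the invalid ones.

    run_rows: list of dicts with hypothetical 'station_id' and 'run_id' keys,
        standing in for rows of a run summary dataframe.
    load_run: hypothetical callable returning a run's time series as an
        (n_channels, n_samples) array.
    """
    tagged = []
    for row in run_rows:
        # Eager (non-lazy) read: the whole run must be loaded to test it.
        data = load_run(row["station_id"], row["run_id"])
        tagged.append({**row, "valid": not is_all_zero(data)})
    if drop:
        tagged = [row for row in tagged if row["valid"]]
    return tagged


# Synthetic example: one run of random data, one identically zero run.
rng = np.random.default_rng(1)
runs = {
    ("station_a", "run_1"): rng.standard_normal((5, 100)),
    ("station_a", "run_2"): np.zeros((5, 1000)),
}
rows = [{"station_id": s, "run_id": r} for s, r in runs]
kept = tag_zero_runs(rows, lambda s, r: runs[(s, r)], drop=True)
print([row["run_id"] for row in kept])  # ['run_1']
```

Keeping the flag (drop=False) preserves a record of which runs were rejected; dropping them up front guarantees they never reach the transfer-function estimation.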

kkappler self-assigned this Sep 15, 2023
kkappler added a commit that referenced this issue Sep 15, 2023
This is a temporary workaround, which can also double as a long-term fix.
The logic is to call the new run_summary.check_runs_are_valid(drop=True)
after the run_summary is created, but before the kernel_dataset.
In this way, runs that are all zero will not have a chance to enter process_mth5.

The only reason not to use this in general is that it requires non-lazy access to the
time series, so there is a time and processing cost.
kkappler added a commit that referenced this issue Sep 15, 2023
- Add column_names method to WideScaleTest
- Use column_names method in 04
- Update 04 nan handling to include <NA> now that columns have dtype
- add run_summary validation to 04 to prevent issue #233
@kkappler
Collaborator

All-zero runs are now treated as invalid. This functionality is now in mtpy.
