Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform back the predicted quantities into proper the space for real data #200

Merged
merged 4 commits into from
Jul 18, 2024

Conversation

TjarkMiener
Copy link
Member

When predicting on real data, we predict in the "space" based on the selected Transform in dl1dh during training with MCs. In order to transform back the predicted quantities, this PR is needed.


Requires DL1DH#140

@TjarkMiener
Copy link
Member Author

@BastienLacave For a (non-negligible) number of the files produced by you, the length of the pointing table is not equal to the number of cosmic events (i.e. length of the dl1b parameter table). I thought the interpolation of the pointing already took place inside the ctapipe-process tool and therefore we can assume the table lengths to be matching. Once you have time can you please check whether the files were produced correctly? Example file: "/fefs/aswg/workspace/bastien.lacave/ctlearnLSTData/DL1/02929/LST-1.1.Run02929.0018.r1.dl1.h5"

@TjarkMiener
Copy link
Member Author

TjarkMiener commented Jul 17, 2024

I have checked one of the affected files and they do seems okay. In a sense that the range of time stamps overlays. However, the pointing table is just shorter. I checked the ctapipe_io_lst code and interpolation should have happen, so I don't know why the table lengths are not matching. I could only thing of a corrupted drive report where the interpolation failed. In case we can't ensure the same table lengths here (interpolation happen apriori dl1dh+ctlearn), we would need to try to run the ctapipe PointingInterpolator here in the output_handler.py and the plugin later. @maxnoe can you please confirm whether we can assume apriori interpolation of the pointing in ctapipe_io_lst or not?

@maxnoe
Copy link
Contributor

maxnoe commented Jul 17, 2024

Hi @TjarkMiener

Recent version of ctapipe have changed how pointing data is handled.

See the following issues and change log notes:

Changelog of 0.21:
https://ctapipe.readthedocs.io/en/latest/changelog.html#ctapipe-v0-21-0-2024-04-25

Issues:

PRs:

So we don't store event wise pointing information in output files anymore, neither for simulations (where it was constant) nor for observations (where it was interpolated from a much coarser pointing table).

Instead, we added support for interpolating pointing when reading back dl1 / dl2 files (both in TableLoader and HDF5EventSource.

This expectes a monitoring pointing table at /dl0/monitoring/telescope/pointing/tel_xxx.

You can find the code I used to attach LST pointing reports to ctapipe dl1 files here:
https://gitlab.cta-observatory.org/mlinhoff/acada_lst_dl0_datachecks/-/blob/main/add_pointing.py?ref_type=heads

@TjarkMiener
Copy link
Member Author

Thanks a lot for the confirmation @maxnoe! So we need to use ctapipe PointingInterpolator here as well.

Needed to append the info to properly store it in the dl2 panda df
@TjarkMiener TjarkMiener merged commit 93f59b8 into master Jul 18, 2024
8 checks passed
@TjarkMiener TjarkMiener deleted the real_data_dl2 branch July 18, 2024 11:40
@TjarkMiener
Copy link
Member Author

Hi @maxnoe
We reduced real LST-1 data (v6.0.0) with the latest ctapipe version and attached the pointing with the script from the ACADA-Integration. We also merged the subruns into run-wised DL1 files. It turns out that the pointing interpolation is the bottleneck in the processing. Since this is a CPU-based operation we probably want this to run independently from the model inference which can be run on GPUs (but also on CPUs). This is a rather crucial for the plugin design. We could perform the model inference and temporary store the prediction in spherical offsets. Then the plugin reads this file and transform back the predicted quantities into the proper alt/az. Or we could store the spherical offsets directly into the dl2 format attached with meta data telling the TableLoader and HDF5EventSource to do the operation while reading dl2 data. Is something like foreseen in the dl2 data model, i.e. storing alt and az in different frames?

@maxnoe
Copy link
Contributor

maxnoe commented Jul 25, 2024

It turns out that the pointing interpolation is the bottleneck in the processing.

I don't know what you are doing, but I can hardly believe that the interpolation here is really the bottleneck.

I performance tested this just now, and for me interpolating for a realistic test case (2 hours of pointing data, one sample every 2 seconds) interpolating 10000 / 100000 events that are randomly distributed in 15 minutes takes < 1 ms.

So I don't see how the interpolation can be a bottleneck in any meaningful evaluation of a machine learning model.

In [38]: from astropy.time import Time
In [39]: from astropy.table import Table
In [40]: from ctapipe.io.pointing import PointingInterpolator
In [41]: import numpy as np
In [42]: t = Time.now() + np.arange(0, 7200, 2) * u.s
In [44]: alt = np.linspace(40, 50, len(t)) * u.deg
In [45]: az = np.linspace(0, 30, len(t)) * u.deg
In [46]: table = Table({"time": t, "altitude": alt, "azimuth": az})
In [47]: interpolator = PointingInterpolator()
In [48]: interpolator.add_table(1, table)
In [49]: event_t = (t[0] + np.random.uniform(0, 15 * 60, 100000) * u.s).sort()
In [50]: %timeit interpolator(tel_id=1, time=event_t)
898 µs ± 862 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Is something like foreseen in the dl2 data model, i.e. storing alt and az in different frames?

You mean storing the reconstructed direction in an other frame than AltAz? No that is not foreseen. But since many algorithms will predict the direction in e.g. NominalFrame, we might think about allowing that and moving the transformation to AltAz into common code in apply_models

@TjarkMiener
Copy link
Member Author

Ohhh, thanks a lot for this code snippet @maxnoe. Made me realize that one can provide the interpolator with an array. So the bottleneck was a loop ;-). I will fix that here so my comment about the dl2 data model is redundant. Sorry for the noise and thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants