Transform back the predicted quantities into proper the space for real data #200

TjarkMiener · 2024-07-16T08:30:59Z

When predicting on real data, we predict in the "space" based on the selected Transform in dl1dh during training with MCs. In order to transform back the predicted quantities, this PR is needed.

Requires DL1DH#140

TjarkMiener · 2024-07-16T13:39:59Z

@BastienLacave For a (non-negligible) number of the files produced by you, the length of the pointing table is not equal to the number of cosmic events (i.e. length of the dl1b parameter table). I thought the interpolation of the pointing already took place inside the ctapipe-process tool and therefore we can assume the table lengths to be matching. Once you have time can you please check whether the files were produced correctly? Example file: "/fefs/aswg/workspace/bastien.lacave/ctlearnLSTData/DL1/02929/LST-1.1.Run02929.0018.r1.dl1.h5"

TjarkMiener · 2024-07-17T07:01:09Z

I have checked one of the affected files and they do seems okay. In a sense that the range of time stamps overlays. However, the pointing table is just shorter. I checked the ctapipe_io_lst code and interpolation should have happen, so I don't know why the table lengths are not matching. I could only thing of a corrupted drive report where the interpolation failed. In case we can't ensure the same table lengths here (interpolation happen apriori dl1dh+ctlearn), we would need to try to run the ctapipe PointingInterpolator here in the output_handler.py and the plugin later. @maxnoe can you please confirm whether we can assume apriori interpolation of the pointing in ctapipe_io_lst or not?

maxnoe · 2024-07-17T08:00:39Z

Hi @TjarkMiener

Recent version of ctapipe have changed how pointing data is handled.

See the following issues and change log notes:

Changelog of 0.21:
https://ctapipe.readthedocs.io/en/latest/changelog.html#ctapipe-v0-21-0-2024-04-25

Issues:

PRs:

Change how DataWriter writes simulation pointing cta-observatory/ctapipe#2438

So we don't store event wise pointing information in output files anymore, neither for simulations (where it was constant) nor for observations (where it was interpolated from a much coarser pointing table).

Instead, we added support for interpolating pointing when reading back dl1 / dl2 files (both in TableLoader and HDF5EventSource.

This expectes a monitoring pointing table at /dl0/monitoring/telescope/pointing/tel_xxx.

You can find the code I used to attach LST pointing reports to ctapipe dl1 files here:
https://gitlab.cta-observatory.org/mlinhoff/acada_lst_dl0_datachecks/-/blob/main/add_pointing.py?ref_type=heads

TjarkMiener · 2024-07-17T12:20:44Z

Thanks a lot for the confirmation @maxnoe! So we need to use ctapipe PointingInterpolator here as well.

Needed to append the info to properly store it in the dl2 panda df

TjarkMiener · 2024-07-25T07:29:40Z

Hi @maxnoe
We reduced real LST-1 data (v6.0.0) with the latest ctapipe version and attached the pointing with the script from the ACADA-Integration. We also merged the subruns into run-wised DL1 files. It turns out that the pointing interpolation is the bottleneck in the processing. Since this is a CPU-based operation we probably want this to run independently from the model inference which can be run on GPUs (but also on CPUs). This is a rather crucial for the plugin design. We could perform the model inference and temporary store the prediction in spherical offsets. Then the plugin reads this file and transform back the predicted quantities into the proper alt/az. Or we could store the spherical offsets directly into the dl2 format attached with meta data telling the TableLoader and HDF5EventSource to do the operation while reading dl2 data. Is something like foreseen in the dl2 data model, i.e. storing alt and az in different frames?

maxnoe · 2024-07-25T09:05:04Z

It turns out that the pointing interpolation is the bottleneck in the processing.

I don't know what you are doing, but I can hardly believe that the interpolation here is really the bottleneck.

I performance tested this just now, and for me interpolating for a realistic test case (2 hours of pointing data, one sample every 2 seconds) interpolating 10000 / 100000 events that are randomly distributed in 15 minutes takes < 1 ms.

So I don't see how the interpolation can be a bottleneck in any meaningful evaluation of a machine learning model.

In [38]: from astropy.time import Time
In [39]: from astropy.table import Table
In [40]: from ctapipe.io.pointing import PointingInterpolator
In [41]: import numpy as np
In [42]: t = Time.now() + np.arange(0, 7200, 2) * u.s
In [44]: alt = np.linspace(40, 50, len(t)) * u.deg
In [45]: az = np.linspace(0, 30, len(t)) * u.deg
In [46]: table = Table({"time": t, "altitude": alt, "azimuth": az})
In [47]: interpolator = PointingInterpolator()
In [48]: interpolator.add_table(1, table)
In [49]: event_t = (t[0] + np.random.uniform(0, 15 * 60, 100000) * u.s).sort()
In [50]: %timeit interpolator(tel_id=1, time=event_t)
898 µs ± 862 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Is something like foreseen in the dl2 data model, i.e. storing alt and az in different frames?

You mean storing the reconstructed direction in an other frame than AltAz? No that is not foreseen. But since many algorithms will predict the direction in e.g. NominalFrame, we might think about allowing that and moving the transformation to AltAz into common code in apply_models

TjarkMiener · 2024-07-25T09:19:19Z

Ohhh, thanks a lot for this code snippet @maxnoe. Made me realize that one can provide the interpolator with an array. So the bottleneck was a loop ;-). I will fix that here so my comment about the dl2 data model is redundant. Sorry for the noise and thanks again!

TjarkMiener added 2 commits July 16, 2024 09:34

fix direction predictions for real data

9022c01

fix energy predictions for real data

f5eddf3

TjarkMiener requested review from BastienLacave and rcervinoucm July 16, 2024 08:30

maxnoe mentioned this pull request Jul 16, 2024

Update build configuration, use setuptools_scm #199

Merged

TjarkMiener added 2 commits July 17, 2024 15:47

add pointing interpolator

6d94cbc

fix storage of trigger info in dl2

7880056

Needed to append the info to properly store it in the dl2 panda df

rcervinoucm approved these changes Jul 18, 2024

View reviewed changes

TjarkMiener merged commit 93f59b8 into master Jul 18, 2024
8 checks passed

TjarkMiener deleted the real_data_dl2 branch July 18, 2024 11:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transform back the predicted quantities into proper the space for real data #200

Transform back the predicted quantities into proper the space for real data #200

TjarkMiener commented Jul 16, 2024

TjarkMiener commented Jul 16, 2024

TjarkMiener commented Jul 17, 2024 •

edited

Loading

maxnoe commented Jul 17, 2024

TjarkMiener commented Jul 17, 2024

TjarkMiener commented Jul 25, 2024

maxnoe commented Jul 25, 2024 •

edited

Loading

TjarkMiener commented Jul 25, 2024

Transform back the predicted quantities into proper the space for real data #200

Transform back the predicted quantities into proper the space for real data #200

Conversation

TjarkMiener commented Jul 16, 2024

TjarkMiener commented Jul 16, 2024

TjarkMiener commented Jul 17, 2024 • edited Loading

maxnoe commented Jul 17, 2024

TjarkMiener commented Jul 17, 2024

TjarkMiener commented Jul 25, 2024

maxnoe commented Jul 25, 2024 • edited Loading

TjarkMiener commented Jul 25, 2024

TjarkMiener commented Jul 17, 2024 •

edited

Loading

maxnoe commented Jul 25, 2024 •

edited

Loading