[Feature] Selector for latest version when many provided #279

AlxEnashi · 2024-04-23T19:45:08Z

Is your feature request related to a problem? Please describe.
The database may hold many values for a feature, differentiated by the version of the feature extractor (this may also happen with classifications, magstats, and corrections?).
The current behaviour is not centralized and not completely clear.

Describe the solution you'd like
A consistent way to select the version to use when displaying data must be chosen.

Describe alternatives you've considered
Save a list of the different versions, ordered by time or relevance. If a query returns many objects, return the one whose version has the greatest index in the list. A version that is not in the list always gets index -1.
Note: the special case of the lightcurve classifier and its features must be considered (exclude features version 23..
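
A minimal sketch of the ordered-list idea above, in Python. The names `VERSION_PRIORITY`, `version_rank`, and `select_latest` are hypothetical, and the ordering of the version strings (taken from this thread) is an assumption for illustration only, not the agreed priority.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical priority list, lowest priority first. The strings appear in
# this thread; the order they are given here is an assumption.
VERSION_PRIORITY = [
    "hierarchical_rf_1.1.0",
    "lc_classifier_1.1.13",
    "lc_classifier_1.2.1-P-transitional",
]


def version_rank(version: str, priority: Sequence[str] = VERSION_PRIORITY) -> int:
    """Return the index of `version` in the list, or -1 if it is not listed."""
    try:
        return list(priority).index(version)
    except ValueError:
        return -1


def select_latest(rows: Iterable[dict],
                  priority: Sequence[str] = VERSION_PRIORITY) -> Optional[dict]:
    """Return the row whose version has the greatest index, or None if empty."""
    rows = list(rows)
    if not rows:
        return None
    # On ties (for example, several unlisted versions) max() keeps the first row.
    return max(rows, key=lambda row: version_rank(row["version"], priority))


# Two period values for one object, with an unlisted version ranked -1.
rows = [
    {"version": "23.12.25", "value": 1.0},
    {"version": "lc_classifier_1.2.1-P-transitional", "value": 0.330039},
]
chosen = select_latest(rows)
```

With this list, the unlisted version ranks -1, so the row for "lc_classifier_1.2.1-P-transitional" is chosen. Keeping the list in one place would make the choice centralized, which is the point of the request.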

Additional context
This happens, for example, when selecting the period for a lightcurve.
The object ZTF20aawwxkg has many periods calculated.

The lightcurve uses 0.9980563820532297. Why?

ale-munozarancibia · 2024-04-23T20:17:49Z

This problem also affects classifiers.

When a new classifier and/or feature computation goes into production and an object already has a previous classification and/or feature, the Explorer shows more than one value for it. It should show only the latest one instead.

Example: ZTF22abyhaut

If I search ZTF22abyhaut in the Explorer, "Object ID" search filter, classifier "Stamp Classifier", output shows 2 rows with different highest probability classes (SN and bogus) and highest probabilities (0.537 and 0.415). A query to the database shows that these correspond to classifier versions "1.0.1" and "stamp_classifier_1.0.4" respectively. Although 2 results were shown in the search output, pressing on any of them leads to the same result https://alerce.online/object/ZTF22abyhaut, where bogus is the highest probability class (even when I pressed on the "SN" result row).

For this object, the same happens for the classifier "Lc Classifier" search: it gives 2 rows as a result, corresponding to versions "hierarchical_rf_1.1.0" and "lc_classifier_1.1.13".

This object has a period computed as a feature and displayed when selecting "Folded" in the light curve panel. It shows one value, but a query to the database shows that there are 2 "Multiband_period" computations, with values 0.330039 and 1.0, corresponding to versions "lc_classifier_1.2.1-P-transitional" and 23.12.25 respectively. It is not clear how the Explorer selects which one it will show.

The Explorer should show results only for the latest version. This includes the stamp classifier, the light curve classifier (and its branches), and features (currently only multiband period available in the Explorer).

AlxEnashi · 2024-06-10T20:49:44Z

Many solutions were considered.

In my opinion, to keep most of the features of the search and API system, the refactor should be at the database query level. With that in mind, I propose using the taxonomy table to filter the latest version of each classifier when searching, and a similar table (even the same one would work) to filter the latest version of features.
For features this is not as critical, because there are no search queries over the table, so a backend solution could be implemented. For consistency, though, I support using the same solution for probabilities and for features.
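
As a rough illustration of filtering at the query level, here is a sketch using Python's built-in `sqlite3` with an in-memory database. The table names `probability` and `latest_version`, their columns, and the classifier name `stamp_classifier` are all hypothetical stand-ins (the real schema and the taxonomy table layout are not shown in this thread), and treating "stamp_classifier_1.0.4" as the latest version is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: `latest_version` stands in for the taxonomy table.
conn.executescript("""
CREATE TABLE probability (
    oid TEXT,
    classifier_name TEXT,
    classifier_version TEXT,
    class_name TEXT,
    probability REAL
);
CREATE TABLE latest_version (
    classifier_name TEXT PRIMARY KEY,
    classifier_version TEXT
);
""")

# Values reported in this thread for ZTF22abyhaut (stamp classifier).
conn.executemany(
    "INSERT INTO probability VALUES (?, ?, ?, ?, ?)",
    [
        ("ZTF22abyhaut", "stamp_classifier", "1.0.1", "SN", 0.537),
        ("ZTF22abyhaut", "stamp_classifier", "stamp_classifier_1.0.4", "bogus", 0.415),
    ],
)
# Assumption: 1.0.4 is the version that should be shown.
conn.execute(
    "INSERT INTO latest_version VALUES (?, ?)",
    ("stamp_classifier", "stamp_classifier_1.0.4"),
)

# The join keeps only rows whose version matches the registered latest one.
QUERY = """
SELECT p.oid, p.class_name, p.probability, p.classifier_version
FROM probability AS p
JOIN latest_version AS v
  ON v.classifier_name = p.classifier_name
 AND v.classifier_version = p.classifier_version
WHERE p.oid = ? AND p.classifier_name = ?
"""
rows = conn.execute(QUERY, ("ZTF22abyhaut", "stamp_classifier")).fetchall()
```

Here `rows` holds a single row, the "bogus" classification from "stamp_classifier_1.0.4". One trade-off of this design: an object that only has a classification from an older version would drop out of the search results unless the filter falls back to it.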

ale-munozarancibia · 2024-06-21T14:00:45Z

New proposal, based on discussions with @AlxEnashi:

In the Explorer results table, add a column at the end indicating the version of the classifier. This will indicate to the user why rows are duplicated.
In the Explorer probabilities plot (after an object has been selected from the list), show only the results for the highest-priority version. This will usually be the latest version available for the object, and it will be shown even if the user selected a row with a different version (this mismatch already occurs in the Explorer, but nothing indicates which version was chosen for the plot).
The ideal solution, i.e. having no duplicates in the results table, is not feasible now because of pagination problems. A proper solution must be discussed in detail before moving to LSST.

AlxEnashi added the enhancement New feature or request label Apr 23, 2024

AlxEnashi self-assigned this Apr 29, 2024

AlxEnashi added this to ALeRCE Project Apr 29, 2024

github-project-automation bot moved this to 🆕 New in ALeRCE Project Apr 29, 2024

AlxEnashi removed the enhancement New feature or request label Apr 29, 2024

AlxEnashi moved this from 🆕 New to 🏗 In progress in ALeRCE Project May 27, 2024

AlxEnashi removed their assignment Aug 9, 2024

AlxEnashi moved this from 🏗 In progress to 🆕 New in ALeRCE Project Aug 9, 2024
