[Feature] Selector for latest version when many provided #279

Open
AlxEnashi opened this issue Apr 23, 2024 · 3 comments

Comments

@AlxEnashi
Contributor

Is your feature request related to a problem? Please describe.
The database may hold many values for a feature, differentiated by the version of the feature extractor (this may also happen with classifications, magstats and corrections?).
The current behaviour is not centralized and not completely clear.

Describe the solution you'd like
A consistent way to select the version used to display data must be chosen.

Describe alternatives you've considered
Save a list of the different versions ordered by time or relevance. If a query returns many objects, return the one whose version has the greatest index in the list. If a version is not in the list, its index is always -1.
Note: the special case of the lightcurve classifier and its features must be considered (exclude features version 23..)
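
A minimal Python sketch of this alternative, for illustration only. The version names, the `VERSION_ORDER` list, and the row layout are hypothetical and are not taken from the ALeRCE code base; the only behaviour taken from the description above is "greatest index wins, unlisted versions rank as -1".

```python
from typing import Iterable, Optional, Sequence

# Hypothetical version names, ordered from oldest/least relevant to newest/most relevant.
VERSION_ORDER = ("extractor_v1", "extractor_v2", "extractor_v3")


def version_rank(version: str, order: Sequence[str] = VERSION_ORDER) -> int:
    """Return the index of `version` in `order`, or -1 if it is not listed."""
    try:
        return list(order).index(version)
    except ValueError:
        return -1


def select_latest(rows: Iterable[dict], order: Sequence[str] = VERSION_ORDER) -> Optional[dict]:
    """Pick the row whose version has the greatest index in `order`."""
    rows = list(rows)
    if not rows:
        return None
    return max(rows, key=lambda row: version_rank(row["version"], order))


# Made-up values for one feature computed by three extractor versions.
periods = [
    {"name": "Multiband_period", "value": 0.5, "version": "extractor_v1"},
    {"name": "Multiband_period", "value": 0.25, "version": "extractor_v3"},
    {"name": "Multiband_period", "value": 2.0, "version": "unlisted_version"},
]

chosen = select_latest(periods)
print(chosen["value"], chosen["version"])
```

Keeping the order in one explicit list avoids comparing version strings directly, which would not work when naming schemes change between releases.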

Additional context
This happens, for example, when selecting the period for a lightcurve.
The object ZTF20aawwxkg has many periods calculated
image
The lightcurve uses 0.9980563820532297. Why?
image

@AlxEnashi AlxEnashi added the enhancement New feature or request label Apr 23, 2024
@ale-munozarancibia

This problem also affects classifiers.

When a new classifier and/or feature computation goes into production and an object already has a previous classification and/or feature, the Explorer shows more than one value for it, but it should show only the latest one.

Example: ZTF22abyhaut

If I search for ZTF22abyhaut in the Explorer ("Object ID" search filter, classifier "Stamp Classifier"), the output shows 2 rows with different highest probability classes (SN and bogus) and highest probabilities (0.537 and 0.415). A query to the database shows that these correspond to classifier versions "1.0.1" and "stamp_classifier_1.0.4" respectively. Although 2 results are shown in the search output, clicking either of them leads to the same page, https://alerce.online/object/ZTF22abyhaut, where bogus is the highest probability class (even when I clicked the "SN" result row).

For this object, the same happens for the classifier "Lc Classifier" search: it gives 2 rows as a result, corresponding to versions "hierarchical_rf_1.1.0" and "lc_classifier_1.1.13".

This object has a period computed as a feature and displayed when selecting "Folded" in the light curve panel. It shows one value, but a query to the database shows that there are 2 "Multiband_period" computations, with values 0.330039 and 1.0, corresponding to versions "lc_classifier_1.2.1-P-transitional" and 23.12.25 respectively. It is not clear how the Explorer selects which one it will show.

The Explorer should show results only for the latest version. This includes the stamp classifier, the light curve classifier (and its branches), and features (currently only multiband period available in the Explorer).

@AlxEnashi AlxEnashi self-assigned this Apr 29, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in ALeRCE Project Apr 29, 2024
@AlxEnashi AlxEnashi removed the enhancement New feature or request label Apr 29, 2024
@AlxEnashi AlxEnashi moved this from 🆕 New to 🏗 In progress in ALeRCE Project May 27, 2024
@AlxEnashi
Contributor Author

Many solutions were considered.

In my opinion, to keep most of the features of the search and API system, the refactor should be at the database query level. Considering this, I propose using the taxonomy table to filter for the latest version of each classifier when searching, and a similar table (even the same one works) to filter for the latest version of features.
For features this is not as critical because there are no search queries over the table, so a backend solution could be implemented, but for consistency I support using the same solution for probabilities and for features.
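
A rough sketch of what filtering at the query level could look like, using an in-memory SQLite database so it runs standalone. The table layouts, column names, and the classifier name `stamp_classifier` are assumptions made for illustration, as is the idea that the taxonomy table lists only the version to display for each classifier; the real schema may differ. The two rows reuse the ZTF22abyhaut values reported earlier in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: the real tables may have more columns.
conn.executescript(
    """
    CREATE TABLE probability (
        oid TEXT,
        classifier_name TEXT,
        classifier_version TEXT,
        class_name TEXT,
        probability REAL
    );
    -- Assumption: one row per classifier, naming the version to display.
    CREATE TABLE taxonomy (
        classifier_name TEXT,
        classifier_version TEXT
    );
    """
)

conn.executemany(
    "INSERT INTO probability VALUES (?, ?, ?, ?, ?)",
    [
        ("ZTF22abyhaut", "stamp_classifier", "1.0.1", "SN", 0.537),
        ("ZTF22abyhaut", "stamp_classifier", "stamp_classifier_1.0.4", "bogus", 0.415),
    ],
)
conn.execute(
    "INSERT INTO taxonomy VALUES (?, ?)",
    ("stamp_classifier", "stamp_classifier_1.0.4"),
)

# The join drops every probability row whose version is not the one in taxonomy.
LATEST_ONLY_QUERY = """
    SELECT p.oid, p.class_name, p.probability, p.classifier_version
    FROM probability AS p
    JOIN taxonomy AS t
      ON t.classifier_name = p.classifier_name
     AND t.classifier_version = p.classifier_version
    WHERE p.oid = ?
"""

rows = conn.execute(LATEST_ONLY_QUERY, ("ZTF22abyhaut",)).fetchall()
print(rows)
```

Because the filter lives in the query itself, search, pagination and the API would all see the same single version without extra backend logic.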

@ale-munozarancibia

New proposal, based on discussions with @AlxEnashi:

  • In the Explorer results table, add a column at the end indicating the version of the classifier. This will show the user why rows are duplicated.
  • In the Explorer probabilities plot (after an object is selected from the list), show only results for the highest priority version. This version will usually be the latest available for the object, and will be shown even if the user selected a row with a different version (this mismatch currently occurs in the Explorer, but nothing indicates which version was chosen for the plot).

The ideal solution, i.e. having no duplicates in the results table, is not feasible now because of pagination problems. A proper solution must be discussed in detail before moving to LSST.
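
The two display rules in this proposal could be sketched roughly as follows. This is only an illustration: the function names, the row layout, and the `VERSION_PRIORITY` mapping (including the assumption that "stamp_classifier_1.0.4" outranks "1.0.1") are hypothetical and not part of the Explorer code.

```python
# Hypothetical priority order per classifier, from lowest to highest priority.
VERSION_PRIORITY = {
    "Stamp Classifier": ["1.0.1", "stamp_classifier_1.0.4"],
}


def results_table(rows):
    """Keep every row, with the classifier version as the last column."""
    return [
        (r["oid"], r["classifier"], r["class_name"], r["probability"], r["version"])
        for r in rows
    ]


def rows_for_plot(rows, oid, classifier, priority=VERSION_PRIORITY):
    """Return only the rows of the highest priority version found for the object.

    The version of the row the user clicked is deliberately not an input.
    """
    order = priority.get(classifier, [])

    def rank(row):
        # Versions missing from the priority list rank below all listed ones.
        return order.index(row["version"]) if row["version"] in order else -1

    candidates = [r for r in rows if r["oid"] == oid and r["classifier"] == classifier]
    if not candidates:
        return []
    best = max(rank(r) for r in candidates)
    return [r for r in candidates if rank(r) == best]


search_rows = [
    {"oid": "ZTF22abyhaut", "classifier": "Stamp Classifier",
     "class_name": "SN", "probability": 0.537, "version": "1.0.1"},
    {"oid": "ZTF22abyhaut", "classifier": "Stamp Classifier",
     "class_name": "bogus", "probability": 0.415, "version": "stamp_classifier_1.0.4"},
]

table = results_table(search_rows)
plot_rows = rows_for_plot(search_rows, "ZTF22abyhaut", "Stamp Classifier")
print(len(table), [r["class_name"] for r in plot_rows])
```

The results table stays unfiltered, so pagination is unaffected, while the plot always resolves to a single version.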

@AlxEnashi AlxEnashi removed their assignment Aug 9, 2024
@AlxEnashi AlxEnashi moved this from 🏗 In progress to 🆕 New in ALeRCE Project Aug 9, 2024