[BUG] Validating multi-label text classification records makes new copies of them instead of updating #3265
Comments
@marcelbusch do you know for sure that you are the only one working on this? What version of the server and Python package are you using? I think it might be good to evaluate the query syntax w.r.t. the query behaviour. Are you using it correctly?
@marcelbusch as described in our query syntax overview, you would need to query a bit differently in that case: "nurnberg AND demonstration AND wegstrecke" should be what you intend to achieve, correct? W.r.t. the duplicated records, is that still happening? Could you show me a video?
No, when I connect them with AND it's still the same behaviour... Yes that's still happening, here's a video: argilla-duplication.mp4
Thanks @marcelbusch, this really helps us.
Hi @marcelbusch, we cannot reproduce this behaviour. Could you provide us with more context w.r.t. your data? Also, could you run
Thanks for your feedback @marcelbusch! Is it also happening when you validate a single record (without the bulk validation action)? Can you also share the extra record info (you can access it by clicking the
I took a look in my debugger at this point: updateDatasetRecords. In the record that gets passed in there when I click on a label, the id is incorrect. In fact, it seems to be rounded to the nearest 100: the record has an initial id of 1478720431326732291, while the record that gets passed into the updateDatasetRecords function has the id 1478720431326732300.
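The rounding reported above is consistent with IEEE 754 double precision, which is the format behind JavaScript's `Number`. A minimal sketch in Python, whose `float` uses the same 64-bit format, assuming the id passes through a double somewhere in the UI (the variable names are illustrative only):

```python
# Python's float is an IEEE 754 double, the same format as JavaScript's Number.
MAX_SAFE_INTEGER = 2**53 - 1  # value of JavaScript's Number.MAX_SAFE_INTEGER

original_id = 1478720431326732291  # id from the report above
as_double = float(original_id)     # what a double-based parser would store
round_tripped = int(as_double)

print(original_id > MAX_SAFE_INTEGER)  # True: the id is outside the safe range
print(round_tripped)                   # 1478720431326732288
print(round_tripped == original_id)    # False: the low digits are lost

# The id seen in the debugger maps to the same double, so the two
# values cannot be told apart once they are stored as a Number.
print(float(1478720431326732300) == as_double)  # True
```

At this magnitude adjacent doubles are 256 apart, and JavaScript prints the shortest decimal that still identifies the stored double, which would explain why the id looks rounded to the nearest 100.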
Thanks for all this info @marcelbusch. It's really helpful. We'll work on fixing that. As a temporary workaround, I think you can set the record id as a string instead of a number. This should avoid this problem.
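A minimal sketch of that workaround, assuming records are held as plain dictionaries before they are logged. The helper `with_string_id` and the record layout are hypothetical and not part of the Argilla client:

```python
import json


def with_string_id(record: dict) -> dict:
    """Return a copy of the record whose id, if present, is a string."""
    fixed = dict(record)
    if fixed.get("id") is not None:
        fixed["id"] = str(fixed["id"])
    return fixed


# Hypothetical record layout, used only for illustration.
record = {"id": 1478720431326732291, "text": "example"}
fixed = with_string_id(record)

# A string id is serialized with quotes, so a JSON parser in the browser
# keeps every digit instead of converting the value to a double.
print(json.dumps(fixed))
```

The original dictionary is left untouched, so the conversion can be applied just before the records are sent to the server.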
I've been doing some tests and it looks like it is a JavaScript limit. Using curl:
curl -X 'POST' \
"http://localhost:6900/api/datasets/test-dataset/TextClassification:search?include_metrics=false&workspace=argilla&limit=50&from=0" \
-H 'accept: application/json' \
-H 'X-Argilla-Api-Key: argilla.apikey' \
-H 'Content-Type: application/json' \
-d '{}'
{"total":1,"records":[{"id":10805720881385292014,"status":"Default","metrics":{},"last_updated":"2023-06-30T13:23:45.462574","inputs":{"additionalProp1":"string","additionalProp3":"string","additionalProp2":"string"},"multi_label":true}],"aggregations":{"predicted_as":{},"annotated_as":{},"annotated_by":{},"predicted_by":{},"status":{"Default":1},"predicted":{},"score":{},"words":{"string":1},"metadata":{}}}
@leiyre @damianpumar @keithCuniah Any ideas?
# Description

This PR deprecates integer support for record ids. It also warns if big integers are used as record ids.

Refs #3265

**Type of change**

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

**How Has This Been Tested**

After the changes, the issue cannot be reproduced anymore. Tested in dev environments.

**Checklist**

- [ ] I added relevant documentation
- [X] follows the style guidelines of this project
- [X] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [X] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [X] I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
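The kind of check the PR describes could look roughly like the sketch below. This is not the PR's actual code: the function name `check_record_id`, the warning categories, and the message wording are all assumptions made for illustration.

```python
import warnings

MAX_SAFE_INTEGER = 2**53 - 1  # value of JavaScript's Number.MAX_SAFE_INTEGER


def check_record_id(record_id):
    """Warn about integer ids (hypothetical helper, not Argilla's API)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(record_id, int) and not isinstance(record_id, bool):
        warnings.warn(
            "Integer record ids are deprecated; use strings instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if abs(record_id) > MAX_SAFE_INTEGER:
            warnings.warn(
                f"Record id {record_id} is too large to be represented "
                "exactly as a JavaScript number.",
                UserWarning,
                stacklevel=2,
            )
    return record_id


# The id from the curl response above triggers both warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_record_id(10805720881385292014)
    print([w.category.__name__ for w in caught])
```

String ids pass through without any warning, which matches the workaround suggested earlier in the thread.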
Describe the bug
In multi-label text classification, whenever I annotate records, they get copied to a validated version, but the original record stays the same. So I have a dataset with 1000 records, and after annotating 20 of them, I now have 1020 records in my dataset: the 1000 records with status "default" and the 20 annotated records. Is this expected behaviour? This way I can't filter for records I haven't annotated yet, because when I filter for status:default I still get all 1000 records.
I run argilla and elasticsearch with the supplied docker-compose file without any modifications.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect the original records to change status, so after annotating 20 records I should still have 1000 records in my dataset, now 980 with status "default" and 20 with status "validated"
Environment (please complete the following information):
Edit: Update on some more behaviour: