-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
audformat.Table.update() too restrictive #61
Comments
There is also no easy workaround as we cannot simply replace the new scheme with the original one: db_new.schemes['s'] = db.schemes['s']
db.update(db_new) fails with
|
So what you suggest is that we do not raise an error as long as the scheme in |
Yes, I don't see a reason to raise an error as long as the new scheme is a subset of the old one. |
Is this the only corner case you see? |
If you disagree it would be nice to show what we can do instead ;) |
Don't know yet, I guess we will discover more when using it. |
No that does not work, since this will not update the |
We also might have the case where the new scheme contains a subset, but adds a new entry as well as we now have a new person. How would you handle such a case? |
Good question. Generally, we have to be careful. Otherwise it can happen that you merge schemes / columns that are completely unrelated. The current approach might be restrictive, but it is also secure :) |
Another problem is that a scheme might be referenced by several columns and updating one column can have unwanted side-effects on other columns. |
Maybe you are right and we should not change the current behavior, but then we need some explanation what to do instead. E.g. for me it's not clear how I can manage to join the two databases at the moment. |
I guess you can rename the scheme and column in the new database before merging. But that's probably not what you want :) Note that currently the update will also fail if you have the same scheme with a different description. |
That's fine with me and would be what I expect. |
The following workaround let's you update the database I provided in my example: db_new.schemes['s'] = db.schemes['s']
db_new['t'].df['s'] = db_new['t'].df['s'].astype(db['t'].df['s'].dtype)
db.update(db_new)
db['t'].df results in
So it's doable, but it doesn't look like we should expect from a user to do it manually as this also adds a burden when databases should be updated and published automatically. |
Next we investigate what happening if the new database has an overlap with the old scheme labels, but adds a new one as well: import audformat.testing
db = audformat.testing.create_db(minimal=True)
db.schemes['s'] = audformat.Scheme(str, labels=['1', '2'])
db['t'] = audformat.Table(audformat.filewise_index(['a', 'b']))
db['t']['s'] = audformat.Column(scheme_id='s')
db['t']['s'].set(['1', '2'])
db_new = audformat.testing.create_db(minimal=True)
db_new.schemes['s'] = audformat.Scheme(str, labels=['1', '3'])
db_new['t'] = audformat.Table(audformat.filewise_index(['c', 'd']))
db_new['t']['s'] = audformat.Column(scheme_id='s')
db_new['t']['s'].set(['1', '3']) |
Here we can update with: import pandas as pd
common_schemes = list(set(db.schemes['s'].labels + db_new.schemes['s'].labels))
db.schemes['s'] = audformat.Scheme(str, labels=common_schemes)
db_new.schemes['s'] = audformat.Scheme(str, labels=common_schemes)
dtype = pd.api.types.CategoricalDtype(categories=common_schemes, ordered=False)
db['t'].df['s'] = db['t'].df['s'].astype(dtype)
db_new['t'].df['s'] = db_new['t'].df['s'].astype(dtype)
db.update(db_new)
db['t'].df will result in
|
There is also an easy solution to the discussed problem, but I'm not sure if you can use this in all cases. import audformat.testing
db = audformat.testing.create_db(minimal=True)
db.schemes['s'] = audformat.Scheme('str')
db['t'] = audformat.Table(audformat.filewise_index(['a', 'b']))
db['t']['s'] = audformat.Column(scheme_id='s')
db['t']['s'].set(['1', '2'])
db_new = audformat.testing.create_db(minimal=True)
db_new.schemes['s'] = audformat.Scheme('str')
db_new['t'] = audformat.Table(audformat.filewise_index(['c']))
db_new['t']['s'] = audformat.Column(scheme_id='s')
db_new['t']['s'].set(['1'])
db.update(db_new) works without any errors. The problem is that you might want to use labels to store extra information, e.g. age of a speaker. In the proposed solution above this would no longer be possible and you would have to add the extra information as columns. |
We now have the Example 1 db = audformat.testing.create_db(minimal=True)
db.schemes['s'] = audformat.Scheme(str, labels=['1', '2'])
db['t'] = audformat.Table(audformat.filewise_index(['a', 'b']))
db['t']['s'] = audformat.Column(scheme_id='s')
db['t']['s'].set(['1', '2'])
db_new = audformat.testing.create_db(minimal=True)
db_new.schemes['s'] = audformat.Scheme(str, labels=['1'])
db_new['t'] = audformat.Table(audformat.filewise_index(['c']))
db_new['t']['s'] = audformat.Column(scheme_id='s')
db_new['t']['s'].set(['1']) to update we can do: db_new.schemes['s'].replace_labels(db.schemes['s'].labels)
db.update(db_new) Example 2 db = audformat.testing.create_db(minimal=True)
db.schemes['s'] = audformat.Scheme(str, labels=['1', '2'])
db['t'] = audformat.Table(audformat.filewise_index(['a', 'b']))
db['t']['s'] = audformat.Column(scheme_id='s')
db['t']['s'].set(['1', '2'])
db_new = audformat.testing.create_db(minimal=True)
db_new.schemes['s'] = audformat.Scheme(str, labels=['1', '3'])
db_new['t'] = audformat.Table(audformat.filewise_index(['c', 'd']))
db_new['t']['s'] = audformat.Column(scheme_id='s')
db_new['t']['s'].set(['1', '3']) to update we can do: common_labels = list(set(db.schemes['s'].labels + db_new.schemes['s'].labels))
db.schemes['s'].replace_labels(common_labels)
db_new.schemes['s'].replace_labels(common_labels)
db.update(db_new) |
The solution also works with dictionary as labels, e.g. Example 3 db = audformat.testing.create_db(minimal=True)
db.schemes['s'] = audformat.Scheme(str, labels={'1': 'speaker1', '2': 'speaker2'})
db['t'] = audformat.Table(audformat.filewise_index(['a', 'b']))
db['t']['s'] = audformat.Column(scheme_id='s')
db['t']['s'].set(['1', '2'])
db_new = audformat.testing.create_db(minimal=True)
db_new.schemes['s'] = audformat.Scheme(str, labels={'1': 'speaker1', '3': 'speaker3'})
db_new['t'] = audformat.Table(audformat.filewise_index(['c', 'd']))
db_new['t']['s'] = audformat.Column(scheme_id='s')
db_new['t']['s'].set(['1', '3']) to update we can do: common_labels = {**db.schemes['s'].labels, **db_new.schemes['s'].labels}
db.schemes['s'].replace_labels(common_labels)
db_new.schemes['s'].replace_labels(common_labels)
db.update(db_new) |
I just realized that we have functions in audformat.utils.join_schemes([db, db_new], 's')
db.update(db_new) which is straightforward, you just need to find it. |
At the moment we require that the scheme of a table that is use to update an existing one has to match, but there are certain scenarios where it makes totally sense that it is only included, but does not match.
E.g. consider the following scenario that matches speaker ID labels:
this fails with
The text was updated successfully, but these errors were encountered: