Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ticket/2454/rename/vbn/fields #2462

Merged
merged 10 commits into from
Jun 7, 2022
Merged

Conversation

danielsf
Copy link
Contributor

@danielsf danielsf commented Jun 3, 2022

This PR renames some fields in the NWB and metadata tables to bring the two datastreams into alignment.

It also adds the "strip-substructure" functionality to the "structure_acronym" column in the metadata tables.

@danielsf danielsf force-pushed the ticket/2454/rename/vbn/fields branch from 4e76aae to e1adc38 Compare June 6, 2022 20:31
@danielsf danielsf force-pushed the ticket/2454/rename/vbn/fields branch from e1adc38 to ad14c9f Compare June 6, 2022 20:33
(necessary to maintain backwards combatibility with VCN)
@danielsf danielsf force-pushed the ticket/2454/rename/vbn/fields branch from 1cc4625 to bbb2e7e Compare June 6, 2022 22:54
Copy link
Contributor

@aamster aamster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with a few minor comments

any other float will provoke an error.
"""

if isinstance(acronym, numbers.Number):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the outer if can be safely removed since you are just checking if it is nan here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, numpy doesn't like running np.isnan on non-numbers

>>> import numpy as np
>>> np.isnan('abcd')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
>>> 

new_acronym = set()

for el in acronym:
if isinstance(el, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this part can recursively call strip_substructure_acronym and add the results to a list if you feel comfortable writing that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; done.

@@ -526,3 +529,38 @@ def _get_session_duration_from_behavior_session_ids(
index=pd.Int64Index([x['behavior_session_id'] for x in durations],
name='behavior_session_id'))
return durations


def strip_substructure_acronym_df(
Copy link
Contributor

@aamster aamster Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this function is needed. Can you not just do

df[col_name] = df[col_name].apply(strip_substructure_acronym)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can; I just don't know that much pandas. I've made the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that apply is not an inplace operation. ie

units_table['structure_acronym'].apply(strip_substructure_acronym)

won't have any effect. It needs to be assigned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is why I hate pandas ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just running apply results in a series....

>>> import pandas as pd
>>> data = [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]
>>> df = pd.DataFrame(data=data)
>>> def fn(x):
...     return x**2
... 
>>> 
>>> new_df = df['a'].apply(fn)
>>> new_df
0    1
1    4
Name: a, dtype: int64

I'm just going to go back to my "implement a function to munge the structure acronym column." It was tested.

I really do not like pandas

Copy link
Contributor

@aamster aamster Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to assign the output to a column, not to the dataframe, i.e

units_table['structure_acronym'] = units_table['structure_acronym'].apply(strip_substructure_acronym)

@danielsf danielsf force-pushed the ticket/2454/rename/vbn/fields branch 2 times, most recently from 603c881 to 7c5a68f Compare June 7, 2022 17:28
@danielsf danielsf force-pushed the ticket/2454/rename/vbn/fields branch from 7c5a68f to 1a7f9ca Compare June 7, 2022 20:23
@danielsf danielsf merged commit 92cd3a0 into vbn_2022_dev Jun 7, 2022
@danielsf danielsf mentioned this pull request Jun 7, 2022
3 tasks
@danielsf danielsf deleted the ticket/2454/rename/vbn/fields branch June 8, 2022 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants