Input dataset format #12
Comments
No problem at all.

import pandas as pd

# Original data
transactions = [('eggs', 'bacon', 'soup'),
                ('eggs', 'bacon', 'apple'),
                ('soup', 'bacon', 'banana')]

# Convert to pandas.DataFrame
df = pd.DataFrame(transactions)

# Convert back to a list of tuples
transactions_from_df = [tuple(row) for row in df.values.tolist()]

# They are equal, so this assertion passes
assert transactions == transactions_from_df

A list of lists will also work; it doesn't have to be a list of tuples.
Thank you @tommyod, this looks great. How would you suggest dealing with NaN values? When feeding my df directly to apriori() I get an error. I can use your code above to transform it into a list, but my data has a couple of huge baskets, which leads to many 'nan' values in the lists. Will these have an adverse effect on the results?
NaN likely represents nothing, so convert
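For anyone reading along, here is a minimal sketch of one way to drop the padding values before mining. It is not necessarily what anyone in this thread used. It assumes the rows come from something like `df.values.tolist()` and that missing cells show up as `None` or a float NaN. The helper names `is_missing` and `drop_missing` are made up for illustration.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (assumed padding values)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(rows):
    """Turn each row into a tuple holding only its real items."""
    return [tuple(item for item in row if not is_missing(item)) for row in rows]


nan = float("nan")

# Rows of equal length, padded the way a DataFrame would pad short baskets
rows = [["eggs", "bacon", "soup"],
        ["eggs", nan, nan],
        ["soup", "bacon", None]]

transactions = drop_missing(rows)
print(transactions)
```

The resulting tuples have varying lengths, which matches the list-of-tuples format shown earlier in the thread. If the NaN values have already been turned into the string 'nan', the check would need to be extended to cover that case.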
Cool, thank you - should this help anyone else in the future, here is the method I used to remove nans from lists of varying sizes:
Hi there - love your work on this package! I have a question regarding input datasets, in your example this is a list of tuples, but is it possible to work with dataframes too? What are the restrictions around input data?
Many thanks,
Ben