Add new function to remove duplicate rows from a DataFrame #319

wesm · 2011-11-01T23:20:43Z

Should be reasonably performant; can probably just use sets + np.apply_along_axis.

seanjtaylor · 2011-11-01T23:34:22Z

Not sure if this should be 1) a dedicated DataFrame method, 2) a method on DataFrameGroupBy, or 3) a call to aggregate with a specific function.

# 1
df = DataFrame()
new_df = df.remove_duplicates(('key1', 'key2'))

# 2: take only the first row for each group, and make sure all other rows with this key have the same values
new_df = df.groupby(('key1', 'key2')).first(check_identical=True)

# 3: first_row is some function that takes the first row for each group
new_df = df.groupby(('key1', 'key2')).aggregate(first_row)
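The "check identical" idea in option 2 can be sketched with existing groupby tools. This is only an illustration under stated assumptions: `first_checked` is a hypothetical helper name, not a pandas method, and `check_identical` is not an actual argument of `first`. The sketch counts distinct values per group and raises if any non-key column disagrees within a group.

```python
import pandas as pd


def first_checked(df, keys):
    """Hypothetical helper: keep the first row per key combination,
    raising if rows sharing a key differ in any non-key column."""
    value_cols = [c for c in df.columns if c not in keys]
    if value_cols:
        # Number of distinct values per group and column (NaN counted as a value)
        distinct = df.groupby(keys)[value_cols].nunique(dropna=False)
        if (distinct > 1).any().any():
            raise ValueError("rows sharing a key have different values")
    # head(1) keeps the first row of each group, in original order and index
    return df.groupby(keys, sort=False).head(1)


keys = ["key1", "key2"]
clean = pd.DataFrame(
    {"key1": ["a", "a", "b"], "key2": [1, 1, 2], "value": [10, 10, 20]}
)
deduped = first_checked(clean, keys)
```

With conflicting rows (for example, two rows keyed `('a', 1)` whose `value` differs), the helper raises `ValueError` instead of silently keeping the first one.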

wesm · 2011-11-01T23:44:26Z

Here's an easy and fast implementation using existing tools:

grouped = df.groupby(keys)
index = [gp_keys[0] for gp_keys in grouped.groups.values()]
new_df = df.reindex(index)

It may be possible to go even faster; I'll play around with it.
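For readers who want to run the groupby approach above, here is a minimal self-contained sketch. The sample data and the column names `key1`, `key2`, and `value` are assumptions made for illustration, and `df.loc` with sorted labels is used in place of `reindex` so the result keeps the original row order.

```python
import pandas as pd

# Illustrative data (an assumption, not from the thread)
df = pd.DataFrame(
    {
        "key1": ["a", "a", "b", "b", "a"],
        "key2": [1, 1, 2, 3, 1],
        "value": [10, 11, 20, 30, 12],
    }
)
keys = ["key1", "key2"]

# groups maps each key combination to the index labels of its rows
grouped = df.groupby(keys)
first_labels = sorted(labels[0] for labels in grouped.groups.values())

# Select the first row seen for each key combination
new_df = df.loc[first_labels]
```

On this data the result should match `df.drop_duplicates(subset=keys)`, the method added by the commit referenced later in this thread.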

seanjtaylor · 2011-11-02T19:46:18Z

This is exactly what I was looking for. And I learned something useful about pandas internals today.

wesm added a commit that referenced this issue Nov 7, 2011

ENH: DataFrame.drop_duplicates and DataFrame.duplicated to remove duplicate rows, GH #319

b95c905

wesm closed this as completed Nov 7, 2011

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019

Support for audit logs in ChunkStore (pandas-dev#319)

c5951f0
