
Feature Request: regex for drop #4818

Closed
jseabold opened this issue Sep 11, 2013 · 19 comments
Labels: API Design · Closing Candidate (may be closeable, needs more eyeballs) · Enhancement

Comments

@jseabold (Contributor)

Don't have time to implement this, but I wanted to float the idea and park it. It's pretty trivial and you can achieve the same thing with filter, but it might be nice if drop had a regex keyword. E.g., these would be equivalent

df = df.filter(regex="^(?!var_start)")
df = df.drop(regex="^var_start", axis=1)
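A quick sketch of the requested equivalence, using made-up column names (the `var_start` prefix from the example above) since `drop` has no `regex` keyword to demonstrate:

```python
import pandas as pd

# Toy frame, not from the issue, just to illustrate the two spellings.
df = pd.DataFrame({"var_start_a": [1], "var_start_b": [2], "other": [3]})

# What works today: keep columns NOT matching the prefix via a negative lookahead.
kept = df.filter(regex="^(?!var_start)")

# What drop(regex=...) would do, spelled out by hand.
to_drop = [c for c in df.columns if c.startswith("var_start")]
dropped = df.drop(to_drop, axis=1)

assert list(kept.columns) == ["other"]
assert list(dropped.columns) == ["other"]
```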
@nehalecky (Contributor)

Nice. Perhaps on a more fundamental level, we could simply expose the Series.str methods to all pandas index classes and allow for pattern searches across labels that way? I tried explaining (rather poorly) use cases for this type of functionality a while back, don't really know how well it came across:
#2922 (comment)

@jtratner (Contributor)

We've talked a few times about moving min, max, & friends to a mixin so they can be used for Index as well as Series, etc. We could try to do the same thing for str too.

@jseabold (Contributor, Author)

I'm often doing things like

var_names = df.filter(regex="pat").columns.tolist()

Would be great if I could just do df.columns.select("pat").tolist() or something.
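For the record, later pandas versions did grow a `.str` accessor on `Index`, which gets close to the `df.columns.select("pat")` wished for above (column names below are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame(columns=["pat_a", "pat_b", "other"])

# Label pattern searches without the filter round-trip:
var_names = df.columns[df.columns.str.contains("pat")].tolist()
assert var_names == ["pat_a", "pat_b"]
```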

@jreback (Contributor)

jreback commented Sep 21, 2013

fyi...you don't need the tolist usually, as the returned index is already pretty list-like

@jreback (Contributor)

jreback commented Sep 21, 2013

wouldn't df.columns.filter('pat') be better?

@jseabold (Contributor, Author)

Yeah, sure. Except regex isn't the default arg for filter (unfortunately). It's what I want 95% of the time.

Also, not list-like enough for me

["a"] + pd.Index(["b", "c"])

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 11, 2014
@jreback (Contributor)

jreback commented Mar 11, 2014

@jseabold

I think it's trivial to allow filter to work on a frame (on the specified axis, defaulting to 1, e.g. columns) with a regex if it is passed directly, e.g.

df.filter('a_reg_ex'), while if it's list-like it will match exactly, e.g. df.filter(['A','B'])

Is there a reason we didn't do this before? (This is not even a big API change and is backwards compatible.)

At the same time, is there a reason at all for select? (which filter actually uses, but could be folded in)

@hayd @jorisvandenbossche

@hayd (Contributor)

hayd commented Mar 11, 2014

It is a slight API change, as atm you can do (though I guess this is break-able):

In [11]: df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

In [12]: df.filter('AB')  # equivalent to ['A', 'B']
Out[12]:
   A  B
0  1  2
1  1  3
2  5  6

In [13]: df.filter(regex='AB')  # works
Out[13]:
Empty DataFrame
Columns: []
Index: [0, 1, 2]

I had no idea what the like arg did without checking the source; it's basically just a subset of regex (in the spirit of SQL's LIKE)...

👍 on taking a string regex, a list-like, or a crit (a la select), and maybe deprecating the other args. I reckon alias select to filter and deprecate it (but not remove it). Happy to take this and do drop at the same time.
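The "like is a subset of regex" point above can be checked directly; `like=` is plain substring matching on the labels, equivalent to an unanchored regex (toy columns for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["AB", "BC", "CD"])

# Substring match vs. the equivalent unanchored regex:
by_like = list(df.filter(like="B").columns)
by_regex = list(df.filter(regex="B").columns)

assert by_like == ["AB", "BC"]
assert by_regex == ["AB", "BC"]
```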

@jreback (Contributor)

jreback commented Mar 11, 2014

Go for it!

@hayd (Contributor)

hayd commented Mar 11, 2014

Ah, like is subtly different in that it converts everything to string first (regex raises horribly if there are any non-strings)... e.g. if you have df.columns = [0, 'a'], should regex='0' find the first col? (Basically, should we convert to string?) Should there be a way to choose not to grab ints?

@hayd (Contributor)

hayd commented Mar 11, 2014

Also, select says above it "# TODO: Check if this was clearer in 0.12"...

@hayd (Contributor)

hayd commented Mar 11, 2014

This function can be significantly simplified, and also work with dupe colnames.

Any thoughts on a good argname (items is not that great)?

    def filter(self, items, axis=None, **kwargs):
        """
        Restrict the info axis to set of items or wildcard

        Parameters
        ----------
        items : Either function, regex or list-like
            Boolean function to be called on each index (label)
            Regular expression to be tested against each index
            List of info axis to restrict to

        axis : int

maybe labels (a la drop)

@jreback (Contributor)

jreback commented Mar 11, 2014

crit? matcher? selection?

@hayd (Contributor)

hayd commented Mar 11, 2014

I was thinking of unifying the args for filter and drop, so it perhaps makes sense to use drop's args. See PR.

Is there an argument to keep filter rather than select?

@jreback (Contributor)

jreback commented Mar 11, 2014

In theory one should apply to data (think query/select) and one to labels.

I think it's somewhat arbitrary, but filter already does labels.

@jreback (Contributor)

jreback commented Mar 11, 2014

Also, you may want to post on the mailing list to get some more feedback on this once you have a nice proposed API.

@jreback jreback modified the milestones: 0.14.1, 0.14.0 May 5, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 5, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014
@jreback jreback modified the milestones: 0.16, 0.15.0 Sep 14, 2014
@jreback jreback removed this from the 0.16 milestone Oct 7, 2014
@jreback jreback modified the milestones: 0.15.1, 0.16 Oct 7, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@tik0

tik0 commented Mar 13, 2019

I just want to leave a simple recipe here (almost a one-liner, if one merges everything together) for how one could do this. Drop all columns starting with "enc_":

import pandas as pd
import re

def drop_matching(df, regex):
    # collect the column labels that match the pattern...
    matches = [col for col in df.columns if re.search(regex, col)]
    # ...and drop them (renamed so as not to shadow the builtin filter)
    return df.drop(matches, axis=1)

d = {'enc_1': [1, 2], 'enc_2': [3, 4], 'dec_2': [3, 4]}
df = pd.DataFrame(data=d)
drop_matching(df, r'enc_.*')

@WillAyd (Member)

WillAyd commented Mar 14, 2019

@tik0 you can use a regex to do that. Something like:

df.filter(regex=r'^(?!enc_)')

@jreback this is a pretty old issue but I think it's duplicative of what's available already in filter. Any reason to keep this one open?
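For completeness, checking the negative-lookahead suggestion on the frame from the previous comment:

```python
import pandas as pd

df = pd.DataFrame({'enc_1': [1, 2], 'enc_2': [3, 4], 'dec_2': [3, 4]})

# ^(?!enc_) keeps every column whose name does NOT start with "enc_",
# which is the drop-by-regex behaviour requested in this issue.
result = df.filter(regex=r'^(?!enc_)')
assert list(result.columns) == ['dec_2']
```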

@rhshadrach rhshadrach added the Closing Candidate May be closeable, needs more eyeballs label Feb 12, 2021
@MarcoGorelli (Member)

Closing as there hasn't been any uptake here (and I agree with the assessment that df.filter(regex=r'^(?!enc_)') is simple enough), though please do ping if you have a use-case where that isn't practical.
