ENH: Implement MultiIndex.is_monotonic_decreasing #17455

Merged
merged 1 commit into from
Sep 19, 2017


jschendel
Member

Note that this is a dupe PR of #16573, which appears to have gone stale.

The implementation ended up being easier than what I suggested in the issue: an index is monotonic decreasing if its reverse is monotonic increasing, so it's just a matter of reversing the index and calling is_monotonic_increasing.
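The reversal idea can be sketched without pandas. This is only an illustration: plain tuples stand in for MultiIndex entries, and the two function names are hypothetical helpers, not the code merged in this PR.

```python
def is_monotonic_increasing(rows):
    """True if every row is >= the row before it (ties allowed)."""
    return all(a <= b for a, b in zip(rows, rows[1:]))


def is_monotonic_decreasing(rows):
    """A sequence is decreasing exactly when its reverse is increasing."""
    return is_monotonic_increasing(rows[::-1])


# Tuples compare lexicographically, like the rows of a MultiIndex.
print(is_monotonic_decreasing([(3, 3), (2, 2), (2, 2)]))  # True (dupes allowed)
print(is_monotonic_decreasing([(1, 2), (1, 3)]))          # False
print(is_monotonic_decreasing([]))                        # True (empty case)
```

Duplicates need no special handling here, because the check on the reversed rows is the ordinary "equal or increasing" test.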

Regarding MultiIndex tests:

  • Added a test case to the is_monotonic_increasing test to verify it's working for an empty index.
  • The is_monotonic_decreasing test cases consist of decreasing versions of the test cases in the is_monotonic_increasing test.

Regarding IntervalIndex tests:

  • Added a test for IntervalIndex.is_monotonic_increasing and IntervalIndex.is_monotonic_decreasing since they use the MultiIndex implementation under the hood.
  • Split a test that originally checked both monotonic and is_unique into separate tests. Expanded the test cases for both tests.

@jreback
Contributor

jreback commented Sep 7, 2017

lgtm. @TomAugspurger can you give a once over.

@TomAugspurger
Contributor

This looks fine, though I wonder if we can mirror the implementation of is_monotonic_increasing and avoid the overhead of reversing the index?

    @property
    def is_monotonic_decreasing(self):
        """
        return if the index is monotonic decreasing (only equal or
        decreasing) values.
        """
        values = [self._get_level_values(i).values
                  for i in reversed(range(len(self.levels)))]
        try:
            sort_order = np.lexsort(values)
            return Index(sort_order).is_monotonic_decreasing
        except TypeError:

            # we have mixed types and np.lexsort is not happy
            return Index(self.values).is_monotonic_decreasing

@jschendel does that pass the tests you wrote, or am I missing something?

@jreback
Contributor

jreback commented Sep 7, 2017

maybe make a helper function for this? (and just call in both)

e.g.

    def _is_monotonic_attribute(self, attr):
        values = [self._get_level_values(i).values
                  for i in reversed(range(len(self.levels)))]
        try:
            sort_order = np.lexsort(values)
            return getattr(Index(sort_order), attr)
        except TypeError:

            # we have mixed types and np.lexsort is not happy
            return getattr(Index(self.values), attr)

@TomAugspurger
Contributor

Yeah, it's borderline too much duplication to implement them both like that. I could go either way, but would probably have a helper.

@jschendel
Member Author

@TomAugspurger : That doesn't quite work if the MultiIndex has dupes, e.g. pd.MultiIndex.from_tuples([(3, 3), (2, 2), (2, 2)]). The issue is that np.lexsort is stable and keeps the dupes in their original order, so sort_order ends up being [1 2 0] in the case of my example, which is not monotonic decreasing even though the index is.
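The duplicate problem can be reproduced with NumPy alone. In this sketch the level arrays are written out by hand rather than taken from a MultiIndex, and np.diff stands in for the Index(sort_order).is_monotonic_decreasing check.

```python
import numpy as np

# Levels of the example index [(3, 3), (2, 2), (2, 2)], written out by hand.
level0 = np.array([3, 2, 2])
level1 = np.array([3, 2, 2])

# lexsort wants the most significant key last, hence level0 at the end.
sort_order = np.lexsort([level1, level0])
print(sort_order)  # [1 2 0]

# For a decreasing index we would need [2 1 0]; the stable sort left the
# two duplicate rows (positions 1 and 2) in their original order instead.
is_decreasing = bool(np.all(np.diff(sort_order) <= 0))
print(is_decreasing)  # False, although the index itself is decreasing
```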

The workaround I came up with was to add a fake unique decreasing level to values to force uniqueness, then use the approach you suggested. I actually implemented this when I was working on a solution before I realized I could reverse.

I did some ad hoc timings to decide which implementation to use, and the overhead of reversing the index was actually less than the overhead of adding an additional level to sort by in np.lexsort once the MI got sufficiently large (fairly significant when I tested it on an MI with ~12 million elements). Reversing was slower for smaller MI, but the gap wasn't terribly large. Didn't do extensive timing though.
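A rough harness for this kind of ad hoc comparison might look like the following. It is a sketch under heavy assumptions: it times only the NumPy-level work on hand-built level arrays, so it does not capture the cost of actually reversing a MultiIndex in pandas, and the function names and sizes are made up for illustration. No timing results are claimed here.

```python
import timeit

import numpy as np


def decreasing_by_reversing(levels):
    # Reverse every level; the rows are decreasing exactly when the
    # reversed rows sort (stably) into the identity order 0..n-1.
    keys = [lvl[::-1] for lvl in reversed(levels)]
    order = np.lexsort(keys)
    return bool(np.all(np.diff(order) >= 0))


def decreasing_by_tiebreak(levels):
    # Add a unique decreasing key as the least significant sort key.
    n = len(levels[0])
    keys = [np.arange(n - 1, -1, -1)] + list(reversed(levels))
    order = np.lexsort(keys)
    return bool(np.all(np.diff(order) <= 0))


def time_both(n, number=3):
    # Decreasing two-level data with plenty of duplicate rows.
    levels = [np.arange(n - 1, -1, -1) // 4, np.zeros(n, dtype=np.int64)]
    assert decreasing_by_reversing(levels) and decreasing_by_tiebreak(levels)
    return {
        "reversing": timeit.timeit(lambda: decreasing_by_reversing(levels), number=number),
        "tiebreak": timeit.timeit(lambda: decreasing_by_tiebreak(levels), number=number),
    }


print(time_both(10_000))
```

Running time_both over a range of sizes is one way to look for the crossover described above, though the numbers will depend on the machine and NumPy version.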

For reference, here's my implementation with the fake decreasing level to force uniqueness. Perhaps there are some optimizations I missed.

    def is_monotonic_decreasing(self):
        """
        return if the index is monotonic decreasing (only equal or
        decreasing) values.
        """

        # initialize values as decreasing range to force uniqueness, needed
        # since lexsort() only sorts ascending and want dupes in decreasing order
        values = [np.arange(len(self) - 1, -1, -1)]

        # reversed() because lexsort() wants the most significant key last.
        values += [self._get_level_values(i).values
                   for i in reversed(range(len(self.levels)))]
        try:
            sort_order = np.lexsort(values)
            return Index(sort_order).is_monotonic_decreasing
        except TypeError:

            # we have mixed types and np.lexsort is not happy
            return Index(self.values).is_monotonic_decreasing
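To see the tie-breaker at work outside pandas, here is a standalone NumPy sketch of the same idea. The function name is hypothetical, the levels are passed as plain arrays (most significant first), and np.diff replaces the Index(...).is_monotonic_decreasing check.

```python
import numpy as np


def is_decreasing_with_tiebreak(levels):
    """levels: list of equal-length arrays, most significant level first."""
    n = len(levels[0])
    # Least significant key: a unique decreasing range, so that duplicate
    # rows are ordered last-position-first by the ascending sort.
    keys = [np.arange(n - 1, -1, -1)]
    # lexsort wants the most significant key last.
    keys += list(reversed(levels))
    sort_order = np.lexsort(keys)
    return bool(np.all(np.diff(sort_order) <= 0))


# [(3, 3), (2, 2), (2, 2)]: sort_order is now [2 1 0], so this is True.
print(is_decreasing_with_tiebreak([np.array([3, 2, 2]), np.array([3, 2, 2])]))

# [(2, 2), (3, 3), (2, 2)] is not decreasing: sort_order is [2 0 1].
print(is_decreasing_with_tiebreak([np.array([2, 3, 2]), np.array([2, 3, 2])]))
```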

I guess we could potentially use some type of heuristic based on size to choose between which method to use (similar to how isin is implemented iirc?) but that seems like it might just be unnecessary complexity.

@jreback
Contributor

jreback commented Sep 7, 2017

@jschendel can you post an asv of the is_monotonic for various index types (maybe against 0.20.3)

@jschendel
Member Author

@jreback : I haven't used asv before but it doesn't look too hard to use, so I should be able to get some benchmarks.

I'm a little confused as to what you're looking for? I'm not changing the is_monotonic or is_monotonic_increasing methods (dupes are handled fine in the increasing case, it's just decreasing where they cause an issue). Am I missing something?

Are you looking for an asv of is_monotonic and the two is_monotonic_decreasing methods I described (reversing and adding a fake level for uniqueness)? Then compare the two is_monotonic_decreasing methods against each other to see which is better, and both against is_monotonic to see how much of a perf hit is taken by having to account for decreasing?

@jreback
Contributor

jreback commented Sep 7, 2017

@jschendel

I think we already have some asv's for is_monotonic and friends, just want to be sure we have coverage for MultiIndex.

@jreback
Contributor

jreback commented Sep 17, 2017

@jschendel I think this was fine, can you rebase

@jschendel
Member Author

@jreback : rebased

Regarding the asv's for is_monotonic: there is one for MultiIndex.is_monotonic. Looks like it's checking the non-mixed case (i.e. the try block). Couldn't find any other asv's for monotonic at all, regardless of index type. I might not be looking in the right place though?
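For context, an asv benchmark covering both properties could be shaped roughly like this. It is a hypothetical sketch: the class name, index sizes, and the choice of number = 1 are assumptions, and it is not the benchmark that exists in the pandas repository.

```python
import pandas as pd


class MultiIndexMonotonic:
    # Hypothetical benchmark class; asv times methods whose names start
    # with "time_". The properties may be cached on the index, so time a
    # single call per setup() rather than repeated calls on one object.
    number = 1

    def setup(self):
        # Assumed sizes: a 100 x 100 product index and its reverse.
        self.mi_inc = pd.MultiIndex.from_product([range(100), range(100)])
        self.mi_dec = self.mi_inc[::-1]

    def time_is_monotonic_increasing(self):
        self.mi_inc.is_monotonic_increasing

    def time_is_monotonic_decreasing(self):
        self.mi_dec.is_monotonic_decreasing
```

A mixed-type index would be needed in addition to exercise the except TypeError fallback rather than only the try block.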

@jreback
Contributor

jreback commented Sep 18, 2017

That looks like the place; I think we should have asvs for these properties. Can do that later if you want.

This PR lgtm. Ping on green.

Implemented MultiIndex.is_monotonic_decreasing, and added associated tests.  Also added tests for IntervalIndex.is_monotonic_decreasing, as it uses MultiIndex under the hood.
@codecov

codecov bot commented Sep 19, 2017

Codecov Report

Merging #17455 into master will decrease coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #17455      +/-   ##
==========================================
- Coverage   91.22%    91.2%   -0.02%     
==========================================
  Files         163      163              
  Lines       49625    49625              
==========================================
- Hits        45270    45261       -9     
- Misses       4355     4364       +9

Flag         Coverage Δ
#multiple    88.99% <100%> (ø) ⬆️
#single      40.19% <50%> (-0.07%) ⬇️

Impacted Files                  Coverage Δ
pandas/core/indexes/multi.py    96.9% <100%> (ø) ⬆️
pandas/io/gbq.py                25% <0%> (-58.34%) ⬇️
pandas/core/frame.py            97.77% <0%> (-0.1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jschendel
Member Author

ping @jreback : green

@TomAugspurger TomAugspurger merged commit 21a3800 into pandas-dev:master Sep 19, 2017
@TomAugspurger
Contributor

Thanks @jschendel!

@jschendel jschendel deleted the mi-is_mono_dec branch September 19, 2017 22:37
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Successfully merging this pull request may close these issues.

MultiIndex is_monotonic_decreasing is incorrect