A mutable, self-balancing interval tree for Python 2 and 3. Queries may be by point, by range overlap, or by range envelopment.
This library was designed to allow tagging text and time intervals, where the intervals include the lower bound but not the upper bound.
pip install intervaltree
- Supports Python 2.7 and Python 3.4+ (Tested under 2.7, and 3.4 thru 3.6)
- Initializing
- blank
tree = IntervalTree()
- from an iterable of
Interval
objects (tree = IntervalTree(intervals)
) - from an iterable of tuples
(
tree = IntervalTree.from_tuples(interval_tuples)
)
- blank
- Insertions
tree[begin:end] = data
tree.add(interval)
tree.addi(begin, end, data)
- Deletions
tree.remove(interval)
(raisesValueError
if not present)tree.discard(interval)
(quiet if not present)tree.removei(begin, end, data)
(short fortree.remove(Interval(begin, end, data))
)tree.discardi(begin, end, data)
(short fortree.discard(Interval(begin, end, data))
)tree.remove_overlap(point)
tree.remove_overlap(begin, end)
(removes all overlapping the range)tree.remove_envelop(begin, end)
(removes all enveloped in the range)
- Overlap queries
tree[point]
tree[begin:end]
tree.search(point)
tree.search(begin, end)
- Envelop queries
tree.search(begin, end, strict=True)
- Membership queries
interval_obj in tree
(this is fastest, O(1))tree.containsi(begin, end, data)
tree.overlaps(point)
tree.overlaps(begin, end)
- Iterable
for interval_obj in tree:
tree.items()
- Sizing
len(tree)
tree.is_empty()
not tree
tree.begin()
(thebegin
coordinate of the leftmost interval)tree.end()
(theend
coordinate of the rightmost interval)
- Set-like operations
- union
result_tree = tree.union(iterable)
result_tree = tree1 | tree2
tree.update(iterable)
tree |= other_tree
- difference
result_tree = tree.difference(iterable)
result_tree = tree1 - tree2
tree.difference_update(iterable)
tree -= other_tree
- intersection
result_tree = tree.intersection(iterable)
result_tree = tree1 & tree2
tree.intersection_update(iterable)
tree &= other_tree
- symmetric difference
result_tree = tree.symmetric_difference(iterable)
result_tree = tree1 ^ tree2
tree.symmetric_difference_update(iterable)
tree ^= other_tree
- comparison
tree1.issubset(tree2)
ortree1 <= tree2
tree1 <= tree2
tree1.issuperset(tree2)
ortree1 > tree2
tree1 >= tree2
tree1 == tree2
- union
- Restructuring
chop(begin, end)
(slice intervals and remove everything betweenbegin
andend
, optionally modifying the data fields of the chopped-up intervals)slice(point)
(slice intervals atpoint
)split_overlaps()
(slice at all interval boundaries, optionally modifying the data field)merge_overlaps()
(joins overlapping intervals into a single interval, optionally merging the data fields)merge_equals()
(joins intervals with matching ranges into a single interval, optionally merging the data fields)
- Copying and typecasting
IntervalTree(tree)
(Interval
objects are same as those in tree)tree.copy()
(Interval
objects are shallow copies of those in tree)set(tree)
(can later be fed intoIntervalTree()
)list(tree)
(ditto)
- Pickle-friendly
- Automatic AVL balancing
Getting started
>>> from intervaltree import Interval, IntervalTree >>> t = IntervalTree() >>> t IntervalTree()
Adding intervals - any object works!
>>> t[1:2] = "1-2" >>> t[4:7] = (4, 7) >>> t[5:9] = {5: 9}
Query by point
The result of a query is a
set
object, so if ordering is important, you must sort it first.>>> sorted(t[6]) [Interval(4, 7, (4, 7)), Interval(5, 9, {5: 9})] >>> sorted(t[6])[0] Interval(4, 7, (4, 7))
Query by range
Note that ranges are inclusive of the lower limit, but non-inclusive of the upper limit. So:
>>> sorted(t[2:4]) []
But:
>>> sorted(t[1:5]) [Interval(1, 2, '1-2'), Interval(4, 7, (4, 7))]
Accessing an
Interval
object>>> iv = Interval(4, 7, (4, 7)) >>> iv.begin 4 >>> iv.end 7 >>> iv.data (4, 7) >>> begin, end, data = iv >>> begin 4 >>> end 7 >>> data (4, 7)
Constructing from lists of intervals
We could have made a similar tree this way:
>>> ivs = [(1, 2), (4, 7), (5, 9)] >>> t = IntervalTree( ... Interval(begin, end, "%d-%d" % (begin, end)) for begin, end in ivs ... )
Or, if we don't need the data fields:
>>> t2 = IntervalTree(Interval(*iv) for iv in ivs)
Or even:
>>> t2 = IntervalTree.from_tuples(ivs)
Removing intervals
>>> t.remove( Interval(1, 2, "1-2") ) >>> sorted(t) [Interval(4, 7, '4-7'), Interval(5, 9, '5-9')] >>> t.remove( Interval(500, 1000, "Doesn't exist")) # raises ValueError Traceback (most recent call last): ValueError >>> t.discard(Interval(500, 1000, "Doesn't exist")) # quietly does nothing >>> del t[5] # same as t.remove_overlap(5) >>> t IntervalTree()
We could also empty a tree entirely:
>>> t2.clear() >>> t2 IntervalTree()
Or remove intervals that overlap a range:
>>> t = IntervalTree([ ... Interval(0, 10), ... Interval(10, 20), ... Interval(20, 30), ... Interval(30, 40)]) >>> t.remove_overlap(25, 35) >>> sorted(t) [Interval(0, 10), Interval(10, 20)]
We can also remove only those intervals completely enveloped in a range:
>>> t.remove_envelop(5, 20) >>> sorted(t) [Interval(0, 10)]
Chopping
We could also chop out parts of the tree:
>>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7) >>> sorted(t) [Interval(0, 3), Interval(7, 10)]
To modify the new intervals' data fields based on which side of the interval is being chopped:
>>> def datafunc(iv, islower): ... oldlimit = iv[islower] ... return "oldlimit: {0}, islower: {1}".format(oldlimit, islower) >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7, datafunc) >>> sorted(t)[0] Interval(0, 3, 'oldlimit: 10, islower: True') >>> sorted(t)[1] Interval(7, 10, 'oldlimit: 0, islower: False')
Slicing
You can also slice intervals in the tree without removing them:
>>> t = IntervalTree([Interval(0, 10), Interval(5, 15)]) >>> t.slice(3) >>> sorted(t) [Interval(0, 3), Interval(3, 10), Interval(5, 15)]
You can also set the data fields, for example, re-using
datafunc()
from above:>>> t = IntervalTree([Interval(5, 15)]) >>> t.slice(10, datafunc) >>> sorted(t)[0] Interval(5, 10, 'oldlimit: 15, islower: True') >>> sorted(t)[1] Interval(10, 15, 'oldlimit: 5, islower: False')
See the issue tracker on GitHub.
- Eternally Confuzzled's AVL tree
- Wikipedia's Interval Tree
- Heavily modified from Tyler Kahn's Interval Tree implementation in Python (GitHub project)
- Incorporates contributions from:
- konstantint/Konstantin Tretyakov of the University of Tartu (Estonia)
- siniG/Avi Gabay
- lmcarril/Luis M. Carril of the Karlsruhe Institute for Technology (Germany)
- Chaim-Leib Halbert, 2013-2017
- Modifications, Konstantin Tretyakov, 2014
Licensed under the Apache License, version 2.0.
The source code for this project is at https://github.com/chaimleib/intervaltree
- Dropped support for Python 2.6, 3.2, and 3.3
- Add support for Python 3.5 and 3.6
- Faster
Interval
overlap checking (@tuxzz, #56) - Updated README:
- new restructuring methods from 2.1.0
- example of
from_tuples()
added - more info about
chop()
,split_overlaps()
,merge_overlaps()
andmerge_equals()
.
- Fixes:
Node.from_tuples()
will now raise an error if given an empty iterable. This should never happen, and it should error if it does.Interval.distance_to()
gave an incorrect distance when passed theInterval
's upper boundaryNode.pop_greatest_child()
sometimes forgot torotate()
when creating new child nodes. (@escalonn, #41, #42)IntervalTree.begin()
andend()
are O(1), not O(n). (@ProgVal, #40)
- Maintainers:
- use github.com/kennethreitz/pyandoc
- reorganize tests
- more tests added to improve code coverage (We're at 95%! Woohoo!)
- test for issue #4 had a broken import reference
- Added:
merge_overlaps()
method and testsmerge_equals()
method and testsrange()
methodspan()
method, for returning the difference betweenend()
andbegin()
- Fixes:
- Development version numbering is changing to be compliant with PEP440. Version numbering now contains major, minor and micro release numbers, plus the number of builds following the stable release version, e.g. 2.0.4b34
- Speed improvement:
begin()
andend()
methods used iterativemin()
andmax()
builtins instead of the more efficientiloc
member available toSortedDict
overlaps()
method used to returnTrue
even if provided null test interval
- Maintainers:
- Added coverage test (
make coverage
) with html report (htmlcov/index.html
) - Tests run slightly faster
- Added coverage test (
- Fix: Issue #27: README incorrectly showed using a comma instead of a
colon when querying the
IntervalTree
: it showedtree[begin, end]
instead oftree[begin:end]
- Fix: README showed using + operator for setlike union instead of the correct | operator
- Removed tests from release package to speed up installation; to get the tests, download from GitHub
- Fix: Issue #20: performance enhancement for large trees.
IntervalTree.search()
made a copy of the entireboundary_table
resulting in linear search time. Thesortedcollections
package is now the sole install dependency
- Fix: Issue #26: failed to prune empty
Node
after a rotation promoted contents ofs_center
IntervalTree
now supports the fullcollections.MutableSet
API- Added:
__delitem__
toIntervalTree
Interval
comparison methodslt()
,gt()
,le()
andge()
toInterval
, as an alternative to the comparison operators, which are designed for sortingIntervalTree.from_tuples(iterable)
IntervalTree.clear()
IntervalTree.difference(iterable)
IntervalTree.difference_update(iterable)
IntervalTree.union(iterable)
IntervalTree.intersection(iterable)
IntervalTree.intersection_update(iterable)
IntervalTree.symmetric_difference(iterable)
IntervalTree.symmetric_difference_update(iterable)
IntervalTree.chop(a, b)
IntervalTree.slice(point)
- Deprecated
IntervalTree.extend()
-- useupdate()
instead - Internal improvements:
- More verbose tests with progress bars
- More tests for comparison and sorting behavior
- Code in the README is included in the unit tests
- Fixes
- BACKWARD INCOMPATIBLE: On ranged queries where
begin >= end
, the query operated on the overlaps ofbegin
. This behavior was documented as expected in 1.x; it is now changed to be more consistent with the definition ofInterval
s, which are half-open. - Issue #25: pruning empty Nodes with staggered descendants could result in invalid trees
- Sorting
Interval
s and numbers in the same list gathered all the numbers at the beginning and theInterval
s at the end IntervalTree.overlaps()
and friends returnedNone
instead ofFalse
- Maintainers:
make install-testpypi
failed because thepip
was missing a--pre
flag
- BACKWARD INCOMPATIBLE: On ranged queries where
- Removed requirement for pyandoc in order to run functionality tests.
- Added ability to use
Interval.distance_to()
with points, not justIntervals
- Added documentation on return types to
IntervalTree
andInterval
Interval.__cmp__()
works with points too- Fix:
IntervalTree.score()
returned maximum score of 0.5 instead of 1.0. Now returns max of subscores instead of avg - Internal improvements:
- Development version numbering scheme, based on
git describe
the "building towards" release is appended after a hyphen, eg. 1.0.2-37-g2da2ef0-1.10. The previous tagged release is 1.0.2, and there have been 37 commits since then, current tag is g2da2ef0, and we are getting ready for a 1.1.0 release - Optimality tests added
Interval
overlap tests for ranges,Interval
s and points added
- Development version numbering scheme, based on
-Bug fixes: - Node.depth_score_helper()
raised AttributeError
-
README formatting
- Fix: pip install failure because of failure to generate README.rst
- Renamed from PyIntervalTree to intervaltree
- Speed improvements for adding and removing Intervals (~70% faster than 0.4)
- Bug fixes:
- BACKWARD INCOMPATIBLE:
len()
of anInterval
is always 3, reverting to default behavior fornamedtuples
. In Python 3,len
returning a non-integer raises an exception. Instead, useInterval.length()
, which returns 0 for null intervals andend - begin
otherwise. Also, if thelen() === 0
, thennot iv
isTrue
. - When inserting an
Interval
via__setitem__
and improper parameters given, all errors were transformed toIndexError
split_overlaps
did not update theboundary_table
counts
- BACKWARD INCOMPATIBLE:
- Internal improvements:
- More robust local testing tools
- Long series of interdependent tests have been separated into sections
- Faster balancing (~80% faster)
- Bug fixes:
- Double rotations were performed in place of a single rotation when presented an unbalanced Node with a balanced child.
- During single rotation, kept referencing an unrotated Node instead of the new, rotated one
- Made IntervalTree crash if inited with a null Interval (end <= begin)
- IntervalTree raises ValueError instead of AssertionError when a null Interval is inserted
- Support for Python 3.2+ and 2.6+
- Changed license from LGPL to more permissive Apache license
- Merged changes from https://github.com/konstantint/PyIntervalTree to
https://github.com/chaimleib/PyIntervalTree
- Interval now inherits from a namedtuple. Benefits: should be faster. Drawbacks: slight behavioural change (Intervals not mutable anymore).
- Added float tests
- Use setup.py for tests
- Automatic testing via travis-ci
- Removed dependency on six
- Interval improvements:
- Intervals without data have a cleaner string representation
- Intervals without data are pickled more compactly
- Better hashing
- Intervals are ordered by begin, then end, then by data. If data is not orderable, sorts by type(data)
- Bug fixes:
- Fixed crash when querying empty tree
- Fixed missing close parenthesis in examples
- Made IntervalTree crash earlier if a null Interval is added
- Internals:
- New test directory
- Nicer display of data structures for debugging, using custom test/pprint.py (Python 2.6, 2.7)
- More sensitive exception handling
- Local script to test in all supported versions of Python
- Added IntervalTree.score() to measure how optimally a tree is structured
- Slight changes for inclusion in PyPI.
- Some documentation changes
- Added tests
- Bug fix: interval addition via [] was broken in Python 2.7 (see http://bugs.python.org/issue21785)
- Added intervaltree.bio subpackage, adding some utilities for use in bioinformatics