Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IntSet to DataStructures #114

Closed
wants to merge 3 commits into from
Closed

Add IntSet to DataStructures #114

wants to merge 3 commits into from

Conversation

mbauman
Copy link
Contributor

@mbauman mbauman commented Aug 9, 2015

This moves my work at JuliaLang/julia#10065 into DataStructures. I don't export the name DataStructures.IntSet since I'm not sure what the best way to do this would be. Please feel free to push modifications to this branch. We could either rename the type in DataStructures, or we could export it (forcing folks to manually choose Base.IntSet or DataStructures.IntSet), or we could leave it as is (which means that you use Base.IntSet by default unless you explicitly import DataStructures.IntSet).

Note that this is the pure Julia refactor, not the current Base implementation (which relies upon some C code in base). That means that it's a little slower in a few cases, but for the most part it's very competitive or way faster.

We could also easily add a second BitVector to the type in the future to add support for negative Ints, making this a true IntSet (well, except we still wouldn't be able to reach typemax(Int) or typemin(Int), but you can't do that on a 64 bit machine and it's a terrible idea in any case).

@kmsquire
Copy link
Member

kmsquire commented Aug 9, 2015

Cc: @StephenVavasis

(Some branches are missing coverage since they would require 2^32 or 2^64 bits of storage in order to test completely)
@mbauman
Copy link
Contributor Author

mbauman commented Aug 9, 2015

Note that this behaves way better than Base with regards to complement IntSets, which is precisely why I wanted to eliminate complement:

Base:

julia> c = complement(IntSet(0:255))
WARNING: storing zero in IntSets is deprecated
 in depwarn at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
 in push! at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
 in call at intset.jl:10
while loading no file, in expression starting on line 0
WARNING: complement IntSets are deprecated
 in depwarn at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
 in complement! at deprecated.jl:732
 in complement at deprecated.jl:739
while loading no file, in expression starting on line 0
IntSet([256, ..., 9223372036854775806])

julia> pop!(c, 256)
ERROR: InexactError()
 in sizehint! at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
 in pop! at intset.jl:80
 in pop! at intset.jl:101

julia> pop!(c, 256)
256

julia> c
IntSet([384, ..., 9223372036854775806])

julia> first(c)
ERROR: ArgumentError: set must be non-empty
 in first at intset.jl:186

julia> last(c)
ERROR: ArgumentError: set has no last element
 in last at intset.jl:202

This PR:

julia> using DataStructures
       const IntSet = DataStructures.IntSet
DataStructures.IntSet

julia> c = complement(IntSet(0:255))
IntSet([256, 257, 258, ..., 9223372036854775806])

julia> pop!(c, 256)
256

julia> c
IntSet([257, 258, 259, ..., 9223372036854775806])

julia> first(c)
257

julia> last(c)
9223372036854775806

@StephenVavasis
Copy link
Contributor

It looks good to me, but shouldn't it be documented in DataStructures' README.rst? Can the IntSet documentation from the Julia manual just be copied and pasted?

@mbauman
Copy link
Contributor Author

mbauman commented Aug 9, 2015

Yes, definitely. The language should be tightened up as compared to the base documentation, but it can serve as a good start.

This version doesn't hold non-negative integers, it holds Ints from 0 to typemax(Int)-1. It's pedantic, but having that well-defined makes the implementation and behavior much more sane.

@mbauman
Copy link
Contributor Author

mbauman commented Aug 9, 2015

Alright, I've added a blurb to README.rst.

Any thoughts about the name and/or export? It's actually a sorted set — should it be called SortedIntSet? That would avoid the clash with base. I don't feel strongly any which way.

@StephenVavasis
Copy link
Contributor

Matt,

IntSet is not a sorted set in the sense of the sorted containers of DataStructures.jl. A sorted container has the property that its entries can be enumerated in sort-order in time proportional to O(n) or O(n log n), where n is the number entries. To enumerate an IntSet in order requires O(m), where m is the maximum entry.

But meanwhile, I am confused about the general question of renaming. I thought that the point of moving IntSet to DataStructures was because you and others are designing a new 'core' bitvector type for managing indices, but you want to leave the old IntSet around for users who depend on it in their codes. In this case, we should preserve the same name IntSet and give a different name to the new bitvector type?

@mbauman
Copy link
Contributor Author

mbauman commented Aug 9, 2015

The elements are enumerated in sorted order, but you're right that the algorithmic complexities are different from the other Sorted* types here.

Like I said, I don't feel strongly. I'm just trying to find a solution that will work the best during this migration time where there's both Base.IntSet and an implementation here. As I see it, the options are:

  • Call this DataStructures.IntSet, but don't export it. This means that users that run into the IntSet deprecations on master will have to using DataStructures; import DataStructures.IntSet in order to get the old un-deprecated behaviors.
  • Call this DataStructures.IntSet and export it. This means that all users who use both DataStructures and IntSet will need to explicitly choose which one they want, whether they need complements and stored zeros or not. This seems messy to me.
  • Call this something else entirely. As folks run into deprecation warnings in Julia, they can be pointed here and rename their type to the exported name in DataStructures (whatever that may be).

In any of the cases above, compatibility is pretty straight-forward… just use DataStructures version if you want stored zeros and complements. Otherwise, Base's version will do you just fine.

Eventually, Base's name will be changed to IndexSet, but that rename will probably wait until the implementation changes.

@tkelman
Copy link
Contributor

tkelman commented Aug 9, 2015

UIntSet ?

@mbauman
Copy link
Contributor Author

mbauman commented Aug 9, 2015

I'd rather an Int16Set that covered both positive and negative numbers near zero. There are 2^64 UInt64s, which means we can't use a bit vector to address them all with an Int64 index. It also makes things like length of complements messy - do they return an Int128?

@StephenVavasis
Copy link
Contributor

I vote for Matt's first option (call it DataStructures.IntSet but don't export it). It is easier to add two statements to each module than to change a name everywhere it occurs. In addition, that option is easier on people writing packages that are supposed to support Julia 0.3, 0.4 and 0.5. Package writers could simply start their program with

@compat import DataStructures.IntSet

and the compat package would do the right thing. (@stevengj?)

@mbauman
Copy link
Contributor Author

mbauman commented Aug 9, 2015

No need for @compat since this is in a package. If you want to use this just update your packages and use it! I agree that it seems like the path of least resistance — it's what's implemented and documented right now.

@StephenVavasis
Copy link
Contributor

Matt,

I think that compat or something similar will be needed for a package-writer who wants to use IntSet and wants his/her program to work correctly for 0.3, 0.4, 0.5. In 0.3 and earlier 0.4 masters, IntSet is in Base. In upcoming 0.4 masters, it will issue deprecation warnings, In 0.5, it may be gone from Base completely and only in DataStructures. So it is seems like some metaprogramming is necessary for a package that uses IntSet and is supposed to work in all three Julia versions. Note that my sorted containers code falls into this category.

@mbauman
Copy link
Contributor Author

mbauman commented Aug 10, 2015

Right, my point is that regardless of the Julia version, folks should just choose to use this implementation if they rely on 0s or complements. In fact, that's what happens for the sorted containers here. On both 0.3 and 0.4, they use this implementation.

There's no reason to use Base.IntSet on 0.3 and this on 0.4 -- just use this on both! Then even if it disappears from base in 0.5, you'll already have been using the replacement.

@mbauman
Copy link
Contributor Author

mbauman commented Sep 6, 2015

As far as I'm concerned, this is ready to merge. I'm not a heavy user of or contributor to DataStructures, however, so I'd prefer someone else to review and pull the button here.

@hayd
Copy link
Contributor

hayd commented Oct 5, 2015

There also a warning during testing atm

WARNING: stored zeros in IntSet is deprecated

@kmsquire
Copy link
Member

kmsquire commented Nov 6, 2015

Hi @mbauman, sorry to ignore this for so long. Can you fix the conflicts, and I'll merge?

@kmsquire
Copy link
Member

kmsquire commented Jan 2, 2016

Bump.

@kmsquire
Copy link
Member

Merged this in #160.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants