Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented SparseIntSet #533

Merged
merged 25 commits into from
Oct 4, 2019
Merged

Conversation

louisponet
Copy link
Contributor

I've implemented the SparseIntSet datastructure as was talked about in #532
I based the tests and functionality on that from IntSet, but I wasn't sure how to do symdiff, and also I'm not sure if it even makes sense in this case.

I also added some documentation.

Let me know if there are any things that need to be improved!

Cheers

@codecov
Copy link

codecov bot commented Sep 28, 2019

Codecov Report

Merging #533 into master will increase coverage by 0.45%.
The diff coverage is 96.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #533      +/-   ##
==========================================
+ Coverage   87.36%   87.81%   +0.45%     
==========================================
  Files          31       32       +1     
  Lines        1978     2076      +98     
==========================================
+ Hits         1728     1823      +95     
- Misses        250      253       +3
Impacted Files Coverage Δ
src/sparse_int_set.jl 96.84% <96.84%> (ø)
src/queue.jl 100% <0%> (ø) ⬆️
src/stack.jl 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 46f87f1...e002e38. Read the comment docs.

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very exciting

Looks pretty good.
I've just done a surface lever review.
I need to take another round at it and look up logic after these changes are in

src/sparse_int_set.jl Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/DataStructures.jl Outdated Show resolved Hide resolved
Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now finished reviewing the code as it is.
I will review the tests later

src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
@louisponet
Copy link
Contributor Author

Okay I think I implemented all your suggestions and made it mutable again. I also made cleaned up cleanup! adding an elseif any(iszero, s.counters) just so the cleanup only actually happens when needed.

if pageid > length(s.reverse)
diff = pageid - length(s.reverse)

resize!(s.reverse, pageid - 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better not to do that? push! will resize for you.
And it has heuristics to better handle the resizing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what you mean, do you mean to move assure! into push! I guess that's ok since I don't think it's used anywhere else

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I mean why are we doing resize! then push! ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this was to generate the undefs but now indeed i'm just using push! after sizehint!, I don't know if that's optimal

Copy link
Member

@oxinabox oxinabox Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is, basically,
Marginally better is resize! + assignment to index
but you missout on the heuristics of how push! will do extra resizing under the hood when it has to grow (I think sizehint! actaully might also block those since it iwill grow early)
and that extra reizinging under the hood gives speedup between seperate calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did a quick benchmark with doing resize! followed by assignment to index, difference is basically swallowed by noise on the benchmark it seems.

@oxinabox
Copy link
Member

oxinabox commented Oct 1, 2019

I've been thinking about this a bit.
I think cleanup should replace the vectors it wants to remove with Vector{Int}() a zero length vector. We can even define just one of those. Call it const NULL_PAGE=Vector{Int}()
Then we have assure! check ===NULL_PAGE instead of checking isassigned.

If we make a function _pop_and_pageid that does a pop and also returns both the poped value,
and the pageid it came from,
then cleanup! would not have to search the whole array to decide what to clean,
it would just check if that counter is zero,
and if so would replace that page with the NULL_PAGE
and if we have made it that cheap,
then maybe we don't need to have pop! and dirty_pop! and cleanup!
since we can just do that check of if the counter is equal to zero
at every removal,
and then occationally it will be, but that operation will be cheap as well.

We would never get to resize reverse down, but I think that is ok, generally we can't do that very often anyway, since it would need everything removed to come from the far end.

What do you think?

@louisponet
Copy link
Contributor Author

That sounds reasonable to me, I find the whole undef thing a little fuzzy to begin with. I guess the difference in memory usage for undef vs a Vector{Int}() is not huge anyway. I'll have a go at that

@oxinabox
Copy link
Member

oxinabox commented Oct 1, 2019

I guess the difference in memory usage for undef vs a Vector{Int}() is not huge anyway

It should be nothing at all. (baring the single allocation for the global constant NULL_PAGE)
Because at the lowest level undef should be a pointer to 0x0000_000...,
and pointer to a constant will be some other pointer of same size

docs/src/sparse_int_set.md Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
@louisponet
Copy link
Contributor Author

About the iteration through zips (my main usecase, like in the benchmark), is there any way to speed this up? I guess not really because it's mainly the random access pattern taking up most time rather than the logic itself?

I also ran into a strange inference issue on id and tids when I didn't have the id_tids function, and just did it inside the iterate function. I don't get how inference could fail since literally everything is Ints.

@oxinabox
Copy link
Member

oxinabox commented Oct 2, 2019

I will look into it in 20 hours time

src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
src/sparse_int_set.jl Outdated Show resolved Hide resolved
@oxinabox
Copy link
Member

oxinabox commented Oct 3, 2019

I think with the last few comments resolve we will be able to merge this.
Also can you bump the version number in the Project.toml
so I can tag a release straight after merging?

https://white.ucc.asn.au/2019/09/28/Continuous-Delivery-For-Julia-Packages.html

Project.toml Outdated Show resolved Hide resolved
return nothing
end
id, tids = id_tids(it, state)
il = length(it)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was done at the start of the function it could also be used in the first if

Also why il and not something more descriptive, like it_len or even iterator_length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry I was too fast in doing this my bad

louisponet and others added 2 commits October 4, 2019 11:21
Co-Authored-By: Lyndon White <oxinabox@ucc.asn.au>
@oxinabox oxinabox merged commit 0c70c9c into JuliaCollections:master Oct 4, 2019
@louisponet
Copy link
Contributor Author

Great! thanks for all your help with this!

@oxinabox
Copy link
Member

oxinabox commented Oct 4, 2019

no problem, thanks for your contribution😀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants