Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant time len #182

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion src/bitmap/arbitrary.rs
Original file line number Diff line number Diff line change
Expand Up @@ -178,7 +178,8 @@ mod test {
container.ensure_correct_store();
container
}).collect::<Vec<Container>>();
RoaringBitmap { containers }
let len = containers.iter().map(|c| c.len()).sum();
RoaringBitmap { len, containers }
}
}

Expand Down
40 changes: 27 additions & 13 deletions src/bitmap/inherent.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ impl RoaringBitmap {
/// let mut rb = RoaringBitmap::new();
/// ```
pub fn new() -> RoaringBitmap {
RoaringBitmap { containers: Vec::new() }
RoaringBitmap { len: 0, containers: Vec::new() }
}

/// Adds a value to the set.
Expand All @@ -41,7 +41,9 @@ impl RoaringBitmap {
&mut self.containers[loc]
}
};
container.insert(index)
let inserted = container.insert(index);
self.len += inserted as u64;
inserted
}

/// Search for the specific container by the given key.
Expand Down Expand Up @@ -90,7 +92,9 @@ impl RoaringBitmap {
// If the end range value is in the same container, just call into
// the one container.
if start_container_key == end_container_key {
return self.containers[first_index].insert_range(start_index..=end_index);
let inserted = self.containers[first_index].insert_range(start_index..=end_index);
self.len += inserted;
return inserted;
}

// For the first container, insert start_index..=u16::MAX, with
Expand All @@ -115,7 +119,7 @@ impl RoaringBitmap {
let last_index = self.find_container_by_key(end_container_key);

inserted += self.containers[last_index].insert_range(0..=end_index);

self.len += inserted;
inserted
}

Expand All @@ -139,7 +143,7 @@ impl RoaringBitmap {
pub fn push(&mut self, value: u32) -> bool {
let (key, index) = util::split(value);

match self.containers.last_mut() {
let pushed = match self.containers.last_mut() {
Some(container) if container.key == key => container.push(index),
Some(container) if container.key > key => false,
_otherwise => {
Expand All @@ -148,7 +152,9 @@ impl RoaringBitmap {
self.containers.push(container);
true
}
}
};
self.len += pushed as u64;
pushed
}

///
Expand All @@ -172,6 +178,7 @@ impl RoaringBitmap {
self.containers.push(container);
}
}
self.len += 1;
}

/// Removes a value from the set. Returns `true` if the value was present in the set.
Expand All @@ -189,7 +196,7 @@ impl RoaringBitmap {
/// ```
pub fn remove(&mut self, value: u32) -> bool {
let (key, index) = util::split(value);
match self.containers.binary_search_by_key(&key, |c| c.key) {
let removed = match self.containers.binary_search_by_key(&key, |c| c.key) {
Ok(loc) => {
if self.containers[loc].remove(index) {
if self.containers[loc].len() == 0 {
Expand All @@ -201,7 +208,9 @@ impl RoaringBitmap {
}
}
_ => false,
}
};
self.len -= removed as u64;
removed
}

/// Removes a range of values.
Expand Down Expand Up @@ -244,6 +253,7 @@ impl RoaringBitmap {
}
index += 1;
}
self.len -= removed;
removed
}

Expand Down Expand Up @@ -282,6 +292,7 @@ impl RoaringBitmap {
/// assert_eq!(rb.contains(1), false);
/// ```
pub fn clear(&mut self) {
self.len = 0;
self.containers.clear();
}

Expand Down Expand Up @@ -320,7 +331,7 @@ impl RoaringBitmap {
/// assert_eq!(rb.len(), 2);
/// ```
pub fn len(&self) -> u64 {
self.containers.iter().map(|container| container.len()).sum()
self.len
}

/// Returns the minimum value in the set (if the set is non-empty).
Expand Down Expand Up @@ -375,8 +386,6 @@ impl RoaringBitmap {
/// assert_eq!(rb.rank(10), 2)
/// ```
pub fn rank(&self, value: u32) -> u64 {
// if len becomes cached for RoaringBitmap: return len if len > value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. Should have been return len if len > max

Computing max is not constant time for BitmapStore


let (key, index) = util::split(value);

match self.containers.binary_search_by_key(&key, |c| c.key) {
Expand All @@ -391,7 +400,7 @@ impl RoaringBitmap {
}
}

/// Returns the nth integer in the bitmap (if the set is non-empty)
/// Returns the nth integer in the bitmap or `None` if n > len()
///
/// # Examples
///
Expand All @@ -410,6 +419,10 @@ impl RoaringBitmap {
/// assert_eq!(rb.select(2), Some(100));
/// ```
pub fn select(&self, n: u32) -> Option<u32> {
if n as u64 >= self.len() {
return None;
}

let mut n = n as u64;

for container in &self.containers {
Expand All @@ -435,10 +448,11 @@ impl Default for RoaringBitmap {

impl Clone for RoaringBitmap {
fn clone(&self) -> Self {
RoaringBitmap { containers: self.containers.clone() }
RoaringBitmap { len: self.len, containers: self.containers.clone() }
}

fn clone_from(&mut self, other: &Self) {
self.len = other.len;
self.containers.clone_from(&other.containers);
}
}
Expand Down
16 changes: 8 additions & 8 deletions src/bitmap/iter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,16 @@ pub struct IntoIter {
}

impl Iter<'_> {
fn new(containers: &[Container]) -> Iter {
let size_hint = containers.iter().map(|c| c.len()).sum();
Iter { inner: containers.iter().flatten(), size_hint }
fn new(bitmap: &RoaringBitmap) -> Iter {
let size_hint = bitmap.len();
Iter { inner: bitmap.containers.iter().flatten(), size_hint }
Comment on lines +20 to +22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to me the fact #179 removes the size_hint implementation and here we continue to track the length of the bitmap? I feel lost.

Copy link
Contributor Author

@saik0 saik0 Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#176 Has the reasoning

The code it replaced would unconditionally panic. To guess the original author's intent: it was a runtime check that the count is tracked at the outer iterator. Panicking broke some new proptests, so we implemented the correct behavior. However, it is still true that the size hint is unreachable from the exposed API. The iterator is doing extra work the end user can never observe.

I suspect the regression went unnoticed since #125 plus this issue is still a net perf gain.

Proposing we just remove the size hints altogether and return the default (0, None). It will satisfy the validity requirements of tests without needlessly slowing down iteration.

We track the len of the RoaringBitmap here, therefore the the BitmapStore iter size_hint is unreachable by the external api. However, it should not panic, because it is reachable by tests, so it can return the default (0, None) which indicates no size hint.

}
}

impl IntoIter {
fn new(containers: Vec<Container>) -> IntoIter {
let size_hint = containers.iter().map(|c| c.len()).sum();
IntoIter { inner: containers.into_iter().flatten(), size_hint }
fn new(bitmap: RoaringBitmap) -> IntoIter {
let size_hint = bitmap.len();
IntoIter { inner: bitmap.containers.into_iter().flatten(), size_hint }
}
}

Expand Down Expand Up @@ -95,7 +95,7 @@ impl RoaringBitmap {
/// assert_eq!(iter.next(), None);
/// ```
pub fn iter(&self) -> Iter {
Iter::new(&self.containers)
Iter::new(self)
}
}

Expand All @@ -113,7 +113,7 @@ impl IntoIterator for RoaringBitmap {
type IntoIter = IntoIter;

fn into_iter(self) -> IntoIter {
IntoIter::new(self.containers)
IntoIter::new(self)
}
}

Expand Down
4 changes: 4 additions & 0 deletions src/bitmap/mod.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
#![allow(clippy::suspicious_op_assign_impl)] // allow for +/- len in op assign
#![allow(clippy::suspicious_arithmetic_impl)]

mod arbitrary;
mod container;
mod fmt;
Expand Down Expand Up @@ -36,4 +39,5 @@ pub use self::iter::Iter;
#[derive(PartialEq)]
pub struct RoaringBitmap {
containers: Vec<container::Container>,
len: u64,
}
Loading