[FEATURE] A partitioned HIBF layout. #230

Draft: wants to merge 30 commits into base: main

Commits
7ea2e5d
[FEATURE] Compute a partitioned HIBF layout.
smehringer Nov 9, 2023
0590816
[MISC] automatic linting
seqan-actions Nov 28, 2023
d75d814
fix configuration variable initialisation
smehringer Dec 4, 2023
46f4619
fix thread local variable hibf_config.
smehringer Dec 4, 2023
e29b5a5
NEW paritioning approach: folded.
smehringer Dec 5, 2023
cab641f
fix alternative
smehringer Dec 6, 2023
968684a
adapt approach
smehringer Dec 6, 2023
168d12b
improve layout.
smehringer Dec 7, 2023
9eeb4bf
new approach
smehringer Dec 8, 2023
49ffd16
[MISC] automatic linting
seqan-actions Dec 8, 2023
a8c8ce9
FIX
smehringer Dec 8, 2023
36c3541
[MISC] automatic linting
seqan-actions Dec 8, 2023
9a1f529
fix
smehringer Dec 8, 2023
f854040
[MISC] automatic linting
seqan-actions Dec 8, 2023
8ae25d6
[INFRA] Use hibf branch
eseiler Dec 9, 2023
11e940d
[MISC] read_layouts_file
eseiler Dec 9, 2023
f9f8f5b
[FIX] tests
eseiler Dec 9, 2023
27f2506
[FIX] clang
eseiler Dec 9, 2023
c4e6087
[FEATURE] general.cpp can process multiple layouts.
smehringer Dec 11, 2023
75c0e03
knuts similarity approach
smehringer Dec 13, 2023
df418f1
[MISC] automatic linting
seqan-actions Dec 13, 2023
fec8cc2
fix stuff
smehringer Dec 13, 2023
b48519b
fic
smehringer Dec 13, 2023
eeee4be
[FIX] Similarity approach: Prohibit assigning to a partition that is …
smehringer Jan 9, 2024
256596c
[FEATURE] Similarity Approach: process blocks in random order. Might …
smehringer Jan 9, 2024
3bcce02
fix new similarity approach.
smehringer Jan 10, 2024
d21f2b8
[MISC] automatic linting
seqan-actions Jan 10, 2024
9b241bc
fix similarity approach
smehringer Jan 18, 2024
d19c2fb
[MISC] automatic linting
seqan-actions Jan 18, 2024
457b614
[FEATURE] Adapt similarity approach by Knuts suggestions: random seed…
smehringer Feb 15, 2024
23 changes: 23 additions & 0 deletions include/chopper/configuration.hpp
@@ -19,8 +19,18 @@
namespace chopper
{

enum partitioning_scheme
{
    blocked,
    sorted,
    folded,
    weighted_fold,
    similarity
};

struct configuration
{
    int partitioning_approach{};
    /*!\name General Configuration
     * \{
     */
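The hunk above stores the approach as a plain `int partitioning_approach` next to the unscoped `partitioning_scheme` enum. A minimal sketch of a checked conversion between the two follows. The helper `to_partitioning_scheme` is hypothetical and not part of this pull request; the enum is redeclared only so the snippet compiles on its own.

```cpp
#include <optional>

// Mirrors the enumerators added in the diff; redeclared here so the sketch is self-contained.
enum partitioning_scheme
{
    blocked,
    sorted,
    folded,
    weighted_fold,
    similarity
};

// Hypothetical helper (an assumption, not part of the pull request):
// maps the integer stored in the configuration to an enumerator and
// rejects values that do not correspond to any partitioning scheme.
constexpr std::optional<partitioning_scheme> to_partitioning_scheme(int const value)
{
    if (value < blocked || value > similarity)
        return std::nullopt;

    return static_cast<partitioning_scheme>(value);
}
```

A caller could then branch on the returned optional instead of comparing raw integers, which keeps out-of-range values from silently selecting no approach.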
@@ -46,6 +56,16 @@ struct configuration
    bool precomputed_files{false};
    //!\}

    /*!\name Partitioned HIBF configuration
     * \{
     */
    //!\brief The maximum index size that the HIBF should not exceed. number_of_partitions will be set accordingly.
    size_t maximum_index_size{0};

    //!\brief The number of partitions for the HIBF index.
    size_t number_of_partitions{0};
    //!\}

    /*!\name Configuration of size estimates
     * \{
     */
@@ -93,6 +113,9 @@ struct configuration
        archive(CEREAL_NVP(disable_sketch_output));
        archive(CEREAL_NVP(precomputed_files));

        archive(CEREAL_NVP(maximum_index_size));
        archive(CEREAL_NVP(number_of_partitions));

        archive(CEREAL_NVP(output_filename));
        archive(CEREAL_NVP(determine_best_tmax));
        archive(CEREAL_NVP(force_all_binnings));
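The documentation comment on `maximum_index_size` says that `number_of_partitions` "will be set accordingly", but the diff shown here does not include that computation. One plausible reading is a ceiling division of an estimated total index size by the limit, sketched below. The helper name `partitions_for_size_limit`, the `estimated_index_size` parameter, and the treatment of `0` as "no limit" (chosen because the member defaults to `0`) are all assumptions, not taken from the pull request.

```cpp
#include <cstddef>

// Hypothetical helper (an assumption, not code from the pull request):
// smallest number of partitions such that each partition stays within
// maximum_index_size, given an estimate of the unpartitioned index size.
// A limit of 0 is treated as "no limit" and yields a single partition.
constexpr std::size_t partitions_for_size_limit(std::size_t const estimated_index_size,
                                                std::size_t const maximum_index_size)
{
    if (maximum_index_size == 0u || estimated_index_size <= maximum_index_size)
        return 1u;

    // Ceiling division written without the overflow-prone (a + b - 1) / b form.
    return estimated_index_size / maximum_index_size + (estimated_index_size % maximum_index_size != 0u);
}
```

For instance, an estimated size of 10 units with a limit of 4 units gives 3 partitions under this reading.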
5 changes: 3 additions & 2 deletions include/chopper/layout/input.hpp
@@ -20,7 +20,8 @@ namespace chopper::layout
{

std::vector<std::vector<std::string>> read_filenames_from(std::istream & stream);
std::tuple<std::vector<std::vector<std::string>>, configuration, seqan::hibf::layout::layout>
read_layout_file(std::istream & stream);

std::tuple<std::vector<std::vector<std::string>>, configuration, std::vector<seqan::hibf::layout::layout>>
read_layouts_file(std::istream & stream);

} // namespace chopper::layout
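The `read_layouts_file` declaration returns the filenames, the configuration, and a vector of layouts in one tuple. A rough sketch of how a caller might unpack such a tuple is shown below. It uses stand-in types (`configuration_stub`, `layout_stub`) so that it compiles without chopper or the HIBF library, and the check that there is one layout per partition is an assumption about the intended invariant, not something the diff states.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Stand-ins (assumptions) for chopper::configuration and seqan::hibf::layout::layout.
struct configuration_stub
{
    std::size_t number_of_partitions{0};
};

struct layout_stub
{};

// Same shape as the tuple returned by read_layouts_file in the diff, with the stand-in types.
using layouts_file_content =
    std::tuple<std::vector<std::vector<std::string>>, configuration_stub, std::vector<layout_stub>>;

// Hypothetical consistency check: assumes a partitioned layout file holds one layout per partition.
inline bool layouts_match_partitions(layouts_file_content const & content)
{
    auto const & [filenames, config, layouts] = content;
    (void) filenames; // not needed for this check

    return layouts.size() == config.number_of_partitions;
}

// Small usage example with hand-built data instead of a real layout file.
inline void example_usage()
{
    layouts_file_content const content{{{"a.fa"}, {"b.fa"}}, configuration_stub{2u}, {layout_stub{}, layout_stub{}}};
    assert(layouts_match_partitions(content));
}
```

Structured bindings keep the three tuple elements named at the call site, which is easier to read than repeated `std::get` calls.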
2 changes: 1 addition & 1 deletion lib/hibf
Submodule hibf updated 70 files
+2 −7 .github/workflows/ci_iwyu.yml
+3 −4 README.md
+6 −3 include/hibf/contrib/aligned_allocator.hpp
+6 −5 include/hibf/hierarchical_interleaved_bloom_filter.hpp
+24 −160 include/hibf/interleaved_bloom_filter.hpp
+2 −0 include/hibf/layout/compute_layout.hpp
+1 −2 include/hibf/layout/data_store.hpp
+3 −4 include/hibf/layout/graph.hpp
+2 −1 include/hibf/layout/hierarchical_binning.hpp
+5 −6 include/hibf/layout/layout.hpp
+1 −1 include/hibf/layout/simple_binning.hpp
+7 −0 include/hibf/misc/bit_vector.hpp
+300 −0 include/hibf/misc/counting_vector.hpp
+1 −2 include/hibf/misc/insert_iterator.hpp
+61 −0 include/hibf/misc/partition.hpp
+2 −2 include/hibf/misc/print.hpp
+13 −1 include/hibf/platform.hpp
+5 −0 include/hibf/sketch/compute_sketches.hpp
+2 −3 include/hibf/sketch/estimate_kmer_counts.hpp
+0 −1 include/hibf/sketch/hyperloglog.hpp
+0 −1 include/hibf/sketch/toolbox.hpp
+4 −5 src/config.cpp
+9 −0 src/hierarchical_interleaved_bloom_filter.cpp
+25 −5 src/layout/compute_layout.cpp
+2 −2 src/layout/graph.cpp
+4 −2 src/layout/layout.cpp
+4 −4 src/misc/print.cpp
+34 −3 src/sketch/compute_sketches.cpp
+4 −3 src/sketch/hyperloglog.cpp
+11 −12 src/sketch/toolbox.cpp
+8 −8 test/documentation/DoxygenLayout.xml
+21 −1 test/documentation/doxygen-awesome/doxygen-awesome-tabs.js
+225 −78 test/documentation/doxygen-awesome/doxygen-awesome.css
+4 −4 test/documentation/hibf-doxygen-layout.cmake
+2 −2 test/documentation/hibf-doxygen.cmake
+1 −1 test/documentation/hibf_doxygen_cfg.in
+2 −0 test/include/hibf/test/bytes.hpp
+2 −2 test/include/hibf/test/sandboxed_path.hpp
+2 −3 test/include/hibf/test/tmp_directory.hpp
+3 −4 test/include/hibf/test/type_name_as_string.hpp
+0 −19 test/iwyu/mappings/gcc.stl.headers.imp
+0 −11 test/iwyu/mappings/gcc.symbols.imp
+5 −0 test/iwyu/mappings/iwyu.imp
+51 −21 test/iwyu/mappings/libcxx.imp
+28 −0 test/iwyu/mappings/stl.public.imp
+1 −2 test/performance/ibf/bit_vector_benchmark.cpp
+128 −96 test/performance/ibf/interleaved_bloom_filter_benchmark.cpp
+0 −1 test/performance/sketch/hyperloglog_benchmark.cpp
+1 −2 test/snippet/ibf/counting_agent.cpp
+2 −1 test/snippet/ibf/counting_vector.cpp
+3 −4 test/snippet/readme.cpp
+1 −2 test/snippet/test/tmp_directory.cpp
+3 −0 test/unit/hibf/CMakeLists.txt
+18 −0 test/unit/hibf/bit_vector_test.cpp
+0 −2 test/unit/hibf/build/bin_size_in_bits_test.cpp
+7 −0 test/unit/hibf/counting_vector_avx512_test.cpp
+197 −0 test/unit/hibf/counting_vector_test.cpp
+1 −1 test/unit/hibf/hierarchical_interleaved_bloom_filter_test.cpp
+7 −0 test/unit/hibf/interleaved_bloom_filter_avx512_test.cpp
+16 −10 test/unit/hibf/interleaved_bloom_filter_test.cpp
+15 −2 test/unit/hibf/layout/compute_layout_test.cpp
+50 −6 test/unit/hibf/layout/layout_test.cpp
+1 −2 test/unit/hibf/path_test.cpp
+4 −4 test/unit/hibf/print_test.cpp
+8 −8 test/unit/hibf/sketch/toolbox_test.cpp
+0 −1 test/unit/hibf/timer_test.cpp
+2 −2 test/unit/test/expect_range_eq_test.cpp
+1 −1 test/unit/test/expect_same_type_test.cpp
+2 −3 test/unit/test/file_access_test.cpp
+2 −3 test/unit/test/temporary_snippet_file_test.cpp