[Fix](bloom filter) Fix bloom filter memory leak (#34871)
* Issue: Doris occasionally shows exceptionally high memory usage that never decreases. The leaked memory is held by Bloom filters kept in memory.

Reason: The segment cache keeps segment objects read from files in memory. It is an LRU cache that evicts older segments when the number of segments exceeds the maximum count, or when the total memory size of the cached segment objects exceeds the maximum usage. One code path, however, first reads the segment object into memory (size A) and places it into the cache, which records its size as A. Only afterwards does it read the segment's Bloom filter from the file (size B) and assign it to the segment's Bloom filter member variable, so the segment object now occupies A+B. The cache never updates the recorded size, so the actual size of each cached segment (A+B) is larger than the size the cache accounts for (A). As the number of cached segments grows, real memory usage climbs sharply, but the cache's accounting never reaches the eviction limit, so no segments are evicted. The result behaves like a memory leak.

Solution: Each segment object reads its Bloom filter only once, so the issue can be resolved by reordering the steps. Instead of reading the segment, placing it into the cache, and then reading the Bloom filter, the code now reads the segment, reads the Bloom filter, and only then places the segment into the cache.
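
The accounting drift and the effect of the reordering can be reproduced with a toy model. The sketch below is not Doris code: `ToySegment`, `ToyLruCache`, `cache_then_load_bf`, and `load_bf_then_cache` are hypothetical names, and the cache is assumed to snapshot each entry's size once at insert time, as the description above implies.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a segment: metadata (size A) plus a Bloom filter (size B).
struct ToySegment {
    std::size_t meta_bytes = 0;
    std::vector<char> bloom_filter;  // empty until loaded

    void load_bloom_filter(std::size_t bytes) { bloom_filter.assign(bytes, 0); }
    std::size_t mem_usage() const { return meta_bytes + bloom_filter.size(); }
};

// Hypothetical LRU cache that records an entry's charge once, at insert time.
class ToyLruCache {
public:
    explicit ToyLruCache(std::size_t capacity) : _capacity(capacity) {}

    void insert(std::shared_ptr<ToySegment> seg) {
        const std::size_t charge = seg->mem_usage();  // snapshot, never refreshed
        _entries.push_front(Entry{std::move(seg), charge});
        _tracked += charge;
        // Evict from the cold end while the *tracked* size exceeds the limit.
        while (_tracked > _capacity && !_entries.empty()) {
            _tracked -= _entries.back().charge;
            _entries.pop_back();
        }
    }

    // What the cache believes it holds.
    std::size_t tracked_bytes() const { return _tracked; }

    // What the cached objects really occupy right now.
    std::size_t actual_bytes() const {
        std::size_t total = 0;
        for (const auto& e : _entries) total += e.seg->mem_usage();
        return total;
    }

private:
    struct Entry {
        std::shared_ptr<ToySegment> seg;
        std::size_t charge;
    };
    std::size_t _capacity;
    std::size_t _tracked = 0;
    std::list<Entry> _entries;
};

// Old order: the cache records A, then the object grows to A + B behind its back.
void cache_then_load_bf(ToyLruCache& cache, std::size_t a, std::size_t b) {
    auto seg = std::make_shared<ToySegment>();
    seg->meta_bytes = a;
    cache.insert(seg);
    seg->load_bloom_filter(b);
}

// New order: the object is complete before insertion, so the cache records A + B.
void load_bf_then_cache(ToyLruCache& cache, std::size_t a, std::size_t b) {
    auto seg = std::make_shared<ToySegment>();
    seg->meta_bytes = a;
    seg->load_bloom_filter(b);
    cache.insert(std::move(seg));
}
```

With a limit of 1000 bytes and four segments of A = 100 and B = 400, the old order leaves the toy cache believing it holds 400 bytes while the objects occupy 2000, and nothing is evicted. The new order keeps both figures at or below the limit because eviction sees the full size.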
Yukang-Lian authored May 23, 2024
1 parent 5bcc542 commit a62659a
Showing 5 changed files with 385 additions and 3 deletions.
2 changes: 1 addition & 1 deletion be/src/olap/base_tablet.cpp
@@ -541,7 +541,7 @@ Status BaseTablet::lookup_row_key(const Slice& encoded_key, bool with_seq_col,
     if (UNLIKELY(segment_caches[i] == nullptr)) {
         segment_caches[i] = std::make_unique<SegmentCacheHandle>();
         RETURN_IF_ERROR(SegmentLoader::instance()->load_segments(
-                std::static_pointer_cast<BetaRowset>(rs), segment_caches[i].get(), true));
+                std::static_pointer_cast<BetaRowset>(rs), segment_caches[i].get(), true, true));
     }
     auto& segments = segment_caches[i]->get_segments();
     DCHECK_EQ(segments.size(), num_segments);
10 changes: 10 additions & 0 deletions be/src/olap/rowset/segment_v2/segment.cpp
@@ -283,6 +283,15 @@ Status Segment::_parse_footer(SegmentFooterPB* footer) {
 }

 Status Segment::_load_pk_bloom_filter() {
+#ifdef BE_TEST
+    if (_pk_index_meta == nullptr) {
+        // for BE UT "segment_cache_test"
+        return _load_pk_bf_once.call([this] {
+            _meta_mem_usage += 100;
+            return Status::OK();
+        });
+    }
+#endif
     DCHECK(_tablet_schema->keys_type() == UNIQUE_KEYS);
     DCHECK(_pk_index_meta != nullptr);
     DCHECK(_pk_index_reader != nullptr);
@@ -312,6 +321,7 @@ Status Segment::load_pk_index_and_bf() {
     RETURN_IF_ERROR(_load_pk_bloom_filter());
     return Status::OK();
 }
+
 Status Segment::load_index() {
     auto status = [this]() { return _load_index_impl(); }();
     if (!status.ok()) {
8 changes: 7 additions & 1 deletion be/src/olap/segment_loader.cpp
@@ -18,6 +18,7 @@
 #include "olap/segment_loader.h"

 #include "common/config.h"
+#include "common/status.h"
 #include "olap/olap_define.h"
 #include "olap/rowset/beta_rowset.h"
 #include "util/stopwatch.hpp"
@@ -50,7 +51,8 @@ void SegmentCache::erase(const SegmentCache::CacheKey& key) {
 }

 Status SegmentLoader::load_segments(const BetaRowsetSharedPtr& rowset,
-                                    SegmentCacheHandle* cache_handle, bool use_cache) {
+                                    SegmentCacheHandle* cache_handle, bool use_cache,
+                                    bool need_load_pk_index_and_bf) {
     if (cache_handle->is_inited()) {
         return Status::OK();
     }
@@ -61,9 +63,13 @@ Status SegmentLoader::load_segments(const BetaRowsetSharedPtr& rowset,
         }
         segment_v2::SegmentSharedPtr segment;
         RETURN_IF_ERROR(rowset->load_segment(i, &segment));
+        if (need_load_pk_index_and_bf) {
+            RETURN_IF_ERROR(segment->load_pk_index_and_bf());
+        }
         if (use_cache && !config::disable_segment_cache) {
             // memory of SegmentCache::CacheValue will be handled by SegmentCache
             auto* cache_value = new SegmentCache::CacheValue();
+            _cache_mem_usage += segment->meta_mem_usage();
             cache_value->segment = std::move(segment);
             _segment_cache->insert(cache_key, *cache_value, cache_handle);
         } else {
7 changes: 6 additions & 1 deletion be/src/olap/segment_loader.h
@@ -114,15 +114,20 @@ class SegmentLoader {
     // Load segments of "rowset", return the "cache_handle" which contains segments.
     // If use_cache is true, it will be loaded from _cache.
     Status load_segments(const BetaRowsetSharedPtr& rowset, SegmentCacheHandle* cache_handle,
-                         bool use_cache = false);
+                         bool use_cache = false, bool need_load_pk_index_and_bf = false);

     void erase_segment(const SegmentCache::CacheKey& key);

     void erase_segments(const RowsetId& rowset_id, int64_t num_segments);

+    // Just used for BE UT
+    int64_t cache_mem_usage() const { return _cache_mem_usage; }
+
 private:
     SegmentLoader();
     std::unique_ptr<SegmentCache> _segment_cache;
+    // Just used for BE UT
+    int64_t _cache_mem_usage = 0;
 };

 // A handle for a single rowset from segment lru cache.