
NUMA-aware cache modes #400

Open
krizhanovsky opened this issue Jan 24, 2016 · 2 comments
Labels: cache, enhancement, performance, TDB (Tempesta DB module and related issues)
Milestone: 1.x - TBD
Comments

@krizhanovsky
Contributor

krizhanovsky commented Jan 24, 2016

Introduction

Currently the cache supports either sharded or replicated mode. However, it makes no sense to replicate very large entries (e.g. multi-gigabyte ones like DVD images) to each NUMA node; instead, such entries should be stored on one particular node (note that such entries are also sent in many TDB lookups, in the sense of #391). In particular, the web cache is a bad candidate for replication, since it has an eviction process and a large data set. On the other hand, small data written once at configuration time (#910 and #1350) is a good candidate for replication.

Also see Challenges of Memory Management on Modern NUMA Systems: Optimizing NUMA systems applications with Carrefour for modern NUMA problems and optimization methods, page replication in particular.

Modes of operation

There must be three modes, fully resolved on the TDB side (see the sketch after this list):

  1. UNIFORM - just pretend that we work on uniform memory (the default NUMA interleaving mode).
  2. REPLICATED - inserts and deletes happen on all nodes, lookups on the local node only. At the moment we only see configuration-time databases here, so there is no need for lock-free operation.
  3. SHARD - a forest of HTries to cope with the 128GB database limit (we must be able to keep several shards on each node). It seems we also need to adjust the allocators, including the early kernel huge-pages one (also relates to #1515, "Huge pages allocation issue and the crash on cache sizes >=2GB"), to make them place pages on specific nodes. Probably this way we can also cope with the first problem of #1515, the single too-large contiguous memory allocation.
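
As a rough illustration of how these three modes could be dispatched on the TDB side, here is a minimal sketch; the enum, struct and function names below are assumptions for this example, not the actual Tempesta DB API:

```c
/*
 * Illustrative sketch only: tdb_numa_mode_t, struct tdb_sketch,
 * tdb_htrie_lookup() and the shards[] array are assumptions for this
 * example, not the actual Tempesta DB interface.
 */
#include <linux/topology.h>	/* numa_node_id() */
#include <linux/numa.h>		/* MAX_NUMNODES */

typedef enum {
	TDB_NUMA_UNIFORM,	/* pretend memory is uniform (interleaving) */
	TDB_NUMA_REPLICATED,	/* full replica on every node, local lookups */
	TDB_NUMA_SHARD,		/* forest of HTries partitioned by key */
} tdb_numa_mode_t;

struct tdb_sketch {
	tdb_numa_mode_t	numa_mode;
	unsigned int	nr_shards;
	void		*shards[MAX_NUMNODES];	/* per-node HTrie roots */
};

/* Assumed helper: look up @key in a single HTrie. */
void *tdb_htrie_lookup(void *htrie_root, unsigned long key);

static void *
tdb_lookup_numa(struct tdb_sketch *db, unsigned long key)
{
	switch (db->numa_mode) {
	case TDB_NUMA_UNIFORM:
		/* One shared HTrie over interleaved memory. */
		return tdb_htrie_lookup(db->shards[0], key);
	case TDB_NUMA_REPLICATED:
		/* Look up the local replica only; inserts and deletes
		 * (not shown) must go to every node. */
		return tdb_htrie_lookup(db->shards[numa_node_id()], key);
	case TDB_NUMA_SHARD:
		/* The shard owning @key may live on a remote node. */
		return tdb_htrie_lookup(db->shards[key % db->nr_shards], key);
	}
	return NULL;
}
```

The key property is that only the SHARD case may have to reach a remote node on lookup, while REPLICATED trades memory and write amplification for purely local reads.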

Testing

(This is from tempesta-tech/tempesta-test#61; reopen that issue if necessary.)

When a web resource is saved into the cache, it can be saved either on the current NUMA node only or on all NUMA nodes. All cache-related tests must validate the correctness of the cache behaviour:

  • In sharding mode, a resource mustn't be copied across all the NUMA nodes;
  • In replicated mode, a resource must be copied to all NUMA nodes.

Currently CI runs the tests on a single NUMA node, so CI must also be updated to use different numbers of NUMA nodes. There are probably more tests that must be run on different NUMA topologies.

krizhanovsky added this to the 0.6 OS milestone Jan 24, 2016
krizhanovsky self-assigned this May 23, 2016
krizhanovsky removed their assignment Oct 15, 2018
krizhanovsky added the TDB (Tempesta DB module and related issues) label Apr 27, 2020
krizhanovsky changed the title Hybrid cache mode → NUMA-aware cache modes Sep 8, 2022
krizhanovsky modified the milestones: 1.1: TBD → 1.x - TBD Nov 7, 2023
@const-t
Contributor

const-t commented Dec 10, 2024

We must keep in mind that a NUMA node does not always consist of both memory and CPUs: sometimes a node has only memory or only CPUs. For now we ignore such cases for the cache's REPLICA mode, but prohibit using the cache's SHARD mode on setups where some node doesn't have at least one CPU. Also, in this case we reserve memory at boot time for each online node, even if the node has no CPUs. Doing this we waste some space; however, the naive solution of using for_each_node_with_cpus is not applicable here, because for_each_node_with_cpus is not yet available at this boot stage.
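
A minimal sketch of the boot-stage reservation described above, assuming memblock is used for the early per-node allocations; tdb_node_mem[] and TDB_NODE_MEM_SZ are hypothetical names, not the actual Tempesta code:

```c
/*
 * Sketch of the early boot reservation described above. tdb_node_mem[]
 * and TDB_NODE_MEM_SZ are hypothetical; memblock_alloc_node() and
 * for_each_online_node() are the real kernel primitives.
 */
#include <linux/memblock.h>
#include <linux/nodemask.h>

#define TDB_NODE_MEM_SZ	(256UL << 20)	/* assumed per-node area size */

static void *tdb_node_mem[MAX_NUMNODES];

static void __init
tdb_reserve_node_mem(void)
{
	int nid;

	/*
	 * for_each_node_with_cpus() can't be used this early (the CPU
	 * topology isn't set up yet), so reserve on every online node
	 * and accept the wasted space on memory-only nodes.
	 */
	for_each_online_node(nid)
		tdb_node_mem[nid] = memblock_alloc_node(TDB_NODE_MEM_SZ,
							PAGE_SIZE, nid);
}
```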

@const-t
Contributor

const-t commented Dec 13, 2024

Here we do __cache_add_node(); however, a more proper way is to call only tfw_cache_copy_resp() and tdb_entry_alloc_unique() for each node, instead of calling the whole __cache_add_node().
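
A rough sketch of that per-node loop; the prototypes of tdb_entry_alloc_unique() and tfw_cache_copy_resp() and the node_db() helper are assumptions for illustration, not the actual Tempesta FW code:

```c
/*
 * Sketch only: replicate a cached response to every node by calling
 * just the two functions named above, instead of the whole
 * __cache_add_node() path. The prototypes and the node_db() helper
 * are assumptions for illustration.
 */
static int
tfw_cache_replicate_resp(TfwHttpResp *resp, unsigned long key, size_t len)
{
	int nid;

	for_each_online_node(nid) {
		/* Allocate a fresh, non-shared entry in the node-local
		 * TDB shard (assumed signature)... */
		TdbRec *rec = tdb_entry_alloc_unique(node_db(nid), key, &len);

		if (!rec)
			return -ENOMEM;
		/* ...and copy the response into it, skipping the rest
		 * of __cache_add_node(). */
		tfw_cache_copy_resp(rec, resp);
	}
	return 0;
}
```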
