
Memory corruption observed when using Moka v0.9.6 #281

Closed
tatsuya6502 opened this issue Jun 19, 2023 · 15 comments · Fixed by #283
Labels: bug

tatsuya6502 commented Jun 19, 2023

Split from MaterializeInc/database-issues#93.

Note: I started looking into Moka because we recently tried using it in MaterializeInc/materialize#19614, but had to revert it because we were observing memory corruption.

  • Moka v0.9.6
  • No stack traces or core dumps are available.
  • The target platform (CPU arch and OS) is not yet known. => Linux x86_64
  • moka::sync::Cache is used in async contexts.
  • Multiple (many?) Cache instances. (One instance per blob URL; the exact count is unknown.)
  • A weigher and an eviction listener (immediate mode) are configured. (See the sketch below.)
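
For reference, below is a minimal sketch of a cache configured along these lines against moka v0.9's builder API. It is not the Materialize code; the key/value types, the capacity, and the weigher logic are illustrative assumptions.

use moka::notification::RemovalCause;
use moka::sync::Cache;

fn build_cache() -> Cache<String, Vec<u8>> {
    Cache::builder()
        // Weighted max capacity; the unit is whatever the weigher below returns.
        .max_capacity(64 * 1024 * 1024)
        // Weigher: charge each entry by the size of its value.
        .weigher(|_key: &String, value: &Vec<u8>| {
            value.len().try_into().unwrap_or(u32::MAX)
        })
        // Eviction listener (immediate delivery mode in v0.9.x).
        .eviction_listener(|key, _value, cause: RemovalCause| {
            eprintln!("evicted {key}: {cause:?}");
        })
        .build()
}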

I am trying to reproduce the issue using mokabench, but no luck so far: #279 (comment)

CC: @ParkMyCar

tatsuya6502 self-assigned this Jun 20, 2023

tatsuya6502 commented Jun 20, 2023

Details of the issue: https://github.com/MaterializeInc/database-issues/issues/5879

It seems easy to reproduce if we can run an end-to-end integration test called "Long Zippy w/ user tables".

This happened twice in a row in long Zippy w/ user tables test: https://buildkite.com/materialize/release-qualification/builds/197#018883bf-5086-4a43-a8b4-ede1ef192de5


tatsuya6502 commented Jun 20, 2023

https://github.com/MaterializeInc/materialize/blob/bfbebe04a7d181f1bac91f01785309a52cf2de34/ci/release-qualification/pipeline.yml#L99-L108

  - id: zippy-user-tables-large
    label: "Long Zippy w/ user tables"
    timeout_in_minutes: 2880
    agents:
      queue: linux-x86_64-large
    artifact_paths: junit_*.xml
    plugins:
      - ./ci/plugins/mzcompose:
          composition: zippy
          args: [--scenario=UserTablesLarge, --actions=200000]


tatsuya6502 commented Jun 21, 2023

I ran a short version of the "Zippy w/ user tables" test on Linux x86_64. I used the same scenario, but a smaller number of actions: 2,000 instead of 200,000.

I could not reproduce the issue. I will try a longer run.

Environment:

  • Docker (x86_64)
  • Intel Core i7-12700F
    • 8 × performance cores × 2 hyper threads
    • 4 × efficient cores

Terminal Log:

$ git clone git@github.com:MaterializeInc/materialize.git
$ cd materialize

$ git branch moka-segfault 07386e427f
$ git checkout moka-segfault

$ rg moka
Cargo.lock
2903: "moka",
3241:name = "moka"
4380: "moka",
8629: "moka",

src/persist-client/Cargo.toml
41:moka = { version = "0.9.6", default-features = false, features = ["sync"] }

src/persist-client/src/internal/cache.rs
16:use moka::notification::RemovalCause;
17:use moka::sync::Cache;
100:        // This could maybe use moka's async cache to unify any concurrent

src/workspace-hack/Cargo.toml
63:moka = { version = "0.9.6", default-features = false, features = ["atomic64", "sync"] }
164:moka = { version = "0.9.6", default-features = false, features = ["atomic64", "sync"] }
$ ./bin/mzcompose --find zippy run default --scenario UserTablesLarge --actions 2000
==> Collecting mzbuild images
materialize/ubuntu-base:mzbuild-WBP7JFLWFQYUIHLXR5JSOAJ74VVKI6O7
materialize/clusterd:mzbuild-Q77JA6BIOATFUZAWANSIJFOCUXV3GOGX
materialize/materialized:mzbuild-IIUOBQEI7JVK3YSGYYV6FP3EP3JGFF5I
materialize/test-certs:mzbuild-6JWZ2MGLSYOCL3MLHIEIJMMLQLEOLCNY
materialize/postgres:mzbuild-ENTVWBFMO762DEEKAMURDMVMQ52GIO63
materialize/testdrive:mzbuild-ANAR5HEUWWDMTQUJ3ZA7XJWEY64H2ZTB
...
==> Running test case workflow-default
==> Running workflow default
...
[+] Running 3/3
 ✔ Container zippy-cockroach-1     Healthy
 ✔ Container zippy-minio-1         Healthy
 ✔ Container zippy-materialized-1  Healthy
> ALTER SYSTEM SET enable_unmanaged_cluster_replicas = true;
> CREATE CLUSTER storaged REPLICAS (r2 (
                STORAGECTL ADDRESSES ['storaged:2100'],
                STORAGE ADDRESSES ['storaged:2103'],
                COMPUTECTL ADDRESSES ['storaged:2101'],
                COMPUTE ADDRESSES ['storaged:2102'],
                WORKERS 4
            ))
[+] Stopping 1/1
 ✔ Container zippy-materialized-1  Stopped
Going to remove zippy-materialized-1
[+] Removing 1/0
 ✔ Container zippy-materialized-1  Removed
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-testdrive-1  Healthy
Generating test...
Running test...
--- MaterializeInc/database-issues#1: KafkaStart
[+] Building 0.0s (0/0)
[+] Running 2/2
 ✔ Container zippy-zookeeper-1  Healthy
 ✔ Container zippy-kafka-1      Healthy
--- MaterializeInc/database-issues#2: CockroachStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-cockroach-1  Healthy
--- MaterializeInc/database-issues#3: MinioStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-minio-1  Healthy
--- MaterializeInc/database-issues#4: MzStart
[+] Building 0.0s (0/0)
[+] Running 3/3
 ✔ Container zippy-cockroach-1     Healthy
 ✔ Container zippy-minio-1         Healthy
 ✔ Container zippy-materialized-1  Healthy
--- MaterializeInc/materialize#5: StoragedStart
[+] Building 0.0s (0/0)
[+] Running 1/1
 ✔ Container zippy-storaged-1  Healthy
--- MaterializeInc/database-issues#5: CreateTable
> CREATE TABLE table_0 (f1 INTEGER);
rows match; continuing at ts 1687305573.2167296
> INSERT INTO table_0 VALUES (0);
rows match; continuing at ts 1687305573.2269592

...

> INSERT INTO table_0 SELECT * FROM generate_series(568924, 572569);
rows match; continuing at ts 1687306589.1175394
==> mzcompose: test case workflow-default succeeded


tatsuya6502 commented Jun 21, 2023

There are almost 300 commits between when moka was added and when it was removed:
MaterializeInc/materialize@1bcd1c4...07386e4

I need to know when (on which commit) the test last failed.

@tatsuya6502

> There are almost 300 commits between when moka was added and when it was removed:
> MaterializeInc/materialize@1bcd1c4...07386e4
>
> I need to know when (on which commit) the test last failed.

The following commits were mentioned in https://github.com/MaterializeInc/database-issues/issues/5879.

@tatsuya6502

Ran the test (with 5,000 actions) several times on my home PC, but I have not reproduced the issue.

Environment:

  • Docker (x86_64)
  • Intel Core i7-12700F
    • 8 × performance cores × 2 hyper threads
    • 4 × efficient cores

My PC (Windows 11 + WSL2) does not have enough RAM to run a longer Zippy test. It has only 16GB of RAM, and Docker containers like Kafka sometimes get killed while the test is running. I will expand the RAM to 32GB.

I hope running a longer test will help reproduce the issue. But another factor could be the number of available cores/hyper-threads. My PC has 16 cores and 20 threads, but their EC2 instance has only 8 vCPUs:

  • c5.2xlarge instances, which have 8 vCPUs (i.e., hyperthreads) and 16GB of memory.

I will probably adjust the number of hyper-threads available to Docker.

Also, I will add some tracing logs to both Moka and Materialize to get a better understanding of the access pattern.

@0xdeafbeef

@tatsuya6502 I have 32 threads / 128 GB of RAM and can leave the test running overnight if that would help.

@tatsuya6502

@0xdeafbeef Thanks for the offer. Wow, you have a great machine!

I think I am going to be okay. I already ordered the RAM modules and am waiting for them to arrive. I think 32GB should be enough for now.

@tatsuya6502

FYI, I got the RAM modules for my PC, and confirmed it can run the test with at least 50,000 actions. I allocated 24GiB of memory and only 8 logical CPU cores to WSL2.

The reason I did not try 200,000 actions is that it would take a long time (estimated ~30 hours). 50,000 actions took 7 hours and 14 minutes. The memory utilization was fine, so I had a feeling it could run 200,000 actions without issue.

Also, before I got the RAM modules, I was able to run 80,000 actions. I found that reducing the number of logical CPU cores available to WSL2/Docker reduces the memory utilization of the test, so I allocated only 8 cores (instead of 20) and 10GiB of memory (instead of 8GiB) to WSL2, and it was able to run 80,000 actions.

@tatsuya6502

I talked to a Materialize engineer. They will try to reproduce the issue using an EC2 instance and give me the full set of instructions.

MaterializeInc/materialize#20136 (comment)

> Hopefully someone from the QA team has time to attempt the repro early next week and give you the full set of instructions.


tatsuya6502 commented Jul 2, 2023

With some help from Materialize engineers, I made some progress: I reproduced the segmentation fault of Moka v0.9.6 in Materialize and got a core dump to analyze.

Notes: I had to use an Amazon EC2 c6a.12xlarge instance, which has 48 vCPUs, because reproducing with the current test on my home PC (20 hyper-threads) can take a few days (details). Also, it was a bit tricky to get a core dump in a Docker container, so it took me some time to figure out. (A modified Materialize test harness for getting core dumps)

Here is the back trace of the crashed thread:

(attachment: moka-segfault-2023-07-01)

It crashed in the Deque::move_to_back method (src/common/deque.rs#L202), which was called by the base_cache::Inner::handle_upsert method (src/sync_base/base_cache.rs#L1414).

Materialize uses Moka v0.9.6, and this problem may have been fixed by #259 in v0.11.0.

I will back-port the fix to v0.9.x, and run the same test to see if the problem goes away.

Idea of steps to reproduce

If this is related to #259, the steps to reproduce could be the following. However, the end result depends largely on how the compiler optimizes reads through an immutable reference (&) to an LRU double-ended queue (deque). A hypothetical sketch of these steps in code form follows the code excerpt below.

  1. Create a cache with weighted capacity 6.

  2. Stop the periodic sync.

  3. Get non-existing key X once. (popularity = 1)

  4. Get non-existing key Y once. (popularity = 1)

  5. Insert key X with weight 2.

  6. Insert key Y with weight 2.

  7. Call sync:

    • Both X and Y are admitted.
    • Now the LRU deque has: (front) XY (back)
  8. Get non-existing key V three times. (popularity = 3)

  9. Insert V with weight 3.

  10. Invalidate Y.

    • This removes Y from the concurrent hash table (cht), but removing its node from the deque will be delayed to a future sync.
  11. Call sync. The following will happen, and the last step can cause a segmentation fault:

    • X and Y are selected as the victims (entries to evict), and V is marked as admitted.
      • weights: X + Y = 4, V = 3
      • popularity: X + Y = 2, V = 3
    • X is removed from the cht, and its deque node is unlinked and dropped.
      • Unlinking X's node changes Y's node's prev pointer from Some(NonNull(X)) to None.
    • Because of step 10, Y is no longer in the cht, so Y's deque node is added to the skipped_nodes vector.
    • V's deque node is created and pushed to the back of the deque.
    • The nodes in skipped_nodes are processed (there is only the Y node): the Y node will be moved to the back of the deque.
      • Deque's move_to_back method checks Y's prev pointer, which is None now.
      • However, because of the bug (fixed by #259 for v0.11.0), optimized machine instructions may use a stale value of the prev pointer held in a CPU register (still pointing to X) instead of reloading the actual value None from memory.
      • X was already dropped, so accessing X (via *prev.as_ptr() in the code below) causes the segmentation fault.

src/common/deque.rs#L202

188: pub(crate) unsafe fn move_to_back(&mut self, mut node: NonNull<DeqNode<T>>) {
        ...
201:    match node.prev {
202:        Some(prev) if node.next.is_some() => (*prev.as_ptr()).next = node.next,
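
For illustration, here is a hypothetical sketch of the steps above in code form. It is not a working test: moka v0.9 does not expose a way to pause the housekeeping thread or to trigger the maintenance ("sync") step from outside, so run_pending_maintenance below is only a placeholder, and the key/weight values simply follow the steps listed above. (The next comment also notes that step 10 must land inside a narrower time window than this linear sequence suggests.)

use moka::sync::Cache;

fn repro_sketch() {
    // Step 1: weighted capacity 6; the value itself is used as the entry weight.
    let cache: Cache<char, u32> = Cache::builder()
        .max_capacity(6)
        .weigher(|_key: &char, value: &u32| *value)
        .build();

    // Step 2: stop the periodic sync (no public API for this in v0.9).

    let _ = cache.get(&'X');  // Step 3: miss -> popularity(X) = 1
    let _ = cache.get(&'Y');  // Step 4: miss -> popularity(Y) = 1
    cache.insert('X', 2);     // Step 5: weight 2
    cache.insert('Y', 2);     // Step 6: weight 2

    run_pending_maintenance(&cache); // Step 7: X and Y admitted; deque = (front) X, Y (back)

    for _ in 0..3 {
        let _ = cache.get(&'V'); // Step 8: popularity(V) = 3
    }
    cache.insert('V', 3);        // Step 9: weight 3

    cache.invalidate(&'Y');      // Step 10: Y leaves the cht; its deque node is unlinked later

    run_pending_maintenance(&cache); // Step 11: evict X and Y, admit V, then move Y's stale node
}

// Placeholder only: moka v0.9 runs this housekeeping on background threads
// and does not expose it as a public method.
fn run_pending_maintenance(_cache: &Cache<char, u32>) {}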


tatsuya6502 commented Jul 2, 2023

> 10. Invalidate Y.
>     • This removes Y from the concurrent hash table (cht), but removing its node from the deque will be delayed to a future sync.
> 11. Call sync. The following will happen, and the last step can cause a segmentation fault:
>     • X and Y are selected as the victims (entries to evict), and V is marked as admitted.
>       • weights: X + Y = 4, V = 3
>       • popularity: X + Y = 2, V = 3
>     • X is removed from the cht, and its deque node is unlinked and dropped.
>       • Unlinking X's node changes Y's node's prev pointer from Some(NonNull(X)) to None.
>     • Because of step 10, Y is no longer in the cht, so Y's deque node is added to the skipped_nodes vector.

I reviewed the code and realized that the above will not reproduce the problem.

  • Step 10 (invalidate Y) needs to happen within a narrower time window; see the sketch after this list.
  • It has to be done:
    • after the first bullet of 11 (X and Y are selected as ...)
    • and before the third bullet of 11 (Because of step 10, Y is no longer in the cht, ...).
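
A hypothetical way to express that timing in code: the invalidation is issued from a second thread so that it can land inside the maintenance run. As before, run_pending_maintenance is only a placeholder for moka's internal housekeeping step, and whether the invalidation actually hits the window is up to the scheduler, so this just illustrates the shape of the race.

use moka::sync::Cache;
use std::thread;

fn race_invalidate_with_sync(cache: &Cache<char, u32>) {
    thread::scope(|s| {
        // Issue the invalidation concurrently, hoping it lands after the victims
        // (X and Y) are selected but before Y's cht entry is checked.
        s.spawn(|| cache.invalidate(&'Y'));
        // Meanwhile, run the maintenance step on this thread.
        run_pending_maintenance(cache);
    });
}

// Placeholder, as in the sketch above: the real maintenance step is internal to moka v0.9.
fn run_pending_maintenance(_cache: &Cache<char, u32>) {}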

tatsuya6502 linked a pull request Jul 3, 2023 that will close this issue

tatsuya6502 commented Jul 3, 2023

> Materialize uses Moka v0.9.6, and this problem may have been fixed by #259 in v0.11.0.
>
> I will back-port the fix to v0.9.x, and run the same test to see if the problem goes away.

I backported the fix to v0.9.x:

I am running "Long Zippy w/ user tables" on an EC2 c6a.12xlarge instance (48 vCPUs). It has been running for 13 hours 14 hours without reproducing the problem. This looks promising.

I will release Moka v0.9.8 with #283 after I finish running the pre-release tests. Then I will let the Materialize engineers know that the problem may have been fixed.

@tatsuya6502

I will backport #259 to Moka v0.10.x as well. I will keep this issue open until that is done.

@tatsuya6502

Closing as completed.

Moka v0.9.x

Moka v0.10.x
