etcd creates excessive number of WAL files #10885

Closed
dharmab opened this issue Jul 9, 2019 · 17 comments

Comments

@dharmab

dharmab commented Jul 9, 2019

We have a 3-node etcd cluster running etcd 3.3.10 for a Kubernetes cluster. This etcd cluster runs on nodes with high-performance but low-capacity volumes. We've set max-wals to 128, but we consistently see the number of WALs exceeding 128 for extended periods of time. (We do see WAL purging in our logs, so purging is occurring eventually.) Restarting etcd purges the WALs back down to 128.

vm-etcd-13prodva7-2 etcd # ls *.wal | wc -l
133
vm-etcd-13prodva7-2 etcd # sudo systemctl restart etcd
vm-etcd-13prodva7-2 etcd # ls *.wal | wc -l
128

How do we troubleshoot why so many WALs are being created? Can we take further steps to limit the number of WALs and reduce the chance of running out of storage space on these smaller nodes?

@xiang90
Contributor

xiang90 commented Jul 9, 2019

Can you set max-wals lower than 128? Purging is a periodic operation.

@dharmab
Author

dharmab commented Jul 9, 2019

Yes, we're going to reduce max-wals from 128 back to the default of 5.

I have two concerns:

  • max-wals isn't actually a hard maximum, but rather a soft cap beyond which WALs begin to be purged. So max-wals needs to be set significantly below the desired maximum WAL count.
  • fileutil.PurgeFile only purges files down to max-wals. There may be further candidates for purging (WALs which have no flocks), but PurgeFile stops once the count is down to max-wals and leaves them in place. On smaller disks, it may be helpful to purge any and all unlocked WALs rather than short-circuiting.

Setting max-wals to a very small number seems to work around these concerns, so we'll try that next.
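
The soft-cap behaviour described in the two concerns above can be sketched as follows. This is a minimal Python model, not etcd's actual Go code: `purge_once` and the `is_locked` callback are hypothetical names, and the sketch assumes that a purge pass walks the files oldest-first, stops as soon as the count is down to max-wals, and also stops at the first file that is still locked.

```python
from typing import Callable, List


def purge_once(wal_names: List[str], max_wals: int,
               is_locked: Callable[[str], bool]) -> List[str]:
    """Return the WAL names left after one purge pass (oldest first).

    Assumed behaviour, for illustration only:
      * files are considered in sorted (oldest-first) order;
      * nothing is removed once the count is at or below max_wals;
      * the pass stops at the first file that is still locked.
    """
    remaining = sorted(wal_names)
    while len(remaining) > max_wals:
        oldest = remaining[0]
        if is_locked(oldest):
            # A locked file blocks everything newer than it.
            break
        remaining.pop(0)
    return remaining
```

Under this model, max-wals is a floor that purging will not go below, not a ceiling: if the oldest file is locked, the count can stay above max-wals indefinitely.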

@dharmab
Author

dharmab commented Jul 16, 2019

We reduced max-wals to 5 but are still seeing wildly excessive WAL utilization:

vm-etcd-13prodva7-1 wal # cat /etc/ethos/etcd.conf.yaml | grep max-wals
max-wals: 5
vm-etcd-13prodva7-1 wal # ls -lh
total 17G
-rw-------. 1 root root   0 Jul 16 18:53 0.tmp
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003e4-000000000776e505.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003e5-000000000776e8f5.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003e6-000000000776e9ba.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003e7-000000000776ea7a.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003e8-000000000776eb18.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003e9-000000000776ebe4.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003ea-000000000776ed05.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003eb-000000000776ee29.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003ec-000000000776eeeb.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003ed-000000000776efb3.wal
-rw-------. 1 root root 62M Jul 16 18:15 00000000000003ee-000000000776f06a.wal
-rw-------. 1 root root 62M Jul 16 18:16 00000000000003ef-000000000776f19a.wal
-rw-------. 1 root root 62M Jul 16 18:16 00000000000003f0-000000000776f404.wal
-rw-------. 1 root root 62M Jul 16 18:16 00000000000003f1-000000000776f4ac.wal
-rw-------. 1 root root 62M Jul 16 18:16 00000000000003f2-000000000776f55f.wal
-rw-------. 1 root root 62M Jul 16 18:16 00000000000003f3-000000000776f615.wal
-rw-------. 1 root root 62M Jul 16 18:16 00000000000003f4-000000000776f6d7.wal
-rw-------. 1 root root 62M Jul 16 18:17 00000000000003f5-000000000776fad3.wal
-rw-------. 1 root root 62M Jul 16 18:17 00000000000003f6-000000000776fe2d.wal
-rw-------. 1 root root 62M Jul 16 18:17 00000000000003f7-000000000776fed8.wal
-rw-------. 1 root root 62M Jul 16 18:17 00000000000003f8-000000000776ff8a.wal
-rw-------. 1 root root 62M Jul 16 18:17 00000000000003f9-000000000777003e.wal
-rw-------. 1 root root 62M Jul 16 18:18 00000000000003fa-00000000077700f0.wal
-rw-------. 1 root root 62M Jul 16 18:18 00000000000003fb-0000000007770592.wal
-rw-------. 1 root root 62M Jul 16 18:18 00000000000003fc-0000000007770653.wal
-rw-------. 1 root root 62M Jul 16 18:18 00000000000003fd-00000000077706ea.wal
-rw-------. 1 root root 62M Jul 16 18:18 00000000000003fe-000000000777078c.wal
-rw-------. 1 root root 62M Jul 16 18:18 00000000000003ff-0000000007770847.wal
-rw-------. 1 root root 62M Jul 16 18:19 0000000000000400-0000000007770be9.wal
-rw-------. 1 root root 62M Jul 16 18:19 0000000000000401-0000000007770e98.wal
-rw-------. 1 root root 62M Jul 16 18:19 0000000000000402-0000000007770f46.wal
-rw-------. 1 root root 62M Jul 16 18:19 0000000000000403-0000000007770ff8.wal
-rw-------. 1 root root 62M Jul 16 18:19 0000000000000404-00000000077710b3.wal
-rw-------. 1 root root 62M Jul 16 18:20 0000000000000405-000000000777115c.wal
-rw-------. 1 root root 62M Jul 16 18:20 0000000000000406-0000000007771637.wal
-rw-------. 1 root root 62M Jul 16 18:20 0000000000000407-0000000007771706.wal
-rw-------. 1 root root 62M Jul 16 18:20 0000000000000408-00000000077717ad.wal
-rw-------. 1 root root 62M Jul 16 18:20 0000000000000409-0000000007771856.wal
-rw-------. 1 root root 62M Jul 16 18:20 000000000000040a-0000000007771910.wal
-rw-------. 1 root root 62M Jul 16 18:20 000000000000040b-0000000007771a49.wal
-rw-------. 1 root root 62M Jul 16 18:20 000000000000040c-0000000007771b49.wal
-rw-------. 1 root root 62M Jul 16 18:20 000000000000040d-0000000007771c33.wal
-rw-------. 1 root root 62M Jul 16 18:20 000000000000040e-0000000007771ce4.wal
-rw-------. 1 root root 62M Jul 16 18:20 000000000000040f-0000000007771d97.wal
-rw-------. 1 root root 62M Jul 16 18:21 0000000000000410-0000000007771e9d.wal
-rw-------. 1 root root 62M Jul 16 18:21 0000000000000411-00000000077721fc.wal
-rw-------. 1 root root 62M Jul 16 18:21 0000000000000412-00000000077722e5.wal
-rw-------. 1 root root 62M Jul 16 18:21 0000000000000413-0000000007772392.wal
-rw-------. 1 root root 62M Jul 16 18:21 0000000000000414-000000000777242e.wal
-rw-------. 1 root root 62M Jul 16 18:21 0000000000000415-00000000077724de.wal
-rw-------. 1 root root 62M Jul 16 18:22 0000000000000416-00000000077725af.wal
-rw-------. 1 root root 62M Jul 16 18:22 0000000000000417-0000000007772b1b.wal
-rw-------. 1 root root 62M Jul 16 18:22 0000000000000418-0000000007772bc6.wal
-rw-------. 1 root root 62M Jul 16 18:22 0000000000000419-0000000007772c72.wal
-rw-------. 1 root root 62M Jul 16 18:22 000000000000041a-0000000007772d1b.wal
-rw-------. 1 root root 62M Jul 16 18:22 000000000000041b-0000000007772dc4.wal
-rw-------. 1 root root 62M Jul 16 18:23 000000000000041c-0000000007772ecf.wal
-rw-------. 1 root root 62M Jul 16 18:23 000000000000041d-000000000777341d.wal
-rw-------. 1 root root 62M Jul 16 18:23 000000000000041e-00000000077734b7.wal
-rw-------. 1 root root 62M Jul 16 18:23 000000000000041f-000000000777356c.wal
-rw-------. 1 root root 62M Jul 16 18:23 0000000000000420-000000000777360e.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000421-00000000077736c4.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000422-0000000007773b3d.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000423-0000000007773c08.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000424-0000000007773cb3.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000425-0000000007773d5c.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000426-0000000007773dfd.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000427-0000000007773ed4.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000428-00000000077742a8.wal
-rw-------. 1 root root 62M Jul 16 18:24 0000000000000429-0000000007774469.wal
-rw-------. 1 root root 62M Jul 16 18:24 000000000000042a-000000000777451a.wal
-rw-------. 1 root root 62M Jul 16 18:24 000000000000042b-00000000077745b9.wal
-rw-------. 1 root root 62M Jul 16 18:24 000000000000042c-0000000007774683.wal
-rw-------. 1 root root 62M Jul 16 18:25 000000000000042d-0000000007774750.wal
-rw-------. 1 root root 62M Jul 16 18:25 000000000000042e-0000000007774b15.wal
-rw-------. 1 root root 62M Jul 16 18:25 000000000000042f-0000000007774bde.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000430-0000000007774cac.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000431-0000000007774d54.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000432-0000000007774e09.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000433-0000000007774eb6.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000434-0000000007775037.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000435-0000000007775105.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000436-00000000077751c8.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000437-0000000007775298.wal
-rw-------. 1 root root 62M Jul 16 18:25 0000000000000438-0000000007775361.wal
-rw-------. 1 root root 62M Jul 16 18:26 0000000000000439-00000000077754b8.wal
-rw-------. 1 root root 62M Jul 16 18:26 000000000000043a-00000000077756ce.wal
-rw-------. 1 root root 62M Jul 16 18:26 000000000000043b-0000000007775780.wal
-rw-------. 1 root root 62M Jul 16 18:26 000000000000043c-000000000777582b.wal
-rw-------. 1 root root 62M Jul 16 18:26 000000000000043d-00000000077758d5.wal
-rw-------. 1 root root 62M Jul 16 18:27 000000000000043e-00000000077759a0.wal
-rw-------. 1 root root 62M Jul 16 18:27 000000000000043f-000000000777604e.wal
-rw-------. 1 root root 62M Jul 16 18:27 0000000000000440-000000000777610b.wal
-rw-------. 1 root root 62M Jul 16 18:27 0000000000000441-00000000077761c4.wal
-rw-------. 1 root root 62M Jul 16 18:27 0000000000000442-0000000007776268.wal
-rw-------. 1 root root 62M Jul 16 18:28 0000000000000443-0000000007776314.wal
-rw-------. 1 root root 62M Jul 16 18:28 0000000000000444-00000000077767cb.wal
-rw-------. 1 root root 62M Jul 16 18:28 0000000000000445-0000000007776886.wal
-rw-------. 1 root root 62M Jul 16 18:28 0000000000000446-000000000777692a.wal
-rw-------. 1 root root 62M Jul 16 18:28 0000000000000447-00000000077769d1.wal
-rw-------. 1 root root 62M Jul 16 18:28 0000000000000448-0000000007776a73.wal
-rw-------. 1 root root 62M Jul 16 18:29 0000000000000449-0000000007776b6a.wal
-rw-------. 1 root root 62M Jul 16 18:29 000000000000044a-00000000077770ce.wal
-rw-------. 1 root root 62M Jul 16 18:29 000000000000044b-0000000007777181.wal
-rw-------. 1 root root 62M Jul 16 18:29 000000000000044c-000000000777722a.wal
-rw-------. 1 root root 62M Jul 16 18:29 000000000000044d-00000000077772d6.wal
-rw-------. 1 root root 62M Jul 16 18:30 000000000000044e-0000000007777387.wal
-rw-------. 1 root root 62M Jul 16 18:30 000000000000044f-0000000007777849.wal
-rw-------. 1 root root 62M Jul 16 18:30 0000000000000450-0000000007777933.wal
-rw-------. 1 root root 62M Jul 16 18:30 0000000000000451-00000000077779e1.wal
-rw-------. 1 root root 62M Jul 16 18:30 0000000000000452-0000000007777a84.wal
-rw-------. 1 root root 62M Jul 16 18:30 0000000000000453-0000000007777b30.wal
-rw-------. 1 root root 62M Jul 16 18:30 0000000000000454-0000000007777c11.wal
-rw-------. 1 root root 62M Jul 16 18:31 0000000000000455-0000000007777f9a.wal
-rw-------. 1 root root 62M Jul 16 18:31 0000000000000456-000000000777830f.wal
-rw-------. 1 root root 62M Jul 16 18:31 0000000000000457-00000000077783b9.wal
-rw-------. 1 root root 62M Jul 16 18:31 0000000000000458-0000000007778463.wal
-rw-------. 1 root root 62M Jul 16 18:31 0000000000000459-0000000007778514.wal
-rw-------. 1 root root 62M Jul 16 18:32 000000000000045a-00000000077785cd.wal
-rw-------. 1 root root 62M Jul 16 18:32 000000000000045b-0000000007778b3e.wal
-rw-------. 1 root root 62M Jul 16 18:32 000000000000045c-0000000007778c18.wal
-rw-------. 1 root root 62M Jul 16 18:32 000000000000045d-0000000007778cd8.wal
-rw-------. 1 root root 62M Jul 16 18:32 000000000000045e-0000000007778d78.wal
-rw-------. 1 root root 62M Jul 16 18:32 000000000000045f-0000000007778e03.wal
-rw-------. 1 root root 62M Jul 16 18:33 0000000000000460-0000000007778f33.wal
-rw-------. 1 root root 62M Jul 16 18:33 0000000000000461-0000000007779479.wal
-rw-------. 1 root root 62M Jul 16 18:33 0000000000000462-0000000007779527.wal
-rw-------. 1 root root 62M Jul 16 18:33 0000000000000463-00000000077795cf.wal
-rw-------. 1 root root 62M Jul 16 18:33 0000000000000464-0000000007779682.wal
-rw-------. 1 root root 62M Jul 16 18:34 0000000000000465-0000000007779730.wal
-rw-------. 1 root root 62M Jul 16 18:34 0000000000000466-0000000007779b46.wal
-rw-------. 1 root root 62M Jul 16 18:34 0000000000000467-0000000007779c6d.wal
-rw-------. 1 root root 62M Jul 16 18:34 0000000000000468-0000000007779d1e.wal
-rw-------. 1 root root 62M Jul 16 18:34 0000000000000469-0000000007779dc0.wal
-rw-------. 1 root root 62M Jul 16 18:34 000000000000046a-0000000007779e66.wal
-rw-------. 1 root root 62M Jul 16 18:34 000000000000046b-0000000007779f19.wal
-rw-------. 1 root root 62M Jul 16 18:34 000000000000046c-000000000777a34b.wal
-rw-------. 1 root root 62M Jul 16 18:34 000000000000046d-000000000777a525.wal
-rw-------. 1 root root 62M Jul 16 18:34 000000000000046e-000000000777a5ca.wal
-rw-------. 1 root root 62M Jul 16 18:34 000000000000046f-000000000777a67c.wal
-rw-------. 1 root root 62M Jul 16 18:34 0000000000000470-000000000777a739.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000471-000000000777a7fc.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000472-000000000777ab9a.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000473-000000000777ac55.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000474-000000000777ad0d.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000475-000000000777adc8.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000476-000000000777ae7e.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000477-000000000777af33.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000478-000000000777b0bc.wal
-rw-------. 1 root root 62M Jul 16 18:35 0000000000000479-000000000777b178.wal
-rw-------. 1 root root 62M Jul 16 18:35 000000000000047a-000000000777b230.wal
-rw-------. 1 root root 62M Jul 16 18:35 000000000000047b-000000000777b2df.wal
-rw-------. 1 root root 62M Jul 16 18:36 000000000000047c-000000000777b3c2.wal
-rw-------. 1 root root 62M Jul 16 18:36 000000000000047d-000000000777b5e7.wal
-rw-------. 1 root root 62M Jul 16 18:36 000000000000047e-000000000777b742.wal
-rw-------. 1 root root 62M Jul 16 18:36 000000000000047f-000000000777b805.wal
-rw-------. 1 root root 62M Jul 16 18:36 0000000000000480-000000000777b8b3.wal
-rw-------. 1 root root 62M Jul 16 18:36 0000000000000481-000000000777b959.wal
-rw-------. 1 root root 62M Jul 16 18:37 0000000000000482-000000000777ba15.wal
-rw-------. 1 root root 62M Jul 16 18:37 0000000000000483-000000000777c078.wal
-rw-------. 1 root root 62M Jul 16 18:37 0000000000000484-000000000777c153.wal
-rw-------. 1 root root 62M Jul 16 18:37 0000000000000485-000000000777c221.wal
-rw-------. 1 root root 62M Jul 16 18:37 0000000000000486-000000000777c2bc.wal
-rw-------. 1 root root 62M Jul 16 18:37 0000000000000487-000000000777c385.wal
-rw-------. 1 root root 62M Jul 16 18:38 0000000000000488-000000000777c4d1.wal
-rw-------. 1 root root 62M Jul 16 18:38 0000000000000489-000000000777c8da.wal
-rw-------. 1 root root 62M Jul 16 18:38 000000000000048a-000000000777c978.wal
-rw-------. 1 root root 62M Jul 16 18:38 000000000000048b-000000000777ca2a.wal
-rw-------. 1 root root 62M Jul 16 18:38 000000000000048c-000000000777cacb.wal
-rw-------. 1 root root 62M Jul 16 18:39 000000000000048d-000000000777cbd0.wal
-rw-------. 1 root root 62M Jul 16 18:39 000000000000048e-000000000777d155.wal
-rw-------. 1 root root 62M Jul 16 18:39 000000000000048f-000000000777d205.wal
-rw-------. 1 root root 62M Jul 16 18:39 0000000000000490-000000000777d2a7.wal
-rw-------. 1 root root 62M Jul 16 18:39 0000000000000491-000000000777d35a.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000492-000000000777d405.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000493-000000000777d8f4.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000494-000000000777d9b3.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000495-000000000777da78.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000496-000000000777db16.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000497-000000000777dbc4.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000498-000000000777dc86.wal
-rw-------. 1 root root 62M Jul 16 18:40 0000000000000499-000000000777de1a.wal
-rw-------. 1 root root 62M Jul 16 18:40 000000000000049a-000000000777deef.wal
-rw-------. 1 root root 62M Jul 16 18:40 000000000000049b-000000000777dfb1.wal
-rw-------. 1 root root 62M Jul 16 18:40 000000000000049c-000000000777e054.wal
-rw-------. 1 root root 62M Jul 16 18:40 000000000000049d-000000000777e10d.wal
-rw-------. 1 root root 62M Jul 16 18:41 000000000000049e-000000000777e35b.wal
-rw-------. 1 root root 62M Jul 16 18:41 000000000000049f-000000000777e5c7.wal
-rw-------. 1 root root 62M Jul 16 18:41 00000000000004a0-000000000777e66c.wal
-rw-------. 1 root root 62M Jul 16 18:41 00000000000004a1-000000000777e720.wal
-rw-------. 1 root root 62M Jul 16 18:41 00000000000004a2-000000000777e7bd.wal
-rw-------. 1 root root 62M Jul 16 18:41 00000000000004a3-000000000777e87f.wal
-rw-------. 1 root root 62M Jul 16 18:42 00000000000004a4-000000000777eae2.wal
-rw-------. 1 root root 62M Jul 16 18:42 00000000000004a5-000000000777edff.wal
-rw-------. 1 root root 62M Jul 16 18:42 00000000000004a6-000000000777eeab.wal
-rw-------. 1 root root 62M Jul 16 18:42 00000000000004a7-000000000777ef48.wal
-rw-------. 1 root root 62M Jul 16 18:42 00000000000004a8-000000000777effa.wal
-rw-------. 1 root root 62M Jul 16 18:43 00000000000004a9-000000000777f0ab.wal
-rw-------. 1 root root 62M Jul 16 18:43 00000000000004aa-000000000777f6fc.wal
-rw-------. 1 root root 62M Jul 16 18:43 00000000000004ab-000000000777f7b6.wal
-rw-------. 1 root root 62M Jul 16 18:43 00000000000004ac-000000000777f85b.wal
-rw-------. 1 root root 62M Jul 16 18:43 00000000000004ad-000000000777f906.wal
-rw-------. 1 root root 62M Jul 16 18:43 00000000000004ae-000000000777f9b4.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004af-000000000777fb0e.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b0-000000000777ff08.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b1-000000000777ffa2.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b2-0000000007780053.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b3-00000000077800f9.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b4-00000000077801af.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b5-0000000007780560.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b6-00000000077807f5.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b7-000000000778089d.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b8-000000000778094b.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004b9-0000000007780a08.wal
-rw-------. 1 root root 62M Jul 16 18:44 00000000000004ba-0000000007780ab7.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004bb-0000000007780c3f.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004bc-0000000007780ee2.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004bd-0000000007780f8b.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004be-0000000007781041.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004bf-0000000007781100.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004c0-00000000077811c0.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004c1-00000000077812f0.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004c2-00000000077813bb.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004c3-000000000778148a.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004c4-0000000007781522.wal
-rw-------. 1 root root 62M Jul 16 18:45 00000000000004c5-00000000077815ef.wal
-rw-------. 1 root root 62M Jul 16 18:46 00000000000004c6-000000000778178b.wal
-rw-------. 1 root root 62M Jul 16 18:46 00000000000004c7-00000000077819af.wal
-rw-------. 1 root root 62M Jul 16 18:46 00000000000004c8-0000000007781a58.wal
-rw-------. 1 root root 62M Jul 16 18:46 00000000000004c9-0000000007781b1c.wal
-rw-------. 1 root root 62M Jul 16 18:46 00000000000004ca-0000000007781bce.wal
-rw-------. 1 root root 62M Jul 16 18:47 00000000000004cb-0000000007781c84.wal
-rw-------. 1 root root 62M Jul 16 18:47 00000000000004cc-00000000077821c4.wal
-rw-------. 1 root root 62M Jul 16 18:47 00000000000004cd-0000000007782342.wal
-rw-------. 1 root root 62M Jul 16 18:47 00000000000004ce-000000000778240c.wal
-rw-------. 1 root root 62M Jul 16 18:47 00000000000004cf-000000000778249f.wal
-rw-------. 1 root root 62M Jul 16 18:47 00000000000004d0-000000000778254a.wal
-rw-------. 1 root root 62M Jul 16 18:48 00000000000004d1-0000000007782681.wal
-rw-------. 1 root root 62M Jul 16 18:48 00000000000004d2-0000000007782b85.wal
-rw-------. 1 root root 62M Jul 16 18:48 00000000000004d3-0000000007782c23.wal
-rw-------. 1 root root 62M Jul 16 18:48 00000000000004d4-0000000007782cc5.wal
-rw-------. 1 root root 62M Jul 16 18:48 00000000000004d5-0000000007782d76.wal
-rw-------. 1 root root 62M Jul 16 18:49 00000000000004d6-0000000007782e6c.wal
-rw-------. 1 root root 62M Jul 16 18:49 00000000000004d7-0000000007783455.wal
-rw-------. 1 root root 62M Jul 16 18:49 00000000000004d8-0000000007783505.wal
-rw-------. 1 root root 62M Jul 16 18:49 00000000000004d9-00000000077835d8.wal
-rw-------. 1 root root 62M Jul 16 18:49 00000000000004da-0000000007783678.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004db-0000000007783727.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004dc-0000000007783c0e.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004dd-0000000007783d0c.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004de-0000000007783da5.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004df-0000000007783e41.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004e0-0000000007783ee9.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004e1-0000000007783fdd.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004e2-0000000007784126.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004e3-0000000007784216.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004e4-00000000077842b5.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004e5-000000000778435a.wal
-rw-------. 1 root root 62M Jul 16 18:50 00000000000004e6-0000000007784425.wal
-rw-------. 1 root root 62M Jul 16 18:51 00000000000004e7-0000000007784594.wal
-rw-------. 1 root root 62M Jul 16 18:51 00000000000004e8-0000000007784909.wal
-rw-------. 1 root root 62M Jul 16 18:51 00000000000004e9-000000000778499e.wal
-rw-------. 1 root root 62M Jul 16 18:51 00000000000004ea-0000000007784a5e.wal
-rw-------. 1 root root 62M Jul 16 18:51 00000000000004eb-0000000007784b0c.wal
-rw-------. 1 root root 62M Jul 16 18:51 00000000000004ec-0000000007784bd1.wal
-rw-------. 1 root root 62M Jul 16 18:52 00000000000004ed-0000000007784cee.wal
-rw-------. 1 root root 62M Jul 16 18:52 00000000000004ee-0000000007785196.wal
-rw-------. 1 root root 62M Jul 16 18:52 00000000000004ef-000000000778524f.wal
-rw-------. 1 root root 62M Jul 16 18:52 00000000000004f0-0000000007785307.wal
-rw-------. 1 root root 62M Jul 16 18:52 00000000000004f1-000000000778539e.wal
-rw-------. 1 root root 62M Jul 16 18:52 00000000000004f2-0000000007785452.wal
-rw-------. 1 root root 62M Jul 16 18:53 00000000000004f3-00000000077858b4.wal
-rw-------. 1 root root 62M Jul 16 18:53 00000000000004f4-0000000007785bff.wal
-rw-------. 1 root root 62M Jul 16 18:53 00000000000004f5-0000000007785cb0.wal
-rw-------. 1 root root 62M Jul 16 18:53 00000000000004f6-0000000007785d6d.wal
-rw-------. 1 root root 62M Jul 16 18:53 00000000000004f7-0000000007785e17.wal
-rw-------. 1 root root 62M Jul 16 19:40 00000000000004f8-0000000007785ed3.wal

This is from an etcd node that started with a blank disk and had been running for less than three hours, on a fairly small Kubernetes cluster (fewer than 800 pods across 20 nodes).
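
The file names in the listing can be used to estimate how many entries those files cover. The sketch below is a hedged illustration in Python: it assumes each name has the form `<sequence>-<raft index>.wal` with both fields in hexadecimal, and `parse_wal_name` and `wal_span` are hypothetical helper names. The 62 MiB file size is taken from the `ls -lh` output above.

```python
def parse_wal_name(name):
    """Split '<seq>-<index>.wal' into two integers (both fields are hex)."""
    if not name.endswith(".wal"):
        raise ValueError(f"not a WAL file name: {name!r}")
    seq_hex, index_hex = name[:-len(".wal")].split("-")
    return int(seq_hex, 16), int(index_hex, 16)


def wal_span(first_name, last_name):
    """Return (file_count, entry_count) between two WAL file names."""
    first_seq, first_index = parse_wal_name(first_name)
    last_seq, last_index = parse_wal_name(last_name)
    return last_seq - first_seq + 1, last_index - first_index


# First and last .wal files from the listing above.
files, entries = wal_span(
    "00000000000003e4-000000000776e505.wal",
    "00000000000004f8-0000000007785ed3.wal",
)
# Upper bound: assumes every file is full, which may not hold.
approx_bytes_per_entry = files * 62 * 1024 * 1024 / entries
print(files, entries, round(approx_bytes_per_entry))
```

For this listing the span is 277 files covering 96,718 entries, which works out to roughly 180 KiB per entry if the files are full. That entry count is below the 100,000 default for --snapshot-count cited later in this thread, which may be why none of these files had been released yet.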

@xiang90
Contributor

xiang90 commented Jul 16, 2019

Can you get the logs from the etcd server?

@dharmab
Author

dharmab commented Jul 17, 2019

Sure, @xiang90. The logs will likely be very large and we'll have to pull a large amount of data from Splunk. Are there any particular lines or patterns you are interested in?

@dharmab
Author

dharmab commented Jul 17, 2019

I've pulled 3 hours of etcd logs from our Splunk. The logs start just before we began maintenance on the 3 etcd nodes to change the max-wals setting. The file is a JSON array where each log line is a JSON object. The result._time field is the timestamp, the result.node_ip field is the peer IP, and the result.message field is the log line. The timeframe in the logs overlaps the timestamps in the above WAL listing.

etcd-issue-10885.tar.gz
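
For anyone working with an export in the format described above, here is a minimal Python sketch for pulling out matching log lines. It assumes each array element nests the three fields under a `result` key; `filter_messages` is a hypothetical helper, and the search string is whatever pattern you care about.

```python
import json


def filter_messages(raw_json, needle):
    """Return (time, node_ip, message) tuples whose message contains needle.

    raw_json is the text of a JSON array in which each element looks like
    {"result": {"_time": ..., "node_ip": ..., "message": ...}} (assumed shape).
    """
    hits = []
    for record in json.loads(raw_json):
        result = record.get("result", {})
        message = result.get("message", "")
        if needle in message:
            hits.append((result.get("_time"), result.get("node_ip"), message))
    return hits
```

Sorting the returned tuples by their first element groups the matches chronologically across the three nodes, provided the timestamps are in a sortable format.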

@xiang90
Contributor

xiang90 commented Jul 18, 2019

Potential problems:

  1. Check what is written into etcd through k8s. Do you have very large CRDs? It is not normal to write more than 200MB of data into etcd within 1 minute for a small k8s cluster.
  2. If you do write large items into etcd, then you also need to reduce --snapshot-count. WAL entries that are still in use cannot be cleaned up.
  3. There are also a bunch of health check failures from etcd peers. You also need to fix that.
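
Point 2 can be turned into a rough back-of-the-envelope estimate. The Python sketch below is an illustration under stated assumptions, not etcd's logic: it assumes WAL files covering entries written since the last snapshot cannot be purged, so up to --snapshot-count entries stay on disk. `estimate_retained_wal_files` is a hypothetical helper, and the 64 MB segment size and the average entry size are assumptions to adjust for your own cluster.

```python
import math


def estimate_retained_wal_files(snapshot_count, avg_entry_bytes,
                                segment_bytes=64_000_000, max_wals=5):
    """Estimate the worst-case number of WAL files kept on disk.

    Assumes up to snapshot_count entries are still "in use" just before a
    snapshot, and that purging never goes below max_wals files.
    """
    in_use_files = math.ceil(snapshot_count * avg_entry_bytes / segment_bytes)
    return max(in_use_files, max_wals)


# Large entries (about 180 KB each) with two different snapshot counts.
print(estimate_retained_wal_files(100_000, 180_000))  # 282
print(estimate_retained_wal_files(10_000, 180_000))   # 29
```

With small entries the estimate collapses to max-wals, which is why the setting looks like a hard limit on clusters that only write small objects.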

@dharmab
Author

dharmab commented Aug 7, 2019

We've implemented a few changes:

  • Reduced snapshot count back to pre-3.2 setting
  • Moved the WAL files to separate, large disks
  • Disabled some workload features which used very large CRDs

Thanks all for the advice! Closing this issue; will reopen if needed.

dharmab closed this as completed Aug 7, 2019

@juliangomez-samana

juliangomez-samana commented Apr 29, 2021

Hello everyone. My team had the same issue. In our case we are running etcd in a Docker container, and the excessive WAL files were not only consuming the assigned disk but also causing high memory consumption, which made the container stop when it exceeded 4 GB of memory.

We found that "ETCD was taking snapshots every 10,000 records by default, and the WAL files couldn’t be deleted". We identified that the container was not creating any snapshots. We decided to run our etcd container with the following flags: "--max-snapshots=2 --max-wals=5 --enable-v2 --auto-compaction-retention 1 --snapshot-count 5000".

After these changes, the container started creating snapshots every 25 minutes and the WAL files are being deleted, so the device is working with 2 snapshots and just 5 WALs. The disk is no longer being overloaded, and memory consumption goes from 1.2 GB to a maximum of 2.3 GB and then drops back to 1.2 GB. As a result, the container no longer gets stuck, since memory stays below 4 GB.

I hope this comment is useful.

Regards,

Julian Gomez
Samana Group.

@yguo0905

The problem still exists. Could we reopen this issue?

@serathius

@ahrtr
Member

ahrtr commented Jun 24, 2023

The problem still exists. Could we reopen this issue?

Could you raise a new issue and provide the following info?

  1. etcd version
  2. etcd logs

@bsdnet

bsdnet commented Jul 5, 2023

@yguo0905 @ahrtr

The etcd version is v3.4.13. Which etcd logs are we looking for: client, server, or something else?

@thdrnsdk

thdrnsdk commented Nov 9, 2023

same issue in our cluster

@moonek

moonek commented Nov 9, 2023

Since max-wals is not declared, the default value (5) is used, but the number of WAL files exceeds 150 and the storage is full.
(etcd version: v3.5.9)

Please reopen this issue.

@jmhbnz
Member

jmhbnz commented Nov 9, 2023

Hey @moonek, @thdrnsdk - We have a new bug template which makes sure we get the information we need to properly investigate bug reports.

Can one or both of you please raise a new issue using the Bug template and ensure all the requested fields are provided?

Many thanks!

@bsdnet

bsdnet commented Dec 31, 2023

Do we have a new issue to track this?

@tigertall

Since v3.2, the default value of --snapshot-count has changed from 10,000 to 100,000.

If the snapshots are too infrequent, there can be more than --max-wals=5, as file-system level locks are protecting the files preventing them from being deleted too early.

Maybe this is the cause? Setting snapshot-count to a smaller value makes snapshots happen more frequently, so WAL files can be cleaned up more often.
