Replies: 37 comments
-
Is it because of an OOM error? How much memory do you have in the container? Did you have a gigantic SQL transaction at the stopped binlog position?
-
That seems close to the default
-
@liulikun Thanks! The error log is a little bit different.
-
@liulikun, big transactions should just spool to disk; advising to skip them is not generally good practice. @olraxy, I think it's quite possible you're still encountering out-of-memory issues that bubble up in weird ways. I'd be interested to see a snapshot of
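For anyone wondering what "spool to disk" looks like in practice, here is a minimal sketch of the idea: keep the first N rows of a transaction on the heap and write the overflow to a temporary file. This is an illustration only, not Maxwell's actual buffer implementation; the class name and threshold are made up.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: keep the first maxInMemory rows on the heap, spill the
// rest to a temp file so a huge transaction doesn't blow up the heap.
public class SpillingBuffer implements Closeable {
    private final int maxInMemory;
    private final Deque<String> memory = new ArrayDeque<>();
    private File spillFile;
    private ObjectOutputStream spillOut;
    private long spilledCount = 0;

    public SpillingBuffer(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    public void add(String row) throws IOException {
        if (memory.size() < maxInMemory) {
            memory.addLast(row);
            return;
        }
        if (spillOut == null) {
            spillFile = File.createTempFile("txbuffer", ".spill");
            spillFile.deleteOnExit();
            spillOut = new ObjectOutputStream(new FileOutputStream(spillFile));
        }
        spillOut.writeObject(row);   // overflow goes to disk, not the heap
        spilledCount++;
    }

    public long size() {
        return memory.size() + spilledCount;
    }

    @Override
    public void close() throws IOException {
        if (spillOut != null) spillOut.close();
        if (spillFile != null) spillFile.delete();
    }
}
```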
-
@osheroff
-
wow, that's slammed. what kind of transaction volume are we talking about here? thinking https://github.com/patric-r/jvmtop/blob/master/doc/ConsoleProfiler.md might be a good line of investigation here...
-
mysql uses 4 bytes to store the event length. If the transaction is too big, it can overflow the 4-byte limit. If you can use
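To make the "4 bytes" point concrete: in the v4 binlog format every event starts with a fixed 19-byte header, and the event length is a single unsigned 32-bit little-endian field. Here's a rough sketch of decoding that header; the byte array is a stand-in for bytes actually read from a binlog file, and this is not code from Maxwell or mysql-binlog-connector.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the fixed v4 binlog event header layout:
// timestamp(4) type(1) server_id(4) event_size(4) next_position(4) flags(2) = 19 bytes.
public class BinlogEventHeader {
    public static void main(String[] args) {
        byte[] header = new byte[19]; // stand-in for bytes read from the binlog
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);

        long timestamp = Integer.toUnsignedLong(buf.getInt()); // seconds since epoch
        int eventType  = Byte.toUnsignedInt(buf.get());
        long serverId  = Integer.toUnsignedLong(buf.getInt());
        long eventSize = Integer.toUnsignedLong(buf.getInt()); // the 4-byte length field
        long nextPos   = Integer.toUnsignedLong(buf.getInt());
        int flags      = Short.toUnsignedInt(buf.getShort());

        System.out.printf("type=%d size=%d nextPos=%d flags=%d ts=%d serverId=%d%n",
                eventType, eventSize, nextPos, flags, timestamp, serverId);
    }
}
```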
-
You can set environment variable
-
JvmTop 0.8.0 alpha - 10:02:38, amd64, 4 cpus, Linux 4.4.0-104, load avg 6.43
-
Hi @ypereirareis what version of maxwell are you on? Is this the

@surendra-outreach it depends on your workload (esp how big your transactions are, as well as how large your schemas are) but for a normal workload a 2GB heap should be fine.
-
@osheroff - our current configuration is around 200 databases on a mysql server, with each database having 175 tables. Our pods are configured with 6GB and two full cores. My question is more about good Java GC configuration settings, like: -server -Xms2G -Xmx6G -XX:PermSize=512m -XX:+UseG1GC -XX:MaxGCPauseMillis=? -XX:ParallelGCThreads=? -XX:ConcGCThreads=? -XX:InitiatingHeapOccupancyPercent=? What are your recommendations for GC settings? https://docs.oracle.com/cd/E55119_01/doc.71/e55122/cnf_jvmgc.htm#WSEAD420
-
@surendra-outreach I don't have direct experience running with such a large schema. G1 is a good choice of garbage collector, but I wouldn't twiddle any of the knobs (except of course the max heap size). Are you experiencing any issues or excessive CPU due to GC? I do also recommend that you collect some JVM metrics about GC'ing, especially if you start to tweak these knobs.
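If it helps, GC counts/times and heap usage can be read straight from the JVM's platform MXBeans; something like the sketch below (plain java.lang.management, nothing Maxwell-specific) is enough to tell whether GC is actually eating CPU before you start tuning flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Print per-collector GC counts/times and current heap usage.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}
```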
-
@osheroff - thanks for the suggestions. One data point I forgot to mention is that some transactions are large, usually in the range of 250K - 500K. We do observe excessive CPU, but are not sure if it is correlated to GC. We are currently recycling PODs when memory usage is above 90%. Does this lead to any data loss? Also, does MD use a producer/consumer queue design pattern to process the binlog? And is there any chance of a memory leak in some lib?
-
No data loss should happen. If maxwell is hard-killed you may get some data duplication.
yes. the binlog replicator is on its own thread and produces rows into a queue. these are consumed by maxwell and fed to the producer. Some producers also have a queue (kafka, kinesis, others), whereas some just write the row directly to the producer.
Sure, why not? We fixed a memory leak in mysql-binlog-connector some releases back. Further leaks are possible, but it's hard to know for sure without proper JVM instrumentation; just because Maxwell is using 6GB of memory does not mean that it has a 6gb active heap. |
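A rough sketch of the thread/queue shape described above: one thread plays the binlog-replicator role and puts rows on a bounded queue, another drains the queue and hands rows to the producer. put() blocks when the queue is full, which is what keeps the replicator from racing ahead of a slow producer. The class names and queue size here are illustrative, not Maxwell's actual internals.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Two threads and one bounded queue: the replicator blocks on put() whenever
// the producer side falls behind, so the queue itself cannot grow without bound.
public class ReplicatorPipelineSketch {
    public static void main(String[] args) {
        BlockingQueue<String> rowQueue = new ArrayBlockingQueue<>(20); // illustrative bound

        Thread replicator = new Thread(() -> {
            try {
                for (long i = 0; ; i++) {
                    rowQueue.put("row-" + i);   // blocks when the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "binlog-replicator");

        Thread producer = new Thread(() -> {
            try {
                while (true) {
                    String row = rowQueue.take();
                    System.out.println("producing " + row); // stand-in for Kafka/Kinesis/etc.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");

        replicator.start();
        producer.start();
    }
}
```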
-
@osheroff - thanks a lot for the response.
-
@surendra-outreach what version of maxwell are you running?
-
no *known* memory leaks in 1.29.2. If you find a maxwell process that you believe to be leaking, go ahead and capture a heap dump using `jmap -dump:live PID`.
…On Wed, Feb 17, 2021 at 2:22 PM surendra-outreach wrote:
v1.29.2
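Side note: besides running `jmap` against the process from outside, a HotSpot JVM can dump its own heap through the HotSpotDiagnostic MXBean, which is sometimes easier from inside a container. A small sketch; the output path is just an example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Trigger the same kind of live-objects heap dump that `jmap -dump:live` produces,
// but from inside the JVM (HotSpot only).
public class HeapDump {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only live (reachable) objects, analogous to -dump:live
        diag.dumpHeap("/tmp/maxwell-heap.hprof", true);
    }
}
```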
-
@osheroff - Here is what I observed in our case with the OOM issue. The binlog reader is fetching changes into the queue a lot faster than the Kafka producer (we use the most reliable config for at-least-once processing) can keep up with. As a result, the unbounded queue grows and leads to an OOM event. Is there a way to limit the queue size? Or any future plan to implement a backpressure strategy? Thanks for all the help and guidance.
-
The queues inside maxwell all have bounds iirc. I’ll double check, but I think they’re all limited to like 20-30 rows.
You might want to check your Kafka queue sizes. Also you might want to get a heap dump.
-
@osheroff - Are you referring to the buffer.memory setting for the Kafka producer? Our current MD config for the producer is aimed at max reliability, like
But I don't see anything configured relating to the Kafka producer client buffer sizes. Thanks
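For reference, a "max reliability" producer setup plus the client-side memory knobs typically looks something like the sketch below. These are standard Kafka producer properties, not Maxwell options (Maxwell passes `kafka.`-prefixed settings from config.properties through to the producer, if I remember the docs right); the broker address and values here are placeholders, not your actual config. `buffer.memory` is the setting that bounds how much unsent data the producer client holds.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

// Sketch: a reliability-oriented Kafka producer config plus its client-side
// memory limits. Values are illustrative placeholders.
public class KafkaProducerConfigSketch {
    public static Properties reliableProducerProps() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");      // placeholder
        p.put(ProducerConfig.ACKS_CONFIG, "all");                          // wait for all in-sync replicas
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        p.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
        p.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");  // strict ordering
        // buffer.memory bounds unsent data held in the client (default 32MB);
        // when it is full, send() blocks for up to max.block.ms before failing.
        p.put(ProducerConfig.BUFFER_MEMORY_CONFIG, Long.toString(32L * 1024 * 1024));
        p.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");
        return p;
    }
}
```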
-
so there's three queues:
There's also a transaction buffer that can take up a lot of memory (up to 25% of configured max). Again, though, without a heap dump (captured via
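To illustrate the "25% of configured max" point: the cap scales with the JVM's max heap, so raising `-Xmx` also raises how much a single large transaction can hold in memory before it spills. Purely illustrative arithmetic, not Maxwell's code.

```java
// Show how a 25%-of-max-heap cap would be derived from the running JVM's -Xmx.
public class TxBufferCapSketch {
    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly the configured -Xmx
        long txBufferCap = maxHeap / 4;                  // 25% of configured max
        System.out.printf("max heap=%dMB, transaction buffer cap=%dMB%n",
                maxHeap >> 20, txBufferCap >> 20);
    }
}
```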
-
@osheroff - Thanks for the details. We only see this behavior during huge transaction surges. If we bump up the POD memory, it works and gradually drains the queue. We usually run with 4GB or 6GB. For example, last week someone ran a bulk update on a large table that resulted in 10+ million CDC log events. We increased the POD memory to 16GB, and then it was able to handle the load and clear the backlog gradually. Usually memory usage comes back to 2GB, which is the normal state. At this point I suspect that the Kafka producer (consumer) is not able to keep up with the binlog reader (producer) rate. "There's also a transaction buffer that can take up a lot of memory (up to 25% of configured max)." - Is this related to Kafka? Can you please clarify? Yes, next time I will post a heap dump from jmap. Thanks
-
idk how relevant this info all still is, but here it is anyway
-
Hi,
I run maxwell with docker. Logs in the maxwell container:
```
09:18:54,971 INFO AbstractSchemaStore - storing schema @position[BinlogPosition[mysql-bin.001323:363446621], lastHeartbeat=1529572731838] after applying "CREATE TABLE IF NOT EXISTS `block_site_custom_category` (`id` INTEGER (11) PRIMARY KEY AUTO_INCREMENT NOT NULL, `name` VARCHAR (255) NOT NULL, `forbidden` INTEGER (1) NOT NULL DEFAULT 1)" to test_db, new schema id is 263
09:18:54,982 INFO AbstractSchemaStore - storing schema @position[BinlogPosition[mysql-bin.001323:363450605], lastHeartbeat=1529572731838] after applying "ALTER TABLE `block_site_custom` ADD COLUMN `categoryID` INTEGER (11) NOT NULL DEFAULT 1" to test_db, new schema id is 264
09:18:54,990 INFO AbstractSchemaStore - storing schema @position[BinlogPosition[mysql-bin.001323:363453573], lastHeartbeat=1529572731838] after applying "ALTER TABLE `terminal` ADD COLUMN `pci_mac_list` VARCHAR (255) NOT NULL DEFAULT ''" to test_db, new schema id is 265
12:18:14,676 INFO BinaryLogClient - Trying to restore lost connection to 10.0.0.11:3306
12:18:16,453 WARN BinlogConnectorReplicator - replicator stopped at position: mysql-bin.001324:729514106 -- restarting
12:18:16,482 INFO BinlogConnectorLifecycleListener - Binlog disconnected.
Exception in thread "blc-10.0.0.11:3306" java.lang.IllegalStateException: BinaryLogClient is already connected
    at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:473)
    at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:793)
    at java.lang.Thread.run(Thread.java:748)
12:18:26,241 INFO BinaryLogClient - Connected to 10.0.0.11:3306 at mysql-bin.001324/729514106 (sid:6379, cid:647968501)
12:18:26,242 INFO BinlogConnectorLifecycleListener - Binlog connected.
01:44:18,897 INFO MaxwellContext - Sending final heartbeat: 1529631856281
```
The value `binlog_file` in maxwell.positions is still `mysql-bin.001324` before I restart the maxwell container. Now it is `mysql-bin.001328`.
Any idea?