Memory leak in StringBuilder cache for Point.lineprotocol() #521

Closed
rhajek opened this issue Sep 17, 2018 · 9 comments
rhajek (Contributor) commented Sep 17, 2018

Hello,

Point.CACHED_STRINGBUILDERS is a per-thread cache that stores StringBuilder instances used to construct the line protocol. The measurement name is used as the map key and a StringBuilder instance as the value, so the same StringBuilder is reused for all points with the same measurement.

This optimization reduces GC activity when writing a large number of points, but it increases memory consumption when there is a large number of different measurements. The cache can leak memory because it is never evicted programmatically; it is garbage collected only when the owning thread dies. StringBuilder.setLength(length) is used to reuse an instance, but it does not free the builder's internal memory.

An OutOfMemoryError can be reproduced by writing, in a single thread, a large number of points with unique measurements and long line-protocol strings.
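For illustration, here is a minimal sketch of the caching pattern described above. The class and method names are simplified assumptions, not the actual influxdb-java code (the real implementation lives in org.influxdb.dto.Point with a nested MeasurementStringBuilder):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the per-measurement cache this issue describes.
public class LeakySketch {
    // One Map per thread; one StringBuilder per measurement name.
    private static final ThreadLocal<Map<String, StringBuilder>> CACHED_STRINGBUILDERS =
            ThreadLocal.withInitial(HashMap::new);

    public static String lineProtocol(String measurement, String fields) {
        StringBuilder sb = CACHED_STRINGBUILDERS.get()
                .computeIfAbsent(measurement, StringBuilder::new);
        // setLength() keeps only the measurement prefix visible, but the
        // backing char[] keeps whatever capacity it grew to last time.
        sb.setLength(measurement.length());
        sb.append(' ').append(fields);
        // The map entry is never evicted, so every unique measurement pins
        // a full-capacity builder in memory until the thread dies.
        return sb.toString();
    }
}
```

With many unique measurement names, each map entry retains a builder sized to the longest line it ever produced, which is the leak being reported.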

majst01 (Collaborator) commented Sep 17, 2018

Yes, but do you have an idea how to prevent that situation? Having a massive number of measurements is of course memory intensive, but the performance gain we currently have is massive.

rhajek added a commit to bonitoo-io/influxdb-java that referenced this issue Sep 17, 2018
rhajek (Contributor, Author) commented Sep 17, 2018

I tried to replace

private static final ThreadLocal<Map<String, MeasurementStringBuilder>>

with a simple thread-local shared StringBuilder:

private static final ThreadLocal<StringBuilder> CACHED_STRINGBUILDERS

For a large number of unique measurements (20k+) it performs about 10% better, and performance for less complex line protocol is not degraded. I used inch-java to measure this.

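As a sketch, the replacement reduces to a single builder per thread that is reset before every point (again with simplified, assumed names, not the exact PR code):

```java
// Simplified sketch of the fix: one StringBuilder per thread, reused for
// every measurement via setLength(0).
public class SharedBuilderSketch {
    private static final ThreadLocal<StringBuilder> CACHED_STRINGBUILDERS =
            ThreadLocal.withInitial(StringBuilder::new);

    public static String lineProtocol(String measurement, String fields) {
        StringBuilder sb = CACHED_STRINGBUILDERS.get();
        sb.setLength(0); // logical reset; the allocated capacity is reused
        sb.append(measurement).append(' ').append(fields);
        return sb.toString();
    }
}
```

Memory held per thread is now bounded by the single largest line written, instead of growing with the number of distinct measurements.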
Version without thread local map:

Host: http://localhost:8086
Client version: 2.14-sb2-SNAPSHOT
InfluxDB server version: 1.6.2
Concurrency: 5
Measurements: 50000
Tag cardinalities: [2, 2, 2]
points per series: 2
Total series: 400000
Total points: 800000
Total fields per point: 3
Batch Size: 5000
Database: j_stress (Shard duration: 7d)
Write Consistency: any
BatchPoints set prepared.
Start time: 1537185957047000000
Approx End time: 1537189557047000000
T=0 0 points written. Total throughput: � pt/sec | � val/sec. Current throughput: 0 val/sec Memory usage: 1316784KB
T=1 10,000 points written. Total throughput: 10,000 pt/sec | 30,000 val/sec. Current throughput: 10,000 val/sec Memory usage: 1309640KB
T=3 50,000 points written. Total throughput: 16,666.666 pt/sec | 50,000 val/sec. Current throughput: 40,000 val/sec Memory usage: 1248597KB
T=4 55,000 points written. Total throughput: 13,750 pt/sec | 41,250 val/sec. Current throughput: 5,000 val/sec Memory usage: 1320253KB
T=5 55,000 points written. Total throughput: 11,000 pt/sec | 33,000 val/sec. Current throughput: 0 val/sec Memory usage: 1320253KB
T=6 85,000 points written. Total throughput: 14,166.667 pt/sec | 42,500 val/sec. Current throughput: 30,000 val/sec Memory usage: 1241132KB
T=7 115,000 points written. Total throughput: 16,428.572 pt/sec | 49,285.719 val/sec. Current throughput: 30,000 val/sec Memory usage: 1123827KB
T=8 165,000 points written. Total throughput: 20,625 pt/sec | 61,875 val/sec. Current throughput: 50,000 val/sec Memory usage: 1247893KB
T=9 185,000 points written. Total throughput: 20,555.555 pt/sec | 61,666.664 val/sec. Current throughput: 20,000 val/sec Memory usage: 1302883KB
T=10 245,000 points written. Total throughput: 24,500 pt/sec | 73,500 val/sec. Current throughput: 60,000 val/sec Memory usage: 1284339KB
T=11 305,000 points written. Total throughput: 27,727.273 pt/sec | 83,181.82 val/sec. Current throughput: 60,000 val/sec Memory usage: 1210949KB
T=12 385,000 points written. Total throughput: 32,083.334 pt/sec | 96,250 val/sec. Current throughput: 80,000 val/sec Memory usage: 1254096KB
T=13 445,000 points written. Total throughput: 34,230.77 pt/sec | 102,692.312 val/sec. Current throughput: 60,000 val/sec Memory usage: 1185074KB
T=14 495,000 points written. Total throughput: 35,357.145 pt/sec | 106,071.438 val/sec. Current throughput: 50,000 val/sec Memory usage: 1280055KB
T=15 555,000 points written. Total throughput: 37,000 pt/sec | 111,000 val/sec. Current throughput: 60,000 val/sec Memory usage: 1198374KB
T=16 605,000 points written. Total throughput: 37,812.5 pt/sec | 113,437.5 val/sec. Current throughput: 50,000 val/sec Memory usage: 1286847KB
T=17 635,000 points written. Total throughput: 37,352.941 pt/sec | 112,058.828 val/sec. Current throughput: 30,000 val/sec Memory usage: 1245930KB
T=18 690,000 points written. Total throughput: 38,333.332 pt/sec | 115,000 val/sec. Current throughput: 55,000 val/sec Memory usage: 1366195KB
T=19 740,000 points written. Total throughput: 38,947.367 pt/sec | 116,842.102 val/sec. Current throughput: 50,000 val/sec Memory usage: 1222036KB
T=20 790,000 points written. Total throughput: 39,500 pt/sec | 118,500 val/sec. Current throughput: 50,000 val/sec Memory usage: 1190546KB
T=21 800,000 points written. Total throughput: 38,095.238 pt/sec | 114,285.719 val/sec. Current throughput: 10,000 val/sec Memory usage: 1190559KB
Total time: 21 seconds

2.13 version with thread local map:

Host: http://localhost:8086
Client version: 2.13
InfluxDB server version: 1.6.2
Concurrency: 5
Measurements: 50000
Tag cardinalities: [2, 2, 2]
points per series: 2
Total series: 400000
Total points: 800000
Total fields per point: 3
Batch Size: 5000
Database: j_stress (Shard duration: 7d)
Write Consistency: any
BatchPoints set prepared.
Start time: 1537186003598000000
Approx End time: 1537189603598000000
T=0 0 points written. Total throughput: � pt/sec | � val/sec. Current throughput: 0 val/sec Memory usage: 1381985KB
T=1 0 points written. Total throughput: 0 pt/sec | 0 val/sec. Current throughput: 0 val/sec Memory usage: 1311647KB
T=2 10,000 points written. Total throughput: 5,000 pt/sec | 15,000 val/sec. Current throughput: 10,000 val/sec Memory usage: 1336938KB
T=4 40,000 points written. Total throughput: 10,000 pt/sec | 30,000 val/sec. Current throughput: 30,000 val/sec Memory usage: 1293958KB
T=5 80,000 points written. Total throughput: 16,000 pt/sec | 48,000 val/sec. Current throughput: 40,000 val/sec Memory usage: 1261947KB
T=6 115,000 points written. Total throughput: 19,166.666 pt/sec | 57,500 val/sec. Current throughput: 35,000 val/sec Memory usage: 1437589KB
T=7 145,000 points written. Total throughput: 20,714.285 pt/sec | 62,142.855 val/sec. Current throughput: 30,000 val/sec Memory usage: 1240293KB
T=8 170,000 points written. Total throughput: 21,250 pt/sec | 63,750 val/sec. Current throughput: 25,000 val/sec Memory usage: 1174462KB
T=9 215,000 points written. Total throughput: 23,888.889 pt/sec | 71,666.664 val/sec. Current throughput: 45,000 val/sec Memory usage: 1416121KB
T=10 255,000 points written. Total throughput: 25,500 pt/sec | 76,500 val/sec. Current throughput: 40,000 val/sec Memory usage: 1110174KB
T=11 320,000 points written. Total throughput: 29,090.908 pt/sec | 87,272.727 val/sec. Current throughput: 65,000 val/sec Memory usage: 1336481KB
T=12 395,000 points written. Total throughput: 32,916.668 pt/sec | 98,750 val/sec. Current throughput: 75,000 val/sec Memory usage: 1193927KB
T=13 440,000 points written. Total throughput: 33,846.152 pt/sec | 101,538.453 val/sec. Current throughput: 45,000 val/sec Memory usage: 1254888KB
T=14 495,000 points written. Total throughput: 35,357.145 pt/sec | 106,071.438 val/sec. Current throughput: 55,000 val/sec Memory usage: 1160631KB
T=15 500,000 points written. Total throughput: 33,333.332 pt/sec | 100,000 val/sec. Current throughput: 5,000 val/sec Memory usage: 1192990KB
T=16 550,000 points written. Total throughput: 34,375 pt/sec | 103,125 val/sec. Current throughput: 50,000 val/sec Memory usage: 1279946KB
T=17 550,000 points written. Total throughput: 32,352.941 pt/sec | 97,058.828 val/sec. Current throughput: 0 val/sec Memory usage: 1279955KB
T=18 550,000 points written. Total throughput: 30,555.555 pt/sec | 91,666.664 val/sec. Current throughput: 0 val/sec Memory usage: 1279973KB
T=19 600,000 points written. Total throughput: 31,578.947 pt/sec | 94,736.844 val/sec. Current throughput: 50,000 val/sec Memory usage: 1190273KB
T=20 650,000 points written. Total throughput: 32,500 pt/sec | 97,500 val/sec. Current throughput: 50,000 val/sec Memory usage: 1276285KB
T=21 720,000 points written. Total throughput: 34,285.715 pt/sec | 102,857.141 val/sec. Current throughput: 70,000 val/sec Memory usage: 1240949KB
T=22 775,000 points written. Total throughput: 35,227.273 pt/sec | 105,681.82 val/sec. Current throughput: 55,000 val/sec Memory usage: 1408671KB
T=23 800,000 points written. Total throughput: 34,782.609 pt/sec | 104,347.828 val/sec. Current throughput: 25,000 val/sec Memory usage: 1408684KB
Total time: 23 seconds

fmachado (Contributor) commented Sep 17, 2018

@rhajek I have some questions here:

  • Your test is measuring throughput, but the ticket description is about memory usage and the GC not being able to clean up unreferenced objects. Could you please provide a testable scenario where we could evaluate your assumptions?

I think the whole purpose of having a ThreadLocal<Map<String, MeasurementStringBuilder>> was to "cache" StringBuilder objects per measurement. If I got it right, your PR suggests changing from ThreadLocal<Map<String, MeasurementStringBuilder>> to ThreadLocal<StringBuilder>.

  • If we are throwing away the previous "cache per measurement", why not just remove the ThreadLocal and use a plain StringBuilder? Are there any real benefits to using ThreadLocal without the Map?
  • You keep using StringBuilder.setLength(length), but if you check the Java source code you will see there is no change to the internal char[] backing the StringBuilder. Are you sure this is required?

Reviewing the existing code again (not the one from your PR), I am wondering: what happens if I write to different databases using different Point instances but within the same thread? Will computeIfAbsent (line) return the previous object with existing data?

fmachado (Contributor) commented:

> Reviewing again the existing code (not the one from your PR) I am wondering: what happens if I write to different databases using different Point instances but within the same thread? Will computeIfAbsent (line) return the previous object with existing data?

Answering my own question: after debugging some tests I discovered that internally StringBuilder.setLength defines how much of the internal char[] we can "see", and we reset it to the length of the measurement name (a tricky implementation IMO, but it works). Anyway, if we have a single StringBuilder holding 1MB of data, it will be kept in memory for as long as the ThreadLocal is alive.
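A small standalone check (mine, not from the thread) makes this visible through StringBuilder.capacity(), which reports the size of the backing array: after setLength(0) the length drops to zero but the capacity does not shrink.

```java
public class SetLengthDemo {
    // Returns true if the builder's capacity is unchanged after setLength(0).
    static boolean capacityRetainedAfterReset() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('x'); // grow the backing char[] to ~1M chars
        }
        int before = sb.capacity();
        sb.setLength(0); // length becomes 0, capacity stays the same
        return sb.length() == 0 && sb.capacity() == before;
    }

    public static void main(String[] args) {
        System.out.println("capacity retained after reset: " + capacityRetainedAfterReset());
    }
}
```

The only ways to release that memory are to drop the builder itself or call trimToSize(), neither of which the cache does.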

rhajek (Contributor, Author) commented Sep 17, 2018

In our use case there are more than 50k unique measurements. The cache stores a StringBuilder for each of those 50k measurements, per thread, and each entry holds a whole line-protocol string. None of it can be GCed while the thread is alive.

The StringBuilder.setLength(0) hack with instance reuse performs faster than creating a new StringBuilder instance, but StringBuilder.setLength() does not shrink the already allocated internal char array. I found the following about new StringBuilder instances vs. setLength on Stack Overflow:
https://stackoverflow.com/questions/7168881/what-is-more-efficient-stringbuffer-new-or-delete0-sb-length

Caching a StringBuilder per measurement does not seem reasonable to me, because it only saves writing a few bytes (the measurement name) into that array. I was unable to find a test scenario where caching per measurement performs better than a single StringBuilder per thread.

rhajek (Contributor, Author) commented Sep 17, 2018

This simple example ends with java.lang.OutOfMemoryError after about 3,500 iterations if you run Java with -Xmx64m. With the cache removed, it runs without leaking.

package org.influxdb.tool.inch;

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.influxdb.dto.Point;

import org.junit.jupiter.api.Test;

class TestLineProtocolOutOfMemory {

  @Test
  void testLargeNumberOfUniqueMeasurements() throws Exception {

    long iterations = 0;

    while (iterations < 1_000_000) {
      Point p = createPoint();
      iterations++;

      //generate line protocol
      p.lineProtocol();
      if (iterations % 100 == 0) {
        System.out.println(iterations + ": " + getMemoryInfo());
      }
    }
  }

  private Point createPoint() {

    int numberOfUniqueMeasurements = 10000;

    return Point.measurement("measurement_" + ThreadLocalRandom.current().nextInt(0, numberOfUniqueMeasurements))
        .addField("field_" + randomString(2), randomString(5000))
        .addField("tag_" + randomString(2), randomString(3))
        .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS).build();
  }

  private String randomString(int length) {
    byte[] array = new byte[length];
    new Random().nextBytes(array);
    return new String(Base64.getEncoder().encode(array), StandardCharsets.UTF_8);
  }

  private static String getMemoryInfo() {
    Runtime rt = Runtime.getRuntime();
    return "Memory usage: " +  (rt.totalMemory() - rt.freeMemory()) / 1024 + "KB";
  }

}

@ivankudibal ivankudibal modified the milestones: 2.13, 2.14 Sep 18, 2018
fmachado (Contributor) commented:

@rhajek you are using a scenario that I'm not sure matches any real use case: you are persisting 1 million points across 10k different measurements while limiting the memory allocation pool to 64MB. Is this correct?

Also, did you check the commit responsible for adding the ThreadLocal<Map<String, MeasurementStringBuilder>> code? Did you understand the reasons why the developer added it to the project? Do you agree or disagree with their opinion about how this code alleviates pressure on the GC?

rhajek (Contributor, Author) commented Sep 19, 2018

Our scenario is from the real world: we have long-living threads that write a large number (50k+) of unique measurements, and during profiling we found this memory leak. I understand that caching the StringBuilder reduces internal array allocation and pressure on the GC, but I don't understand the benefit of caching it per measurement.

Here is the commit with the over-optimization: 901bf4f

org.influxdb.PerformanceTests#testMaxWritePointsPerformance performs the same with and without that map.

fmachado (Contributor) commented:

@rhajek my guess, based on what I found in the branch from the dev who wrote this code, is that the original intention was to use an internal pool of objects.
I will add some comments to the PR so we can merge it later.

rhajek added a commit to bonitoo-io/influxdb-java that referenced this issue Sep 21, 2018
fmachado added a commit that referenced this issue Sep 25, 2018
- fixing memory leak and performance #521