BatchProcessor can write points to the wrong retention policy. #162

Closed
larrywalker opened this issue Apr 17, 2016 · 6 comments

@larrywalker

We are writing 2 events to 2 different measurements at the same time, each with its own retention policy (RP).

With batching enabled, if you are writing to measurements with different RPs, one of the measurements can sometimes be written under the wrong RP. A query of the data against the correct RP then makes it seem as though the measurement was never written.

Stepping through the code slows the writes down enough that the events land in different batches, and they are then written to the correct measurements with the correct RPs.
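For reference, here is a minimal sketch (not our production code) of the write pattern that triggers this, using the influxdb-java client. The class name, credentials, batch parameters, and field values are placeholders; the database and RP names mirror the logs below.

import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class WrongRpRepro {
    public static void main(String[] args) {
        // Batching enabled: writes are queued and flushed by the BatchProcessor.
        InfluxDB influxDB = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "password");
        influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);

        // Two points to the same database but different retention policies,
        // written close enough together to land in the same batch.
        Point sysEvent = Point.measurement("evtsys").field("value", 1).build();
        Point testEvent = Point.measurement("evttest").field("deveui", "00250C0100000300").build();

        influxDB.write("dev", "evtsys_rp", sysEvent);
        influxDB.write("dev", "evttest_rp", testEvent);

        // The batch is flushed under a single RP, so one of the two points can
        // end up in the wrong retention policy.
    }
}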

ONLY ONE EVENT STORED (EVENTS BATCHED INTO 1 RP)
[http] 2016/04/17 05:58:58 127.0.0.1 - root [17/Apr/2016:05:58:58 -0400] GET /query?db=dev&epoch=ms&p=%5BREDACTED%5D&q=SELECT+%2A+FROM+evttest_rp.evttest+WHERE+deveui%3D%2700250C0100000300%27+AND+time+%3E+now%28%29+-+86400s&u=root HTTP/1.1 200 2933 - okhttp/2.7.5 01037114-0483-11e6-910c-000000000000 11.485795ms
[wal] 2016/04/17 05:59:03 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 05:59:03 write to index of partition 1 took 6.381651ms
[http] 2016/04/17 05:59:03 127.0.0.1 - root [17/Apr/2016:05:59:03 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evtsys_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 041a23d6-0483-11e6-910d-000000000000 1.456205ms
[wal] 2016/04/17 05:59:09 Flush due to idle. Flushing 2 series with 2 points and 290 bytes from partition 1
[wal] 2016/04/17 05:59:09 write to index of partition 1 took 1.16645ms

BOTH EVENTS STORED CORRECTLY (DIFFERENT RPs, STEPPING THROUGH CODE)
[wal] 2016/04/17 06:03:13 write to index of partition 1 took 6.276625ms
[wal] 2016/04/17 06:03:23 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 06:03:23 write to index of partition 1 took 6.882536ms
[http] 2016/04/17 06:03:29 127.0.0.1 - root [17/Apr/2016:06:03:29 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evtsys_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 a2a56255-0483-11e6-9119-000000000000 1.397597ms
[http] 2016/04/17 06:03:31 127.0.0.1 - root [17/Apr/2016:06:03:31 -0400] POST /write?consistency=one&db=dev&p=%5BREDACTED%5D&precision=n&rp=evttest_rp&u=root HTTP/1.1 204 0 - okhttp/2.7.5 a3b7eaaa-0483-11e6-911a-000000000000 1.512795ms
[wal] 2016/04/17 06:03:33 Flush due to idle. Flushing 48 series with 48 points and 1968 bytes from partition 1
[wal] 2016/04/17 06:03:33 write to index of partition 1 took 7.765287ms
[wal] 2016/04/17 06:03:34 Flush due to idle. Flushing 1 series with 1 points and 69 bytes from partition 1
[wal] 2016/04/17 06:03:34 write to index of partition 1 took 1.112102ms
[query] 2016/04/17 06:03:35 SELECT * FROM dev.evttest_rp.evttest WHERE deveui = '00250C0100000300' AND time > now() - 1d
[http] 2016/04/17 06:03:35 127.0.0.1 - root [17/Apr/2016:06:03:35 -0400] GET /query?db=dev&epoch=ms&p=%5BREDACTED%5D&q=SELECT+%2A+FROM+evttest_rp.evttest+WHERE+deveui%3D%2700250C0100000300%27+AND+time+%3E+now%28%29+-+86400s&u=root HTTP/1.1 200 2949 - okhttp/2.7.5 a5f59547-0483-11e6-911b-000000000000 6.442473ms
[wal] 2016/04/17 06:03:36 Flush due to idle. Flushing 1 series with 1 points and 221 bytes from partition 1
[wal] 2016/04/17 06:03:36 write to index of partition 1 took 1.298728ms

I think this code needs to only batch points together when they share the same retention policy:
Map<String, BatchPoints> databaseToBatchPoints = Maps.newHashMap();
List<BatchEntry> batchEntries = new ArrayList<>(this.queue.size());
this.queue.drainTo(batchEntries);

for (BatchEntry batchEntry : batchEntries) {
    String dbName = batchEntry.getDb();
    if (!databaseToBatchPoints.containsKey(dbName)) {
        // The map is keyed only by database name, so the RP of the FIRST entry
        // seen for a database is applied to the whole batch for that database.
        BatchPoints batchPoints = BatchPoints.database(dbName).retentionPolicy(batchEntry.getRp()).build();
        databaseToBatchPoints.put(dbName, batchPoints);
    }
    // Entries with a different RP for the same database are merged into the
    // existing BatchPoints and end up written under the wrong RP.
    Point point = batchEntry.getPoint();
    databaseToBatchPoints.get(dbName).point(point);
}

for (BatchPoints batchPoints : databaseToBatchPoints.values()) {
    BatchProcessor.this.influxDB.write(batchPoints);
}
} catch (Throwable t) {
    // any exception would stop the scheduler
    logger.log(Level.SEVERE, "Batch could not be sent. Data will be lost", t);
}

@andrewdodd
Contributor

I think I know about this issue. I am going to merge the #108 PR soon, which should hopefully fix it (though it may introduce a few more).

@larrywalker
Author

We are interested in how this goes. Now that I understand the problem a bit more, it is likely that we have data going into the abyss quite frequently.

@andrewdodd
Contributor

andrewdodd commented Apr 18, 2016

Yes, sending data to the abyss is something the current BatchProcessor does on the sly. Would you be willing to:
a) Test the changes from the #108 PR branch?
b) Test the master branch soon after I merge the #108 PR into the master branch?

@larrywalker
Author

I could give the jar from the master branch a few cycles on my machine, but to really stress it in our main environment I would need a release (v2.3).

@larrywalker
Author

larrywalker commented Jul 8, 2016

We have worked around this temporarily by creating an org.influxdb.InfluxDB object/connection for each measurement. This way the client cannot incorrectly apply a retention policy belonging to a different measurement when building the batched write.

Even though this is a workaround, it would be nice to not need a separate object/connection to the db for each measurement.
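Roughly, the workaround looks like this (a sketch reusing the setup and imports from the earlier snippet; credentials and batch parameters are placeholders):

// One client instance per measurement, each with its own BatchProcessor, so a
// flushed batch can only ever contain points for a single retention policy.
InfluxDB sysEvents = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "password");
sysEvents.enableBatch(2000, 100, TimeUnit.MILLISECONDS);

InfluxDB testEvents = InfluxDBFactory.connect("http://127.0.0.1:8086", "root", "password");
testEvents.enableBatch(2000, 100, TimeUnit.MILLISECONDS);

sysEvents.write("dev", "evtsys_rp", Point.measurement("evtsys").field("value", 1).build());
testEvents.write("dev", "evttest_rp", Point.measurement("evttest").field("deveui", "00250C0100000300").build());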

@jiafu1115
Contributor

jiafu1115 commented Nov 10, 2016

@larrywalker @majst01 I also found another issue similar to this one. I will fix this issue after #239, because I want to avoid too many code conflicts. Thanks
