Write performance #105
Do you get the same performance drop-off if you don't use an index? If there is no index handling in the insert, then the performance should be exactly encoding time plus the normal bolt insert time.
[chart: insert times are much flatter without an index]

So I guess with timeseries data you don't really want to use an index, because the index is huge. Another way to do this might be to put each sample type in its own bucket. Or there may be a more efficient way to implement an index -- perhaps a separate bucket for each sample Type, with each index entry as a separate record in the bucket -- then adding records would be fast? Databases are fun to think about -- lots of tradeoffs to be made. Thanks for the help!
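A minimal sketch of the bucket-per-type idea using bbolt directly (the function, bucket naming, and key layout here are assumptions for illustration, not bolthold's API):

```go
package samplestore

import (
	"encoding/binary"
	"time"

	bolt "go.etcd.io/bbolt"
)

// putSample writes one sample into a bucket named after its sample type,
// keyed by an 8-byte big-endian timestamp so keys stay in time order.
// The bucket naming and payload encoding are illustrative assumptions.
func putSample(db *bolt.DB, sampleType string, t time.Time, payload []byte) error {
	return db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("samples-" + sampleType))
		if err != nil {
			return err
		}
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, uint64(t.UnixNano()))
		return b.Put(key, payload)
	})
}
```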
You should still be able to use indexes on time series data, but what I'm guessing is happening is that your index on "tag" might not be very unique. It's usually a good idea to have fairly unique values in indexes; however, in a regular database it shouldn't impact insert performance that drastically. What I do with indexes in bolthold is a pretty naive implementation: I simply store the entire index under one key value, so the less unique the index, the more gets stored (and thus decoded and re-encoded) on each insert. I'm guessing that's what's happening in your scenario. I can make my index handling more like a "real" database and split the values across multiple keys, but it'll take quite a bit of reworking. I'll open an issue for that. I appreciate you bringing this up.
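A rough sketch of what storing a whole index under one key implies on each insert (an illustration of the behavior described above, not bolthold's actual code):

```go
package samplestore

import (
	"bytes"
	"encoding/gob"

	bolt "go.etcd.io/bbolt"
)

// addToIndex illustrates the single-key index scheme: every insert reads,
// decodes, appends to, re-encodes, and rewrites the full key list for that
// index value, so the per-insert cost grows with the size of the index.
func addToIndex(b *bolt.Bucket, indexValue, newKey []byte) error {
	var keys [][]byte
	if existing := b.Get(indexValue); existing != nil {
		if err := gob.NewDecoder(bytes.NewReader(existing)).Decode(&keys); err != nil {
			return err
		}
	}
	keys = append(keys, newKey)

	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(keys); err != nil {
		return err
	}
	return b.Put(indexValue, buf.Bytes())
}
```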
Yes, I'm using a small # of Types relative to the # of samples -- maybe 6 or so, so they are not very unique.
One more note -- without an index, and with 500,000 samples in the DB, the insert time is still ~50ms/sample. This is great -- it means I can use bolthold to record just about any amount of timeseries data on this device. I'm currently using around 715 bytes/sample, and would like to experiment with protobuf to see if that would be faster/more efficient.
Your discussion helped me a lot. |
Having many indexes will definitely impact the performance of inserts and updates, because those indexes need to be maintained on every insert and update. I wouldn't recommend putting an index on a date/time if you can help it.
One problem I ran into using the Go Time type as a key is that the gob-encoded data of Go Time is not always monotonic with time, so seeks to a date would not always work. When I converted the timestamps to int64 and inserted the bytes into the key in big-endian format, seeks were fast and reliable. I may be missing something, but since Go Time is a struct, its encoded data will likely not always be monotonic.
@timshannon Thank you for your advice.
@cbrake Thanks. I will try to use int64 (Unix time).
I've been using bolthold on an embedded Linux system (eMMC storage). I'm noticing that as the DB grows, the write performance falls off linearly.
I'm using an increasing timestamp for the key, so I would expect sequential rather than random access.
Below is the insert code:
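(A minimal sketch of an insert path like the one described here, assuming a gob-encoded Sample struct with an index on Type and a timestamp key; the type and field names are assumptions, not the original snippet.)

```go
package samplestore

import (
	"time"

	"github.com/timshannon/bolthold"
)

// Sample stands in for the record type described in this issue; the fields
// and the index on Type are assumptions for illustration.
type Sample struct {
	Time  time.Time
	Type  string `boltholdIndex:"Type"`
	Value float64
}

// insertSample stores one sample keyed by its (increasing) timestamp, so
// new records should land at the end of the key space rather than at
// random pages.
func insertSample(store *bolthold.Store, s *Sample) error {
	return store.Insert(s.Time, s)
}
```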
Once I get to 100,000 samples or so, the performance is really slow (2+ seconds to insert a sample). I'm thinking something is not quite right, as I read about people using multi-TB bolt databases, but it seems there is no way this could work for my use case.
I tried setting FreelistType to FreelistMapType -- that did not seem to make any difference.
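For reference, a sketch of passing that option through bolthold's Options, which embeds bbolt's Options (the file name and mode are placeholders):

```go
package samplestore

import (
	"github.com/timshannon/bolthold"
	bolt "go.etcd.io/bbolt"
)

// openStore opens a bolthold store with the map-based freelist enabled;
// the file name and mode here are placeholders.
func openStore() (*bolthold.Store, error) {
	return bolthold.Open("samples.db", 0o666, &bolthold.Options{
		Options: &bolt.Options{FreelistType: bolt.FreelistMapType},
	})
}
```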
Appreciate any thoughts: is this normal, or can it be optimized?
Cliff