Design/status/roadmap on the clustered version? #9
The technical paper describes both our single-node design and our clustered design, but if something is unclear, I'm happy to answer any questions. We haven't announced a timeline for the open-source clustered version yet, but it's on the order of months.
@akulkarni Hi guys, I know you are quite busy, but is there any news on scaling out TimescaleDB across multiple nodes? The general idea is to do something like what I can do with Cassandra: when the load increases, I just plug new nodes into the cluster, they sync, and go. If selected, I plan to use TimescaleDB in a Mesos & Docker environment...
Hi @archenroot, are you worried about ingest load or query load? If ingest load, our single-node instance regularly handles 100k inserts/sec (400k inserts/sec under certain configurations), which according to our benchmarks is equivalent to several Cassandra nodes. But let me know if you need more than that. If query load, then we support read clustering using PostgreSQL streaming replication; you would then round-robin your queries across the various nodes to increase query throughput.
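For anyone wiring up such a read-clustered setup, here is a minimal sanity-check sketch using stock PostgreSQL catalog views (nothing TimescaleDB-specific; which hosts act as primary and replicas is up to your deployment):

```sql
-- On the primary: confirm each streaming replica is connected and streaming.
SELECT application_name, client_addr, state, sync_state
FROM pg_stat_replication;

-- On each replica: confirm it is in recovery, i.e. serving read-only queries.
SELECT pg_is_in_recovery();
```

The round-robin itself is typically handled outside the database, e.g. by the application's connection logic or a proxy such as HAProxy or Pgpool-II sitting in front of the replicas.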
@akulkarni So, let's imagine I am doing an IoT project where the sensor data (including images from cameras) will amount to roughly 60,000,000 unique records per day (counting toward the target, of course). My questions concern Writes, Reads, and Multitenancy. Multitenancy is an off-topic question and there is a nice discussion of it here: In our case it looks more like we have a dozen tenants, each with a relatively small amount of data (gross numbers):
Off-topic: can I combine TimescaleDB with CONTINUOUS VIEWS from PipelineDB? Last note: I expect few or no UPDATE statements in the system.
Hi @archenroot, apologies for the delay in responding. Reopening this issue so it doesn't get lost again. Responses:
1. Storage capacity - Two things worth mentioning here. First, we support data retention policies that allow you to aggregate data into lower granularities and age out (i.e., delete) the raw data; a rough sketch of what that can look like follows below. Second, we also support moderate elastic scaling via network-attached disks.
2. Writes - We currently do not support horizontal scale-out for writes. But according to my calculations, you are doing <1000 inserts/sec (60,000,000/day), so our single-node performance should be more than enough. (If instead you are looking at clustering for redundancy, then we would suggest setting up a read slave for failover.)
3. Reads - Yes, the read/slave instances would also have to be TimescaleDB.
4. Multi-tenancy - We have not yet built any native functionality for multi-tenancy beyond what is already available in PostgreSQL. Currently, our users who require this have been handling it themselves at the application layer. There may be multi-tenant add-on options for PostgreSQL, but we haven't been able to research those yet.
5. PipelineDB - I'm not sure if you're suggesting making the read slave just PipelineDB, or TimescaleDB + PipelineDB. If the former, then I don't know how it would handle time-series data - we have limited experience there. If the latter, it may work, but again it needs to be tested. (If you do try it out yourself, please let us know how it goes.)
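To make the retention suggestion in item 1 concrete, here is a rough sketch of a downsample-then-drop workflow on a hypothetical `conditions` hypertable with a hypothetical pre-created `conditions_hourly` rollup table (note that the exact `drop_chunks()` signature has changed across TimescaleDB versions, so check the docs for your release):

```sql
-- Roll raw readings older than 30 days up into hourly averages.
-- (A real job would bound this window on both sides so rows are not rolled up twice.)
INSERT INTO conditions_hourly (bucket, device_id, avg_temp)
SELECT time_bucket('1 hour', time) AS bucket, device_id, avg(temperature)
FROM conditions
WHERE time < now() - INTERVAL '30 days'
GROUP BY 1, 2;

-- Then age out (delete) the raw chunks older than 30 days.
SELECT drop_chunks('conditions', older_than => now() - INTERVAL '30 days');
```

In practice both steps would run from a scheduled job (cron or a background worker), with the rollup always landing before the raw chunks are dropped.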
Hi @akulkarni, and thanks for the wonderful work with Timescale 👍
If you are familiar with architectures like Mesos or DC/OS: usually you run either storage or services, monitor how they are used, and when you reach a metric threshold you just deploy one or more instances and the system reconfigures the cluster on the fly. The same happens when the load falls: you remove instances, as they are not needed anymore. That is the beauty of autoscalable systems.
@jrevault One interesting aspect of even Timescale's design today is that you can elastically add disks to a hypertable, so the storage capacity can scale with your storage needs. We've seen this applied particularly effectively with network-attached storage (e.g., EBS, Azure Storage) in the cloud. @archenroot Yep, quite familiar with both of those cluster-management services. That said, scaling a storage system like Timescale up or down is quite different from scaling a stateless, shared-nothing service like your web front-end or API service, as membership changes require application-level support to ensure proper consistency in the stateful storage tier. :)
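As a concrete illustration of the elastic-disk point above, here is a minimal sketch using TimescaleDB's `attach_tablespace()` API; the tablespace name, mount path, and hypertable name (`conditions`) are placeholders:

```sql
-- Create a PostgreSQL tablespace on the newly attached volume
-- (e.g., a freshly mounted EBS disk).
CREATE TABLESPACE disk2 OWNER postgres LOCATION '/mnt/disk2';

-- Tell TimescaleDB it may place new chunks of the hypertable there.
SELECT attach_tablespace('disk2', 'conditions');
```

New chunks are then placed across all tablespaces attached to the hypertable, which is what makes the add-a-disk pattern work without touching existing data.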
@archenroot and @mfreed (sorry for this late answer, but we were going live this weekend) Our problem isn't adding disk space; it's more a memory and CPU one (well, at least that's my understanding of it).
Thanks for the interesting ideas! I am evaluating TimescaleDB for a fairly large data-collection project. I expect a fairly moderate data inflow rate, maybe 20,000 entries per second. But we want to retain the historical data to the extent possible. 20,000 entries per second accumulate quickly: we expect about 100 TB of data per year. @akulkarni mentioned something very relevant in the post of Oct 24, 2017: TimescaleDB supports network-attached storage, which could of course be scaled up to several petabytes. Judging from the TimescaleDB API [1], storing some chunks on a NAS is doable. But is it possible to make TimescaleDB differentiate between the tablespaces? Can TimescaleDB be configured to create new chunks on the local RAID storage, and move them to the NAS when the data gets older/colder? Thanks! [1] Hypertable management, attach_tablespace() https://docs.timescale.com/v0.9/api#hypertable-management
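(A note for later readers: this kind of hot/cold tiering became possible in later TimescaleDB releases via chunk-level commands. A hedged sketch, assuming a release that provides `show_chunks()` and `move_chunk()`; the hypertable name `metrics`, the tablespace `nas_storage`, and the chunk name are placeholders:)

```sql
-- List chunks of the hypertable whose data is older than 90 days.
SELECT show_chunks('metrics', older_than => INTERVAL '90 days');

-- Move one such chunk (data and indexes) from fast local storage
-- to the tablespace backed by the NAS.
SELECT move_chunk(
    chunk => '_timescaledb_internal._hyper_1_4_chunk',
    destination_tablespace => 'nas_storage',
    index_destination_tablespace => 'nas_storage'
);
```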
Hi @akulkarni @mfreed
Thanks
@murugesan70 - I am not sure, but it looks like VoltDB supports horizontal partitioning (splitting) of tables. Take a look.
Hi everyone! I'm the PM heading up clustering. I'm looking for a couple of people to help provide feedback as we move forward with this. If you'd like to be considered (we can only take a couple of people), please let me know! I'm available in our Slack community as Diana Hsieh, or you can ping me at diana at timescale.com.
Hello~~ Public-facing documentation is out for our private beta of clustering (which is currently only in limited release). Please go ahead and take a look! https://docs.timescale.com/clustering/introduction/architecture#timescaledb-clustering and
Whenever paths are added to a rel, that path or another path that was previously on the rel can be freed. Previously, the compressed rel's paths could be freed when it was re-planned by the postgres planner after being created and planned by us. The new path the postgres planner added was cheaper and overwrote and pfreed the old path which we created and saved as a child path of the decompress node. Thus we ended up with a dangling reference to a pfreed path. This solution prevents this bug by removing the path we create from the compressed rel. Thus, the chunk rel now "owns" the path. Note that this does not prevent the compressed rel from being replanned and thus some throw-away planner work is still happening. But that's a battle for another day. The backtrace for the core planner overwriting our path is:
```
    frame #4: 0x0000000105c4ed0f postgres`pfree(pointer=0x00007fe20d01a628) at mcxt.c:1035
  * frame #5: 0x000000010594c998 postgres`add_partial_path(parent_rel=0x00007fe20d01ae10, new_path=0x00007fe20f800298) at pathnode.c:844
    frame #6: 0x00000001058ede4b postgres`create_plain_partial_paths(root=0x00007fe2113fc668, rel=0x00007fe20d01ae10) at allpaths.c:753
    frame #7: 0x00000001058edb93 postgres`set_plain_rel_pathlist(root=0x00007fe2113fc668, rel=0x00007fe20d01ae10, rte=0x00007fe20d0198c0) at allpaths.c:727
    frame #8: 0x00000001058ed78b postgres`set_rel_pathlist(root=0x00007fe2113fc668, rel=0x00007fe20d01ae10, rti=13, rte=0x00007fe20d0198c0) at allpaths.c:452
    frame #9: 0x00000001058e8e16 postgres`set_base_rel_pathlists(root=0x00007fe2113fc668) at allpaths.c:310
    frame #10: 0x00000001058e8b49 postgres`make_one_rel(root=0x00007fe2113fc668, joinlist=0x00007fe20d0121c8) at allpaths.c:180
    frame #11: 0x000000010591ee77 postgres`query_planner(root=0x00007fe2113fc668, tlist=0x00007fe2113fcb58, qp_callback=(postgres`standard_qp_callback at planner.c:3492), qp_extra=0x00007ffeea6ba2b8) at planmain.c:265
    frame #12: 0x00000001059229cb postgres`grouping_planner(root=0x00007fe2113fc668, inheritance_update=false, tuple_fraction=0) at planner.c:1942
    frame #13: 0x0000000105920546 postgres`subquery_planner(glob=0x00007fe218000328, parse=0x00007fe218000858, parent_root=0x0000000000000000, hasRecursion=false, tuple_fraction=0) at planner.c:966
    frame #14: 0x000000010591f1e7 postgres`standard_planner(parse=0x00007fe218000858, cursorOptions=256, boundParams=0x0000000000000000) at planner.c:405
    frame #15: 0x000000010642d9b4 timescaledb-1.5.0-dev.so`timescaledb_planner(parse=0x00007fe218000858, cursor_opts=256, bound_params=0x0000000000000000) at planner.c:152
```
It is mentioned in the FAQ that the clustered version is in active development. Can you share more details on the design, status, or roadmap? The technical paper doesn't seem to shed much light on this, at least to me. Am I missing anything?
Anyway, great job TimescaleDB!