CosmosDB / DocDB storage support #30
Hi, sorry been away. Would you mind sharing where you saw 20K RPS for EventHub? AFAIK Microsoft never really defined an official limit, although I have heard of stories where they had issues with scale.

Update: Huh, found it - small print in here https://azure.microsoft.com/en-gb/pricing/details/event-hubs/ When we first used EH there was no such thing.

Update 2: Contacted Azure friends. With the Dedicated tier you can go up to 2 million events per second:

From https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-dedicated-overview
@aliostad glad to see there's a new tier for Event Hubs. I think it would be worth documenting these limits / options for Zipkin. Unfortunately I'm not sure how to keep them up to date, as there seems to be no standard way (e.g. an API) to get the units, limits and pricing. Also, this is an interesting detail about dedicated pricing: it seems to be a big jump to over $250K / year.
@clehene hi. Running at that level will not be cheaper with DocumentDB. I have done some PoC work on extreme load and frankly the cost will be higher, since it does more. Here are some details: the test we were doing was with 4KB docs, and storing each one used ~70 RU. So 200K RU will cost £108K a year while giving you only enough RU to store a mere 3K events per second (assuming 4KB each)! Also, on the read side you will start having problems with one-by-one reads and deletes, and you do not get the checkpointing that comes free with EventHub. So in short, CosmosDB - when it gets table support - will be good, but right now it only works if you use it with Azure Search (there is an option to store docs and have Azure Search index them). I can work on this if enough people are interested, but it seems we already have a working version?
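To make the arithmetic above easier to follow, here is a rough back-of-the-envelope sketch. The ~70 RU per 4KB write figure comes from the comment above; everything else (the variable names, the 20K events/sec target) is illustrative, and real pricing should be taken from the Cosmos DB pricing page rather than computed here.

```python
# Back-of-the-envelope Cosmos DB write-throughput estimate, based on the
# ~70 RU per 4 KB document write observed in the PoC described above.
RU_PER_WRITE = 70
PROVISIONED_RU_PER_SEC = 200_000

# How many 4 KB writes per second does 200K RU/s actually buy?
writes_per_sec = PROVISIONED_RU_PER_SEC / RU_PER_WRITE
print(f"~{writes_per_sec:,.0f} writes/sec")  # ~2,857, i.e. the "mere 3K events per second"

# Inverted: RU/s needed to match a target ingest rate, e.g. the 20K RPS
# Event Hubs figure discussed earlier (an assumed target, for illustration).
target_events_per_sec = 20_000
required_ru = target_events_per_sec * RU_PER_WRITE
print(f"~{required_ru:,} RU/s needed for {target_events_per_sec:,} events/sec")  # ~1,400,000 RU/s
```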
Side note: I would guess that the topic of using docdb as storage/query is a different topic than whether eventhub is used as a transport, right? Does it make sense to continue the transport discussion here on this issue, or would it be clearer as a different issue?
@adriancole the main reason for discussing EH here is the original question of whether it's worth having it in front of Cosmos DB. That said, @aliostad's points about CosmosDB are valid and relevant to the CosmosDB discussion, and it would be worth expanding on a few topics, like what the ideal data model for Zipkin data in CosmosDB would be from a size and query-capability perspective.
@clehene ok I think I understand. Yeah, for example there are folks who use storage directly instead of having a separate transport (I've heard of this being done for both elasticsearch and cassandra, although it isn't common practice). There are impacts on how you'd design the data model if you think people would be doing this, and yeah, there'd be no way for zipkin to prevent people from skipping a separate transport if they wanted to.
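For concreteness, here is a minimal sketch of what "skipping the transport" could look like: a reporter writing a Zipkin v2 span document straight into a Cosmos DB container via the azure-cosmos Python SDK. The account endpoint, key, database/container names, partition-key choice and document id scheme are all assumptions for illustration, not part of any working version mentioned in this issue.

```python
# Illustrative only: pushing a Zipkin v2 span document straight into Cosmos DB
# (SQL/document API) without a transport in between.
from azure.cosmos import CosmosClient

# Hypothetical account, database and container names.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("zipkin").get_container_client("spans")

span = {
    # Zipkin v2 span fields (https://zipkin.io/zipkin-api/)
    "traceId": "86154a4ba6e91385",
    "id": "4d1e00c0db9010db",
    "name": "get /api",
    "timestamp": 1502787600000000,   # microseconds since epoch
    "duration": 207000,              # microseconds
    "localEndpoint": {"serviceName": "frontend"},
    "tags": {"http.method": "GET"},
}

doc = dict(span)
# Cosmos DB needs a unique "id" per item; a span id alone may repeat across
# traces, so combine trace and span ids for the document id (a naive choice).
doc["id"] = f'{span["traceId"]}-{span["id"]}'

container.create_item(body=doc)  # assumes the container is partitioned on /traceId
```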
Creating this as a child of #8
@praveenbarli has something working.
Some things to consider and ideally discuss before an implementation.
Zipkin backend model and queries. There's no formal specification but @adriancole has mentioned
As CosmosDB has multiple APIs (key-value, document and graph), it would be interesting to know which one makes the most sense for this backend and to have a discussion on the model (a rough sketch of one possible document shape follows this list). Ideally we'd be able to make sure it's cost efficient at good performance.
See pricing https://azure.microsoft.com/en-us/pricing/details/cosmos-db/
Perhaps data retention could also be discussed? What's typically used?
Note that if Event Hubs is used as a queue before data lands in storage, that will impose some limits on the throughput (EH is limited to 20K rps).
Since Cosmos DB can handle way more than that, perhaps it would make sense to be able to push directly without going through EH?
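As a strawman for the data-model and retention questions above, here is one purely hypothetical shape for a span document under the SQL (document) API, partitioned on traceId so that fetching a whole trace stays in one partition, with a per-item TTL standing in for retention. The field names, the denormalized serviceName, the 7-day TTL and the query are all assumptions for discussion, not a description of the working version.

```python
# Hypothetical Zipkin span document for the Cosmos DB SQL (document) API.
# Partitioning on traceId keeps "fetch a whole trace" in one partition; the
# search-by-service/name/tags paths would be cross-partition queries.
span_doc = {
    "id": "86154a4ba6e91385-4d1e00c0db9010db",  # traceId + spanId, unique within the partition
    "traceId": "86154a4ba6e91385",              # candidate partition key (/traceId)
    "spanId": "4d1e00c0db9010db",
    "name": "get /api",
    "serviceName": "frontend",                  # denormalized so service queries don't need joins
    "timestamp": 1502787600000000,              # microseconds since epoch
    "duration": 207000,
    "tags": {"http.method": "GET"},
    "ttl": 7 * 24 * 3600,  # retention: item expires after 7 days, if TTL is enabled on the container
}

# Example of the "find traces by service" path the Zipkin UI needs; this is a
# cross-partition query, so it costs more RU than reading a trace by traceId.
find_traces_query = (
    "SELECT DISTINCT s.traceId FROM spans s "
    "WHERE s.serviceName = @service AND s.timestamp >= @since"
)
```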