Unable to obtain > 40 RPS after migrating to our own Parse server #2030
Comments
You might want to have a look at the parse-server logs (run with the VERBOSE=1 environment variable), check your indexes, and enable profiling in your mongod to pinpoint slow-running queries.
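For anyone trying the profiling suggestion, a minimal sketch in the mongo shell (the database name is an assumption; run `use <your-db>` first to select it):

```js
// Record any operation slower than 50 ms (tune the threshold to taste):
db.setProfilingLevel(1, 50)

// Later, list the five slowest captured operations:
db.system.profile.find().sort({ millis: -1 }).limit(5).pretty()
```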
You may want to run on multiple smaller instances.
At NodeChef we have customers performing 150+ req/sec doing complex ad-hoc queries with just two 512 MB RAM app containers before they even run into the issue you describe. Our databases run on 8 physical cores, the equivalent of either c4.2xlarge or r3.2xlarge on AWS. We use bare metal infrastructure providing the best performance. We also provide you with RPS stats in real-time, so you can gauge this for yourself. We can help you get started if interested.
RAM is definitely not an issue with parse-server, as it's much more I/O- and CPU-bound. That's what I experience on my side on GAE: we're able to process ~100 RPS on 2 instances while maintaining CPU < 50% with n1-highcpu-2 (2 vCPU, 1.7 GB of RAM each).
Thanks for all your comments and suggestions. @bohemima: We have run the server with VERBOSE=1, but gained no further insight. Thanks for your suggestion of building indexes in MongoDB to address slow queries. We have done this previously and may need to continue doing it. However, we don't feel this is the issue, as the queries we run during the regular course of our application have optimized indexes. @flovilmart and @Knana: Thanks for your suggestion to run on multiple smaller instances and for the performance information you have provided. In order to test the limits of a single server, we disabled autoscaling. It is good to know that you were able to achieve 150+ RPS with two containers. These are all good suggestions; however, we suspect our problem is in the interaction between the Parse server and our application. Thanks in advance.
We are using NodeChef and currently run 10 256 MB app containers. No queries to mongo take more than 20 ms, and most take a tiny fraction of that, and yet we do have performance issues. The problem seems to be a rather complex cloud code function which performs a query that returns 10 objects, and then performs an additional 4 queries for each of the returned objects, resolving these 40 queries in a single big combined promise. This wouldn't be an issue, I don't think, if each of those additional queries hit mongo directly, which would make sense to me, since they're cloud code functions running within the server. But they don't: each additional query hits the app containers, which, apart from introducing inefficiencies, means that even running this single cloud code function causes requests to our parse api to queue up. The end result is that this single cloud code function can take up to ten seconds to complete. I just don't get it.
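To make the shape of the problem concrete, a hypothetical sketch of a cloud code function with this fan-out (the class names and function name are made up, not the commenter's actual code):

```js
// One query returns 10 objects, then 4 more queries run per object,
// all combined into a single promise (~41 round trips in total). Under
// parse-server, each sub-query is routed back through the HTTP interface
// rather than going straight to mongo.
Parse.Cloud.define('expensiveFunction', function (request, response) {
  var postsQuery = new Parse.Query('Post');
  postsQuery.limit(10);
  postsQuery.find().then(function (posts) {
    var subQueries = [];
    posts.forEach(function (post) {
      ['Comment', 'Like', 'Share', 'View'].forEach(function (className) {
        var q = new Parse.Query(className);
        q.equalTo('post', post);
        subQueries.push(q.find());
      });
    });
    return Parse.Promise.when(subQueries);
  }).then(function () {
    response.success('done');
  }, function (error) {
    response.error(error);
  });
});
```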
Thanks Jason for taking the time to respond.
@flovilmart and @Knana: We are revisiting this issue, which remains a problem for us. Thanks once again for your previous helpful suggestions and insights. I had a couple of new questions for both of you. As we know, RPS is not available from the Elastic Beanstalk Monitoring graphs - only latency, CPU utilization, network bytes in and out, and network packets in and out. My questions are: Q1. How are you measuring RPS? Are you calculating it using some benchmark tests, inferring it from what it was when you had pointed your app at hosted parse.com, or something else altogether? Q2. Are you using primarily Cloud Code functions (the /functions endpoint), the /batch endpoint, the /classes endpoint, or a mixture of some or all of these? @jasonhutchens, this may be of interest to you: in one of our benchmark tests we saw a latency of 20-25 seconds with the VERBOSE environment variable set. When we removed the VERBOSE environment variable, latency dropped an order of magnitude to between 1 and 2 seconds, although 4 instances remained active.
Thanks @sohagfan; I'll benchmark with verbose logging disabled. I still don't understand, though, why Parse queries made from cloud code functions need to be routed back through the API layer when they could bypass a lot of that.
@jasonhutchens: sure, no problem. Hope it helps. You may be aware of this already, but in case you aren't: you have to actually delete the environment variable for it to stop taking effect. It doesn't matter whether its value is 1 or 0; setting the value to 0 makes no difference, and it behaves as though the variable were still set.
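A plausible explanation for why the value doesn't matter (this is an assumption about parse-server's internals, not confirmed from its source): environment variables are always strings, and any non-empty string is truthy in JavaScript.

```js
// If the logging guard looks anything like this (assumed, not verified),
// then VERBOSE=1 and VERBOSE=0 both take the verbose path:
if (process.env.VERBOSE) {
  // verbose logging enabled
}

console.log(Boolean('0')); // true  - the string "0" is truthy
console.log(Boolean(''));  // false - only an unset/empty value disables it
```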
@sohagfan no, we're running on NodeChef; I'll let them know about this.
@sohagfan see below responses to your questions
Hope this helps.
@flovilmart please let me know what tests you ran to determine you got ~100 RPS with 2 n1-highcpu-2 machines? We are running tests on similar machines on AWS Elastic Beanstalk through the REST API and are seeing much worse performance numbers. Should we also expect a difference in performance between Cloud Code methods triggered via the REST API vs. clients (through the Parse iOS and Android SDKs)? Also, which trigger did you set up for autoscaling: latency, CPU, or network out? @drew-gross @hramos Happy to hear what others are doing as well. We are stuck with our load testing at the moment and it's preventing us from moving to production. Let me know what tests you recommend for load testing our dev environment so that we can feel comfortable moving to production. Most of our load comes from clients, and we are not sure how to simulate this due to the latency issues we have noticed when running benchmark tests through the REST API.
For now there should be no difference between cloud code and the client SDKs, as all cloud code requests go through the HTTP interface. There is a pull request that attempts to run cloud code with direct access to the JS interface instead of the HTTP one.
@flovilmart I'd be interested in how you got to your 100+ RPS number as well. In my tests using a single n1-highcpu-2 on GCE and testing with Locust, I can get between 20-30 RPS before the CPUs peg. I didn't try 2, but suspect I would only get around double what I'd get with a single instance. For reference, my load tests using Locust via REST against a clustered Parse instance (Node cluster: https://nodejs.org/api/cluster.html#cluster_cluster, PM2: http://pm2.keymetrics.io/docs/usage/cluster-mode/): f1-micro-1: ~5 RPS @ ~40ms response time
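For comparison, a bare-bones Node.js load sketch against the Parse REST API (the host, app ID, REST key, class name, concurrency, and duration below are all placeholders, not real values):

```js
// Keep CONCURRENCY requests in flight for DURATION_MS, then report RPS.
// A rough stand-in for a tool like Locust, not a rigorous benchmark.
var https = require('https');

var CONCURRENCY = 50;
var DURATION_MS = 30 * 1000;
var completed = 0;
var start = Date.now();

function fire() {
  if (Date.now() - start > DURATION_MS) return;
  var req = https.request({
    host: 'your-parse-server.example.com',        // placeholder host
    path: '/parse/classes/GameScore?limit=10',    // placeholder class
    headers: {
      'X-Parse-Application-Id': 'YOUR_APP_ID',    // placeholder
      'X-Parse-REST-API-Key': 'YOUR_REST_KEY'     // placeholder
    }
  }, function (res) {
    res.resume();             // drain the response body
    res.on('end', function () {
      completed++;
      fire();                 // keep the pipeline full
    });
  });
  req.on('error', fire);
  req.end();
}

for (var i = 0; i < CONCURRENCY; i++) fire();

setTimeout(function () {
  console.log('RPS: ' + (completed / (DURATION_MS / 1000)).toFixed(1));
}, DURATION_MS + 1000);
```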
@reasonman we don't use PM2 or cluster; I spawned 10 AWS instances and queried random objects from our DB. In general we see request times below 30 ms in Stackdriver. Since then, we changed our setup to cap at 50 RPS and the CPU is still below 35%. Note that the DB is in the same zone as the servers.
@flovilmart I'm using Parse server on nodechef and definitely have problems with the SDK routing everything through the HTTP interface when running cloud code functions. Right now this blocks us from moving our production apps from Parse to Parse Server. I spun up a simple nodechef Parse instance with one app server to demonstrate the issue. I deployed the following
I called the deployed function and timed it. I then called it a second time while the first was still running, and the two calls serialised instead of overlapping. So routing queries through the HTTP interface is causing requests to queue up, basically serialising what should be parallel operations. What's concerning to me is that many of our production cloud code functions perform many more than 10 queries when called. My expectation is that calling one of those will tie up the whole cluster. Now, having said all that, I suspect Parse also suffered from the same issue. I'm just curious to know what it would take to have the SDK talk directly to Mongo from cloud code, to avoid having functions spawn requests that queue up at the HTTP interface? I'm hopeful this would be a performance win, and would avoid the problem of a single function call blocking access to our entire cluster (which is what happens at the moment with our staging apps, which use 10 app servers, and which have cloud code functions that may fire off 100 queries, causing all 10 app servers to do work to serve a single request).
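A rough way to observe the queueing from a client (the function name is hypothetical): call the same cloud function twice in parallel and compare wall-clock times. If the calls were truly concurrent, both should finish in about the time of one; if they serialise at the HTTP interface, the second takes roughly twice as long.

```js
// Time two parallel invocations of the same cloud function.
function timedRun(label) {
  var t0 = Date.now();
  return Parse.Cloud.run('expensiveFunction').then(function () {
    console.log(label + ' took ' + (Date.now() - t0) + ' ms');
  });
}

Parse.Promise.when([timedRun('first'), timedRun('second')]);
```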
There is a PR for that: #2316
@flovilmart cheers, I'll test that on our nodechef instance (with help from the team there) in the hope that this will improve our numbers. thanks!
Actually, there are some things to fix in that PR before you can roll it out confidently.
@flovilmart yes, understood... I just meant we'd do some testing, and I can confirm that the
Our setup is as follows:
We are getting an average of only around 25 RPS and a peak of 40 RPS. When we exceed the peak, we see high latency and dropped-connection errors in the logs.
An example error from the nginx log is as follows:
"2016/06/09 06:46:34 [error] 2684#0: *254 upstream prematurely closed connection while reading response header from upstream, client: , server: , request: "GET /1/classes/. . . host: “www.example.com""
With the same application on hosted Parse.com, we were able to scale as high as required; we could push 70+ RPS successfully without any dropped requests.
Are there any configuration changes to any part of the setup mentioned above (EB, Node.js, nginx, Parse Server, MongoDB driver, mLab), or something else we have not mentioned or have missed, that would give better performance?
If you are getting better performance, what is your setup?
Any pointers / comments will be much appreciated.