Benchmarking Runs Forever [Question] #618
Comments
Which workload is this happening in? Which JVM are you using? Which OS? The biggest problem I've had when trying to use time bound is the lag when waiting for workload d to handle its set up. |
Workload: Core Workload - I have customized this to benchmark Web Services but have not touched the part that deals with any threads (Worker, Client or Terminator). |
Customized the code or customized the workload configuration file? |
I customized the CoreWorkload.java file, since YCSB doesn't support benchmarking of Web Services and I needed many other runtime-configurable parameters. I have that code uploaded on GitHub here: https://github.com/shivam-maharshi/YCSB4WebServices/blob/master/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java. This is not clean code; it exists only for my specific benchmarking purpose. |
Just to set expectations: it's going to be difficult to figure out whether the issue is in YCSB or in your changes (more so because it looks like you started with base YCSB plus some changes rather than forking the repo). Could you explain what the fundamental issue is with, e.g., writing a datastore binding for your web service? That would make it easier to isolate issues in the core framework from the changes specific to your use case. I'll try to get a sense of the code this evening. |
Yes, you are right. I had a conversation with Andy Kruth, and I've already forked YCSB and started to create a Rest Client binding module. That work is in progress. The reason I didn't write a binding initially was that writing it cleanly was gonna take a little more time, which I unfortunately did not have. With the current framework I couldn't have done these, which I required.
Hence I decided to go with this approach, even though I knew it was wrong. Long story short, if you can get a sense of the code, that would be great. However, I feel this issue is not introduced by the changes in CoreWorkload, since it does not manipulate threads in any way. I am hoping to create a pull request this week for the REST client I am working on. If we run into the same issue with that module as well, it will be easier to deal with. Thanks for your response! |
@shivam-maharshi - Can you attach the command you are running, the workload configuration file you are using and also the output? These might help figure out context of the issue you are seeing. |
@shivam-maharshi - did you solve your issue? |
@kruthar - I resolved this issue for myself by setting a maximum timeout for individual requests in the CRUD operations of the client binding. (The implementation simply runs a timer thread in parallel that stops the operation if it exceeds the given time limit.) Coming back to the main point: YCSB benchmarking can run forever even if the "maxexecutiontime" property is set in the workload file. The Terminator Thread only tries to finish the benchmark once "maxexecutiontime" has passed; it does not guarantee it, because it only joins the worker (benchmarking) threads to wind up the run and does not interrupt or kill them. So if a worker thread gets stuck waiting for a response or the next bytes from the server and never receives them, the terminator thread will keep trying to join that worker indefinitely and never succeed. This can happen when the DB server does not respond to a request for a long time. Why hasn't it been reported so far? Once you've read this please let me know. I will close this item since it is not an issue; it is a design decision taken by YCSB. |
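A minimal sketch of a per-request timeout inside a client binding, for readers who hit the same hang. It swaps the timer thread described above for `Future.get` with a timeout, which has the same effect of bounding each call. The class name `BoundedOperation`, the `Result` enum and the `run` method are hypothetical and are not part of YCSB.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of YCSB: bounds the wall-clock time of one operation.
class BoundedOperation {
  /** Hypothetical result codes; a real binding would map these to its own status type. */
  public enum Result { OK, ERROR, TIMED_OUT }

  // Daemon threads, so a call that never returns cannot keep the JVM alive.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "bounded-op");
    t.setDaemon(true);
    return t;
  });

  /** Runs op on a helper thread and waits for at most timeoutMillis. */
  public Result run(Callable<Boolean> op, long timeoutMillis) {
    Future<Boolean> future = pool.submit(op);
    try {
      Boolean ok = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return Boolean.TRUE.equals(ok) ? Result.OK : Result.ERROR;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupts the helper thread; the caller moves on either way
      return Result.TIMED_OUT;
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return Result.ERROR;
    } catch (ExecutionException e) {
      return Result.ERROR; // the operation itself threw
    }
  }

  public void shutdown() {
    pool.shutdownNow();
  }

  public static void main(String[] args) {
    BoundedOperation bounded = new BoundedOperation();
    // A fast call completes normally.
    System.out.println(bounded.run(() -> true, 500));
    // A call that would block for a minute is abandoned after 200 ms.
    System.out.println(bounded.run(() -> { Thread.sleep(60_000); return true; }, 200));
    bounded.shutdown();
  }
}
```

The worker thread that calls `run` always returns within roughly the timeout, so a later `join` on it can succeed. Where the driver offers its own socket or request timeout, setting that is usually the simpler option.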
should we be sending an interrupt to the client threads? |
That can be done, but it can affect the benchmarking results if not handled properly. For example, suppose 10 client threads have just sent out requests when "maxexecutiontime" is reached. If we decide to interrupt those client threads, the question is how that should be handled: should the operations of the interrupted client threads be counted as failed or passed? IMO it would make sense to not count those operations in the benchmarking results at all. |
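One way to picture the "do not count them at all" option is the sketch below. It is an illustration only: `InterruptibleWorker` and its two counters are hypothetical and do not use YCSB's measurement classes, and `Thread.sleep` stands in for a blocking datastore call.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical worker, not YCSB's client thread: an operation that is in flight
// when the interrupt arrives is recorded as discarded, neither pass nor fail.
class InterruptibleWorker implements Runnable {
  final AtomicLong completed = new AtomicLong();
  final AtomicLong discarded = new AtomicLong();
  private final long opMillis;

  InterruptibleWorker(long opMillis) {
    this.opMillis = opMillis;
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(opMillis);      // stand-in for a blocking datastore call
        completed.incrementAndGet(); // only finished operations are measured
      } catch (InterruptedException e) {
        discarded.incrementAndGet(); // in flight at the deadline: left out of the results
        return;
      }
    }
  }

  /** Runs one worker for runMillis, interrupts it, and returns {completed, discarded, stillAlive}. */
  static long[] demo(long opMillis, long runMillis) {
    InterruptibleWorker worker = new InterruptibleWorker(opMillis);
    Thread thread = new Thread(worker, "worker");
    thread.setDaemon(true);
    thread.start();
    try {
      Thread.sleep(runMillis); // stand-in for maxexecutiontime
      thread.interrupt();
      thread.join(1000);       // bounded wait for the worker to exit
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new long[] {worker.completed.get(), worker.discarded.get(), thread.isAlive() ? 1 : 0};
  }

  public static void main(String[] args) {
    long[] result = demo(50, 220);
    System.out.println("completed=" + result[0] + " discarded=" + result[1]);
  }
}
```

Note that `Thread.interrupt()` only unblocks calls that respond to interruption. A classic blocking socket read does not, so an interrupt alone would not free a worker stuck in such a read; a driver-level timeout is still needed there.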
Okay, I'm fine closing this as-is. It would probably be worth mentioning the importance of driver timeouts when we get around to making a datastore binding contribution guide. |
I just ran into this with the accumulo client while testing 0.8.0-RC3 (#678). Due to a cluster misconfiguration I filled HDFS during a load phase. Since I had client-side buffering on for the accumulo client, the terminator thread waited for shutdown to complete, which waited for the buffered writes to flush, which happily waited for Accumulo to recover.
+ ycsb-accumulo-binding-0.8.0-RC3/bin/ycsb load accumulo -P ycsb-accumulo-binding-0.8.0-RC3/workloads/workloade -p table=ycsb_workloade -cp /etc/accumulo/conf -p accumulo.columnFamily=family -p accumulo.instanceName=accumulo -p accumulo.zooKeepers=YYYYY -p accumulo.username=ycsb -p accumulo.password=XXXXXX -s -p maxexecutiontime=1200 -threads 30 -jvm-args=-Xmx8192m -p recordcount=2147483647 -p insertstart=0 -p insertcount=429496729 -p exportfile=/root/ycsb-load_workloade-accumulo-test-1.gce.cloudera.com-measurements.json -p exporter=com.yahoo.ycsb.measurements.exporter.JSONArrayMeasurementsExporter
real 469m43.223s
user 8m17.917s
sys 5m9.610s
Once I saw the cluster's status I fixed things, which then allowed Accumulo to recover, which then allowed the final writes to flush, which finally allowed the thread to complete and the client to exit. Bit over my target limit of 20 minutes though. ;) |
I hit the same problem when testing Cassandra with YCSB. How can I solve it, please? |
same problem with SQL mysql/mariadb - YCSB 0.17.0 |
I ran YCSB with both parameters, operationcount and maxexecutiontime. According to the documentation, the benchmark should stop at whichever limit is reached first. However, my benchmark never stops, and I periodically receive this on my prompt: "Still waiting for thread Thread 1 to complete. Workload status: true". Looking at the code of the TerminatorThread, it tries to join the benchmarking threads after waiting for the maxexecutiontime period. If it is unable to join a thread on the first attempt, it keeps retrying every 2 seconds, which is why my benchmark runs forever.
I am creating a step benchmark and hence need to run one workload for strictly 10 minutes and then start another workload. Is there any way to make sure the benchmark stops within, say, 10 minutes + 30 seconds (padded stopping time)? Any request that doesn't respond within this time can safely be considered a failure.
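The "run time plus padded stopping time" idea can be sketched as a terminator that gives up after a grace period instead of retrying the join forever. This is a hedged illustration, not YCSB's TerminatorThread: `DeadlineTerminator`, `stopWithin` and `demoStuckCount` are hypothetical names, and the demo uses milliseconds where the question uses 10 minutes and 30 seconds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical terminator, not YCSB's: enforces run time + grace period as a hard bound.
class DeadlineTerminator {
  /**
   * Sleeps for maxExecutionMillis, interrupts every worker, then waits for at most
   * graceMillis in total. Returns the workers that are still alive afterwards.
   */
  static List<Thread> stopWithin(List<Thread> workers, long maxExecutionMillis, long graceMillis) {
    try {
      Thread.sleep(maxExecutionMillis);
      for (Thread worker : workers) {
        worker.interrupt();
      }
      long deadline = System.nanoTime() + graceMillis * 1_000_000L;
      for (Thread worker : workers) {
        long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
        if (remainingMillis > 0) {
          worker.join(remainingMillis); // join(0) would wait forever, hence the guard
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    List<Thread> stuck = new ArrayList<>();
    for (Thread worker : workers) {
      if (worker.isAlive()) {
        stuck.add(worker);
      }
    }
    return stuck;
  }

  /** Demo with one cooperative worker and one that ignores interrupts; returns the stuck count. */
  static int demoStuckCount() {
    Thread cooperative = new Thread(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        // exit promptly on interrupt
      }
    }, "cooperative");
    Thread ignoresInterrupts = new Thread(() -> {
      while (true) {
        try {
          Thread.sleep(1000);
        } catch (InterruptedException ignored) {
          // simulates a call that never gives up
        }
      }
    }, "stuck");
    cooperative.setDaemon(true);
    ignoresInterrupts.setDaemon(true); // daemon, so the JVM can still exit
    cooperative.start();
    ignoresInterrupts.start();
    return stopWithin(Arrays.asList(cooperative, ignoresInterrupts), 100, 300).size();
  }

  public static void main(String[] args) {
    System.out.println("threads still stuck after grace period: " + demoStuckCount());
  }
}
```

The caller decides what to do with the returned stragglers. For a step benchmark, one option is to record their in-flight requests as failures and move on to the next workload, or to end the process if the threads are non-daemon and would otherwise keep the JVM alive.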