Benchmarking Runs Forever [Question] #618

Closed
shivam-maharshi opened this issue Feb 7, 2016 · 15 comments

@shivam-maharshi
Contributor

I ran YCSB with both parameters set: operationcount and maxexecutiontime. According to the documentation, the benchmark should stop at whichever limit is reached first. However, my benchmark never stops, and I periodically see this on my prompt: "Still waiting for thread Thread 1 to complete. Workload status: true". I looked at the code of the TerminatorThread, which tries to join the benchmarking threads after waiting for the maxexecutiontime period. If it is unable to join a thread on the first attempt, it keeps retrying every 2 seconds, which is why my benchmark runs forever.

I am building a step benchmark, so I need to run one workload for strictly 10 minutes and then start another workload. Is there any way to make sure the benchmark stops within, say, 10 minutes + 30 seconds (a padded stopping time)? Any request that doesn't respond within this time can safely be considered a failure.

shivam-maharshi changed the title from "Benchmarking Never Ending [Question]" to "Benchmarking Runs Forever Question]" on Feb 7, 2016
shivam-maharshi changed the title from "Benchmarking Runs Forever Question]" to "Benchmarking Runs Forever [Question]" on Feb 7, 2016
@busbey
Collaborator

busbey commented Feb 7, 2016

Which workload is this happening in? Which JVM are you using? Which OS?

The biggest problem I've had when trying to use time bound is the lag when waiting for workload d to handle its set up.

@shivam-maharshi
Contributor Author

Workload: Core Workload - I have customized this to benchmark Web Services but have not touched the part that deals with any threads (Worker, Client or Terminator).
Java: Java HotSpot 64 Bit Server VM - 1.8.0_72
OS: Mac OS X El Capitan 10.11.3

@busbey
Collaborator

busbey commented Feb 7, 2016

Customized the code or customized the workload configuration file?

@shivam-maharshi
Contributor Author

I customized the CoreWorkload.java file, since YCSB doesn't support benchmarking of Web Services and I needed many other runtime-configurable parameters. I have uploaded that code to GitHub here: https://github.com/shivam-maharshi/YCSB4WebServices/blob/master/src/main/java/com/yahoo/ycsb/workloads/CoreWorkload.java. This is not clean code and is only meant for my specific benchmarking purpose.

@busbey
Collaborator

busbey commented Feb 7, 2016

Just to set expectations: it's going to be difficult to figure out if the issue is in YCSB or your changes (more so because it looks like you started with the base YCSB plus some changes rather than forking the repo).

Could you explain what the fundamental issue is with, e.g., writing a datastore binding for your web service? That would make it easier to isolate issues in the core framework from the changes specific to your use case.

I'll try to get a sense of the code this evening.

@shivam-maharshi
Contributor Author

Yes, you are right. I had a conversation with Andy Kruth, and I've already forked YCSB and started to create a REST client binding module. That work is in progress.

The reason I did not write a binding at first was that writing it cleanly was going to take a little more time, which I unfortunately did not have. With the current framework I could not have done the following, all of which I required.

  1. Configurable Zipf's constant value.
  2. Separate URL traces for Reads/Writes from a file.
  3. A non Scrambled Zipf's generator for Field length chooser.

Hence I decided to go with this approach, even though I knew it was wrong. Long story short, if you can get a sense of the code, that would be great. However, I don't think the problem was introduced by my changes to CoreWorkload, since it does not manipulate threads in any way. I am hoping to create a pull request this week for the REST client I am working on. If we run into the same issue with that module as well, it will be easier to deal with.

Thanks for your response!

@kruthar
Collaborator

kruthar commented Feb 8, 2016

@shivam-maharshi - Can you attach the command you are running, the workload configuration file you are using and also the output? These might help figure out context of the issue you are seeing.

@kruthar
Collaborator

kruthar commented Feb 23, 2016

@shivam-maharshi - did you solve your issue?

@shivam-maharshi
Contributor Author

@kruthar - I resolved this issue for myself by setting a maximum timeout for individual requests in the CRUD operations of the client binding. (The implementation simply runs a timer thread in parallel that stops the operation if it exceeds the given time limit.)
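A minimal sketch of such a per-request timeout, for illustration only. It is not the code from the binding mentioned above, and `TimedOperation` and `runWithTimeout` are hypothetical names, not part of YCSB. It assumes each CRUD call can be wrapped in a `Callable<Boolean>` and uses `Future.get` with a timeout in place of a hand-rolled timer thread:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of YCSB): bounds the duration of one operation.
final class TimedOperation {
  private TimedOperation() {}

  /**
   * Runs op on the given executor and waits at most timeoutMillis for it.
   * Returns true only if op finished in time and returned true; returns
   * false if it failed, threw, timed out, or the caller was interrupted.
   */
  static boolean runWithTimeout(ExecutorService executor,
                                Callable<Boolean> op,
                                long timeoutMillis) {
    Future<Boolean> future = executor.submit(op);
    try {
      return Boolean.TRUE.equals(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      // Interrupts the thread running op. Blocking I/O that ignores
      // interrupts still needs a socket-level timeout to actually stop.
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      return false; // op threw an exception: treat it as a failed operation
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      return false;
    }
  }
}
```

A binding could call something like `TimedOperation.runWithTimeout(executor, () -> doRead(key), 5000)` inside its read method, where `doRead` stands in for the real request. Where the client library offers its own connect and read timeouts, those are usually the simpler choice.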

Coming back to the main point: YCSB benchmarking can run forever even if the "maxexecutiontime" property is set in the workload file. This is because the Terminator Thread only tries to finish the benchmark once "maxexecutiontime" has passed but does not guarantee it, since it only joins the worker threads (benchmarking threads) to wind up the benchmark and does not interrupt or kill them. Hence, if a worker thread gets stuck waiting for a response or the next bytes from the server and never receives them, the terminator thread will keep trying to join the worker thread indefinitely and never succeed. This can happen when the DB server does not respond to a request for a long time.
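To make the join-versus-interrupt distinction concrete, here is a rough sketch of a bounded wait. It is not YCSB's TerminatorThread code; `BoundedJoin` and `joinOrInterrupt` are hypothetical names, and the grace period is an assumption for illustration. It polls with `Thread.join(millis)` as described above, but gives up and interrupts the worker once a deadline passes:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of YCSB): a join loop with a hard deadline.
final class BoundedJoin {
  private BoundedJoin() {}

  /**
   * Waits up to graceMillis for worker to finish, polling every pollMillis.
   * Returns true if the worker finished by itself. Otherwise interrupts the
   * worker at the deadline and returns false.
   */
  static boolean joinOrInterrupt(Thread worker, long graceMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(graceMillis);
    try {
      while (worker.isAlive()) {
        long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMillis <= 0) {
          worker.interrupt(); // a plain join loop would keep waiting here
          return false;
        }
        // join(0) would wait forever, so keep the slice at 1 ms or more.
        worker.join(Math.max(1, Math.min(pollMillis, remainingMillis)));
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt flag
      return false;
    }
  }
}
```

An interrupt only helps if the worker is blocked in an interruptible call; a thread stuck in a blocking socket read may ignore it, which is one more reason client-side timeouts matter.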

Why hasn't it been reported so far?
This hasn't been reported so far because most of the clients have a connection or read timeout configured in the client binding. If a worker thread gets stuck, its operation fails automatically once that timeout is reached. Since most (in fact all) clients have timeouts, I feel that YCSB's handling of "maxexecutiontime" is fine the way it is.

Once you've read this, please let me know. I will close this item, since it is not a bug but a design decision taken by YCSB.

@busbey
Collaborator

busbey commented Feb 23, 2016

should we be sending an interrupt to the client threads?

@shivam-maharshi
Contributor Author

That can be done, but it can affect the benchmarking results if not handled properly. For example, suppose 10 client threads have just sent out requests when "maxexecutiontime" is reached. If we interrupt those client threads, how should that be handled? Should the operations on the interrupted threads count as failures or successes? IMO it would make sense not to count those operations in the benchmarking results at all.
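One way to express that accounting rule, as a sketch only. `OperationTally` and its `Outcome` enum are hypothetical and are not YCSB's measurement classes. Interrupted operations are tracked separately so they can be left out of the reported totals:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tally (not YCSB's measurements API): keeps interrupted
// operations out of the counts that feed the benchmark results.
final class OperationTally {
  enum Outcome { OK, FAILED, INTERRUPTED }

  private final AtomicLong ok = new AtomicLong();
  private final AtomicLong failed = new AtomicLong();
  private final AtomicLong discarded = new AtomicLong();

  /** Records one finished or abandoned operation; safe to call from many threads. */
  void record(Outcome outcome) {
    switch (outcome) {
      case OK:
        ok.incrementAndGet();
        break;
      case FAILED:
        failed.incrementAndGet();
        break;
      case INTERRUPTED:
        discarded.incrementAndGet(); // cut off at the time limit: not a pass, not a fail
        break;
    }
  }

  /** Operations that count toward the results (successes plus failures). */
  long counted() {
    return ok.get() + failed.get();
  }

  /** Operations dropped because their thread was interrupted. */
  long discarded() {
    return discarded.get();
  }
}
```

Reporting the discarded count alongside the results would at least show how many in-flight requests were cut off at the limit.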

@busbey
Collaborator

busbey commented Feb 25, 2016

Okay, I'm fine closing this as-is. It would probably be worth mentioning the importance of driver timeouts when we get around to making a datastore binding contribution guide.

@busbey
Collaborator

busbey commented Apr 10, 2016

I just ran into this with the accumulo client while testing 0.8.0-RC3 (#678). Due to a cluster misconfiguration I filled HDFS during a load phase. Since I had client-side buffering on for the accumulo client, the terminator thread waited for shutdown to complete, which waited for the buffered writes to flush, which happily waited for Accumulo to recover.

+ ycsb-accumulo-binding-0.8.0-RC3/bin/ycsb load accumulo -P ycsb-accumulo-binding-0.8.0-RC3/workloads/workloade -p table=ycsb_workloade -cp /etc/accumulo/conf -p accumulo.columnFamily=family -p accumulo.instanceName=accumulo -p accumulo.zooKeepers=YYYYY -p accumulo.username=ycsb -p accumulo.password=XXXXXX -s -p maxexecutiontime=1200 -threads 30 -jvm-args=-Xmx8192m -p recordcount=2147483647 -p insertstart=0 -p insertcount=429496729 -p exportfile=/root/ycsb-load_workloade-accumulo-test-1.gce.cloudera.com-measurements.json -p exporter=com.yahoo.ycsb.measurements.exporter.JSONArrayMeasurementsExporter


real    469m43.223s
user    8m17.917s
sys     5m9.610s

Once I saw the cluster's status I fixed things, which then allowed Accumulo to recover, which then allowed the final writes to flush, which finally allowed the thread to complete and the client to exit. Bit over my target limit of 20 minutes though. ;)

@2victoria

I ran into the same problem when testing Cassandra with YCSB. How can I solve it, please?

@amgads

amgads commented Feb 16, 2023

Same problem with MySQL/MariaDB on YCSB 0.17.0.


6 participants