
Running Workload D and E in Parallel #1061

Closed
ntrhieu89 opened this issue Nov 12, 2017 · 4 comments

Comments

ntrhieu89 commented Nov 12, 2017

Hi,

I am trying to run workloads D and E in parallel (using multiple nodes to issue operations).
The problem is the insert operation: how can I prevent two different nodes from inserting the same new user into the data store?

This link https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload-in-Parallel gives instructions for loading the data store with multiple nodes, but doesn't mention anything about the run phase.

Thanks,
Hieu Nguyen

busbey (Collaborator) commented May 19, 2018

The `insertstart` and `insertcount` properties are used to constrain the keys chosen for all operations, including during the run phase.
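As an illustration of the idea (not YCSB code), the key space can be split into disjoint per-client ranges, with each client passing its pair as `-p insertstart=...` and `-p insertcount=...`. A minimal sketch, assuming a simple even partitioning:

```python
# Illustrative sketch: split a total key space into disjoint
# (insertstart, insertcount) ranges, one per parallel client.

def partition_keyspace(total_records: int, num_clients: int):
    """Return a list of (insertstart, insertcount) pairs, one per client."""
    base = total_records // num_clients
    parts = []
    start = 0
    for i in range(num_clients):
        # Give any remainder to the last client so the ranges cover everything.
        count = base if i < num_clients - 1 else total_records - start
        parts.append((start, count))
        start += count
    return parts

print(partition_keyspace(1_000_000, 2))
# → [(0, 500000), (500000, 500000)]
```

Because the ranges are disjoint, no two clients can choose the same insert key.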

busbey (Collaborator) commented Jun 14, 2018

Closing as stale. If you're still having trouble, please reopen.

perdelt commented Apr 6, 2023

Hi, I still have this problem. I load the data store with:

recordcount=1000000
operationcount=500000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
insertstart=0
insertcount=500000

and

recordcount=1000000
operationcount=500000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
insertstart=500000
insertcount=500000

without a problem. I then run two processes for workload E.

First:

recordcount=2000000
operationcount=500000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
insertstart=1000000
insertcount=500000

Second:

recordcount=2000000
operationcount=500000
workload=site.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
insertstart=1500000
insertcount=500000

and I receive a lot of primary-key (PK) violation errors.
Can you help me? Where am I going wrong?
Many thanks!

seybi87 (Contributor) commented Apr 20, 2023

Hi @perdelt

the problems with the PK violations are caused by the specified recordcount values in the parallel RUN phase. The recordcount also specifies the upper bound of the key range for inserted records, i.e. in the RUN phase you have specified recordcount=2000000 for both workloads.

In consequence, Workloads E-1 and E-2 will both insert records from the key range insertstart to recordcount:

  • Workload E-1: 1000000 to 2000000
  • Workload E-2: 1500000 to 2000000

Setting the recordcount=1500000 for Workload E-1 should solve the issue.
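The arithmetic behind this can be sketched as follows (assumed semantics, per the explanation above: run-phase inserts draw keys from `insertstart` up to `recordcount`):

```python
# Sketch of the insert key ranges implied by the two run-phase configs,
# assuming inserts draw keys from [insertstart, recordcount).

def insert_range(insertstart: int, recordcount: int) -> range:
    return range(insertstart, recordcount)

def overlap(a: range, b: range) -> int:
    """Number of keys both workloads may insert."""
    return max(0, min(a.stop, b.stop) - max(a.start, b.start))

# Original (overlapping) configuration: both use recordcount=2000000.
e1 = insert_range(1_000_000, 2_000_000)
e2 = insert_range(1_500_000, 2_000_000)
print(overlap(e1, e2))  # → 500000 contended keys, hence the PK violations

# Suggested fix: cap Workload E-1 at recordcount=1500000.
e1_fixed = insert_range(1_000_000, 1_500_000)
print(overlap(e1_fixed, e2))  # → 0, the ranges are disjoint
```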
