Stream job results in chunks #23

Merged: 4 commits, Apr 12, 2022

Conversation

Mandrenkov
Collaborator

Context:

Currently, fetching large job results (e.g., those on the order of GBs) can take an unreasonably long time because of the memory overhead of the default fetching strategy of the requests library, which reads the entire response body before returning. Fetching the same job result from a web browser (e.g., Firefox) tends to be much faster, presumably because the browser streams the download.

Description of the Change:

  • Modified Connection.request() to respect the stream keyword argument during error handling.
  • Modified Job.result to stream job results from the Xanadu Cloud platform in chunks.
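For readers unfamiliar with streamed downloads in requests, the following is a minimal sketch of the general technique, not the code from this PR. The names `read_in_chunks` and `fetch_streamed`, the 1 MiB chunk size, and the 60-second timeout are assumptions chosen for illustration; only `requests.get(..., stream=True)`, `Response.raise_for_status()`, and `Response.iter_content()` come from the requests library itself.

```python
import io
from typing import Iterable, Protocol


class StreamedResponse(Protocol):
    """The subset of the requests.Response interface used below."""

    def iter_content(self, chunk_size: int = 1) -> Iterable[bytes]: ...


def read_in_chunks(response: StreamedResponse, chunk_size: int = 1 << 20) -> bytes:
    """Assemble a response body from chunks of at most chunk_size bytes."""
    buffer = io.BytesIO()
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # Skip empty keep-alive chunks.
            buffer.write(chunk)
    return buffer.getvalue()


def fetch_streamed(url: str, chunk_size: int = 1 << 20) -> bytes:
    """Hypothetical helper: download url without preloading the whole body."""
    import requests  # Third-party dependency, imported lazily.

    # With stream=True, only the headers are read up front; the body is
    # pulled from the socket as iter_content() is consumed.
    with requests.get(url, stream=True, timeout=60) as response:
        # Checking the status does not touch the body, so the error path
        # leaves the stream unread.
        response.raise_for_status()
        return read_in_chunks(response, chunk_size)
```

Any error handling that inspects the response body (for example, to extract an error message) consumes the stream, which is presumably why the error path has to take the stream argument into account.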

Benefits:

  • Large job results are fetched significantly faster, and the speedup grows with the size of the job result.

Possible Drawbacks:
None.

Related GitHub Issues:
None.

Contributor

@doctorperceptron doctorperceptron left a comment

Looks good! Since everything is now streamed I'm assuming that streaming small files isn't any slower than before?

xcc/connection.py (review thread resolved)
xcc/job.py (review thread resolved)
@Mandrenkov
Collaborator Author

Thanks for the review, @doctorperceptron!

> Looks good! Since everything is now streamed I'm assuming that streaming small files isn't any slower than before?

Excellent question! To check whether this is the case, I submitted a small job and fetched its results 10 times on both this branch and main. For reference, I submitted the job using:

xcc job submit simulon_gaussian "name x\nversion 1.0\ntarget simulon_gaussian (shots=2)\nMeasureFock() | [0, 1, 2, 3]"

The results show there is no significant difference in the fetching time of this job between the branches.

Branch main

            Mean        Std.Dev.    Min         Median      Max
real        1.507       0.175       1.152       1.532       1.716
user        0.573       0.051       0.508       0.569       0.710
sys         0.694       0.045       0.582       0.697       0.767

Branch sc-17551-stream-job-results

            Mean        Std.Dev.    Min         Median      Max
real        1.595       0.219       1.075       1.611       1.993
user        0.508       0.033       0.445       0.515       0.545
sys         0.682       0.032       0.627       0.679       0.735
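A comparison like the one above could be reproduced along the following lines. This is only a sketch under stated assumptions: the comment does not say how the timings were collected, so `time_command` and `summarize` are hypothetical helpers, only wall-clock ("real") time is measured here, and the sample standard deviation is an assumed choice.

```python
import statistics
import subprocess
import time
from typing import Dict, List, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Compute the summary columns used in the tables above."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # Sample standard deviation (assumed).
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def time_command(command: List[str], runs: int = 10) -> List[float]:
    """Run command repeatedly and return the wall-clock time of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `summarize(time_command(command))` once per branch, with `command` set to whatever invocation fetches the job result, would yield one row per branch to compare.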

Contributor

@doctorperceptron doctorperceptron left a comment

Sounds good! 🚀

@Mandrenkov Mandrenkov merged commit afefd12 into main Apr 12, 2022
@Mandrenkov Mandrenkov deleted the sc-17551-stream-job-results branch April 12, 2022 15:02