Add gaussian broadening to SLIT source #214
- `param2.x`: width of gaussian broadening in direction perpendicular to both slit and v
- `param2.y`: width of gaussian broadening in direction of slit
@lkeegan, thanks for the patch. I think this is a great extension to the slit source and I'd be happy to include it. I do have a few minor requests:
the increase in register counts ranges from 3 to 5. It is not a huge deal, although MCX's speed is currently largely capped by register counts, so any reduction in registers would give a better chance of running more simultaneous blocks. I understand that the new registers are there to improve readability, which is also important. I am wondering if you could try reusing some of your registers (and, of course, add comments to indicate the updated meaning across multiple uses). If you pull again, and add the let me know if you have any trouble doing this. thanks again
@lkeegan, I managed to cut down the registers from 9 to only 2. let me merge this patch first and add my modifications; please test if the function still works as expected.
here is a diff of register counts compared to pre-merged mcx
3 templates still see growth in registers (1, 4, 8), compensated by a drop in registers in 2 other templates (-1 and -6). tested on CUDA 10.2; other CUDA versions may see different numbers.
@@ -1462,6 +1462,29 @@ __device__ inline int launchnewphoton(MCXpos* p, MCXdir* v, Stokes* s, MCXtime*
rotatevector(v, 1.f, 0.f, sphi, cphi);
}

if (MCX_SRC_SLIT && (launchsrc->param2.x > 0.f || launchsrc->param2.y > 0.f)) {
the first condition should be `gcfg->srctype == MCX_SRC_SLIT`. I will post my fix, just leaving this here as a note.
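A small C sketch of why this matters, assuming `MCX_SRC_SLIT` is a nonzero integer constant (the value 13 below is a placeholder, not taken from the MCX headers): a bare nonzero constant on the left of `&&` is always true, so the original condition would fire for every source type with a positive width. The function names are hypothetical.

```c
/* Placeholder value, assumed nonzero; the real constant lives in MCX's headers. */
#define MCX_SRC_SLIT 13

/* Condition as written in the diff: the constant itself is the first operand,
 * so the source type is never actually checked. */
static int broadening_enabled_buggy(int srctype, float wx, float wy) {
    (void)srctype; /* unused: this is exactly the problem */
    return MCX_SRC_SLIT && (wx > 0.f || wy > 0.f);
}

/* Condition as suggested in the review: compare the configured source type
 * against the constant before looking at the widths. */
static int broadening_enabled_fixed(int srctype, float wx, float wy) {
    return srctype == MCX_SRC_SLIT && (wx > 0.f || wy > 0.f);
}
```

With a non-slit source type and a positive width, the first version returns true and the second returns false, so only the second restricts the broadening to slit sources.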
@fangq many thanks for the quick review and the performance improvements & fixes to the PR! I tested the master branch and it all works great, thanks! If you are interested I also tried Also, if reducing the number of registers used is key for good performance then it looks like separating the parallel and perpendicular contributions to
thanks, I wasn't aware of this compiler explorer feature, thanks for sharing it! yes, if you run nsight-compute to profile mcx kernels, you can see that the current throughput is largely limited by register counts (similarly for mcxcl and mmc). mcx has minimal memory overhead and thread divergence. from the godbolt page, it seems the output was generated from on the flip side, I would not go to the other extreme of destroying readability in exchange for small gains in registers either.