Investigate when barriers need to be generated #18
Bastacyclop added the "generated code" label (Changes to the code that our compiler generates) on Feb 13, 2020
This test generates too many barriers, which can lead to invalid results in #74: shine/src/test/scala/apps/harrisCornerDetectionHalideCheck.scala Lines 199 to 207 in 7876d7b
The generated code looks like:
__kernel
void harris(global float* restrict output, int n0, int n1, const global float* restrict x0, local float* restrict x2694, local float* restrict x2801){
  /* mapWorkgroup */
  for (int wg_id_3073 = get_group_id(1);(wg_id_3073 < ((7 + n0) / 8));wg_id_3073 = (wg_id_3073 + get_num_groups(1))) {
    /* mapWorkgroup */
    for (int wg_id_3080 = get_group_id(0);(wg_id_3080 < ((3 + n1) / 8));wg_id_3080 = (wg_id_3080 + get_num_groups(0))) {
      /* mapLocal */
      for (int l_id_3082 = get_local_id(1);(l_id_3082 < 12);l_id_3082 = (l_id_3082 + get_local_size(1))) {
        /* mapLocal */
        for (int l_id_3087 = get_local_id(0);(l_id_3087 < 3);l_id_3087 = (l_id_3087 + get_local_size(0))) {
          /* oclReduceSeq */
          {
            // produce x2817 from x0
            vstore4(x2817, 0, (&(x2801[((4 * l_id_3087) + (14 * l_id_3082))])));
          }
        }
        // FIXME: this barrier should not be generated
        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
      }
      // FIXME: this barrier should only fence local memory
      barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
      /* mapLocal */
      for (int l_id_3084 = get_local_id(1);(l_id_3084 < 10);l_id_3084 = (l_id_3084 + get_local_size(1))) {
        /* mapLocal */
        for (int l_id_3172 = get_local_id(0);(l_id_3172 < 3);l_id_3172 = (l_id_3172 + get_local_size(0))) {
          /* oclReduceSeq */
          {
            // produce x2748 from x2801
            // produce x2720 from x2801
            vstore4(x2748, 0, (&(x2694[((4 * l_id_3172) + (24 * l_id_3084))])));
            vstore4(x2720, 0, (&(x2694[((12 + (4 * l_id_3172)) + (24 * l_id_3084))])));
          }
        }
        // FIXME: this barrier should not be generated
        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
      }
      // FIXME: this barrier should only fence local memory
      barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
      /* mapLocal */
      for (int l_id_3086 = get_local_id(1);(l_id_3086 < 8);l_id_3086 = (l_id_3086 + get_local_size(1))) {
        /* mapLocal */
        for (int l_id_3265 = get_local_id(0);(l_id_3265 < 2);l_id_3265 = (l_id_3265 + get_local_size(0))) {
          // produce output from x2694
        }
        // FIXME: this barrier should not be generated
        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
      }
      // FIXME: this barrier should only fence local memory
      barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
    }
  }
}
Closed in favour of #224.
Currently, we are generating barriers only (and always) after generating a parForLocal. This is probably too much in many cases and may not be enough in some others.
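One possible reason a barrier placed inside a mapLocal loop can be harmful, not just redundant: OpenCL requires every work-item of a work-group to reach a barrier, and for a barrier inside a loop, to reach it on every iteration. The strided loops in the generated code give work-items different trip counts whenever the local size does not divide the loop bound. The plain C sketch below only counts how often each work-item would hit such an in-loop barrier. The function names are hypothetical, and the local size of 8 used in the note after the code is an assumption, since the issue does not state the launch configuration.

```c
#include <assert.h>

/* Counts the iterations one work-item executes in a strided loop of the form
 *   for (i = local_id; i < bound; i += local_size) { ...; barrier(...); }
 * which equals the number of times that work-item reaches the in-loop barrier.
 * Hypothetical helper for illustration; not part of the compiler. */
int barrier_hits(int local_id, int local_size, int bound) {
    assert(local_size > 0 && local_id >= 0 && local_id < local_size);
    int hits = 0;
    for (int i = local_id; i < bound; i += local_size) {
        hits++;
    }
    return hits;
}

/* Returns 1 if all work-items along this dimension reach the in-loop barrier
 * the same number of times, 0 if the counts diverge. */
int barrier_is_uniform(int local_size, int bound) {
    int expected = barrier_hits(0, local_size, bound);
    for (int id = 1; id < local_size; id++) {
        if (barrier_hits(id, local_size, bound) != expected) {
            return 0;
        }
    }
    return 1;
}
```

With the bound of 12 from the first mapLocal loop and an assumed local size of 8, `barrier_hits(0, 8, 12)` is 2 while `barrier_hits(4, 8, 12)` is 1, so `barrier_is_uniform(8, 12)` returns 0. A barrier emitted once after the whole loop nest does not depend on the trip count, which is consistent with the FIXME comments in the kernel above.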