Version roadmap
---------------
High priority (1.0 blockers):
* make the NVIDIA OpenCL SDK examples work
* make the Intel OpenCL SDK examples work
* fix issues when calling kernels with struct or vector
value parameters: https://github.com/pocl/pocl/issues/1
Medium priority:
* complete the kernel runtime library.
* complete the host runtime library.
* add a device driver supporting AMD GPU cards.
* check all the function pointers in the ICD dispatch struct.
Known ambiguous OpenCL 1.2 features
-----------------------------------
The OpenCL 1.2 and later standards are very ambiguous when it
comes to sub-devices. On the one hand, they state that sub-devices
can be used wherever their parent devices can be used; on the
other hand, various parts of the standard hint that they should be
treated independently.
In particular, it's not clear whether sub-devices can be used
within a context that only holds their parent device, or not. This
might even depend on whether the context was created "from type"
or not.
The experimental implementation in pocl currently assumes that
sub-devices are to be treated independently from their parent
device. This means, for example, that sub-devices cannot be used
in a context that does not contain them (but contains their parent
device). Note that this differs from the AMD behavior (which
is tested in the DeviceFission AMD APP SDK example) but matches,
e.g., Intel's behavior. Clarification from the standards body is
needed on which behavior is correct.
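To make the two readings concrete, here is a minimal C sketch of the membership check each one implies. The `struct device` record, the `subdevice_policy` enum and `device_usable_in_context()` are hypothetical illustrations, not pocl's internal types or functions; a real implementation would operate on `cl_device_id` and `cl_context` objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record: a NULL parent means a root device. */
struct device {
    const char *name;
    const struct device *parent;
};

enum subdevice_policy {
    /* pocl's current assumption (and Intel's behavior): a sub-device is an
       independent device and must itself be a member of the context. */
    POLICY_INDEPENDENT,
    /* AMD-like behavior: a sub-device is also usable in any context that
       contains one of its ancestor devices. */
    POLICY_INHERIT_FROM_PARENT
};

static bool context_contains(const struct device *const *ctx_devs,
                             size_t n_devs, const struct device *dev)
{
    for (size_t i = 0; i < n_devs; ++i)
        if (ctx_devs[i] == dev)
            return true;
    return false;
}

bool device_usable_in_context(const struct device *dev,
                              const struct device *const *ctx_devs,
                              size_t n_devs, enum subdevice_policy policy)
{
    assert(dev != NULL);
    /* Both policies accept a device that is listed in the context. */
    if (context_contains(ctx_devs, n_devs, dev))
        return true;
    if (policy == POLICY_INDEPENDENT)
        return false;
    /* Walk up the partition tree looking for an ancestor in the context. */
    for (const struct device *p = dev->parent; p != NULL; p = p->parent)
        if (context_contains(ctx_devs, n_devs, p))
            return true;
    return false;
}
```

The two policies only disagree when a sub-device is used in a context that lists its parent (or another ancestor) but not the sub-device itself.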
There is room for optimization in the current implementation,
particularly in the program build system: since sub-devices share
the bitcode with their parent device, the build could be done only
once. Such an optimization will actually become necessary if the
other behavior (sub-devices as slaves of their parent device) is
ever implemented.
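One way to build only once would be to key the program build on the root of the partition tree. The following is a minimal sketch under that assumption; `build_root()`, `struct build_cache` and `build_for_device()` are hypothetical names, and the actual compilation step is only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record: a NULL parent means a root (non-partitioned) device. */
struct device {
    const char *name;
    const struct device *parent;
};

/* Sub-devices share bitcode with their parent, so the root of the
   partition tree serves as the cache key for the program build. */
const struct device *build_root(const struct device *dev)
{
    assert(dev != NULL);
    while (dev->parent != NULL)
        dev = dev->parent;
    return dev;
}

/* Hypothetical per-program build cache: one slot per root device. */
#define MAX_ROOTS 8
struct build_cache {
    const struct device *roots[MAX_ROOTS];
    size_t n_roots;
    unsigned builds_performed; /* how often the expensive build actually ran */
};

/* Returns the cache slot for dev, building only on the first request from
   any device in the same partition tree; returns -1 if the cache is full. */
int build_for_device(struct build_cache *cache, const struct device *dev)
{
    const struct device *root = build_root(dev);
    for (size_t i = 0; i < cache->n_roots; ++i)
        if (cache->roots[i] == root)
            return (int)i; /* reuse the build made for this tree */
    if (cache->n_roots == MAX_ROOTS)
        return -1;
    /* A real implementation would compile the shared bitcode here. */
    cache->roots[cache->n_roots] = root;
    cache->builds_performed++;
    return (int)cache->n_roots++;
}
```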
Known missing OpenCL 1.2 features
---------------------------------
Missing APIs used by the tested OpenCL example suites are
listed here. This is not a complete list of unimplemented
APIs in pocl; it has been updated whenever missing APIs
were encountered in the test cases.
(*) == Used by the opencl-book-samples.
(R) == Used by the Rodinia benchmark suite.
(P) == Used by PyOpenCL
(B) == Used by the Parboil benchmarks
4. THE OPENCL PLATFORM LAYER
* 4.1 Querying platform info (properly)
* 4.3 Partitioning a device
* 4.4 Contexts
5. THE OPENCL RUNTIME
* 5.1 Command queues
* 5.2.1 Creating buffer objects
* 5.2.4 Mapping buffer objects
* 5.3 Image objects
* 5.3.3 Reading, Writing and Copying Image Objects
* 5.4 Querying, Unmapping, Migrating, ... Mem objects
* 5.4.1 Retaining and Releasing Memory Objects
* 5.4.2 Unmapping Mapped Memory Objects
* 5.5 Sampler objects
* 5.5.1 Creating Sampler Objects
* 5.6.1 Creating Program Objects
* 5.7.1 Creating Kernel Objects
* 5.9 Event objects
* clWaitForEvents (*)
* 5.10 Markers, Barriers and Waiting for Events
* clEnqueueMarker (deprecated in OpenCL 1.2) (*, B)
* 5.12 Profiling
6. THE OPENCL C PROGRAMMING LANGUAGE
* 6.12.11 Atomic functions
* cl_khr_local_int32_base_atomics (Chapter_14/histogram)
* 6.12.14.2 Built-in Image Read Functions
* read_imagef (R[particlefilter])
* read_imageui (B[sad])
OpenCL 1.2 Extensions
* 9.7 Sharing Memory Objects with OpenGL / OpenGL
ES Buffer, Texture and Renderbuffer Objects
* 9.7.6 Sharing memory objects that map to GL objects
between GL and CL contexts
* clEnqueueAcquireGLObjects (*)
Miscellaneous
Other
-----
* configure should check for 'clang'
* the build system should use $(CXX) everywhere;
currently some parts assume g++, so the build fails
if only c++ is installed
Optimization opportunities
--------------------------
* Even when using an in-order queue, schedule kernels
in parallel if their input buffers do not depend on
unfinished commands (this should be legal per OpenCL 1.2 section 5.11).
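Deciding whether a kernel may start early amounts to a hazard check against every unfinished command in the queue. This minimal sketch assumes each command records the buffers it reads and writes; the `struct command` layout and `can_start_early()` are hypothetical. It also checks the kernel's output buffers, since write-after-read and write-after-write conflicts would be as unsafe as reading a buffer that is still being written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command record: the buffers a kernel reads and writes,
   identified here by opaque handles. */
struct command {
    const void *const *reads;
    size_t n_reads;
    const void *const *writes;
    size_t n_writes;
};

/* True if the two handle lists have at least one element in common. */
static bool any_shared(const void *const *a, size_t na,
                       const void *const *b, size_t nb)
{
    for (size_t i = 0; i < na; ++i)
        for (size_t j = 0; j < nb; ++j)
            if (a[i] == b[j])
                return true;
    return false;
}

/* True if running next concurrently with prev could change the result. */
static bool conflicts(const struct command *next, const struct command *prev)
{
    return any_shared(next->reads, next->n_reads, prev->writes, prev->n_writes)    /* read-after-write */
        || any_shared(next->writes, next->n_writes, prev->reads, prev->n_reads)    /* write-after-read */
        || any_shared(next->writes, next->n_writes, prev->writes, prev->n_writes); /* write-after-write */
}

/* next may start while the commands in unfinished are still running
   only if it conflicts with none of them. */
bool can_start_early(const struct command *next,
                     const struct command *unfinished, size_t n_unfinished)
{
    assert(next != NULL);
    for (size_t i = 0; i < n_unfinished; ++i)
        if (conflicts(next, &unfinished[i]))
            return false;
    return true;
}
```

Comparing buffer handles is only an approximation: sub-buffers that alias the same memory would need an address-range overlap test instead.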