Skip to content

Commit

Permalink
[Texture support][Part 1] TIR lowering and OpenCL support (#7686)
Browse files Browse the repository at this point in the history
* Add support for kTexture storage rank.

* Add scaffolding for texture_flatten pass.

* Add scaffolding for texture allocation.

* Implement 2d texture flattening to builtin tir.text2d_alloca.

* Lower BufferStore/Load to builtin texture store/load.

* Add vectorizable attribure to texture load and store.

* Support auto-vectorization on the innermost (RGBA) axis.

* Add read/write_imagef opencl codegen for builtin texture load/store.

* Add TextureType support.

* Add InferTextureAccess pass to deduce __read_only
and __write_only access qualifiers for texture vars.
Also refactor use of restrict keyword to be var dependent.

* Implement texture allocation as external function in TIR lowering.

* Remove commented lines.

* Add nd->2d texture flattening.

* Bug fixes in opencl codegen (row<>col, access quals.)

* Improve texture codegen by explicitly allocating local vector
for the texture load. Also support indexing individual elements
of the RGBA vector.

* Remove automatic vectorization
code as it is no longer needed.

* Improve SSA local use when storing texture read to scalar buffer.

* Define texture flattening convention such
that the outer Nd-1 axes are stored as rows,
and the last axis is stored as columns.

* Add tir lowering and opencl codegen support for float16 textures.

* Disable SSA when texture load is immediately casted.

* Allow RGBA extent to be of length 1.

* Add pass to forward externally allocated textures
in place of textures realized from cache_read. Fix
to better follow indexing spec.

* Add buffer_common.h to house buffer offset simplification routines.

* More refactor and clean up in texture lowering.

* Add IsTextureType to tir and allow buffer
var type annotation to be TextureType in addition
to PointerType.

* Bug fix in texture access qualifier inference pass

* Step toward handling external texture buffer forwarding
when external buffer is not stored directly to cache_read realized buffer.
For example when it is conditionally stored via an IfThenElse node when
padding is used.

* [Part 2/3] Support texture:weight lowering convention for externally provided
texture buffers. Need to propagate this to allocated textures when
cache_read(texture) is used for weights.

* Bug fix in texture access qualifier inference pass

* Tighten constraint on external buffer forwarding --
cache_read(texture) cancellation -- to avoid incorrect
programs. Currently only forward through if_then_else node
and direct external loads. For if_then_else, still need
proper analysis of structural equality between buffers
and access patterns to determine if an external buffer
can replace the texture buffer realized via cache_read.

* Use texture lowering convention from texture runtime util.

* Use updated texture lowering utilities

* Use inherited visitor overloads in texture flattener.

* Add check in codegen for float/half until
read/write_image codegen supports other types.

* Rename tir texture builtins

* Remove codegen and tir runtime dependence on for TVMBackendAlloc/FreeTexture.

* Dispatch texture allocas via target specialized tir.tvm_call_packed

* Remove kTexture scope and use kGlobal with texture tag.

* Remove TextureType.

* Remove TextureType from OpenCL codegen.

* Remove TextureType from TIR lowering.

* Remove dependency on MergeMulMod.

* Revert "Add buffer_common.h to house buffer offset simplification routines."

This reverts commit 0276282.

* Prune include list

* Add more documentation to texture flattening.

* Add TextureFlatten transform to refactored tvm lower API.

* Apply clang formatting.

* Blacken python APIs.

* Apply cpplint changes.

* Attempt to extract storage scope from pointer scope.

* Remove ExternalBufferForwarding (cache_read cancellation) for now.

* Apply MyPy.

* Clang format

* Only visit RealizeBuffer body for texture storage.

* Fix bad merge.

* Utilize OpenCL preprocessor to switch between
sampler-less and codegen provided sampler for
texture reads depending on whether the opencl
runtime is 2.0 compliant.

* Add texture codegen test example.

* Refactor tests to use pytest parameterization.

Blacken tests.

* Respond to CRs.
  • Loading branch information
csullivan authored Aug 20, 2021
1 parent 18a2ee1 commit c6f62aa
Show file tree
Hide file tree
Showing 14 changed files with 1,952 additions and 9 deletions.
14 changes: 14 additions & 0 deletions include/tvm/tir/builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -600,6 +600,20 @@ TVM_DLL const Op& vectorcombine();
* \brief atomic add instruction, corresponding e.g. to atomicAdd in CUDA
*/
TVM_DLL const Op& atomic_add();
/*!
* \brief Create a texture 2d memory allocation
*/
TVM_DLL const Op& texture2d_alloca();

/*!
* \brief Store to texture 2d memory
*/
TVM_DLL const Op& texture2d_store();

/*!
* \brief Load from texture 2d memory
*/
TVM_DLL const Op& texture2d_load();

/*! \brief The kind of structure field info used in intrinsic */
enum TVMStructFieldKind : int {
Expand Down
9 changes: 9 additions & 0 deletions include/tvm/tir/transform.h
Original file line number Diff line number Diff line change
Expand Up @@ -437,6 +437,15 @@ TVM_DLL Pass LowerMatchBuffer();
*/
TVM_DLL Pass FlattenBuffer();

/*
* \brief Flatten the multi-dimensional read/write
* to two dimensional texture Load/Store and realize
* texture buffer allocations.
*
* \return The Pass
*/
TVM_DLL Pass TextureFlatten();

/*!
* \brief Unify all the thread bindings for "blockIdx.x/y/z", "threadIdx.x/y/z", and
* "vthread.x/y/z". Before the unification, two vars that are bound to a thread axis (e.g.,
Expand Down
15 changes: 15 additions & 0 deletions python/tvm/tir/transform/transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,21 @@ def StorageFlatten(cache_line_size, create_bound_attribute: bool = False):
return _ffi_api.StorageFlatten(cache_line_size, create_bound_attribute) # type: ignore


def TextureFlatten():
"""Flatten the multi-dimensional read/write to 2D.
Parameters
----------
Returns
-------
fpass : tvm.transform.Pass
The result pass
"""
return _ffi_api.TextureFlatten() # type: ignore


def InjectCopyIntrin(pragma_key: str, fintrin):
"""Inject virtual thread loops.
Expand Down
1 change: 1 addition & 0 deletions src/driver/driver_api.cc
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,7 @@ Array<tvm::transform::Pass> CreatePassList(bool disable_loop_partition) {

// PHASE 1
pass_list.push_back(tir::transform::InjectPrefetch());
pass_list.push_back(tir::transform::TextureFlatten());
pass_list.push_back(tir::transform::StorageFlatten(64, instrument_bound_checkers));
pass_list.push_back(tir::transform::LowerInitBlock());
pass_list.push_back(tir::transform::PlanAndUpdateBufferAllocationLocation());
Expand Down
10 changes: 8 additions & 2 deletions src/target/source/codegen_c.cc
Original file line number Diff line number Diff line change
Expand Up @@ -106,8 +106,8 @@ void CodeGenC::AddFunction(const PrimFunc& f) {
}
}

if (no_alias && restrict_keyword_.length() != 0) {
stream << ' ' << restrict_keyword_;
if (no_alias) {
PrintRestrict(v, stream);
}
} else {
PrintType(GetType(v), stream);
Expand Down Expand Up @@ -1018,6 +1018,12 @@ void CodeGenC::PrintVecElemLoadExpr(DataType t, int i, const std::string& value,
return;
}

void CodeGenC::PrintRestrict(const Var& v, std::ostream& os) {
if (restrict_keyword_.length() != 0) {
os << ' ' << restrict_keyword_;
}
}

static bool CheckOutermostBracketMatch(const std::string& s) {
if (!s.empty() && s.front() == '(' && s.back() == ')') {
size_t len = s.size();
Expand Down
2 changes: 2 additions & 0 deletions src/target/source/codegen_c.h
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,8 @@ class CodeGenC : public ExprFunctor<void(const PrimExpr&, std::ostream&)>,
virtual std::string CastFromTo(std::string value, DataType from, DataType target);
// Get load of single element with expression
virtual void PrintVecElemLoadExpr(DataType t, int i, const std::string& value, std::ostream& os);
// Print restrict keyword for a given Var if applicable
virtual void PrintRestrict(const Var& v, std::ostream& os);

protected:
// Print reference to struct location
Expand Down
211 changes: 209 additions & 2 deletions src/target/source/codegen_opencl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -27,18 +27,63 @@
#include <vector>

#include "../../runtime/opencl/opencl_module.h"
#include "../../runtime/texture.h"
#include "../../runtime/thread_storage_scope.h"
#include "../build_common.h"

namespace tvm {
namespace codegen {

CodeGenOpenCL::CodeGenOpenCL() { restrict_keyword_ = "restrict"; }
class InferTextureAccess : public StmtExprVisitor {
public:
static constexpr const uint8_t kReadAccess = 1;
static constexpr const uint8_t kWriteAccess = 2;

InferTextureAccess() {}
std::unordered_map<const VarNode*, std::string> Infer(const Stmt& n) {
StmtExprVisitor::VisitStmt(n);
std::unordered_map<const VarNode*, std::string> storage_scope_qualifiers;
for (auto& texture : var_access_map_) {
if (texture.second == kReadAccess) {
storage_scope_qualifiers.insert({texture.first, "texture_read"});
} else if (texture.second == kWriteAccess) {
storage_scope_qualifiers.insert({texture.first, "texture_write"});
} else if (texture.second == (kReadAccess | kWriteAccess)) {
storage_scope_qualifiers.insert({texture.first, ""});
}
}
return storage_scope_qualifiers;
}
void VisitExpr_(const CallNode* op) {
if (op->op.same_as(builtin::texture2d_load())) {
var_access_map_[op->args[0].as<VarNode>()] |= kReadAccess;
} else if (op->op.same_as(builtin::texture2d_store())) {
var_access_map_[op->args[0].as<VarNode>()] |= kWriteAccess;
} else {
StmtExprVisitor::VisitExpr_(op);
}
StmtExprVisitor::VisitExpr_(op);
}

private:
std::unordered_map<const VarNode*, uint8_t> var_access_map_;
};

CodeGenOpenCL::CodeGenOpenCL() {
// Set OpenCL specific restrict keyword
restrict_keyword_ = "restrict";
}

void CodeGenOpenCL::InitFuncState(const PrimFunc& f) {
CodeGenC::InitFuncState(f);
this->SetTextureScope(InferTextureAccess().Infer(f->body));
for (Var arg : f->params) {
if (arg.dtype().is_handle()) {
auto ptr_type = arg->type_annotation.as<PointerTypeNode>();
if (ptr_type && runtime::IsTextureStorage(std::string(ptr_type->storage_scope))) {
// Storage scope qualifiers for textures are inferred
// and set prior to function codegen.
continue;
} else if (arg.dtype().is_handle()) {
alloc_storage_scope_[arg.get()] = "global";
}
}
Expand Down Expand Up @@ -75,6 +120,40 @@ std::string CodeGenOpenCL::Finish() {
decl_stream << "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n"
"#pragma OPENCL EXTENSION cl_khr_global_int32_extended_atomics : enable\n\n";
}

// Enable OpenCL 1.2 sampler-less texture reads, but utilize
// provided sampler in OpenCL 2.0.
if (enable_compliant_texture_reads_) {
// TODO(csullivan, lunderberg): Extend device attribute querying to support remote devices
// generically through the device API such that a target can be created from a specific device's
// attributes and utilized during codegen. Potential generlization of #8127 (c02cafb) for remote
// devices.
//
// E.g. Only provide an image sampler when the local or remote device supports OpenCL 2.0,
// see below for context.
//
// For backwards compatibility with OpenCL 1.2, sampler-less read_image calls are used.
// By default in sampler-less read_image calls OpenCL defaults to
// sampler_ = "CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST";
// See section 6.12.14.3 Built-in Image Sampler-less Read Functions in the OpenCL 1.2
// specification. For OpenCL 2.0 it can be preferable to use,
// sampler_ = "CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST";
// For now we rely on OpenCL preprocessor directives to utilize the correct behavior
// depending on the OpenCL version detected at OpenCL compile time.
decl_stream << "#ifdef __OPENCL_VERSION__\n"
<< "#if __OPENCL_VERSION__ == CL_VERSION_2_0\n"
<< "#define READ_IMAGEH(image, sampler, coord) "
<< "read_imageh(image, sampler, coord)\n"
<< "#define READ_IMAGEF(image, sampler, coord) "
<< "read_imagef(image, sampler, coord)\n"
<< "#else\n"
<< "#define READ_IMAGEH(image, sampler, coord) "
<< "read_imageh(image, coord)\n"
<< "#define READ_IMAGEF(image, sampler, coord) "
<< "read_imagef(image, coord)\n"
<< "#endif\n"
<< "#endif\n\n";
}
return CodeGenC::Finish();
}

Expand Down Expand Up @@ -162,6 +241,23 @@ void CodeGenOpenCL::PrintType(DataType t, std::ostream& os) { // NOLINT(*)
LOG(FATAL) << "Cannot convert type " << t << " to OpenCL type";
}

void CodeGenOpenCL::PrintType(const Type& type, std::ostream& os) { // NOLINT(*)
if (auto* ptr = type.as<PrimTypeNode>()) {
return PrintType(ptr->dtype, os);
} else if (auto* ptr = type.as<PointerTypeNode>()) {
if (runtime::IsTextureStorage(std::string(ptr->storage_scope))) {
os << "image2d_t";
} else {
PrintType(ptr->element_type, os);
os << '*';
}
} else if (IsVoidType(type)) {
os << "void";
} else {
LOG(FATAL) << "Type " << type << " does not have a corresponding C Type";
}
}

void CodeGenOpenCL::PrintVecAddr(const VarNode* buffer, DataType t, PrimExpr base,
std::ostream& os) { // NOLINT(*)
if (!HandleTypeMatch(buffer, t.element_of())) {
Expand Down Expand Up @@ -210,6 +306,19 @@ void CodeGenOpenCL::PrintStorageScope(const std::string& scope, std::ostream& os
os << "__global ";
} else if (scope == "shared") {
os << "__local ";
} else if (scope == "texture_read") {
os << "__read_only ";
} else if (scope == "texture_write") {
os << "__write_only ";
}
}

void CodeGenOpenCL::PrintRestrict(const Var& v, std::ostream& os) {
// Apply restrict qualifer for non-texture types only
if (auto* ptr = v->type_annotation.as<PointerTypeNode>()) {
if (!runtime::IsTextureStorage(std::string(ptr->storage_scope))) {
os << ' ' << restrict_keyword_;
}
}
}

Expand All @@ -229,6 +338,39 @@ std::string CodeGenOpenCL::CastFromTo(std::string value, DataType from, DataType
return os.str();
}

void CodeGenOpenCL::VisitStmt_(const StoreNode* op) {
if (auto call = op->value.as<CallNode>()) {
if (call->op.same_as(builtin::texture2d_load())) {
need_texture_ssa_ = false;
// If storing a texture load into a buffer, don't use an
// intermediate local unless the buffer allocation is a
// single element selected from the texture read.
auto it = allocation_size_.find(op->buffer_var.get());
if (it != allocation_size_.end() && it->second == 1) {
need_texture_ssa_ = true;
}
}
}
CodeGenC::VisitStmt_(op);
need_texture_ssa_ = true;
}

void CodeGenOpenCL::VisitExpr_(const CastNode* op, std::ostream& os) {
if (auto call = op->value.as<CallNode>()) {
if (call->op.same_as(builtin::texture2d_load())) {
need_texture_ssa_ = false;
}
}
CodeGenC::VisitExpr_(op, os);
need_texture_ssa_ = true;
}

void CodeGenOpenCL::VisitStmt_(const AllocateNode* op) {
allocation_size_.insert(
{op->buffer_var.get(), op->constant_allocation_size() * op->dtype.lanes()});
CodeGenC::VisitStmt_(op);
}

void CodeGenOpenCL::VisitExpr_(const CallNode* op, std::ostream& os) {
if (op->op.same_as(builtin::address_of())) {
// Overload tvm_address_of to add storage scope (e.g. __global).
Expand All @@ -243,6 +385,64 @@ void CodeGenOpenCL::VisitExpr_(const CallNode* op, std::ostream& os) {
os << " *)" << this->GetVarID(load->buffer_var.get()) << " + ";
this->PrintExpr(load->index, os);
os << ')';
} else if (op->op.same_as(builtin::texture2d_store())) {
auto* ptr_type = op->args[0].as<VarNode>()->type_annotation.as<PointerTypeNode>();
ICHECK(ptr_type != nullptr) << "Texture Var's must be of PointerType";
ICHECK(runtime::IsTextureStorage(std::string(ptr_type->storage_scope)))
<< "builtin::texture2d_store() only supports storing to texture buffers";
DataType buffer_type = ptr_type->element_type.as<PrimTypeNode>()->dtype;
if (buffer_type.is_float16()) {
os << "write_imageh(";
} else if (buffer_type.is_float()) {
os << "write_imagef(";
} else {
LOG(FATAL) << "Unsupported type: " << buffer_type
<< ", currently only float and half are supported for image2d OpenCL codegen.";
}
this->PrintExpr(op->args[0], os);
os << ", ";
os << "(int2)(";
this->PrintExpr(op->args[1], os);
os << ", ";
this->PrintExpr(op->args[2], os);
os << "), ";
this->PrintExpr(op->args[3], os);
os << ")";
} else if (op->op.same_as(builtin::texture2d_load())) {
enable_compliant_texture_reads_ = true;
std::stringstream ss;
if (op->dtype.is_float16()) {
ss << "READ_IMAGEH(";
} else if (op->dtype.is_float()) {
ss << "READ_IMAGEF(";
} else {
LOG(FATAL) << "Unsupported type: " << op->dtype
<< ", currently only float and half are supported for image2d OpenCL codegen.";
}
this->PrintExpr(op->args[0], ss);
ss << ", ";
ss << "CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST, ";
ss << "((int2)(";
this->PrintExpr(op->args[1], ss);
ss << ", ";
this->PrintExpr(op->args[2], ss);
ss << ")))";

// Only use local SSA if texture is not already being stored
if (need_texture_ssa_) {
std::string rhs = SSAGetID(ss.str(), op->dtype.with_lanes(4));
if (op->args.back().as<RampNode>()) {
os << rhs;
} else {
os << "((";
this->PrintType(op->dtype.with_lanes(1), os);
os << "*)&" << rhs << ")[";
this->PrintExpr(op->args.back(), os);
os << "]";
}
} else {
os << ss.str();
}
} else if (op->op.same_as(builtin_call_extern_)) {
auto func = Downcast<StringImm>(op->args[0]);
// Enable atomics extension if used.
Expand Down Expand Up @@ -280,6 +480,13 @@ void CodeGenOpenCL::VisitExpr_(const FloatImmNode* op, std::ostream& os) { // N
}
}

void CodeGenOpenCL::SetTextureScope(
const std::unordered_map<const VarNode*, std::string>& scope) { // NOLINT(*)
for (auto& texture : scope) {
alloc_storage_scope_.insert(texture);
}
}

runtime::Module BuildOpenCL(IRModule mod, Target target) {
using tvm::runtime::Registry;
bool output_ssa = false;
Expand Down
Loading

0 comments on commit c6f62aa

Please sign in to comment.