[CIR][CIRGen] Support wide string literals (llvm#399)
This commit supports the codegen of wide string literals, including
`wchar_t` string literals, `char16_t` string literals, and `char32_t`
string literals.

I'm not following the proposal in llvm#374. The clang frontend doesn't
record the literal text of wide string literals; it only records their
encoded code units. So I believe that a dedicated string attribute with
an encoding tag, as described in llvm#374, may not be as helpful as I
thought.
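
To illustrate what "encoded code units" means at the language level, here is a minimal sketch. It only inspects literals in standard C++ and does not touch the clang AST; `codeUnits` is a hypothetical helper introduced for this example, and the literals use universal character names so the result does not depend on the source file encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of clang): copy the code units of a string
// literal, excluding the null terminator, into 32-bit integers.
template <typename CharT, std::size_t N>
std::vector<std::uint32_t> codeUnits(const CharT (&literal)[N]) {
  std::vector<std::uint32_t> units;
  units.reserve(N - 1);
  for (std::size_t i = 0; i + 1 < N; ++i)
    units.push_back(static_cast<std::uint32_t>(literal[i]));
  return units;
}

// The same character yields different code units depending on the literal
// kind: U+1F600 is one UTF-32 code unit but a surrogate pair in UTF-16.
inline std::vector<std::uint32_t> emojiUtf16() { return codeUnits(u"\U0001F600"); }
inline std::vector<std::uint32_t> emojiUtf32() { return codeUnits(U"\U0001F600"); }
```

Once only these integers are available, the original spelling of the literal cannot be recovered without also knowing the encoding, which is consistent with the reasoning in the commit message.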
Lancern authored and lanza committed Apr 17, 2024
1 parent e4ef651 commit b254b47
Showing 2 changed files with 57 additions and 2 deletions.
33 changes: 31 additions & 2 deletions clang/lib/CIR/CodeGen/CIRGenModule.cpp
@@ -1103,8 +1103,37 @@ CIRGenModule::getConstantArrayFromStringLiteral(const StringLiteral *E) {
     return builder.getString(Str, eltTy, finalSize);
   }
 
-  assert(0 && "not implemented");
-  return {};
+  auto arrayTy =
+      getTypes().ConvertType(E->getType()).dyn_cast<mlir::cir::ArrayType>();
+  assert(arrayTy && "string literals must be emitted as an array type");
+
+  auto arrayEltTy = arrayTy.getEltType().dyn_cast<mlir::cir::IntType>();
+  assert(arrayEltTy &&
+         "string literal elements must be emitted as integral type");
+
+  auto arraySize = arrayTy.getSize();
+  auto literalSize = E->getLength();
+
+  // Collect the code units.
+  SmallVector<uint32_t, 32> elementValues;
+  elementValues.reserve(arraySize);
+  for (unsigned i = 0; i < literalSize; ++i)
+    elementValues.push_back(E->getCodeUnit(i));
+  elementValues.resize(arraySize);
+
+  // If the string is full of null bytes, emit a #cir.zero instead.
+  if (std::all_of(elementValues.begin(), elementValues.end(),
+                  [](uint32_t x) { return x == 0; }))
+    return builder.getZeroAttr(arrayTy);
+
+  // Otherwise emit a constant array holding the characters.
+  SmallVector<mlir::Attribute, 32> elements;
+  elements.reserve(arraySize);
+  for (uint64_t i = 0; i < arraySize; ++i)
+    elements.push_back(mlir::cir::IntAttr::get(arrayEltTy, elementValues[i]));
+
+  auto elementsAttr = mlir::ArrayAttr::get(builder.getContext(), elements);
+  return builder.getConstArray(elementsAttr, arrayTy);
 }
 
 // TODO(cir): this could be a common AST helper for both CIR and LLVM codegen.
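
The control flow of this hunk (collect code units, zero-pad to the array size, then choose between a zero initializer and an explicit array) can be sketched without any MLIR dependency. This is only an illustration under stated assumptions: `EmittedArray` and `emitCodeUnits` are hypothetical names that stand in for the attribute returned by the real code, and `std::vector` replaces `SmallVector`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the attribute the real code returns.
struct EmittedArray {
  bool isZero;                        // true models #cir.zero
  std::vector<std::uint32_t> values;  // filled only when isZero is false
};

// Hypothetical model of the hunk: `codeUnits` plays the role of the values
// read through getCodeUnit(i), `arraySize` the size of the array type.
EmittedArray emitCodeUnits(const std::vector<std::uint32_t> &codeUnits,
                           std::size_t arraySize) {
  std::vector<std::uint32_t> values(codeUnits);
  // resize() zero-fills the tail, which is where the null terminator
  // comes from when arraySize is the literal length plus one.
  values.resize(arraySize);

  // All-zero contents collapse to a single zero initializer.
  if (std::all_of(values.begin(), values.end(),
                  [](std::uint32_t x) { return x == 0; }))
    return {true, {}};

  // Otherwise every element is kept explicitly.
  return {false, std::move(values)};
}
```

With the inputs used in the new test file, `emitCodeUnits({20320, 22909, 19990, 30028}, 5)` models the five-element constant array, while `emitCodeUnits({0, 0, 0, 0}, 5)` models the `#cir.zero` case.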
26 changes: 26 additions & 0 deletions clang/test/CIR/CodeGen/wide-string.cpp
@@ -0,0 +1,26 @@
+// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-linux-gnu -fclangir-enable -emit-cir %s -o %t.cir
+// RUN: FileCheck --input-file=%t.cir %s
+
+const char16_t *test_utf16() {
+  return u"你好世界";
+}
+
+// CHECK: cir.global "private" constant internal @{{.+}} = #cir.const_array<[#cir.int<20320> : !u16i, #cir.int<22909> : !u16i, #cir.int<19990> : !u16i, #cir.int<30028> : !u16i, #cir.int<0> : !u16i]> : !cir.array<!u16i x 5>
+
+const char32_t *test_utf32() {
+  return U"你好世界";
+}
+
+// CHECK: cir.global "private" constant internal @{{.+}} = #cir.const_array<[#cir.int<20320> : !u32i, #cir.int<22909> : !u32i, #cir.int<19990> : !u32i, #cir.int<30028> : !u32i, #cir.int<0> : !u32i]> : !cir.array<!u32i x 5>
+
+const char16_t *test_zero16() {
+  return u"\0\0\0\0";
+}
+
+// CHECK: cir.global "private" constant internal @{{.+}} = #cir.zero : !cir.array<!u16i x 5>
+
+const char32_t *test_zero32() {
+  return U"\0\0\0\0";
+}
+
+// CHECK: cir.global "private" constant internal @{{.+}} = #cir.zero : !cir.array<!u32i x 5>
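
Every CHECK line above expects an array of 5 elements even though each literal spells out only 4 code units, because the array type of a string literal includes the implicit null terminator. A small sketch in standard C++ makes this visible at compile time; `arrayLength` is a hypothetical helper introduced for this example.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in an array, terminator included.
template <typename T, std::size_t N>
constexpr std::size_t arrayLength(const T (&)[N]) {
  return N;
}

// Four explicit code units plus the implicit terminator.
static_assert(arrayLength(u"\0\0\0\0") == 5, "char16_t literal has 5 elements");
static_assert(arrayLength(U"\0\0\0\0") == 5, "char32_t literal has 5 elements");
static_assert(arrayLength(u"\u4F60\u597D\u4E16\u754C") == 5,
              "four BMP characters plus the terminator");
static_assert(sizeof(U"\0\0\0\0") == 5 * sizeof(char32_t),
              "storage size follows the element count");
```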
