8-bit character support in Clang and LLVM on architectures where the smallest addressable unit is 64 bits
8-bit characters in VIAMPP
- Track: LLVM devroom
- Room: D.llvm
- Day: Saturday
- Start: 15:00
- End: 15:35
Clang and LLVM have a long history of supporting a wide variety of CPUs, from 8- to 64-bit, but they all assume that the smallest addressable unit is an 8-bit byte. Although the sizes and alignments of many types can be defined in the “target datalayout” string, the “character” and “short” types are hard-coded into Clang and LLVM. Compiling with Clang, for example, yields:
@.str = private unnamed_addr constant [6 x i8] c"Hallo\00", align 8
Some proposals already offer solutions to this problem (e.g. FOSDEM 2012: “Adding 16-bit Character Support in LLVM”, or https://lists.llvm.org/pipermail/llvm-dev/2019-May/132080.html: “On removing magic numbers assuming 8-bit bytes”). Following these ideas, however, requires changes to more than 120 files (Clang and LLVM v12.0.0), which makes maintaining such a patch set nearly impossible.
Looking for a simpler solution, we explored a couple of alternatives. Two design goals had to be satisfied:
- don’t change CHAR_BIT
- keep CharWidth at 8 bits
The only change allowed is raising the alignment of the character type to 64 bits. By modifying only 8 files (some of which deal only with character-related assertions), we obtain the desired result:
@.str = private unnamed_addr constant [6 x i64] [i64 72, i64 97, i64 108, i64 108, i64 111, i64 0], align 8
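To make the result concrete, the following Python sketch reproduces the element list of the modified constant from the string literal. It is an illustration only and not part of the Clang patch: the helper names `char_codes` and `ir_initializer` are hypothetical, and the sketch assumes Latin-1 source characters so that every value fits in 8 bits. Each character keeps its 8-bit value (CHAR_BIT and CharWidth are unchanged) but occupies its own 64-bit element.

```python
def char_codes(text: str) -> list[int]:
    """Character codes of a C string literal, including the terminating NUL."""
    # Assumption: Latin-1 input, so every code fits in 8 bits (CHAR_BIT stays 8).
    data = text.encode("latin-1") + b"\x00"
    return list(data)


def ir_initializer(codes: list[int], unit_bits: int = 64) -> str:
    """Render codes as an LLVM IR array initializer, one iN element per character."""
    elems = ", ".join(f"i{unit_bits} {c}" for c in codes)
    return f"[{len(codes)} x i{unit_bits}] [{elems}]"


print(ir_initializer(char_codes("Hallo")))
# [6 x i64] [i64 72, i64 97, i64 108, i64 108, i64 111, i64 0]
```

The printed text matches the type and initializer of the constant shown above. Note that this mimics only the element-list form; for `i8` arrays LLVM normally prints the compact `c"..."` string form instead.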
This solution can also easily be adapted to machines whose smallest addressable unit is 16 or 32 bits, and “WChar” can be handled with minimal changes as well.
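A rough sketch of the storage cost for different unit sizes, assuming (as in the 64-bit example) that each 8-bit character is placed in exactly one whole addressable unit. The function `storage_bits` is hypothetical and only models this layout; it does not reflect any Clang or LLVM API.

```python
def storage_bits(text: str, unit_bits: int) -> int:
    """Bits occupied by a NUL-terminated string when each 8-bit char fills one unit."""
    if unit_bits < 8 or unit_bits % 8:
        raise ValueError("unit size must be a positive multiple of 8 bits")
    # Assumption: Latin-1 input, one 8-bit value per character, plus the NUL.
    n_chars = len(text.encode("latin-1")) + 1
    return n_chars * unit_bits


for unit in (8, 16, 32, 64):
    print(unit, storage_bits("Hallo", unit))
# 8 48
# 16 96
# 32 192
# 64 384
```

The trade-off is simple: character values never exceed 8 bits, so on a machine with 64-bit units, 56 of every 64 bits in a string are padding. In return, each character stays directly addressable.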
As this solution is still being tested, the number of changed files may shrink further, which should allow for a small and simple patch set.
Speakers
Thomas Pietsch