[compiler-rt] Initial support for building profile library on the GPU (#185552)
Summary:
As suggested in https://github.com/llvm/llvm-project/pull/177665, we
should build a GPU version of the compiler-rt profile library instead of
writing it in-line in the lowering. This PR does not define anything GPU
specific; it simply re-uses the baremetal handling. Later PRs will
add the GPU specific handling we would want to do to optimize
counter handling on the GPU.
Note that this will require using the cache file, or setting these
options manually for existing users. Hopefully if people are using the
cache file as they should it won't break anything.
Add sancov support for large AArch64 binaries. (#185374)
In AArch64 calls have a +/-128MB range
(https://developer.arm.com/documentation/ddi0602/2025-12/Base-Instructions/BL--Branch-with-link-).
In cases where the .text is larger than that, the linker adds functions
that just jump to the sanitizer functions and places them at a code
location where the rest of the binary can call them. These functions have
the prefix __AArch64ADRPThunk__.
This commit marks calls to these functions as coverage points.
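
For illustration, the +/-128MB limit follows from the BL encoding: a signed
26-bit immediate that counts 4-byte instructions. The sketch below is not
part of this commit; the helper names bl_reaches and needs_thunk are
hypothetical and only model when a direct call cannot reach its target.

```python
# Illustrative model of the AArch64 BL branch range (not code from this commit).
# BL encodes a signed 26-bit immediate that counts 4-byte instructions.
BL_IMM_BITS = 26
INSN_BYTES = 4

BL_MIN_OFFSET = -(1 << (BL_IMM_BITS - 1)) * INSN_BYTES        # -128 MiB
BL_MAX_OFFSET = ((1 << (BL_IMM_BITS - 1)) - 1) * INSN_BYTES   # 128 MiB - 4


def bl_reaches(call_site: int, target: int) -> bool:
    """Return True if a direct BL at call_site can encode target."""
    offset = target - call_site
    aligned = offset % INSN_BYTES == 0
    return aligned and BL_MIN_OFFSET <= offset <= BL_MAX_OFFSET


def needs_thunk(call_site: int, target: int) -> bool:
    """Hypothetical helper: True when the call must go through a thunk."""
    return not bl_reaches(call_site, target)
```

Under this model a call whose target lies exactly 128MB ahead is already out
of range, which is the situation in which the linker-generated thunks appear.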
[AArch64][GlobalISel] Add G_SQDMULL node
Previously, GISel was failing to lower the sqdmulls.scalar intrinsic. This is just a variation of sqdmull, but on two 32-bit S registers.
To fix this, create a G_SQDMULL node, and lower sqdmulls.scalar to that. This node is linked to the SD patterns for sqdmull, which allows this version of the intrinsic to be lowered.
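
For reference, a minimal sketch of the arithmetic that the scalar form
computes, assuming the usual SQDMULL semantics (signed saturating doubling
multiply long) with 32-bit inputs and a 64-bit result. The function name
sqdmull_s32 is hypothetical, and the model ignores the saturation flag that
the hardware sets.

```python
# Illustrative model of scalar SQDMULL on 32-bit inputs (not code from this commit).
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def sqdmull_s32(a: int, b: int) -> int:
    """Return 2 * a * b saturated to the signed 64-bit range."""
    if not (INT32_MIN <= a <= INT32_MAX and INT32_MIN <= b <= INT32_MAX):
        raise ValueError("operands must fit in a signed 32-bit integer")
    doubled = 2 * a * b
    # Clamp to the destination width; Python integers never overflow on their own.
    return max(INT64_MIN, min(INT64_MAX, doubled))
```

In this model the only input pair that saturates is INT32_MIN times
INT32_MIN, because doubling that product gives 2^63, one past the largest
signed 64-bit value.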
[AArch64][PAC] Don't skip global legalization for AUTH_TCRETURN (#182513)
Commit 77bcab835aca1 folds the llvm.ptrauth.resign intrinsic when the
intrinsic's discriminator and key match those in the call's ptrauth bundle.
However, an assertion now fires in AArch64AsmPrinter when PAC is enabled and
we're tail calling a global, because AUTH_TCRETURN expects the address to be
stored in a register.