[RFC][CodeGen] Add generic target feature checks for intrinsics
This PR adds target-independent infrastructure for annotating LLVM intrinsics
with required subtarget feature expressions.
It introduces a TargetFeatures string field to intrinsic TableGen records.
TableGen emits an intrinsic-to-feature mapping table.
Both SelectionDAG and GlobalISel now perform this check before lowering target
intrinsics. This allows targets to opt in by annotating intrinsic definitions
directly, rather than adding custom checks during lowering, legalization, or
instruction selection.
This PR uses one AMDGPU intrinsic as an example.
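The lookup described above can be pictured with a small standalone sketch. This is not the PR's code: the intrinsic IDs, the feature names, the table layout, and the helper hasRequiredFeatures are all hypothetical, and the sketch assumes the feature expression is a plain comma-separated list in which every feature must be present. The actual expression syntax may well be richer.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical intrinsic IDs; in LLVM these would be generated by TableGen.
enum class IntrinsicID { example_plain, example_needs_features };

// Hypothetical stand-in for the TableGen-emitted intrinsic-to-feature table.
// Assumption: each entry is a comma-separated list of features that must all
// be enabled on the subtarget.
inline const std::map<IntrinsicID, std::string> &featureTable() {
  static const std::map<IntrinsicID, std::string> Table = {
      {IntrinsicID::example_needs_features, "feature-a,feature-b"}};
  return Table;
}

// Returns true if the subtarget's feature set satisfies the requirement
// recorded for the intrinsic. Intrinsics without a table entry are treated
// as unannotated and always pass, which models the opt-in behaviour.
inline bool hasRequiredFeatures(IntrinsicID ID,
                                const std::set<std::string> &SubtargetFeatures) {
  auto It = featureTable().find(ID);
  if (It == featureTable().end())
    return true;
  std::istringstream Stream(It->second);
  std::string Feature;
  while (std::getline(Stream, Feature, ','))
    if (SubtargetFeatures.count(Feature) == 0)
      return false;
  return true;
}
```

In this sketch a selector would call hasRequiredFeatures once before lowering and report an error on failure, instead of each target repeating the check in its own lowering code.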
[AMDGPU][NFC] Templatise and roundtrip gfx12_asm_vop3_dpp16.s
This effectively applies the changes between the non-template versions
of gfx11/12_asm_vop3_dpp16.s on top of the templatised
gfx11_asm_vop3_dpp16.s.
[ARM][LLD] Rewrite thunk tests to make output smaller [NFC] (#202551)
Some thunk tests can leave large relocatable objects and executables
around. In some cases it is possible to alter the linker script to make
the output smaller. These changes have been separated out as they have
more substantial changes to CHECK lines.
Related to #202261
[AMDGPU][NFC] Generate opt checks with script (#203926)
Update testcase so that opt checks are generated with
update_test_checks.py.
Signed-off-by: John Lu <John.Lu at amd.com>
[libomp] Add kmp_vector (ADT 2/2)
See rationale in the commit adding kmp_str_ref.
This commit introduces kmp_vector, a class intended primarily for small
vectors. It currently only includes methods I need at the moment, but
it's easily extensible.
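For readers unfamiliar with small-vector containers, a minimal sketch of the general idea follows. It is not the kmp_vector implementation: the class name small_vector_sketch, its methods, and the choice of inline storage with a heap fallback are assumptions made for illustration, and the sketch is restricted to trivially copyable element types to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical small vector: the first InlineCapacity elements live inside
// the object, and the heap is only used once that space is exhausted.
template <typename T, std::size_t InlineCapacity>
class small_vector_sketch {
  static_assert(InlineCapacity > 0, "need at least one inline slot");
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch only handles trivially copyable element types");

  T Inline[InlineCapacity];
  T *Data = Inline;
  std::size_t Size = 0;
  std::size_t Capacity = InlineCapacity;

  // Doubles the capacity, moving the elements to a fresh heap buffer.
  void grow() {
    std::size_t NewCapacity = Capacity * 2;
    T *NewData = static_cast<T *>(std::malloc(NewCapacity * sizeof(T)));
    if (!NewData)
      std::abort();
    std::memcpy(NewData, Data, Size * sizeof(T));
    if (Data != Inline)
      std::free(Data);
    Data = NewData;
    Capacity = NewCapacity;
  }

public:
  small_vector_sketch() = default;
  small_vector_sketch(const small_vector_sketch &) = delete;
  small_vector_sketch &operator=(const small_vector_sketch &) = delete;
  ~small_vector_sketch() {
    if (Data != Inline)
      std::free(Data);
  }

  void push_back(const T &Value) {
    T Copy = Value; // Value may alias an element that grow() relocates.
    if (Size == Capacity)
      grow();
    Data[Size++] = Copy;
  }

  T &operator[](std::size_t Index) {
    assert(Index < Size && "index out of range");
    return Data[Index];
  }

  std::size_t size() const { return Size; }
  bool is_inline() const { return Data == Inline; }
};
```

The is_inline accessor exists only so the sketch can show when the heap fallback kicks in. Copying is disabled to keep the example short.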
AMDGPU: Reland: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3
For V_DOT2_F32_F16 and V_DOT2_F32_BF16, add their VOPDName and mark
them with usesCustomInserter, which is used to add pre-RA hints so that
dst and src2 are preferably assigned to the same physical register.
When the hint is satisfied, canMapVOP3PToVOPD recognises the
instruction as eligible for VOPD pairing by checking that it is
VOP2-like: dst==src2, no source modifiers, no clamp, and src1 is a
register.
Mark both instructions as commutable to allow a literal in src1 to be
moved to src0, since VOPD only permits a literal in src0.
The original patch had a bug: it did not check whether physical source
registers match the register class of the corresponding operand in the
full VOPD instruction. That check is now done via isValidVOPDSrc.
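The VOP2-like condition listed above (dst equals src2, no source modifiers, no clamp, src1 is a register) can be written as a simple predicate. The sketch below is standalone and does not use LLVM's MachineInstr API. The struct Dot2Operands and the function isVOP2Like are hypothetical names, and registers are modelled as plain integers.

```cpp
#include <cassert>

// Hypothetical flattened view of the operands that matter for the check.
struct Dot2Operands {
  unsigned DstReg;      // physical register assigned to dst
  unsigned Src2Reg;     // physical register assigned to src2 (accumulator)
  bool Src1IsReg;       // src1 is a register rather than a literal
  bool HasSrcModifiers; // any neg/abs-style source modifier is set
  bool HasClamp;        // clamp bit is set
};

// True when the VOP3 form behaves like a VOP2 accumulate, which is the
// condition the commit message gives for VOPD pairing eligibility.
inline bool isVOP2Like(const Dot2Operands &Ops) {
  return Ops.DstReg == Ops.Src2Reg && !Ops.HasSrcModifiers && !Ops.HasClamp &&
         Ops.Src1IsReg;
}
```

If the register allocator does not honour the dst/src2 hint, DstReg and Src2Reg differ and the predicate rejects the instruction. The separate register-class validation done by isValidVOPDSrc is not modelled here.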
[AArch64][LLD] Update tests to reduce file size [NFC] (#202547)
Remove large object and executable files after running the test. Some tests
need to run within a single OutputSection and cannot use a Linker Script
to increase distance without a large object and corresponding executable
file.
Fixes AArch64 part of #202261
[ARM][LLD] Remove large files at end of test [NFC] (#202548)
Some range extension and erratum fix thunks can't easily use a linker
script to make gaps that don't result in a large output. Explicitly
remove the large object and linker output files to reduce storage usage.
Related to #202261
[ARM][LLD] Reduce thunk test case size, linkerscript changes [NFC] (#202549)
These changes either refactor the test to use split-file and then
delete the outputs, as the size saving is not large, or adapt the
linker script to reduce the size by introducing sparse program segments.
All these cases are fairly simple changes, and have made minimal changes
to the CHECK lines.
Related to #202261
[SPIR-V] Take ArrayRef instead of owning containers in selection helpers (NFC) (#203908)
Avoid per-call heap allocations where call sites pass braced-list
temporaries.
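The motivation can be illustrated without LLVM headers. The sketch below uses a hypothetical ArrayViewSketch class as a stand-in for llvm::ArrayRef, and sumOperands as a made-up helper. It shows that a non-owning view parameter accepts a braced list directly, whereas an owning parameter such as std::vector would first have to copy the elements into heap storage.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical non-owning view over contiguous elements. Like any view, it
// must not outlive the storage it points at; binding it to a braced list is
// only safe for the duration of the call that receives it. Some compilers
// may warn about storing the initializer_list's begin pointer for that
// reason.
template <typename T>
class ArrayViewSketch {
  const T *Data;
  std::size_t Length;

public:
  ArrayViewSketch(std::initializer_list<T> List)
      : Data(List.begin()), Length(List.size()) {}
  ArrayViewSketch(const std::vector<T> &Vec)
      : Data(Vec.data()), Length(Vec.size()) {}

  const T *begin() const { return Data; }
  const T *end() const { return Data + Length; }
  std::size_t size() const { return Length; }
};

// Helper taking a view: a call such as sumOperands({1, 2, 3}) reads the
// elements from the temporary array backing the braced list, with no heap
// allocation made by the callee's parameter.
inline int sumOperands(ArrayViewSketch<int> Operands) {
  int Sum = 0;
  for (int Value : Operands)
    Sum += Value;
  return Sum;
}
```

Existing callers that already hold a std::vector keep working through the converting constructor, so switching a parameter type this way does not force call-site changes.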
AMDGPU: Validate VOPD/VOPD3 physical source registers against operand RC
Replace isVGPR checks with isValidVOPDSrc that validates physical source
registers against the actual combined VOPD/VOPD3 instruction's operand
register classes. Now we also validate operands for VOPD instructions.
[lldb] Reformat doxygen comments in lldb-enumerations.h (NFC) (#203079)
Convert doxygen comments to precede the enumerator to which they apply
(using `///`). This placement of documentation is more consistent with
how functions and classes are documented. Additionally, with the column
limit, the documentation was quite cramped as it was. Lastly, comments
have been reflowed so that they make full use of horizontal space.
Assisted-by: claude
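As an illustration of the comment placement described above, here is a minimal sketch. The enumeration and its enumerators are hypothetical and are not taken from lldb-enumerations.h.

```cpp
#include <cassert>

/// Hypothetical enumeration used only to show the comment placement.
enum ExampleStateSketch {
  /// Documentation placed on the line before the enumerator it describes.
  eExampleStateIdle = 0,
  /// A longer description can use the full line width instead of being
  /// squeezed into a trailing comment to the right of the enumerator.
  eExampleStateRunning
};
```

Doxygen attaches a `///` comment to the declaration that follows it, so the generated documentation is the same as with a trailing `///<` comment. Only the source layout changes.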