[SystemZ] Simplify f128 atomic load/store (#90977)
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128
to allow for simplified use in atomic load/store.
Update logic to split 128-bit loads and stores in DAGCombine to also
handle the f128 case where appropriate. This fixes the regressions
introduced by recent atomic load/store patches.
[DAG] Fold bitreverse(shl/srl(bitreverse(x),y)) -> srl/shl(x,y) (#89897)
Noticed while investigating GFNI per-element vector shifts (we can form SHL but not SRL/SRA)
Alive2: https://alive2.llvm.org/ce/z/fSH-rf
[LoongArch] Optimize *W Instructions at MI level (#90463)
Referring to RISC-V, adding an MI level pass to optimize *W instructions
for LoongArch.
First it removes unneeded sext(addi.w rd, rs, 0) instructions. Either
because the sign extended bits aren't consumed or because the input was
already sign extended by an earlier instruction.
Then:
1. Unless explicit disabled or the target prefers instructions with W
suffix, it removes the -w suffix from opw instructions whenever all
users are dependent only on the lower word of the result of the
instruction. The cases handled are:
* addi.w because it helps reduce test differences between LA32 and LA64
w/o being a pessimization.
2. Or if explicit enabled or the target prefers instructions with W
suffix, it adds the W suffix to the instruction whenever all users are
[4 lines not shown]
[AMDGPU] don't mark control-flow intrinsics as convergent (#90026)
This is really a workaround to allow control flow lowering in the
presence of convergence control tokens. Control-flow intrinsics in LLVM
IR are convergent because they indirectly represent the wave CFG, i.e.,
sets of threads that are "converged" or "execute in lock-step". But they
exist during a small window in the lowering process, inserted after the
structurizer and then translated to equivalent MIR pseudos. So rather
than create convergence tokens for these builtins, we simply mark them
as not convergent.
The corresponding MIR pseudos are marked as having side effects, which
is sufficient to prevent optimizations without having to mark them as
convergent.
[lldb] Add SBType::GetByteAlign (#90960)
lldb already mostly(*) tracks this information. This just makes it
available to the SB users.
(*) It does not do that for typedefs right now see llvm.org/pr90958
Reapply "SystemZ: Fold copy of vector immediate to gr128" (#91099)
This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b.
Modify the instruction in place to transform it into a REG_SEQUENCE,
which is what other implementations of foldImmediate do. Also start
erasing the def instruction if there are no other uses.
Fixes #91110.
Revert "Remove redundant move in return statement" (#91169)
Reverts llvm/llvm-project#90546
This broke some bots, seems like some toolchain don’t consider the
implicit move here.
[clang] Enable FPContract with optnone (#91061)
Previously treatment of the attribute `optnone` was modified in
https://github.com/llvm/llvm-project/pull/85605 ([clang] Set correct
FPOptions if attribute 'optnone' presents). As a side effect FPContract
was disabled for optnone. It created unneeded divergence with the
behavior of -O0, which enables this optimization.
In the discussion
https://github.com/llvm/llvm-project/pull/85605#issuecomment-2089350379
it was pointed out that FP contraction should be enabled even if all
optimizations are turned off, otherwise results of calculations would be
different. This change enables FPContract at optnone.
Reapply "AMDGPU: Implement llvm.set.rounding (#88587)" series (#91113)
Revert "Revert 4 last AMDGPU commits to unbreak Windows bots"
This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f.
MSVC does not like constexpr on the definition after an extern
declaration of a global.
Remove redundant move in return statement (#90546)
This pull request removes unnecessary move in the return statement to
suppress compilation warnings.
Co-authored-by: Xiaolei Shi <xiaoleis at nvidia.com>
[RISCV] Use virtual registers for AVL instrs in coalesce-vsetvli.mir. NFC
All GPR registers will still be virtual at this stage, so update the test
to reflect that.
[clang][dataflow] Fix crash when `operator=` result type is not destination type. (#90898)
The existing code was full of comments about how we assume this is
always the
case, but it's not mandated by the standard, and there is code out there
that
returns a different type. So check that the result type is in fact the
same as
the destination type before attempting to copy to the result.
To make sure that we don't bail out in more cases than intended, I've
extended
existing tests to verify that in the common case, we do return the
destination
object (by reference or value, as the case may be).
[RISCV] Teach .option arch to support experimental extensions. (#89727)
Previously `.option arch` denied extenions are not belongs to RISC-V
features. But experimental features have experimental- prefix, so
`.option arch` can not serve for experimental extension.
This patch uses the features of extensions to identify extension
existance.
Reland "[Modules] No transitive source location change (#86912)"
This relands 6c31104.
The patch was reverted due to incorrectly introduced alignment. And the
patch was re-commited after fixing the alignment issue.
Following off are the original message:
This is part of "no transitive change" patch series, "no transitive
source location change". I talked this with @Bigcheese in the tokyo's
WG21 meeting.
The idea comes from @jyknight posted on LLVM discourse. That for:
```
// A.cppm
export module A;
...
[246 lines not shown]
[clang-format] Don't allow comma in front of structural enum (#91056)
Assume that a comma in front of `enum` means it is actually a part of an
elaborated type in a template parameter list.
Fixes https://github.com/llvm/llvm-project/issues/47782
[ADT] Reimplement operator==(StringRef, StringRef) (NFC) (#91139)
I'm planning to deprecate and eventually remove StringRef::equals in
favor of operator==. This patch reimplements operator== without using
StringRef::equals.
I'm not sure if there is a good way to make StringRef::compareMemory
available to operator==, which is not a member function. "friend"
works to some extent but breaks corner cases, which is why I've chosen
to "inline" compareMemory.
fix formatting issues with ODS docs around assembly format directives (#91149)
- Some sentences are incorrectly split across list items.
- Some pre-formatted syntax is left in plaintext
- Some lines end in spaces
Co-authored-by: Jeremy Kun <j2kun at users.noreply.github.com>
[clang backend] In AArch64's DataLayout, specify a minimum function alignment of 4. (#90702)
This addresses an issue where the explicit alignment of 2 (for C++ ABI
reasons) was being propagated to the back end and causing under-aligned
functions (in special sections).
This is an alternate approach suggested by @efriedma-quic in PR #90415.
Fixes #90358
[AArch64][SelectionDAG] Lower multiplication by a constant to shl+sub+shl+sub (#90199)
Change the costmodel to lower a = b * C where C = 1 - (1 - 2^m) * 2^n to
sub w8, w0, w0, lsl #m
sub w0, w0, w8, lsl #n
Fix https://github.com/llvm/llvm-project/issues/89430