[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065)
When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a
non-last position within a struct() assembly format directive, the
printed output is ambiguous: the comma-separated array elements are
indistinguishable from the struct-level commas separating key-value
pairs.
Fix this by wrapping such parameters in square brackets in both the
generated printer and parser. The printer emits '[' before and ']' after
the array value; the parser calls parseLSquare()/parseRSquare() around
the FieldParser call. Parameters with a custom printer or parser are
unaffected (the user controls the format in that case).
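The ambiguity described above can be illustrated with a small Python sketch. This is a toy model, not the TableGen-generated printer or parser: the helpers `print_struct` and `split_top_level` and the keys `dims` and `name` are hypothetical and exist only for this illustration.

```python
def print_struct(fields, bracket_arrays):
    """Print `key = value` pairs separated by commas, like a struct() directive."""
    parts = []
    for key, value in fields:
        if isinstance(value, list):
            text = ", ".join(str(v) for v in value)
            if bracket_arrays:
                # The fix: delimit the array so its commas are nested.
                text = "[" + text + "]"
        else:
            text = str(value)
        parts.append(f"{key} = {text}")
    return ", ".join(parts)


def split_top_level(text):
    """Split on commas that are not nested inside square brackets."""
    pieces, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            pieces.append(current.strip())
            current = ""
        else:
            current += ch
    pieces.append(current.strip())
    return pieces


# Hypothetical struct with an array parameter in a non-last position.
fields = [("dims", [1, 2, 3]), ("name", "foo")]

ambiguous = print_struct(fields, bracket_arrays=False)
bracketed = print_struct(fields, bracket_arrays=True)

print(ambiguous)                   # dims = 1, 2, 3, name = foo
print(split_top_level(ambiguous))  # four pieces: array commas look like struct commas
print(bracketed)                   # dims = [1, 2, 3], name = foo
print(split_top_level(bracketed))  # two pieces, one per key-value pair
```

Without brackets, a reader of the text cannot tell where the array ends and the next key begins; with brackets, the struct-level commas are the only ones at nesting depth zero.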
Fixes #156623
Assisted-by: Claude Code
[mlir][OpenMP] Add iterator support to depend clause
Extend the depend clause to support `!omp.iterated<Ty>` handles
alongside plain depend vars, so the IR can represent both forms.
libclc: Simplify fract implementation
fract is nan propagating, so it's unnatural to implement it
in terms of the nan avoiding fmin. Implement it with compare and
select, which is the least constrained way to implement the clamp.
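A minimal Python sketch of a compare-and-select clamp, for illustration only. It is not the libclc source: the names `ieee_floor`, `fract`, and `ONE_MINUS_ULP` are hypothetical, it works on doubles rather than the OpenCL types, and the comparison direction is an assumption chosen so that a nan intermediate falls through unchanged.

```python
import math

# Largest double strictly below 1.0, used as the clamp bound in this sketch.
ONE_MINUS_ULP = 1.0 - 2.0 ** -53


def ieee_floor(x):
    """floor that passes nan and +/-inf through, as IEEE floor does.

    Python's math.floor raises on those inputs, so they are handled here.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x))


def fract(x):
    """x - floor(x), clamped below 1.0 with compare and select only."""
    r = x - ieee_floor(x)  # nan for nan and +/-inf inputs
    # A comparison with nan is false, so a nan result is kept as is.
    r = ONE_MINUS_ULP if r >= ONE_MINUS_ULP else r
    # Infinite inputs map to 0.0.
    r = 0.0 if math.isinf(x) else r
    return r
```

The tiny negative input is the case the clamp exists for: `-1e-30 - floor(-1e-30)` rounds to exactly 1.0, which the select pulls back below 1.0.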
AMDGPU: Match fract from compare and select and minimum
Implementing this with any of the minnum variants is overconstraining
for the actual use. Existing patterns use fmin, then have to manually
clamp nan inputs to get nan propagating behavior. It's cleaner to express
this with a nan propagating operation to start with.
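The difference between the two kinds of minimum can be sketched in Python. This models the semantics only and is not the AMDGPU matching code: `minnum` stands in for the nan avoiding fmin, `minimum` for a nan propagating minimum, and all helper names plus the double-precision clamp bound `C` are assumptions made for the example.

```python
import math

C = 1.0 - 2.0 ** -53  # largest double below 1.0 (assumed clamp bound)


def ieee_floor(x):
    """floor that passes nan and +/-inf through unchanged."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x))


def minnum(a, b):
    """nan avoiding: if exactly one operand is nan, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def minimum(a, b):
    """nan propagating: any nan operand yields nan."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)


def fract_with_minnum(x):
    r = minnum(x - ieee_floor(x), C)
    r = x if math.isnan(x) else r    # manual clamp to restore nan propagation
    r = 0.0 if math.isinf(x) else r
    return r


def fract_with_minimum(x):
    r = minimum(x - ieee_floor(x), C)  # nan already propagates
    r = 0.0 if math.isinf(x) else r
    return r


def same(a, b):
    """Equality that treats two nans as equal."""
    return (math.isnan(a) and math.isnan(b)) or a == b


samples = [0.0, 1.25, -0.25, -1e-30, 1e300, math.inf, -math.inf, math.nan]
agree = all(same(fract_with_minnum(x), fract_with_minimum(x)) for x in samples)
print(agree)
```

In this sketch the nan propagating form needs one select fewer, because the nan input survives the clamp without a separate isnan check.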
AMDGPU: Match fract pattern with swapped edge case check
A fract implementation can equivalently be written as
    r = fmin(x - floor(x), 0x1.fffffep-1f);
    r = isnan(x) ? x : r;
    r = isinf(x) ? 0.0 : r;
or:
    r = fmin(x - floor(x), 0x1.fffffep-1f);
    r = isinf(x) ? 0.0 : r;
    r = isnan(x) ? x : r;
Previously this only matched the first form. Match
the case where the isinf check is the inner clamp. There are
a few more ways to write this pattern (e.g., move the clamp of
infinity to the input) but I haven't encountered that in the wild.
The existing code seems to be trying too hard to match noncanonical
variants of the pattern. This change only handles the result that all 4
permutations of compare and select produce out of instcombine.
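The equivalence of the two select orderings can be checked with a short Python sketch. It models the semantics on doubles and is not the AMDGPU pattern matcher: `fmin`, `ieee_floor`, the two `fract_*` functions, and the clamp bound `C` are hypothetical names and values chosen for this illustration.

```python
import math

C = 1.0 - 2.0 ** -53  # largest double below 1.0 (assumed clamp bound)


def ieee_floor(x):
    """floor that passes nan and +/-inf through unchanged."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x))


def fmin(a, b):
    """nan avoiding minimum, modelling C's fmin."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def fract_nan_inner(x):
    """First form: the isnan select is applied before the isinf select."""
    r = fmin(x - ieee_floor(x), C)
    r = x if math.isnan(x) else r
    r = 0.0 if math.isinf(x) else r
    return r


def fract_inf_inner(x):
    """Second form: the isinf select is the inner clamp."""
    r = fmin(x - ieee_floor(x), C)
    r = 0.0 if math.isinf(x) else r
    r = x if math.isnan(x) else r
    return r


def same(a, b):
    """Equality that treats two nans as equal."""
    return (math.isnan(a) and math.isnan(b)) or a == b


samples = [0.0, 0.5, 1.25, -0.25, -1e-30, 1e300, math.inf, -math.inf, math.nan]
orders_agree = all(same(fract_nan_inner(x), fract_inf_inner(x)) for x in samples)
print(orders_agree)
```

The two forms agree because no input is both nan and infinite, so at most one of the two selects ever replaces the value and their order does not matter.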
[XeVM] Use `ocloc` for binary generation. (#188331)
XeVM currently doesn't support native binary generation. This PR enables
Ahead of Time (AOT) compilation of a gpu module to a native binary using
`ocloc`.
Currently, this only works with LevelZeroRuntimeWrappers.
[HLSL][DirectX][SPIRV] Implement the `fma` API (#185304)
This PR adds the `fma` HLSL intrinsic (with support for matrices).
It follows all of the steps from #99117.
Closes #99117.
[msan] Disambiguate "Strict" vs. "Heuristic" when dumping instructions (#188873)
When -msan-dump-strict-instructions and
-msan-dump-heuristic-instructions are simultaneously enabled, it is
unclear from the output whether each instruction is strictly vs.
heuristically handled. [*] This patch fixes the issue by tagging the
output.
The actual instrumentation of the code is unaffected by this change.
[*] A workaround is to compile the code once with only
-msan-dump-strict-instructions, and a second time with
-msan-dump-heuristic-instructions, but this unnecessarily doubles the
compilation time.
[DA] Refactor signature of weakCrossingSIVtest and check inputs (NFCI) (#187117)
Pass SCEVAddRecExpr objects directly to weakCrossingSIVtest and
check the validity of the input operands.
[libc] Remove header templates from several C standard headers. (#188878)
Switches the following headers to hdrgen-produced ones by referencing
some macro from the C standard and the file containing the declarations
in the corresponding YAML files:
* limits.h (referenced _WIDTH / _MAX / _MIN families).
* locale.h (referenced LC_ family).
* time.h (referenced CLOCKS_PER_SEC).
* wchar.h (referenced WEOF).
[DTLTO] Improve performance of adding files to the link (#186366)
The in-process ThinLTO backend typically generates object files in
memory and adds them directly to the link, except when the ThinLTO cache
is in use. DTLTO is unusual in that it adds files to the link from disk
in all cases.
When the ThinLTO cache is not in use, ThinLTO adds files via an
`AddStreamFn` callback provided by the linker, which ultimately appends
to a `SmallVector` in LLD. When the cache is in use, the linker supplies
an `AddBufferFn` callback that adds files more efficiently (by moving
`MemoryBuffer` ownership).
This patch adds a mandatory `AddBufferFn` to the DTLTO ThinLTO backend.
The backend uses this to add files to the link more efficiently.
Additionally:
- Move AddStream from CGThinBackend to InProcessThinBackend, for reader
clarity.
- Modify linker comments that implied the AddBuffer path is
[12 lines not shown]
[RISCV][NFC] Use enum types to improve debuggability (#188418)
So that we can see the enum values instead of integral values when
dumping in debuggers.