[NFC][VPlan] Add initial tests for future VPlan-based stride MV
I tried to include both the features that the current
LoopAccessAnalysis-based transformation supports (e.g., trunc/sext of
stride) and cases where the current implementation behaves poorly
(e.g., https://godbolt.org/z/h31c3zKxK), as well as some other potentially
interesting scenarios I could imagine.
There are two test files with the same content. One covers the VPlan dump
change of the future transformation alone (I'll update `-vplan-print-after`
in the next PR); the other covers the full vectorizer pipeline. The latter
has two `RUN:` lines:
* No multiversioning, so the next PR diff can show the transformation itself
* Stride multiversioning performed in LAA, so that we can compare the future
VPlan-based transformation vs the old behavior.
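For readers unfamiliar with the term, stride multiversioning can be sketched at source level as below. This is an illustration only, written in Python rather than LLVM IR; the function name `scale_strided` and its parameters are hypothetical and are not taken from the tests in this patch. The loop is guarded by a runtime check `stride == 1`: the versioned copy may assume contiguous accesses, while the fallback keeps the symbolic stride.

```python
def scale_strided(a, n, stride, factor):
    """Multiply n elements of `a`, spaced `stride` apart, by `factor` in place.

    Hypothetical helper: a source-level picture of versioning a loop on
    the runtime condition `stride == 1`.
    """
    if stride == 1:
        # Versioned loop: the stride is known to be 1, so the accesses are
        # contiguous and can be handled as one bulk (vector-like) operation.
        a[:n] = [x * factor for x in a[:n]]
    else:
        # Fallback loop: the stride stays symbolic, one element at a time.
        for i in range(n):
            a[i * stride] *= factor
    return a
```

Both paths must compute the same result whenever `stride == 1`; the check only selects the version for which the stronger assumption holds.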
icp: add SHA512 implementation using Intel SHA512 extensions
Generated from crypto/sha/asm/sha512-x86_64.pl in
openssl/openssl at 241d4826f8.
Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Attila Fülöp <attila at fueloep.org>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18233
simd: detect and surface support for Intel SHA512 extensions
Recent Intel CPUs (starting with Arrow Lake and Lunar Lake) include new
vectorised SHA512 instructions. Detect them and make them available to
the rest of the system.
Note the internal name "sha512ext". This is to disambiguate from other
uses of "sha512".
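As a rough illustration of what detection involves: to my knowledge, the Intel SDM enumerates the SHA512 instructions in CPUID leaf 7, subleaf 1, EAX bit 0. The sketch below only decodes that bit from a raw register value. The helper name `has_sha512ext` and the sample inputs are hypothetical, and this is not the code added by this commit.

```python
# Assumption: SHA512 is reported in CPUID.(EAX=07H, ECX=1):EAX[bit 0].
SHA512_BIT = 1 << 0


def has_sha512ext(leaf7_sub1_eax: int) -> bool:
    """Return True if the given EAX value (leaf 7, subleaf 1) has the SHA512 bit set."""
    return bool(leaf7_sub1_eax & SHA512_BIT)
```

Real detection code would also need to confirm that leaf 7 subleaf 1 is supported before trusting the register value.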
Sponsored-by: TrueNAS
Reviewed-by: Tony Hutter <hutter2 at llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1 at llnl.gov>
Reviewed-by: Attila Fülöp <attila at fueloep.org>
Signed-off-by: Rob Norris <rob.norris at truenas.com>
Closes #18233
Revert "[lldb] Batch breakpoint step-over for threads stopped at the … (#183378)
…same site (re-land) (#182944)"
This reverts commit 94d9f1b3cbb02700d9cd3339c1dbf44c0d13b550.
www/caddy: Remove NTLM plugin as it causes issues with service control that can no longer be worked around.
The NTLM plugin and Caddy core have diverged too much, and the plugin is considered unmaintained. While there, clean up all service control workarounds that were implemented.
Removing the service control script (caddy_control.py) would make it hard to funnel Caddyfile validation in, so that validation has been removed too.
Our input is heavily validated, so the Caddyfile will be valid in almost all cases, and in cases where it is not, the log will show the error.
[mlir][xegpu] Add vector layout conflict handling in XeGPU layout propagation pass. (#182402)
This PR adds support for layout conflict handling for vector operands. A
conflict for a vector operand occurs when a value consumed at a given
operand is not in the expected layout in the context of the consumer
(for example, the `vector.multi_reduction` op's source requires a specific
layout inferred from its current result layout). To resolve this
conflict, we insert an `xegpu.convert_layout` right after the producer
(essentially duplicating the producer with expected layout) and use the
new value in the consumer.
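The resolution step described above can be pictured with a toy model. This is a sketch under stated assumptions, not the MLIR or XeGPU API: the `Op` class, the `resolve_conflict` helper, and the string layouts are all hypothetical. It inserts a conversion op right after the producer and redirects only the conflicting consumer operand to the converted value.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Toy operation: one result value carrying one layout (hypothetical model)."""
    name: str
    result: str
    layout: str
    operands: list = field(default_factory=list)


def resolve_conflict(ops, consumer_idx, operand_idx, expected_layout):
    """Insert a layout conversion after the producer if the consumer disagrees."""
    consumer = ops[consumer_idx]
    value = consumer.operands[operand_idx]
    # Find the op that defines the consumed value.
    producer_idx = next(i for i, op in enumerate(ops) if op.result == value)
    if ops[producer_idx].layout == expected_layout:
        return ops  # Layouts already agree: nothing to do.
    # Create the conversion and make only this consumer operand use it.
    converted = Op("convert_layout", value + "_cvt", expected_layout, [value])
    consumer.operands[operand_idx] = converted.result
    # Place the conversion immediately after the producer.
    return ops[:producer_idx + 1] + [converted] + ops[producer_idx + 1:]
```

Other users of the original value keep the producer's layout, since only the conflicting operand is rewired.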
[mlir][llvmir][OpenMP] Translate affinity clause in task construct to llvmir
Translate affinity entries to LLVMIR by passing affinity information to
createTask (__kmpc_omp_reg_task_with_affinity is created inside PostOutlineCB).
[Flang][mlir][OpenMP] Support affinity clause codegen in Flang
This patch translates the Flang AST to the OpenMP dialect for the affinity
clause, including the iterator modifier.
[TableGen] Complete the support for artificial registers
Artificial registers were added in eb0c510ecde667cd911682cc1e855f73f341d134
as a means of giving super-registers heavier weights than that
of their subregisters, even when they only contain a single
physical subregister.
Artificial registers thus do exist in code and participate in
register unit weight calculations, but are not supposed to be
available for register allocation.
This patch completes the support for artificial registers to:
- Ignore artificial registers when joining register unit uber
sets. Artificial registers may be members of classes that
together include registers and their sub-registers, making it
impossible to compute normalised weights for uber sets they
belong to.
[28 lines not shown]