[PGO][HIP] Fix profile-only Windows link by gating ROCm interceptor macro (#200859)
PR #200111 stops compiling InstrProfilingPlatformROCm.cpp (which defines the
HIP GPU helper __llvm_profile_hip_collect_device_data) in profile-only builds.
But the compile define -DCOMPILER_RT_BUILD_PROFILE_ROCM=1 was still added
whenever the COMPILER_RT_BUILD_PROFILE_ROCM option was on (the default), so
InstrProfilingFile.c still referenced the helper from __llvm_profile_write_file
even though it was never built.
On ELF the declaration is weak, so the undefined symbol folds to null and the
address-guarded call is skipped. COFF/Windows has no such fallback:
error LNK2019: unresolved external symbol
[10 lines not shown]
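The ELF behaviour described above can be sketched as follows. This is a minimal illustration, assuming an ELF target and GCC or Clang; `hypothetical_collect_device_data` is a made-up stand-in for the real helper and is deliberately never defined.

```cpp
// Sketch only: assumes an ELF target with GCC or Clang.
// `hypothetical_collect_device_data` is a made-up name that is declared weak
// and never defined, mimicking a helper that was not built.
extern "C" int hypothetical_collect_device_data(void) __attribute__((weak));

// Returns the helper's result, or -1 when the helper was not linked in.
int collectIfAvailable() {
  // An undefined weak symbol resolves to a null address on ELF, so this
  // guard skips the call instead of producing a link error.
  if (hypothetical_collect_device_data)
    return hypothetical_collect_device_data();
  return -1;
}
```

On COFF there is no equivalent null fallback for the undefined reference, which is why the reference itself has to be compiled out there.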
[lldb-dap] Use SetTarget for launch and attach commands (#200133)
Without this patch, event listener registration was skipped. As a result, the
`Modules` view in the UI was not displayed when the target was launched via
`launchCommands` or `attachCommands`.
[DirectX] Implement lowering of llvm.dx.resource.samplebias to the SampleBias DXIL Op (#199745)
Fixes #192548
This PR implements the lowering of the `llvm.dx.resource.samplebias`
intrinsic to the `SampleBias` DXIL Op.
Although I reckon that other `lowerSample*` functions in
`DXILOpLowering.cpp` will have shared logic, this is the first one to be
implemented. Consolidating common logic between future `lowerSample*`
functions can be left to a later PR implementing the second or other
`lowerSample*` function.
Assisted-by: Claude Opus 4.6
[AMDGPU] Verify data size of load-to-LDS intrinsics (#200587)
An out-of-range size immarg (e.g. 0) produced an illegal i0 memory type
during SelectionDAG building and crashed the backend instead of being
rejected up front.
[SROA] Canonicalize homogeneous structs to fixed vectors (opt-in, after memcpyopt) (#165159)
SROA sometimes keeps temporary allocas around for homogeneous structs like
`{ i32, i32, i32, i32 }` because the partition has only memcpy/memset traffic
and no scalar typed users to drive vector promotion. On targets like AMDGPU
these allocas turn into scratch memory and hurt performance. This PR adds a
helper `tryCanonicalizeStructToVector` that converts such a partition to a
fixed vector type when every non-debug, non-lifetime user is a memory
intrinsic, so the alloca can promote through normal vector load/store paths.
The element-shape rule accepts any homogeneous element count, any integer
width, any FP type, and integral pointer types, as long as the struct is
tightly packed.
[26 lines not shown]
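The "homogeneous and tightly packed" condition can be illustrated with a small standalone check. This is a sketch under simplifying assumptions, not the code in `tryCanonicalizeStructToVector`: `Kind`, `Field`, and `isHomogeneousAndTightlyPacked` are hypothetical names, and sizes are tracked in whole bytes only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified field descriptor; not an LLVM type.
enum class Kind { Int, Float, Pointer };
struct Field {
  Kind kind;
  std::size_t sizeInBytes;
  std::size_t offsetInBytes;
};

// Returns true when all fields share one kind and size and sit back to back
// with no padding, so the struct has the same layout as an N-element vector.
bool isHomogeneousAndTightlyPacked(const std::vector<Field> &fields,
                                   std::size_t structSizeInBytes) {
  if (fields.empty())
    return false;
  const Field &first = fields.front();
  if (first.sizeInBytes == 0)
    return false;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    const Field &f = fields[i];
    if (f.kind != first.kind || f.sizeInBytes != first.sizeInBytes)
      return false; // mixed element types or widths
    if (f.offsetInBytes != i * first.sizeInBytes)
      return false; // padding between elements
  }
  // No tail padding either.
  return structSizeInBytes == fields.size() * first.sizeInBytes;
}
```

A struct shaped like `{ i32, i32, i32, i32 }` passes this check, while one with mixed element kinds or internal padding does not.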
amd64: do not switch back and restore UEFI IDT in wrmsr_early_safe_end()
The memory where the pre-OS IDT was located might already have been consumed
by the kernel.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D57321
amd64: there is no reason to copy ucode around in ucode_load_bsp()
PR: 294630
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D57368
[LLDB] Detect cycles during Type resolution (#200304)
I got LLDB crash reports from the Swift plugin where (presumably
malformed) debug info sends lldb_private::Type into an infinite recursion.
Most likely this is a bug in the DWARF parser. However, even malformed
inputs shouldn't crash LLDB, so this patch adds cycle detection.
rdar://177856769
Assisted-by: claude
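One common way to detect such cycles is to track which nodes are currently being resolved. The sketch below shows that general technique on a hypothetical integer-keyed type graph; `TypeGraph` and `resolveType` are illustrative names and do not reflect how lldb_private::Type implements it.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical type graph: node id -> ids of the types it refers to.
using TypeGraph = std::unordered_map<int, std::vector<int>>;

// Resolves `id` depth-first. `inProgress` holds the ids currently on the
// resolution stack; meeting one of them again means the input is cyclic.
// Returns false on a cycle instead of recursing forever.
bool resolveType(const TypeGraph &graph, int id,
                 std::unordered_set<int> &inProgress,
                 std::unordered_set<int> &resolved) {
  if (resolved.count(id))
    return true; // already fully resolved
  if (!inProgress.insert(id).second)
    return false; // id is already being resolved: cycle detected
  bool ok = true;
  auto it = graph.find(id);
  if (it != graph.end()) {
    for (int dep : it->second) {
      if (!resolveType(graph, dep, inProgress, resolved)) {
        ok = false;
        break;
      }
    }
  }
  inProgress.erase(id);
  if (ok)
    resolved.insert(id);
  return ok;
}
```

With this shape, malformed input that refers back to itself yields an error result that the caller can report, rather than a stack overflow.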
[NFC][clang-sycl-linker] Apply LLVM coding standards to ClangSYCLLinker.cpp (#200543)
Bring the file in line with llvm/docs/CodingStandards.rst without changing
behavior:
- Restore the canonical //===---===// file-header banner.
- Move free functions out of the anonymous namespace and mark them
`static`; keep only types (LinkerOptTable, LinkResult, SplitModule,
IRSplitMode, EntryPointCategorizer) inside anonymous namespaces.
- Rename a local `OutputFile` in createTempFile to `Path` to stop it
shadowing the file-scope `OutputFile`.
- Rename the inner `Err` in runCodeGen to `MatErr` to stop it shadowing
the surrounding `SMDiagnostic Err`.
- Normalize parameter-name comments to the `/*Name=*/value` form.
- Strip quotes from Doxygen `\param 'Name'` directives.
Co-Authored-By: Claude
[clang] fix transformation of SubstNonTypeTemplateParmExpr nodes from typealiases and concepts
This makes sure SubstNonTypeTemplateParmExpr nodes produced from
non-specialization decls (type alias templates and concepts) are correctly
transformed.
This makes SubstNonTypeTemplateParmExpr store the parameter type directly,
and uses that instead of relying on the AssociatedDecl.
Fixes #191738
Fixes #196375
[cuda][flang] Diagnose missing CUDA intrinsic modules in Flang semantics (#200509)
- Replace CUDA intrinsic module `CHECK`s with actionable diagnostics
when `cudadevice` or `__cuda_builtins` cannot be read.
- Avoid dereferencing missing CUDA module scopes during implicit CUDA
symbol import.
- Add a semantics test covering the missing CUDA intrinsic module
diagnostic.
[lldb][Windows] Use captured error in ConnectionGenericFile::Read (#200803)
Use the captured value on both branches so the reported error matches
the one that was tested against.
[libc] Add netinet/udp.h containing struct udphdr (#200839)
This patch adds a generated <netinet/udp.h> containing the `udphdr`
structure definition.
There are two styles of udphdr field names ("linux" and "BSD"), and both
can be found in the wild, so I follow the glibc and bionic approach of
using an anonymous union. (musl uses a #define on the field names, which
doesn't seem that great.)
I've added the target to `include/CMakeLists.txt` and registered it
under target lists in `headers.txt` for the supported Linux platforms
(x86_64, aarch64, and riscv).
To verify layout and alignment correctness, I've added a layout and
field compatibility unit test under `test/src/netinet/udp_test.cpp`.
Assisted by Gemini.
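The anonymous-union approach can be sketched like this. It is an illustration only, not the generated header: `udphdr_sketch` is a hypothetical name chosen to avoid clashing with system headers, the field names follow the glibc convention, and anonymous structs inside a union are a GNU/Clang extension in C++.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only; `udphdr_sketch` is a hypothetical name. Both member groups
// overlay the same 8 bytes, so either naming style reaches the same field.
struct udphdr_sketch {
  union {
    struct { // BSD-style names
      std::uint16_t uh_sport; // source port
      std::uint16_t uh_dport; // destination port
      std::uint16_t uh_ulen;  // UDP length
      std::uint16_t uh_sum;   // checksum
    };
    struct { // Linux-style names
      std::uint16_t source;
      std::uint16_t dest;
      std::uint16_t len;
      std::uint16_t check;
    };
  };
};

// Layout checks in the spirit of the unit test mentioned above.
static_assert(sizeof(udphdr_sketch) == 8, "UDP header is 8 bytes");
static_assert(offsetof(udphdr_sketch, uh_dport) ==
                  offsetof(udphdr_sketch, dest),
              "both names must refer to the same offset");
```

Compared with #define-based aliases, the union keeps both sets of names as real members, so they do not leak into unrelated identifiers.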
[Clang][AMDGPU] Restore the non-RDC compilation pipeline
The new offload driver uses the LTO compilation pipeline even for non-RDC
compilation. This PR restores the conventional non-RDC flow, where the backend
generates executable code directly, which is then bundled into the HIP fat
binary.
We can revert this change in the future if we decide to deprecate the
distinction between non-RDC and RDC compilation and unify the compilation flow.