u/MindlessPapaya8463

Are missed peephole/canonicalization optimizations worth reporting to GCC/Clang?

I’ve been comparing GCC 15/trunk and Clang on small 32-bit bit-vector expressions, and I’ve found a few equivalences, each proven with Z3, where one compiler canonicalizes a pattern and the other does not. In my benchmarks, the canonical forms are typically modestly faster in scalar code.

Two examples:

uint32_t is_nonzero = (x | (0u - x)) >> 31;

Clang folds this to `x != 0`, producing a clean `test` / `setne` sequence on x86. GCC, including trunk, currently emits a more literal `neg/or/shr`-style sequence.
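For anyone who wants to try this, here is a minimal sketch of the two forms as standalone functions, plus a quick sanity check. The function names and the `count_nonzero_mismatches` helper are made up for illustration, and the check only samples inputs, so it is not a substitute for the Z3 proof. It assumes `unsigned int` is 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Literal form from the post: bit 31 of x | -x is set exactly when x != 0. */
static uint32_t is_nonzero_bits(uint32_t x) {
    return (x | (0u - x)) >> 31;
}

/* Canonical comparison form. */
static uint32_t is_nonzero_cmp(uint32_t x) {
    return x != 0;
}

/* Hypothetical helper: counts disagreements between the two forms over
 * a few boundary values and a strided sweep of the 32-bit range.
 * This is a sampled check, not an exhaustive one. */
static uint32_t count_nonzero_mismatches(void) {
    static const uint32_t edge[] = {
        0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0x80000001u, 0xFFFFFFFFu
    };
    uint32_t bad = 0;
    for (size_t i = 0; i < sizeof edge / sizeof edge[0]; i++)
        bad += is_nonzero_bits(edge[i]) != is_nonzero_cmp(edge[i]);
    /* 64-bit loop counter so the loop terminates past 0xFFFFFFFF. */
    for (uint64_t v = 0; v <= 0xFFFFFFFFu; v += 65537u)
        bad += is_nonzero_bits((uint32_t)v) != is_nonzero_cmp((uint32_t)v);
    return bad;
}
```

Compiling each function at `-O2` and comparing the assembly for the two should show whether a given compiler version canonicalizes the bit-trick form.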

uint32_t carry64 = (uint32_t)((((uint64_t)x) + y) >> 32);

uint32_t carrycmp = (x + y) < y; // or < x

return carry64 == carrycmp;

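This always holds for 32-bit unsigned `x` and `y`: the 32-bit sum wraps exactly when the 33-bit sum carries out, and a wrapped sum is strictly smaller than either operand.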

Clang folds the `(x + y) < x` spelling to a constant true result, but not the `(x + y) < y` spelling on the targets I tested. GCC currently does not fold either spelling.
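Here is a rough sketch of the carry identity split into separate functions, so each spelling can be inspected on its own. The names and the `count_carry_mismatches` helper are made up for illustration. The check covers only pairs of boundary values, so it is a sanity test rather than a proof.

```c
#include <stddef.h>
#include <stdint.h>

/* Carry out of a 32-bit add, computed by widening to 64 bits. */
static uint32_t carry_wide(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x + y) >> 32);
}

/* Carry detected by comparing the wrapped sum against x. */
static uint32_t carry_cmp_x(uint32_t x, uint32_t y) {
    return (uint32_t)(x + y) < x;
}

/* Same idea, comparing against y instead. */
static uint32_t carry_cmp_y(uint32_t x, uint32_t y) {
    return (uint32_t)(x + y) < y;
}

/* Hypothetical helper: counts pairs of boundary values on which any of
 * the three spellings disagree. */
static uint32_t count_carry_mismatches(void) {
    static const uint32_t v[] = {
        0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0x80000001u,
        0xFFFFFFFEu, 0xFFFFFFFFu
    };
    const size_t n = sizeof v / sizeof v[0];
    uint32_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            uint32_t w = carry_wide(v[i], v[j]);
            bad += (w != carry_cmp_x(v[i], v[j])) ||
                   (w != carry_cmp_y(v[i], v[j]));
        }
    }
    return bad;
}
```

A reproducer for a bug report could then be a one-line function returning `carry_wide(x, y) == carry_cmp_y(x, y)`, which an optimizer that knows the identity would reduce to a constant.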

My questions are:

- Do maintainers generally appreciate reports for small peephole/canonicalization misses like these?

- Is there a rough threshold where a pattern is considered too niche to justify the compile-time cost or added middle-end complexity?

- Is it better to file these as separate issues, or group related identities into one report?

I can provide minimal reproducers, Z3 proofs, and benchmark data if useful.

Note: I used AI to clean up the wording of this post. The compiler testing, proofs, and benchmark data were generated by my own scripts.

u/MindlessPapaya8463 — 2 days ago