Are missed peephole/canonicalization optimizations worth reporting to GCC/Clang?
I’ve been comparing GCC 15/trunk and Clang on small 32-bit bit-vector expressions, and I’ve found a few proven equivalences where one compiler canonicalizes a pattern while the other does not. In my benchmarks, the canonical forms are modestly faster in scalar code.
Two examples:
uint32_t is_nonzero = (x | (0u - x)) >> 31;
Clang folds this to `x != 0`, producing a clean `test` / `setne` sequence on x86. GCC, including trunk, currently emits a more literal `neg/or/shr`-style sequence.
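For anyone who wants to reproduce this, a minimal sketch of the two spellings side by side could look like the following. The function names are hypothetical and chosen only for illustration, and the code assumes a target where `int` is 32 bits wide.

```c
#include <stdint.h>

/* Bit-trick spelling from the example above: the sign bit of (x | -x)
   is set exactly when x is nonzero. Name is hypothetical. */
uint32_t is_nonzero_bits(uint32_t x) {
    return (x | (0u - x)) >> 31;
}

/* Canonical comparison form that the bit trick is equivalent to.
   Name is hypothetical. */
uint32_t is_nonzero_cmp(uint32_t x) {
    return x != 0;
}
```

Compiling this with `-O2 -S` on each compiler and comparing the assembly of the two functions should show whether the fold fires: if it does, both functions should come out essentially identical.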
uint32_t carry64 = (uint32_t)((((uint64_t)x) + y) >> 32);
uint32_t carrycmp = (x + y) < y; // or < x
return carry64 == carrycmp;
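This is always true for 32-bit unsigned `x` and `y`: the 32-bit sum wraps exactly when the 64-bit sum has bit 32 set, and a wrapped sum is strictly smaller than both operands.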
Clang folds the `(x + y) < x` spelling to a constant true result, but not the `(x + y) < y` spelling on the targets I tested. GCC currently does not fold either spelling.
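A reproducer for the carry identity might be sketched as below, with the widening form and both comparison spellings split into separate functions so each fold can be inspected on its own. The function names are hypothetical, and the sketch assumes `uint32_t` arithmetic wraps modulo 2^32 (that is, `int` is no wider than 32 bits).

```c
#include <stdint.h>

/* Carry out of a 32-bit add, computed by widening to 64 bits. */
uint32_t carry_via_widening(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x + y) >> 32);
}

/* Carry detected by comparing the wrapped sum against y. */
uint32_t carry_via_compare_y(uint32_t x, uint32_t y) {
    return (uint32_t)(x + y) < y;
}

/* Carry detected by comparing the wrapped sum against x. */
uint32_t carry_via_compare_x(uint32_t x, uint32_t y) {
    return (uint32_t)(x + y) < x;
}

/* Always 1 if the identity holds; a compiler that proves it can
   reduce each function to returning a constant. */
int carries_agree_y(uint32_t x, uint32_t y) {
    return carry_via_widening(x, y) == carry_via_compare_y(x, y);
}

int carries_agree_x(uint32_t x, uint32_t y) {
    return carry_via_widening(x, y) == carry_via_compare_x(x, y);
}
```

At `-O2`, a compiler that recognizes the identity should compile `carries_agree_x` and `carries_agree_y` down to a function that just returns 1, which makes the difference between the two spellings easy to spot in the assembly.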
My questions are:
- Do maintainers generally appreciate reports for small peephole/canonicalization misses like these?
- Is there a rough threshold where a pattern is considered too niche to justify the compile-time cost or added middle-end complexity?
- Is it better to file these as separate issues, or group related identities into one report?
I can provide minimal reproducers, Z3 proofs, and benchmark data if useful.
Note: I used AI to clean up the wording of this post. The compiler testing, proofs, and benchmark data were generated by my own scripts.