Hello
Have there been any updates that fix the problem of the compiler using only 16 of the 32 registers available on KNL and Skylake?
I tried -qopt-zmm-usage=low/high and so on, and nothing seems to work. Here is an assembly example:
40122e: 62 71 7c 48 28 fd vmovaps %zmm5,%zmm15
401234: 62 d1 85 48 5c ec vsubpd %zmm12,%zmm15,%zmm5
40123a: 62 51 ed 48 58 fa vaddpd %zmm10,%zmm2,%zmm15
401240: 62 c1 7c 48 28 ca vmovaps %zmm10,%zmm17
401246: 62 31 7c 48 28 e0 vmovaps %zmm16,%zmm12
40124c: 62 61 7c 48 28 e9 vmovaps %zmm1,%zmm29
401252: 62 d1 9d 48 58 c8 vaddpd %zmm8,%zmm12,%zmm1
401258: 62 61 7c 48 28 e4 vmovaps %zmm4,%zmm28
40125e: 62 b1 7c 48 28 e1 vmovaps %zmm17,%zmm4
401264: 62 61 7c 48 28 d9 vmovaps %zmm1,%zmm27
40126a: 62 f1 dd 48 5c ca vsubpd %zmm2,%zmm4,%zmm1
An alternative would be to use only 16 registers. Is there a flag that actually works and restricts the compiler to 16 registers?
Thanks