I came across "The Alder Lake SHLX Anomaly (tavianator.com)", which discusses SHLX being unexpectedly slow on Alder Lake (Intel's 12th-generation CPUs). The original article is "The Alder Lake SHLX anomaly".
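The instruction is quite simple, and in theory it should take 1 cycle no matter the circumstances, but the author found that on Alder Lake's P cores it can take 3 cycles (while on Alder Lake's E cores it is still 1 cycle):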
Left-shift is one of the simplest things to implement in hardware, so it's quite surprising that it should take 3 whole CPU cycles. It's been 1 cycle on every other CPU I'm aware of. It's even 1 cycle on Alder Lake's efficiency cores! Only the performance cores have this particular performance problem.
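To get a feel for how this kind of latency is usually measured, here is a minimal sketch of my own, not the article author's benchmark. It runs a dependent chain of variable shifts, so each shift has to wait for the previous result and the loop time roughly tracks shift latency. The names `shift_chain` and `ns_per_shift` are hypothetical, the empty inline asm barrier is a GCC/Clang extension, and whether the compiler actually emits SHLX depends on the flags (for example `-O2 -mbmi2`), so check the disassembly before trusting any numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Keeps the final result observable so the loop cannot be dropped. */
static volatile uint64_t sink;

/* Dependent chain: every shift consumes the previous shift's result,
   so the loop is bound by shift latency rather than throughput. */
uint64_t shift_chain(uint64_t value, unsigned count, uint64_t iterations) {
    for (uint64_t i = 0; i < iterations; i++) {
        value <<= (count & 63);              /* mask avoids undefined behavior */
        __asm__ volatile("" : "+r"(value));  /* GCC/Clang barrier: keep each shift */
    }
    return value;
}

/* Rough average CPU time per shift in nanoseconds, based on clock().
   Divide by the cycle time of the core to estimate cycles per shift. */
double ns_per_shift(unsigned count, uint64_t iterations) {
    assert(iterations > 0);
    clock_t start = clock();
    sink = shift_chain(1, count, iterations);
    clock_t end = clock();
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds * 1e9 / (double)iterations;
}
```

Calling `ns_per_shift(1, 1000000000)` from a small `main` while pinned to a P core and then to an E core (for example with `taskset`) would be the obvious comparison. A plain loop like this is not guaranteed to trigger the slow case at all.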
The article's author also replied on Hacker News (id=42580251), explaining that tracking this problem down took a fair amount of effort:
One fun thing about this is that spilling+restoring the register will fix it, so if any kind of context switch happens (thread switch, page fault, interrupt, etc.), the register will get pushed to the stack and popped back from it, and the code suddenly gets 3x faster. Makes it a bit tricky to reproduce reliably, and led me down a few dead ends as I was writing this up.
My guess is that a microcode update could fix this?