Enable VTune support and try to improve cache locality
Enable the VTune JIT support in LLVM and add it as a JIT listener. So far it mostly confirms my suspicion that the slowdown is cache-related, but it isn't much help in determining why: the slowdown is in some function that VTune can't analyze. I also updated the memory allocator to have strong thread affinity (i.e. a thread now generally gets back memory that it had previously freed), but that doesn't seem to have any effect. Going to punt on further investigation for now. Still pretty happy that there's an overall speedup with the grwl, even if there are still issues.
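For readers unfamiliar with the idea, the thread-affinity change can be pictured roughly as below. This is only a sketch and not the allocator code from this commit: the names `ThreadCache`, `affinity_alloc`, `affinity_free` and the fixed `kBlockSize` are hypothetical, and it falls back to plain `malloc` where a real allocator would use its own heap. Each thread keeps a private list of blocks it has freed and reuses them first, so the memory it gets back is likely still warm in its own cache.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fixed block size; a real allocator would have size classes.
constexpr std::size_t kBlockSize = 64;

// Per-thread cache of blocks that this thread has freed.
struct ThreadCache {
    std::vector<void*> free_blocks;

    // Return any cached blocks to the system when the thread exits.
    ~ThreadCache() {
        for (void* p : free_blocks) std::free(p);
    }
};

// One cache per thread, created lazily on first use.
inline ThreadCache& local_cache() {
    thread_local ThreadCache cache;
    return cache;
}

// Prefer a block this thread freed earlier; otherwise get a fresh one.
inline void* affinity_alloc() {
    ThreadCache& cache = local_cache();
    if (!cache.free_blocks.empty()) {
        void* p = cache.free_blocks.back();
        cache.free_blocks.pop_back();
        return p;
    }
    return std::malloc(kBlockSize);
}

// Keep the block on the freeing thread's list instead of a shared pool.
inline void affinity_free(void* p) {
    if (p != nullptr) local_cache().free_blocks.push_back(p);
}
```

In this sketch a block freed by one thread is only ever handed back to that same thread, which is the "generally gets back memory that it had previously freed" behavior described above. The cost is that memory cached by one thread is not available to others until that thread exits.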