Optimising to get the best out of your hardware

Getting stuff to run out of cache is a bit like the old SimCity game: you can’t force the Sims to live there. All you can do is set up the right conditions, and they will move in if they like it.

The OS, CPU, and compiler work together, and if everything is right, cache residency just happens.
Here are some links that should give you a grounding in how the different levels of memory work together, plus tips on how to get the most speed out of your code.
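
As a small taste of what "setting up the right conditions" can look like, here is a minimal sketch in C. It assumes a matrix stored row by row in one contiguous buffer; the function names are made up for illustration. Both functions return the same sum, but the first walks memory in the order it is laid out, while the second jumps a whole row ahead on every access.

```c
#include <stddef.h>

/* Sum a rows x cols matrix stored row-major in one contiguous buffer,
   visiting elements in the order they sit in memory (stride of one element). */
double sum_row_order(const double *m, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same sum, but walking down each column first. Consecutive accesses are
   cols * sizeof(double) bytes apart, so on a large matrix each one tends
   to touch a different cache line. */
double sum_column_order(const double *m, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

On a matrix much larger than the cache, the row-order version is usually the faster of the two, though how much faster depends on the hardware, the compiler, and the matrix shape, so it is worth timing both on your own machine.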

First things first: How does it work?
What Every Programmer Should Know About Memory, parts 1 through 9

What can you do to make the cache work well for big data?

Putting Your Data and Code in Order: Optimization and Memory - Intel style!

Making Caches Work for Graph Analytics
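
One common way of "putting your data in order" is choosing between an array of structures and a structure of arrays. The sketch below is only an illustration: the particle type, its field names, and MAX_PARTICLES are assumptions made up for the example, not taken from the linked articles.

```c
#include <stddef.h>

#define MAX_PARTICLES 1024  /* hypothetical capacity, for illustration only */

/* Array of structures: each particle's fields sit next to each other, so a
   loop that only reads x also pulls y, z and mass through the cache. */
struct particle {
    float x, y, z, mass;
};

float sum_x_aos(const struct particle *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].x;
    return total;
}

/* Structure of arrays: all x values are contiguous, so a loop over x uses
   every byte of each cache line it fetches. */
struct particles {
    float x[MAX_PARTICLES];
    float y[MAX_PARTICLES];
    float z[MAX_PARTICLES];
    float mass[MAX_PARTICLES];
};

/* The caller must keep n <= MAX_PARTICLES. */
float sum_x_soa(const struct particles *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p->x[i];
    return total;
}
```

Which layout wins depends on the access pattern. If most loops touch one field at a time, the structure of arrays tends to be kinder to the cache; if every loop needs all the fields of each particle together, the array of structures may do just as well or better.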

What are the “big boys” doing?
In-Memory Big Data Management and Processing: A Survey
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7097722

An investigation into why things don’t always work the way you think they should. This shows the use of good tools to see how your system is working with your code.
https://software.intel.com/en-us/articles/optimize-data-structures-and-memory-access-patterns-to-improve-data-locality

Finally - some tech chat that has a few great pointers on the topic: