Bounty: 400
Context
The main benchmarks that computers are measured on are FLOPS, MIPS, and related metrics, which measure how many of some basic operation a processor can perform per second. It is very clear to me how these benchmarks relate to a processor's ability to execute real-world algorithms. For example, most scientific and graphics algorithms require a well-defined number of floating point operations, and this is their primary computational cost, so a GPU's FLOPS rating carries a lot of information about how fast the GPU will run those algorithms (though not complete information, since there are other bottlenecks, such as communication bandwidth limits, scheduling efficiency, etc.).
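To make this concrete, here is a toy example of the kind of reasoning I mean (the 2·n³ operation count for a naive matrix multiply is standard, but the 10 TFLOP/s rating and the matrix size are illustrative assumptions of mine):

```python
def matmul_flops(n):
    """A naive n x n matrix multiply does n^3 multiplications and
    roughly n^3 additions, so about 2*n^3 floating point operations."""
    return 2 * n ** 3

peak_flops = 10e12  # hypothetical GPU rated at 10 TFLOP/s
n = 8192            # illustrative matrix size
ops = matmul_flops(n)
ideal_seconds = ops / peak_flops  # best case, ignoring memory/scheduling
print(f"{ops:.3e} FLOPs, ideal time ~{ideal_seconds * 1e3:.1f} ms")
```

The FLOPS rating alone gives a usable lower bound on runtime; I don't see how to do the analogous back-of-envelope calculation with TEPS.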
The TEPS benchmark
Graph500 is a competition for supercomputers that uses a different benchmark, "Traversed Edges Per Second" (TEPS), which is supposed to measure some notion of the computer's communication bandwidth.
I understand the intuitive justification for such a benchmark, since data communication is a key bottleneck in many applications.
However, I don’t understand how this benchmark is computed, since there doesn’t seem to be a basic explanation on the site, and I don’t understand how exactly this specific benchmark relates to real-world computational problems like machine learning tasks. At a basic level, how does TEPS work, and why?

How is TEPS computed? Is there a clearer explanation somewhere than the one on the Graph500 website? (I couldn’t find one after searching Google Scholar.)

TEPS somehow measures the number of traversed edges in a graph, but what is this graph supposed to be analogous to? E.g., if we compare it to a machine learning task, what would a node be? A single data sample? A single memory location?
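To pin down what I'm asking: my rough guess is that the measurement has the shape of a graph traversal (e.g. a BFS) where you count edges examined and divide by wall-clock time. The toy sketch below is purely my guess at that shape, not the official Graph500 kernel; the graph, the counting convention, and the ratio are all my assumptions:

```python
import time
from collections import deque

def bfs_teps(adj, root):
    """Toy BFS over an adjacency-list graph that counts every edge it
    inspects, then reports traversed edges per second. Only my guess at
    the general shape of the measurement, not the Graph500 harness."""
    visited = {root}
    frontier = deque([root])
    edges_traversed = 0
    t0 = time.perf_counter()
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            edges_traversed += 1  # count every edge inspection
            if v not in visited:
                visited.add(v)
                frontier.append(v)
    elapsed = time.perf_counter() - t0
    return edges_traversed, edges_traversed / elapsed

# Tiny example graph (adjacency lists for a 4-cycle):
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
edges, teps = bfs_teps(adj, 0)
print(edges, "edge inspections")  # 8 on this graph
```

If this is roughly right, I still don't see what the edges are supposed to stand for in a real workload, which is what the question above is asking.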

What insight does TEPS give us beyond just directly reading the communication bandwidth off the computer’s specifications, if we want to predict how well a supercomputer will do on some ML/big-data task (or some other task)?