This example shows a common approach for having all processors act on all data, with each processor holding part of the data. Each processor circulates a piece of data to the next processor in line, starting with the data that it has. If this is done using MPI_Isend and MPI_Irecv, with the MPI_Wait calls after the computation, there is the possibility for the communication to overlap with the computation. But this can cause problems for MPI implementations that use Rendezvous with sender push, and which poll for MPI activity.

This example is easier to describe as code; feel free to look at the solution and the log files. To simplify the interpretation, the code outputs the time that communication, computation, and both together take.