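This version uses nonblocking operations for both sending and receiving, primarily to avoid the buffering issues of blocking sends (a blocking send may not return until a matching receive has been posted). To improve efficiency further, it uses MPI persistent operations: each communication request is set up once and then restarted on every iteration, so the setup cost is not repeated.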

This is very similar to the simple nonblocking example.