A data race happens when two processes modify a shared variable concurrently without protecting themselves from each other's effects.
Let A be a shared variable, and let P1 and P2 be two processes that access it. Both processes perform the same operation: read A into a variable tmp (local to the process), compute tmp = tmp + 1, then write tmp back to A. If A is not protected by a lock, the resulting execution may not match what is expected. Here are two possible executions when A is not locked:
case #1: A = 0
P1: read A -> tmp1 (so tmp1 is 0)
P2: read A -> tmp2 (so tmp2 is 0)
P1: tmp1 = tmp1 + 1 (so tmp1 is 1)
P2: tmp2 = tmp2 + 1 (so tmp2 is 1)
P1: tmp1 -> write A (so A is 1)
P2: tmp2 -> write A (so A is 1)
Two increments were performed, yet A ends at 1: P2 read A before P1 wrote it back, so P2's write overwrites P1's update.

case #2: A = 0
P1: read A -> tmp1 (so tmp1 is 0)
P1: tmp1 = tmp1 + 1 (so tmp1 is 1)
P1: tmp1 -> write A (so A is 1)
P2: read A -> tmp2 (so tmp2 is 1)
P2: tmp2 = tmp2 + 1 (so tmp2 is 2)
P2: tmp2 -> write A (so A is 2)
Here A ends at 2, which is the expected result.
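The two executions above can be replayed deterministically. The following is a minimal Python sketch, assuming each process is modeled as a generator that pauses after every step, and a schedule (a list of process names) decides which process runs its next step. The names `process`, `run`, `case1` and `case2` are illustrative only.

```python
# Hypothetical model: each process performs "read A -> tmp; tmp = tmp + 1; tmp -> write A"
# and pauses (yields) after each step, so the interleaving is fully controlled.

def process(shared, name):
    tmp = shared["A"]        # read A -> tmp
    yield f"{name}: read A -> tmp (tmp is {tmp})"
    tmp = tmp + 1            # tmp = tmp + 1 (local to this process)
    yield f"{name}: tmp = tmp + 1 (tmp is {tmp})"
    shared["A"] = tmp        # tmp -> write A
    yield f"{name}: tmp -> write A (A is {tmp})"


def run(schedule):
    """Run P1 and P2 one step at a time, in the order given by `schedule`."""
    shared = {"A": 0}
    procs = {"P1": process(shared, "P1"), "P2": process(shared, "P2")}
    for name in schedule:
        print(next(procs[name]))   # execute one step of the chosen process
    return shared["A"]


case1 = ["P1", "P2", "P1", "P2", "P1", "P2"]   # interleaved steps
case2 = ["P1", "P1", "P1", "P2", "P2", "P2"]   # P1 finishes before P2 starts

print(run(case1))  # 1 (lost update)
print(run(case2))  # 2
```

Any schedule in which one process reads A before the other has written it back ends with A equal to 1.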
To avoid this kind of problem, one uses a lock:
A = 0
P1: lock A
P1: read A -> tmp1 (so tmp1 is 0)
P2: lock A (so P2 is blocked)
P1: tmp1 = tmp1 + 1 (so tmp1 is 1)
P1: tmp1 -> write A (so A is 1)
P1: unlock A (so P2 is unblocked)
P2: read A -> tmp2 (so tmp2 is 1)
P2: tmp2 = tmp2 + 1 (so tmp2 is 2)
P2: tmp2 -> write A (so A is 2)
P2: unlock A
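A minimal sketch of the locked version, assuming Python threads stand in for P1 and P2 and a `threading.Lock` plays the role of the lock on A. The function name `run_locked` and the iteration count are illustrative choices, not part of the original text.

```python
import threading


def run_locked(n_per_process=100_000):
    shared = {"A": 0}
    lock_A = threading.Lock()   # the lock protecting A

    def process():
        for _ in range(n_per_process):
            with lock_A:                # lock A (blocks while the other thread holds it)
                tmp = shared["A"]       # read A -> tmp
                tmp = tmp + 1           # tmp = tmp + 1
                shared["A"] = tmp       # tmp -> write A
            # leaving the with-block unlocks A

    p1 = threading.Thread(target=process)
    p2 = threading.Thread(target=process)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return shared["A"]


print(run_locked())  # 200000
```

Because each read-increment-write sequence runs entirely while holding the lock, the final value is always 2 * n_per_process. Removing the `with lock_A:` line reintroduces the race, although in CPython the lost updates may or may not show up on a given run.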
A deadlock is a mutual blocking that occurs when two processes each hold a lock that the other one needs. For example, let A and B be two locks and P1 and P2 two processes:
P1: lock A
P2: lock B
P1: lock B (so P1 is blocked by P2)
P2: lock A (so P2 is blocked by P1)
P1 is blocked waiting for P2 to unlock B. However, P2 also needs A, which P1 holds, to finish its computation and release B. Neither process can make progress, so we have a deadlock.
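The trace above can be reproduced with a small Python sketch. It assumes two `threading.Lock` objects for A and B, and uses two `threading.Barrier` objects (an illustrative device, not part of the original example) to force the exact interleaving shown. A real deadlock would hang forever, so the sketch calls `acquire(timeout=...)` and reports whether each process managed to get its second lock. The names `demo_deadlock` and `acquired` are hypothetical.

```python
import threading


def demo_deadlock(timeout=0.2):
    lock_A = threading.Lock()
    lock_B = threading.Lock()
    both_hold_first = threading.Barrier(2)    # both processes hold their first lock
    both_tried_second = threading.Barrier(2)  # nobody releases before both attempts end
    acquired = {}

    def process(name, first, second):
        with first:                    # P1: lock A / P2: lock B
            both_hold_first.wait()
            # Real code would call second.acquire() and wait forever here;
            # the timeout makes the demo give up instead of hanging.
            got = second.acquire(timeout=timeout)
            acquired[name] = got
            both_tried_second.wait()
            if got:
                second.release()

    p1 = threading.Thread(target=process, args=("P1", lock_A, lock_B))
    p2 = threading.Thread(target=process, args=("P2", lock_B, lock_A))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return acquired


# Both entries are False: each process waited on a lock held by the other.
print(demo_deadlock())
```

The second barrier keeps both first locks held until both acquisition attempts have timed out, so the outcome does not depend on thread timing.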
In this example, the problem is easy to spot. But imagine what can happen in 2 million lines of code (like the Linux kernel) with hundreds of locks. :-)