By: P.E. McKenney
Published in: PLoPD2
Pages: 501-535
Category: Parallel Programming
Locking designs for parallel programs.
Url: http://c2.com/ppr/mutex/mutexpat.html
Sequential Program
Pages: 510-511
To eliminate the complexity of parallelization, construct an entirely sequential program, eliminating all synchronization primitives and the overhead and complexity associated with them.
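A minimal C sketch of the idea (the histogram and its names are illustrative, not from the paper): with a single flow of control, plain loads and stores on shared data are safe, so no locks, atomics, or thread machinery appear anywhere.

```c
/* A purely sequential histogram: no locks, no atomics, no thread
 * overhead, because only one flow of control ever touches the data. */
static int counts[4];

static void record(int bucket)
{
    counts[bucket % 4]++;       /* plain increment is safe: no concurrency */
}

long total_recorded(void)
{
    long sum = 0;
    for (int i = 0; i < 4; i++)
        sum += counts[i];
    return sum;
}

void run_sequential_demo(int n)
{
    for (int i = 0; i < n; i++)
        record(i);
}
```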
Code Locking
Pages: 511-513
Use code locking when a small amount of execution time is spent in critical sections or when you only need modest scaling. This is the simplest locking design.
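A minimal pthreads sketch of code locking (counter and thread counts are illustrative): one global mutex serializes every critical section, so correctness is easy to see, at the cost of all threads funneling through a single lock.

```c
#include <pthread.h>

/* Code Locking: a single global lock guards all shared state. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* every thread funnels through here */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

long run_code_locking_demo(int nthreads)
{
    pthread_t tid[16];
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```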
Data Locking
Pages: 513-515
To get better speedups than straightforward parallelizations such as Code Locking, partition data structures so that each portion can be processed in parallel. Each portion has its own independent critical section.
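A hedged pthreads sketch of data locking (the hash table and bucket count are illustrative): the table is partitioned into buckets, each guarded by its own lock, so operations on different buckets form independent critical sections and proceed in parallel.

```c
#include <pthread.h>

/* Data Locking: per-bucket locks instead of one table-wide lock. */
#define NBUCKETS 8

struct bucket {
    pthread_mutex_t lock;
    long count;                 /* stand-in for a per-bucket item list */
};

static struct bucket table[NBUCKETS];

void table_init(void)
{
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].count = 0;
    }
}

void table_add(int key)
{
    struct bucket *b = &table[key % NBUCKETS];
    pthread_mutex_lock(&b->lock);     /* only this bucket is serialized */
    b->count++;
    pthread_mutex_unlock(&b->lock);
}

static void *worker(void *arg)
{
    int base = *(int *)arg;
    for (int i = 0; i < 100000; i++)
        table_add(base + i);          /* threads mostly hit different buckets */
    return NULL;
}

long run_data_locking_demo(void)
{
    pthread_t tid[4];
    static int base[4] = { 0, 1, 2, 3 };
    long total = 0;

    table_init();
    for (int i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, worker, &base[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NBUCKETS; i++)
        total += table[i].count;
    return total;
}
```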
Data Ownership
Pages: 515-519
Each CPU or thread should own its own data and shouldn't need locking primitives to access it, but must use a communication mechanism to access another CPU's or thread's data.
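A minimal sketch of data ownership (names and counts are illustrative): each thread updates only the counter slot it owns, so the fastpath needs no locks or atomics at all. Here the "communication mechanism" for reading another thread's data is simplified to summing after pthread_join, which supplies the required happens-before ordering; a real design would use a message queue or similar.

```c
#include <pthread.h>

/* Data Ownership: each thread touches only its own padded slot. */
#define NTHREADS 4

struct owned {
    long counter;
    char pad[64 - sizeof(long)];   /* keep slots on separate cache lines */
};
static struct owned slot[NTHREADS];

static void *worker(void *arg)
{
    int me = *(int *)arg;
    for (int i = 0; i < 1000000; i++)
        slot[me].counter++;        /* owner-only access: no lock needed */
    return NULL;
}

long run_ownership_demo(void)
{
    pthread_t tid[NTHREADS];
    static int id[NTHREADS];
    long sum = 0;

    for (int i = 0; i < NTHREADS; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, worker, &id[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);  /* join = the communication point */
    for (int i = 0; i < NTHREADS; i++)
        sum += slot[i].counter;
    return sum;
}
```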
Parallel Fastpath
Pages: 520-521
To get speedups in programs that can't use aggressive locking patterns throughout, use an aggressive locking pattern for most of the workload (the fastpath) and a more conservative approach for the rest.
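A hedged sketch of a parallel fastpath (the batching counter and its constants are illustrative): the common case increments a per-thread counter with no locking (aggressive data ownership), and only the rare flush of a full batch takes the conservative global lock.

```c
#include <pthread.h>

/* Parallel Fastpath: lock-free owned counters, locked only on flush. */
#define NTHREADS 4
#define BATCH    1024

static pthread_mutex_t slow_lock = PTHREAD_MUTEX_INITIALIZER;
static long global_total;

struct local {
    long pending;
    char pad[64 - sizeof(long)];
};
static struct local local_cnt[NTHREADS];

static void count_event(int me)
{
    if (++local_cnt[me].pending >= BATCH) {  /* fastpath: owned data */
        pthread_mutex_lock(&slow_lock);      /* slowpath: 1 in BATCH calls */
        global_total += local_cnt[me].pending;
        pthread_mutex_unlock(&slow_lock);
        local_cnt[me].pending = 0;
    }
}

static void *worker(void *arg)
{
    int me = *(int *)arg;
    for (int i = 0; i < 100 * BATCH; i++)    /* multiple of BATCH: all flushed */
        count_event(me);
    return NULL;
}

long run_fastpath_demo(void)
{
    pthread_t tid[NTHREADS];
    static int id[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, worker, &id[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return global_total;
}
```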
Reader/Writer Locking
Pages: 521-523
To speed up programs that rarely modify shared data, allow readers to proceed in parallel, but require writers to exclude both readers and other writers.
Hierarchical Locking
Pages: 523-524
To speed up programs whose updates are complex and expensive but whose infrequent structural operations, such as insertion and deletion, require coarse-grained locking, partition data structures into coarse- and fine-grained portions. For example, use a single lock for the internal nodes and links of a search tree, but a separate lock for each leaf.
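A hedged sketch using a linked list in place of a tree (all names are illustrative): one coarse lock guards the links for the rare structural changes, while each node carries its own lock for its payload, so expensive updates to different nodes can overlap.

```c
#include <stdlib.h>
#include <pthread.h>

/* Hierarchical Locking: coarse lock for structure, fine lock per node. */
struct node {
    struct node *next;         /* protected by list_lock */
    pthread_mutex_t lock;      /* protects value only */
    int key;
    long value;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *head;

void insert(int key)
{
    struct node *n = malloc(sizeof(*n));
    pthread_mutex_init(&n->lock, NULL);
    n->key = key;
    n->value = 0;
    pthread_mutex_lock(&list_lock);   /* coarse: structural change */
    n->next = head;
    head = n;
    pthread_mutex_unlock(&list_lock);
}

void update(int key, long delta)
{
    struct node *n;
    pthread_mutex_lock(&list_lock);   /* coarse lock only while traversing */
    for (n = head; n && n->key != key; n = n->next)
        ;
    if (n)
        pthread_mutex_lock(&n->lock); /* hand over to the fine lock */
    pthread_mutex_unlock(&list_lock);
    if (n) {
        n->value += delta;            /* long update holds only leaf lock */
        pthread_mutex_unlock(&n->lock);
    }
}

long get(int key)
{
    struct node *n;
    long v = -1;
    pthread_mutex_lock(&list_lock);
    for (n = head; n && n->key != key; n = n->next)
        ;
    if (n)
        pthread_mutex_lock(&n->lock);
    pthread_mutex_unlock(&list_lock);
    if (n) {
        v = n->value;
        pthread_mutex_unlock(&n->lock);
    }
    return v;
}
```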
Allocator Caches
Pages: 524-527
To get speedups in global memory allocators, create a per-CPU or per-process cache of data structure instances. A CPU owns the instances in its cache, so it can allocate and free them without incurring synchronization overhead or contention.
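A hedged sketch of an allocator cache (struct and constant names are illustrative): each owner keeps a small free list it alone touches, so the common alloc/free path takes no lock; only a cache miss or overflow touches the locked global pool.

```c
#include <stdlib.h>
#include <pthread.h>

/* Allocator Caches: lock-free per-owner free list over a locked pool. */
#define CACHE_MAX 8

struct block { struct block *next; };

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static struct block *global_pool;

struct cache {                 /* one per CPU/thread; owner-only access */
    struct block *head;
    int len;
};

struct block *cache_alloc(struct cache *c)
{
    struct block *b;

    if (c->head) {                     /* fastpath: owned cache, no lock */
        b = c->head;
        c->head = b->next;
        c->len--;
        return b;
    }
    pthread_mutex_lock(&pool_lock);    /* slowpath: refill from global pool */
    b = global_pool;
    if (b)
        global_pool = b->next;
    pthread_mutex_unlock(&pool_lock);
    return b ? b : malloc(sizeof(struct block));
}

void cache_free(struct cache *c, struct block *b)
{
    if (c->len < CACHE_MAX) {          /* fastpath: stash in owned cache */
        b->next = c->head;
        c->head = b;
        c->len++;
        return;
    }
    pthread_mutex_lock(&pool_lock);    /* slowpath: return to global pool */
    b->next = global_pool;
    global_pool = b;
    pthread_mutex_unlock(&pool_lock);
}
```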
Critical-Section Fusing
Pages: 527-529
To get speedups in programs with frequent, small critical sections on machines with high synchronization overheads, combine small critical sections into larger ones.
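A minimal before/after sketch (the counter is illustrative): when acquiring the lock costs more than the work inside it, one acquisition per batch amortizes the synchronization overhead, at the cost of holding the lock longer.

```c
#include <pthread.h>

/* Critical-Section Fusing: amortize lock overhead across a batch. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Before: one lock round-trip per tiny critical section. */
void add_one_at_a_time(int n)
{
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
}

/* After: the n small sections fused into one; a single round-trip. */
void add_batched(int n)
{
    pthread_mutex_lock(&lock);
    for (int i = 0; i < n; i++)
        counter++;
    pthread_mutex_unlock(&lock);
}

long get_counter(void)
{
    long v;
    pthread_mutex_lock(&lock);
    v = counter;
    pthread_mutex_unlock(&lock);
    return v;
}
```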
Critical-Section Partitioning
Pages: 529-530
To get speedups in programs with infrequent, large critical sections on machines with low synchronization overheads, split large critical sections into smaller ones.
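The inverse of fusing, in a minimal before/after sketch (names are illustrative): a long critical section that covers two independent structures is split so each gets its own short section under its own lock, and neither lock is held across unrelated work.

```c
#include <pthread.h>

/* Critical-Section Partitioning: split one long section into two short ones. */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t log_lock   = PTHREAD_MUTEX_INITIALIZER;
static long stats_hits, log_entries;

/* Before: one big section; log_entries is held hostage by stats_lock. */
void record_fused(void)
{
    pthread_mutex_lock(&stats_lock);
    stats_hits++;
    log_entries++;                 /* unrelated data under the wrong lock */
    pthread_mutex_unlock(&stats_lock);
}

/* After: each independent structure gets its own short section. */
void record_partitioned(void)
{
    pthread_mutex_lock(&stats_lock);
    stats_hits++;
    pthread_mutex_unlock(&stats_lock);

    pthread_mutex_lock(&log_lock);
    log_entries++;
    pthread_mutex_unlock(&log_lock);
}

long get_stats(void) { return stats_hits; }
long get_logs(void)  { return log_entries; }
```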