use Parallel Loop pattern when performing same independent operation for each element
of a collection or for a fixed number of iterations
steps of a loop are independent if they don't write to memory locations or files
that are read by other steps
similar to for and foreach loops except order of execution isn't defined
only guarantee is all iterations will be run
degree of parallelism isn't specified in code
run-time environment executes loop using as many cores as possible
Parallel For Loops
use Parallel.For method to iterate range of integer indices
sequential for loop
int n = ...
for (int i = 0; i < n; i++)
{
...
}
parallel for loop
int n = ...
Parallel.For (0, n, i =>
{
// lambda expression
...
});
Parallel.For is static method with overloads
signature used above is
Parallel.For(int fromInclusive, int toExclusive, Action<int> body);
the body param can be a
lambda expression
instance of a delegate type
an anonymous method using delegate keyword
named method
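for example, a minimal runnable sketch (the array contents and bound are illustrative):
// requires System and System.Threading.Tasks
int n = 10;
double[] squares = new double[n];
// each iteration writes only to its own slot, so steps are independent
Parallel.For(0, n, i =>
{
    squares[i] = (double)i * i;
});
Console.WriteLine(string.Join(", ", squares));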
Parallel ForEach Loops
use Parallel.ForEach method to iterate user-provided values
sequential foreach loop
IEnumerable<MyObject> myEnumerable = ...
foreach (var obj in myEnumerable)
{
...
}
parallel foreach loop
IEnumerable<MyObject> myEnumerable = ...
Parallel.ForEach (myEnumerable, obj =>
{
// lambda expression
...
});
Parallel.ForEach is static method with overloads
signature used above is
Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body);
the body param can be a
lambda expression
instance of a delegate type
an anonymous method using delegate keyword
named method
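for example, a minimal runnable sketch (the word list is illustrative):
// requires System, System.Collections.Generic and System.Threading.Tasks
var words = new List<string> { "alpha", "beta", "gamma" };
Parallel.ForEach(words, word =>
{
    // order of output is not defined
    Console.WriteLine("{0}: {1}", word, word.Length);
});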
Parallel LINQ (PLINQ)
almost all LINQ-to-Objects expressions have a parallel counterpart
IEnumerable<MyObject> source = ...
// LINQ
var query1 = from i in source select Normalize(i);
// PLINQ
var query2 = from i in source.AsParallel() select Normalize(i);
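PLINQ queries execute when enumerated; the ForAll operator runs an action in parallel
without merging results back to a single thread (a sketch using the placeholders above):
// requires System.Linq
source.AsParallel()
      .Select(i => Normalize(i))
      .ForAll(item => Console.WriteLine(item));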
Expectations
degree of parallelism depends upon number of available cores
Amdahl's Law predicts a point of diminishing returns
Parallel.For, Parallel.ForEach and PLINQ methods collect exceptions into an
AggregateException object which is rethrown in the context of the calling thread
many variations provide configuration options
Parallel.For has 12 overloads
Parallel.ForEach has 20 overloads
PLINQ has close to 200 extension methods
common examples of dependent loop bodies
- writing to shared variables
any variable declared outside a loop body is a shared variable
int n = ...;
int total = 0;
for (int i = 0; i < n; i++)
{
total += data[i];
}
a shared reference to an object implicitly allows all of its members to be shared
use the Parallel Aggregation pattern instead, as sketched below
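a minimal sketch of that fix, using task-local subtotals combined at the end
(data and n are the placeholders above; Interlocked is in System.Threading):
int total = 0;
Parallel.For(0, n,
    () => 0,                                           // localInit: per-task subtotal
    (i, loopState, subtotal) => subtotal + data[i],    // body accumulates locally
    subtotal => Interlocked.Add(ref total, subtotal)); // localFinally combines safely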
- using properties of an object model; need to know if properties refer to shared
state or state that is local to the object itself
int n = ...;
for(int i = 1; i < n; i++)
{
// for all values of i someObject[i].Parent could refer to a single shared object
someObject[i].Parent.Update();
}
- referencing data types that are not thread safe
need to use task-local state in the loop body
- loop-carried dependence
body performs arithmetic on the loop index
int n = ...;
for(int i = 1; i < n; i++)
{
data[i] = data[i] + data[i - 1];
}
Breaking Out of Loops Early
the need to break a sequential loop is a common occurrence
int n = ...;
for(int i = 1; i < n; i++)
{
bool someCondition = ...;
if (someCondition)
{
break;
}
}
more complicated with parallel loops
multiple steps may be active & steps are not executed in predetermined manner
Parallel break
allows all steps with lower indices to run before breaking the loop
int n = ...;
Parallel.For(0, n, (i, loopState) =>
{
bool someCondition = ...;
if (someCondition)
{
loopState.Break();
return;
}
});
method signature is
Parallel.For(int fromInclusive, int toExclusive, Action<int, ParallelLoopState> body);
calling ParallelLoopState.Break begins an orderly shutdown of the loop processing
long-running loop bodies may want to check for the break condition by evaluating
the loop state using
LowestBreakIteration property - returns a nullable long (long?) whose HasValue
property is true if the loop has been broken
ShouldExitCurrentIteration property - checks for breaks & other stopping
conditions
possible for more than one step to call Break method, lowest index rules
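a sketch of a long-running body polling the loop state (work elided):
Parallel.For(0, n, (i, loopState) =>
{
    if (loopState.ShouldExitCurrentIteration &&
        loopState.LowestBreakIteration < i)
    {
        return; // a lower index already called Break
    }
    ...
});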
ParallelLoopResult
both Parallel.For & Parallel.ForEach return a ParallelLoopResult object
determine if Break was called by examining the values of the IsCompleted &
LowestBreakIteration properties
if IsCompleted is false and the LowestBreakIteration's HasValue property is true,
the loop was broken
the specific index where the loop was terminated can be found in the
LowestBreakIteration's Value property
int n = ...;
ParallelLoopResult loopResult = Parallel.For(0, n, (i, loopState) =>
{
    bool someCondition = ...;
    if (someCondition)
    {
        loopState.Break();
        return;
    }
});
if (!loopResult.IsCompleted && loopResult.LowestBreakIteration.HasValue)
{
    Console.WriteLine("Loop broke at index {0}", loopResult.LowestBreakIteration.Value);
}
Parallel stop
calling ParallelLoopState.Stop() terminates the loop, allowing no new steps to begin
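a sketch; unlike Break, Stop makes no guarantee about lower indices:
Parallel.For(0, n, (i, loopState) =>
{
    bool someCondition = ...;
    if (someCondition)
    {
        loopState.Stop();
        return;
    }
});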
External Loop Cancellation
no Stop method for PLINQ query
overloaded For & ForEach methods
Parallel.For(int fromInclusive, int toExclusive, ParallelOptions parallelOptions, Action<int> body);
use
CancellationTokenSource type to signal cancellation and
CancellationToken
structure to detect & respond to a cancellation request
void DoLoop(CancellationTokenSource cts)
{
    int n = ...;
    CancellationToken token = cts.Token;
    var options = new ParallelOptions { CancellationToken = token };
    try
    {
        Parallel.For(0, n, options, (i) =>
        {
            ...
            // long-running bodies can poll the token and exit early
            if (token.IsCancellationRequested)
            {
                ...
                return;
            }
        });
    }
    catch (OperationCanceledException)
    {
        // ... handle loop cancellation
    }
}
use
WithCancellation extension method to add external cancellation capabilities
to a PLINQ query
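a sketch using the PLINQ placeholders above:
var cts = new CancellationTokenSource();
try
{
    var results = source.AsParallel()
                        .WithCancellation(cts.Token)
                        .Select(i => Normalize(i))
                        .ToList();
}
catch (OperationCanceledException)
{
    // ... handle query cancellation
}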
Exception Handling
if an unhandled exception is thrown no new steps are started
by default steps that are running are allowed to complete
after running steps complete the parallel loop will throw an exception in the context
of calling thread
long-running steps can check the ParallelLoopState's IsExceptional property, which
returns true if an exception is pending
multiple exceptions can be thrown, up to the number of concurrent steps
AggregateException has an InnerExceptions property that contains a collection of
all exceptions thrown in the loop
exceptions take priority over Break and Stop methods of ParallelLoopState
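a sketch of observing the collected exceptions:
try
{
    Parallel.For(0, n, i =>
    {
        ...
    });
}
catch (AggregateException ae)
{
    foreach (var inner in ae.InnerExceptions)
    {
        Console.WriteLine(inner.Message); // log, inspect, or rethrow as appropriate
    }
}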
Special Handling of Small Loop Bodies
if loop body performs only a small amount of work better performance can be achieved
by partitioning iterations into larger units of work
Partitioner object divides indices into non-overlapping intervals called ranges
int n = ...;
double[] result = new double[n];
// Partitioner lives in System.Collections.Concurrent
Parallel.ForEach(Partitioner.Create(0, n), (range) =>
{
for(int i = range.Item1; i < range.Item2; i++)
{
result[i] = (double)(i * i);
}
});
Partitioner.Create method returns the type OrderablePartitioner<Tuple<int, int>>
which Parallel.ForEach accepts through a dedicated overload
each tuple represents index values to be used in loop iterations
number of ranges depends upon number of available cores
default number of ranges is ~3 times number of cores
overload of Create method allows size of ranges to be specified
int n = 1000000;
double[] result = new double[n];
Parallel.ForEach(Partitioner.Create(0, n, 50000), (range) =>
{
for(int i = range.Item1; i < range.Item2; i++)
{
result[i] = (double)(i * i);
}
});
in the example each range will span 50000 index values
Controlling the Degree of Parallelism
generally let system manage cores but may want additional control
reduce degree of parallelism to simulate less capable hardware while performance
testing
increase degree of parallelism to a number larger than the number of cores when
iterations wait a lot of time on I/O operations
degree of parallelism can be considered two ways
1 - the number of cores used for simultaneous iterations
2 - the number of tasks that can be used simultaneously by the parallel loop; the
ParallelOptions.MaxDegreeOfParallelism property is the maximum number of worker
tasks that will be scheduled at any one time
int n = ...;
var options = new ParallelOptions(){ MaxDegreeOfParallelism = 2 };
Parallel.For(0, n, options, i =>
{
...
});
maximum number of worker threads for PLINQ queries can be set with the
WithDegreeOfParallelism extension method
IEnumerable<T> myCollection = ...
myCollection.AsParallel().WithDegreeOfParallelism(8).ForAll( obj => ... );
when a large degree of parallelism is specified, use the ThreadPool type's
SetMinThreads method so threads are created without delay, as sketched below
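a sketch (the thread counts are illustrative):
// returns false if the values could not be set
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);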
Using Task-Level State in a Loop Body
use overloads of Parallel.For that accept localInit, body & localFinally delegates
in the example each worker task has its own Random object (Random is not thread safe)
double[] result = new double[NumberOfSteps];
// int fromInclusive, int toExclusive
Parallel.For(0, NumberOfSteps,
// parallel options
new ParallelOptions(),
// Func<TLocal> localInit
() => { return new Random(MakeRandomSeed()); },
// Func<int, ParallelLoopState, TLocal, TLocal> body
(i, loopState, random) =>
{
result[i] = random.NextDouble();
return random;
},
// Action<TLocal> localFinally
_ => { });
Using a Custom Task Scheduler for a Parallel Loop
substitute custom task scheduling logic for default
int n = ...
TaskScheduler myTaskScheduler = ...
var options = new ParallelOptions(){ TaskScheduler = myTaskScheduler };
Parallel.For(0, n, options, i =>
{
...
});
signature of method used
Parallel.For(int fromInclusive,
int toExclusive,
ParallelOptions options,
Action<int> body);
cannot specify custom task scheduler for PLINQ queries
Anti-Patterns
Step Size Other Than One
often indicates a data dependency
negative step size indicates ordering is significant
before converting a sequential for loop, reverse the order of the indices; code
running correctly indicates the steps are likely independent, as sketched below
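a quick sketch of the reversal test (DoWork is a placeholder):
// original order
for (int i = 0; i < n; i++) { DoWork(i); }
// reversed order - if results are unchanged, steps are likely independent
for (int i = n - 1; i >= 0; i--) { DoWork(i); }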
Hidden Loop Body Dependencies
sharing instance of type that is not thread-safe
shared variables must be protected & synchronized
synchronization can reduce performance
Small Loop Bodies With Few Iterations
overhead dominates benefits
Processor Oversubscription & Undersubscription
oversubscription
more worker threads than cores
generally optimum number of worker threads is number of available cores divided
by average fraction of core utilization per task
higher number incurs overhead of context switching
undersubscription
having too few tasks misses opportunity to effectively use available processor cores
generally most efficient to use built-in load balancing algorithms in .NET framework
Mixing the Parallel Class & PLINQ
can use PLINQ query as source collection for a Parallel.ForEach loop
introduces unnecessary merging & repartitioning; prefer PLINQ's ForAll operator
Duplicates in the Input Enumeration
duplicate objects in an enumeration can lead to race condition
Design Notes
Adaptive Partitioning
useful when size of units of work increases over time
able to meet needs of both small & large input ranges
Adaptive Concurrency
PLINQ uses a fixed number of tasks to execute a query; by default it creates the
same number of tasks as there are logical cores, or uses the value passed to the
WithDegreeOfParallelism method if specified
Parallel.For & Parallel.ForEach can use variable number of tasks, thread pool
adapts dynamically to changing workloads
Support For Nested Loops & Server Applications
with nested parallel loops each loop uses fewer threads
Parallel class attempts to deal with multiple AppDomains in server applications
in the same way