Tracking Parallel Simulation Progress in MATLAB
Date: 4-29 2017
Tags: matlab, programming
Running numerical simulations with various parameter settings in MATLAB can take a very long time. Printing the current progress to the command window can help ease our anxious hearts and estimate the remaining time required to finish all simulations (so we can take a break and have a cup of tea). Usually our script will look like this:
for ii = 1:n_param
    run_simulation(param(ii));
    % Print the percentage of simulations completed so far
    fprintf('Progress: %.1f%%\n', 100 * ii / n_param);
end
However, the above code does not work with parfor, the parallel for loop, because parfor does not execute the iterations in order. We will get seemingly randomly ordered output (just like in multithreaded programming):
Progress: 60.0%
Progress: 20.0%
Progress: 70.0%
Progress: 10.0%
We may change our code into the following:
n_completed = 0;
parfor ii = 1:n_param
    run_simulation(param(ii));
    % Every worker tries to update and read the same counter here
    n_completed = n_completed + 1;
    fprintf('Progress: %.1f%%\n', 100 * n_completed / n_param);
end
However, this will lead to race conditions, as each MATLAB worker will try to update n_completed. Actually, the MATLAB editor is smart enough to report an error if we write something like this. We can hack around this error by storing n_completed in an external file using file I/O functions. However, race conditions still exist, as two MATLAB workers may try to access the file at the same time.
Instead of using parfor, we can manually schedule the evaluation of our functions on MATLAB workers with parfeval. The first three arguments of parfeval specify the parallel pool we will use, the handle of our function, and the number of outputs. The remaining arguments are passed as the input arguments of our function. Detailed documentation can be accessed via doc parfeval. The function parfeval immediately returns a FevalFuture object, which can be used to fetch the results via fetchOutputs or fetchNext. The function fetchNext blocks execution until new results are available. Therefore we can report the progress upon each return of fetchNext. With these in mind, we can rewrite our code as:
p = gcp(); % Get the current parallel pool
for ii = 1:n_param
    % Assuming our function has only one output.
    % This line of code will not block the execution.
    f(ii) = parfeval(p, @run_simulation, 1, param(ii));
end
% One cell per simulation
results = cell(n_param, 1);
for ii = 1:n_param
    % Fetch the next available result.
    % This will block the execution until a new result is available.
    [idx, value] = fetchNext(f);
    % Store the new result. Note that fetchNext also returns the index of the
    % newly fetched result. We can utilize this to keep our results in the
    % original iteration order.
    results{idx} = value;
    % Finally, we can report the progress here.
    fprintf('Progress: %.1f%%\n', 100 * ii / n_param);
end
In this case, as soon as a worker finishes its job, fetchNext will return and the second loop will advance and report the current progress.
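Since the fetch loop runs on the client, it is also a natural place to estimate the remaining time. The sketch below is only an illustration: sim_fun is a hypothetical stand-in for run_simulation, param is made-up data, and the estimate naively assumes that the remaining jobs take as long, on average, as the ones already finished.

```matlab
% Hypothetical stand-in for run_simulation; any function handle works here.
sim_fun = @(x) x.^2;
param = 1:8;                      % made-up parameter values
n_param = numel(param);

p = gcp();                        % get the current parallel pool
t_start = tic;                    % start timing before submitting the jobs
f(1:n_param) = parallel.FevalFuture;   % preallocate the array of futures
for ii = 1:n_param
    f(ii) = parfeval(p, sim_fun, 1, param(ii));
end

results = cell(n_param, 1);
for ii = 1:n_param
    [idx, value] = fetchNext(f);  % blocks until the next job finishes
    results{idx} = value;
    elapsed = toc(t_start);
    % Naive estimate: average time per finished job times jobs left
    remaining = elapsed / ii * (n_param - ii);
    fprintf('Progress: %.1f%% (about %.0f s remaining)\n', ...
        100 * ii / n_param, remaining);
end
```

The estimate is rough when job durations vary a lot, but it is usually good enough to decide whether there is time for that cup of tea.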
Note: unlike parfor, Ctrl + C will not cancel all pending jobs immediately. You will need to run cancel(f) to cancel all unfinished jobs scheduled by parfeval.
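One way to avoid typing cancel(f) by hand is to let an onCleanup object do it. The following is a minimal sketch, not a definitive recipe: run_with_cleanup and sim_fun are hypothetical names, and the code has to live in a function so that the cleanup object is destroyed when the function exits, including when it is interrupted with Ctrl + C.

```matlab
function results = run_with_cleanup(sim_fun, param)
% Hypothetical wrapper: sim_fun is a handle to the simulation function.
n_param = numel(param);
p = gcp();
f(1:n_param) = parallel.FevalFuture;   % preallocate the array of futures
for ii = 1:n_param
    f(ii) = parfeval(p, sim_fun, 1, param(ii));
end
% When this function exits (normal return, error, or Ctrl + C), the
% onCleanup object is destroyed and cancel(f) is called.
cleaner = onCleanup(@() cancel(f));

results = cell(n_param, 1);
for ii = 1:n_param
    [idx, value] = fetchNext(f);
    results{idx} = value;
    fprintf('Progress: %.1f%%\n', 100 * ii / n_param);
end
end
```

It could be called as results = run_with_cleanup(@run_simulation, param). On a normal return every future has already finished, so the cleanup call should have nothing left to cancel.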
The function parfeval was introduced in MATLAB R2013b. Therefore the above method should work on all recent MATLAB versions. Interestingly, MATLAB R2017a recently introduced a new class named parallel.pool.DataQueue, making it possible to send data from workers to the client using a data queue. This opens up lots of new possibilities. In fact, the official documentation gives a nice example of how to use this class to create a progress bar. Therefore, with MATLAB R2017a or later, the solution becomes:
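% The wrapper name run_all_simulations is only illustrative. Wrapping the code
% in a function lets the nested callback share n_completed and n_param.
function run_all_simulations(param)
n_param = numel(param);
n_completed = 0;
% Initialize the queue
q = parallel.pool.DataQueue;
% After receiving new data, update_progress() will be called on the client
afterEach(q, @update_progress);
parfor ii = 1:n_param
    run_simulation(param(ii));
    % Send data to the queue
    send(q, ii);
end
    % Nested function: afterEach passes the received data as an argument,
    % which we ignore here because we only count completed iterations.
    function update_progress(~)
        n_completed = n_completed + 1;
        fprintf('Progress: %.1f%%\n', 100 * n_completed / n_param);
    end
end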
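If a running counter is not needed, the callback can simply print whatever the worker sent. The sketch below assumes a made-up iteration count and uses pause as a stand-in for run_simulation. Because the callback is an anonymous function, it also works in a plain script.

```matlab
n_param = 8;                      % made-up number of iterations
q = parallel.pool.DataQueue;
% The value passed to send() arrives as the argument of the callback,
% which runs on the client.
afterEach(q, @(ii) fprintf('Finished iteration %d of %d\n', ii, n_param));
parfor ii = 1:n_param
    pause(0.1);                   % stand-in for run_simulation(param(ii))
    send(q, ii);                  % report which iteration just finished
end
```

The messages appear in completion order, not iteration order, so this variant shows which iterations are done instead of an overall percentage.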