Mianzhi Wang

Ph.D. in Electrical Engineering

Using OpenMP in MATLAB

With properly vectorized code, MATLAB can be blazingly fast. However, in some cases, your algorithm may not be easily vectorized and the MATLAB implementation may perform poorly. Fortunately, you can write a faster implementation in C/C++ and compile it into a MEX function that can be called from MATLAB just like a normal MATLAB function.

Tip: Writing MEX functions is more difficult than writing MATLAB functions, and MATLAB's built-in functions are generally well optimized. Before you start writing a MEX function, profile your MATLAB implementation and find out where the performance bottleneck actually is.

In MATLAB, with the help of the Parallel Computing Toolbox, you can easily write programs that fully utilize multi-core CPUs. It turns out that, with the help of OpenMP[1], you can do the same in MEX functions. Below is a toy MEX function, saved as mexAdd.cpp, that uses OpenMP to add two vectors:

#include "mex.h"
#include "matrix.h"
#include <omp.h>

// Call signature: C = mexAdd(A, B)
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Input validation (omitted)
    // ...
    // ...

    mwSize n1 = mxGetNumberOfElements(prhs[0]);
    mwSize n2 = mxGetNumberOfElements(prhs[1]);
    if (n1 != n2)
        mexErrMsgIdAndTxt("example:add:prhs", "A and B must have the same number of elements.");
    double *A = mxGetPr(prhs[0]);
    double *B = mxGetPr(prhs[1]);
    // Allocate output matrix.
    mxArray *mC = mxCreateDoubleMatrix(n1, 1, mxREAL);
    double *C = mxGetPr(mC);
    // Compute the sum in parallel. The loop index is a signed int because
    // OpenMP 2.0 (MSVC) requires one; this assumes n1 fits in an int.
    int n = static_cast<int>(n1);
    // default(none) requires every variable used in the loop to be listed.
    #pragma omp parallel for default(none) shared(A, B, C, n)
    for (int i = 0; i < n; i++)
        C[i] = A[i] + B[i];
    // Return the sum.
    plhs[0] = mC;
}
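
One way to keep the parallel loop testable outside MATLAB is to move it into a plain C++ function that knows nothing about the MEX API. The sketch below is only an illustration, not part of the original example: addVectors is a hypothetical helper name, and the element count is passed as an int on the assumption that it fits. With this split, mexFunction would only validate the inputs, allocate the output, and pass the raw pointers to the helper.

```cpp
#include <vector>

// Hypothetical helper (not part of the MEX API): element-wise sum c = a + b.
// If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
void addVectors(const double *a, const double *b, double *c, int n)
{
    #pragma omp parallel for default(none) shared(a, b, c, n)
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

// Convenience wrapper for trying the kernel outside MATLAB.
// Assumes a and b have the same length.
std::vector<double> addVectors(const std::vector<double> &a, const std::vector<double> &b)
{
    std::vector<double> c(a.size());
    addVectors(a.data(), b.data(), c.data(), static_cast<int>(a.size()));
    return c;
}
```

Because the helper never touches the MEX API, it can be compiled and unit-tested with an ordinary C++ toolchain before being wrapped in mexFunction.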

When compiling the above source code into a MEX binary, you need to modify the compilation flags to enable OpenMP.

For MSVC on Windows, add /openmp to the compilation flags:

mex -v COMPFLAGS="$COMPFLAGS /openmp" mexAdd.cpp

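For GCC on Linux systems, add -fopenmp to both the compiler flags and LDFLAGS. Because mexAdd.cpp is a C++ source file, the compiler flags go in CXXFLAGS (CFLAGS applies to C sources only):

mex -v CXXFLAGS='$CXXFLAGS -fopenmp' LDFLAGS='$LDFLAGS -fopenmp' mexAdd.cpp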

Now you can use mexAdd(a, b) in MATLAB to invoke the compiled MEX function.
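
If the OpenMP flag does not reach the compiler, the pragma is ignored and a loop like the one in mexAdd.cpp simply runs serially, with no error. A quick way to check is the standard _OPENMP macro, which is defined only when OpenMP is enabled. The sketch below uses a hypothetical helper name, openmpVersionOrZero; inside a MEX function its result could be printed with mexPrintf.

```cpp
// Returns the value of the _OPENMP macro (a yyyymm date, e.g. 200203 for
// OpenMP 2.0), or 0 if the file was compiled without OpenMP enabled.
long openmpVersionOrZero()
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

A result of 0 suggests that the compilation flags were not applied as intended.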

Note: According to MSDN, MSVC currently supports only OpenMP 2.0. Therefore, your MEX function will not compile under MSVC if it uses features introduced in later OpenMP versions.
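
One restriction that tends to show up in practice: to my knowledge, OpenMP 2.0 requires the index of a parallel for loop to be a signed integer, so an unsigned size type such as mwSize or std::size_t cannot be used directly as the loop variable under MSVC. A minimal sketch of a pattern that should work with both compilers follows. scaleInPlace is a hypothetical helper, and the exception stands in for whatever error reporting you prefer (in a MEX function that would typically be mexErrMsgIdAndTxt, called outside the parallel region).

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: multiply every element of x by s.
void scaleInPlace(double *x, std::size_t count, double s)
{
    // Convert the unsigned count to a signed index type up front,
    // refusing sizes that do not fit.
    if (count > static_cast<std::size_t>(INT_MAX))
        throw std::length_error("scaleInPlace: too many elements for an int index");
    int n = static_cast<int>(count);

    #pragma omp parallel for default(none) shared(x, n, s)
    for (int i = 0; i < n; i++)
        x[i] *= s;
}
```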

Tip: The -v option enables detailed output. If you want to debug your MEX function, add the -g option.

While OpenMP support in MEX functions works out of the box, you have to keep the following rule in mind: do not call the MEX API inside parallel regions. The MEX API is not thread-safe. Therefore, if you need to dynamically allocate memory inside a parallel region, do not use mxMalloc or mxCalloc. Instead, use malloc/new and remember to free/delete the memory afterwards[2].
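
As an illustration of this rule, the sketch below gives each thread its own scratch buffer by constructing a std::vector inside the parallel region, so the memory comes from the C++ allocator rather than from mxMalloc and is released automatically. The task (the median of each column of a column-major matrix) and the name columnMedians are assumptions made for the example. It assumes rows >= 1, returns the upper median when rows is even, and omits handling of allocation failures.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of each column of a column-major
// rows x cols matrix. The result for column j is written to out[j].
void columnMedians(const double *data, int rows, int cols, double *out)
{
    #pragma omp parallel default(none) shared(data, rows, cols, out)
    {
        // Per-thread scratch buffer: declared inside the parallel region,
        // so every thread gets its own copy. No MEX API calls here.
        std::vector<double> scratch(rows);

        #pragma omp for
        for (int j = 0; j < cols; j++)
        {
            const double *col = data + static_cast<std::size_t>(j) * rows;
            std::copy(col, col + rows, scratch.begin());
            // Partially sort so that the middle element is in its final place.
            std::nth_element(scratch.begin(), scratch.begin() + rows / 2, scratch.end());
            out[j] = scratch[rows / 2];
        }
    } // scratch is destroyed here, once per thread
}
```

Allocating the buffer once per thread, rather than once per loop iteration, also keeps the allocator overhead small.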

Happy parallelizing your MEX functions.

  1. If you are not familiar with OpenMP, you can check the quick introduction by Joel Yliluoma here, or the official hands-on tutorial here.

  2. malloc (or new) is generally thread-safe. However, it does come with some overhead in a multi-threading environment.