Mianzhi Wang

Ph.D. in Electrical Engineering

Vectorization, Kronecker Product, and Khatri-Rao Product

In array and radar signal processing, especially when co-array models are concerned, one may frequently encounter the vectorization operation, the Kronecker product, and the Khatri-Rao product. This article will give a brief review of these three operations and their commonly used properties.


Vectorization

Given an $M \times N$ matrix $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N]$, where $\mathbf{a}_i$ is the $i$-th column of $\mathbf{A}$, the vectorization of $\mathbf{A}$ is defined as

$$\mathrm{vec}(\mathbf{A}) = [\mathbf{a}_1^T, \mathbf{a}_2^T, \ldots, \mathbf{a}_N^T]^T.$$

Basically, the vectorization operation rearranges the elements of $\mathbf{A}$ into a long vector by stacking all the columns together. In a general signal processing scenario, we may have $N$ observations, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$. We can either arrange them into a matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$, or into a long vector $\mathrm{vec}(\mathbf{X})$, depending on the problem structure. In some cases, vectorizing a matrix may be helpful. However, the additional information given by the matrix structure (e.g., low-rankness) will be lost.
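The column-stacking convention above can be checked with a one-line NumPy sketch (note that NumPy flattens row-by-row by default, so Fortran order is needed to match $\mathrm{vec}$):

```python
import numpy as np

# vec(A): stack the columns of A into one long vector.
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])            # 3 x 2 example matrix
vec_A = A.flatten(order='F')     # column-major order matches vec(.)
print(vec_A)                     # [1 3 5 2 4 6]
```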

The vectorization operation itself does not have many interesting properties. One useful identity relates it to the trace:

$$\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{vec}(\mathbf{A}^T)^T \mathrm{vec}(\mathbf{B}) = \mathrm{vec}(\mathbf{A}^H)^H \mathrm{vec}(\mathbf{B}).$$
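A quick numerical check of this trace identity, with illustrative random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))

def vec(M):
    # column-stacking vectorization
    return M.flatten(order='F')

lhs = np.trace(A @ B)
rhs = vec(A.T) @ vec(B)   # tr(AB) = vec(A^T)^T vec(B)
assert np.isclose(lhs, rhs)
```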

As we will show later, the vectorization operation exhibits more interesting properties when combined with the Kronecker product and the Khatri-Rao product.

Kronecker Product

Given an $M \times N$ matrix $\mathbf{A}$ and a $P \times Q$ matrix $\mathbf{B}$, the Kronecker product $\mathbf{A} \otimes \mathbf{B}$ is defined as

$$\mathbf{A}\otimes\mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1N}\mathbf{B}\\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2N}\mathbf{B}\\ \vdots & \vdots & \ddots & \vdots \\ a_{M1}\mathbf{B} & a_{M2}\mathbf{B} & \cdots & a_{MN}\mathbf{B} \end{bmatrix},$$

which is an $MP \times NQ$ matrix.
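NumPy provides the Kronecker product directly as `np.kron`; a small sketch confirms the block structure and the $MP \times NQ$ shape:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # M x N = 2 x 3
B = np.ones((4, 5))              # P x Q = 4 x 5
K = np.kron(A, B)

print(K.shape)                   # (8, 15) = (MP, NQ)
# the (i, j)-th P x Q block of K equals a_ij * B
assert np.allclose(K[:4, 5:10], A[0, 1] * B)
```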

The Kronecker product has many interesting properties. First, it is distributive and associative:

  • Distributivity: (a) $(\mathbf{A} + \mathbf{B}) \otimes \mathbf{C} = \mathbf{A} \otimes \mathbf{C} + \mathbf{B} \otimes \mathbf{C}$; (b) $\mathbf{A} \otimes (\mathbf{B} + \mathbf{C}) = \mathbf{A} \otimes \mathbf{B} + \mathbf{A} \otimes \mathbf{C}$.
  • Associativity: $(\mathbf{A} \otimes \mathbf{B}) \otimes \mathbf{C} = \mathbf{A} \otimes (\mathbf{B} \otimes \mathbf{C})$.

The Kronecker product is also "distributive" with respect to the (Hermitian) transpose, trace, and Frobenius norm:

  • Transpose: $(\mathbf{A} \otimes \mathbf{B})^T = \mathbf{A}^T \otimes \mathbf{B}^T$.
  • Hermitian transpose: $(\mathbf{A} \otimes \mathbf{B})^H = \mathbf{A}^H \otimes \mathbf{B}^H$.
  • Trace: $\mathrm{tr}(\mathbf{A} \otimes \mathbf{B}) = \mathrm{tr}(\mathbf{A})\,\mathrm{tr}(\mathbf{B})$.
  • Frobenius norm: $\|\mathbf{A} \otimes \mathbf{B}\|_F = \|\mathbf{A}\|_F \|\mathbf{B}\|_F$.

One of the most important and useful properties of the Kronecker product is the product rule:

Proposition 1. Let $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ be $M \times N$, $P \times Q$, $N \times K$, and $Q \times L$, respectively. Then

$$(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D}).$$

Proof. By the definition of the Kronecker product,

$$\begin{aligned} &(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D})\\ ={}&\begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1N}\mathbf{B}\\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2N}\mathbf{B}\\ \vdots & \vdots & \ddots & \vdots \\ a_{M1}\mathbf{B} & a_{M2}\mathbf{B} & \cdots & a_{MN}\mathbf{B} \end{bmatrix} \begin{bmatrix} c_{11}\mathbf{D} & c_{12}\mathbf{D} & \cdots & c_{1K}\mathbf{D}\\ c_{21}\mathbf{D} & c_{22}\mathbf{D} & \cdots & c_{2K}\mathbf{D}\\ \vdots & \vdots & \ddots & \vdots \\ c_{N1}\mathbf{D} & c_{N2}\mathbf{D} & \cdots & c_{NK}\mathbf{D} \end{bmatrix}\\ ={}& \begin{bmatrix} \left(\sum_{i=1}^N a_{1i}c_{i1}\right)\mathbf{B}\mathbf{D} & \left(\sum_{i=1}^N a_{1i}c_{i2}\right)\mathbf{B}\mathbf{D} & \cdots & \left(\sum_{i=1}^N a_{1i}c_{iK}\right)\mathbf{B}\mathbf{D} \\ \left(\sum_{i=1}^N a_{2i}c_{i1}\right)\mathbf{B}\mathbf{D} & \left(\sum_{i=1}^N a_{2i}c_{i2}\right)\mathbf{B}\mathbf{D} & \cdots & \left(\sum_{i=1}^N a_{2i}c_{iK}\right)\mathbf{B}\mathbf{D} \\ \vdots & \vdots & \ddots & \vdots \\ \left(\sum_{i=1}^N a_{Mi}c_{i1}\right)\mathbf{B}\mathbf{D} & \left(\sum_{i=1}^N a_{Mi}c_{i2}\right)\mathbf{B}\mathbf{D} & \cdots & \left(\sum_{i=1}^N a_{Mi}c_{iK}\right)\mathbf{B}\mathbf{D} \end{bmatrix}\\ ={}& (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D}), \end{aligned}$$

which completes the proof. ∎
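The product rule is easy to sanity-check numerically with conformable random matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # M x N
B = rng.standard_normal((2, 5))   # P x Q
C = rng.standard_normal((4, 6))   # N x K
D = rng.standard_normal((5, 2))   # Q x L

# (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)
```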

With the product rule, one can show that the following properties also hold:

  • Inverse: $(\mathbf{A} \otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}$, provided that both $\mathbf{A}$ and $\mathbf{B}$ are invertible.
  • Rank: $\mathrm{rank}(\mathbf{A} \otimes \mathbf{B}) = \mathrm{rank}(\mathbf{A})\,\mathrm{rank}(\mathbf{B})$.
  • Determinant: Let $\mathbf{A}$ and $\mathbf{B}$ be $M \times M$ and $N \times N$, respectively. Then $\det(\mathbf{A} \otimes \mathbf{B}) = \det(\mathbf{A})^N \det(\mathbf{B})^M$.
  • Positive-definiteness: If both $\mathbf{A}$ and $\mathbf{B}$ are positive-definite, then $\mathbf{A} \otimes \mathbf{B}$ is also positive-definite.

Remark 1. The proof of the inverse property is straightforward because

$$(\mathbf{A}^{-1} \otimes \mathbf{B}^{-1})(\mathbf{A} \otimes \mathbf{B}) = (\mathbf{A}^{-1}\mathbf{A}) \otimes (\mathbf{B}^{-1}\mathbf{B}) = \mathbf{I},$$

and the other direction also holds. The rank property can be shown using the singular value decomposition, and the positive-definiteness property can be shown with the eigendecomposition. The determinant property is trickier. The key idea is to use the following equality: $\mathbf{A} \otimes \mathbf{B} = (\mathbf{A} \otimes \mathbf{I})(\mathbf{I} \otimes \mathbf{B})$.
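The inverse and determinant rules can also be verified numerically; the sketch below uses diagonally shifted random matrices purely to guarantee invertibility:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 4
# shift by a multiple of I so A and B are safely invertible
A = rng.standard_normal((M, M)) + M * np.eye(M)
B = rng.standard_normal((N, N)) + N * np.eye(N)

# Inverse rule: (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# Determinant rule: det(A ⊗ B) = det(A)^N det(B)^M
assert np.isclose(np.linalg.det(np.kron(A, B)),
                  np.linalg.det(A)**N * np.linalg.det(B)**M)
```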

For a complete review of the properties of the Kronecker product, readers are directed to the Wikipedia page, Kathrin Schäcke's On the Kronecker Product, or Chapter 11 of A Matrix Handbook for Statisticians[1]. Readers pursuing a more abstract understanding may also check out the tensor product.

Khatri-Rao Product

The Khatri-Rao product is usually defined as the column-wise Kronecker product. In other words, given an $M \times N$ matrix $\mathbf{A}$ and a $P \times N$ matrix $\mathbf{B}$, the Khatri-Rao product $\mathbf{A} \odot \mathbf{B}$ is defined as

$$\mathbf{A}\odot\mathbf{B} = \begin{bmatrix} \mathbf{a}_1 \otimes \mathbf{b}_1 & \mathbf{a}_2 \otimes \mathbf{b}_2 & \cdots & \mathbf{a}_N \otimes \mathbf{b}_N \end{bmatrix},$$

which is an $MP \times N$ matrix. The Khatri-Rao product appears frequently in the difference co-array model (e.g., for co-prime and nested arrays[2]) and the sum co-array model (e.g., in MIMO radar[3][4]). Although the definition of the Khatri-Rao product is based on the Kronecker product, the Khatri-Rao product does not inherit many of its nice properties.
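A minimal implementation of the column-wise Kronecker product (SciPy also ships one as `scipy.linalg.khatri_rao`; the `einsum`-based version below avoids the extra dependency):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (M x N) and B (P x N)."""
    M, N = A.shape
    P, N2 = B.shape
    assert N == N2, "A and B must have the same number of columns"
    # column n of the result is a_n ⊗ b_n
    return np.einsum('mn,pn->mpn', A, B).reshape(M * P, N)

A = np.arange(6).reshape(2, 3)
B = np.arange(9).reshape(3, 3)
K = khatri_rao(A, B)
print(K.shape)   # (6, 3) = (MP, N)
assert np.allclose(K[:, 0], np.kron(A[:, 0], B[:, 0]))
```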

A Property That Connects the Three

A handy property connecting the vectorization, the Kronecker product, and the Khatri-Rao product is given below.

Proposition 2. Let $\mathbf{A}$, $\mathbf{X}$, and $\mathbf{B}$ be $M \times N$, $N \times P$, and $P \times Q$, respectively. Then

$$\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X}).$$

Moreover, if $\mathbf{X}$ is a diagonal matrix (which requires $N = P$), then

$$\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \odot \mathbf{A})\,\mathrm{diag}(\mathbf{X}),$$

where $\mathrm{diag}(\mathbf{X})$ denotes the vector formed by the main diagonal of $\mathbf{X}$.

Proof. Let $\mathbf{B}_i$ denote the $i$-th row of $\mathbf{B}$, and let $\mathbf{x}_j$ denote the $j$-th column of $\mathbf{X}$. Then

$$\begin{aligned} \mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) &= \sum_{i=1}^N \sum_{j=1}^P x_{ij}\,\mathrm{vec}(\mathbf{a}_i \mathbf{B}_j)\\ &= \sum_{i=1}^N \sum_{j=1}^P x_{ij} (\mathbf{B}_j^T \otimes \mathbf{a}_i)\\ &= \sum_{j=1}^P (\mathbf{B}_j^T \otimes \mathbf{A})\mathbf{x}_j\\ &= (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X}). \end{aligned}$$

The proof for the second equality follows the same idea. ∎
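Both identities of Proposition 2 are easy to verify numerically (illustrative random matrices; the Khatri-Rao product is built inline with `einsum`):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))        # M x N
X1 = rng.standard_normal((4, 5))       # N x P
B1 = rng.standard_normal((5, 2))       # P x Q

def vec(M_):
    return M_.flatten(order='F')       # column-stacking vectorization

# vec(A X B) = (B^T ⊗ A) vec(X)
assert np.allclose(vec(A @ X1 @ B1), np.kron(B1.T, A) @ vec(X1))

# Diagonal X (N = P): vec(A X B) = (B^T ⊙ A) diag(X)
Xd = np.diag(rng.standard_normal(4))   # diagonal 4 x 4
B2 = rng.standard_normal((4, 2))
# Khatri-Rao B^T ⊙ A: column n is (n-th column of B^T) ⊗ (n-th column of A)
KR = np.einsum('qn,mn->qmn', B2.T, A).reshape(-1, 4)
assert np.allclose(vec(A @ Xd @ B2), KR @ np.diag(Xd))
```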

Here are some examples where Proposition 2 is used.

Example 1. Consider the following optimization problem with a matrix variable:

$$\min_{\mathbf{X}} \|\mathbf{A}\mathbf{X} - \mathbf{B}\|_F^2.$$

Applying Proposition 2, we can rewrite the above optimization problem as

$$\min_{\mathrm{vec}(\mathbf{X})} \|(\mathbf{I} \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X}) - \mathrm{vec}(\mathbf{B})\|_2^2,$$

which is indeed a least squares problem. Note, however, that the original formulation is more compact.
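A sketch with small random matrices, confirming that the matrix form and the vectorized least squares form give the same minimizer:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
B = rng.standard_normal((6, 2))

# Matrix form: min_X ||A X - B||_F^2 (solved column by column)
X_mat, *_ = np.linalg.lstsq(A, B, rcond=None)

# Vectorized form: min ||(I ⊗ A) vec(X) - vec(B)||_2^2
G = np.kron(np.eye(2), A)
x_vec, *_ = np.linalg.lstsq(G, B.flatten(order='F'), rcond=None)

assert np.allclose(X_mat.flatten(order='F'), x_vec)
```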

Example 2. Consider the DOA estimation problem using a linear array. Adopting the unconditional model[5] with uncorrelated sources, the covariance matrix of the received snapshots is given by

$$\mathbf{R} = \mathbf{A}\mathbf{P}\mathbf{A}^H + \sigma^2\mathbf{I},$$

where $\mathbf{A}$ is the array steering matrix, $\mathbf{P}$ is a diagonal matrix whose main diagonal consists of the source powers, and $\sigma^2$ is the power of the additive noise. Vectorizing $\mathbf{R}$ leads to

$$\mathrm{vec}(\mathbf{R}) = (\mathbf{A}^* \odot \mathbf{A})\mathbf{p} + \sigma^2\mathrm{vec}(\mathbf{I}),$$

where $\mathbf{p} = \mathrm{diag}(\mathbf{P})$. We can observe that the above expression resembles a snapshot from a virtual array whose steering matrix is given by $\mathbf{A}^* \odot \mathbf{A}$. This idea is exploited to obtain enhanced degrees of freedom[2][6].
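The virtual-array identity can be checked on a toy uniform linear array (the array geometry, DOAs, and powers below are illustrative values, not taken from the article):

```python
import numpy as np

M, K = 4, 2                                   # sensors, sources
pos = np.arange(M)                            # half-wavelength ULA positions
doas = np.deg2rad([-20.0, 35.0])
A = np.exp(1j * np.pi * np.outer(pos, np.sin(doas)))  # M x K steering matrix
p = np.array([1.0, 2.0])                      # source powers
sigma2 = 0.5                                  # noise power

R = A @ np.diag(p) @ A.conj().T + sigma2 * np.eye(M)

vec = lambda M_: M_.flatten(order='F')
# Khatri-Rao A* ⊙ A: column k is conj(a_k) ⊗ a_k
KR = np.einsum('mk,nk->mnk', A.conj(), A).reshape(M * M, K)

# vec(R) = (A* ⊙ A) p + sigma^2 vec(I)
assert np.allclose(vec(R), KR @ p + sigma2 * vec(np.eye(M)))
```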

Proposition 2 also leads to the following interesting corollary:

Corollary 1. Let $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ be $M \times N$, $N \times P$, $P \times Q$, and $Q \times R$, respectively, with $R = M$ so that $\mathbf{A}\mathbf{B}\mathbf{C}\mathbf{D}$ is square. Then

$$\mathrm{tr}(\mathbf{A}\mathbf{B}\mathbf{C}\mathbf{D}) = \mathrm{vec}(\mathbf{A}^T)^T (\mathbf{D}^T \otimes \mathbf{B})\,\mathrm{vec}(\mathbf{C}).$$

Proof. Immediately obtained by combining Proposition 2 with the fact that $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{vec}(\mathbf{A}^T)^T \mathrm{vec}(\mathbf{B})$. ∎

Corollary 1 can be quite useful in the derivation of the Cramér-Rao bound (CRB) for Gaussian models, as shown in the following example.

Example 3. Let $\mathbf{x}$ follow the complex circularly-symmetric Gaussian distribution $\mathcal{CN}(\mathbf{0}, \mathbf{R}(\boldsymbol{\theta}))$, where $\boldsymbol{\theta} \in \mathbb{R}^K$ denotes the parameters to be estimated. The $(i,j)$-th element of the Fisher information matrix (FIM) is then given by[7]

$$\mathrm{FIM}_{ij} = \mathrm{tr}\left(\frac{\partial\mathbf{R}}{\partial\theta_i}\mathbf{R}^{-1} \frac{\partial\mathbf{R}}{\partial\theta_j}\mathbf{R}^{-1}\right).$$

In many cases, evaluating each element according to the above formula can be quite tedious. Let $\mathbf{r} = \mathrm{vec}(\mathbf{R})$. Using Corollary 1 and the fact that $\mathbf{R}$ is Hermitian, we can rewrite the above formula as

$$\mathrm{FIM}_{ij} = \left[\frac{\partial \mathbf{r}}{\partial\theta_i}\right]^H (\mathbf{R}^T \otimes \mathbf{R})^{-1} \frac{\partial \mathbf{r}}{\partial\theta_j}.$$


Define

$$\frac{\partial \mathbf{r}}{\partial\boldsymbol{\theta}} = \left[ \frac{\partial \mathbf{r}}{\partial\theta_1}\ \frac{\partial \mathbf{r}}{\partial\theta_2}\ \cdots\ \frac{\partial \mathbf{r}}{\partial\theta_K} \right].$$

We immediately obtain a compact representation of the FIM:

$$\mathrm{FIM} = \left[\frac{\partial \mathbf{r}}{\partial\boldsymbol{\theta}}\right]^H (\mathbf{R}^T \otimes \mathbf{R})^{-1} \frac{\partial \mathbf{r}}{\partial\boldsymbol{\theta}}.$$

In some cases, this may simplify the derivation of the corresponding CRB.
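The equivalence of the element-wise trace formula and the compact Kronecker form can be checked numerically; the Hermitian matrix $\mathbf{R}$ and the derivative matrices below are random stand-ins, not derived from any particular signal model:

```python
import numpy as np

rng = np.random.default_rng(5)
M, K = 3, 2
T = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = T @ T.conj().T + M * np.eye(M)           # Hermitian positive-definite

# Hypothetical Hermitian derivative matrices dR/dtheta_k
dR = []
for _ in range(K):
    D = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    dR.append(D + D.conj().T)

Rinv = np.linalg.inv(R)
vec = lambda X: X.flatten(order='F')

# Element-wise trace formula for the FIM
F1 = np.array([[np.trace(dR[i] @ Rinv @ dR[j] @ Rinv) for j in range(K)]
               for i in range(K)])

# Compact form: J^H (R^T ⊗ R)^{-1} J with J = [vec(dR_1) ... vec(dR_K)]
J = np.column_stack([vec(D) for D in dR])
F2 = J.conj().T @ np.linalg.inv(np.kron(R.T, R)) @ J

assert np.allclose(F1, F2)
```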

  1. G. A. F. Seber, A matrix handbook for statisticians. Hoboken, N.J.: Wiley-Interscience, 2008.

  2. M. Wang and A. Nehorai, "Coarrays, MUSIC, and the Cramér-Rao bound," IEEE Transactions on Signal Processing, vol. 65, no. 4, pp. 933–946, Feb. 2017.

  3. J. Li and P. Stoica, "MIMO radar with colocated antennas," IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 106–114, Sep. 2007.

  4. C. C. Weng and P. P. Vaidyanathan, "Nonuniform sparse array design for active sensing," in 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2011, pp. 1062–1066.

  5. P. Stoica and A. Nehorai, "Performance study of conditional and unconditional direction-of-arrival estimation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 10, pp. 1783–1795, Oct. 1990.

  6. P. Pal and P. P. Vaidyanathan, "Nested arrays: a novel approach to array processing with enhanced degrees of freedom," IEEE Transactions on Signal Processing, vol. 58, no. 8, pp. 4167–4181, Aug. 2010.

  7. S. M. Kay, Fundamentals of statistical signal processing. Englewood Cliffs, N.J: Prentice-Hall PTR, 1993.