# Mianzhi Wang

Ph.D. in Electrical Engineering

# Teaching an MLP to Estimate DOAs

In this article, we will teach a simple multilayer perceptron (MLA) to estimate direction-of-arrivals (DOAs) from the sample covariance matrices. This is an experiment to demonstrate the capability of a neural network. There are more efficient and robust DOA estimation algorithms such as MUSIC[1] and ESPRIT[2].

We consider a uniform linear array (ULA) whose inter-element spacing is set to half of the carrier wavelength. We consider $K$ far-field narrow-band sources impinging on the array. We adopt the stochastic model[3] and further assume that the sources are uncorrelated. The received snapshot at index $t$ can then be expressed as

$\mathbf{y}(t) = \mathbf{A}(\mathbf{\theta}) \mathbf{s}(t) + \mathbf{n}(t),$
(1)

where both $\mathbf{s}(t)$ and $\mathbf{n}(t)$ are complex circularly-symmetric Gaussian. The steering matrix $\mathbf{A}(\mathbf{\theta})$ is given by $[\mathbf{a}(\theta_1), \cdots, \mathbf{a}(\theta_K)]$, where

$\mathbf{a}(\theta_k) = [1, e^{j\pi\sin\theta_k}, \cdots, e^{j(M-1)\pi\sin\theta_k}]^T.$
(2)

Collecting $N$ such snapshots, we can obtain the estimate of the covariance matrix as

$\hat{\mathbf{R}} = \sum_{t=1}^N \mathbf{y}(t) \mathbf{y}^H(t).$
(3)

We will be using such sample covariance matrices as our input. The generation of such a sample covariance matrix can be implemented with numpy as follows:

def steering_matrix(doas, wavelength, sensor_locations):
"""
Generates a steering matrix.

:param doas: A numpy array of DOAs. Must be a row vector.
:param wavelength: Wavelength.
:param sensor_locations: A numpy array of sensor locations.
:returns: A steering matrix.
"""
return np.exp(1j * 2 * math.pi / wavelength * np.reshape(sensor_locations, (-1, 1)) * np.sin(doas))

def generate_cov_mat(doas, wavelength, sensor_locations, noise_power=1., snr=0., n_snapshots=1):
"""
Generates a sample covariance matrix.

:param doas: A numpy array of DOAs. Must be a row vector.
:param wavelength: Wavelength.
:param sensor_locations: A numpy array of sensor locations.
:param noise_power: Variance of the additive noise.
:param snr: Signal-to-noise ratio in dB.
:param n_snapshots: Number of snapshots.
:returns: A sample covariance matrix.
"""
noise_power = 10**(-snr / 10)
A = steering_matrix(doas, wavelength, sensor_locations)
# Noise
N = 1j * np.random.randn(sensor_locations.size, n_snapshots)
N += np.random.randn(sensor_locations.size, n_snapshots)
N *= math.sqrt(noise_power)
# Source
S = 1j * np.random.randn(doas.size, n_snapshots)
S += np.random.randn(doas.size, n_snapshots)
S *= math.sqrt(0.5)
# Snapshot model
Y = A @ S + N
return (Y @ Y.conj().T) / n_snapshots


Before proceeding further, we need to consider how the input and output are encoded. We first consider the encoding of the input. Because $\hat{\mathbf{R}}$ is a complex matrix, we cannot directly vectorize it and feed it into our neural network. We need to individually extract the real part and the imaginary part. Because $\hat{\mathbf{R}}$ is Hermitian, we can actually pack both the real part and the imaginary part into a real matrix of the same size as $\hat{\mathbf{R}}$. We also need to normalize the input because signal and noise powers can vary a lot. Here we simply normalize the sample covariance matrix with its Frobenius norm[4] before packing it. The above procedure can be easily implemented with numpy:

def process_cov_mat(R):
"""
Convert a sample covariance matrix for the input layer of our
neural network.

:param R: An MxM sample covariance matrix.
:returns: Converted sample covariance matrix. An M^2 x 1 vector.
"""
R /= np.linalg.norm(R, 'fro')
# Because R is Hermitian, we only need the upper triangular part.
# Hence we can combine the real and imaginary part here.
R_comb = np.triu(R.real) + np.tril(R.imag)
# Vectorize
return R_comb.reshape((-1,1))


We will generate sample covariance matrices on the fly as we train our neural network. To do so, we can simply implement a Keras data generator.

Next, we need to consider the encoding of the output. In DOA estimation problems, the number of sources is not fixed and the DOA ranges from $-\pi/2$ to $\pi/2$. This is very different from classification problems or regression problems with the fixed output dimension. Here, we borrow the idea from sparse modeling and discretized the DOA range into a find grid of $L$ points. We use $L$ neurons in the output layer. The $l$-th neuron is responsible for the target at the $l$-th grid point. For instance, if we use a grid size of one degree, there will be 180 grid points: $-89^\circ, -88^\circ, \ldots, 89^\circ, 90^\circ$. If there is a target at $30^\circ$, we want the 120-th neuron to be active.

Note: Here we assume that the targets are on grid. We do not consider the off-grid problem here. In the literature of DOA estimation, we use super-resolution based methods[5] to handle the off-grid targets.

After determining the encoding of the input and the output, we can start building our neural network. For simplicity, we will be using Keras. The model is just a simple MLP:

model = Sequential([
Dense(128, input_shape=(sensor_locations.size**2,)),
Activation('relu'),
Dense(256),
Activation('relu'),
Dense(512),
Activation('relu'),
Dense(n_grid_points),
Activation('sigmoid'),
])


whose summary is given by:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_21 (Dense)             (None, 128)               32896
_________________________________________________________________
activation_21 (Activation)   (None, 128)               0
_________________________________________________________________
dense_22 (Dense)             (None, 256)               33024
_________________________________________________________________
activation_22 (Activation)   (None, 256)               0
_________________________________________________________________
dense_23 (Dense)             (None, 512)               131584
_________________________________________________________________
activation_23 (Activation)   (None, 512)               0
_________________________________________________________________
dense_24 (Dense)             (None, 180)               92340
_________________________________________________________________
activation_24 (Activation)   (None, 180)               0
=================================================================
Total params: 289,844
Trainable params: 289,844
Non-trainable params: 0


Now we need to decide the loss function. Here we are doing some kind of regression: we want the output of the neural network to be close to the true DOA spectrum. Therefore, the mean-squared error ($L_2$ loss) seems to be the first choice. However, this will not work because the activations of the output layer are very sparse. The neural network will simply output zeros. Here we combine the binary cross-entropy loss (to encourage the activation of neurons that should activate) and the $L_1$ loss (to encourage sparsity).

Now we are ready to train our model. In my experiment, I use the following settings:

• A 16-element ULA.
• Number of sources varies from 1 to 8.
• Number of snapshots varies from 50 to 1000.
• SNR various from -10dB to 10dB.
• Number of grid points $L$ is set to 180. To avoid large DOA estimation errors in the end-fire regions, discretization is performs within $(-\pi/2.1, \pi/2.1)$.
• The batch size is set to 32 and the epoch number is set to 200.
• Loss function = 0.9 binary cross entropy + 0.1 $L_1$
• Optimizer is Adam with default parameters.

The loss curve is plotted as follows:

Let us randomly place two DOAs and check our model's output:

It looks like the neural network indeed learned to estimate the DOAs🎉 However, we can observe that the outputs are not very robust. Sometimes there are large false peaks. Below is a movie that animates the output of our neural network for three fixed DOAs after each epoch.

We can observe that as the neural network learns, the output spectrum becomes cleaner and cleaner. However, it has some trouble suppressing the false peaks until the end. While our model learned to estimate the DOAs to some extent, its performance is still far below that of the classical DOA estimation algorithms. Nevertheless, this experiment shows that a neural network can indeed learn complex relationships between the input and the output.

You can view the HTML version of the Jupyter notebook here, or download the notebook here.

1. R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276–280, Mar. 1986.

2. R. Roy and T. Kailath, "ESPRIT-estimation of signal parameters via rotational invariance techniques," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 7, pp. 984–995, Jul. 1989.

3. P. Stoica and A. Nehorai, "Performance study of conditional and unconditional direction-of-arrival estimation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 10, pp. 1783–1795, Oct. 1990.

4. In the literature of DOA estimation, scaling a sample covariance matrix should not affect the DOA estimation performance.

5. Z. Tan, Y. C. Eldar, and A. Nehorai, "Direction of arrival estimation using co-prime arrays: A super resolution viewpoint," IEEE Transactions on Signal Processing, vol. 62, no. 21, pp. 5565–5576, Nov. 2014.