Convolutions of tensors
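Color images usually have \(C\) channels, e.g., \(C = 3\) for an RGB image. In that case an image of height \(H\) and width \(W\) is stored as a tensor of shape \(H\times W \times C\). To convolve such a tensor, the kernel must have the same number of channels. Each entry of the convolved image is then the sum of the convolutions of the individual channels: for an image \(\boldsymbol A \in \mathbb R^{H\times W\times C}\) and a kernel \(\boldsymbol B \in \mathbb R^{h\times w\times C}\),
\[
(\boldsymbol A * \boldsymbol B)_{ij} = \sum_{c=1}^{C} \big(\boldsymbol A_{:,:,c} * \boldsymbol B_{:,:,c}\big)_{ij}.
\]
Padding, strides, and dilation work the same way in the multichannel case.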
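The channel-sum rule can be checked numerically. The sketch below is an illustration on random data, not part of the original example: it compares a 3-D `scipy.signal.convolve` in `valid` mode with the sum of per-channel `convolve2d` results. One detail worth noting is that a true 3-D convolution also flips the kernel along the channel axis, so the kernel channels are reversed before the 3-D call. The array names and sizes are arbitrary choices.

```python
import numpy as np
from scipy.signal import convolve, convolve2d

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 7, 3))   # toy "image": H=6, W=7, C=3
B = rng.normal(size=(3, 3, 3))   # toy kernel: h=3, w=3, C=3

# Sum of 2-D convolutions, one per channel.
per_channel = sum(
    convolve2d(A[:, :, c], B[:, :, c], mode="valid") for c in range(3)
)

# 3-D convolution in "valid" mode collapses the channel axis to size 1.
# It flips the kernel along every axis, including channels, so the
# channel order is reversed here to pair A[:, :, c] with B[:, :, c].
full_3d = convolve(A, B[:, :, ::-1], mode="valid")[:, :, 0]

print(per_channel.shape)                 # (4, 5)
print(np.allclose(per_channel, full_3d))
```

For a kernel that is identical in every channel, such as a uniform blur, the channel reversal makes no difference.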
Space example
Load an RGB image:
import matplotlib.pyplot as plt
from skimage import data
astro = data.astronaut()
print(astro.shape)
plt.imshow(astro)
plt.xticks([]);
plt.yticks([]);
(512, 512, 3)
Convolve with a blurring filter of shape \(n\times n \times 3\):
import numpy as np
from scipy.signal import convolve
n = 5
blur_kernel = np.ones((n, n, 3)) / n**2
blurred_astro = convolve(astro, blur_kernel, mode="valid")
blurred_astro.shape
(508, 508, 1)
The output has only one channel:
plt.imshow(blurred_astro, cmap="gray")
plt.xticks([]);
plt.yticks([]);
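The output shape above follows from the `valid`-mode rule: each axis shrinks by the kernel size minus one. The sketch below checks this on a small random stand-in for an 8-bit RGB image (the helper `valid_shape` and the array sizes are illustrative assumptions, not part of the original example). It also shows that the kernel `np.ones((n, n, 3)) / n**2` sums to 3, so the result is the sum of the three per-channel means; dividing by `3 * n**2` instead would keep values in the original 0 to 255 range.

```python
import numpy as np
from scipy.signal import convolve

def valid_shape(image_shape, kernel_shape):
    """Output shape of a 'valid' convolution: size - kernel_size + 1 per axis."""
    return tuple(a - k + 1 for a, k in zip(image_shape, kernel_shape))

print(valid_shape((512, 512, 3), (5, 5, 3)))  # (508, 508, 1)

# Small random stand-in for an 8-bit RGB image (hypothetical data).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16, 3)).astype(float)

n = 5
kernel = np.ones((n, n, 3)) / n**2            # sums to 3, as in the text
out = convolve(img, kernel, mode="valid")

mean_kernel = np.ones((n, n, 3)) / (3 * n**2)  # sums to 1: a true average
out_mean = convolve(img, mean_kernel, mode="valid")

print(out.shape == valid_shape(img.shape, kernel.shape))
print(np.allclose(out, 3 * out_mean))
```

Since `plt.imshow` with a colormap rescales single-channel data to the colormap range by default, both kernels produce the same grayscale picture; the normalization only matters when the pixel values themselves are used.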
Multiple output channels
In the procedure described above the convolved image has \(1\) channel. To increase the number of output channels, one adds one more dimension to the kernel \(\boldsymbol B\): \(\boldsymbol B \in \mathbb R^{h\times w\times C_{\mathrm{in}} \times C_{\mathrm{out}}}\). Now after convolution of an image \(\boldsymbol A \in \mathbb R^{H\times W\times C_{\mathrm{in}}}\) with \(\boldsymbol B\) we obtain a tensor with \(C_{\mathrm{out}}\) channels:
\[
(\boldsymbol A * \boldsymbol B)_{ijk} = \sum_{c=1}^{C_{\mathrm{in}}} \big(\boldsymbol A_{:,:,c} * \boldsymbol B_{:,:,c,k}\big)_{ij}, \quad k = 1, \ldots, C_{\mathrm{out}}.
\]
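A minimal sketch of this construction, assuming `valid` mode and no padding, strides, or dilation: each output channel \(k\) is the channel-sum convolution of the image with the slice `B[:, :, :, k]`. The function name `conv_multi_out` and the random test image are hypothetical, chosen only for illustration. The example kernel blurs each input channel into the output channel with the same index, which would keep a blurred image in color.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_multi_out(A, B):
    """Valid-mode convolution of A (H, W, C_in) with B (h, w, C_in, C_out)."""
    C_in, C_out = B.shape[2], B.shape[3]
    if A.shape[2] != C_in:
        raise ValueError("image and kernel must have the same number of input channels")
    channels = []
    for k in range(C_out):
        # Output channel k: sum over input channels of 2-D convolutions.
        out_k = sum(
            convolve2d(A[:, :, c], B[:, :, c, k], mode="valid")
            for c in range(C_in)
        )
        channels.append(out_k)
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(0)
A = rng.uniform(size=(32, 32, 3))    # toy RGB-like image

n = 5
B = np.zeros((n, n, 3, 3))
for c in range(3):
    B[:, :, c, c] = 1 / n**2         # blur channel c into output channel c

blurred = conv_multi_out(A, B)
print(blurred.shape)  # (28, 28, 3)
```

Deep learning libraries typically implement the same idea with batched, vectorized routines rather than Python loops, and usually compute cross-correlation (no kernel flip) instead of a true convolution.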