Convolution of matrices#
1-d convolution#
Convolution is a widespread operation in mathematics in different contexts.
Convolution of sequences#
The convolution of sequences \((a_n)\) and \((b_n)\) is the sequence \(a * b\) such that
This formula assumes that the sequences \((a_n)\) and \((b_n)\) continue indefinitely to both sides. To calculate the convolution of finite sequences
they are padded by zeros:
\(a_k = 0\) if \(k < 0\) or \(k \geqslant \ell\);
\(b_k = 0\) if \(k < 0\) or \(k \geqslant m\).
Exercise
Let \((a_n) = (1, 2, 3)\), \((b_n) = (4, 5)\). Calculate their convolution.
Question
How many nonzero elements are there in (40) for these finite sequences?
In practice there are several reasonable strategies how to caclulate convolutions.
full
: find \((a * b)_n\) for all \(n\) such that\[ a_0, \ldots, a_{\ell-1} \text{ and } \quad b_{n-m+1}, \ldots, b_{n} \]have at least one nonzero intersection, the shape of \((a * b)_n\) is \(\ell + m - 1\);
same
: the convolution has the same shape as the largest input sequence, the shape of \(c_n\) is \(\max\{\ell, m\}\);valid
: the convolution product is only given for points where the sequences overlap completely, the shape of \(c_n\) is \(\max\{\ell, m\} - \min\{\ell, m\} + 1\).
Convolution in NumPy#
Use np.convolve
:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5])
print("Full:", np.convolve(a, b, mode='full'))
print("Same:", np.convolve(a, b, mode='same'))
print("Valid:", np.convolve(a, b, mode='valid'))
Full: [ 4 13 22 15]
Same: [ 4 13 22]
Valid: [13 22]
Cross correlation#
This is a very similar operation to convolution. The cross correlation of sequences \((a_n)\) and \((b_n)\) is a sequence \((c_n)\) such that
np.correlate
calculates the cross correlation of two arrays. The meaning of the parameters full
, same
, valid
is analogous to those of np.convolve
:
print("Full:", np.correlate(a, b, mode='full'))
print("Same:", np.correlate(a, b, mode='same'))
print("Valid:", np.correlate(a, b, mode='valid'))
Full: [ 5 14 23 12]
Same: [ 5 14 23]
Valid: [14 23]
Question
How to express cross correlation it terms of convolution?
Answer
Denote \((x^-_n)\) the flipped sequence: \(x^-_n = x_{-n}\). Now \(c_n\) from (41) is
Alternatively, we can write it as
Thus, the cross correlation can be represented as a convolution of two initial sequences exactly one of which is flipped.
print("Convolution:", np.convolve(a, b, mode='full'))
print("Correlation, flip first:", np.correlate(a[::-1], b, mode='full')[::-1])
print("Correlation, flip second:", np.correlate(a, b[::-1], mode='full'))
Convolution: [ 4 13 22 15]
Correlation, flip first: [ 4 13 22 15]
Correlation, flip second: [ 4 13 22 15]
2-d convolution#
Two-dimensinonal convolutions are similar to convolutions of sequences. For two matrices \(\boldsymbol A\) and \(\boldsymbol B\) their convolution is defined a
Once again matrices are padded by zeros outside their shape.
Exercise
Let
Calculate \(\boldsymbol A * \boldsymbol B\) in all three modes: full
, same
, valid
.
Now check these results via scipy.signal.convolve2d
:
Show code cell content
from scipy.signal import convolve2d
A = np.arange(12).reshape(3, 4)
B = np.array([[2, 0], [1, -1]])
print("Full:")
print(convolve2d(A, B, mode='full'))
print("Same:")
print(convolve2d(A, B, mode='same'))
print("Valid:")
print(convolve2d(A, B, mode='valid'))
Full:
[[ 0 2 4 6 0]
[ 8 11 13 15 -3]
[ 20 19 21 23 -7]
[ 8 1 1 1 -11]]
Same:
[[ 0 2 4 6]
[ 8 11 13 15]
[20 19 21 23]]
Valid:
[[11 13 15]
[19 21 23]]
What about pyTorch? Note that to proceed in the full
mode we can pad \(\boldsymbol A\) by one zero from each side.
Show code cell content
import torch
from torch import nn
conv = nn.Conv2d(1, 1, 3, bias=False, padding=1)
conv.weight = nn.parameter.Parameter(torch.FloatTensor(B[None, None, :, :]))
print("Full:")
print(conv(torch.FloatTensor(A[None, None, :, :])).detach().numpy().squeeze())
conv = nn.Conv2d(1, 1, 3, bias=False, padding='same')
conv.weight = nn.parameter.Parameter(torch.FloatTensor(B[None, None, :, :]))
print("Same:")
print(conv(torch.FloatTensor(A[None, None, :, :])).detach().numpy().squeeze())
conv = nn.Conv2d(1, 1, 3, bias=False, padding='valid')
conv.weight = nn.parameter.Parameter(torch.FloatTensor(B[None, None, :, :]))
print("Valid:")
print(conv(torch.FloatTensor(A[None, None, :, :])).detach().numpy().squeeze())
Full:
[[ 0. -1. -1. -1. 3.]
[-4. -1. 1. 3. 13.]
[-8. 7. 9. 11. 25.]
[ 0. 16. 18. 20. 22.]]
Same:
[[-1. 1. 3. 13.]
[ 7. 9. 11. 25.]
[16. 18. 20. 22.]]
Valid:
[[-1. 1. 3.]
[ 7. 9. 11.]]
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/modules/conv.py:459: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/Convolution.cpp:1004.)
return F.conv2d(input, weight, bias, self.stride,
Warning
The outputs of scipy.signal.convolve2d
and torch.nn.conv2d
look different. What is the reason of it?
Answer
Pytorch calculates not convolution but cross correlation:
Image convolution#
Grayscale images of size \(H\times W\) are stored as matrices of shape \(H\times W\). Here is an example:
import numpy as np
import matplotlib.pyplot as plt
import requests
import io
from PIL import Image
%config InlineBackend.figure_format = 'svg'
r = requests.get('https://github.com/probml/probml-data/blob/main/data/clown.png?raw=true', stream=True)
img = Image.open(io.BytesIO(r.content))
X = np.array(img) / 255
plt.xticks([])
plt.yticks([])
plt.imshow(X, cmap="gray");
The shape is \(H\times W = 200\times 320\):
X.shape
(200, 320)
Convolving with kernel \(K = \frac 1{n^2} \boldsymbol 1_{n\times n}\) averages the values of each pixel with its neigbours, so the image becomes blurred:
def blur_image(X, n):
kernel = np.ones((n, n)) / n**2
blurred_image = convolve2d(X, kernel, mode="same")
plt.xticks([])
plt.yticks([])
plt.imshow(blurred_image, cmap="gray");
\(n=3\):
blur_image(X, 3)
\(n = 8\):
blur_image(X, 8)
Now what if we convolve the image with kernel
Show code cell content
kernel = -np.ones((3, 3))
kernel[1, 1] = 8
blurred_image = convolve2d(X, kernel, mode="same")
plt.xticks([])
plt.yticks([])
plt.imshow(blurred_image, cmap="gray");
Stride#
Sometimes convolution are used with stride greater than one. To calculate the next element of the convolution the filter is moved by stride
units. Here is a gif of a convolution with stride equal to \(2\):
Exercise
Calculate both convolution and cross correlation for the same matrices
with stride \(2\) in mode full
.
Check with pyTorch:
Show code cell content
conv = nn.Conv2d(1, 1, 3, bias=False, padding=1, stride=2)
conv.weight = nn.parameter.Parameter(torch.FloatTensor(B[None, None, :, :]))
print(conv(torch.FloatTensor(A[None, None, :, :])).detach().numpy().squeeze())
[[ 0. -1. 3.]
[-8. 9. 25.]]
Dilation#
The kernel can be multiplied not only to contiguous pieces of input, they can be separated by several cells. This is regulated by the dilation parameter. Here is a gif of a convolution with dilation equal to \(2\):
Exercise
Calculate both convolution and cross correlation for the same matrices
with dilation \(2\) in mode same
.
Check with pyTorch:
conv = nn.Conv2d(1, 1, 3, bias=False, dilation=2, padding='same')
conv.weight = nn.parameter.Parameter(torch.FloatTensor(B[None, None, :, :]))
print(conv(torch.FloatTensor(A[None, None, :, :])).detach().numpy().squeeze())
[[-5. -2. -2. 6.]
[-9. -2. 0. 14.]
[ 0. 8. 10. 12.]]
Output shape formula#
Convolutions have a lot of parameters. How do they affect the shape of the convolved image?
Suppose that
input image has shape \(H\times W\);
kernel has shape \(h\times w\);
the imput image is padded by \(p\) zeros from each side (\(p \geqslant 0\));
stride is \(s\) (by default \(s=1\));
dilation is \(d\) (by default \(d=1\)).
Then the height of the convolved image is
and the width is
For example, if \(s=d=1\), \(h=w = 2k+1\), \(p=k\), then
This is how same
mode is achieved.