Pooling#

Pooling layers are commonly used in Convolutional Neural Networks (CNNs) for several reasons.

  1. Downsampling and Dimensionality Reduction

    Pooling layers help reduce the spatial dimensions (width and height) of the input volume. By downsampling the feature maps, pooling layers decrease the computational load in the subsequent layers of the network.

  2. Translation Invariance

    Pooling layers provide a certain degree of translation invariance, making the network less sensitive to the exact position of features in the input. This is particularly useful in tasks like image classification, where the location of an object within an image may vary.

  3. Local Feature Extraction

    Pooling focuses on local features and helps retain the most essential information from a region. By selecting the maximum or average value from a pool of neighboring pixels, pooling layers capture the dominant features in a local region.

The pooling operation has the same parameters as convolution:

  • kernel_size

  • padding

  • stride

  • dilation

Unlike convolution, the stride of a pooling kernel is usually equal to the kernel size (this is the default value), as in the sketch below.
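
A minimal sketch of these parameters, assuming PyTorch (the parameter names above match `nn.MaxPool2d` / `nn.AvgPool2d`):

```python
import torch
from torch import nn

# stride=None means "use the kernel size as the stride" -- the default for pooling.
pool = nn.MaxPool2d(kernel_size=2, stride=None, padding=0, dilation=1)

x = torch.randn(1, 3, 8, 8)   # (batch, channels, height, width)
print(pool(x).shape)          # torch.Size([1, 3, 4, 4]) -- spatial size halved
```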

Max pooling#

The max pooling operation returns the maximum value of each window.

https://miro.medium.com/v2/resize:fit:1400/1*WvHC5bKyrHa7Wm3ca-pXtg.gif
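
A small worked example, assuming PyTorch; the input values are arbitrary and need not match the animation above:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[ 1.,  2.,  3.,  4.],
                  [ 5.,  6.,  7.,  8.],
                  [ 9., 10., 11., 12.],
                  [13., 14., 15., 16.]]).reshape(1, 1, 4, 4)  # (N, C, H, W)

# With the default stride (= kernel_size), each non-overlapping 2x2 window
# contributes its maximum value.
print(F.max_pool2d(x, kernel_size=2))
# tensor([[[[ 6.,  8.],
#           [14., 16.]]]])
```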

Exercise

What will the output of the max pooling layer in the previous example be if the stride is \(1\)?

Average pooling#

Average pooling is very similar to max pooling, but it calculates the mean of each window instead of the maximum.

https://www.researchgate.net/publication/349921480/figure/fig2/AS:999677281460235@1615353045589/Max-pooling-and-average-pooling.png
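
The same example as above with average pooling (again a sketch assuming PyTorch, with illustrative values):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[ 1.,  2.,  3.,  4.],
                  [ 5.,  6.,  7.,  8.],
                  [ 9., 10., 11., 12.],
                  [13., 14., 15., 16.]]).reshape(1, 1, 4, 4)  # (N, C, H, W)

# Each 2x2 window is replaced by its mean rather than its maximum.
print(F.avg_pool2d(x, kernel_size=2))
# tensor([[[[ 3.5000,  5.5000],
#           [11.5000, 13.5000]]]])
```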

Global average pooling#

https://www.guidetomlandai.com/assets/img/machine_learning/global_average_pooling.png
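
Global average pooling averages each feature map over its entire spatial extent, so an input of shape \((N, C, H, W)\) becomes \((N, C, 1, 1)\) (or \((N, C)\) after flattening): one value per channel. A minimal sketch, assuming PyTorch's `nn.AdaptiveAvgPool2d`:

```python
import torch
from torch import nn

gap = nn.AdaptiveAvgPool2d(output_size=1)   # average over the full H x W extent

x = torch.randn(8, 64, 7, 7)   # e.g. the last convolutional feature map
out = gap(x)                   # shape: (8, 64, 1, 1)
print(out.flatten(1).shape)    # torch.Size([8, 64]) -- one value per channel
```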