CNN Architectures

This guide explores the most influential CNN architectures that revolutionized computer vision. We'll cover each model's key innovations, structure, and impact on the field.

1. LeNet-5 (1998)

[Figure: LeNet-5 architecture]

LeNet-5, pioneered by Yann LeCun and colleagues, was primarily used for handwritten digit recognition (the MNIST dataset). A minimal code sketch follows the feature list below.

Key Features:

  • 7 layers (not counting input)

  • Convolutional layers followed by average pooling

  • Approximately 60K parameters

  • First widely successful application of CNNs, deployed commercially for reading handwritten digits on checks

  • Input size: 32×32 pixels
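
Here's a minimal PyTorch sketch of a LeNet-5-style network. This is a modern reading rather than a faithful reproduction: the original used trainable subsampling layers, a sparse connection table between layers S2 and C3, and RBF output units instead of a plain linear classifier.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style network for 32x32 grayscale inputs (e.g. padded MNIST)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # C1: 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # S2: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # C3: 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # S4: 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),       # C5 (a dense layer in this reading)
            nn.Tanh(),
            nn.Linear(120, 84),               # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),       # softmax is applied in the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Sanity check: roughly 61K parameters, matching the figure quoted above.
model = LeNet5()
print(sum(p.numel() for p in model.parameters()))  # ~61,706
```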

2. AlexNet (2012)

[Figure: AlexNet architecture]

AlexNet marked the beginning of the deep learning revolution by winning the 2012 ImageNet competition (ILSVRC) by a large margin; a compact code sketch follows the lists below.

Key Innovations:

  • Popularized the ReLU activation function in deep CNNs

  • Dropout for regularization (0.5)

  • Data augmentation techniques

  • Local Response Normalization (LRN)

  • Training on multiple GPUs

Architecture Details:

  • 8 layers (5 convolutional, 3 fully connected)

  • 60 million parameters

  • Input size: 227×227×3 (the paper states 224×224, but 227 is what makes the first-layer arithmetic work)

  • Max pooling layers

  • Final 1000-way softmax
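
Here's a compact sketch in the same spirit, following the single-GPU layout used by most modern reimplementations. The original's two-GPU split and Local Response Normalization are omitted here.

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """AlexNet-style network (single-GPU layout, LRN omitted)."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # 227x227 -> 55x55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                      # 55 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                      # 27 -> 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                      # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                 # dropout regularization, as in the paper
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),    # final 1000-way classifier
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```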

3. VGG16/VGG19 (2014)

[Figure: VGG architecture]

VGG networks demonstrated that depth is crucial for good performance. They used a very uniform architecture with small 3×3 filters throughout; the repeating block pattern is sketched in code after the lists below.

Key Features:

  • Uniform architecture with 3×3 convolutions

  • Deep networks for their time (16–19 weight layers)

  • 2×2 max pooling

  • Three fully connected layers at the end

  • 138M parameters (VGG16)

Architecture Pattern:

  • Multiple 3×3 conv layers + ReLU

  • Max pooling to reduce spatial dimensions

  • Double the number of filters after each pooling stage (64 → 128 → 256 → 512)

  • Final dense layers: 4096 → 4096 → 1000
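
The pattern above is easy to express as a reusable block. Here's a sketch of the VGG16 feature extractor built from such blocks; the classifier head (4096 → 4096 → 1000) would follow the final pooling stage.

```python
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, num_convs: int) -> nn.Sequential:
    """One VGG stage: `num_convs` 3x3 conv+ReLU layers, then 2x2 max pooling."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2, stride=2))
    return nn.Sequential(*layers)

# VGG16 feature extractor: filters double after each pooling stage, up to 512.
features = nn.Sequential(
    vgg_block(3, 64, 2),      # 224x224 -> 112x112
    vgg_block(64, 128, 2),    # 112x112 -> 56x56
    vgg_block(128, 256, 3),   # 56x56  -> 28x28
    vgg_block(256, 512, 3),   # 28x28  -> 14x14
    vgg_block(512, 512, 3),   # 14x14  -> 7x7
)
```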

4. GoogLeNet/Inception (2014)

[Figure: Inception module]

GoogLeNet introduced the Inception module, which processes its input in parallel through branches with different filter sizes; a code sketch follows the structure list below.

Key Innovations:

  • Inception modules with parallel convolutions

  • 1×1 convolutions for dimension reduction

  • Global average pooling instead of dense layers

  • Auxiliary classifiers during training

  • Only 6.8M parameters (roughly 20× fewer than VGG16's 138M)

Inception Module Structure:

  1. 1×1 convolutions

  2. 1×1 followed by 3×3 convolutions

  3. 1×1 followed by 5×5 convolutions

  4. 3×3 max pooling followed by 1×1 convolutions
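
Here's a sketch of that module in PyTorch. The four branch outputs are concatenated along the channel dimension; the example channel counts are those of the paper's "inception (3a)" block.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception module with dimension reduction: four parallel branches."""

    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(                       # branch 1: 1x1 conv
            nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(                       # branch 2: 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(                       # branch 3: 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        self.b4 = nn.Sequential(                       # branch 4: 3x3 max pool, then 1x1
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate branch outputs on the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# "inception (3a)" configuration: 64 + 128 + 32 + 32 = 256 output channels.
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```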

5. ResNet (2015)

[Figure: ResNet residual block]

ResNet solved the degradation problem in very deep networks using skip connections (residual learning): instead of learning a mapping H(x) directly, each block learns a residual F(x) = H(x) − x and outputs F(x) + x. A sketch of a basic residual block follows the lists below.

Key Innovations:

  • Residual connections (skip connections)

  • Batch normalization after each convolution

  • No fully connected layers except final classifier

  • Deep architectures (up to 152 layers)

  • Identity mappings that let gradients flow unchanged through the skip connections

Architecture Details:

  • Multiple variants (ResNet-18, 34, 50, 101, 152)

  • Bottleneck blocks in deeper networks

  • 25.6 million parameters (ResNet-50)
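
Here's a sketch of the basic residual block used in ResNet-18/34. ResNet-50 and deeper replace the two 3×3 convolutions with a 1×1 → 3×3 → 1×1 bottleneck, but the skip-connection logic is the same.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet basic block: two 3x3 convs with batch norm, plus a skip connection."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)        # batch norm after each convolution
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection when the shape changes, so the skip can still be added.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # residual: F(x) + x
```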

Comparison Table

| Architecture | Year | Layers | Parameters | Top-1 Accuracy (ImageNet) | Top-5 Accuracy (ImageNet) | Key Innovation |
|--------------|------|--------|------------|---------------------------|---------------------------|----------------------|
| LeNet-5      | 1998 | 7      | 60K        | N/A                       | N/A                       | First successful CNN |
| AlexNet      | 2012 | 8      | 60M        | 63.3%                     | 84.7%                     | ReLU, Dropout        |
| VGG16        | 2014 | 16     | 138M       | 71.5%                     | 92.7%                     | Uniform 3×3 conv     |
| GoogLeNet    | 2014 | 22     | 6.8M       | 74.8%                     | 93.3%                     | Inception modules    |
| ResNet-50    | 2015 | 50     | 25.6M      | 76.0%                     | 96.4%                     | Residual connections |
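
If you have torchvision installed, you can sanity-check the parameter counts above against its reference implementations. Exact numbers vary slightly with implementation details, e.g. whether GoogLeNet's auxiliary classifiers are counted.

```python
import torchvision.models as models

def count_params(model) -> float:
    """Total number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Instantiate each architecture without pretrained weights and count parameters.
for name, ctor in [("AlexNet", models.alexnet),
                   ("VGG16", models.vgg16),
                   ("GoogLeNet", models.googlenet),
                   ("ResNet-50", models.resnet50)]:
    print(f"{name}: {count_params(ctor()):.1f}M parameters")
```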

References

  1. LeCun, Y., et al. (1998). Gradient-Based Learning Applied to Document Recognition

  2. Krizhevsky, A., et al. (2012). ImageNet Classification with Deep Convolutional Neural Networks

  3. Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition

  4. Szegedy, C., et al. (2015). Going Deeper with Convolutions

  5. He, K., et al. (2016). Deep Residual Learning for Image Recognition