CNN Architectures

This guide explores the most influential CNN architectures that revolutionized computer vision. We'll cover each model's key innovations, structure, and impact on the field.

1. LeNet-5 (1998)

[Figure: LeNet-5 architecture]

LeNet-5, pioneered by Yann LeCun and colleagues, was primarily used for handwritten digit recognition (the MNIST dataset). A minimal code sketch follows the feature list below.

Key Features:

  • 7 layers (not counting input)

  • Convolutional layers followed by average pooling

  • Approximately 60K parameters

  • First widely successful application of CNNs, deployed commercially for reading handwritten digits on checks

  • Input size: 32×32 pixels
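
Here's a minimal PyTorch sketch of a LeNet-5-style network. This is a modern reading rather than a faithful reproduction: the original used trainable subsampling layers, a sparse connection table between layers S2 and C3, and RBF output units instead of a plain linear classifier.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style network for 32x32 grayscale inputs (e.g. padded MNIST)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # C1: 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # S2: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # C3: 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # S4: 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),       # C5 (a dense layer in this reading)
            nn.Tanh(),
            nn.Linear(120, 84),               # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),       # softmax is applied in the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Sanity check: roughly 61K parameters, matching the figure quoted above.
model = LeNet5()
print(sum(p.numel() for p in model.parameters()))  # ~61,706
```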

2. AlexNet (2012)

[Figure: AlexNet architecture]

AlexNet marked the beginning of the deep learning revolution by winning the 2012 ImageNet competition (ILSVRC) by a large margin; a compact code sketch follows the lists below.

Key Innovations:

  • Popularized the ReLU activation function in deep CNNs

  • Dropout for regularization (0.5)

  • Data augmentation techniques

  • Local Response Normalization (LRN)

  • Training on multiple GPUs

Architecture Details:

  • 8 layers (5 convolutional, 3 fully connected)

  • 60 million parameters

  • Input size: 227×227×3 (the paper states 224×224, but 227 is what makes the first-layer arithmetic work)

  • Max pooling layers

  • Final 1000-way softmax
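
Here's a compact sketch in the same spirit, following the single-GPU layout used by most modern reimplementations. The original's two-GPU split and Local Response Normalization are omitted here.

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """AlexNet-style network (single-GPU layout, LRN omitted)."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),     # 227x227 -> 55x55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                      # 55 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                      # 27 -> 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                      # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                 # dropout regularization, as in the paper
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),    # final 1000-way classifier
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```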

3. VGG16/VGG19 (2014)

[Figure: VGG architecture]

VGG networks demonstrated that depth is crucial for good performance. They used a very uniform architecture with small 3×3 filters throughout; the repeating block pattern is sketched in code after the lists below.

Key Features:

  • Uniform architecture with 3×3 convolutions

  • Deep networks for their time (16–19 weight layers)

  • 2×2 max pooling

  • Three fully connected layers at the end

  • 138M parameters (VGG16)

Architecture Pattern:

  • Multiple 3×3 conv layers + ReLU

  • Max pooling to reduce spatial dimensions

  • Double the number of filters after each pooling stage (64 → 128 → 256 → 512)

  • Final dense layers: 4096 → 4096 → 1000
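
The pattern above is easy to express as a reusable block. Here's a sketch of the VGG16 feature extractor built from such blocks; the classifier head (4096 → 4096 → 1000) would follow the final pooling stage.

```python
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, num_convs: int) -> nn.Sequential:
    """One VGG stage: `num_convs` 3x3 conv+ReLU layers, then 2x2 max pooling."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2, stride=2))
    return nn.Sequential(*layers)

# VGG16 feature extractor: filters double after each pooling stage, up to 512.
features = nn.Sequential(
    vgg_block(3, 64, 2),      # 224x224 -> 112x112
    vgg_block(64, 128, 2),    # 112x112 -> 56x56
    vgg_block(128, 256, 3),   # 56x56  -> 28x28
    vgg_block(256, 512, 3),   # 28x28  -> 14x14
    vgg_block(512, 512, 3),   # 14x14  -> 7x7
)
```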

4. GoogLeNet/Inception (2014)

[Figure: Inception module]

GoogLeNet introduced the Inception module, which processes its input in parallel through branches with different filter sizes; a code sketch follows the structure list below.

Key Innovations:

  • Inception modules with parallel convolutions

  • 1×1 convolutions for dimension reduction

  • Global average pooling instead of dense layers

  • Auxiliary classifiers during training

  • Only 6.8M parameters (roughly 20× fewer than VGG16's 138M)

Inception Module Structure:

  1. 1×1 convolutions

  2. 1×1 followed by 3×3 convolutions

  3. 1×1 followed by 5×5 convolutions

  4. 3×3 max pooling followed by 1×1 convolutions
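
Here's a sketch of that module in PyTorch. The four branch outputs are concatenated along the channel dimension; the example channel counts are those of the paper's "inception (3a)" block.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception module with dimension reduction: four parallel branches."""

    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(                       # branch 1: 1x1 conv
            nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(                       # branch 2: 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(                       # branch 3: 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        self.b4 = nn.Sequential(                       # branch 4: 3x3 max pool, then 1x1
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate branch outputs on the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# "inception (3a)" configuration: 64 + 128 + 32 + 32 = 256 output channels.
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])
```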

5. ResNet (2015)

[Figure: ResNet residual block]

ResNet solved the degradation problem in very deep networks using skip connections (residual learning): instead of learning a mapping H(x) directly, each block learns a residual F(x) = H(x) − x and outputs F(x) + x. A sketch of a basic residual block follows the lists below.

Key Innovations:

  • Residual connections (skip connections)

  • Batch normalization after each convolution

  • No fully connected layers except final classifier

  • Deep architectures (up to 152 layers)

  • Identity mappings that let gradients flow unchanged through the skip connections

Architecture Details:

  • Multiple variants (ResNet-18, 34, 50, 101, 152)

  • Bottleneck blocks in deeper networks

  • 25.6 million parameters (ResNet-50)
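
Here's a sketch of the basic residual block used in ResNet-18/34. ResNet-50 and deeper replace the two 3×3 convolutions with a 1×1 → 3×3 → 1×1 bottleneck, but the skip-connection logic is the same.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet basic block: two 3x3 convs with batch norm, plus a skip connection."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)        # batch norm after each convolution
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection when the shape changes, so the skip can still be added.
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # residual: F(x) + x
```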

Comparison Table

| Architecture | Year | Layers | Parameters | Top-1 Accuracy (ImageNet) | Top-5 Accuracy (ImageNet) | Key Innovation |
|--------------|------|--------|------------|---------------------------|---------------------------|----------------------|
| LeNet-5      | 1998 | 7      | 60K        | N/A                       | N/A                       | First successful CNN |
| AlexNet      | 2012 | 8      | 60M        | 63.3%                     | 84.7%                     | ReLU, Dropout        |
| VGG16        | 2014 | 16     | 138M       | 71.5%                     | 92.7%                     | Uniform 3×3 conv     |
| GoogLeNet    | 2014 | 22     | 6.8M       | 74.8%                     | 93.3%                     | Inception modules    |
| ResNet-50    | 2015 | 50     | 25.6M      | 76.0%                     | 96.4%                     | Residual connections |
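
If you have torchvision installed, you can sanity-check the parameter counts above against its reference implementations. Exact numbers vary slightly with implementation details, e.g. whether GoogLeNet's auxiliary classifiers are counted.

```python
import torchvision.models as models

def count_params(model) -> float:
    """Total number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Instantiate each architecture without pretrained weights and count parameters.
for name, ctor in [("AlexNet", models.alexnet),
                   ("VGG16", models.vgg16),
                   ("GoogLeNet", models.googlenet),
                   ("ResNet-50", models.resnet50)]:
    print(f"{name}: {count_params(ctor()):.1f}M parameters")
```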

References

  1. LeCun, Y., et al. (1998). Gradient-Based Learning Applied to Document Recognition

  2. Krizhevsky, A., et al. (2012). ImageNet Classification with Deep Convolutional Neural Networks

  3. Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition

  4. Szegedy, C., et al. (2015). Going Deeper with Convolutions

  5. He, K., et al. (2016). Deep Residual Learning for Image Recognition