ResNet
Contents
- Note
- Example: ResNet-34
- Comparison of ResNet networks
- Analysis
- [[#Analysis#Bag of Tricks|Bag of Tricks]]
- Resources
Note
- family of network architectures with successor designs (ResNeXt and others) and varying depths (ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152)
- dominated computer vision before the vision transformer appeared; still the go-to choice when solving a standard, well-known problem or prototyping
- the original paper proposed a number of groundbreaking architectural solutions, most notably the skip (residual) connection
Example: ResNet-34
- 4 stages of residual blocks; each basic block is two 3x3 convolution layers bypassed by a skip connection
- The first skip connection of each stage creates a dimension mismatch, because each stage doubles the number of filters and halves the spatial resolution: the identity path carries e.g. a 56x56x64 feature map while the residual branch outputs 28x28x128. Such mismatches are denoted with dashed skip-connection lines in the paper.
- One widespread solution is a 1x1 convolution with stride 2 on the shortcut (a projection shortcut), which matches both the channel count and the resolution; see the sketch after this list
- Full pre-activation: the order batch normalization → activation (ReLU) → convolution (weights), which yielded the best experimental results in the follow-up paper Identity Mappings in Deep Residual Networks
- global average pooling before the last classification head reduces the number of parameters by averaging each output feature map to a single number: 7x7x512 becomes just 1x512, eliminating most of the connections to the fully connected layer. This was another difference from the VGG architectures of the past (which had ~90% of their trainable parameters in fully connected layers), affordable in terms of quality because the much deeper ResNet networks produce higher-quality output features.
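The pieces above fit together in a few lines of code. Below is a minimal PyTorch sketch (PyTorch and the names `PreActBlock`/`head` are my assumptions, not from the paper): a full pre-activation basic block, a 1x1 stride-2 projection on the dashed shortcuts, and a global-average-pooling classification head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Basic residual block in the full pre-activation order: BN -> ReLU -> conv."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        # "Dashed" shortcut: a 1x1 convolution with stride 2 matches both the
        # doubled channel count and the halved spatial resolution.
        self.proj = None
        if stride != 1 or in_ch != out_ch:
            self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)

    def forward(self, x):
        out = F.relu(self.bn1(x))                      # pre-activation
        identity = self.proj(out) if self.proj else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + identity                          # skip connection: elementwise sum

# First block of stage 2: 56x56x64 in, 28x28x128 out, projection shortcut active.
block = PreActBlock(64, 128, stride=2)
print(block(torch.randn(1, 64, 56, 56)).shape)         # torch.Size([1, 128, 28, 28])

# Classification head: global average pooling turns 7x7x512 into 1x512, so the
# fully connected layer needs 512*1000 weights instead of 7*7*512*1000.
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 1000))
print(head(torch.randn(1, 512, 7, 7)).shape)           # torch.Size([1, 1000])
```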
Comparison of ResNet networks
Analysis
Bag of Tricks
- notes from Bag of Tricks for Image Classification with Convolutional Neural Networks (He et al., 2018); the training tricks are sketched below
- Model tweaks (ResNet-B/C/D variants of the stem and downsampling blocks)
- Linear scaling learning rate (base lr of 0.1 scaled by batch_size/256)
- Learning rate warmup (increase the lr linearly from ~0 over the first few epochs)
- Cosine learning rate decay
- Zero gamma (initialize the scale of the last batch norm in each residual block to zero, so blocks start as identity mappings)
- No bias decay (apply weight decay only to weights, not to biases or batch norm parameters)
- MixUp training (train on convex combinations of pairs of examples and their labels)
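A compact sketch of the learning-rate tricks, as an illustration rather than the paper's code: the function name `learning_rate` and the epoch counts are my assumptions; 0.1 and 256 are the paper's ImageNet reference values.

```python
import math

def learning_rate(epoch, batch_size=256, warmup_epochs=5, total_epochs=120):
    base_lr = 0.1 * batch_size / 256                 # linear scaling rule
    if epoch < warmup_epochs:                        # linear warmup from ~0
        return base_lr * (epoch + 1) / warmup_epochs
    t, T = epoch - warmup_epochs, total_epochs - warmup_epochs
    return 0.5 * base_lr * (1 + math.cos(math.pi * t / T))   # cosine decay to 0

for e in (0, 4, 5, 60, 119):
    print(e, round(learning_rate(e, batch_size=1024), 4))
```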
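And a sketch of zero gamma, no bias decay, and MixUp, again under assumptions: `bn2` is the last batch norm of each block (true of the `PreActBlock` sketch above and of torchvision's `BasicBlock`, but not of every ResNet), and labels are one-hot for MixUp.

```python
import torch
import torch.nn as nn

def zero_gamma(model):
    # Zero gamma: the last BN of each residual block starts with scale 0,
    # so every block initially computes the identity mapping.
    for m in model.modules():
        if hasattr(m, "bn2"):                        # assumed last BN of a basic block
            nn.init.zeros_(m.bn2.weight)

def param_groups(model, weight_decay=1e-4):
    # No bias decay: weight decay on conv/linear weights only; biases and
    # batch norm parameters (all 1-D tensors) are excluded.
    decay = [p for p in model.parameters() if p.ndim > 1]
    no_decay = [p for p in model.parameters() if p.ndim <= 1]
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]

def mixup(x, y, alpha=0.2):
    # MixUp: blend random pairs of examples and their (one-hot) labels
    # with a Beta(alpha, alpha)-distributed coefficient.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]
```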
Resources
- Deep Residual Learning for Image Recognition (He et al., 2015)
- Residual Networks Behave Like Ensembles of Relatively Shallow Networks (Veit et al., 2016)