Researchers from Seoul National University of Science and Technology have developed a visual forward-forward network (VFF-Net) that significantly improves the training of convolutional neural networks whilst addressing fundamental limitations of the traditional backpropagation algorithm.
The VFF-Net system, developed by Gilha Lee, Jin Shin, and Associate Professor Hyun Kim of the Department of Electrical and Information Engineering, achieved a test error of 1.70 per cent on the MNIST dataset, outperforming existing backpropagation-based methods. The approach reduced test error by up to 8.31 per cent on CIFAR-10 and 3.80 per cent on CIFAR-100 compared with existing forward-forward network methods designed for convolutional neural networks.
“Directly applying FFNs for training CNNs can cause information loss in input images, reducing accuracy. Furthermore, for general purpose CNNs with numerous convolutional layers, individually training each layer can cause performance issues. Our VFF-Net effectively addresses these issues,” says Mr. Lee.
Backpropagation, the dominant training method for deep neural networks, has inherent limitations, including overfitting, vanishing or exploding gradients, slow convergence, and update locking, since weight updates cannot begin until the full forward pass has completed. The forward-forward network (FFN) approach, introduced by Geoffrey Hinton in 2022, instead trains each layer locally through a “goodness-based greedy algorithm” that contrasts two data flows, positive and negative samples, bypassing backpropagation entirely.
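The layer-local idea behind forward-forward training can be illustrated with a minimal, hypothetical sketch (not the authors' code): a single layer is trained so that its “goodness”, taken here as the sum of squared activations, rises above a threshold for positive data and falls below it for negative data. The layer sizes, threshold, learning rate, and sample vectors below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# One layer trained locally, with no backpropagation through other layers.
W = rng.normal(scale=0.5, size=(8, 4))   # hidden=8, input dim=4 (assumed)
theta = 2.0                              # goodness threshold (assumed)
lr = 0.05

def goodness(x):
    y = np.maximum(W @ x, 0.0)           # ReLU activations
    return y, float((y ** 2).sum())      # goodness = sum of squared activations

def local_step(x, positive):
    """Push goodness above theta for positive data, below it for negative data."""
    global W
    y, g = goodness(x)
    sign = 1.0 if positive else -1.0
    # dL/dg for the logistic loss L = log(1 + exp(-sign * (g - theta)))
    dg = -sign / (1.0 + np.exp(sign * (g - theta)))
    # dg/dW = 2 * y * x^T (ReLU zeroes out inactive units automatically)
    W -= lr * dg * 2.0 * y[:, None] * x[None, :]

x_pos = np.array([1.0, 0.8, 0.2, 0.1])    # stand-in "positive" sample
x_neg = np.array([-0.5, 0.1, -0.9, 0.7])  # stand-in "negative" sample

for _ in range(200):
    local_step(x_pos, positive=True)
    local_step(x_neg, positive=False)

_, g_pos = goodness(x_pos)
_, g_neg = goodness(x_neg)
print(g_pos > g_neg)  # the layer's goodness now separates the two flows
```

In a full FFN, each layer repeats this local objective on the (normalised) output of the previous layer, which is what removes the need for a backward pass.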
However, applying forward-forward networks to convolutional neural networks has proved challenging. Existing goodness functions were designed for fully connected layers and ignore the spatial structure of input images, which degrades accuracy when they are applied to convolutional layers.
VFF-Net addresses these limitations through three methodologies. The first, label-wise noise labelling, trains the network on three types of data: original images without noise, positive images paired with correct labels, and negative images paired with incorrect labels, thereby avoiding the loss of pixel information. The second, a cosine similarity-based contrastive loss, replaces the conventional goodness function with a contrastive loss computed from the cosine similarity between feature maps, preserving the spatial information needed for image classification.
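The article does not give the exact loss formula, but the general family of cosine-similarity contrastive losses works roughly as sketched below: flattened feature maps with the same label are pulled together and those with different labels pushed apart. The feature vectors, temperature, and loss form here are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(feats, labels, temp=0.5):
    """Supervised contrastive loss over flattened feature maps: for each
    anchor, same-label features should be more cosine-similar than others."""
    n = len(feats)
    losses = []
    for i in range(n):
        denom = sum(np.exp(cosine_sim(feats[i], feats[k]) / temp)
                    for k in range(n) if k != i)
        for j in range(n):
            if j != i and labels[j] == labels[i]:
                num = np.exp(cosine_sim(feats[i], feats[j]) / temp)
                losses.append(-np.log(num / denom))
    return float(np.mean(losses))

# Toy flattened feature maps (illustrative values).
feats = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0]),
         np.array([0.0, 1.0, 0.0]), np.array([0.1, 0.9, 0.0])]

aligned = contrastive_loss(feats, labels=[0, 0, 1, 1])  # same-class features point alike
mixed   = contrastive_loss(feats, labels=[0, 1, 0, 1])  # labels shuffled
print(aligned < mixed)  # aligned class structure gives the lower loss
```

Because the loss depends on similarity between whole feature maps rather than on their squared magnitude, spatial information in the maps contributes directly to the training signal.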
Layer grouping, the third methodology, mitigates the problems of training every layer individually by grouping layers that share the same output characteristics and attaching auxiliary layers to each group. The resulting ensemble-like training effect makes the method transferable to convolutional neural network-based models.
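How such grouping might look in practice can be sketched as follows; the layer names, output shapes, and grouping rule are assumptions for illustration, not the authors' implementation. Consecutive layers with identical output shapes fall into one group, and each group would receive its own auxiliary head trained with a local loss.

```python
from itertools import groupby

# Hypothetical CNN layers as (name, output shape) pairs (assumed values).
layers = [("conv1", (32, 32, 64)), ("conv2", (32, 32, 64)),
          ("conv3", (16, 16, 128)), ("conv4", (16, 16, 128))]

# Group consecutive layers that share the same output characteristics.
groups = [(shape, [name for name, _ in grp])
          for shape, grp in groupby(layers, key=lambda layer: layer[1])]

for shape, names in groups:
    # In VFF-Net, each such group would be trained jointly through an
    # auxiliary layer attached to the group's output (sketch only).
    print(shape, names)
```

Training a handful of groups instead of every layer in isolation reduces the number of separate local objectives, which is the performance issue the quote above attributes to naive per-layer training.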
“By moving away from BP, VFF-Net paves the way toward lighter and more brain-like training methods that do not need extensive computing resources. This means powerful AI models could run directly on personal devices, medical devices, and household electronics, reducing reliance on energy-intensive data centres and making AI more sustainable,” says Dr. Kim.