http://lattes.cnpq.br/4222566664683938; MAGALHÃES, Whendell Feijó.
Abstract:
The Lottery Ticket Hypothesis posits that, within an over-parameterized neural network, it is possible to find subnetworks (winning tickets) whose accuracy equals or exceeds that of the unpruned network and which retain high generalization capacity. One step of the algorithm that implements the hypothesis requires rewinding the weights of the pruned network to their initial, usually random, values. More recent variations of this step involve (i) resetting the weights to the values they had at an earlier point in the training of the unpruned network (weight rewinding), or (ii) keeping the final trained weights and resetting only the learning rate schedule (learning rate rewinding). Although some studies have investigated these variations, mostly for unstructured pruning (weight pruning), the literature review conducted for this research found no evaluations focused on structured pruning (pruning of neurons or filters) under the local and global pruning variants. Furthermore, research related to the Lottery Ticket Hypothesis has used only the magnitude of the weights as the criterion for selecting the elements to be pruned.
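As a rough sketch of the two rewinding variants described above (not the dissertation's actual code; `train`, `prune_step`, and the epoch bookkeeping are hypothetical helpers), an iterative pruning loop in PyTorch-style Python might look like this:

```python
import copy

def iterative_pruning(model, train, prune_step, rounds,
                      total_epochs, rewind_epoch, mode="weight_rewind"):
    # Dense training pass; snapshot the weights at the rewind point.
    # `train(model, start_epoch, end_epoch)` and `prune_step(model)`
    # are hypothetical helpers standing in for the real training code.
    train(model, 0, rewind_epoch)
    rewind_state = copy.deepcopy(model.state_dict())
    train(model, rewind_epoch, total_epochs)

    for _ in range(rounds):
        prune_step(model)  # mask a fraction of the remaining elements
        if mode == "weight_rewind":
            # (i) Reset surviving weights to their epoch-k values and
            # retrain, replaying the schedule from epoch k onward.
            model.load_state_dict(rewind_state)
            train(model, rewind_epoch, total_epochs)
        else:  # "lr_rewind"
            # (ii) Keep the final trained weights; only the learning
            # rate schedule is rewound to epoch k before retraining.
            train(model, rewind_epoch, total_epochs)
    return model
```

The only difference between the two branches is whether the weights are reset: learning rate rewinding retrains the already-trained weights while replaying the same portion of the schedule.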
In this context, this research presents new empirical evidence that it is possible to obtain winning tickets when performing structured pruning of convolutional neural networks, and it proposes a pruning criterion based on DeepLIFT explainability as an alternative to the magnitude of the weights. For this, an experiment was set up in which the VGG16 network was trained on the CIFAR-10 and CIFAR-100 datasets.
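For illustration, the sketch below shows what local and global structured pruning of VGG16's convolutional filters can look like with PyTorch's `torch.nn.utils.prune`; since PyTorch provides no built-in global structured pruning, the cross-layer L1 ranking is an assumption of this sketch, not necessarily the procedure used in the experiments:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision.models import vgg16

model = vgg16(num_classes=10)
convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]

def local_structured(convs, amount=0.2):
    # Local (layer-wise): remove the weakest filters (L1 norm)
    # independently within each convolutional layer.
    for conv in convs:
        prune.ln_structured(conv, name="weight", amount=amount, n=1, dim=0)

def global_structured(convs, amount=0.2):
    # Global (layer-independent): rank all filters across layers by
    # L1 norm and mask the weakest fraction overall. PyTorch has no
    # built-in global *structured* pruning, so masks are edited directly.
    for conv in convs:
        prune.identity(conv, name="weight")  # attach an all-ones mask
    scores = [(conv, i, conv.weight_orig[i].abs().sum().item())
              for conv in convs for i in range(conv.weight_orig.shape[0])]
    scores.sort(key=lambda t: t[2])
    for conv, i, _ in scores[: int(amount * len(scores))]:
        conv.weight_mask.data[i].zero_()  # zero the entire filter
```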
The resulting networks, pruned at different compression levels with the weight rewinding and learning rate rewinding methods, were then compared in the contexts of local pruning (layer-wise) and global pruning (layer-independent). The unpruned network served as the baseline for the comparisons, and the pruned networks were also compared with versions of themselves trained from randomly initialized weights. In addition, the impact of replacing the magnitude of the weights with the DeepLIFT method was evaluated on globally pruned networks under the learning rate rewinding approach. In general, with global pruning, weight rewinding produced some winning tickets, but only at low pruning levels and with performance equal to or worse than random initialization. Learning rate rewinding with global pruning produced the best results among the rewinding approaches, finding winning tickets at different pruning levels, including more aggressive ones. Furthermore, at the end of the pruning iterations, the networks pruned with the DeepLIFT criterion showed higher average accuracy than those pruned by weight magnitude, as well as greater stability and tolerance to more aggressive pruning levels. Finally, a significant reduction in inference time was observed for the pruned networks when executed on CPU (a speedup of 5 for batches of size 1 and of 4 for batches of size 128), yielding networks better suited to devices with limited computing resources.
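To show how DeepLIFT attributions can serve as a structured pruning criterion, here is a minimal sketch using Captum's `LayerDeepLift`; the aggregation (mean absolute attribution per output channel, summed over batches) is an assumption of the sketch, not the dissertation's exact scoring rule:

```python
import torch
from captum.attr import LayerDeepLift

def deeplift_filter_scores(model, layer, loader, device="cpu"):
    # Score each filter of `layer` by the absolute DeepLIFT attribution
    # of its output channel, accumulated over a data loader. This is a
    # hypothetical helper illustrating the idea, not the thesis code.
    dl = LayerDeepLift(model, layer)
    scores = None
    for inputs, targets in loader:
        inputs = inputs.to(device)
        # Attribute the class score to the layer's output; a zero
        # tensor serves as the reference (baseline) input.
        attr = dl.attribute(inputs,
                            baselines=torch.zeros_like(inputs),
                            target=targets.to(device))
        # Average |attribution| over batch and spatial dimensions,
        # leaving one importance score per output channel (filter).
        batch_scores = attr.abs().mean(dim=(0, 2, 3))
        scores = batch_scores if scores is None else scores + batch_scores
    return scores  # lowest-scoring filters are pruning candidates
```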
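Finally, a minimal sketch of how the CPU speedup measurement could be set up, assuming CIFAR-sized inputs and the batch sizes mentioned above; `dense_model` and `pruned_model` are hypothetical stand-ins:

```python
import time
import torch

@torch.no_grad()
def cpu_inference_time(model, batch_size, runs=50, warmup=5):
    # Average wall-clock CPU inference time per batch of CIFAR-sized inputs.
    model.eval().to("cpu")
    x = torch.randn(batch_size, 3, 32, 32)
    for _ in range(warmup):   # warm-up runs are not timed
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs

# Speedup = time of the unpruned network / time of the pruned network,
# measured for the batch sizes reported in the abstract:
# for bs in (1, 128):
#     print(bs, cpu_inference_time(dense_model, bs) /
#               cpu_inference_time(pruned_model, bs))
```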