Frederic Precioso
I3S-Inria - frederic.precioso@inria.fr
Before explaining why transformers are expected to be the next super neural model, or rather why they already are, we will start by quickly defining AI and Machine Learning, and positioning Machine Learning with respect to Statistics, Data Mining, and Data Science. The success of Deep Learning began with convolutional neural networks, architectures that preserve spatial patterns in data. This family of models is easily parallelizable. For sequential data and time series, interest then moved to recurrent neural networks, which preserve sequential patterns. This family of models is not parallelizable but can take large contexts (long-term dependencies) into account. Recurrent neural networks have also benefited greatly from attention mechanisms and attentional layers. Transformers are expected to be the next super neural model because they combine the advantages of the previous families: they capture long-range dependencies while remaining parallelizable. After detailing the internal mechanisms of transformers, we will review recent applications and their potential.
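As a pointer to the internal mechanism the talk details, the core operation of a transformer layer is commonly written as scaled dot-product attention; a minimal sketch, assuming query, key, and value matrices $Q$, $K$, $V$ and key dimension $d_k$:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]

Because every query attends to every key in a single matrix product, this operation covers the whole context at once while remaining fully parallelizable, which is the combination of advantages noted above.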