Recognizing textual entailment using deep learning techniques
Document type: Official Master's Final Project
Access conditions: Open access
Textual Entailment (TE), or Natural Language Inference (NLI), is the problem of determining a directional relation between two text fragments. Specifically, given a sentence pair (a, b), the task is to predict whether b is entailed by a, whether b contradicts a, or whether the relation between a and b is neutral. NLI is a central problem in natural language understanding. The dominant recent approach to NLI is based on artificial neural networks, which aim to build deep, complex encoders that transform a sentence into encoded vectors. End-to-end artificial neural networks have reached state-of-the-art performance in the NLI field. For instance, there are recurrent neural network (RNN) based encoders, which recursively combine each word with the previous memory until the information of the whole sentence has been derived. The most common RNN encoders are Long Short-Term Memory networks (LSTM; Hochreiter and Schmidhuber, 1997) and Gated Recurrent Units (GRU; Cho et al., 2014). RNNs have surpassed the performance of traditional baselines in many NLP tasks (Dai et al., 2015). There are also convolutional neural network (CNN; LeCun et al., 1989) based encoders, which gather the sentence information by applying multiple convolving filters over the sentence. CNNs have achieved state-of-the-art results in computer vision (Krizhevsky et al., 2012), machine translation (Costa-jussà M.R., 2016), and various NLP tasks (Collobert et al., 2011).

In this work, we use the model introduced by Williams et al. (2017) as the baseline for the NLI task. The baseline model consists of a word-level embedding layer and a BiLSTM encoder. We augment the baseline and propose Character-level Intra Attention Networks (CIAN). In the CIAN model, a character-level convolutional network replaces the standard word-level embedding layer, and an intra attention layer captures the intra-sentence semantics.
One contribution of the CIAN model is that it implements the character-level convolutional network introduced by Kim et al. (2016). Most sequence encoders use a word-level embedding layer initialized with pre-trained word vectors such as GloVe (Pennington et al., 2014). In that way, the words in a sentence are no longer independent, which helps the encoder capture more of the internal information of a sentence. However, as vocabulary size grows in modern corpora, more and more out-of-vocabulary (OOV) words are absent from the pre-trained word embeddings. Because word-level embeddings are blind to subword information (e.g. morphemes), this leads to high perplexities for those OOV words. Our model uses the character-level convolutional network to exploit character-level information: the representation of each word is computed from its characters. By doing so, the model gains the ability to learn rich semantic and orthographic features from the encoding of characters.

Another contribution of the CIAN model is that it implements the intra attention mechanism introduced by Yang et al. (2017). The major advantage of the attention mechanism is its ability to encode long sentences efficiently. As the input grows, models without attention lose information and precision if they use only the final representation. Attention addresses this issue, and experiments confirm the intuition. Another advantage is that the model becomes more interpretable, because the attention weights of an encoded sentence can be visualized. We visualize the attention weights in chapter 5, which helps us understand how the model judges the textual entailment relation between two sentences. The proposed CIAN is implemented in Keras and evaluated on the newly published MNLI corpus of the RepEval 2017 workshop.
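The idea of computing a word representation from its characters can be illustrated with a minimal sketch in plain Python. It is not the Keras implementation of the thesis: the embedding size, the filter widths, the random (untrained) weights and all function names are assumptions made for illustration. It shows only the core step of a character-level convolution followed by max-over-time pooling, so a word missing from a pre-trained vocabulary still receives a vector.

```python
import math
import random

CHAR_DIM = 4  # hypothetical character-embedding size


def make_char_embeddings(alphabet, dim=CHAR_DIM, seed=0):
    """Random, untrained character embeddings (stand-in for learned ones)."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1, 1) for _ in range(dim)] for ch in alphabet}


def make_filter(width, dim=CHAR_DIM, seed=0):
    """One convolution filter spanning `width` consecutive characters."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]


def conv_max_pool(char_vectors, kernel):
    """Slide the filter over the characters and keep the strongest response."""
    width, dim = len(kernel), len(kernel[0])
    # Zero-pad words shorter than the filter so at least one window exists.
    padded = char_vectors + [[0.0] * dim] * max(0, width - len(char_vectors))
    responses = []
    for start in range(len(padded) - width + 1):
        total = sum(kernel[i][j] * padded[start + i][j]
                    for i in range(width) for j in range(dim))
        responses.append(math.tanh(total))
    return max(responses)  # max-over-time pooling


def word_vector(word, char_emb, kernels):
    """Build a fixed-size word vector: one pooled feature per filter."""
    dim = len(kernels[0][0])
    # Characters outside the alphabet map to a zero vector (an assumption).
    char_vectors = [char_emb.get(ch, [0.0] * dim) for ch in word]
    return [conv_max_pool(char_vectors, kernel) for kernel in kernels]


char_emb = make_char_embeddings("abcdefghijklmnopqrstuvwxyz")
kernels = [make_filter(width, seed=width) for width in (2, 3, 4)]
vec = word_vector("entailment", char_emb, kernels)
print(len(vec))  # one feature per filter
```

In a trained model the character embeddings and filters would be learned jointly with the encoder; here they are random, so only the shapes and the data flow are meaningful.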
The test accuracy of the CIAN model on the matched test set is 0.9 percent higher than that of the baseline model. Based on this improved result, we published a paper titled "Character-level Intra Attention Networks for Natural Language Inference" in the RepEval 2017 workshop, as an achievement of this thesis. To summarize, the CIAN model presented in this work is a sequence encoder that can encode long sentences at the character level with rich semantic and orthographic features. The attention mechanism also makes the model highly interpretable, allowing people to understand how the model performs its task. Because CIAN is an end-to-end neural network that needs no specific pre-processing or outside data such as pre-trained word embeddings, it can easily be applied to other tasks that rely on a sequence encoder, such as language modeling, sentiment analysis and question answering.
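The attention step, and the per-word weights that make it interpretable, can be sketched as follows. This is a minimal sketch that assumes an additive formulation with a learned context vector (score each hidden state, normalize with a softmax, then take the weighted sum); the exact layer used in CIAN may differ. The hidden states, the projection `W`, the bias `b` and the context vector are hypothetical toy values, not trained parameters.

```python
import math


def intra_attention(hidden_states, W, b, context):
    """Return (attention weights, weighted sentence vector)."""
    scores = []
    for h in hidden_states:
        # Project the hidden state, then compare it with the context vector.
        u = [math.tanh(sum(W[i][j] * h[j] for j in range(len(h))) + b[i])
             for i in range(len(b))]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    # Numerically stable softmax over the words of the sentence.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # The sentence vector is the attention-weighted sum of hidden states.
    dim = len(hidden_states[0])
    sentence = [sum(w * h[k] for w, h in zip(weights, hidden_states))
                for k in range(dim)]
    return weights, sentence


tokens = ["a", "man", "sleeps"]
hidden_states = [[0.1, 0.0], [2.0, 0.0], [-1.0, 0.0]]  # toy encoder outputs
W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, for illustration only
b = [0.0, 0.0]
context = [1.0, 0.0]

weights, sentence = intra_attention(hidden_states, W, b, context)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
```

Plotting such weights per word is the kind of visualization that shows which words the encoder relies on when it judges a sentence pair.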