[Text Summarization][ACL 2018] N
Author: readME_boy | Published 2018-08-23 15:49

    Title(2018)

    Neural Document Summarization by Jointly Learning to Score and Select Sentences

    Abstract

    1. A novel end-to-end neural network framework for extractive document summarization by jointly learning to score and select sentences.

    2. Two main steps in this work:

    • First, a hierarchical encoder reads the document and builds representations of its sentences.
    • Second, the output summary is built by extracting sentences, with the selection strategy integrated into the scoring model.
    3. Experiments on the CNN/Daily Mail dataset show that the proposed framework significantly outperforms the current state-of-the-art extractive summarization models.

    1 Introduction

    Extractive summarization methods have proven effective and are usually decomposed into two subtasks: sentence scoring and sentence selection.

    1. Sentence scoring:
    • Feature-based methods: word probability, TF*IDF weights, sentence position, sentence length features.
    • Graph-based methods: TextRank, LexRank (measure sentence importance using weighted-graphs).
    • Neural network rising.
    2. Sentence selection:
    • MMR-based methods: Maximal Marginal Relevance; select the sentence that has the maximal score and is minimally redundant with sentences already included in the summary.
    • ILP-based methods: Integer Linear Programming; use optimization with constraints to find the optimal subset of sentences in a document.
    • Neural network: Ren et al. (2016) train two neural networks with handcrafted features. One is used to rank sentences, and the other is used to model redundancy.
    3. NEUSUM framework:
    • Integrate sentence scoring and selection into one end-to-end trainable model.
    • Identify the relative importance of sentences via a neural network without any handcrafted features.
    • Each time the model selects one sentence, it scores the remaining sentences considering both sentence saliency and the previously selected sentences. Therefore, the model learns to predict the relative gain given the sentence extraction state and the partial output summary.
    • Components:
    1. Document Encoder: has a hierarchical architecture suitable for the compositionality of documents.
    2. Sentence extractor: built with RNN which provides two main functions: (1) remember the partial output summary (2) provide a sentence extraction state.
    • Achieves the best results on the CNN/Daily Mail dataset.

    2 Related Work

    Sentence scoring is critical for measuring the saliency of a sentence.

    1. Unsupervised methods: Do not require model training or data annotation. Many surface features are useful, such as term frequency, TF*IDF weights, sentence length and sentence position.
    2. Graph-based methods: Applied broadly to ranking sentences. (Erkan and Radev, 2004; Mihalcea and Tarau, 2004; Wan and Yang, 2006).
    3. Machine learning techniques: Naive Bayes classifier, Hidden Markov Model, Bigram features.
    4. Maximal Marginal Relevance: MMR, a heuristic in sentence selection Carbonell and Goldstein (1998).
    5. Integer Linear Programming: McDonald (2007) treats sentence selection as an optimization problem under some constraints.
    6. Deep neural networks:
    • Cao et al. (2015b): PriorSum, a CNN capturing prior features.
    • Ren et al. (2017) two-level attention mechanism to measure the contextual relations of sentences.
    • Cheng and Lapata (2016) and Nallapati et al. (2017) treat extractive document summarization as a sequence labeling task.

    3 Problem Formulation

    The goal is to learn a scoring function f(\mathbf{S}) over sentence sets \mathbf{S} which can be used to find the best summary during testing:

    \hat{\mathbf{S}} = \mathop{\arg\max}_{\mathbf{S}} f\left ( \mathbf{S} \right ), \quad \text{s.t. } \mathbf{S} = \left \{ S_i \mid S_i \in \mathbf{D} \right \},\ \left | \mathbf{S} \right | \le l

    where l is the sentence number limit, \mathbf{D} is a document containing L sentences.

    • In this paper, an MMR-style method is adopted instead of ILP, since MMR maximizes the relative gain given previously extracted sentences, so the model can learn to score this gain.
    • \mathbf{ROUGE-F1} is used as the evaluation function r(\cdot) to prevent the tendency of choosing longer sentences, since the CNN/Daily Mail dataset has no summary length limit. Therefore, g(\cdot) is used as the scoring function (a sketch of the resulting greedy oracle follows this list):
      g\left ( S_t \right ) = r\left ( \boldsymbol{S_{t-1}} \cup \left \{ S_t \right \} \right ) - r\left ( \boldsymbol{S_{t-1}} \right )
      where \boldsymbol{S_{t-1}} is the set of previously selected sentences. At each time t, the system chooses the sentence with maximal \mathbf{ROUGE-F1} gain.
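
    Below is a minimal Python sketch of the greedy oracle this scoring function implies: at each step the sentence with the largest ROUGE F1 gain over the already-selected set is picked. The rouge_f1 helper here is only a stand-in (unigram F1) so the sketch runs without extra dependencies; a real setup would call an actual ROUGE implementation.

```python
def rouge_f1(selected, reference):
    """Stand-in for ROUGE F1: unigram-overlap F1 between the selected
    sentences (joined into one string) and the reference summary."""
    sel = set(" ".join(selected).split())
    ref = set(reference.split())
    if not sel or not ref:
        return 0.0
    overlap = len(sel & ref)
    precision, recall = overlap / len(sel), overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(doc_sents, reference, limit):
    """Greedily pick sentence indices, each time maximizing the gain
    g(S_t) = r(S_{t-1} + {S_t}) - r(S_{t-1})."""
    selected, selected_idx = [], []
    for _ in range(min(limit, len(doc_sents))):
        base = rouge_f1(selected, reference)
        gains = [
            rouge_f1(selected + [s], reference) - base if i not in selected_idx
            else float("-inf")
            for i, s in enumerate(doc_sents)
        ]
        best = max(range(len(doc_sents)), key=lambda i: gains[i])
        selected_idx.append(best)
        selected.append(doc_sents[best])
    return selected_idx


doc = ["the cat sat on the mat", "dogs bark loudly", "the mat was red"]
print(greedy_oracle(doc, "the cat sat on the red mat", limit=2))
```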

    4 Neural Document Summarization

    A hierarchical document encoder is employed to reflect the hierarchical structure of documents: words form a sentence, and sentences form a document. The sentence extractor scores the encoded sentences and extracts one of them at each step.

    (Figure: overview of the NEUSUM model.)

    4.1 Document encoding

    The document is encoded at two levels: sentence-level encoding and document-level encoding.

    The sentence-level encoder reads the j-th input sentence S_j and constructs the basic sentence representation \widetilde{s_j}. Here we employ a bidirectional GRU (BiGRU) (Cho et al., 2014), where the GRU is defined as:

    z_t = \sigma\left ( \mathbf{W}_z \left [ x_t, h_{t-1} \right ] \right )
    r_t = \sigma\left ( \mathbf{W}_r \left [ x_t, h_{t-1} \right ] \right )
    \widetilde{h}_t = \tanh\left ( \mathbf{W}_h \left [ x_t, r_t \odot h_{t-1} \right ] \right )
    h_t = \left ( 1 - z_t \right ) \odot h_{t-1} + z_t \odot \widetilde{h}_t

    where {\mathbf{W}}_z, {\mathbf{W}}_r and {\mathbf{W}}_h are weight matrices.
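
    A small NumPy sketch of a single GRU update following these equations (biases omitted); the concatenated-input form is assumed so that the three weight matrices \mathbf{W}_z, \mathbf{W}_r and \mathbf{W}_h are the only parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU update (biases omitted): returns h_t from input x_t and h_{t-1}."""
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(W_z @ xh)                                        # update gate
    r = sigmoid(W_r @ xh)                                        # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                      # new hidden state

# Example with random weights: input dim 4, hidden dim 3 (so each W_* is 3 x 7).
rng = np.random.default_rng(0)
W_z, W_r, W_h = (rng.normal(size=(3, 7)) for _ in range(3))
h_t = gru_step(rng.normal(size=4), np.zeros(3), W_z, W_r, W_h)
print(h_t)
```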

    The BiGRU consists of a forward GRU and a backward GRU that read the word embeddings of sentence S_j from opposite directions and produce two sequences of hidden states:

    • Forward: ({\overrightarrow{h}}_1^{(j)},{\overrightarrow{h}}_2^{(j)},\cdots,{\overrightarrow{h}}_{n_j}^{(j)})
    • Backward: ({\overleftarrow{h}}_1^{(j)},{\overleftarrow{h}}_2^{(j)},\cdots,{\overleftarrow{h}}_{n_j}^{(j)})

    The sentence-level representation is then constructed by concatenating the first backward and last forward hidden states:
    \widetilde{s_j} = \begin{bmatrix} \overleftarrow{h}_1^{(j)}\\ \overrightarrow{h}_{n_j}^{(j)} \end{bmatrix}

    Another BiGRU is used as the document-level encoder in a similar manner, with the sentence-level encoded vectors (\widetilde{s_1},\widetilde{s_2},\cdots ,\widetilde{s_L}) as inputs. The document-level representation s_i of sentence S_i is the concatenation of the forward and backward hidden vectors:
    s_i = \begin{bmatrix} \overrightarrow{s}_i\\ \overleftarrow{s}_i \end{bmatrix}
    We then get the final sentence vectors in the given document.
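
    A minimal PyTorch sketch of this hierarchical encoder (an illustration under assumed dimensions, not the authors' code): the word-level BiGRU yields \widetilde{s_j} from the first backward and last forward states, and the sentence-level BiGRU turns those into the document-level representations s_i.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=50, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Word-level BiGRU (sentence encoder) and sentence-level BiGRU (document encoder).
        self.sent_gru = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.doc_gru = nn.GRU(2 * hid_dim, hid_dim, bidirectional=True, batch_first=True)

    def forward(self, doc):
        # doc: LongTensor (num_sents, max_words) holding the word ids of one document.
        emb = self.embed(doc)                                   # (L, n_j, emb_dim)
        states, _ = self.sent_gru(emb)                          # (L, n_j, 2*hid_dim)
        hid = states.size(-1) // 2
        fwd_last = states[:, -1, :hid]                          # last forward state
        bwd_first = states[:, 0, hid:]                          # first backward state
        sent_vecs = torch.cat([bwd_first, fwd_last], dim=-1)    # \tilde{s}_j
        doc_states, _ = self.doc_gru(sent_vecs.unsqueeze(0))    # (1, L, 2*hid_dim)
        return doc_states.squeeze(0)                            # s_i for each sentence

enc = HierarchicalEncoder(vocab_size=1000)
sent_reprs = enc(torch.randint(0, 1000, (5, 12)))   # 5 sentences, 12 words each
print(sent_reprs.shape)                              # torch.Size([5, 512])
```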

    4.2 Joint Sentence Scoring and Selection

    Benefits: a) sentence scoring can be aware of previously selected sentences; b) sentence selection can be simplified since the scoring function is learned to be the \mathbf{ROUGE-F1} gain.

    Given the last extracted sentence \widehat{S}_{t-1}, to decide the next \widehat{S}_{t}, the model should have two key abilities:
    1). remembering the information of previously selected sentences. (Use another GRU)
    2). scoring the remaining document sentences considering both the previously selected sentences and the importance of the remaining sentences. (Use a Multi-Layer Perceptron, MLP)

    • GRU input: the document level representation s_{t-1} of the last extracted sentence \widehat{S}_{t-1} .
    • GRU output: hidden state h_t .
    • MLP input: the current hidden state h_t and the sentence representation vector s_i .
    • MLP output: score \delta (S_i) of sentence S_i .
      h_t = \mathrm{GRU}\left ( h_{t-1}, s_{t-1} \right )
      \delta \left ( S_i \right ) = {\mathbf{W}}_s \tanh\left ( {\mathbf{W}}_q h_t + {\mathbf{W}}_d s_i \right )
      where {\mathbf{W}}_s, {\mathbf{W}}_q and {\mathbf{W}}_d are trainable parameters.

    with the initialization set as follows: s_0 is a zero vector, since no sentence has been selected at the first step, and h_0 is initialized with a nonlinear transformation of the document-level encoder's last backward hidden state.

    At time t, we choose the sentence with maximal gain score.
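
    A sketch of the extractor loop, reusing the assumed dimensions of the encoder sketch above (for simplicity, h_0 is set to zeros here instead of being derived from the document encoder, and s_0 is a zero vector): a GRU cell remembers the partial output summary, an MLP scores every sentence, and the highest-scoring unselected sentence is extracted at each step.

```python
import torch
import torch.nn as nn

class SentenceExtractor(nn.Module):
    def __init__(self, sent_dim=512, hid_dim=256):
        super().__init__()
        self.gru = nn.GRUCell(sent_dim, hid_dim)       # remembers the partial summary
        self.W_q = nn.Linear(hid_dim, hid_dim, bias=False)
        self.W_d = nn.Linear(sent_dim, hid_dim, bias=False)
        self.W_s = nn.Linear(hid_dim, 1, bias=False)

    def score(self, h_t, sent_vecs):
        # delta(S_i) = W_s tanh(W_q h_t + W_d s_i), computed for every sentence i
        return self.W_s(torch.tanh(self.W_q(h_t) + self.W_d(sent_vecs))).squeeze(-1)

    def extract(self, sent_vecs, num_steps=3):
        # sent_vecs: (L, sent_dim) document-level sentence representations s_i
        h = torch.zeros(1, self.gru.hidden_size)       # simplified h_0 (zeros)
        prev = torch.zeros(1, sent_vecs.size(-1))      # s_0: nothing selected yet
        picked = []
        for _ in range(num_steps):
            h = self.gru(prev, h)                      # h_t = GRU(h_{t-1}, s_{t-1})
            scores = self.score(h, sent_vecs)          # (L,)
            if picked:
                scores[picked] = float("-inf")         # never extract a sentence twice
            idx = int(scores.argmax())
            picked.append(idx)
            prev = sent_vecs[idx:idx + 1]              # feed the selected sentence back in
        return picked

extractor = SentenceExtractor()
print(extractor.extract(torch.randn(5, 512), num_steps=2))
```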

    4.3 Objective Function

    Basic idea: Minimize the Kullback-Leibler (KL) divergence between the model prediction P and the labeled training data distribution Q.

    Model prediction P: normalize the scores \delta (S_i) with a softmax function:

    P\left ( S_i \right ) = \frac{\exp\left ( \delta \left ( S_i \right ) \right )}{\sum_{k=1}^{L}\exp\left ( \delta \left ( S_k \right ) \right )}

    Labeled data distribution Q:
    First use Min-Max normalization to rescale the gain value to [0,1]:

    \widetilde{g}\left ( S_i \right ) = \frac{g\left ( S_i \right ) - \min\left ( g \right )}{\max\left ( g \right ) - \min\left ( g \right )}

    Then apply a softmax operation with temperature \tau to produce Q :

    Q\left ( S_i \right ) = \frac{\exp\left ( \tau\, \widetilde{g}\left ( S_i \right ) \right )}{\sum_{k=1}^{L}\exp\left ( \tau\, \widetilde{g}\left ( S_k \right ) \right )}

    The loss is the KL divergence between Q and P:
    J = D_{\mathrm{KL}}\left ( Q \parallel P \right )
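
    A sketch of the loss for one extraction step under this formulation; delta and gains are illustrative tensors standing for the model scores \delta (S_i) and the oracle ROUGE F1 gains g(S_i), and the value of the temperature \tau is a tuned hyperparameter (the default below is only illustrative).

```python
import torch
import torch.nn.functional as F

def neusum_step_loss(delta, gains, tau=10.0):
    # delta: model scores delta(S_i), shape (L,)
    # gains: oracle ROUGE F1 gains g(S_i), shape (L,); tau: sharpening temperature
    g = (gains - gains.min()) / (gains.max() - gains.min() + 1e-8)  # min-max to [0, 1]
    Q = F.softmax(tau * g, dim=-1)              # labeled data distribution Q
    log_P = F.log_softmax(delta, dim=-1)        # model prediction P (log-probabilities)
    return F.kl_div(log_P, Q, reduction="sum")  # KL(Q || P)

loss = neusum_step_loss(torch.randn(5), torch.rand(5))
print(loss.item())
```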

    5 Experiments and Results

    (Results table omitted. On the CNN/Daily Mail dataset, NEUSUM outperforms the previous state-of-the-art extractive summarization models in terms of ROUGE.)

    Human evaluation also confirms the strong performance of the NEUSUM model.
