A Step-by-Step Gradient Penalty with Similarity Calculation for Text Summary Generation
A summary generation model equipped with a gradient penalty avoids overfitting and trains more stably. However, the traditional gradient penalty faces two issues: (i) computing the gradient twice increases training time, and (ii) the disturbance factor requires repeated trials to find its best value. To address these issues, we propose a step-by-step gradient penalty model with similarity calculation (S2SGP). First, the step-by-step gradient penalty is applied to the summary generation model, effectively reducing training time without sacrificing accuracy. Second, the similarity score between the reference and candidate summaries is used as the disturbance factor. To demonstrate the performance of the proposed solution, we conduct experiments on four summary generation datasets, among which the EDUSum dataset is newly produced by us. Experimental results show that S2SGP effectively reduces training time and that the disturbance factor no longer relies on repeated trials. In particular, our model outperforms the baseline by more than 2.4 ROUGE-L points on the CSL dataset.
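The abstract does not spell out the penalty mechanics, so the following is only a minimal PyTorch sketch of the general idea: an FGM-style perturbation of the input embeddings whose magnitude is set by a precomputed reference/candidate similarity score, and whose extra gradient pass runs only on every `penalty_every`-th step (one plausible reading of "step-by-step"). The Hugging Face-style `model(**batch).loss` and `get_input_embeddings()` interface, the `penalty_every` schedule, and the `similarity` argument are all assumptions, not the paper's actual implementation.

```python
import torch

def train_step(model, batch, optimizer, step, similarity, penalty_every=2):
    """One training step of a seq2seq summarizer with a periodic,
    similarity-scaled gradient penalty on the input embeddings.

    `similarity` is a precomputed ROUGE-style score in [0, 1] between the
    candidate and reference summaries; it stands in for the hand-tuned
    disturbance factor of a conventional gradient penalty.
    """
    loss = model(**batch).loss
    loss.backward()  # first backward pass, always required

    # Apply the penalty only every `penalty_every`-th step, so the second
    # gradient computation is not paid on every update.
    if step % penalty_every == 0:
        emb = model.get_input_embeddings().weight
        if emb.grad is not None:
            # FGM-style perturbation along the gradient direction, scaled
            # by the similarity score instead of a trial-and-error epsilon.
            r = similarity * emb.grad / (emb.grad.norm() + 1e-12)
            emb.data.add_(r)
            model(**batch).loss.backward()  # penalty pass on perturbed embeddings
            emb.data.sub_(r)                # restore the embedding table

    optimizer.step()
    optimizer.zero_grad()
```

Accumulating both backward passes before a single `optimizer.step()` mirrors common adversarial-training loops; skipping the perturbed pass on most steps is what saves training time relative to a penalty applied at every update.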
Results on the EDUSum benchmark
Task | Dataset | Model | ROUGE-1 | ROUGE-2 | ROUGE-L | Global Rank
---|---|---|---|---|---|---
Abstractive Text Summarization | EDUSum | Seq2seq | 48.62 | 32.32 | 44.13 | # 1
Abstractive Text Summarization | EDUSum | BERT | 62.37 | 50.70 | 59.40 | # 2
Abstractive Text Summarization | EDUSum | RoBERTa | 63.22 | 51.34 | 60.26 | # 3
Abstractive Text Summarization | EDUSum | NEZHA | 63.91 | 51.88 | 61.00 | # 4
Abstractive Text Summarization | EDUSum | GP_Step_Sim | 64.48 | 52.70 | 61.91 | # 5
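The scores above are ROUGE F-measures scaled by 100, as is conventional. For reference, here is a minimal sketch of ROUGE-L over token sequences, built on the longest common subsequence; the `beta` weighting of recall over precision follows Lin's formulation, though the value 1.2 is an assumption (implementations vary).

```python
def lcs_length(a, b):
    """Longest-common-subsequence length via standard dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-measure between a candidate and a reference token sequence."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

# Example: rouge_l("the cat sat".split(), "the cat sat down".split()) == 0.857...
```

A score of this form, computed between the model's candidate summary and the reference, is also the kind of similarity value S2SGP uses as its disturbance factor.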