The analysis of discourse, and the study of what characterizes it in terms of communicative objectives, are essential to most Natural Language Processing tasks. Consequently, research on textual genres as expressions of such objectives presents an opportunity to enhance both automatic techniques and resources. To conduct an investigation of this kind, it is necessary
To What Extent does Content Selection affect Surface Realization in the context of Headline Generation?
Headline generation is a task where the most important information of a news article is condensed into a single short sentence. This task is normally addressed by summarization techniques, ideally combining extractive and abstractive methods together with sentence compression or fusion techniques. Although Natural Language Generation (NLG) techniques have not been directly exploited
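The extractive side of such a pipeline can be illustrated with a minimal frequency-based baseline (a hypothetical sketch, not the method described in the paper): select the sentence whose words best represent the article's term distribution, then apply naive compression by truncating to headline length.

```python
import re
from collections import Counter

def extractive_headline(article: str, max_words: int = 10) -> str:
    """Frequency-based extractive baseline: pick the sentence with the
    highest average term frequency, then truncate it to headline length."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    freq = Counter(re.findall(r"[a-z']+", article.lower()))

    def score(sent: str) -> float:
        toks = re.findall(r"[a-z']+", sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    best = max(sentences, key=score)
    return " ".join(best.split()[:max_words])
```

A real system would replace the truncation step with proper sentence compression or abstractive rewriting, which is precisely the gap the paper examines.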
Selección de Contenido Relevante mediante Modelos de Lenguaje Posicionales: Un Análisis Experimental
Like many areas within Natural Language Processing, extractive summarization has succumbed to the general trend set by the success of deep learning and neural network approaches. However, the resources that such approaches require (computational, temporal, data) are not always available. In this work
A Discourse-Informed Approach for Cost-Effective Extractive Summarization
This paper presents an empirical study that harnesses the benefits of Positional Language Models (PLMs) as the key to an effective methodology for understanding the gist of a discursive text via extractive summarization. We introduce an unsupervised, adaptive, and cost-efficient approach that integrates semantic information into the process. Texts are linguistically analyzed, and then semantic information, specifically
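The positional idea can be sketched in a simplified form (an illustration after Lv and Zhai's Positional Language Models, not the paper's exact pipeline, and without its semantic layer): every term occurrence propagates a Gaussian-kernel pseudo-count to all token positions, and sentences are ranked by how strongly the document's dominant terms are "felt" at their positions.

```python
import math
import re
from collections import Counter, defaultdict

def plm_extract(doc: str, n: int = 1, sigma: float = 25.0):
    """Rank sentences with a simplified Positional Language Model:
    each term occurrence spreads a Gaussian pseudo-count over all
    token positions; a sentence scores by the propagated mass of the
    document's most frequent terms at its positions."""
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    tokens, sent_spans = [], []
    for s in sentences:
        toks = re.findall(r"\w+", s.lower())
        start = len(tokens)
        tokens.extend(toks)
        sent_spans.append((start, len(tokens)))

    positions = defaultdict(list)          # term -> occurrence positions
    for i, t in enumerate(tokens):
        positions[t].append(i)
    top_terms = [t for t, _ in Counter(tokens).most_common(5)]

    def kernel(i: int, j: int) -> float:
        return math.exp(-((i - j) ** 2) / (2 * sigma ** 2))

    pos_score = [sum(kernel(i, j) for t in top_terms for j in positions[t])
                 for i in range(len(tokens))]

    def sent_score(span):
        a, b = span
        return sum(pos_score[a:b]) / max(b - a, 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda k: sent_score(sent_spans[k]), reverse=True)
    return [sentences[k] for k in ranked[:n]]
```

The kernel width `sigma` controls how far positional influence spreads; the adaptive variants studied in the paper tune this kind of behavior per text.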
Optimizing Data-Driven Models for Summarization as Parallel Tasks
This paper tackles a hard optimization problem in computational linguistics, automatic multi-document text summarization, using grid computing. The main challenge of multi-document summarization is to extract the most relevant and unique information effectively and efficiently from a set of topic-related documents, constrained to a specified length. In the Big Data/Text era, where
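The parallel decomposition can be read roughly as follows (a minimal sketch with a thread pool standing in for grid workers; all names are hypothetical): each document's sentences are scored independently against the topic's global term statistics, and the results are merged with a greedy anti-redundancy pass.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def _score_document(doc: str, topic_freq: Counter):
    """Worker: score each sentence of one document by the average
    topic-level frequency of its terms (a simple relevance proxy)."""
    scored = []
    for sent in re.split(r"(?<=[.!?])\s+", doc.strip()):
        toks = re.findall(r"\w+", sent.lower())
        if toks:
            scored.append((sum(topic_freq[t] for t in toks) / len(toks), sent))
    return scored

def parallel_summary(docs, length=2, workers=4):
    topic_freq = Counter(t for d in docs
                         for t in re.findall(r"\w+", d.lower()))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda d: _score_document(d, topic_freq), docs))
    candidates = sorted((p for part in parts for p in part), reverse=True)

    summary = []
    for _, sent in candidates:
        toks = set(re.findall(r"\w+", sent.lower()))
        # greedy redundancy filter: skip near-duplicates of chosen sentences
        if all(len(toks & set(re.findall(r"\w+", s.lower())))
               / max(len(toks), 1) <= 0.5 for s in summary):
            summary.append(sent)
        if len(summary) == length:
            break
    return summary
```

The per-document scoring step is embarrassingly parallel, which is what makes a grid deployment attractive; only the final merge requires global coordination.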
Applying Natural Language Processing Techniques to Generate Open Data Web APIs Documentation
The globalisation of information access has resulted in the continuous growth of data available on the Web, especially on open data portals. However, in current open data portals, data is difficult to understand and access. One of the reasons for this difficulty is the lack of suitable mechanisms to extract and learn valuable information from existing open
GPLSI at TREC 2019 Incident Streams Track
In this paper we present our contribution to the TREC 2019 Incident Streams track. We submitted four runs to the 2019-B edition of this task. Our main goal is to evaluate the effectiveness of sentiment analysis and information retrieval techniques to automatically detect and prioritize incidents on social media streams. Here, we describe these techniques
Team GPLSI. Approach for automated fact checking
The FEVER 2.0 Shared Task is a challenge for developing automated fact-checking systems. Our approach for FEVER 2.0 is based on a previous proposal developed by Team Athene (UKP Lab, TU Darmstadt). Our proposal modifies the sentence retrieval phase, using statement extraction and representation in the form of triplets (subject, object, action). Triplets are
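A toy version of the triplet step might look like this (a naive positional heuristic over POS-tagged tokens; the actual system works on retrieved evidence sentences and would rely on proper syntactic analysis):

```python
def extract_triplets(tagged):
    """Naive (subject, object, action) extraction from a POS-tagged
    sentence: for each verb, take the nearest preceding noun as subject
    and the nearest following noun as object. A real system would use
    a dependency parse instead of this positional heuristic."""
    triplets = []
    for i, (tok, pos) in enumerate(tagged):
        if pos.startswith("V"):
            subj = next((t for t, p in reversed(tagged[:i])
                         if p.startswith("N")), None)
            obj = next((t for t, p in tagged[i + 1:]
                        if p.startswith("N")), None)
            if subj and obj:
                triplets.append((subj, obj, tok))
    return triplets
```

For example, `extract_triplets([("Obama", "NNP"), ("visited", "VBD"), ("Paris", "NNP")])` yields `[("Obama", "Paris", "visited")]`. Matching such triplets between a claim and candidate evidence sentences is one way to sharpen sentence retrieval.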
The Impact of Rule-Based Text Generation on the Quality of Abstractive Summaries
In this paper we describe how an abstractive text summarization method improved the informativeness of automatic summaries by integrating syntactic text simplification, subject-verb-object concept frequency scoring and a set of rules that transform text into its semantic representation. We analyzed the impact of each component of our approach on the quality of generated summaries and
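The concept-frequency component can be illustrated with a minimal sketch (a hypothetical simplification, assuming subject-verb-object triplets have already been extracted for each sentence): a sentence scores by how often its SVO concepts recur across the whole document.

```python
from collections import Counter

def svo_frequency_scores(sentence_svos):
    """Score each sentence by the document-wide frequency of its
    subject-verb-object concepts: concepts that recur across the
    document mark the sentences carrying them as more informative."""
    concept_freq = Counter(c for svo in sentence_svos for c in svo)
    return [sum(concept_freq[c] for c in svo) for svo in sentence_svos]
```

For instance, with triplets `("dog", "chases", "cat")`, `("dog", "bites", "cat")`, and `("sun", "sets", "sky")`, the first two sentences score 5 each (`dog` and `cat` occur twice) while the third scores 3, so the recurring dog-and-cat content would be preferred for the summary.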
Towards Adaptive Text Summarization: Compression Rate, Readability and L2
This paper addresses the problem of the readability of automatically generated summaries in the context of second language learning. For this, we experimented with a new corpus of level-annotated simplified English texts. The texts were summarized using a total of 7 extractive and abstractive summarization systems with compression rates of 20%, 40%, 60% and 80%. We
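The compression-rate setup can be mimicked with a simple extractive sketch (an illustration only, not one of the seven systems evaluated; here a rate of 0.4 is interpreted as keeping roughly 40% of the source words):

```python
import re
from collections import Counter

def compress(text: str, rate: float = 0.4) -> str:
    """Extractive compression: greedily keep the highest term-frequency
    sentences (emitted in original order) until the summary reaches
    `rate` of the source word count."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"\w+", text.lower())
    freq = Counter(words)
    budget = int(len(words) * rate)

    def score(sent: str) -> float:
        toks = re.findall(r"\w+", sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    chosen, used = [], 0
    for idx in sorted(range(len(sentences)),
                      key=lambda k: score(sentences[k]), reverse=True):
        n = len(re.findall(r"\w+", sentences[idx]))
        if used + n <= budget:
            chosen.append(idx)
            used += n
    return " ".join(sentences[i] for i in sorted(chosen))
```

Varying `rate` over 0.2, 0.4, 0.6, and 0.8 reproduces the kind of compression sweep the study performs, after which each output's readability level can be assessed against the learner's proficiency.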