Module B: Intelligent Natural Language Generation
This module prioritizes advanced machine learning techniques, in particular deep learning. Its scientific novelty lies in integrating the text planner and the available knowledge on a topic into the hidden-layer structure that these algorithms provide. The text planner will be determined by the outcome of the activities in module A. Knowledge will be obtained from existing knowledge bases and ontologies (for example, Wikipedia and BabelNet), and NLP tools will be used to extract the information needed to compose the text. The proposed approach will thus address the entire NLG process as a whole, from macroplanning to surface realization (hence the term holistic), and will be able to generate coherent and semantically correct text oriented to a given communicative purpose.
Objective OBJ4 will be achieved with this module.
Activity 1. Deep learning to generate natural language
The objective of this activity is to determine which machine learning algorithms work best for language generation and to develop new models based on them, balancing the quality of the generated output against its computational cost. First, the literature on this subject will be analyzed systematically in order to: (i) identify which types of algorithm have obtained the best results in related NLP tasks (e.g. automatic summarization); and (ii) understand their advantages and limitations, thus avoiding errors already identified by other researchers. Existing platforms such as Keras and TensorFlow will also be analyzed as candidates for developing and implementing the proposed approach.
Initially, the algorithms will be applied to the surface realization stage to demonstrate their validity. Once preliminary results have shown that this type of machine learning algorithm is successful, the remaining stages of the generation process, macroplanning and microplanning, will be integrated to build a holistic approach in Activity 2.
Milestone: Obtaining models for surface realization using deep learning
Activity 2. Proposal and development of a holistic approach to generate natural language
This is a key activity for the project. Its successful completion will result in a holistic NLG method that is guided by the desired communicative goal and that can address, with a single approach, many generation tasks (reports, recommendations, complaints, criticisms, opinions, etc.).
The text planners obtained in module A will be integrated as knowledge in the intermediate layers of the advanced machine learning algorithms, building on the surface realization methods investigated in the previous activity of this module. This will lead to a holistic NLG model that draws on heterogeneous sources of information and is flexible and adaptive with respect to the type of text to be produced and the communicative goal.
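As an illustration only, the idea of feeding a text plan into an intermediate layer can be sketched as a single generation step in which a plan vector is concatenated with the hidden representation before the output layer. This is a minimal sketch in plain Python, not the project's design: the class name `PlanConditionedStep`, the dimensions, the two example plan vectors, and the random untrained weights are all assumptions made for the example. A real model would be trained and would likely be built on a platform such as Keras or TensorFlow.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_matrix(rows, cols, rng):
    """Random, untrained weights (illustrative only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class PlanConditionedStep:
    """Hypothetical single generation step conditioned on a text plan."""

    def __init__(self, token_dim, plan_dim, hidden_dim, vocab_size, seed=0):
        rng = random.Random(seed)
        self.w_hidden = init_matrix(hidden_dim, token_dim, rng)
        self.b_hidden = [0.0] * hidden_dim
        # The output layer sees the hidden state AND the plan vector.
        self.w_out = init_matrix(vocab_size, hidden_dim + plan_dim, rng)
        self.b_out = [0.0] * vocab_size

    def next_token_distribution(self, token_vec, plan_vec):
        # Hidden representation of the previous token.
        hidden = [math.tanh(h)
                  for h in dense(token_vec, self.w_hidden, self.b_hidden)]
        # Inject the plan at the intermediate layer by concatenation.
        joint = hidden + list(plan_vec)
        return softmax(dense(joint, self.w_out, self.b_out))


step = PlanConditionedStep(token_dim=3, plan_dim=2, hidden_dim=4, vocab_size=5)
token = [0.2, -0.1, 0.7]          # assumed embedding of the previous token
report_plan = [1.0, 0.0]          # assumed encoding of a "report" plan
opinion_plan = [0.0, 1.0]         # assumed encoding of an "opinion" plan
p_report = step.next_token_distribution(token, report_plan)
p_opinion = step.next_token_distribution(token, opinion_plan)
```

With the same previous token, the two plan vectors yield different next-token distributions, which is the sense in which the plan steers generation in this sketch. Concatenation is only one possible injection mechanism and is chosen here for simplicity.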
Previous research has shown that traditional machine learning algorithms can be used to integrate the three stages mentioned above (Duma and Klein, 2013; Konstas and Lapata, 2013). These studies confirm the feasibility of the task, which will be investigated here with more advanced algorithms and techniques that have greater potential. In this sense, Integer will make a quantitative and qualitative leap by using deep learning techniques and by treating the communicative language models obtained for the text planners as features of the entire generation process.
Milestone: Holistic approach for NLG guided by a communicative objective for heterogeneous information sources
Duma, D. and E. Klein (2013). Generating natural language from Linked Data: Unsupervised template extraction. Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), pages 83–94. Association for Computational Linguistics.
Konstas, I. and M. Lapata (2013). Inducing Document Plans for Concept-to-Text Generation. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1503–1514. Association for Computational Linguistics.