Neural network-based language models have been shown to generate remarkably fluent and human-like text.
Our goal is to incorporate these language models into real-life applications, such as surface realization in task-oriented dialogue systems.
However, these language models cannot be trusted to produce fully accurate outputs.
Even in the best-case scenario, with large datasets and relatively simple tasks, neural network-based language models communicate incorrect information in 5-10% of cases.
Therefore, our research focuses on how to guarantee accurate output.
We present experiments and analysis on the use of sentence plans, which we believe are key to improving the performance of neural network-based language models on surface realization tasks.
These insights constitute a key contribution toward the development of more reliable surface realization systems for task-oriented dialogue.