
Bio: Yuntian Deng is a PhD student at Harvard University, advised by Prof. Alexander Rush and Prof. Stuart Shieber. Yuntian's research aims to build coherent, transparent, and efficient long-form text generation models. He is also the main contributor to open-source projects such as OpenNMT, Im2LaTeX, and LaTeX2Im, which are widely used in academia and industry. For his research, teaching, and professional service, he has received an Nvidia Fellowship, a Baidu Fellowship, the ACL 2017 Best Demo Paper Runner-Up award, the DAC 2020 Best Paper award, a Harvard Certificate of Distinction in Teaching, and recognition as a NeurIPS 2020 top 10% reviewer.

Talk Title: Model Criticism for Long-Form Text Generation

Talk Abstract: Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression). Here, we propose to apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text. Model criticism compares the distributions of real and generated data in a latent space obtained according to an assumed generative process. Different generative processes identify specific failure modes of the underlying model. We perform experiments on three representative aspects of high-level discourse—coherence, coreference, and topicality—and find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference structures.
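To give a concrete sense of the distribution-comparison step the abstract describes, the minimal Python sketch below fits a Gaussian to each set of latent vectors and computes a Fréchet distance between the two fits. Both the statistic and the randomly drawn "latent" vectors are illustrative assumptions, not the talk's actual procedure: in the method described, latents come from posterior inference under the assumed generative process, and the abstract does not specify the comparison statistic.

import numpy as np
from scipy import linalg

def frechet_distance(real_latents, gen_latents):
    """Compare real vs. generated latent distributions by fitting a
    Gaussian to each set and computing the Frechet distance between them."""
    mu_r, mu_g = real_latents.mean(axis=0), gen_latents.mean(axis=0)
    cov_r = np.cov(real_latents, rowvar=False)
    cov_g = np.cov(gen_latents, rowvar=False)
    # Matrix square root of the covariance product; keep only the real
    # part, since numerical error can introduce tiny imaginary components.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean)

# Toy usage: 500 "documents" per corpus, each mapped to a 16-dim latent
# vector (random stand-ins here; the real method would infer these).
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=(500, 16))
gen = rng.normal(loc=0.3, scale=1.2, size=(500, 16))  # shifted distribution
print(f"Frechet distance: {frechet_distance(real, gen):.3f}")

A large distance under a given generative process would flag the corresponding failure mode (e.g., broken story progression), which is the sense in which different processes isolate different weaknesses of the model.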
