Fake nus: Generating equations with topic-based RNNs

July 30, 2018

Scientific documents rely on both mathematics and text to communicate ideas. Modeling the topical correspondence between mathematical equations and word contexts observed in scientific texts, we develop a new type of topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. The model effectively captures the relationship between topics and mathematics, enabling applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words. The data used in this study is available upon request.
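
To make the generative story above concrete, here is a minimal sketch of how one equation-context pair could be sampled. It is an illustration under stated assumptions, not the TopicEq implementation: every name (`generate_pair`, `topic_word`, `W_t`, and so on) is hypothetical, all weights are random placeholders rather than trained parameters, the latent Gaussian uses an identity covariance (a correlated topic model would learn a full covariance), and a plain tanh RNN stands in for whatever recurrent cell the model actually uses. Words and symbols are represented by integer ids.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    # Draw one index from a categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def generate_pair(rng, n_topics=3, n_words=8, n_symbols=6,
                  hidden=4, ctx_len=5, eq_len=7):
    # Assumption: random placeholder weights; a fitted model would learn these.
    def mat(rows, cols):
        return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

    topic_word = mat(n_topics, n_words)  # per-topic word logits
    W_h = mat(hidden, hidden)            # hidden -> hidden
    W_x = mat(hidden, n_symbols)         # previous symbol -> hidden
    W_t = mat(hidden, n_topics)          # topic activations -> hidden
    W_o = mat(n_symbols, hidden)         # hidden -> symbol logits

    # 1. Latent Gaussian draw mapped to topic proportions (logistic-normal).
    #    Assumption: identity covariance, for brevity.
    z = [rng.gauss(0, 1) for _ in range(n_topics)]
    theta = softmax(z)

    # 2. Context words are drawn from a mixture of the topics' word distributions.
    topic_word_probs = [softmax(row) for row in topic_word]
    word_probs = [
        sum(theta[k] * topic_word_probs[k][w] for k in range(n_topics))
        for w in range(n_words)
    ]
    context = [sample_index(word_probs, rng) for _ in range(ctx_len)]

    # 3. Equation symbols come from an RNN whose update sees theta at every step.
    h = [0.0] * hidden
    prev = 0  # assumption: symbol id 0 doubles as the start token
    equation = []
    for _ in range(eq_len):
        h = [
            math.tanh(
                sum(W_h[i][j] * h[j] for j in range(hidden))
                + W_x[i][prev]
                + sum(W_t[i][k] * theta[k] for k in range(n_topics))
            )
            for i in range(hidden)
        ]
        logits = [sum(W_o[s][i] * h[i] for i in range(hidden))
                  for s in range(n_symbols)]
        prev = sample_index(softmax(logits), rng)
        equation.append(prev)

    return theta, context, equation
```

Calling `generate_pair(random.Random(0))` returns the topic proportions, a list of context word ids, and a list of equation symbol ids. The point of the sketch is the shared `theta`: the same topic activations drive both the word mixture and the RNN update, which is what ties an equation to its surrounding text.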