Large-scale Simple Question Answering with Memory Networks
Training large-scale question answering systems is complicated because training sources usually cover a small portion of the range of possible questions. This paper studies the impact of multitask and transfer learning for simple question answering, a setting for which the reasoning required to answer is quite easy, as long as one can retrieve the correct evidence given a question, which can be difficult in large-scale conditions. To this end, we introduce a new dataset of 100k questions that we use in conjunction with existing benchmarks. We conduct our study within the framework of Memory Networks (Weston et al., 2015) because this perspective allows us to eventually scale up to more complex reasoning, and show that Memory Networks can be successfully trained to achieve excellent performance.
💡 Research Summary
This paper addresses the problem of Simple Question Answering (Simple QA), where the answer to a natural‑language question can be obtained by retrieving a single fact from a large knowledge base (KB). Existing benchmarks for this task are small and cover only a narrow slice of the possible question distribution, making it difficult to assess the true capabilities of QA systems and to train models that generalize beyond a few hand‑crafted templates. To overcome these limitations, the authors introduce a new dataset called SimpleQuestions, consisting of 108,442 human‑written question‑fact pairs derived from Freebase (FB2M). The data collection pipeline first filters Freebase triples to remove overly frequent relations and ambiguous entries, then asks crowd‑workers to generate diverse natural‑language questions for each selected fact, encouraging varied phrasing and avoiding boiler‑plate formulations. The resulting dataset is split into 70 % training, 10 % validation, and 20 % test sets, providing a substantially larger and more varied resource than prior datasets such as WebQuestions.
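The released SimpleQuestions files are commonly described as tab-separated lines pairing a Freebase fact with its human-written question. As a minimal sketch (the exact column layout and the sample line below are assumptions, not taken from the paper), each line can be parsed into a subject, relation, object, and question:

```python
# Hypothetical loader for SimpleQuestions, assuming the commonly described
# TSV layout: subject \t relation \t object \t question (one fact per line).
from collections import namedtuple

Fact = namedtuple("Fact", ["subject", "relation", "obj", "question"])

def load_simplequestions(lines):
    """Parse question-fact pairs from an iterable of TSV lines."""
    examples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        subject, relation, obj, question = line.split("\t")
        examples.append(Fact(subject, relation, obj, question))
    return examples

# Illustrative line only; identifiers and relation name are made up here.
sample = [
    "www.freebase.com/m/0f8l9c\twww.freebase.com/location/country/capital\t"
    "www.freebase.com/m/05qtj\twhat is the capital of france"
]
data = load_simplequestions(sample)
print(data[0].question)  # -> what is the capital of france
```

Keeping the fact alongside the question makes the 70/10/20 split trivial to apply at the example level, since each line is self-contained.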
The core of the proposed system is a Memory Network (MemNN), a neural architecture that stores KB facts in an external memory and learns to query this memory using learned embeddings. The MemNN is decomposed into four modules:
- Input (I) – converts Freebase facts into bag‑of‑symbols vectors and questions into bag‑of‑ngrams vectors. Facts are pre‑processed to group all objects sharing the same (subject, relation) pair, thereby handling list‑type questions as a single memory entry. Hypergraph mediator nodes in Freebase are collapsed to direct subject‑relation‑object triples, expanding the proportion of questions answerable with a single fact from ~65 % to ~86 % on WebQuestions.
- Generalization (G) – adds new facts to the memory after the initial training phase. This module is used to integrate facts from a second, automatically extracted KB called Reverb. Reverb entities are linked to Freebase entities via pre‑computed entity links or string matching; unmatched entities are represented by bag‑of‑words. Relations are similarly encoded, allowing Reverb facts to be stored using the same symbol vocabulary as Freebase.
- Output (O) – performs a similarity search between the question embedding and memory entries. Candidate generation first matches n‑grams from the question to Freebase entity aliases, selects a small set of candidate subjects, and scores all facts that involve those subjects. The fact with the highest cosine similarity is returned.
- Response (R) – extracts the object(s) of the selected fact and presents them as the answer.
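The Output module's scoring step can be sketched as follows. This is a toy illustration under stated assumptions: random matrices stand in for the learned embeddings, the tiny vocabulary and candidate facts are invented, and the paper's candidate-generation heuristics are skipped; only the bag-of-symbols summation and cosine scoring match the description above.

```python
# Toy sketch of Output-module scoring: embed the question as a sum of
# n-gram embeddings, each candidate fact as a sum of symbol embeddings,
# then rank candidates by cosine similarity. Embeddings are random here;
# in the paper they are learned with a margin-based ranking loss.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative; the paper uses larger vectors)

# Shared index over question n-grams and KB symbols (entities, relations).
vocab = {sym: i for i, sym in enumerate(
    ["what", "capital", "france", "capital of",
     "m.0f8l9c", "location.country.capital", "m.05qtj",
     "m.0f8l9c_alt", "people.person.nationality"])}
W_q = rng.normal(size=(len(vocab), d))  # question-side embeddings
W_f = rng.normal(size=(len(vocab), d))  # fact-side embeddings

def embed(symbols, W):
    """Bag-of-symbols embedding: sum the rows of W for known symbols."""
    idx = [vocab[s] for s in symbols if s in vocab]
    return W[idx].sum(axis=0)

def score(question_ngrams, fact_symbols):
    """Cosine similarity between question and candidate fact."""
    q = embed(question_ngrams, W_q)
    f = embed(fact_symbols, W_f)
    return float(q @ f / (np.linalg.norm(q) * np.linalg.norm(f)))

q = ["what", "capital", "france", "capital of"]
candidates = [  # hypothetical (subject, relation, object) symbol bags
    ["m.0f8l9c", "location.country.capital", "m.05qtj"],
    ["m.0f8l9c_alt", "people.person.nationality", "m.05qtj"],
]
best = max(candidates, key=lambda f: score(q, f))
```

With trained embeddings, the argmax over candidate facts is exactly the single memory lookup the Response module then reads the answer object from.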
Training optimizes the embeddings with a margin-based ranking (hinge) loss that pushes the similarity of the correct fact above that of sampled negative facts by a fixed margin. The authors experiment with three training regimes:
- Single‑task on SimpleQuestions.
- Multi‑task jointly on SimpleQuestions and WebQuestions, sharing the embedding parameters.
- Transfer learning, where a model trained only on Freebase is later exposed to Reverb facts without any further weight updates.
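The ranking objective described above can be written in a few lines. This is a sketch, not the paper's implementation: the similarity scores are passed in as plain numbers, and the margin value is an arbitrary placeholder.

```python
# Margin ranking (hinge) loss sketch: for a question q, correct fact f+,
# and sampled negatives f-, penalize any negative whose similarity comes
# within a margin gamma of the correct fact's similarity.
def hinge_margin_loss(sim_pos, sim_negs, gamma=0.1):
    """Sum of max(0, gamma - S(q, f+) + S(q, f-)) over negatives."""
    return sum(max(0.0, gamma - sim_pos + s) for s in sim_negs)

# Correct fact already separated from negatives by > gamma: zero loss.
assert hinge_margin_loss(0.9, [0.2, 0.1]) == 0.0
# A hard negative inside the margin contributes a positive loss,
# so its gradient would push the two scores further apart.
loss = hinge_margin_loss(0.9, [0.85])
```

Multi-task training simply draws (question, fact) pairs from both SimpleQuestions and WebQuestions while sharing the embedding matrices this loss updates.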
Results show that the MemNN achieves strong accuracy on the new SimpleQuestions test set and is competitive with previous state-of-the-art methods on WebQuestions, despite relying on a single memory lookup and no hand-crafted features. Multi-task learning yields modest but consistent gains over training on each dataset alone, demonstrating that the model can benefit from heterogeneous supervision. In the transfer-learning experiment, adding Reverb facts to the memory without retraining still answers most Reverb-based queries correctly, surpassing systems that were specifically designed for Reverb. This indicates that the learned embedding space captures generic semantic relations that transfer across KBs with different schemas and noise levels.
The paper’s contributions are threefold:
- Dataset contribution – the release of SimpleQuestions, the first large‑scale, human‑authored QA dataset for single‑fact retrieval, enabling more robust evaluation of Simple QA systems.
- Model contribution – a Memory Network architecture that treats question answering as a single memory lookup, yet is flexible enough to incorporate multiple KBs and to support future extensions to multi‑hop reasoning.
- Learning paradigm contribution – empirical evidence that multitask and transfer learning improve coverage and robustness, and that a MemNN can effectively integrate facts from a noisy, automatically extracted KB (Reverb) without additional training.
The authors conclude that (i) scaling up both the quantity and diversity of training questions is essential for realistic Simple QA, (ii) Memory Networks provide a clean, extensible framework for large‑scale fact retrieval, and (iii) future work should explore multi‑hop extensions, richer memory addressing mechanisms, and more sophisticated entity linking to further close the gap between simple retrieval and full‑blown reasoning over knowledge graphs.