Author Identification using Multi-headed Recurrent Neural Networks
Recurrent neural networks (RNNs) are very good at modelling the flow of text, but typically need to be trained on a far larger corpus than is available for the PAN 2015 Author Identification task. This paper describes a novel approach where the output layer of a character-level RNN language model is split into several independent predictive sub-models, each representing an author, while the recurrent layer is shared by all. This allows the recurrent layer to model the language as a whole without over-fitting, while the outputs select aspects of the underlying model that reflect their author’s style. The method proves competitive, ranking first in two of the four languages.
💡 Research Summary
The paper tackles the problem of author identification in the PAN 2015 competition, where each participant is provided with only a few hundred sentences per author — a data regime far too small for a conventional character-level recurrent neural network (RNN) language model to train without severe over-fitting.

To overcome this limitation, the authors propose a "multi-headed" architecture. A single recurrent layer (reported here as a two-layer LSTM with 512 units per layer) processes the input character sequence and produces a shared hidden state at each time step. Instead of a single softmax output that predicts the next character for the whole corpus, the output layer is split into as many independent softmax heads as there are authors. Each head reads the same hidden state but learns its own output weights, modelling the probability distribution of the next character conditioned on the style of a particular author.

During training, each text contributes a cross-entropy loss through its own author's head, so the heads specialize in the subtle stylistic cues that differentiate one author from another, while the shared recurrent layer receives gradient signals from every author's text and is thereby pushed to capture language-wide regularities (syntax, spelling, common n-grams). Because the recurrent parameters are shared, the total number of trainable parameters grows only linearly with the number of authors, and the model can be trained on the limited PAN data without collapsing into memorisation.

The authors evaluate the method on four languages (English, Spanish, Dutch, and Greek). For each language they train a single multi-headed model and, at test time, compute the log-likelihood of a candidate text under each head; the author whose head yields the highest likelihood is selected as the prediction.
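The shared-layer / per-head split and the log-likelihood attribution rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it substitutes a plain tanh RNN cell for the LSTM reported above, uses untrained random weights, and invents small hypothetical sizes (`VOCAB`, `HIDDEN`, `AUTHORS`) purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 64    # character vocabulary size (hypothetical)
HIDDEN = 32   # shared recurrent state size (the summary reports 512)
AUTHORS = 3   # number of output heads, one per candidate author

# Shared recurrent parameters (a plain tanh RNN stands in for the LSTM;
# the multi-head idea is independent of the choice of recurrent cell).
Wxh = rng.normal(0, 0.1, (HIDDEN, VOCAB))
Whh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
bh = np.zeros(HIDDEN)

# One independent softmax head per author, all reading the same hidden state.
Why = rng.normal(0, 0.1, (AUTHORS, VOCAB, HIDDEN))
by = np.zeros((AUTHORS, VOCAB))

def one_hot(i):
    v = np.zeros(VOCAB)
    v[i] = 1.0
    return v

def log_softmax(z):
    z = z - z.max()  # numerical stability
    return z - np.log(np.exp(z).sum())

def log_likelihood(chars, author):
    """Sum of log P(next char | history) under one author's head."""
    h, total = np.zeros(HIDDEN), 0.0
    for cur, nxt in zip(chars[:-1], chars[1:]):
        h = np.tanh(Wxh @ one_hot(cur) + Whh @ h + bh)    # shared layer
        logp = log_softmax(Why[author] @ h + by[author])  # author's head
        total += logp[nxt]
    return total

def attribute(chars):
    """Predict the author whose head assigns the text the highest likelihood."""
    return int(np.argmax([log_likelihood(chars, a) for a in range(AUTHORS)]))

text = rng.integers(0, VOCAB, size=50).tolist()  # stand-in character sequence
print("predicted author head:", attribute(text))
```

In training (not shown), gradients from every author's text would update `Wxh`, `Whh`, and `bh`, while each `Why[a]`, `by[a]` would be updated only by texts attributed to author `a` — which is what lets the shared core model the language while the heads model style.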
The results are strong: consistent with the abstract, the system ranks first in two of the four languages and remains competitive in the others, outperforming strong baselines such as n-gram-based SVM classifiers and single-head RNNs.

Analysis of the learning curves shows that the shared recurrent layer continues to improve throughout training while the individual heads converge without signs of over-fitting, supporting the efficacy of the architectural split. Ablation experiments — reducing the number of heads, sharing head parameters, or using separate recurrent layers per author — all degrade performance, underscoring that a shared recurrent core combined with author-specific output heads is the key to success.

The paper concludes by suggesting that the multi-headed RNN paradigm is not limited to author identification: any task that requires a common representation of language together with multiple, subtly different output behaviours (e.g., sentiment analysis across domains, genre classification, or user-specific language modelling) could benefit from a similar design. Overall, the work demonstrates a practical and theoretically sound way to apply deep recurrent models in low-resource author attribution scenarios, offering a clear path for future research and applications.