Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning

Notice: This research summary and analysis were generated automatically with AI. For authoritative details, please refer to the original arXiv paper.

Evaluating and optimising authorial style in long-form story generation remains challenging because style is often assessed with ad hoc prompting and is frequently conflated with overall writing quality. We propose a two-stage pipeline. First, we train a dedicated style-similarity judge by fine-tuning a sentence-transformer with authorship-verification (AV) supervision, and calibrate its similarity outputs into a bounded $[0,1]$ reward. Second, we use this judge as the primary reward in Group Relative Policy Optimization (GRPO) to fine-tune an 8B story generator for style-conditioned writing, avoiding the pairwise accept/reject supervision required by Direct Preference Optimization (DPO). Across four target authors (Mark Twain, Jane Austen, Charles Dickens, Thomas Hardy), the GRPO-trained 8B model achieves higher style scores than open-weight baselines, with an average style score of 0.893 across authors. These results suggest that AV-calibrated reward modelling provides a practical mechanism for controllable style transfer in long-form generation under a moderate model size and training budget.


💡 Research Summary

This paper tackles the longstanding challenge of controlling authorial style in long‑form story generation. While recent benchmarks such as WritingBench, LitBench, and EQ‑Bench evaluate narrative quality, coherence, and creativity, they largely ignore style as a controllable objective. The authors therefore propose a two‑stage pipeline that first builds a dedicated style‑similarity judge and then uses it as the primary reward signal for fine‑tuning an 8‑billion‑parameter story generator via Group Relative Policy Optimization (GRPO).
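To make the coupling between the two stages concrete, here is a minimal Python sketch of how an AV-style similarity judge could be plugged in as a GRPO reward. The checkpoint path, the linear calibration of cosine similarity into $[0,1]$, and the group-normalisation epsilon are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: the checkpoint name, calibration, and epsilon
# below are assumptions, not the paper's released code.
import numpy as np
from sentence_transformers import SentenceTransformer, util

judge = SentenceTransformer("path/to/av-finetuned-judge")  # hypothetical AV-tuned checkpoint

def style_reward(generation: str, reference: str) -> float:
    """Calibrated style-similarity reward in [0, 1] (linear rescaling assumed)."""
    emb = judge.encode([generation, reference], convert_to_tensor=True)
    cos = util.cos_sim(emb[0], emb[1]).item()      # cosine similarity in [-1, 1]
    return min(1.0, max(0.0, (cos + 1.0) / 2.0))   # map into the bounded [0, 1] reward

def group_relative_advantages(rewards: list[float]) -> np.ndarray:
    """GRPO-style advantages: normalise rewards within one sampled group."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)       # mean-centred, std-scaled
```

In GRPO, several completions are sampled per prompt and each is scored against a reference excerpt of the target author; the group-normalised advantages then drive the policy update without any accept/reject labels.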

Stage 1 – Style‑Similarity Judge.
The authors adopt an authorship‑verification (AV) perspective: instead of classifying a text as belonging to a particular author, they train a model to output a continuous similarity score for a pair of texts. To obtain large‑scale, topic‑controlled supervision, they construct a “style‑controlled” dataset from Project Gutenberg, restricting the corpus to four high‑level subjects (Adventure, Historical Fiction, Young‑Women Fiction, and Man‑Woman Relationship Fiction). Texts are segmented into chunks of 500–3000 tokens. For each chunk, a set of sentences is masked at varying ratios r ∈ {0.1, …, 0.9} and regenerated using gpt-oss-20B, producing “refilled” chunks C′(r). The intuition is that content overlap decreases roughly as 1 − r while stylistic cues (lexical choice, rhythm, discourse patterns) persist. By pairing original and refilled chunks, as well as refilled-refilled pairs across titles, the authors generate 100K training pairs and roughly 10K validation/test pairs. Labels s ∈ [0, 1] then serve as continuous similarity targets for training the judge.
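The masking-and-refill construction can be summarised with a short sketch. The sentence splitter, mask token, and `refill` callable below are stand-ins for the paper's pipeline (which uses gpt-oss-20B for regeneration); they are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch of the style-controlled pair construction.
# `refill` is a hypothetical stand-in for the regeneration model.
import random
import re

def mask_sentences(chunk: str, r: float, mask_token: str = "[MASK]") -> str:
    """Mask a fraction r of the sentences in a chunk, keeping their positions."""
    sentences = re.split(r"(?<=[.!?])\s+", chunk)
    k = max(1, int(r * len(sentences)))
    masked_idx = set(random.sample(range(len(sentences)), k))
    return " ".join(mask_token if i in masked_idx else s
                    for i, s in enumerate(sentences))

def build_pair(chunk: str, r: float, refill) -> tuple[str, str]:
    """Return an (original, refilled) pair; content overlap decays roughly as 1 - r."""
    masked = mask_sentences(chunk, r)
    return chunk, refill(masked)  # refill regenerates the masked sentences
```

Sweeping r across {0.1, …, 0.9} yields pairs whose content overlap varies while authorial style is held fixed, which is exactly the signal the judge needs to separate style from topic.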

