LLM as a Neural Architect: Controlled Generation of Image Captioning Models Under Strict API Contracts


📝 Original Info

  • Title: LLM as a Neural Architect: Controlled Generation of Image Captioning Models Under Strict API Contracts
  • ArXiv ID: 2512.14706
  • Date: 2025-12-07
  • Authors: Krunal Jesani, Dmitry Ignatov, Radu Timofte

📝 Abstract

Neural architecture search (NAS) traditionally requires significant human expertise or automated trial-and-error to design deep learning models. We present NN-Caption, an LLM-guided neural architecture search pipeline that generates runnable image-captioning models by composing CNN encoders from LEMUR's classification backbones with sequence decoders (LSTM/GRU/Transformer) under a strict Net API [3, 6]. Using DeepSeek-R1-0528-Qwen3-8B as the primary generator [1], we present the prompt template and examples of generated architectures. We evaluate on MS COCO [9]. The LLM generated dozens of captioning models, over half of which trained successfully and produced meaningful captions. We analyse the effect of the number of input model snippets in the prompt (5 vs. 10), finding a slight drop in success rate when more candidate components are provided. We also report training dynamics (caption accuracy vs. epochs) and the highest BLEU-4 attained. Our results highlight the promise of LLM-guided NAS: the LLM not only proposes architectures but also suggests hyperparameters and training practices. We identify the challenges encountered (e.g., code hallucinations or API-compliance issues) and detail ...
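The abstract describes varying how many candidate model snippets (5 vs. 10) are included in the generation prompt alongside the API contract. A minimal sketch of such prompt assembly is shown below; all names here (`build_prompt`, `NET_API_CONTRACT`, the snippet placeholders) are illustrative assumptions, not the paper's actual code or the real LEMUR Net API signature.

```python
# Illustrative sketch: composing an architecture-generation prompt from
# N candidate model snippets, mirroring the paper's 5 vs. 10 comparison.
# The contract text and function names below are hypothetical.

NET_API_CONTRACT = (
    "Return a single runnable model class that conforms to the Net API; "
    "use only standard PyTorch layers."
)

def build_prompt(task: str, snippets: list[str], n_snippets: int = 5) -> str:
    """Compose a generation prompt from the first n_snippets candidates."""
    chosen = snippets[:n_snippets]
    parts = [
        f"Task: {task}",
        f"API contract: {NET_API_CONTRACT}",
        "Candidate components:",
    ]
    parts += [f"# Snippet {i + 1}\n{s}" for i, s in enumerate(chosen)]
    parts.append("Compose a runnable image-captioning model from these parts.")
    return "\n\n".join(parts)

# Hypothetical snippet pool drawn from classification backbones.
snippets = [f"class Backbone{i}: ..." for i in range(10)]
prompt5 = build_prompt("image captioning", snippets, n_snippets=5)
prompt10 = build_prompt("image captioning", snippets, n_snippets=10)
```

Under this framing, the paper's 5-vs.-10 comparison amounts to changing `n_snippets` while holding the task description and API contract fixed.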

📄 Full Content

...(Full text omitted for length; see the original site for the complete article.)
