Evaluating LLM-Written Abstracts with ChainForge

2026-03-19

This workshop introduces ChainForge through a hands-on demonstration of prompt testing and evaluation for scientific abstract generation. I will begin with a brief overview of ChainForge as a tool for comparing and analyzing LLM outputs, then show how a simple RAG pipeline can be used to generate abstracts from papers in PDF format. Finally, I will demonstrate how these outputs can be evaluated against prompt constraints such as word count, grammatical quality, and relevance to the original abstract. The workshop aims to provide a practical example of how quantitative and qualitative evaluation can be combined in a clear and reproducible workflow.

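To give a flavor of the evaluation step, below is a minimal Python sketch of the kind of checks the workshop covers: a word-count constraint and a crude lexical-overlap score for relevance to the original abstract. The function names, the 250-word limit, and the overlap metric are illustrative assumptions, not ChainForge's API; in ChainForge, similar logic would typically be pasted into a Python evaluator node.

```python
# Illustrative sketch of abstract-quality checks (assumed names and limits,
# not ChainForge's API). In ChainForge, comparable logic can live inside a
# Python evaluator node attached to the prompt node's responses.

import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, used for a simple lexical-overlap relevance proxy."""
    return set(re.findall(r"[a-z]+", text.lower()))


def within_word_limit(abstract: str, max_words: int = 250) -> bool:
    """Check the word-count constraint stated in the prompt (250 is an assumed limit)."""
    return len(abstract.split()) <= max_words


def relevance_score(generated: str, original: str) -> float:
    """Fraction of the original abstract's unique words that reappear in the generated one."""
    original_words = _tokens(original)
    if not original_words:
        return 0.0
    return len(original_words & _tokens(generated)) / len(original_words)


if __name__ == "__main__":
    generated = "We study how retrieval-augmented prompting can produce concise abstracts..."
    original = "This paper studies retrieval-augmented generation of scientific abstracts..."
    print(within_word_limit(generated), round(relevance_score(generated, original), 2))
```

In the workshop, checks like these are combined with qualitative inspection of the model outputs, so that pass/fail metrics and side-by-side reading of the generated abstracts inform each other.
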
Documents