Summary of "Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains"
Authors and Publication
- Authors: Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, Igor Mordatch
- Published: ICLR 2025
Abstract
The paper introduces multiagent finetuning, a novel approach for improving the performance of large language models (LLMs). The method addresses a limitation of traditional self-improvement techniques, which often yield diminishing returns after a few iterations of training. Instead of finetuning a single model, the authors propose a system in which multiple models, all derived from the same base model, are independently specialized on data generated from their interactions. This fosters specialization and diversification across models, allowing performance to keep improving over many rounds of finetuning.
Key Contributions
- Multiagent Interaction: The paper introduces the concept of leveraging interactions among multiple language models to facilitate self-improvement.
- Specialization of Models: Each model in the multiagent setup is assigned a distinct role (e.g., generation agent or critic agent) to improve output quality through feedback loops (a minimal sketch of this split follows the list).
- Quantitative Validation: The authors provide empirical evidence demonstrating the effectiveness of their approach across various reasoning tasks, showing significant performance gains compared to traditional single-agent methods.
- Generalization: The finetuned models exhibit the ability to generalize to new datasets, outperforming baseline models trained directly on those datasets.
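To make the generation/critic split concrete, here is a minimal sketch of one debate round, assuming each agent can be treated as a plain prompt-to-text callable; the function name `debate_round`, the prompt wording, and the two-stage structure are illustrative assumptions, not the authors' exact implementation.

```python
from typing import Callable, List

# Hypothetical agent interface: takes a prompt string, returns a completion string.
Agent = Callable[[str], str]

def debate_round(question: str, generators: List[Agent], critics: List[Agent]) -> List[str]:
    """One debate round: generation agents draft answers, critic agents revise
    them after seeing the other agents' drafts (the feedback loop)."""
    # Generation agents each produce an independent first-pass answer.
    drafts = [g(f"Answer the question:\n{question}") for g in generators]

    # Critic agents see all drafts and produce revised final answers.
    context = "\n\n".join(f"Agent {i} answered:\n{d}" for i, d in enumerate(drafts))
    revised = [
        c(
            f"Question:\n{question}\n\nOther agents' answers:\n{context}\n\n"
            "Point out any mistakes, then give your own final answer."
        )
        for c in critics
    ]
    return revised
```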
Methodology
- Multiagent Debate: The authors use a debate mechanism in which multiple models generate responses to the same query and a majority vote selects the consensus answer; this process produces a rich dataset for finetuning.
- Independent Specialization: Each model is finetuned on distinct subsets of data generated from the multiagent interactions, promoting diverse reasoning capabilities.
- Iterative Improvement: The multiagent finetuning process supports repeated training cycles, yielding continued performance gains over successive rounds (see the sketch below).
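The sketch below shows how one such cycle could fit together, building on the `debate_round` helper sketched earlier. The `run_debate` and `finetune` callables, the exact-match consensus filter, and the per-agent data format are simplifying assumptions for illustration; the paper's actual pipeline extracts final answers and builds separate generation and critic finetuning sets.

```python
from collections import Counter
from typing import Callable, Dict, List

Agent = Callable[[str], str]                                   # prompt -> response
FinetuneFn = Callable[[Agent, List[Dict[str, str]]], Agent]    # (agent, data) -> updated agent

def majority_answer(answers: List[str]) -> str:
    """Majority vote over the agents' final answers for a single question."""
    return Counter(answers).most_common(1)[0][0]

def multiagent_finetuning_round(
    questions: List[str],
    agents: List[Agent],
    run_debate: Callable[[str, List[Agent]], List[str]],
    finetune: FinetuneFn,
) -> List[Agent]:
    """One cycle: collect debate data, keep each agent's consensus-consistent
    responses as its own finetuning set, then finetune every agent independently."""
    per_agent_data: List[List[Dict[str, str]]] = [[] for _ in agents]

    for q in questions:
        responses = run_debate(q, agents)          # one final response per agent
        consensus = majority_answer(responses)     # majority vote picks the target answer
        for i, r in enumerate(responses):
            if r == consensus:                     # keep only consensus-consistent outputs
                per_agent_data[i].append({"prompt": q, "response": r})

    # Each agent is finetuned on its own distinct subset, preserving diversity.
    return [finetune(a, data) for a, data in zip(agents, per_agent_data)]
```

Calling this function repeatedly, feeding the updated agents back in as the next round's starting point, gives the iterative improvement loop described above.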
Results
The paper presents quantitative results showing that the multiagent finetuning approach leads to improved reasoning performance across several iterations, as illustrated in their experiments with various LLMs, including open-source models like Phi-3, Mistral, and LLaMA-3, as well as proprietary models like GPT-3.5.
Conclusion
The proposed multiagent finetuning method represents a significant advancement in the self-improvement of language models, enabling them to achieve better performance through collaborative learning and specialization. This approach not only enhances the models' capabilities but also allows them to adapt and generalize to new tasks effectively.