Wang 2022 - Self Consistency LLM
Self-Consistency Improves Chain of Thought Reasoning in Language Models
The main idea of this paper is that majority-vote decoding over sampled LLM responses (dubbed self-consistency) significantly improves over chain-of-thought (CoT) prompting alone (see Wei 2022).
Method
The idea is simple. As in Wei 2022, we use few-shot CoT examples in our prompt. Instead of just picking the answer from one run of the LLM, we sample many <CoT reasoning, answer> pairs from the LLM with the same prompt. We then "marginalize out" the reasoning by choosing the answer that occurs most frequently.
Note that this method is unsupervised; the only cost to pay is the compute cost of the multiple sampled runs.
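To make the decoding procedure concrete, here is a minimal Python sketch. `sample_fn` and `extract_fn` are hypothetical stand-ins for the model call (with temperature sampling) and the task-specific answer parsing, neither of which is pinned down here:

```python
from collections import Counter

def self_consistency_answer(prompt, sample_fn, extract_fn, n_samples=40):
    """Majority vote over the final answers of sampled reasoning paths.

    sample_fn(prompt) -> str: draws one completion from the LLM via sampling.
    extract_fn(text)  -> str: parses the final answer out of a completion.
    """
    answers = []
    for _ in range(n_samples):
        completion = sample_fn(prompt)          # one <CoT reasoning, answer> sample
        answers.append(extract_fn(completion))  # keep only the final answer
    # "Marginalize out" the reasoning: return the most frequent answer.
    return Counter(answers).most_common(1)[0][0]
```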
The name self-consistency comes from the idea that the most consistent answer given by the model is the most reliable one. The prior approach for task-style responses was greedy decoding, i.e. taking the single highest-likelihood completion. This paper shows that an ensemble of diverse sampled answers is far more effective than the greedy approach.
We may think of this idea as analogous to random forests: an ensemble over diverse weak learners improves performance significantly, and increasing the diversity of the learners (via column or row sampling) helps up to a certain point.
Note that the idea of self-consistency decoding is orthogonal to the specific choice of CoT prompting. Specifically, self-consistency yields significant gains with several forms of CoT prompting (sketched below), including:
- Few-shot in-context CoT prompts (Wei 2022)
- The zero-shot `Let's think step by step` prompt (Kojima 2022)
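To make the two prompting styles concrete, here is a rough sketch of what each prompt might look like; the few-shot demonstration is the standard tennis-ball example from Wei 2022, and both templates could be fed to the same `self_consistency_answer` routine sketched above:

```python
# Few-shot in-context CoT prompt (Wei 2022 style): the demonstration shows
# the reasoning before the answer so the model imitates the format.
FEW_SHOT_COT = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: {question}\nA:"
)

# Zero-shot CoT prompt (Kojima 2022): no demonstrations, just the trigger phrase.
ZERO_SHOT_COT = "Q: {question}\nA: Let's think step by step."
```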
Parameters
To generate diverse reasoning paths, the authors mainly follow the sampling setups used in prior work:
- For `PaLM-540B` they used `T = 0.7` and `k = 40` with top-k token truncation
- For `GPT-3` they used `T = 0.7` without top-k token truncation
For most results in the paper, the authors sampled 40 reasoning paths.
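As a rough sketch of these sampling settings with an open toolkit (not what the paper ran; shown here on a small `gpt2` checkpoint via Hugging Face `transformers`, purely for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: If there are 3 cars and each car has 4 wheels, how many wheels are there?\nA: Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,           # temperature sampling instead of greedy decoding
    temperature=0.7,          # T = 0.7, as in the paper
    top_k=40,                 # top-k truncation (PaLM-540B setting; set top_k=0 to disable, as in the GPT-3 setting)
    num_return_sequences=40,  # 40 sampled reasoning paths
    max_new_tokens=128,
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens and keep only the sampled continuations.
completions = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```

The 40 completions would then go through answer extraction and the majority vote sketched earlier.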