Wang 2022 - Self Consistency LLM
Self-Consistency Improves Chain of Thought Reasoning in Language Models
The main idea of this paper is that majority-vote decoding over sampled LLM responses (dubbed self-consistency) significantly improves over chain-of-thought (CoT) prompting alone (see Wei 2022).
Method
The idea is simple. As in Wei 2022, we use few-shot CoT examples in our prompt. Instead of just picking the answer from one run of the LLM, we sample many <CoT reasoning, answer> pairs from the LLM with the same prompt. We then "marginalize out" the reasoning by choosing the answer that occurs most frequently.
Note that this method is unsupervised; the only cost to pay is the compute cost of the multiple sampled runs.
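To make the decoding procedure concrete, here is a minimal Python sketch. `sample_fn` and `extract_fn` are hypothetical stand-ins for the model call (with temperature sampling) and the task-specific answer parsing, neither of which is pinned down here:

```python
from collections import Counter

def self_consistency_answer(prompt, sample_fn, extract_fn, n_samples=40):
    """Majority vote over the final answers of sampled reasoning paths.

    sample_fn(prompt) -> str: draws one completion from the LLM via sampling.
    extract_fn(text)  -> str: parses the final answer out of a completion.
    """
    answers = []
    for _ in range(n_samples):
        completion = sample_fn(prompt)          # one <CoT reasoning, answer> sample
        answers.append(extract_fn(completion))  # keep only the final answer
    # "Marginalize out" the reasoning: return the most frequent answer.
    return Counter(answers).most_common(1)[0][0]
```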
The name self-consistency comes from the idea that the most consistent answer given by the model is the most reliable one. The prior approach for task-style responses was greedy decoding, i.e. taking the single highest-likelihood completion. This paper shows that an ensemble of diverse sampled answers is far more effective than the greedy approach.
We may think of this idea as analogous to random forests: an ensemble over diverse weak learners improves performance significantly, and increasing the diversity of the learners (via column or row sampling) helps up to a certain point.
Note that the idea of self-consistency decoding is orthogonal to the specific choice of CoT prompting. Specifically, self-consistency yields significant gains with several forms of CoT prompting (sketched below), including:
- Few-shot in-context CoT prompts (Wei 2022)
- The zero-shot `Let's think step by step` prompt (Kojima 2022)
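To make the two prompting styles concrete, here is a rough sketch of what each prompt might look like; the few-shot demonstration is the standard tennis-ball example from Wei 2022, and both templates could be fed to the same `self_consistency_answer` routine sketched above:

```python
# Few-shot in-context CoT prompt (Wei 2022 style): the demonstration shows
# the reasoning before the answer so the model imitates the format.
FEW_SHOT_COT = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: {question}\nA:"
)

# Zero-shot CoT prompt (Kojima 2022): no demonstrations, just the trigger phrase.
ZERO_SHOT_COT = "Q: {question}\nA: Let's think step by step."
```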
Parameters
To generate diverse reasoning paths, the authors mainly follow the sampling setups used in prior work:
- For `PaLM-540B` they used `T = 0.7` and `k = 40` with top-k token truncation
- For `GPT-3` they used `T = 0.7` without top-k token truncation
For most results in the paper, the authors sampled 40 reasoning paths.
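As a rough sketch of these sampling settings with an open toolkit (not what the paper ran; shown here on a small `gpt2` checkpoint via Hugging Face `transformers`, purely for illustration):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: If there are 3 cars and each car has 4 wheels, how many wheels are there?\nA: Let's think step by step."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,           # temperature sampling instead of greedy decoding
    temperature=0.7,          # T = 0.7, as in the paper
    top_k=40,                 # top-k truncation (PaLM-540B setting; set top_k=0 to disable, as in the GPT-3 setting)
    num_return_sequences=40,  # 40 sampled reasoning paths
    max_new_tokens=128,
    pad_token_id=tokenizer.eos_token_id,
)
# Strip the prompt tokens and keep only the sampled continuations.
completions = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```

The 40 completions would then go through answer extraction and the majority vote sketched earlier.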