All of the following decoding methods can be used for auto-regressive language models.
Auto-regressive language generation
Auto-regressive language generation is based on the assumption that the probability distribution of a word sequence can be decomposed into the product of conditional next word distributions.
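Concretely, P(w_1:T | W_0) = ∏_{t=1}^{T} P(w_t | w_1:t−1, W_0), where W_0 is the initial context word sequence and the length T of the generated sequence is determined on the fly, e.g. when an end-of-sequence token is generated.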
Greedy Search
- Greedy search is the simplest decoding method: at each time step it selects the word with the highest probability as the next word.
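Below is a minimal greedy-search sketch using the Hugging Face transformers library; the model name "gpt2", the prompt, and the 40-token budget are arbitrary choices for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encode an arbitrary prompt as the initial context.
input_ids = tokenizer("I enjoy walking with my cute dog", return_tensors="pt").input_ids

# do_sample=False (the default) makes generate() pick the highest-probability token at every step.
greedy_output = model.generate(input_ids, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```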
Beam Search
- Beam search reduces the risk of missing hidden high-probability word sequences by keeping the num_beams most likely hypotheses at each time step and eventually choosing the hypothesis with the overall highest probability.
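A beam-search sketch reusing the model, tokenizer, and input_ids from the greedy-search example above; num_beams=5 and no_repeat_ngram_size=2 are illustrative values, not recommendations.

```python
# Keep the 5 most likely hypotheses at each step and return the best finished one.
beam_output = model.generate(
    input_ids,
    max_new_tokens=40,
    num_beams=5,
    early_stopping=True,     # stop once enough finished candidates are found
    no_repeat_ngram_size=2,  # optional: forbid repeating any 2-gram
)
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))
```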
Temperature Scaling
- A trick to make the distribution P(w|w1:t−1) sharper (increasing the likelihood of high-probability words and decreasing the likelihood of low-probability words) is to lower the so-called temperature of the softmax.
- As temperature → 0, temperature-scaled sampling becomes equivalent to greedy decoding, as illustrated below.
For more details see Softmax.
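The following toy torch sketch shows the temperature trick on made-up logits: dividing the logits by a temperature below 1 before the softmax sharpens the distribution, and as the temperature approaches 0 almost all probability mass lands on the argmax word.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])  # toy next-word logits, for illustration only

for temperature in (1.0, 0.7, 0.1):
    probs = torch.softmax(logits / temperature, dim=-1)
    print(f"T={temperature}: {[round(p, 3) for p in probs.tolist()]}")
# T=1.0 leaves the distribution unchanged, T=0.7 sharpens it, and at T=0.1 nearly all
# probability sits on the argmax word, i.e. sampling degenerates to greedy decoding.
```

With transformers, the same effect is obtained by passing do_sample=True together with a temperature value (e.g. temperature=0.7) to model.generate().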
Top-K Sampling
- In its most basic form, sampling means randomly picking the next word w_t according to its conditional probability distribution.
- In Top-K sampling, the K most likely next words are filtered and the probability mass is redistributed among only those K next words (sketched below).
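A Top-K sampling sketch, again reusing the setup from the greedy-search example; K=50 and the fixed seed are arbitrary illustrative choices.

```python
from transformers import set_seed

set_seed(0)  # sampling is stochastic; fix the seed for reproducibility
# Sample only from the 50 most likely next words at each step.
topk_output = model.generate(
    input_ids,
    max_new_tokens=40,
    do_sample=True,
    top_k=50,
)
print(tokenizer.decode(topk_output[0], skip_special_tokens=True))
```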
Top-P Sampling
- Limiting the sample pool to a fixed size K can lead the model to produce gibberish for sharp distributions and limit its creativity for flat distributions. This intuition led Ari Holtzman et al. (2019) to create Top-p, or nucleus, sampling.
- In Top-p sampling, the method chooses from the smallest possible set of words whose cumulative probability exceeds the probability p.
- Top-p can also be used in combination with Top-K, which can avoid very low-ranked words while allowing for some dynamic selection (see the sketch after this list).
- As ad-hoc decoding methods, Top-p and Top-K sampling seem to produce more fluent text than traditional greedy and beam search on open-ended language generation.
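A sketch combining Top-p with Top-K, as mentioned above, reusing the setup from the greedy-search example; top_p=0.92 and top_k=50 are illustrative values.

```python
from transformers import set_seed

set_seed(0)
# Keep at most the 50 most likely candidates (Top-K), then sample from the smallest
# subset of them whose cumulative probability exceeds 0.92 (Top-p / nucleus cut).
nucleus_output = model.generate(
    input_ids,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.92,
    top_k=50,
)
print(tokenizer.decode(nucleus_output[0], skip_special_tokens=True))
```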