Definition
A Phred quality score is a logarithmic metric used to represent the probability that a particular nucleotide base call obtained from automated DNA sequencing is incorrect. It is expressed as Q = –10 log₁₀(P), where P denotes the estimated error probability for the base.
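The definition can be inverted to give the error probability for a reported score, P = 10^(–Q/10). The minimal Python sketch below converts in both directions. The function names `phred_from_error` and `error_from_phred` are hypothetical helpers chosen for illustration; they do not come from any particular library.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred score from an error probability: Q = -10 * log10(P), for 0 < P <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must lie in (0, 1]")
    return -10.0 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Error probability from a Phred score: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# Error probabilities for the scores discussed in this entry.
for q in (0, 10, 20, 30, 40):
    print(f"Q={q:>2}  P={error_from_phred(q):.4g}")
```

Scores need not be integers under this definition; FASTQ files store them rounded to whole numbers.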
Overview
Phred quality scores are integral to modern high‑throughput sequencing workflows. They were first produced by the Phred base‑calling program, which analyses fluorescence trace data from automated Sanger sequencers and assigns a confidence value to each called base; the base‑calling software supplied with today's high‑throughput instruments reports its confidence values on the same Phred scale. The scores are routinely stored in the FASTQ file format, enabling downstream tools for read alignment, variant calling, and quality control to use per‑base confidence information. Typical Phred scores range from 0 (100 % error probability) to 40–50 (0.01–0.001 % error probability), with higher values indicating greater confidence.
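In a FASTQ record, each quality score is stored as one printable ASCII character whose code is the score plus a fixed offset. The sketch below decodes and encodes such a quality line, assuming the common offset of 33 by default; `decode_qualities` and `encode_qualities` are hypothetical names for illustration, and the sketch does not auto‑detect which offset a file uses.

```python
from __future__ import annotations


def decode_qualities(qual: str, offset: int = 33) -> list[int]:
    """Map each ASCII character of a FASTQ quality line to its numeric Q value."""
    scores = [ord(ch) - offset for ch in qual]
    if any(q < 0 for q in scores):
        # A negative score usually means the wrong offset was assumed.
        raise ValueError(f"character below offset {offset}; wrong encoding?")
    return scores


def encode_qualities(scores: list[int], offset: int = 33) -> str:
    """Inverse mapping: numeric Q values back to ASCII characters."""
    return "".join(chr(q + offset) for q in scores)


# 'I' is ASCII 73, '5' is 53 and '+' is 43, so with offset 33:
print(decode_qualities("II5+"))  # [40, 40, 20, 10]
```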
Etymology / Origin
The term derives from the Phred base‑calling program, developed by Phil Green and Brent Ewing at the University of Washington and described in publications in 1998. The name "Phred" is usually explained as a play on Green's first name, Phil, and the program became the eponym for the associated quality‑score metric. The logarithmic formulation gives a compact representation of error probabilities in which multiplying independent probabilities becomes adding scores.
Characteristics
| Characteristic | Description |
|---|---|
| Mathematical definition | Q = –10 log₁₀(P), where P is the probability that the base call is wrong. |
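| Scale | Commonly about 0–40 for both Sanger and Illumina data (Illumina 1.8+ typically reports at most 41). The upper bound of 93 is a limit of the Phred+33 FASTQ encoding (ASCII `~`), not a value that instruments typically report. |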
| Interpretation | A score of 20 corresponds to a 1 % error probability (P = 0.01); a score of 30 corresponds to 0.1 % error probability (P = 0.001). |
| Encoding | In FASTQ files, scores are ASCII‑encoded by adding an offset to the numeric Q value: 33 (Sanger and Illumina 1.8+, now the standard) or 64 (Illumina 1.3–1.7). |
| Usage in pipelines | Quality filtering, trimming low‑confidence bases, weighting alignments, and informing variant‑calling algorithms. |
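| Additivity | Because Q is a scaled logarithm, adding Phred values corresponds to multiplying probabilities: for independent calls, the sum of their scores is the Phred‑scaled probability that all of them are wrong. The probability that an entire read is error‑free, Π(1 − Pᵢ), is not a simple sum of scores. |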
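The additivity property, and what it does not give, can be checked numerically. The sketch below assumes errors at different positions are independent, the same assumption the table relies on; all function names are hypothetical. Summing scores yields the Phred‑scaled probability that every listed call is wrong, whereas the probability of an error‑free read and the expected number of errors must be computed from the underlying probabilities.

```python
import math


def error_prob(q: float) -> float:
    """P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def phred_all_wrong(scores) -> float:
    """-10 * log10(prod P_i) equals sum(Q_i), assuming independent errors."""
    return float(sum(scores))


def prob_error_free(scores) -> float:
    """P(no errors in the read) = prod(1 - P_i); not expressible as a sum of Q."""
    return math.prod(1.0 - error_prob(q) for q in scores)


def expected_errors(scores) -> float:
    """Expected number of wrong calls = sum(P_i), by linearity of expectation."""
    return sum(error_prob(q) for q in scores)


scores = [20, 30]
direct = -10.0 * math.log10(error_prob(20) * error_prob(30))
print(phred_all_wrong(scores), round(direct, 6))
print(prob_error_free(scores), expected_errors(scores))
```

Both values on the first line are 50, matching the product 0.01 × 0.001 = 10⁻⁵.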
Related Topics
- Base calling – the computational process of translating raw sequencing signals into nucleotide sequences, of which Phred is a seminal implementation.
- FASTQ format – a text‑based file format that stores both nucleotide sequences and their corresponding Phred quality scores.
- Phrap – a sequence assembly program that utilizes Phred scores to assess the reliability of overlapping reads.
- Quality trimming – preprocessing steps that remove or mask low‑quality bases based on Phred thresholds.
- Illumina sequencing – a widely used next‑generation sequencing technology that reports quality scores on the Phred scale.
- Error modeling – statistical frameworks that incorporate Phred scores to estimate sequencing error rates in downstream analyses.
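Quality trimming, listed above, can be illustrated with a deliberately simple rule: drop bases from the 3′ end of a read while their score is below a threshold. The sketch assumes Phred+33 encoding and a threshold of Q20; `trim_3prime` is a hypothetical helper, and many real trimmers use sliding‑window or running‑sum criteria instead of this rule.

```python
from __future__ import annotations


def trim_3prime(seq: str, qual: str, threshold: int = 20,
                offset: int = 33) -> tuple[str, str]:
    """Remove trailing bases whose Phred score is below `threshold`."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    end = len(qual)
    # Walk backwards from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


# '+' decodes to Q10 and '#' to Q2, both below Q20, so the last two bases go.
seq, qual = trim_3prime("ACGTAC", "IIII+#")
print(seq, qual)  # ACGT IIII
```

Only the 3′ tail is inspected, so a low‑quality base in the middle of the read is kept.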