Definition
Phred is a computer program that performs base‑calling on raw fluorescence trace data generated by automated Sanger DNA sequencers. It translates the analog signal into a nucleotide sequence and assigns a quality score to each base, indicating the confidence of the call.
Overview
Developed in the mid‑1990s for use in large‑scale genome projects, Phred became a standard component of the DNA sequencing data‑processing pipeline. The software accepts trace files (commonly in the .ab1 or .scf format) and produces sequence files with accompanying quality values in the widely employed Phred quality score (Q) system. Phred’s output is compatible with downstream assembly tools such as Phrap and visualization programs such as Consed. By providing reliable quality metrics, Phred contributed significantly to the accuracy of the Human Genome Project and subsequent sequencing initiatives.
Etymology/Origin
The precise origin of the name “Phred” is not formally documented in peer‑reviewed literature. It is generally understood to be a whimsical variation on the common name “Fred,” chosen by the developers for its brevity and distinctiveness. No official acronym or expanded form for “Phred” has been reported.
Characteristics
| Feature | Description |
|---|---|
| Algorithm | Phred analyzes peak shape, peak spacing, and signal‑to‑noise characteristics in electropherogram traces. It predicts where peaks should fall given roughly even spacing, matches the observed peaks to those predicted positions to call each base, and derives an error probability for every call from trace parameters measured around the peak. |
| Quality Scoring | Each base is assigned a Phred quality score Q, defined as $Q = -10 \log_{10} P_e$ where $P_e$ is the estimated probability that the base call is incorrect. For example, Q = 20 corresponds to a 1 in 100 chance of error and Q = 30 to 1 in 1,000. Scores typically range from 0 to 40 or higher, with higher values indicating greater confidence. |
| Input/Output | Accepts standard trace formats (.ab1, .scf). Produces plain‑text sequence files in FASTA format and a separate file containing per‑base quality scores; it can also write PHD files that combine base calls and quality values for use by Phrap and Consed. |
| Integration | Designed to work in concert with Phrap (for sequence assembly) and Consed (for assembly review). The quality scores generated by Phred are used by these tools to resolve ambiguities and to flag low‑confidence regions. |
| Platform | Originally released for Unix‑like operating systems; later ports and wrappers enabled use on Windows and macOS. |
| Licensing | Historically distributed under a non‑commercial academic license; source code has been made publicly available for research purposes. |
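To illustrate the quality‑score definition in the table above, the following minimal Python sketch converts between Q and $P_e$ and flags low‑confidence positions in a read. The function names and the default threshold of 20 are assumptions chosen for illustration; they are not part of Phred itself.

```python
import math
from typing import List, Sequence


def error_prob_to_phred(p_error: float) -> float:
    """Convert an error probability P_e into a quality score Q = -10 * log10(P_e)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def phred_to_error_prob(q: float) -> float:
    """Invert the definition: P_e = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def low_confidence_positions(quals: Sequence[int], threshold: int = 20) -> List[int]:
    """Return the 0-based indices of bases whose score falls below the threshold.

    The threshold is a hypothetical cutoff, not a value prescribed by Phred.
    """
    return [i for i, q in enumerate(quals) if q < threshold]


if __name__ == "__main__":
    # Print the error probability implied by a few common scores.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
    # Flag weak positions in a made-up list of per-base scores.
    print(low_confidence_positions([40, 12, 35, 8, 20]))
```

Because the scale is logarithmic, each increase of 10 in Q corresponds to a tenfold drop in the estimated error probability.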
Related Topics
- Base calling – The process of converting raw sequencing signals into nucleotide sequences; Phred is a seminal example of a base‑calling algorithm for Sanger data.
- Phred quality score – A logarithmic metric for base‑call confidence that has become a de facto standard in sequencing quality assessment.
- Phrap – An assembly program that utilizes Phred quality scores to construct consensus sequences from overlapping reads.
- Consed – A graphical tool for reviewing and editing sequence assemblies generated with Phrap and Phred.
- Sanger sequencing – The chain‑termination method that produces the electropherogram traces processed by Phred.
- Next‑generation sequencing (NGS) quality metrics – Modern sequencing platforms employ analogous quality‑scoring schemes (e.g., Illumina’s Q scores) that trace conceptual lineage to Phred’s methodology.
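As a hedged illustration of how Phred‑style scores appear in later sequencing data, the sketch below decodes a FASTQ quality string into numeric Q values and sums the implied error probabilities. It assumes the common "Phred+33" ASCII encoding, in which each character's code point minus 33 gives Q; the text above does not specify an encoding, and the function names are hypothetical.

```python
from typing import List, Sequence

PHRED33_OFFSET = 33  # assumption: Sanger-style "Phred+33" ASCII encoding


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string into a list of integer Q scores."""
    scores = [ord(ch) - PHRED33_OFFSET for ch in quality_string]
    if any(s < 0 for s in scores):
        raise ValueError("quality string contains characters below '!' (ASCII 33)")
    return scores


def expected_errors(scores: Sequence[int]) -> float:
    """Sum the per-base error probabilities P_e = 10 ** (-Q / 10) over a read."""
    return sum(10.0 ** (-q / 10.0) for q in scores)


if __name__ == "__main__":
    quals = decode_phred33("!5I")  # made-up three-base quality string
    print(quals)
    print(expected_errors(quals))
```

Other offsets have been used historically, so the offset should be confirmed for the data at hand before applying a decoder like this.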