F1 (classification)

The F1 score, also known as the F1-measure, is the harmonic mean of precision and recall, providing a single metric that balances these two aspects of a classification model's performance. It is particularly useful for datasets with imbalanced class distributions, where a simple accuracy score can be misleading.

Definition:

The F1 score is calculated as follows:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

Where:

  • Precision (also called positive predictive value) is the proportion of positive identifications that were actually correct. It answers the question: "Of all the instances predicted as positive, how many were actually positive?" Precision = True Positives / (True Positives + False Positives)

  • Recall (also called sensitivity) is the proportion of actual positives that were identified correctly. It answers the question: "Of all the actual positive instances, how many were predicted as positive?" Recall = True Positives / (True Positives + False Negatives)

  • True Positives (TP): the number of instances correctly predicted as positive.

  • False Positives (FP): the number of instances incorrectly predicted as positive.

  • False Negatives (FN): the number of instances incorrectly predicted as negative.
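
As a worked illustration of these definitions, here is a minimal Python sketch that computes precision, recall, and F1 directly from TP, FP, and FN counts; the counts and the function name are hypothetical, chosen only for this example.

    def f1_from_counts(tp, fp, fn):
        """Compute precision, recall, and F1 from confusion-matrix counts."""
        # Precision: fraction of predicted positives that are truly positive.
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        # Recall: fraction of actual positives that were predicted positive.
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        # F1: harmonic mean of precision and recall (defined as 0 when both are 0).
        if precision + recall == 0:
            return precision, recall, 0.0
        return precision, recall, 2 * precision * recall / (precision + recall)

    # Hypothetical counts: 80 true positives, 20 false positives, 40 false negatives.
    p, r, f1 = f1_from_counts(tp=80, fp=20, fn=40)
    print(f"Precision={p:.3f}, Recall={r:.3f}, F1={f1:.3f}")
    # Precision=0.800, Recall=0.667, F1=0.727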

Interpretation:

The F1 score ranges from 0 to 1, where:

  • 1 represents perfect precision and recall (ideal score).
  • 0 represents the worst possible score, which occurs when either precision or recall (or both) is zero, i.e., when the model yields no true positives.

A higher F1 score indicates a better balance between precision and recall.
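
For example, a model with Precision = 0.9 but Recall = 0.1 has an arithmetic mean of 0.5, yet its F1 score is only 2 * (0.9 * 0.1) / (0.9 + 0.1) = 0.18. Because the harmonic mean is pulled toward the smaller of the two values, a high F1 score requires both precision and recall to be reasonably high.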

Use Cases:

The F1 score is commonly used in various machine learning tasks, including:

  • Spam detection
  • Medical diagnosis
  • Fraud detection
  • Information retrieval
  • Natural language processing

Limitations:

While the F1 score provides a useful single metric, it's important to consider its limitations:

  • It gives equal weight to precision and recall. In some scenarios, one might be more important than the other; the Fβ generalization sketched after this list addresses this.
  • Like precision and recall, it is sensitive to class imbalance.
  • It doesn't provide insight into the types of errors the model is making (e.g., confusing different classes).
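
Regarding the first limitation, a common generalization is the Fβ score, which weights recall β times as much as precision (F1 is the special case β = 1). The Python sketch below is a minimal illustration; the function name and the precision/recall values are hypothetical.

    def fbeta(precision, recall, beta):
        """F-beta score: weighted harmonic mean of precision and recall.

        beta > 1 favors recall, beta < 1 favors precision, beta = 1 gives F1.
        """
        if precision == 0 and recall == 0:
            return 0.0
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    precision, recall = 0.8, 0.5  # hypothetical values
    print(fbeta(precision, recall, beta=1.0))   # F1   ~ 0.615
    print(fbeta(precision, recall, beta=2.0))   # F2   ~ 0.541 (leans toward recall)
    print(fbeta(precision, recall, beta=0.5))   # F0.5 ~ 0.714 (leans toward precision)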

Alternatives:

Depending on the specific application and requirements, other evaluation metrics can be considered, such as:

  • Accuracy
  • Precision
  • Recall
  • Area Under the ROC Curve (AUC-ROC)
  • Area Under the Precision-Recall Curve (AUC-PR)
  • Matthews Correlation Coefficient (MCC)
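
As an illustration of how these alternatives compare on the same predictions, the sketch below uses scikit-learn (an assumed library choice; the article itself does not prescribe one) on a small set of hypothetical labels and scores.

    import numpy as np
    from sklearn.metrics import (
        accuracy_score, precision_score, recall_score, f1_score,
        roc_auc_score, average_precision_score, matthews_corrcoef,
    )

    # Hypothetical ground-truth labels and model scores, for illustration only.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
    y_score = np.array([0.9, 0.4, 0.7, 0.2, 0.1, 0.6, 0.8, 0.3, 0.2, 0.55])
    y_pred = (y_score >= 0.5).astype(int)  # hard predictions at a 0.5 threshold

    print("Accuracy: ", accuracy_score(y_true, y_pred))
    print("Precision:", precision_score(y_true, y_pred))
    print("Recall:   ", recall_score(y_true, y_pred))
    print("F1:       ", f1_score(y_true, y_pred))
    print("AUC-ROC:  ", roc_auc_score(y_true, y_score))            # uses scores
    print("AUC-PR:   ", average_precision_score(y_true, y_score))  # uses scores
    print("MCC:      ", matthews_corrcoef(y_true, y_pred))

Note that AUC-ROC and AUC-PR are computed from continuous scores rather than thresholded predictions, which is why they are often preferred when the decision threshold is not yet fixed.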