Sensitivity and specificity are statistical measures of the performance of a binary classification test (known in statistics as a classification function) that are widely used in medicine:
Sensitivity (also called the true positive rate, the epidemiological/clinical sensitivity, the recall, or the probability of detection[1] in some fields) measures the proportion of actual positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition). It is often confused with the detection limit[2][3]; however, the detection limit is calculated from the analytical sensitivity, not from the epidemiological sensitivity.
Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
Note that the terms “positive” and “negative” do not refer to whether the outcome is good or bad, but to the presence or absence of the condition of interest; the condition itself could be a disease, so that “positive” might mean “diseased”, while “negative” might mean “healthy”.
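Expressed in terms of the numbers of true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP) produced by the test, the two measures are

\[ \text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}. \]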
In many tests, including diagnostic medical tests, sensitivity is the extent to which actual positives are not overlooked (so false negatives are few), and specificity is the extent to which actual negatives are classified as such (so false positives are few). Thus, a highly sensitive test rarely overlooks an actual positive (for example, showing “nothing bad” despite something bad existing); a highly specific test rarely registers a positive classification for anything that is not the target of testing (for example, finding one bacterial species and mistaking it for another closely related one that is the true target); and a test that is highly sensitive and highly specific does both, so it “rarely overlooks a thing that it is looking for” and it “rarely mistakes anything else for that thing.”
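Equivalently, the two measures can be written as complements of the two error rates: \( \text{sensitivity} = 1 - \text{FNR} \) and \( \text{specificity} = 1 - \text{FPR} \), where FNR and FPR denote the false negative rate and the false positive rate, respectively.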
Sensitivity, therefore, quantifies the avoidance of false negatives and specificity does the same for false positives. For any test, there is usually a trade-off between the two measures – for instance, in airport security, since testing of passengers is for potential threats to safety, scanners may be set to trigger alarms on low-risk items like belt buckles and keys (low specificity) in order to increase the probability of identifying dangerous objects and minimize the risk of missing objects that do pose a threat (high sensitivity). This trade-off can be represented graphically using a receiver operating characteristic curve. A perfect predictor would be described as 100% sensitive, meaning all sick individuals are correctly identified as sick, and 100% specific, meaning no healthy individuals are incorrectly identified as sick. In reality, however, any non-deterministic predictor has a minimum error bound known as the Bayes error rate. The values of sensitivity and specificity do not depend on the proportion of positive cases in the population of interest (unlike, for example, precision).
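As an illustrative sketch (not drawn from the article; the function name rates is arbitrary), the short Python snippet below computes sensitivity, specificity, and precision from raw confusion-matrix counts. Multiplying the number of actual negatives by ten leaves sensitivity and specificity unchanged while precision drops, which is the sense in which the two measures are independent of prevalence:

```python
def rates(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and precision from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # proportion of actual positives correctly identified
    specificity = tn / (tn + fp)  # proportion of actual negatives correctly identified
    precision = tp / (tp + fp)    # proportion of positive calls that are correct
    return sensitivity, specificity, precision

# A test with 90% sensitivity and 95% specificity, applied to two populations
# that differ only in the number of actual negatives (healthy people).
print(rates(tp=90, fn=10, tn=950, fp=50))    # (0.90, 0.95, ~0.64)
print(rates(tp=90, fn=10, tn=9500, fp=500))  # (0.90, 0.95, ~0.15): same sensitivity and specificity, lower precision
```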
The terms “sensitivity” and “specificity” were introduced by the American biostatistician Jacob Yerushalmy in 1947.