In these times of pandemic, there is a lot of talk about the importance of testing. There is indeed no question that containing the propagation of the virus requires identifying sick people, and therefore testing them. However, it is tempting to take the result of a test for granted: if I test positive, I am sick; if I test negative, I am not.
Unfortunately, tests are not 100% reliable: you may test positive and not be sick – this is called a false positive; you may test negative and nevertheless be sick – this is called a false negative.
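In order to understand what it means to receive a positive or negative result from a test, we need two pieces of information: the prevalence of the disease (the proportion of the population that is sick) and the accuracy of the test (the proportion of tests that give a correct result).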
It is tempting to think that the accuracy of the test is also your chance of being sick if you receive a positive result (and, conversely, your chance of being healthy if you receive a negative result). THIS IS NOT THE CASE!
In fact, both chances depend on the prevalence as well as on the accuracy, and they can be very different from the accuracy of the test!
Note 1: Rather than using the word "sick", we should say "infected" instead because of the incubation period and the fact that a number of infected people do not get sick at all. For simplicity, we will nevertheless stick with the word "sick".
Note 2: The accuracy of a test is normally measured with two numbers: sensitivity and specificity. We get to these at the end of the page.
Let us visualize these two parameters:
[Diagram: one bar shows the prevalence, splitting the population into Sick and Healthy; a second bar shows the accuracy, splitting the tests into Accurate and Inaccurate.]
Now let's cross these two diagrams. We get the diagram below with four quadrants:
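| | Sick | Healthy | Total |
| Accurate test | true positives | true negatives | accuracy |
| Inaccurate test | false negatives | false positives | 100% - accuracy |
| Total | prevalence | 100% - prevalence | 100% |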
The top part represents the proportion of the population that received a correct result, either positive because they are sick (left: prevalence × accuracy) or negative because they are healthy (right: (100% - prevalence) × accuracy).
The bottom part represents the proportion of the population that received an incorrect result: false negatives, where the test is negative but they are in fact sick (left: prevalence × (100% - accuracy)), and false positives, where the test is positive but they are in fact healthy (right: (100% - prevalence) × (100% - accuracy)).
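The four quadrants described above can be computed directly from the two parameters. The following is a minimal sketch; the function name `quadrants` and the dictionary keys are chosen here for illustration, and the example values (10% prevalence, 75% accuracy) are only one possible setting.

```python
def quadrants(prevalence: float, accuracy: float) -> dict[str, float]:
    """Split the population into the four quadrants of the diagram.

    Both arguments are fractions between 0 and 1.
    """
    sick, healthy = prevalence, 1.0 - prevalence
    return {
        "true_positive": sick * accuracy,                # top left: sick, accurate test
        "true_negative": healthy * accuracy,             # top right: healthy, accurate test
        "false_negative": sick * (1.0 - accuracy),       # bottom left: sick, inaccurate test
        "false_positive": healthy * (1.0 - accuracy),    # bottom right: healthy, inaccurate test
    }


# Illustrative parameters: 10% prevalence, 75% accuracy.
q = quadrants(prevalence=0.10, accuracy=0.75)
for name, share in q.items():
    print(f"{name}: {share:.1%}")
```

The four shares always add up to 100% of the population, whatever the parameters.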
Another way to look at the diagram is to note that one diagonal, from top-right to bottom-left, represents those who received a negative result: most were healthy (true negatives), but some were sick (false negatives). The other diagonal, from top-left to bottom-right, represents those who received a positive result: some were sick (true positives), but some were healthy (false positives).
Notice that when the prevalence (proportion of sick people) is equal to the inaccuracy of the test (proportion of tests that give an incorrect result, i.e. 100% - accuracy), the proportions of true and false positives become the same (the two rectangles in the top-left and bottom-right have the same area). This is because a small proportion (the false positives) of a large population (the healthy people) can be the same as a large proportion (the true positives) of a small population (the sick people). The net result is that in this case, if you get a positive result, you have only a 50% chance of being sick! For example, with 10% prevalence and 90% accuracy, true positives and false positives each make up 9% of the population.
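The 50% figure can be checked with a short derivation. The symbols p (prevalence) and a (accuracy) are introduced here only for this sketch.

```latex
\begin{aligned}
\text{true positives} &= p\,a, &\quad \text{false positives} &= (1-p)(1-a),\\
\text{with } p = 1-a:\quad \text{true positives} &= (1-a)\,a, &\quad \text{false positives} &= a\,(1-a),\\
P(\text{sick} \mid \text{positive}) &= \frac{(1-a)\,a}{(1-a)\,a + a\,(1-a)} = \frac{1}{2}.
\end{aligned}
```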
In the general case, the confidence you can have in the results is calculated as follows:
Probability of being sick if you receive a positive test: true positives / (true positives + false positives)
Probability of being healthy if you receive a negative test: true negatives / (true negatives + false negatives)
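The two ratios above can be sketched as a small function. This is an illustration under the page's simplifying assumption of a single accuracy figure for sick and healthy people alike; the name `confidence` is hypothetical.

```python
def confidence(prevalence: float, accuracy: float) -> tuple[float, float]:
    """Return (P(sick | positive test), P(healthy | negative test)).

    Arguments are fractions between 0 and 1. The denominators are zero only
    in degenerate cases (for example prevalence 0 with a perfect test),
    which this sketch does not handle.
    """
    true_pos = prevalence * accuracy
    false_pos = (1.0 - prevalence) * (1.0 - accuracy)
    true_neg = (1.0 - prevalence) * accuracy
    false_neg = prevalence * (1.0 - accuracy)
    sick_if_positive = true_pos / (true_pos + false_pos)
    healthy_if_negative = true_neg / (true_neg + false_neg)
    return sick_if_positive, healthy_if_negative


# Prevalence equal to the inaccuracy of the test: 10% prevalence, 90% accuracy.
pos, neg = confidence(prevalence=0.10, accuracy=0.90)
print(f"sick if positive: {pos:.1%}, healthy if negative: {neg:.1%}")
```

With these values the confidence in a positive result is one half, even though the test is right 90% of the time.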
As you can see, this is quite different from the accuracy of the test.
Since we are interested in interpreting the result of a test (positive or negative), let us use a different representation. The diagram below shows the proportion of people who received a negative result on the first line, split between those who were sick and those who were healthy. The second line does the same for those who received a positive result. In other words, we turn the area representations of the previous diagram into bars and organize them differently.
From this diagram we can easily see the proportion of correct results: for the first line, it is the size of the bar on the right (negative test and healthy) relative to the whole first bar (all negative tests); for the second line, it is the size of the bar on the left (positive test and sick) relative to the whole second bar (all positive tests).
[Diagram: two bars, each split into Sick and Healthy. The Negative test bar shows the chance that you are healthy if you receive a negative test; the Positive test bar shows the chance that you are sick if you receive a positive test.]
The prevalence of the COVID-19 disease is fairly low. Current estimates are between 5% and 15% of the population.
The accuracy of available tests, on the other hand, is quite low, about 75% (see some references at the end of the page).
With these parameters (for example 10% prevalence and 75% accuracy), the confidence in positive results is extremely low: a positive result means only a 25% chance of being sick. This explains why it is not helpful to test the population at large.
By testing only people with symptoms and people who have been in close contact with people who are known to be sick, we test a population where the prevalence of the disease is much higher, say 70%. This increases the confidence in positive results (87.5% with a 75% accurate test), but the confidence in negative results is now low (about 56%). The diagram with the four quadrants shows why: we now have a symmetric situation where a large proportion of a smaller population (those who are healthy and tested negative, 22.5%) is similar to a smaller proportion of a larger population (those who are sick but tested negative, 17.5%).
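To compare the two testing strategies side by side, here is a small sketch that uses exact fractions so that no rounding hides the result. The 75% accuracy and the prevalence values come from the text above; the helper name `confidence` and the choice of Python's `fractions` module are assumptions of this example.

```python
from fractions import Fraction


def confidence(prevalence: Fraction, accuracy: Fraction) -> tuple[Fraction, Fraction]:
    """Return (P(sick | positive), P(healthy | negative)) as exact fractions."""
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    true_neg = (1 - prevalence) * accuracy
    false_neg = prevalence * (1 - accuracy)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


ACCURACY = Fraction(75, 100)  # accuracy figure quoted in the text

# Prevalence in percent: general population (5 to 15) versus targeted testing (70).
results = {pct: confidence(Fraction(pct, 100), ACCURACY) for pct in (5, 10, 15, 70)}

for pct, (pos, neg) in results.items():
    print(f"prevalence {pct}%: sick if positive {float(pos):.1%}, "
          f"healthy if negative {float(neg):.1%}")
```

At 10% prevalence a positive result is right one time in four; at 70% it is right seven times in eight, while the confidence in a negative result drops to nine in sixteen.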
If you like maths, here are the formulas that lead to these counter-intuitive measures. We write P(x) for the probability of event x and P(x | a) for the probability of event x given that a is true.
Based on these two parameters, we can define four basic probabilities:
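One plausible way to write these four probabilities, assuming they are the two parameters and their complements (the event names are chosen here for illustration):

```latex
\begin{aligned}
P(\text{sick}) &= \text{prevalence}, & P(\text{healthy}) &= 1 - \text{prevalence},\\
P(\text{accurate}) &= \text{accuracy}, & P(\text{inaccurate}) &= 1 - \text{accuracy}.
\end{aligned}
```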
Now we need to calculate the probability that a test turns out positive (resp. negative). This happens when the test is accurate and the person is sick, or when the test is inaccurate and the person is healthy (similarly for negative tests):
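Written out, and assuming that the accuracy of the test does not depend on whether the person is sick or healthy, this gives the following sketch, where P(accurate) is the accuracy and P(sick) is the prevalence:

```latex
\begin{aligned}
P(\text{positive}) &= P(\text{accurate})\,P(\text{sick}) + P(\text{inaccurate})\,P(\text{healthy}),\\
P(\text{negative}) &= P(\text{accurate})\,P(\text{healthy}) + P(\text{inaccurate})\,P(\text{sick}).
\end{aligned}
```

The two probabilities add up to 1, since every test is either positive or negative.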
What we are interested in are the conditional probabilities P(sick | positive) and P(healthy | negative).
To calculate these we use Bayes' rule, P(A | B) = P(B | A) * P(A) / P(B):
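Applied to the two cases of interest, and assuming that P(positive | sick) and P(negative | healthy) are both equal to the accuracy P(accurate), Bayes' rule can be expanded as in this sketch:

```latex
\begin{aligned}
P(\text{sick} \mid \text{positive})
  &= \frac{P(\text{positive} \mid \text{sick})\,P(\text{sick})}{P(\text{positive})}
   = \frac{P(\text{accurate})\,P(\text{sick})}
          {P(\text{accurate})\,P(\text{sick}) + P(\text{inaccurate})\,P(\text{healthy})},\\[4pt]
P(\text{healthy} \mid \text{negative})
  &= \frac{P(\text{negative} \mid \text{healthy})\,P(\text{healthy})}{P(\text{negative})}
   = \frac{P(\text{accurate})\,P(\text{healthy})}
          {P(\text{accurate})\,P(\text{healthy}) + P(\text{inaccurate})\,P(\text{sick})}.
\end{aligned}
```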
Traditionally, the tables below are used to present the four cases of interest in what is called Signal Detection Theory:
| | Positive test | Negative test | Total |
| Sick | Hit | Miss | P(sick) |
| Healthy | False alarm | Correct rejection | P(healthy) |
| Total | P(positive) | P(negative) | 100% |
If we swap the rows and the columns of the table and use colored bars to represent the percentages, we get the diagram that we saw earlier:
| | Sick | Healthy | Total |
| Negative test | Miss | Correct rejection | P(negative) |
| Positive test | Hit | False alarm | P(positive) |
| Total | P(sick) | P(healthy) | 100% |
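In the above we have used the word "accuracy" to characterize the proportion of correct results of a test. In practice, there are separate accuracies for positive and negative tests: the sensitivity is the proportion of sick people who test positive, and the specificity is the proportion of healthy people who test negative.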
The diagram below, similar to the one we saw earlier, lets you specify these two rates with the two vertical sliders: the one on the left for sensitivity, the one on the right for specificity.
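[Diagram: the population split into Sick and Healthy. The Sick column is divided into Accurate and Inaccurate results according to the sensitivity; the Healthy column is divided into Accurate and Inaccurate results according to the specificity.]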
Here is the corresponding alternative representation that we saw earlier:
[Diagram: two bars, each split into Sick and Healthy. The Negative test bar shows the chance that you are healthy if you receive a negative test; the Positive test bar shows the chance that you are sick if you receive a positive test.]
In the case of COVID-19, the specificity of the RT-PCR viral test, which is considered the most accurate diagnostic test, is estimated at 75% and its sensitivity at 90%. For a prevalence of 10% (testing the general population randomly), this gives a confidence of about 29% in a positive result and about 99% in a negative result. For a prevalence of 70% (testing suspected cases only), the confidence is about 89% in a positive result and about 76% in a negative result.
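With separate sensitivity and specificity, the same calculation can be sketched as follows. The function name `predictive_values` is hypothetical; the sensitivity, specificity and prevalence values are the ones quoted in the paragraph above.

```python
def predictive_values(prevalence: float, sensitivity: float,
                      specificity: float) -> tuple[float, float]:
    """Return (P(sick | positive), P(healthy | negative)).

    sensitivity: share of sick people who test positive.
    specificity: share of healthy people who test negative.
    """
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


SENSITIVITY, SPECIFICITY = 0.90, 0.75  # figures quoted in the text

for label, prevalence in (("general population", 0.10), ("suspected cases", 0.70)):
    pos, neg = predictive_values(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: sick if positive {pos:.1%}, healthy if negative {neg:.1%}")
```

When sensitivity and specificity are equal, this reduces to the single-accuracy calculation used earlier on the page.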
When the probability of a false negative is high, you can increase confidence in the result by taking a second test. In this case, the prevalence used for the second test is replaced by the probability of being sick given the result of the first test.
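Repeating a test amounts to applying Bayes' rule again with an updated starting probability. The sketch below assumes that the two test results are independent given the person's true status, which the text does not state and which may not hold in practice (for example if the same sampling problem affects both tests). The function name `update` is hypothetical.

```python
def update(prior: float, positive: bool, sensitivity: float, specificity: float) -> float:
    """Return P(sick) after one test result, starting from P(sick) = prior."""
    if positive:
        likelihood_sick, likelihood_healthy = sensitivity, 1.0 - specificity
    else:
        likelihood_sick, likelihood_healthy = 1.0 - sensitivity, specificity
    evidence = likelihood_sick * prior + likelihood_healthy * (1.0 - prior)
    return likelihood_sick * prior / evidence


# Suspected case (70% prevalence) with two negative results in a row.
SENSITIVITY, SPECIFICITY = 0.90, 0.75
p_after_one = update(0.70, positive=False, sensitivity=SENSITIVITY, specificity=SPECIFICITY)
p_after_two = update(p_after_one, positive=False, sensitivity=SENSITIVITY, specificity=SPECIFICITY)
print(f"P(sick) after one negative test:  {p_after_one:.1%}")
print(f"P(sick) after two negative tests: {p_after_two:.1%}")
```

Each call uses the previous result as the new prevalence, so the chain can be extended to any number of tests.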
In contrast, the COVID-19 antibody tests, which detect if you have antibodies in your blood, have much higher accuracy. The Roche Antibody Test, for example, claims a specificity greater than 99.8% and a sensitivity of 100%. With these figures and a prevalence of 10%, the confidence in a positive result is at least 98%, and the confidence in a negative result is 100%.
The probability of being sick despite a negative test is not zero: it is 100% minus the confidence in a negative result.
If a group of people who all tested negative get together, the probability that at least one of them is actually sick and risks propagating the virus grows with the size of the group. This percentage increases very quickly if the test is less sensitive. This is why the size of gatherings (family or other) is limited. Moreover, in reality, it is rare that all the people who meet have been tested recently, so this percentage is an underestimate.
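The group risk can be sketched as one minus the probability that every member is healthy. The example assumes that the members' statuses are independent of each other, which the text does not state; the function name `group_risk` and the group sizes are illustrative.

```python
def group_risk(prevalence: float, sensitivity: float, specificity: float,
               group_size: int) -> float:
    """Probability that at least one person in a group is sick,
    given that every member tested negative (independence assumed)."""
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    sick_if_negative = false_neg / (false_neg + true_neg)
    return 1.0 - (1.0 - sick_if_negative) ** group_size


# Illustrative parameters: 10% prevalence, 90% sensitivity, 75% specificity.
for size in (1, 5, 10, 20):
    risk = group_risk(0.10, 0.90, 0.75, size)
    print(f"{size:>2} people: {risk:.1%} chance that at least one is sick")
```

Lowering the sensitivity in this sketch raises the risk for every group size, which matches the remark above about less sensitive tests.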