We know what data is necessary to calculate the standard deviation of a set, but in some cases, we could actually do with a lot less information than the average test-taker may think they need.
Let’s explore this idea through an example GMAT data sufficiency question:
What is the standard deviation of a set of numbers whose mean is 20?
Statement 1: The absolute value of the difference of each number in the set from the mean is equal.
Statement 2: The sum of the squares of the differences from the mean is greater than 100.
We need to determine whether the information we have been given is sufficient to get us the exact value of the standard deviation of a particular set of numbers. To find the standard deviation of a set, we need to know the deviation of each term from the mean so that we can square those deviations, sum the squares, divide that sum by the number of terms, and then take the square root.
Essentially, to find the standard deviation we need one of three things: each element of the set, or the deviation of each element from the mean (which also gives us the number of terms), or the sum of the squared deviations together with the number of terms in the set.
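For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the calculation described above. The function names are made up for this illustration, and the code divides by the number of terms, as the text describes.

```python
import math

def population_std_dev(values):
    """Standard deviation computed from the elements themselves."""
    n = len(values)
    mean = sum(values) / n
    # Square each deviation from the mean, then sum the squares.
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    # Divide by the number of terms and take the square root.
    return math.sqrt(sum_of_squares / n)

def std_dev_from_summary(sum_of_squared_deviations, n):
    """Same result when only the sum of squared deviations and n are known."""
    return math.sqrt(sum_of_squared_deviations / n)

print(population_std_dev([18, 18, 22, 22]))  # each term is 2 away from 20
print(std_dev_from_summary(16, 4))           # 4 terms, squared deviations sum to 16
```

Both calls describe the same set, so they should print the same value. Notice that the second function needs two inputs; knowing only one of them is not enough.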
The question stem here tells us that the mean of the set is 20. We have no other information about any of the actual elements of the set or the number of elements. With this in mind, let’s examine each of the statements:
Statement 1: The absolute value of the difference of each number in the set from the mean is equal.
With this statement, we don’t actually know what the absolute value of the difference is. We also don’t know how many elements there are. The set could be something like:
19, 21 (each term is exactly 1 away from the mean 20)
or
18, 18, 22, 22 (each term is exactly 2 away from the mean 20)
etc.
The standard deviation in each case will be different. We don’t know the elements of the set and we don’t know the number of elements in the set. Because of this, there is no way for us to know the value of the standard deviation – this statement alone is not sufficient.
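A quick check of the two example sets, sketched with Python's standard `statistics` module (the helper name `summarize` is just for this illustration), shows that both satisfy Statement 1 and yet have different standard deviations:

```python
from statistics import mean, pstdev

def summarize(values):
    """Return the mean, the set of distinct distances from the mean, and the standard deviation."""
    m = mean(values)
    distances = {abs(x - m) for x in values}
    return m, distances, pstdev(values)

for candidate in ([19, 21], [18, 18, 22, 22]):
    m, distances, sd = summarize(candidate)
    # One distinct distance means every term is equally far from the mean.
    print(candidate, "mean:", m, "distances:", distances, "std dev:", sd)
```

In each case the standard deviation simply equals the common distance from the mean, and Statement 1 does not tell us what that distance is.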
Statement 2: The sum of the squares of the differences from the mean is greater than 100.
“Greater than 100” encompasses a large range of numbers – it could be any value larger than 100. Again, we cannot find the exact standard deviation of the set, so this statement is also not sufficient alone.
Using both statements together, we still do not have any idea of what the elements of the set are or what the sum of the squares of the differences from the mean is. We also still don’t know the number of elements. Hence, both statements together are not sufficient, so the answer is (E).
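To make this concrete, the sketch below tests two sets chosen for this illustration (they do not come from the question itself). Both have a mean of 20 and satisfy both statements, yet their standard deviations differ.

```python
from statistics import mean, pstdev

def satisfies_both_statements(values, required_mean=20):
    """Check the question stem (mean is 20), Statement 1, and Statement 2."""
    m = mean(values)
    deviations = [x - m for x in values]
    # Statement 1: every term is the same distance from the mean.
    equal_distances = len({abs(d) for d in deviations}) == 1
    # Statement 2: the squared deviations sum to more than 100.
    sum_of_squares = sum(d * d for d in deviations)
    return m == required_mean and equal_distances and sum_of_squares > 100

for candidate in ([8, 32], [10, 10, 30, 30]):
    print(candidate, satisfies_both_statements(candidate), pstdev(candidate))
```

Two valid sets with two different standard deviations is all it takes to show that the statements together are not sufficient.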
Now, let us add just one more piece of information to the problem in this similar question:
What is the standard deviation of a set of 7 numbers whose mean is 20?
Statement 1: The absolute value of the difference of each number in the set from the mean is equal.
Statement 2: The sum of the squares of the differences from the mean is greater than 100.
What would you expect the answer to be? Still E, right? The sum of the squared deviations is still unknown and the exact elements of the set are still unknown; all we know is the number of elements. Actually, this information is already more than we need. All we need to know is that the number of elements is odd, and suddenly we can find the standard deviation.
Here is why:
Statement 1 is quite tricky.
If we have an odd number of elements, in which case can the absolute values of the differences of each number in the set from the mean be equal?
Think about it: the mean of the set is 20. What could a set look like such that the mean is 20 and the absolute values of the differences of each number in the set from the mean are equal? Try to think of such a set with just 3 elements. Can you come up with one?
19, 19, 21? No, the mean is not 20
19, 20, 21? No, the absolute values of the differences from the mean are not equal. 19 is 1 away from the mean but 20 is 0 away from the mean.
Note that in this case, the only set that fits the given criteria is one consisting entirely of 20s. If some elements were a distance d above the mean and the rest were d below it, the deviations could sum to zero only with equally many elements on each side, which is impossible with an odd number of elements unless d is 0. Only then can each number be equidistant from the mean, i.e. each number is 0 away from the mean. If all the elements of the set are equal, then the standard deviation of the set is 0. It doesn’t matter how many elements the set has, and it doesn’t matter what the mean is! Statement 1 alone is therefore sufficient, while Statement 2 alone still is not, so the answer would be (A).
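If you would rather test this than take it on faith, here is a small brute-force sketch. It tries only whole-number distances up to a small cap, which is an assumption made to keep the search short; the function name and the cap are made up for this illustration. For each distance d it tries every split of the elements between 20 + d and 20 - d and keeps the sets whose mean is exactly 20.

```python
def equal_distance_sets(n, mean=20, max_distance=5):
    """List n-element sets of values mean + d or mean - d whose mean equals `mean`."""
    found = set()
    for d in range(max_distance + 1):
        for count_above in range(n + 1):
            values = [mean - d] * (n - count_above) + [mean + d] * count_above
            # Compare sums rather than dividing, to avoid rounding issues.
            if sum(values) == mean * n:
                found.add(tuple(values))
    return sorted(found)

print(equal_distance_sets(3))                  # odd number of elements
print(equal_distance_sets(7))                  # odd number of elements
print(equal_distance_sets(4, max_distance=2))  # even number of elements
```

For 3 and 7 elements the search turns up only the set made entirely of 20s, while for 4 elements it also finds sets such as 18, 18, 22, 22.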
Takeaway:
If a set has an even number of terms, the absolute values of the distances of each term from the mean can be equal even when the terms are not all the same (half of the terms sit above the mean and half below it). But if a set has an odd number of terms and the absolute values of the distances of each term from the mean are equal, all the terms in the set must be the same and equal to the mean, so the standard deviation is 0.