When we sample, we are trying to determine something about
a population from a subset of that population. But
before we can determine that, we need to know what population we are
dealing with. In terms of set theory, we need to know what
universe we are dealing with.
It may not be easy to determine if a particular entity is in the
population or not. Is someone who smokes one cigarette a month when
he is out drinking a "smoker"? Is a raccoon that lives at the border of
Panama and Colombia a "North American raccoon"? Is a phone with 30% of
its components from China "produced in China"? Is someone who claims
they are going to vote, but hasn't voted in 20 years, really a
"likely" voter? Is someone taking only one class every year or
two a "student at St. Joseph's College"?
Note that these decisions can be made in a biased way: if we
want to exaggerate the dangers of smoking, we could count
as "smokers" only people who smoke over two packs a day. On the
other hand, if we want to minimize the dangers, we could
include anyone who has smoked even a single cigarette in the
last several decades.
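To make the effect of the definition concrete, here is a minimal sketch in Python. The ten records are invented toy data, not real health statistics, and the names `Person`, `illness_rate`, `heavy_only`, and `ever_smoked` are hypothetical. The only point is that the same data gives very different answers depending on who is counted as a "smoker".

```python
from typing import Callable, NamedTuple

class Person(NamedTuple):
    cigarettes_per_day: float  # average daily consumption (toy value)
    ill: bool                  # has the illness being studied (toy value)

# Invented toy records, for illustration only.
PEOPLE = [
    Person(0, False), Person(0, False),
    Person(0.03, False), Person(0.03, False),  # about one cigarette a month
    Person(1, False), Person(5, False),
    Person(10, True), Person(20, False),
    Person(45, True), Person(50, True),        # more than two packs a day
]

def illness_rate(people: list[Person], is_smoker: Callable[[Person], bool]) -> float:
    """Share of 'smokers' who are ill, under a given definition of 'smoker'."""
    smokers = [p for p in people if is_smoker(p)]
    return sum(p.ill for p in smokers) / len(smokers)

# Narrow definition: only people over two packs (40 cigarettes) a day.
heavy_only = illness_rate(PEOPLE, lambda p: p.cigarettes_per_day > 40)

# Broad definition: anyone who smokes at all.
ever_smoked = illness_rate(PEOPLE, lambda p: p.cigarettes_per_day > 0)

print(f"illness rate, narrow definition: {heavy_only:.3f}")
print(f"illness rate, broad definition:  {ever_smoked:.3f}")
```

In this toy data the narrow definition makes smoking look far more dangerous than the broad one, even though not a single record has changed.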
At first, it might seem plausible that if we want to learn something about a population from a subset of that population, we should carefully construct that subset to closely mirror the actual population. So if, for instance, we want to sample the American electorate about an upcoming election, we might decide, "Well, we should construct our sample to include 45% Democratic voters, 40% Republican voters, 10% Libertarian voters, and 5% Green Party voters."
This approach is seriously wrong, as it begs the question of what the population is actually like. If we already know the composition of the population, then we do not need to sample. We could simply declare that "The vote will be 45% Democratic, 40% Republican, 10% Libertarian, and 5% Green." The only reason that we are sampling is that we do not know how the population as a whole will vote, and we are hoping that our sample will help us to understand how it will.
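Perhaps surprisingly, the best way to sample a population to determine its characteristics from the sample is to make the sample as random as we can. But even that is fraught with difficulties: we need to sample by some means, and that means itself may bias our sample. The classic example is the 1936 Literary Digest poll, which predicted that Alf Landon would defeat Franklin Roosevelt. Its sample was drawn largely from telephone directories and automobile registration lists, which at the time over-represented wealthier voters, and Roosevelt won in a landslide.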
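The difference between a random sample and a sample biased by the means of collection can be sketched with a small simulation in Python. Everything here is an assumption made for illustration: the electorate is synthetic, the candidates "L" and "R" are placeholders, and the 30% phone ownership and the voting tendencies are invented numbers. The names `build_population` and `share_for` are hypothetical helpers.

```python
import random

def build_population(size: int, rng: random.Random) -> list[tuple[bool, str]]:
    """Synthetic electorate of (has_phone, vote) pairs. All rates are invented."""
    population = []
    for _ in range(size):
        has_phone = rng.random() < 0.30         # assumed: 30% own a phone
        p_vote_l = 0.65 if has_phone else 0.30  # assumed: phone owners lean "L"
        vote = "L" if rng.random() < p_vote_l else "R"
        population.append((has_phone, vote))
    return population

def share_for(group: list[tuple[bool, str]], candidate: str) -> float:
    """Fraction of the group voting for the given candidate."""
    return sum(1 for _, vote in group if vote == candidate) / len(group)

rng = random.Random(1936)  # fixed seed so the run is repeatable
population = build_population(100_000, rng)

# The quantity we normally cannot see: the share in the whole population.
true_share = share_for(population, "L")

# Simple random sample: every member is equally likely to be chosen.
random_estimate = share_for(rng.sample(population, 2_000), "L")

# Biased means: we can only reach people who own a phone.
phone_owners = [person for person in population if person[0]]
phone_estimate = share_for(rng.sample(phone_owners, 2_000), "L")

print(f"true share for L:       {true_share:.3f}")
print(f"random-sample estimate: {random_estimate:.3f}")
print(f"phone-only estimate:    {phone_estimate:.3f}")
```

Note that the random sample needs no prior knowledge of how the population splits, yet lands close to the true share, while the phone-only sample is the same size and still badly off. A larger biased sample would not help, because the error comes from who can be reached, not from how many are asked.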