Dangers in sampling

Published : Jul 08, 2000 00:00 IST

"The most obvious danger in sampling is that a formula which appears good for one sample may be poor for the entire population of data. A less obvious but equally serious danger is that a formula which is good for the population may be overlooked, not because it is poor for the sample but merely because it is not the best for the sample...It might seem that the greater the number of predictors, the greater the probability of a good prediction formula, and, indeed this would be so if the sample used could consist of the entire population. When the size of the sample is limited, however, the use of too many predictors can lead to trouble. The difficulty is that the greater the number of predictors, the greater the probability that some linear combination will be highly correlated with the predictand within the sample even though it may be uncorrelated with the predictand within the entire population... and correspondingly increase the error when applied to new data."

- E.N. Lorenz in Empirical Orthogonal Functions and Statistical Weather Prediction (1956).
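
Lorenz's warning can be illustrated numerically. The following is a minimal sketch, not taken from his report: it assumes a predictand that is pure noise, independent of every predictor, and fits ordinary least squares (through the normal equations) on a sample of 30 cases, first with 2 predictors and then with 20. The function names (`solve`, `fit`, `mse`, `experiment`), the sample sizes, and the seed are all illustrative choices.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(X, y):
    """Least-squares coefficients from the normal equations X^T X w = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def mse(X, y, w):
    """Mean squared error of the linear formula w on the data (X, y)."""
    return sum(
        (sum(wi * xi for wi, xi in zip(w, row)) - t) ** 2 for row, t in zip(X, y)
    ) / len(y)


def experiment(counts=(2, 20), n_sample=30, n_new=2000, seed=0):
    """Return {number of predictors: (error within sample, error on new data)}."""
    rng = random.Random(seed)
    k_max = max(counts)

    def draw(n):
        # Assumption: the predictand is unrelated to every predictor.
        X = [[rng.gauss(0, 1) for _ in range(k_max)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        return X, y

    X_s, y_s = draw(n_sample)  # the limited sample used to build the formula
    X_n, y_n = draw(n_new)     # new data standing in for the population
    results = {}
    for k in counts:
        Xs_k = [row[:k] for row in X_s]  # use only the first k predictors
        Xn_k = [row[:k] for row in X_n]
        w = fit(Xs_k, y_s)
        results[k] = (mse(Xs_k, y_s, w), mse(Xn_k, y_n, w))
    return results


if __name__ == "__main__":
    for k, (in_sample, new_data) in experiment().items():
        print(f"{k:2d} predictors: sample error {in_sample:.3f}, new-data error {new_data:.3f}")
```

Under these assumptions the 20-predictor formula fits the sample at least as well as the 2-predictor one, because the predictor sets are nested. It would nonetheless typically do worse on the new data, where there is no real relationship to be found.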
