Dangers in sampling

Published : Jul 08, 2000 00:00 IST

"The most obvious danger in sampling is that a formula which appears good for one sample may be poor for the entire population of data. A less obvious but equally serious danger is that a formula which is good for the population may be overlooked, not be cause it is poor for the sample but merely because it is not the best for the sample...It might seem that the greater the number of predictors, the greater the probability of a good prediction formula, and, indeed this would be so if the sample used coul d consist of the entire population. When the size of the sample is limited, however, the use of too many predictors can lead to trouble. The difficulty is that the greater the number of predictors, the greater the probability that some linear combination will be highly correlated with the predictand within the sample even though it may be uncorrelated with the predictand within the entire population... and correspondingly increase the error when applied to new data."

- E.N. Lorenz in Empirical Orthogonal Functions and Statistical Weather Prediction (1956).
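
The effect Lorenz describes can be illustrated with a small numerical sketch. It is not taken from his report: the setup is an assumption chosen for illustration, in which the predictors and the predictand are independent random numbers, so that no linear formula has any real skill in the population. The function name `mean_errors`, the sample size of 30, and the predictor counts are all hypothetical choices.

```python
import numpy as np


def mean_errors(n_sample, n_predictors, n_new=2000, n_trials=200, seed=0):
    """Average in-sample and new-data mean squared error of a least-squares fit.

    Assumed setup (illustrative only): predictors and predictand are
    independent standard normal draws, so the predictand is uncorrelated
    with every linear combination of predictors in the population.
    """
    rng = np.random.default_rng(seed)
    in_err, out_err = [], []
    for _ in range(n_trials):
        # Limited sample used to derive the prediction formula.
        X = rng.standard_normal((n_sample, n_predictors))
        y = rng.standard_normal(n_sample)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)

        # New data drawn from the same population.
        X_new = rng.standard_normal((n_new, n_predictors))
        y_new = rng.standard_normal(n_new)

        in_err.append(np.mean((y - X @ coef) ** 2))
        out_err.append(np.mean((y_new - X_new @ coef) ** 2))
    return float(np.mean(in_err)), float(np.mean(out_err))


if __name__ == "__main__":
    for p in (2, 10, 25):
        in_mse, out_mse = mean_errors(n_sample=30, n_predictors=p)
        print(f"{p:2d} predictors: in-sample MSE {in_mse:.2f}, "
              f"new-data MSE {out_mse:.2f}")
```

Under these assumptions, adding predictors should make the fit look better within the sample while the error on new data grows, which is the danger the quotation warns about.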
