8/20/2012

A brief history of statistical inference

In the early 20th century, the majority of statisticians believed in parametric methods and embraced maximum likelihood estimation. For a given problem, the standard workflow was to first identify an appropriate parametric model and then apply maximum likelihood estimation to find the parameters of that model. This approach worked very well for a while. Concurrently, several statisticians were interested in non-parametric methods. They proved that the empirical distribution converges to the true distribution exponentially fast, whatever the true distribution is. This led to a general approach to statistical inference. However, this general approach was not widely appreciated and was mostly regarded as a purely technical achievement. The field of statistical inference remained dominated by parametric methods until the invention of computers, which led to an explosion of information.
As the amount and complexity of data increased, statisticians began to realize the disadvantages of parametric methods. They identified the curse of dimensionality. They found that maximum likelihood estimation is not always the best approach to statistical inference. In order to apply statistical inference to challenging problems, they later moved to the idea of empirical risk minimization. That is, instead of finding a parameter that best explains the data, one finds a function, chosen from the set of functions a learning machine can implement, that yields the minimum empirical loss. However, unlike with maximum likelihood estimation, statisticians were at first unable to prove the consistency and convergence of methods based on empirical risk minimization.
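
To make the idea of empirical risk minimization concrete, here is a minimal sketch in Python. It is an illustration under assumptions that are not in the text above: the set of functions is taken to be one-dimensional threshold rules ("predict 1 if x >= t"), the loss is the 0-1 loss, and the data are synthetic. The names `empirical_risk` and `erm_threshold` are hypothetical.

```python
import random


def empirical_risk(threshold, xs, ys):
    """Average 0-1 loss of the rule 'predict 1 if x >= threshold'."""
    errors = sum(int((x >= threshold) != bool(y)) for x, y in zip(xs, ys))
    return errors / len(xs)


def erm_threshold(xs, ys):
    """Return the candidate threshold with the smallest empirical risk.

    Only the observed points (plus +infinity, i.e. 'always predict 0')
    need to be tried: any other threshold labels the sample exactly
    like one of these candidates.
    """
    candidates = sorted(set(xs)) + [float("inf")]
    return min(candidates, key=lambda t: empirical_risk(t, xs, ys))


# Synthetic data (an assumption for illustration): labels follow the rule
# x >= 0.5, and about 10% of them are flipped at random.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [int(x >= 0.5) ^ int(rng.random() < 0.1) for x in xs]

best = erm_threshold(xs, ys)
print("chosen threshold:", best)
print("empirical risk:", empirical_risk(best, xs, ys))
```

Nothing here assumes a parametric model of how the data were generated; the procedure only compares candidate functions by their average loss on the sample. Whether the chosen function also performs well on new data is exactly the consistency question raised above.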
The proof came only later, with the introduction of the VC dimension. It was shown that the necessary and sufficient conditions for the consistency and convergence of methods based on empirical risk minimization depend on the capacity of the set of functions implemented by the learning machine. Specifically, it is necessary and sufficient that this set of functions has a finite VC dimension. With this proof in hand, statisticians had a firmer theoretical footing and moved on to solve more challenging problems.
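
One commonly quoted form of the resulting guarantee, given here as a sketch rather than as the statement this post has in mind, bounds the true risk by the empirical risk plus a capacity term. The symbols are assumptions introduced for illustration: $R(f)$ is the true risk, $R_{\mathrm{emp}}(f)$ the empirical risk under the 0-1 loss, $n$ the sample size, $h$ the VC dimension of the set of functions, and $\eta$ the allowed failure probability.

```latex
% With probability at least 1 - \eta over the sample, simultaneously
% for every function f in a set with finite VC dimension h:
R(f) \;\le\; R_{\mathrm{emp}}(f)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

When $h$ is finite, the square-root term shrinks to zero as $n$ grows, so minimizing the empirical risk also controls the true risk. When the VC dimension is infinite, no bound of this kind is available.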