Random thoughts: Statistics vs Statistical engineering

In recent days I have attended many talks given by people from statistics and statistical engineering (machine learning, data mining, etc.).

I have noticed that people doing theory in statistical engineering are quite similar to people in statistics. Both groups do a lot of statistical analysis of a method or algorithm and derive useful bounds on its convergence rate.

However, I just found one trait of people in statistics that people doing theory in statistical engineering usually do not share: seeking the asymptotic distribution. It is true that many people in statistical engineering try to find bounds on the convergence rate. But the bounds are like their destination; they usually do not go further to the distribution. In contrast, people in statistics do not stop at the rate; statisticians aim for the asymptotic distribution.

The reason statisticians care about the asymptotic distribution may be related to statistical inference. Statistical inference, such as confidence intervals and hypothesis tests, requires knowledge of the distribution of a certain statistic. Knowing only a bound on the rate is generally not sufficient for carrying out the inference. Both confidence intervals (or, more generally, confidence sets) and hypothesis tests require the distribution.
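To make this concrete, here is a textbook-style sketch in my own notation. The symbols $\hat\theta_n$, $\theta$, $\sigma$, $\hat\sigma_n$ and $C$ are assumptions for illustration and are not tied to any particular method. A rate result controls the size of the error up to a constant, while an asymptotic distribution pins down its shape, which is what gives a calibrated interval.

```latex
% Rate result: the error shrinks like n^{-1/2}, but C is often unknown or loose.
|\hat\theta_n - \theta| \le C\, n^{-1/2} \quad \text{with high probability}

% Asymptotic distribution: the rescaled error has a known limiting law.
\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, \sigma^2)

% Resulting asymptotic (1 - \alpha) confidence interval,
% assuming \hat\sigma_n is a consistent estimator of \sigma.
\hat\theta_n \pm z_{1-\alpha/2}\, \frac{\hat\sigma_n}{\sqrt{n}}
```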

This might also be the reason why courses in ML put more emphasis on Hoeffding’s inequality and Bernstein’s inequality, while courses in statistics focus more on the central limit theorem and the chi-square approximation.
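The two toolkits can be compared on a toy problem. The following is a minimal sketch, assuming i.i.d. 0/1 data and a target of the mean; the function names (`hoeffding_interval`, `clt_interval`, `simulate`) and the parameters `n`, `p`, `seed` are made up for illustration. When all constants are explicit, as in Hoeffding's inequality for bounded data, a bound does yield an interval, but a conservative one. The CLT interval uses the limiting normal distribution and is only asymptotically valid.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def hoeffding_interval(xs, alpha=0.05, lo=0.0, hi=1.0):
    """Conservative interval for the mean of data bounded in [lo, hi].

    Solves 2 * exp(-2 * n * t**2 / (hi - lo)**2) = alpha for t.
    """
    n = len(xs)
    mean = fmean(xs)
    half = (hi - lo) * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    return mean - half, mean + half


def clt_interval(xs, alpha=0.05):
    """Asymptotic interval for the mean based on the central limit theorem."""
    n = len(xs)
    mean = fmean(xs)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half = z * stdev(xs) / math.sqrt(n)
    return mean - half, mean + half


def simulate(n=1000, p=0.3, seed=0):
    """Draw n Bernoulli(p) values (hypothetical setup) and build both intervals."""
    rng = random.Random(seed)
    xs = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
    return hoeffding_interval(xs), clt_interval(xs)


if __name__ == "__main__":
    hoeffding, clt = simulate()
    print("Hoeffding interval:", hoeffding)
    print("CLT interval:      ", clt)
```

For 0/1 data with n = 1000 and alpha = 0.05, the Hoeffding half-width is about 0.043 whatever the data look like, while the CLT half-width is at most about 0.031 and shrinks further when the sample variance is small. The price is that the CLT interval holds only approximately for finite n.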

Usually, finding a bound on the convergence rate is much easier than finding the true distribution. This might be one reason why many popular methods in statistical engineering are not so welcome in statistics: the lack of an asymptotic distribution makes them less popular among statisticians. However, many methods, though they have no known asymptotic distribution, are still very useful for prediction, especially those with guarantees from probability bounds. Maybe we statisticians should not limit ourselves to methods that are capable of statistical inference.

Anyway, I just discovered this trait of statisticians: deriving the asymptotic distribution. Maybe this is just my biased sample, or maybe it is the truth. I’ll keep using this trait as a predictor for future talks.