Statistical Engineering

In my opinion, machine learning, data mining, pattern recognition, and related fields are branches of ‘statistical engineering’. I find that the relationship between these disciplines and statistics is very similar to the relationship between engineering and science.

In engineering, people focus on prediction, real-world performance, and optimization of a process, procedure, or algorithm. For engineers, theoretical analysis is not as important as the empirical performance of a method, and knowing how to use a method to solve practical problems matters more than understanding how it works. The same is true in machine learning, data mining, and pattern recognition.

For instance, a newly proposed method will become very popular in machine learning or data mining once its empirical performance proves to be very good. People judge whether a method is good by its performance on a variety of data sets. In addition, those who work in machine learning or data mining prefer learning how to implement a method to understanding why it works.

In contrast, scientific research emphasizes constructing a general rule or model to explain phenomena. Understanding a phenomenon is usually more important than knowing how to apply the result to real problems. For instance, astronomers develop many theories to explain the orbits and motions of planets. However, astronomers do not care much about how this knowledge can be used in daily life.

In data analysis, the phenomena to be explained are the results of a statistical method, such as the error of an estimator. For example, a newly proposed method will attract statisticians’ attention once its theoretical performance is shown to be good. When a method has no theoretical guarantee, statisticians will try to construct theories to explain how it works. Moreover, statisticians usually prefer understanding how a method works to learning how to implement it.

One can see that the relationship between statistics and machine learning, data mining, and pattern recognition is nearly the same as the relationship between science and engineering. That’s why I use the term “statistical engineering” for these disciplines.
