Recently I was talking with some friends about the philosophy of analyzing data. I concluded that statistics is not the only way to study data, but it is definitely a very useful one. In my opinion, statistics is the analysis of data using probability models. Such a probabilistic formalism has several merits. In this post, I briefly introduce two nice properties of statistical models.
First, a statistical model for data usually takes a simple form. In most cases, the data are full of noise, and it is hard to explain where the noise comes from without using probability. In statistics, we simply model the data-generating process with probability, so that we can easily explain the randomness or uncertainty in the data. For instance, in regression, our data are pairs (X, Y) and we want to find the relation between Y and X. Generally, it is hard to find a simple function such that Y = f(X) holds for all points. In a statistical model, however, we use Y = f(X) + noise. This is the so-called signal + noise form. In this model, the function f (also called the regression function) can be a simple function, because the deviations from it are absorbed by the noise, which is assumed to follow a reasonable probability distribution. The well-known simple linear regression uses f(X) = a + bX, which is very simple and easy to interpret.
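To make the signal + noise form concrete, here is a minimal sketch in Python. The true values a = 1 and b = 2, the Gaussian noise with standard deviation 0.5, the sample size and the helper names `simulate` and `fit_line` are all assumptions I made up for illustration. The sketch generates data from Y = a + bX + noise and then recovers a and b by ordinary least squares.

```python
import random


def simulate(a, b, sigma, n, seed=0):
    """Draw n pairs (x, y) from the model y = a + b*x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares estimates of the intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Assumed "true" model, chosen only for this illustration.
xs, ys = simulate(a=1.0, b=2.0, sigma=0.5, n=200)
a_hat, b_hat = fit_line(xs, ys)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

No single point satisfies Y = a + bX exactly, yet the fitted line should land close to the assumed a and b, because the noise averages out across the sample.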
The second advantage is that a statistical model provides a population-level analysis. By this I mean that we can carry out a lot of analysis once the true population is given. For instance, we know that for a given sample size, the sample mean will differ from the population mean because of the randomness of the sampling process. However, if we have full information about the true population (usually we assume what the population looks like), we know how much randomness to expect in the sample mean. We can compare this with what we observe in the data and thereby check whether our assumptions are reasonable. This leads to hypothesis testing and model checking (model diagnostics) in statistics, which are also widely used in scientific studies.
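The idea can be sketched with a small simulation. Everything specific here is my own assumption for illustration: the population is taken to be normal with mean 0 and known standard deviation 1, the sample size is 50, and `z_test` is a hypothetical helper name. Under these assumptions the sample mean should fluctuate with standard deviation sigma / sqrt(n), and a simple two-sided z-test compares an observed sample mean against that expected fluctuation.

```python
import math
import random
import statistics


def z_test(sample, mu0, sigma):
    """Two-sided z-test of 'population mean equals mu0' with known sigma."""
    n = len(sample)
    standard_error = sigma / math.sqrt(n)
    z = (statistics.fmean(sample) - mu0) / standard_error
    # Two-sided p-value under the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


rng = random.Random(1)
mu0, sigma, n = 0.0, 1.0, 50  # assumed population and sample size

# What the assumed population predicts for the spread of the sample mean.
theoretical_se = sigma / math.sqrt(n)

# What repeated sampling from that population actually gives.
sample_means = [
    statistics.fmean([rng.gauss(mu0, sigma) for _ in range(n)])
    for _ in range(2000)
]
simulated_se = statistics.stdev(sample_means)
print(f"theoretical SE = {theoretical_se:.4f}, simulated SE = {simulated_se:.4f}")

# Data that actually come from a population with mean 1, tested against mu0 = 0.
shifted_sample = [rng.gauss(1.0, sigma) for _ in range(n)]
z_shift, p_shift = z_test(shifted_sample, mu0, sigma)
print(f"z = {z_shift:.2f}, p-value = {p_shift:.2g}")
```

When the data come from a population that differs from the assumed one, the observed sample mean falls far outside the expected range of randomness and the p-value becomes very small, which is the signal that the assumption is not reasonable.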
Statistics also has other nice features, such as asymptotic analysis and uncertainty quantification. I will introduce these later.
People often misinterpret the likelihood function as a probability function. The two are similar and related, but distinct.
The likelihood function measures how strongly the data support a particular value of the parameter, and this measure varies as the parameter changes (here we consider a single parameter). It cannot be interpreted as a probability. The main reason is that we do not assume any probabilistic structure for the parameter (note*). Parameters are fixed but unknown quantities.
For instance, the mass of the sun is a parameter that influences the intensity of sunlight. Based on data, we can infer the mass of the sun through the likelihood function. So we can say that the case in which the sun has mass 1.9891 × 10^30 kilograms has likelihood 0.9 (the numbers are made up arbitrarily). This does not mean that the mass of the sun is a random quantity (and it shouldn’t be); it just states that the likelihood measure assigns the value 0.9 to the sun’s mass being 1.9891 × 10^30 kilograms.
For another example from daily life, suppose you want to know somebody’s age but cannot ask him/her directly (this may be impolite). All you can do is infer the age by asking some other questions. So after a short chat, you reach a conclusion in your mind like “there’s 0.3 likelihood that his/her age is 25”. However, this doesn’t mean that his/her age is 25 with probability 30%; the age is just a value unknown to you and has no probability attached.
It is true that the likelihood function is related to the probability density function (note**). For the probability density function, we fix the parameters and consider the probability density at different observations. For the likelihood function, we use the same functional form but fix the observation and consider different parameters. A critical difference is that if we sum (or integrate) the probability density function over all possible observations for any fixed parameters, we get 1. But the sum over all possible parameters for a specific observation is usually not 1 and can even be infinite. This is what makes the likelihood function differ from the probability density function.
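This difference is easy to check numerically. The sketch below uses a binomial model with n = 10 trials, which is my own choice of example and not something specific to the discussion above; `binom_pmf` is a hypothetical helper name. With the parameter p fixed, the probabilities of all possible observations k = 0, ..., n add up to 1. With the observation fixed at k = 3, integrating the same formula over all p in [0, 1] gives 1/(n + 1), which is not 1.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success rate p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


n = 10

# Fixed parameter, all possible observations: a probability function.
p_fixed = 0.3
total_over_observations = sum(binom_pmf(k, n, p_fixed) for k in range(n + 1))

# Fixed observation, all possible parameters: a likelihood function.
# Midpoint rule on [0, 1] as a simple numerical integral over p.
k_observed = 3
grid = 10_000
area_over_parameter = (
    sum(binom_pmf(k_observed, n, (i + 0.5) / grid) for i in range(grid)) / grid
)

print(f"sum over observations  = {total_over_observations:.6f}")
print(f"integral over parameter = {area_over_parameter:.6f}")
print(f"1 / (n + 1)             = {1.0 / (n + 1):.6f}")
```

The same formula is evaluated in both cases; only what is held fixed and what is varied changes, and that alone decides whether the values add up to 1.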
Note*: This is called the frequentist point of view. In the frequentist view, the parameters are just unknown quantities that have no probability structure. In statistics, there is another school called the Bayesian school. From the Bayesian perspective, the parameters can have a probabilistic structure.
Note**: The probability density function here is in fact the joint probability density function for continuous random variables and the joint probability mass function for discrete random variables.