Estimation of a linear model with two-parameter symmetric platykurtic distributed errors
Journal of Uncertainty Analysis and Applications volume 1, Article number: 13 (2013)
Abstract
Purpose
A linear regression model with Gaussian-distributed error terms is the most widely used method for describing the possible relationship between outcome and predictor variables. However, Gaussian errors have drawbacks, one being that the distribution is mesokurtic. In many practical situations, the variables under study are platykurtic rather than mesokurtic. Hence, to analyze such platykurtic variables, a multiple regression model with symmetric platykurtic (SP) distributed errors is needed. In this paper, we introduce and develop a multiple linear regression model with symmetric platykurtic distributed errors for the first time.
Methods
We used ordinary least squares (OLS) and maximum likelihood (ML) to estimate the model parameters. The properties of the ML estimators with respect to the symmetric platykurtic distributed errors are discussed. Model selection criteria, namely the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are used to compare the models. The utility of the proposed model is demonstrated with both simulated and real data.
Results
A comparative study of the symmetric platykurtic linear regression model with the Gaussian model revealed that the former gives a good fit to some data sets. The results also revealed that ML estimators are more efficient than OLS estimators in terms of the relative efficiency of the one-step-ahead forecast mean square error.
Conclusions
The study shows that the symmetric platykurtic distribution serves as an alternative to the normal distribution. The developed model is useful for analyzing data sets arising from agricultural experiments, portfolio management, space experiments, and a wide range of other practical problems.
Introduction
Regression analysis is one of the most commonly used statistical methodologies in science and engineering for discovering functional relationships between variables. Its most typical form is multiple linear regression modeling, which is used for predicting values of one or more response variables from a set of independent variables. It has found applications in almost every area of science, engineering, and medicine. Comprehensive accounts of the theory and applications of the linear regression model are given in Seber [1], Montgomery et al. [2], Grob [3], Sengupta and Jammalamadaka [4], Seber and Lee [5], Weisberg [6], and Yan and Su [7]. This technique is usually based on a statistical model in which the error terms are assumed to be independent and identically distributed random variables whose distribution is multivariate normal with a zero mean vector and a positive definite covariance matrix [8]. However, in many disciplines, empirical studies or theoretical reasoning support the presence of skewness or heavy tails in the distribution of the error terms. Departures from normality may also be caused by outlying values in the responses. Examples can be found, amongst others, in Fama [9] and Sutton [10]. For these reasons, several researchers have proposed performing multivariate regression analysis using a model that assumes a different parametric distribution family for the error terms.
Zeckhauser and Thompson [11] studied a linear regression model with power distributions. Zellner [12] and Sutradhar and Ali [13] studied a regression model with a multivariate t error variable. Tiku et al. [14–16] investigated a linear regression model with symmetric innovations, discussed a first-order autoregressive model with symmetric innovations, and presented a linear model with t distribution, respectively. Sengupta and Jammalamadaka [17] studied linear models. Liu and Bozdogan [18] studied power exponential (PE) multiple regression. Wong and Bian [19, 20] studied multiple regression coefficients in a linear model with Student's t distributed errors and a linear regression model whose underlying distribution is a generalized logistic distribution, respectively. Liu and Bozdogan [21] studied multivariate regression models with PE random errors under various assumptions. Soffritti and Galimberti [22] discussed a multivariate linear regression model under the assumption that the error terms follow a finite mixture of normal distributions. Jafari and Hashemi [23] studied linear regression with skew-normal error terms. Jahan and Khan [24] investigated the g-and-k distribution as the underlying assumption for the distribution of error in a simple linear regression model. Bian et al. [25] studied a multiple linear regression model with underlying Student's t distribution.
No serious attempt has been made to develop and analyze multiple regression models with symmetric platykurtic (SP) errors. Therefore, to achieve more flexibility in statistical modeling and model selection, and to robustify many multiple statistical procedures, this paper introduces and develops a multivariate linear regression model in which the error terms are assumed to be independent and identically distributed SP random errors with mean 0 and constant variance σ^{2}.
Model description
The multiple regression model assumes a linear (in parameters) relationship between a dependent variable Y = (y _{1}, y _{2}, …, y _{ n })' and a set of independent variables ${X}_{i}^{\text{'}}=\left({x}_{i0},{x}_{i1},{x}_{i2},\dots ,{x}_{\mathit{ik}}\right)$, where the first regressor x _{ i 0} = 1 is a constant and i = 1, 2,…, n. The model of interest is the standard linear regression model of the following form:
$$y_{i}=\beta_{0}+\beta_{1}x_{i1}+\beta_{2}x_{i2}+\dots+\beta_{k}x_{ik}+\epsilon_{i},\quad i=1,2,\dots,n,$$(1)
where y _{ i } is an observed dependent variable, x _{ ik } are observed independent variables, β _{0}, β _{1},…, β _{ k } are unknown regression coefficients to be estimated, and ϵ _{ i } are independently and identically distributed. Using linear algebra notation, model (1) may be written alternatively in matrix form as
$$y=X\beta+\epsilon.$$(2)
In this model,
$$y=\begin{pmatrix}y_{1}\\ y_{2}\\ \vdots\\ y_{n}\end{pmatrix},\quad X=\begin{pmatrix}1 & x_{11} & \cdots & x_{1k}\\ 1 & x_{21} & \cdots & x_{2k}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n1} & \cdots & x_{nk}\end{pmatrix},\quad \beta=\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{k}\end{pmatrix},\quad \epsilon=\begin{pmatrix}\epsilon_{1}\\ \epsilon_{2}\\ \vdots\\ \epsilon_{n}\end{pmatrix},$$(3)
where y is a column vector of n elements, X is an n × (k + 1) (k + 1 < n) nonrandom design matrix of covariates (with its first column having all elements equal to 1, the second column being filled by the observed values of x _{1}, and the (k + 1)th column being filled by the observed values of x _{ k }), β is a column vector of (k + 1) elements, σ is an unknown scale parameter, and ϵ is an n × 1 column vector of error terms with zero mean and covariance matrix σ^{2} I.
Model assumptions
In order to complete the description of the model, some assumptions about the nature of the errors are necessary. The errors are assumed to be independent and identically distributed (i.i.d.) random variables whose distribution is two-parameter SP rather than normal, with zero mean and a positive definite variance-covariance matrix σ^{2} I of dimension n × n, that is
$$\epsilon_{i}\overset{\text{i.i.d.}}{\sim}\mathrm{SP}\left(0,\sigma^{2}\right),\quad i=1,2,\dots,n.$$(4)
These assumptions are summarized in the matrix vector form as
$$E\left(\epsilon\right)=0,\quad \mathrm{Cov}\left(\epsilon\right)=\sigma^{2}I,$$(5)
where the notation E stands for the expected value and Cov represents an n × n variance-covariance matrix. The vector 0 is a column vector with n zero elements, and I is an identity matrix of order n × n. The parameter σ^{2} is unspecified, along with the vector parameter β. The elements of β are real-valued, while σ is positive. The covariates are either nonrandom or independent of the errors. Consequently, E(y) = Xβ and Cov(y) = σ^{2} I. We shall use the triplet (y,Xβ,σ^{2} I) for linear model (2).
Article headings
The manuscript is organized in eight sections. The ‘Introduction’ section frames the objective of the paper and reviews the related literature. In the ‘Properties of the two-parameter symmetric platykurtic distribution’ section, we introduce the two-parameter SP distribution, denoted SP(μ,σ). We derive the maximum likelihood (ML) estimators in the ‘Maximum likelihood estimation of the model parameters’ section, where the Newton–Raphson (NR) iterative method is used to obtain numerical solutions to the ML estimation problem. In the ‘Properties of the estimators through simulation study’ section, we show the asymptotic properties of the estimators. In the ‘Least squares estimation of the model parameters’ section, OLS estimation of the model parameters is studied. The ML estimators are compared with the OLS estimators, and the proposed model with the Gaussian model, in the ‘Comparative study of the model’ section. The ‘Application of the model’ section demonstrates the usefulness of the present model on real data. Finally, the ‘Summary and conclusions’ section concludes the paper.
Properties of the two-parameter symmetric platykurtic distribution
Even though a sample frequency curve may be symmetric and bell-shaped, it often has fatter tails than the normal curve, and the almost universally used Gaussian distribution may fit these tails badly. The family of symmetric platykurtic distributions can best model these features. The origin and genesis of this distribution are given by Srinivasa Rao et al. [26] for analyzing statistical data that arise from biological, sociological, agricultural, and environmental experiments. It became popular thereafter and has recently received considerable attention in image segmentation and in modeling economic and financial data as a generalization of the normal distribution (e.g., [27]). The probability density function (pdf) of such a family of distributions is
The distribution depends on three parameters μ, σ, and γ. These parameters can be interpreted as follows:

μ is a real number and may be thought of as a location measure.

σ is positive and measures the dispersion or the scale of the distribution.

γ is a kurtosis parameter which determines the shape of the distribution, taking values γ = 0, 1, 2,…, n. If γ = 0, we retrieve the normal distribution N(μ,σ^{2}). If we take γ = 1, we get a two-parameter symmetric platykurtic distribution.
In this section, we briefly present the two-parameter symmetric platykurtic distribution, which is appropriate for data having kurtosis 2.52. A random variable Y is said to follow a two-parameter SP distribution if the density function of Y is
$$f\left(y;\mu,\sigma\right)=\frac{\left[2+\left(\frac{y-\mu}{\sigma}\right)^{2}\right]e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}}}{3\sigma\sqrt{2\pi}},\quad -\infty<y<\infty,$$(7)
where − ∞ < μ < ∞ and σ > 0 are location and scale parameters, respectively. We denote this by Y ~ SP(μ,σ^{2}). The various shapes of the frequency curves are shown in Figure 1.
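The stated moments of the two-parameter SP density can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `sp_pdf` and `sp_central_moment` and the integration settings are illustrative assumptions, not part of the original study. It evaluates the density $\left[2+z^{2}\right]e^{-z^{2}/2}/\left(3\sigma\sqrt{2\pi}\right)$ with z = (y − μ)/σ and integrates it with the trapezoidal rule.

```python
import math

def sp_pdf(y, mu=0.0, sigma=1.0):
    """Two-parameter SP density: [2 + z^2] exp(-z^2/2) / (3 sigma sqrt(2 pi))."""
    z = (y - mu) / sigma
    return (2.0 + z * z) * math.exp(-0.5 * z * z) / (3.0 * sigma * math.sqrt(2.0 * math.pi))

def sp_central_moment(order, mu=0.0, sigma=1.0, half_width=12.0, steps=24000):
    """Trapezoidal approximation of E[(Y - mu)^order] on mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        y = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # end points get half weight
        total += weight * (y - mu) ** order * sp_pdf(y, mu, sigma)
    return total * h

mass = sp_central_moment(0)                       # total probability
variance = sp_central_moment(2)                   # second central moment
kurtosis = sp_central_moment(4) / variance ** 2   # fourth moment over squared variance
print(mass, variance, kurtosis)
```

For μ = 0 and σ = 1 the three printed values should be close to 1, 5/3, and 2.52, which agrees with the variance and kurtosis listed in the Appendix.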
Some important properties of the random variable Y in a univariate context and those which are needed for the simulation of the variate are as follows:

1. The distribution function of the random variable Y specified by the probability density function (7) is given by
$$F_{Y}\left(y\right)=\int_{-\infty}^{y}\frac{\left[2+\left(\frac{t-\mu}{\sigma}\right)^{2}\right]e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}}{3\sigma\sqrt{2\pi}}\,dt,$$
which, on simplification, reduces to
$$F_{Y}\left(y\right)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{y}e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}\,dt-\frac{\left(y-\mu\right)e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}}}{3\sigma\sqrt{2\pi}}=F_{0}\left(y;\mu,\sigma\right)-F_{1}\left(y;\mu,\sigma\right),$$(8)
where $F_{0}\left(y;\mu,\sigma\right)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{y}e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}\,dt$, $y\in\Re$, is the distribution function of the normal random variable with mean μ and variance σ^{2}, while $F_{1}\left(y;\mu,\sigma\right)=\frac{\left(y-\mu\right)}{3\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}}$, $y\in\Re$, is not a distribution function (it cannot be a cumulative distribution function) since it is negative for y < μ.

2. Numerical approximations for the two-parameter symmetric platykurtic cumulative distribution function (CDF): Marsaglia [28] suggested a simple algorithm, based on a Taylor series expansion, for evaluating the standard normal CDF to arbitrary precision, $\Phi\left(y\right)=\frac{1}{2}+\phi\left(y\right)\left(y+\frac{y^{3}}{3}+\frac{y^{5}}{3\cdot 5}+\frac{y^{7}}{3\cdot 5\cdot 7}+\dots\right)$, where φ and Φ are the pdf and CDF of the standard normal distribution. Following this approach, we approximate the values of F(y;μ,σ) as follows. After a little algebra, the standard symmetric platykurtic CDF, F(y), is approximated by
$$F\left(y\right)=\frac{1}{2}+\phi\left(y\right)\left(-\frac{y}{3}+\sum_{i=1,3,5,\dots}^{n}\frac{y^{i}}{i!!}\right),$$(9)
where the sum runs over the odd integers i = 1, 3, 5,…, n and i!! denotes the double factorial, that is, the product of every odd number from 1 to i.

3. The cumulant-generating function is the logarithm of the moment-generating function:
$$g\left(t;\mu,\sigma^{2}\right)=\mu t+\frac{1}{2}\sigma^{2}t^{2}+\ln\left[1+\frac{\left(\sigma t\right)^{2}}{3}\right].$$(10)
The cumulants k _{ n } are extracted from the cumulant-generating function via differentiation (at zero) of g(t). That is, the cumulants appear, up to the factor 1/n!, as the coefficients in the Maclaurin series of g(t):
$$k_{1}=g'\left(0\right)=\mu,\quad k_{2}=g''\left(0\right)=\frac{5\sigma^{2}}{3},\quad k_{3}=g'''\left(0\right)=0,\quad k_{4}=g^{\left(4\right)}\left(0\right)=-\frac{4\sigma^{4}}{3}.$$(11)
That is, the first two cumulants are equal to the mean μ and the variance $\frac{5\sigma^{2}}{3}$ of the two-parameter symmetric platykurtic distribution, respectively. All odd-order cumulants beyond the first are equal to zero because g(t) − μt is an even function of t, whereas the even-order cumulants are not; the negative fourth cumulant gives the excess kurtosis $k_{4}/k_{2}^{2}=-0.48$, that is, a kurtosis of 2.52.

4. Hazard rate function of the distribution: The hazard function h(y;μ,σ) of the two-parameter symmetric platykurtic distribution is used to characterize life phenomena and can be written as
$$h\left(y;\mu,\sigma\right)=\frac{\left(2+\left(\frac{y-\mu}{\sigma}\right)^{2}\right)\phi\left(\frac{y-\mu}{\sigma}\right)}{3\sigma Q\left(\frac{y-\mu}{\sigma}\right)+\left(y-\mu\right)\phi\left(\frac{y-\mu}{\sigma}\right)},$$(12)
where φ is the pdf of the standard normal distribution, whereas the Q-function is the complement of the standard normal CDF, Q(u) = 1 − Φ(u).
Recently, it was observed by Gupta and Gupta [29] that the reversed hazard function plays an important role in reliability analysis. The reversed hazard function of the two-parameter SP(μ,σ) is
$$r\left(y;\mu,\sigma\right)=\frac{f\left(y;\mu,\sigma\right)}{F_{Y}\left(y\right)}=\frac{\left(2+\left(\frac{y-\mu}{\sigma}\right)^{2}\right)\phi\left(\frac{y-\mu}{\sigma}\right)}{3\sigma\Phi\left(\frac{y-\mu}{\sigma}\right)-\left(y-\mu\right)\phi\left(\frac{y-\mu}{\sigma}\right)}.$$(13)
It is well known that the hazard function or the reversed hazard function uniquely determines the corresponding probability density function.

5. Entropy: The entropy of a two-parameter symmetric platykurtic random variable y with probability density function f(y) on the real line is given by
$$h\left(y\right)=\frac{529}{1500}+\ln\left(3\sigma\sqrt{2\pi}\right).$$(14)
It can be recalled that the entropy of the normal distribution is $\frac{1}{2}\ln\left(2\pi e\sigma^{2}\right)$. If f and g are the probability densities of the symmetric platykurtic and normal distributions, respectively, then the relative entropy D(f‖g) from f to g is
$$D\left(f\|g\right)=\int_{-\infty}^{\infty}f\left(y\right)\log\frac{f\left(y\right)}{g\left(y\right)}\,dy=0.7088.$$(15)
This gives us a measure of something like the distance between the two probability distributions, in the sense that the relative entropy is always non-negative, is zero if and only if the two distributions are the same, and increases as the distributions diverge. Some of the more important properties of the SP distribution are summarized in the Appendix.
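The distribution function, its series approximation, and the hazard function listed above can be sketched in a few lines of standard-library Python. The function names and the default of 60 series terms are illustrative assumptions; the series is written with the standard normal pdf as the multiplier, which is what the closed form Φ(z) − zφ(z)/3 implies.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def sp_pdf(y, mu=0.0, sigma=1.0):
    z = (y - mu) / sigma
    return (2.0 + z * z) * std_normal_pdf(z) / (3.0 * sigma)

def sp_cdf(y, mu=0.0, sigma=1.0):
    """Closed form of the SP distribution function: Phi(z) - z * phi(z) / 3."""
    z = (y - mu) / sigma
    return std_normal_cdf(z) - z * std_normal_pdf(z) / 3.0

def sp_cdf_series(z, terms=60):
    """Series form for the standard case: 1/2 + phi(z) * (-z/3 + sum_i z^i / i!!)."""
    term = z          # i = 1 term, z / 1!!
    total = z
    for i in range(3, 2 * terms + 1, 2):
        term *= z * z / i   # z^i / i!! from z^(i-2) / (i-2)!!
        total += term
    return 0.5 + std_normal_pdf(z) * (total - z / 3.0)

def sp_hazard(y, mu=0.0, sigma=1.0):
    """Hazard f / (1 - F), written with the normal Q-function."""
    z = (y - mu) / sigma
    q = 0.5 * math.erfc(z / math.sqrt(2.0))   # Q(z) = 1 - Phi(z)
    phi = std_normal_pdf(z)
    return (2.0 + z * z) * phi / (3.0 * sigma * q + (y - mu) * phi)

for z in (-2.0, 0.0, 1.5):
    print(z, sp_cdf(z), sp_cdf_series(z), sp_hazard(z))
```

For moderate |z| the closed form and the truncated series should agree to many decimal places; for large |z| more terms are needed because the series terms grow before they decay.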
Maximum likelihood estimation of the model parameters
In this section, we consider the homoscedastic regression model (Y,Xβ,σ^{2} I) with ϵ ~ SP(0, σ^{2} I). The unknown parameters of this model are the coefficient vector β and the error variance σ^{2}. We deal with the problem of ML estimation of these parameters, which requires some fairly intricate mathematics, from the observables Y and X. The MLEs of β and σ^{2} are the parameter values that maximize the likelihood function:
$$L\left(\beta,\sigma^{2};y\right)=\prod_{i=1}^{n}\frac{\left[2+\frac{\left(y_{i}-X_{i}'\beta\right)^{2}}{\sigma^{2}}\right]\exp\left[-\frac{\left(y_{i}-X_{i}'\beta\right)^{2}}{2\sigma^{2}}\right]}{3\sigma\sqrt{2\pi}}.$$(16)
The log-likelihood function in the i.i.d. case, ignoring the additive constants, equals
$$\ell\left(\beta,\sigma^{2}\right)=-\frac{n}{2}\ln\sigma^{2}+\sum_{i=1}^{n}\ln\left[2+\frac{\left(y_{i}-X_{i}'\beta\right)^{2}}{\sigma^{2}}\right]-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(y_{i}-X_{i}'\beta\right)^{2}.$$(17)
The MLE is the value ${\widehat{\theta}}_{\mathrm{MLE}}=\left(\widehat{\beta},{\widehat{\sigma}}^{2}\right)$ which maximizes the log-likelihood. Taking the partial derivatives of the log-likelihood with respect to the (k + 1) × 1 vector β and the scalar σ^{2} and setting the results equal to zero produces (18) and (19). The MLEs are the solutions of the equations:
and
Since there are no closed-form solutions to the likelihood equations, numerical methods such as Fisher scoring or the Newton–Raphson iterative method can be used to obtain the MLEs. The standard procedure for implementing this solution is the Newton–Raphson iteration given by
$$\theta^{\left(n+1\right)}=\theta^{\left(n\right)}-H^{-1}\left(\theta^{\left(n\right)}\right)S\left(\theta^{\left(n\right)}\right),$$(20)
where H is the matrix of second derivatives and S is the vector of first derivatives of the log-likelihood function, both evaluated at the current value of the parameter vector θ.
Here we begin with some starting value, say θ^{(0)}, and improve it by finding some better approximation θ^{(1)} to the required root. This procedure can be iterated to go from a current approximation θ^{(n)} to a better approximation θ^{(n+1)}.
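The Newton–Raphson iteration can be sketched as follows. This is a minimal illustration with NumPy, not the authors' Mathematica implementation. It makes several assumptions that are not in the original: the parameter vector is taken as θ = (β, ln σ) to keep σ positive, OLS supplies the starting values, a step-halving safeguard is added, the SP errors are drawn through a mixture representation (2/3 standard normal, 1/3 signed chi with 3 degrees of freedom), and the design and true parameter values in the demo are hypothetical.

```python
import numpy as np

def sp_errors(rng, n, sigma=1.0):
    """SP(0, sigma) draws via the mixture 2/3 N(0,1) + 1/3 signed chi(3) (assumed sampler)."""
    normal = rng.standard_normal(n)
    chi3 = np.sqrt((rng.standard_normal((n, 3)) ** 2).sum(axis=1))
    signed_chi3 = chi3 * rng.choice([-1.0, 1.0], size=n)
    return sigma * np.where(rng.random(n) < 2.0 / 3.0, normal, signed_chi3)

def loglik(theta, X, y):
    """SP log-likelihood up to an additive constant; theta = (beta, log sigma)."""
    beta, s = theta[:-1], theta[-1]
    z = (y - X @ beta) / np.exp(s)
    return np.sum(-s + np.log(2.0 + z ** 2) - 0.5 * z ** 2)

def score_and_hessian(theta, X, y):
    """First-derivative vector S and second-derivative matrix H of loglik."""
    beta, s = theta[:-1], theta[-1]
    sigma = np.exp(s)
    z = (y - X @ beta) / sigma
    psi = -z ** 3 / (2.0 + z ** 2)                         # d/dz [ln(2 + z^2) - z^2 / 2]
    dpsi = -(6.0 * z ** 2 + z ** 4) / (2.0 + z ** 2) ** 2  # derivative of psi
    S = np.append(-(X.T @ psi) / sigma, np.sum(-1.0 - z * psi))
    k = X.shape[1]
    H = np.empty((k + 1, k + 1))
    H[:k, :k] = (X.T * dpsi) @ X / sigma ** 2
    H[:k, k] = H[k, :k] = X.T @ (psi + z * dpsi) / sigma
    H[k, k] = np.sum(z * psi + z ** 2 * dpsi)
    return S, H

def fit_sp_regression(X, y, max_iter=50, tol=1e-8):
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS starting values
    resid_sd = np.std(y - X @ beta0)
    theta = np.append(beta0, np.log(resid_sd * np.sqrt(0.6)))  # Var = 5 sigma^2 / 3
    for _ in range(max_iter):
        S, H = score_and_hessian(theta, X, y)
        step = np.linalg.solve(H, S)                       # H^{-1} S
        t = 1.0
        while loglik(theta - t * step, X, y) < loglik(theta, X, y) and t > 1e-4:
            t *= 0.5                                       # step halving
        theta = theta - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta[:-1], np.exp(theta[-1]), H                # H from the last iteration

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(0.0, 1.0, n), rng.lognormal(0.0, 0.5, n)])
true_beta, true_sigma = np.array([1.0, 0.5, -0.3]), 1.0    # hypothetical values
y = X @ true_beta + sp_errors(rng, n, true_sigma)
beta_hat, sigma_hat, H = fit_sp_regression(X, y)
print(beta_hat, sigma_hat)
```

Working with ln σ instead of σ^{2} changes the parametrization of the Hessian but not the location of the maximum, so the estimates of β and σ are unaffected.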
Variance-covariance matrix
To obtain the asymptotic variances and covariances of the estimates, the Hessian matrix of the log-likelihood must be constructed. The negative expected values of the second-order partial derivatives of the log-likelihood, obtained by differentiating the likelihood equations (18) and (19), can be used to estimate the asymptotic covariance matrix of the parameter estimates and are found as follows. Each summand on the right-hand side of (19) has zero expectation. Therefore,
Further,
as the integrand of the off-diagonal term in the last expression is an odd function of u.
Finally,
Simplification of the expression of this term is aided by the identity
Therefore, the information matrix for θ = (β,σ^{2})' is
The information for β is I _{ μ } X'X. Therefore, design issues can be addressed with reference to the matrix X'X, just as in the normal case. The scalar I _{ μ } is equal to $\frac{4.934}{{\sigma}^{2}}$ (greater than σ^{−2} when the components of y are normally distributed), whereas the scalar ${I}_{{\sigma}^{2}}$ is equal to $\frac{6.68275}{{\sigma}^{4}}$ (greater than σ^{−4} when the components of y are normally distributed). The large sample variances and covariances of the estimates can be approximated by inverting the usual symmetric information matrix (26):
Thus, the square roots of the diagonal elements of this matrix give the standard errors associated with the coefficients.
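As a small illustration of this last step, the sketch below computes standard errors from a Hessian of the log-likelihood by inverting its negative and taking square roots of the diagonal. The function name and the 2 × 2 Hessian are made up for illustration and are not values from the paper.

```python
import numpy as np

def standard_errors(hessian):
    """Square roots of the diagonal of the inverse of the negative Hessian."""
    cov = np.linalg.inv(-np.asarray(hessian, dtype=float))
    return np.sqrt(np.diag(cov))

H = np.array([[-4.0, 0.0], [0.0, -25.0]])   # made-up Hessian for illustration
se = standard_errors(H)
print(se)
```

With a diagonal Hessian the result is simply one over the square root of each negated diagonal entry; with correlated estimates the full inverse matters.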
Simulation and results
The proposed approach was evaluated through Monte Carlo experiments in which artificial data sets were generated from model (2) using Wolfram Mathematica 9. To facilitate exposition of the method of estimation, data sets with two independent variables and one dependent variable were simulated from a model with prespecified parameters for sample sizes n = 100, 1,000, 3,000, 5,000, and 10,000. (The largest sample size, 10,000, is relatively large, so finite sample bias is less of an issue.) The dependent variable (Y) is simulated from the symmetric platykurtic distribution with mean 0 and variance 1, while the two predictor variables x _{1} and x _{2} are generated from normal and lognormal distributions, respectively, with prespecified mean and variance using the following simulation protocol:
and used as explanatory variables for the regression model. Without loss of generality, we considered the values of the model parameters as
Summary statistics of estimations of the regression model with symmetric platykurtic distributed errors using the ML procedure are presented in Table 1.
The numerical results in Table 1 suggest that as the sample size increases, the estimates of the parameters become more precise. The ML method provides good estimates of the underlying model not only of the regression coefficients but also of the correlation matrix. The fitted linear regression model with symmetric platykurtic distributed error terms to the simulated data, based on the sample of size 10,000, is
Standard errors of the estimates were estimated by the square root of the diagonal elements of the inverse of the Hessian of the loglikelihood function. Thus, the estimated standard errors are
Properties of the estimators through simulation study
If certain regularity conditions of the density are met, the MLEs are most attractive because they possess many asymptotic or large sample properties. Derivations of the asymptotic properties require some fairly intricate mathematics. The three properties of the regular densities (moments of the derivatives of the loglikelihood) are used in establishing the properties of MLEs. The properties of the ML estimators are as follows.
Consistency
One of the basic properties of a good estimator is that it differs from a true value by a very small amount as n becomes large. This implies that we can reach the exact value of θ by indefinitely increasing the sample size. Mathematically, it is expressed as
More specifically, a consistent estimator should not only be unbiased, but it should also have a variance which is as small as possible. This leads to two definitions:
From (27), it is clear that the variance tends to zero as n → ∞ in each case, so we conclude that the estimators are consistent since they are composed of i.i.d. observations.
Asymptotic normality
Greene's derivation of the asymptotic normality of the MLE applies here. The first derivative of the loglikelihood evaluated at the MLE equals zero. So
Expand this set of equations in a Taylor series around the true parameters θ _{0} using the mean value theorem to truncate the Taylor series at the second term:
The Hessian is evaluated at a point $\overline{\theta}$ that is between $\widehat{\theta}$ and θ _{0} $\left[\overline{\theta}=w\widehat{\theta}+\left(1-w\right){\theta}_{0}\ \mathrm{for}\ \mathrm{some}\ 0<w<1\right]$. We then rearrange this expression and multiply the result by $\sqrt{n}$ to obtain
Using (32), the probability limits of $\left(\widehat{\theta}-{\theta}_{0}\right)$ and $\left(\widehat{\theta}-\overline{\theta}\right)$ go to zero because of the consistency of $\widehat{\theta}$ (i.e., $\mathrm{plim}\left(\widehat{\theta}-{\theta}_{0}\right)=0$ and $\mathrm{plim}\left(\widehat{\theta}-\overline{\theta}\right)=0$). The second derivatives are continuous functions. Therefore, if the limiting distribution exists, then
By dividing H(θ _{0}) and S(θ _{0}) by n, we obtain
We may apply the Lindeberg-Levy central limit theorem to $\left[\sqrt{n}\overline{S}\left({\theta}_{0}\right)\right]$ because it is $\sqrt{n}$ times the mean of a random sample. By virtue of $V\left[{S}_{i}\left(\theta\right)\right]=-E\left[\frac{1}{n}H\left(\theta\right)\right]$, the limiting variance of $\left[\sqrt{n}\overline{S}\left(\theta\right)\right]$ is $-E\left[\frac{1}{n}H\left(\theta\right)\right]$, so
By virtue of $E\left[{S}_{i}\left(\theta\right)\right]=0$, $\mathrm{plim}\left[\frac{1}{n}H\left(\theta\right)\right]=E\left[\frac{1}{n}H\left(\theta\right)\right]$. This result is a constant matrix, so we can combine results to obtain
It follows that the MLEs are asymptotically normal with asymptotic distribution:
Asymptotic efficiency
The information matrix is a tool for verifying efficiency, viz. the attainment of the information limit to the variance of the estimator. An estimator whose variance attains the Cramér-Rao lower bound as the sample size tends to infinity is called asymptotically efficient; that is, it reaches 100% efficiency only in the limit n → ∞. It can be shown that the Cramér-Rao lower bounds for $\widehat{\theta}=\left({\widehat{\beta}}_{0},{\widehat{\beta}}_{1},{\widehat{\beta}}_{2}\right)\text{'}$ and ${\widehat{\sigma}}^{2}$ are, respectively,
where I _{ μ } and ${I}_{{\sigma}^{2}}$ are as defined in (22) and (25), respectively.
This means that any unbiased estimator that achieves this lower bound is efficient and no better unbiased estimator is possible. Now look back at the variance-covariance matrix (27). It is interesting to note that the variances of the estimators in the variance-covariance matrix do asymptotically coincide with the Cramér-Rao lower bound (42). This means that our MLEs are 100% asymptotically efficient. The asymptotic variance of the MLE is, in fact, equal to the Cramér-Rao lower bound for the variance of a consistent and asymptotically normally distributed estimator [30].
Invariance
Last, the invariance property is a mathematical result of the method of computing MLEs; it is not a statistical result as such. If it is desired to analyze a continuous and continuously differentiable function of an MLE, then the function of $\widehat{\theta}$ will, itself, be the MLE since the MLE is invariant to one-to-one transformations of θ.
These four properties explain the prevalence of the ML technique. The second greatly facilitates hypothesis testing and the construction of interval estimates. The third is a particularly powerful result: the MLE has the minimum variance achievable by a consistent and asymptotically normally distributed estimator.
Least squares estimation of the model parameters
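The most widely used technique for estimating the unknown regression coefficients in a standard linear regression model is undeniably the method of ordinary least squares (OLS). The least squares estimates of β _{0}, β _{1}, and β _{2} are the values which minimize
$$S\left(\beta_{0},\beta_{1},\beta_{2}\right)=\sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\beta_{1}x_{i1}-\beta_{2}x_{i2}\right)^{2}.$$
This leads to a closed-form expression for the estimated value of the unknown parameter vector β:
$$\widehat{\beta}_{\mathrm{OLS}}=\left(X'X\right)^{-1}X'y.$$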
Tables 2 and 3 summarize the results of the OLS estimation for a sample of size 10,000 using the same simulated observations obtained in the ‘Simulation and results’ section.
From Table 3, it is observed that the OLS estimates differ significantly from the ML estimates and the ML estimators are closer to the true values of the parameters compared to the OLS estimators.
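For reference, the closed-form OLS estimate amounts to solving the normal equations X'Xb = X'y. The sketch below does this with NumPy on a tiny noise-free data set; the function name `ols`, the design matrix, and the coefficient values are made up for illustration.

```python
import numpy as np

def ols(X, y):
    """Solve the normal equations X'X b = X'y for b."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Made-up design with an intercept column and two regressors.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 1.0]])
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta                 # noise-free response, so OLS recovers beta
beta_ols = ols(X, y)
print(beta_ols)
```

Solving the linear system directly avoids forming the explicit inverse of X'X; for ill-conditioned designs a QR-based or least-squares solver is usually the safer choice.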
Comparative study of the model
Comparison of estimators of the linear regression model
In this section, ML and OLS estimators are compared in fitting the multivariate linear model with two-parameter symmetric platykurtic error terms. One-step-ahead forecasting is commonly used to compare the performance of different models [31, 32]. For each estimation technique, bias and mean square error (MSE) are calculated for the sample size of 10,000 where
The computational result is presented in Table 4.
As expected, the results reported in Table 4 show that ML estimators have both smaller one-step-ahead forecast bias and smaller MSE than OLS estimators. This reveals that ML estimators exhibit superior performance to OLS estimators and confirms that deviations from normality cause OLS estimators to be poor estimators.
Comparison of the SPLRM with the NLRM
The linear regression model with symmetric platykurtic errors (SPLRM) and the linear regression model with normal errors (NLRM) were applied to the simulated data sets. This produced parameter estimates $\left({\widehat{\beta}}_{0},{\widehat{\beta}}_{1},{\widehat{\beta}}_{2},\widehat{\sigma}\right)$ for the model with SP errors and parameter estimates $\left({\widehat{\beta}}_{0},{\widehat{\beta}}_{1},{\widehat{\beta}}_{2},\widehat{\sigma}\right)$ for the model with N errors. The simulated multivariate data are used to compare the performance of the linear regression model with symmetric platykurtic error terms with that of the linear regression model with normal error terms. In order to determine the best linear model among the fitted ones, we computed the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), together with the root-mean-square error (RMSE) as a model diagnostic. The model with the minimum AIC, BIC, and RMSE is chosen as the best model. The output of the simulation study using various sample sizes is presented in Table 5.
Models with small values of the criteria are the candidate models, and it can be seen that the SP distribution provides the best fit to the data. That is, both the information criteria and the model diagnostics indicate that the linear model with symmetric platykurtic distributed error terms consistently performed best across all sample sizes of the simulation. The superiority of the SPLRM over the NLRM is also consistently visible in Figures 2, 3, and 4.
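The selection criteria can be sketched with the usual textbook definitions, AIC = 2p − 2ℓ and BIC = p ln n − 2ℓ, where ℓ is the maximized log-likelihood, p the number of estimated parameters, and n the sample size. The paper does not print these formulas, so their exact form here is an assumption, as are the function names and the toy residuals.

```python
import math

def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik

def rmse(residuals):
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def sp_loglik(residuals, sigma):
    """Full SP log-likelihood of the residuals, including the constant term."""
    const = -math.log(3.0 * sigma * math.sqrt(2.0 * math.pi))
    return sum(const + math.log(2.0 + (r / sigma) ** 2) - 0.5 * (r / sigma) ** 2
               for r in residuals)

def normal_loglik(residuals, sigma):
    """Full normal log-likelihood of the residuals."""
    const = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(const - 0.5 * (r / sigma) ** 2 for r in residuals)

residuals = [-1.4, -0.9, -0.2, 0.3, 1.1, 1.6]   # toy residuals for illustration
p, n = 4, len(residuals)                        # three coefficients plus a scale
print(aic(sp_loglik(residuals, 1.0), p), aic(normal_loglik(residuals, 1.2), p))
print(bic(sp_loglik(residuals, 1.0), p, n), rmse(residuals))
```

In practice each log-likelihood would be evaluated at that model's own ML estimates; the fixed scale values above are placeholders.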
Application of the model
As an illustration of the proposed methodology, we considered a real data set concerning 202 athletes collected at the Australian Institute of Sport, courtesy of Richard Telford and Ross Cunningham [33]. It is also available within the package sn in R. The variables examined are body mass index (BMI), red cell count (RCC), white cell count (WCC), and plasma ferritin concentration (PFC). The first is a biometrical variable, while the remaining three concern blood composition. They are summarized in Table 6. We studied the linear dependence of the biometrical variable on the blood composition variables.
We compute the skewness, kurtosis, and Jarque-Bera statistic to test the normality hypothesis for the body mass index. The results are shown in Table 7. Several other statistics could be used to test normality, such as the modified Shapiro-Wilk statistic, Anderson-Darling test, and Kolmogorov-Smirnov test. However, as the Jarque-Bera statistic is one of the most powerful tests of normality, and the results of the other statistics are similar, we report only the results of the Jarque-Bera statistic and its corresponding skewness and kurtosis. The 1% level of significance shown in the table leads to rejection of the null hypothesis of normality for the body mass index.
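The Jarque-Bera statistic combines sample skewness S and kurtosis K as JB = (n/6)[S^{2} + (K − 3)^{2}/4] and is compared with a chi-square distribution with 2 degrees of freedom. The sketch below uses this standard definition with moment estimators that divide by n; the function name and the toy sample are illustrative, and the paper's own computation may differ in small-sample details.

```python
import math

def jarque_bera(sample):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)   # chi-square(2) survival function
    return skew, kurt, jb, p_value

skew, kurt, jb, p_value = jarque_bera([-1.0, 0.0, 1.0])   # toy sample
print(skew, kurt, jb, p_value)
```

The p-value relies on the large-sample chi-square approximation, so it is only indicative for small samples such as the toy one used here.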
Table 8 shows the results of fitting multiple regression models with symmetric platykurtic distributed errors and Gaussian errors (SPLRM and NLRM) to the Australian Institute of Sport data set using the maximum likelihood method of estimation. The standard errors are the asymptotic standard errors based on the observed information matrix given in (26).
From Table 8, it is observed that the estimates differ slightly between the two models. Following Lachos et al. [34], we propose selecting the best fit between NLRM and SPLRM by inspection of information criteria such as AIC and BIC (the preferred model is the one with the smallest value of the criterion). The AIC and BIC values shown at the bottom of Table 8 indicate that the SPLRM outperforms the NLRM. Therefore, the fitted model for BMI is
with the estimated standard errors
Summary and conclusions
A multiple linear regression model generalizes the simple linear regression model by allowing the response variable to depend on more than one explanatory variable. In this paper, we have explored the idea of using a symmetric platykurtic distribution for analyzing non-normal errors in the multivariate linear regression model. The symmetric platykurtic distribution serves as a platykurtic alternative to the normal distribution. The maximum likelihood estimators of the model parameters are derived and found to be feasible. The properties of these estimators are studied through simulation. Traditional OLS estimation is carried out in parallel and the results are compared. The simulated results reveal that the ML estimators are more efficient than the OLS estimators in terms of the relative efficiency of the one-step-ahead forecast mean square error. A comparative study of the developed regression model with the Gaussian model revealed that this model gives a good fit to some data sets. The asymptotic properties of the maximum likelihood estimators are studied, and the large sample theory with respect to regression coefficients is also presented. The utility of the proposed model is demonstrated with real data. This regression model is useful for analyzing data sets arising from agricultural experiments, portfolio management, space experiments, and a wide range of other practical problems. The calculations in this paper make considerable use of a combination of three popular statistical packages: Mathematica 9.0, Matlab R2012b, and SAS 9.0.
Appendix
Summary of properties of the two-parameter symmetric platykurtic distribution

1.
Notation: $\mathrm{NS}(\mu, \sigma^{2})$

2.
Parameters: $\begin{array}{ll}\mu \in \Re & \text{mean (location)}\\ \sigma^{2}>0 & \text{squared scale; the variance itself is } \frac{5}{3}\sigma^{2}\end{array}$

3.
Support: y ∈ ℜ

4.
PDF: $\frac{\left[2+\left(\frac{y-\mu}{\sigma}\right)^{2}\right]e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}}}{3\sigma\sqrt{2\pi}}$

5.
CDF: $\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{y}e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}\,dt-\frac{\left(y-\mu\right)e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}}}{3\sigma\sqrt{2\pi}}$

6.
Mean: μ

7.
Median: μ

8.
Mode: μ

9.
Variance: $\frac{5}{3}{\sigma}^{2}$

10.
Skewness: 0

11.
Kurtosis: 2.52

12.
Entropy: $\frac{529}{1500}+\ln\left(3\sigma\sqrt{2\pi}\right)$

13.
MGF: $e^{\mu t+\frac{t^{2}\sigma^{2}}{2}}\left[1+\frac{\left(\sigma t\right)^{2}}{3}\right]$

14.
CF: $e^{i\mu t-\frac{1}{2}\sigma^{2}t^{2}}\left\{1+\frac{\left(i\sigma t\right)^{2}}{3}\right\}$

15.
CGF: $\mu t+\frac{1}{2}\sigma^{2}t^{2}+\ln\left[1+\frac{\left(\sigma t\right)^{2}}{3}\right]$

16.
Central moments: $\begin{array}{ll}\mu_{2n}=\left[\frac{\left(n+\frac{3}{2}\right)\Gamma\left(n+\frac{1}{2}\right)}{3\sqrt{\pi}}\right]2^{n+1}\sigma^{2n}, & \text{even central moments}\\ \mu_{2n+1}=0, & \text{odd central moments}\end{array}$

17.
Fisher information: $\left(\begin{array}{cc}I_{\mu}X'X & 0\\ 0 & nI_{\sigma^{2}}\end{array}\right)$; $I_{\mu}=\frac{4.934}{\sigma^{2}}$, $I_{\sigma^{2}}=\frac{6.68275}{\sigma^{4}}$
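The closed-form entries above can be checked numerically. The following Python sketch codes the PDF, the CDF, and the even central moments as listed in this Appendix and compares them with trapezoidal integrals of the density. The function names, the integration range of 12 scale units on each side of μ, the step count, and the test values of μ and σ are assumptions made for this illustration.

```python
import math

def sp_pdf(y, mu=0.0, sigma=1.0):
    """Density of the two-parameter symmetric platykurtic distribution."""
    z = (y - mu) / sigma
    return (2.0 + z * z) * math.exp(-0.5 * z * z) / (3.0 * sigma * math.sqrt(2.0 * math.pi))

def sp_cdf(y, mu=0.0, sigma=1.0):
    """CDF: standard normal CDF at z minus z * phi(z) / 3, with z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    normal_pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return normal_cdf - z * normal_pdf / 3.0

def sp_even_central_moment(n, sigma=1.0):
    """Even central moment mu_{2n} from the closed form in the Appendix."""
    factor = (n + 1.5) * math.gamma(n + 0.5) / (3.0 * math.sqrt(math.pi))
    return factor * 2.0 ** (n + 1) * sigma ** (2 * n)

def trapezoid(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Hypothetical parameter values used only for the check.
mu, sigma = 1.0, 2.0
lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma

total_mass = trapezoid(lambda y: sp_pdf(y, mu, sigma), lo, hi)
variance = trapezoid(lambda y: (y - mu) ** 2 * sp_pdf(y, mu, sigma), lo, hi)
fourth = trapezoid(lambda y: (y - mu) ** 4 * sp_pdf(y, mu, sigma), lo, hi)
kurtosis = fourth / variance ** 2
cdf_numeric = trapezoid(lambda y: sp_pdf(y, mu, sigma), lo, 3.0)

print(total_mass, variance, kurtosis, cdf_numeric, sp_cdf(3.0, mu, sigma))
```

The numerical variance and kurtosis should agree with items 9 and 11, and the closed-form moments for n = 1 and n = 2 should match the integrated second and fourth central moments.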
References
 1.
Seber GAF: Linear Regression Analysis. New York: Wiley; 1977.
 2.
Montgomery DC, Peck EA, Vining GG: Introduction to Linear Regression Analysis. 3rd edition. New York: Wiley; 2001.
 3.
Grob J: Linear Regression. In Lecture Notes in Statistics, vol. 175. Berlin: Springer; 2003.
 4.
Sengupta D, Jammalamadaka SR: Estimation in the linear model. In Linear Models: An Integrated Approach. River Edge: World Scientific; 2003:93–131.
 5.
Seber GAF, Lee AJ: Linear Regression Analysis. 2nd edition. New York: Wiley; 2003.
 6.
Weisberg S: Applied Linear Regression. 3rd edition. New York: Wiley; 2005.
 7.
Yan X, Su XG: Linear Regression Analysis: Theory and Computing. Hackensack: World Scientific; 2009.
 8.
Srivastava MS: Methods of Multivariate Statistics. New York: Wiley; 2002.
 9.
Fama EF: The behaviour of stock market prices. J. Bus 1965, 38: 34–105. 10.1086/294743
 10.
Sutton J: Gibrat’s legacy. J. Econ. Lit 1997, 35: 40–59.
 11.
Zeckhauser R, Thompson M: Linear regression with non-normal error terms. Rev. Econ. Stat 1970,52(3):280–286. 10.2307/1926296
 12.
Zellner A: Bayesian and non-Bayesian analysis of the regression model with multivariate Student-t error terms. J. Am. Stat. Assoc 1976,71(354):400–405.
 13.
Sutradhar BC, Ali MM: Estimation of the parameters of a regression model with a multivariate t error variable. Commun. Stat. Theory 1986, 15: 429–450. 10.1080/03610928608829130
 14.
Tiku ML, Wong WK, Bian G: Estimating parameters in autoregressive models in non-normal situations: symmetric innovations. Commun. Stat. Theory Methods 28(2):315–341.
 15.
Tiku ML, Wong WK, Vaughan DC, Bian G: Time series models with non-normal situations: symmetric innovations. J. Time Ser. Anal 2000,2(5):571–596.
 16.
Tiku ML, Islam MQ, Selcuk AS: Non-normal regression, II: symmetric distributions. Commun. Stat. Theory Methods 2001,30(6):1021–1045. 10.1081/STA-100104348
 17.
Sengupta D, Jammalamadaka SR: The symmetric non-normal case. In Linear Models: An Integrated Approach. River Edge: World Scientific; 2003:131–133.
 18.
Liu M, Bozdogan H: Power exponential multiple regression model selection with ICOMP and genetic algorithms. Springer, Tokyo: Working paper; 2004.
 19.
Wong WK, Bian G: Estimation of parameters in autoregressive models with asymmetric innovations. Stat. Prob. Lett 2005,71(1):61–70. 10.1016/j.spl.2004.10.022
 20.
Wong WK, Bian G: Robust estimation of multiple regression model with asymmetric innovations and its applicability on asset pricing model. Euras. Rev. Econ. Financ 2005,1(4):7.
 21.
Liu M, Bozdogan H: Multivariate regression models with power exponential random errors and subset selection using genetic algorithms with information complexity. Eur. J. Pure Appl. Math 2008,1(1):4–37.
 22.
Soffritti G, Galimberti G: Multivariate linear regression with non-normal errors: a solution based on mixture models. Stat. Comput 2011,21(4):523–536. 10.1007/s11222-010-9190-3
 23.
Jafari H, Hashemi R: Optimal designs in a simple linear regression with skew-normal distribution for error term. J. Appl. Math 2011,1(2):65–68.
 24.
Jahan S, Khan A: Power of t-test for simple linear regression model with non-normal error distribution: a quantile function distribution approach. J. Sci. Res 2012,4(3):609–622.
 25.
Bian G, McAleer M, Wong WK: Robust estimation and forecasting of the capital asset pricing model. Ann. Financ. Econ 2013. in press
 26.
Srinivasa Rao K, Vijay Kumar CVSR, Lakshmi Narayana J: On a new symmetrical distribution. J. Indian Soc. Agric. Stat 1997,50(1):95–102.
 27.
Seshashayee M, Srinivas Rao K, Satyanarayana CH, Srinivasa Rao P: Image segmentation based on a finite generalized symmetric platykurtic mixture model with K-means. Int. J. Comput. Sci. Issu 2011,8(3):2.
 28.
Marsaglia G: Evaluating the normal distribution. J. Stat. Softw 2004,11(4):1–7.
 29.
Gupta RC, Gupta RD: Proportional reversed hazard rate model and its applications. J. Stat. Plann. Inference 2007,137(11):3525–3536. 10.1016/j.jspi.2007.03.029
 30.
Greene W: Econometric Analysis. 5th edition. Upper Saddle River: Prentice-Hall; 2003.
 31.
Clements MP, Hendry DF: An empirical study of seasonal unit roots in forecasting. Int. J. Forecast 1997,13(3):341–355. 10.1016/S0169-2070(97)00022-8
 32.
Chiang TC, Qiao Z, Wong WK: New evidence on the relation between return volatility and trading volume. J. Forecast 2010,29(5):502–515.
 33.
Ferreira JTAS, Steel MF: Bayesian multivariate regression analysis with a new class of skewed distributions. Statistics Research Report 419. University of Warwick: Department of Statistics; 2004.
 34.
Lachos VH, Bolfarine H, Arellano-Valle RB, Montenegro LC: Likelihood based inference for multivariate skew-normal regression models. Commun. Stat. Theory Methods 2007, 36: 1769–1786. 10.1080/03610920601126241
Acknowledgements
The authors are grateful to the Editor of JUAA, two anonymous referees, and SpringerOpen Copyediting Management for the helpful comments and suggestions on the earlier version of this article. The present version of the paper owes much to their precise and kind remarks.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords
 Maximum likelihood
 Multiple linear regression model
 Simulation
 Symmetric platykurtic distribution