Distance-based test for uncertainty hypothesis testing
 Sundaram Sampath^{1} and
 Balu Ramya^{1}
https://doi.org/10.1186/2195546814
© Sampath and Ramya; licensee Springer. 2013
Received: 7 March 2013
Accepted: 16 June 2013
Published: 28 June 2013
Abstract
Background
If an appropriate probability distribution cannot be identified for a given situation, it becomes extremely difficult to draw reliable inferences about the domain under study, because the statistical theory of hypothesis testing cannot be meaningfully employed in such cases. To deal with such situations, Liu (2007) recommended uncertainty theory as an alternative, and testing the validity of hypotheses about uncertainty distributions is currently receiving the attention of researchers.
Methods
In this paper, for testing uncertain hypotheses about the true uncertainty distribution function, a new test procedure based on the inputs given by one or more domain experts is suggested. The proposed method can also be used for testing uncertain hypotheses about the equality of two uncertainty distribution functions.
Results
Illustrative examples are provided to demonstrate the utility of the suggested test procedure.
Conclusions
The same methodology can be used for testing the equality of two uncertainty distributions by making use of the ratio used in the construction of the test.
Keywords
Background
Testing of statistical hypotheses is a major branch of study in classical statistical inference. It deals with the process of developing appropriate test procedures for testing the validity of statistical hypotheses. Statistical hypotheses are statements about characteristics of real-life situations modeled in terms of probability distributions, and a statistical test helps the decision maker decide whether to accept or reject the given hypothesis based on sampled observations. The theory of testing of statistical hypotheses revolves around probability theory.
There are several real-life situations where it is very difficult to identify appropriate models (probability distributions) describing the probabilistic properties of the given phenomena. Further, collecting adequate information in the form of sampled data to fully explain the probability distribution is not always viable. To deal with these situations, Liu [1] introduced a new theory called the Theory of Uncertainty. Further refinements on the Theory of Uncertainty have been carried out by Liu [2]. For more details about the Theory of Uncertainty and its applications in various fields of research, one can refer to [2]. The online resource of [3] is an excellent source of information on the latest status of various aspects related to Uncertainty Theory.
It is well known that probability distributions are the backbone of the theory of statistical inference, which helps practitioners study the inherent characteristics of the given situation. Inferences related to the given system require knowledge of the parameters involved in the underlying probability distributions, for which several solutions are available in the literature. Just as probability distributions play a crucial role in stochastic situations, uncertainty distributions play a significant role in the Theory of Uncertainty. Uncertainty distributions model the nature of uncertainty present in the given system. Several uncertainty distributions and their properties are available in [3]. These distributions have certain unknown constants, and practitioners require knowledge of these quantities to study the nature of uncertainty. Liu [2] suggested a procedure for estimating the parameters of an uncertainty distribution. It was followed by the works of Wang and Peng [4] and Wang et al. [5]. Recently, Wang et al. [6] introduced an uncertain hypothesis testing procedure to test the equality of two uncertainty distributions.
In this paper, a new test procedure is introduced for testing whether a specified uncertainty distribution function can be the true uncertainty distribution function of the given system. The proposed test procedure makes use of a distance based on the empirical comprehensive uncertainty distribution defined by Liu [2]. The suggested procedure can be modified suitably to handle the situation in which one is interested in testing the equality of two uncertainty distributions. The paper is organized as follows. The second section introduces uncertainty theory and uncertainty distributions briefly. The third section explains the test procedure of Wang et al. [6] and introduces the new test for testing hypotheses about uncertainty distributions. Illustrations are given in the fourth section, and conclusions are provided in the fifth section.
Methods
Uncertainty distributions
Let Г be a nonempty set and L be a σ-algebra over Г. Elements of L are known as events. An uncertainty measure M is a function from L to [0,1] that measures the degree of belief associated with an event. Initially, it was introduced as a function from L to [0,1] satisfying the axioms of normality, monotonicity, self-duality, and countable subadditivity [1]. Later on, Liu [3] refined the definition of uncertainty measure and defined it as a measure satisfying the normality, duality, and subadditivity axioms. A measurable function ξ from the uncertainty space (Г,L,M) to the set of real numbers is called an uncertain variable. The uncertainty distribution Ф:R → [0,1] of an uncertain variable ξ is defined by Ф(x) = M{ξ ≤ x} for any x ∈ R. According to Peng and Iwamura [7], a necessary and sufficient condition for a function Ф:R → [0,1] to be an uncertainty distribution function is that it is an increasing function other than Ф(x) ≡ 0 and Ф(x) ≡ 1.
 1. The uncertainty normal distribution denoted by N(c,σ), c ∈ R, σ > 0 is defined as $\mathbf{\Phi}\left(x\right)={\left[1+{e}^{\frac{\pi \left(c-x\right)}{\sqrt{3}\sigma}}\right]}^{-1},\quad x\in R.$ (1)
 2. The uncertainty lognormal distribution denoted by LOGN(c,σ), c ∈ R, σ > 0 is defined as $\mathbf{\Phi}\left(x\right)={\left[1+{e}^{\frac{\pi \left(c-\ln x\right)}{\sqrt{3}\sigma}}\right]}^{-1},\quad x\ge 0.$ (2)
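As a quick numerical companion to Eqs. (1) and (2), the two distribution functions can be sketched in Python as follows. The function names `normal_ud` and `lognormal_ud` are illustrative choices, and returning 0 for x ≤ 0 in the lognormal case is an assumption based on the limit of Eq. (2) as x approaches 0 from the right.

```python
import math

def normal_ud(x, c, sigma):
    """Uncertainty normal distribution N(c, sigma) of Eq. (1)."""
    return 1.0 / (1.0 + math.exp(math.pi * (c - x) / (math.sqrt(3.0) * sigma)))

def lognormal_ud(x, c, sigma):
    """Uncertainty lognormal distribution LOGN(c, sigma) of Eq. (2)."""
    if x <= 0:
        # Assumption: use the limiting value of Eq. (2) as x -> 0+.
        return 0.0
    z = math.pi * (c - math.log(x)) / (math.sqrt(3.0) * sigma)
    return 1.0 / (1.0 + math.exp(z))

# Both functions equal 1/2 where the exponent vanishes (x = c and ln x = c).
print(normal_ud(2.0, 2.0, 1.5))
print(lognormal_ud(math.e, 1.0, 0.5))
```

Both functions are increasing in x and take values strictly between 0 and 1 for finite positive arguments, in line with the condition of Peng and Iwamura [7] quoted above.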
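Liu [2] has given a method of computing an empirical uncertainty distribution function using the data collected from an expert. Assume that the set of the expert's experimental data (x _{1},α _{1}),(x _{2},α _{2}),…,(x _{ n },α _{ n }) meets the consistency condition x _{1} < x _{2} < … < x _{ n }, 0 ≤ α _{1} ≤ α _{2} ≤ … ≤ α _{ n } ≤ 1. The empirical uncertainty distribution based on these data is the piecewise linear function $\widehat{\mathrm{\Phi}}\left(x\right)=\begin{cases}0, & x<{x}_{1}\\ {\alpha}_{i}+\frac{\left({\alpha}_{i+1}-{\alpha}_{i}\right)\left(x-{x}_{i}\right)}{{x}_{i+1}-{x}_{i}}, & {x}_{i}\le x\le {x}_{i+1},\ 1\le i<n\\ 1, & x>{x}_{n}.\end{cases}$ (3)
When data are available from m experts, with empirical uncertainty distributions ${\widehat{\mathrm{\Phi}}}_{1}$, ${\widehat{\mathrm{\Phi}}}_{2}$, …, ${\widehat{\mathrm{\Phi}}}_{m}$, the empirical comprehensive uncertainty distribution is the convex combination $\widehat{\mathrm{\Phi}}\left(x\right)=\sum _{i=1}^{m}{w}_{i}{\widehat{\mathrm{\Phi}}}_{i}\left(x\right),$ (4)
where $\sum _{i=1}^{m}{w}_{i}=1,\ {w}_{i}\ge 0,\ i=1,2,\dots ,m.$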
It is pertinent to note that this convex combination is also an empirical uncertainty distribution as proved by Peng and Iwamura [7].
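The construction can be sketched in a few lines of Python. The sketch assumes the usual piecewise linear form of Liu's empirical uncertainty distribution (0 to the left of x _{1}, linear interpolation between consecutive points, 1 to the right of x _{ n }); the function names and the two small data sets are hypothetical and serve only as an illustration.

```python
def empirical_ud(points):
    """Build an empirical uncertainty distribution from [(x_i, alpha_i), ...]."""
    xs = [x for x, _ in points]
    alphas = [a for _, a in points]
    # Consistency condition: x strictly increasing, alpha nondecreasing in [0, 1].
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("x values must be strictly increasing")
    if any(b < a for a, b in zip(alphas, alphas[1:])) or not (0 <= alphas[0] and alphas[-1] <= 1):
        raise ValueError("belief degrees must be nondecreasing in [0, 1]")

    def phi(x):
        if x < xs[0]:
            return 0.0
        if x > xs[-1]:
            return 1.0
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                return alphas[i] + t * (alphas[i + 1] - alphas[i])
        return alphas[-1]  # only reached when a single point is supplied

    return phi

def comprehensive_ud(expert_data, weights):
    """Convex combination of the experts' empirical distributions."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    parts = [empirical_ud(points) for points in expert_data]
    return lambda x: sum(w * phi(x) for w, phi in zip(weights, parts))

# Hypothetical data from two experts, equally weighted.
expert_a = [(1.0, 0.2), (2.0, 0.6), (3.0, 0.9)]
expert_b = [(1.5, 0.3), (2.5, 0.7)]
phi_hat = comprehensive_ud([expert_a, expert_b], [0.5, 0.5])
print(phi_hat(2.0))
```

At x = 2.0 the first expert contributes 0.6 and the second contributes the interpolated value 0.5, so the equally weighted combination is about 0.55.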
Test for uncertainty distribution hypotheses
In this section, we develop a test procedure for testing hypotheses about uncertainty distributions. An uncertain hypothesis is a hypothesis about uncertainty distributions that characterize uncertain situations.
It is presumed that the two theoretical uncertainty distributions with respect to the experts' data are F _{1}(x) and F _{2}(x). To test the null uncertainty hypothesis H _{0}:F _{1}(x) = F _{2}(x) for any x ∈ R against the alternative uncertainty hypothesis H _{1}:F _{1}(x) ≠ F _{2}(x) for some x ∈ R, Wang et al. [6] constructed a test procedure based on randomly generated points from two empirical uncertainty distribution functions corresponding to the two experts' data. In this paper, we construct a test for the uncertain hypothesis that a given function F _{1} can be the true uncertainty distribution function for the given situation of interest against the alternative hypothesis that F _{2} is the true uncertainty distribution function. Consider the problem of testing the hypothesis H _{0}:F(x) = F _{1}(x) ∀x against the alternative hypothesis H _{1}:F(x) = F _{2}(x) ∀x, where F _{1} and F _{2} are known theoretical uncertainty distribution functions.
Table 1 Experimental data for m experts
Expert  Values and degrees of belief 

1  $\left({x}_{1}^{1},{\alpha}_{1}^{1}\right),\left({x}_{2}^{1},{\alpha}_{2}^{1}\right),\dots ,\left({x}_{{n}_{1}}^{1},{\alpha}_{{n}_{1}}^{1}\right)$ 
2  $\left({x}_{1}^{2},{\alpha}_{1}^{2}\right),\left({x}_{2}^{2},{\alpha}_{2}^{2}\right),\dots ,\left({x}_{{n}_{2}}^{2},{\alpha}_{{n}_{2}}^{2}\right)$ 
⋮  ⋮ 
m  $\left({x}_{1}^{m},{\alpha}_{1}^{m}\right),\left({x}_{2}^{m},{\alpha}_{2}^{m}\right),\dots ,\left({x}_{{n}_{m}}^{m},{\alpha}_{{n}_{m}}^{m}\right)$ 
Corresponding to the information given in the m rows of Table 1, we compute the empirical uncertainty distribution functions ${\widehat{\mathrm{\Phi}}}_{1}$, ${\widehat{\mathrm{\Phi}}}_{2}$ … ${\widehat{\mathrm{\Phi}}}_{m}$ using the definition given in the previous section and combine these m empirical functions, using a convex combination, to get the empirical comprehensive uncertainty distribution function defined in Eq. (4).
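The closeness of the empirical comprehensive uncertainty distribution function to the distribution specified under the null hypothesis is measured over the set S of all points x supplied by the experts by $d\left(\widehat{\mathrm{\Phi}},{F}_{1}\right)=\sum _{x\in S}{d}_{x}\left(\widehat{\mathrm{\Phi}}\left(x\right),{F}_{1}\left(x\right)\right),$
where ${d}_{x}\left(\widehat{\mathrm{\Phi}}\left(x\right),{F}_{1}\left(x\right)\right)={\left(\widehat{\mathrm{\Phi}}\left(x\right)-{F}_{1}\left(x\right)\right)}^{2}\quad \forall x\in S.$
Similarly, the closeness to the distribution specified under the alternative hypothesis is measured by $d\left(\widehat{\mathrm{\Phi}},{F}_{2}\right)=\sum _{x\in S}{d}_{x}\left(\widehat{\mathrm{\Phi}}\left(x\right),{F}_{2}\left(x\right)\right),$
where ${d}_{x}\left(\widehat{\mathrm{\Phi}}\left(x\right),{F}_{2}\left(x\right)\right)={\left(\widehat{\mathrm{\Phi}}\left(x\right)-{F}_{2}\left(x\right)\right)}^{2}\quad \forall x\in S.$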
The null hypothesis H _{0} is rejected if $\frac{d\left(\widehat{\mathrm{\Phi}},{F}_{1}\right)}{d\left(\widehat{\mathrm{\Phi}},{F}_{2}\right)}>k,$
where k is the largest real number satisfying the inequality
σ(R _{ k }) ≥ α.
Here, ${R}_{k}=\left\{x\,\left|\,\frac{{d}_{x}\left(\widehat{\mathrm{\Phi}}\left(x\right),{F}_{1}\left(x\right)\right)}{{d}_{x}\left(\widehat{\mathrm{\Phi}}\left(x\right),{F}_{2}\left(x\right)\right)}>k\right.\right\}$ and $\sigma \left({R}_{k}\right)=\frac{N\left({R}_{k}\right)}{N\left(S\right)}$, N being the number of elements in the given set. The constant α is predetermined by the user. It is the proportion of points in $S=\bigcup_{r=1}^{m}{A}_{r}$, where A _{ r } is the set of values x supplied by the r-th expert, for which the ratio of the distance between the empirical comprehensive uncertainty distribution function and the distribution specified under the null hypothesis to the corresponding distance based on the distribution specified under the alternative hypothesis exceeds the threshold value k. It may be noted that a value of α closer to 1 increases the number of cases in which the condition defining R _{ k } must be satisfied, leading to a higher chance of rejection. Similarly, a value of α closer to 0 decreases the number of such cases, leading to a lower chance of rejection. The practitioner has to choose α judiciously, striking a balance between the rejection rate and the chance of taking a correct decision.
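The decision rule can be sketched in Python as below. Several points are assumptions of this sketch rather than statements from the text: the overall distances are taken as sums of the pointwise squared distances over S, H _{0} is rejected when the ratio of the two overall distances exceeds k, and, because R _{ k } is defined with a strict inequality, the "largest k" is read as the supremum of the admissible values. The input values are hypothetical.

```python
def pointwise_ratios(phi_hat, f1, f2):
    """Ratios d_x(phi_hat, F1) / d_x(phi_hat, F2) at each point of S."""
    ratios = []
    for p, a, b in zip(phi_hat, f1, f2):
        d1, d2 = (p - a) ** 2, (p - b) ** 2
        # Convention assumed here: a zero denominator gives an infinite ratio.
        ratios.append(d1 / d2 if d2 > 0 else float("inf"))
    return ratios

def sigma(ratios, k):
    """Proportion of points of S whose ratio exceeds k, i.e. sigma(R_k)."""
    return sum(r > k for r in ratios) / len(ratios)

def critical_value(ratios, alpha):
    """Supremum of the k with sigma(R_k) >= alpha, for 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(ratios, reverse=True)
    n = len(ordered)
    # Smallest j with j/n >= alpha; the j-th largest ratio is the supremum.
    j = next(j for j in range(1, n + 1) if j / n >= alpha)
    return ordered[j - 1]

def distance_test(phi_hat, f1, f2, alpha):
    """Return (overall ratio, k, reject H0?) under the assumptions above."""
    d1 = sum((p - a) ** 2 for p, a in zip(phi_hat, f1))
    d2 = sum((p - b) ** 2 for p, b in zip(phi_hat, f2))
    ratio = d1 / d2
    k = critical_value(pointwise_ratios(phi_hat, f1, f2), alpha)
    return ratio, k, ratio > k

# Hypothetical values of phi_hat, F1 and F2 at four points of S.
phi_hat = [0.2, 0.4, 0.7, 0.9]
f1 = [0.1, 0.6, 0.75, 1.0]
f2 = [0.3, 0.5, 0.8, 0.95]
for alpha in (0.5, 0.75):
    print(alpha, distance_test(phi_hat, f1, f2, alpha))
```

With these numbers the pointwise ratios are close to 1, 4, 0.25 and 4 and the overall ratio is about 1.92, so H _{0} is retained for α = 0.5 (k close to 4) and rejected for α = 0.75 (k close to 1), which illustrates how a larger α raises the chance of rejection.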
Results and discussion
To illustrate the process of developing a test procedure using the above method, two examples are considered.
Example 1
The data given in [6] are based on knowledge and experience of three teachers who performed an analysis about the degree of difficulty of a higher mathematics examination. The experimental data describing their estimated average scores and belief degrees are given below.

Teacher 1: (60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95)

Teacher 2: (60, 0.08), (70, 0.17), (75, 0.36), (80, 0.58), (85, 0.85), (90, 0.95)

Teacher 3: (50, 0.2), (60, 0.3), (70, 0.4), (80, 0.8), (85, 1)
As mentioned earlier, the most important task in testing of uncertain hypotheses is the formulation of null and alternative hypotheses in a meaningful manner. It has two stages, namely, identifying a suitable uncertainty distribution (e.g., zigzag, normal, and lognormal distributions) for the given situation and the parametric values to be used under the null and alternative hypotheses. This can be accomplished using the works of Liu [3] and Wang and Peng [4] related to estimation of uncertainty distributions.
Here, ${\widehat{\mathit{\Phi}}}_{1}$, ${\widehat{\mathit{\Phi}}}_{2}$, and ${\widehat{\mathit{\Phi}}}_{3}$ are the empirical distributions based on the data of the first, second, and third teachers. It may be noted that the weights w _{1}, w _{2}, and w _{3} are nonnegative quantities satisfying w _{1} + w _{2} + w _{3} = 1. For the sake of simplicity, we assume that w _{1}, w _{2}, and w _{3} are each equal to $\frac{1}{3}$. For the given data, we have A _{1} = {60,70,80,85,90}, A _{2} = {60,70,75,80,85,90}, and A _{3} = {50,60,70,80,85}, so that S = {50,60,70,75,80,85,90}.
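The sets A _{1}, A _{2}, A _{3} and S, together with the values of the equally weighted empirical comprehensive distribution on S, can be reproduced with a short Python sketch. It assumes the piecewise linear empirical distribution of Liu [2] (0 below the smallest point, 1 above the largest); the helper name `empirical_value` is illustrative.

```python
def empirical_value(points, x):
    """Piecewise linear empirical uncertainty distribution evaluated at x."""
    if x < points[0][0]:
        return 0.0
    if x > points[-1][0]:
        return 1.0
    for (x0, a0), (x1, a1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return a0 + (a1 - a0) * (x - x0) / (x1 - x0)
    return points[-1][1]

# Experimental data of the three teachers, as listed in Example 1.
teachers = [
    [(60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95)],
    [(60, 0.08), (70, 0.17), (75, 0.36), (80, 0.58), (85, 0.85), (90, 0.95)],
    [(50, 0.2), (60, 0.3), (70, 0.4), (80, 0.8), (85, 1)],
]
weights = [1 / 3, 1 / 3, 1 / 3]

A = [{x for x, _ in points} for points in teachers]  # A_1, A_2, A_3
S = sorted(set().union(*A))                          # union of the A_r

phi_hat = {
    x: sum(w * empirical_value(points, x) for w, points in zip(weights, teachers))
    for x in S
}
for x in S:
    print(x, round(phi_hat[x], 4))
```

For instance, at x = 75 the three teachers contribute 0.35, 0.36 and 0.6, so the comprehensive value is their average, about 0.4367.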
Hence, $\frac{d\left(\widehat{\mathrm{\Phi}},{F}_{1}\right)}{d\left(\widehat{\mathrm{\Phi}},{F}_{2}\right)}=1.449482$.
Table 2 Range of k and σ(R _{ k }) for the data of Wang et al. [6]
k  >5  [2.6,4.9]  [2.0,2.5]  [1.1,1.9]  1.09  [1.02,1.08]  [1.0,1.01]  <1 

σ(R _{ k })  0  $\frac{1}{7}$  $\frac{2}{7}$  $\frac{3}{7}$  $\frac{4}{7}$  $\frac{5}{7}$  $\frac{3}{7}$  $\frac{3}{7}$ 
Example 2
In Example 1, since the lognormal uncertainty distribution fitted all three data sets well, we used the same distribution under the null and alternative hypotheses. Now, we consider a testing problem where different distributions are used under the null and alternative hypotheses.
This example is based on the data provided in Chapter 4 of [3]. The expert’s experimental data is given below:
(0.6, 0.1), (1.0, 0.3), (1.5, 0.4), (2.0, 0.6), (2.8, 0.8), (3.6, 0.9).
The least squares estimates corresponding to the lognormal and normal fits for the data set are (c = 0.4825, σ = 0.7852) and (c = 1.7690, σ = 1.2953), respectively. The errors corresponding to the lognormal and normal fits are 0.0081 and 0.0074. It is decided to test the null uncertain hypothesis ${H}_{0}:\mathrm{\Phi}\left(x\right)={\left[1+{e}^{\frac{\pi \left(0.4825-\ln x\right)}{\sqrt{3}\left(0.7852\right)}}\right]}^{-1}$ against the alternative uncertain hypothesis ${H}_{1}:\mathrm{\Phi}\left(x\right)={\left[1+{e}^{\frac{\pi \left(1.7690-x\right)}{\sqrt{3}\left(1.2953\right)}}\right]}^{-1}.$
It is to be noted that a testing problem of this kind becomes meaningful in these types of situations since the errors do not show a huge difference. If the difference between the errors is considerably large, then one can use the distribution function corresponding to the smaller error as the one suitable for the given uncertain situation without depending on any test procedure.
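The two fit errors quoted above can be checked numerically. The sketch below assumes that the error of a fit is the sum of squared differences between the expert's belief degrees and the fitted distribution function at the six data points; the function names are illustrative.

```python
import math

# Expert's experimental data of Example 2.
data = [(0.6, 0.1), (1.0, 0.3), (1.5, 0.4), (2.0, 0.6), (2.8, 0.8), (3.6, 0.9)]

def lognormal_ud(x, c, sigma):
    """Uncertainty lognormal distribution LOGN(c, sigma), for x > 0."""
    return 1.0 / (1.0 + math.exp(math.pi * (c - math.log(x)) / (math.sqrt(3.0) * sigma)))

def normal_ud(x, c, sigma):
    """Uncertainty normal distribution N(c, sigma)."""
    return 1.0 / (1.0 + math.exp(math.pi * (c - x) / (math.sqrt(3.0) * sigma)))

def squared_error(dist, c, sigma):
    """Assumed error measure: sum of squared deviations at the data points."""
    return sum((alpha - dist(x, c, sigma)) ** 2 for x, alpha in data)

err_lognormal = squared_error(lognormal_ud, 0.4825, 0.7852)
err_normal = squared_error(normal_ud, 1.7690, 1.2953)
print(round(err_lognormal, 4), round(err_normal, 4))
```

Under this assumption the computed errors round to 0.0081 for the lognormal fit and 0.0074 for the normal fit, in agreement with the values reported above, and they confirm that the two fits are of comparable quality.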
Since the data is based on only one expert, the test makes use of the empirical uncertainty distribution using the experimental data obtained from the expert.
Therefore, $\frac{d\left(\widehat{\mathrm{\Phi}},{F}_{1}\right)}{d\left(\widehat{\mathrm{\Phi}},{F}_{2}\right)}=0.913856$.
Table 3 Range of k and σ(R _{ k }) for the data of Liu [3]
k  >55.6  [1.1, 55]  [0.8,1]  0.7  [0.2, 0.6]  0.1  0 

σ(R _{ k })  0  $\frac{1}{6}$  $\frac{2}{6}$  $\frac{3}{6}$  $\frac{4}{6}$  $\frac{5}{6}$  $\frac{6}{6}$ 
Conclusions
In this paper, a new test procedure that makes use of the data gathered from one or more domain experts has been developed for testing whether a specified uncertainty distribution can be the true uncertainty distribution function of the given situation. Two illustrative examples are also provided by making use of the data sets available in [6] and [3]. The first example deals with the case where both the null and alternative hypotheses use the lognormal uncertainty distribution, whereas the second example considers the testing problem where lognormal and normal uncertainty distributions are used under the null and alternative hypotheses, respectively.
It is pertinent to note that the same methodology can be used for testing the equality of two uncertainty distributions. The decision regarding the acceptance or rejection of the null hypothesis can be made using the same ratio used in the construction of the test explained in the third section, namely, $\frac{d\left(\widehat{\mathrm{\Phi}},{F}_{1}\right)}{d\left(\widehat{\mathrm{\Phi}},{F}_{2}\right)}$. In that case, however, the null hypothesis will be rejected if the ratio is either very small or very large.
Authors’ information
First author SS currently holds the position of Professor of Statistics at the University of Madras, Chennai, India. He has more than 30 years of experience in teaching and research. His main areas of research are Inference for Finite Populations, Classical Statistical Inference, and Data Mining. He has to his credit more than 30 research articles in highly rated research journals. Second author BR is a full-time research scholar working toward her Ph.D. degree in the Department of Statistics, University of Madras, Chennai, India. Her research interests include Fuzzy and Rough Set theories.
Declarations
Acknowledgments
The authors wish to thank the referees for their comments and suggestions, which led to considerable improvement in the contents as well as the overall organization of the manuscript.
Authors’ Affiliations
References
 1. Liu B: Uncertainty theory. 2nd edition. Berlin: Springer-Verlag; 2007.
 2. Liu B: A branch of mathematics for modeling human uncertainty. Berlin: Springer-Verlag; 2011.
 3. Liu B: Uncertainty theory. 4th edition. 2013. http://orsc.edu.cn/liu/ut.pdf. Accessed January 2013.
 4. Wang X, Peng Z: Method of moments for estimation uncertainty distribution. 2012. http://orsc.edu.cn/online/100408.pdf. Accessed January 2013.
 5. Wang X, Gao Z, Guo H: Delphi method for estimating uncertainty distributions. Information: An International Interdisciplinary Journal 2012, 12(2):449–460.
 6. Wang X, Gao Z, Guo H: Uncertain hypothesis testing for expert's empirical data. Math. Comput. Model. 2012, 55: 1478–1482. doi:10.1016/j.mcm.2011.10.039
 7. Peng Z, Iwamura K: A sufficient and necessary condition of uncertainty distribution. J. Interdiscipl. Math. 2010, 13: 277–285. doi:10.1080/09720502.2010.10700701
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.