Monty Hall game: a host with limited budget
 Agustín Alvarez^{1} and
 Juan P Pinasco^{2}
https://doi.org/10.1186/2195546822
© Alvarez and Pinasco; licensee Springer. 2014
Received: 17 August 2013
Accepted: 2 January 2014
Published: 17 January 2014
Abstract
In this paper we introduce a new version of the classical Monty Hall problem, in which the host tries to maximize the audience while being restricted by a budget. This problem is related to the design of games with a predetermined outcome, and to decision-making processes under uncertainty in which the agent does not know whether the advice received is favorable or not.
Keywords
Monty Hall, Probability, Game theory
Introduction
The Monty Hall problem first appeared in a letter by S. Selvin to The American Statistician [1], and it is a nice and controversial problem for introductory courses in probability, statistics, and game theory. The problem can be posed as follows:
You are playing a game on a TV show. There are three doors: behind one of them is a car, and behind the other two are goats. You select one of the doors, say door No. 1. Before opening it, the host, who knows where the car is, opens another door, say No. 3, which has a goat. He then gives you the possibility to change, allowing you to pick door No. 2. What do you do?
The problem posed in this way may lead to a lot of controversy, mainly because we do not know whether the behavior of the host had anything to do with your first choice or not. As Gill stated in [2], this is a problem of mathematical modeling, and the answer is not a probability but a decision, and the decision must be chosen in a setting of uncertainty.
Perhaps the host would open a door with a goat only when your first choice was right. In this case, changing doors is a bad choice. But if the host always shows you a door with a goat after your first choice, then by changing your first choice you increase your probability of winning the car from 1/3 to 2/3. One of the easiest ways to see this is the following: imagine that you play this game repeatedly. Around 1/3 of the time your first choice is right, and then you do not win the car because you change doors; in the other 2/3 of the time your first choice is wrong, and changing to the door the host did not open makes you win the car. So you win the car 2/3 of the time.
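The repeated-play argument can be checked with a short Monte Carlo simulation. The Python sketch below is only an illustration: the function names, the number of trials, and the seed are our own choices. It assumes the classical rules, in which the host always opens a goat door that the player did not choose.

```python
import random


def play_classic(switch: bool, rng: random.Random) -> bool:
    """Play one classical round; return True if the player wins the car."""
    car = rng.randrange(3)
    first = rng.randrange(3)
    # The host always opens a door that is neither the player's nor the car's.
    # If the first choice is right, two such doors exist and he picks one at random.
    opened = rng.choice([d for d in range(3) if d != first and d != car])
    if switch:
        # Move to the only door that is neither the first choice nor the opened one.
        final = next(d for d in range(3) if d != first and d != opened)
    else:
        final = first
    return final == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the winning probability of a fixed policy over many shows."""
    rng = random.Random(seed)
    wins = sum(play_classic(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(f"always switch: {win_rate(True):.3f}")   # close to 2/3
    print(f"never switch:  {win_rate(False):.3f}")  # close to 1/3
```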
There are many variants of this game; see the recent book by Rosenhouse [3]. Some of the analyses are based on game theory, although there is only one player, since the host’s behavior is completely determined. In all the variants we know, the game is analyzed from the point of view of the player, and the host acts in an almost deterministic way (of course, sometimes the host takes a decision at random). In several variants, the host’s behavior corresponds to one of two main cases: the malicious host, who opens a door only when the player chooses the right door, and the benevolent host, who opens a door only when the player chooses one of the wrong doors; see [4, 5] and Chapter 5 in [3]. A combination of both cases can be found in [6], where it is assumed that the participant has no information on the proportion of times that the host behaves as a malicious or a benevolent host, and that he cannot change doors when the host does not open a door.
We will restrict our analysis to the case in which the player’s winning probability α lies in [1/3, 2/3], since lower values of α can easily be obtained by adding more doors, and higher values by adding more doors and then opening more than one door with a goat. The general case follows in almost the same way.
Moreover, we assume that the host believes that the success of the game relies on the tense moment in which the host, having opened a door with a goat after the player’s first choice, gives the player the possibility of changing his choice. So the host plays a strategy which fixes the player’s probability α of winning the car and maximizes the proportion of shows in which the host opens a door with a goat.
We can think of this problem as an inverse problem in game theory, since α is the minimax solution of the zero-sum game between the player and the host: it is the minimum expected prize that the player can win and the maximum expected prize that the host can pay.
Another difference from other versions of the Monty Hall problem is the following: the player has the option of changing his first choice both when the host shows a door with a goat and when he does not show anything. This variant seems to have been overlooked in previous works, although in many real game shows the host offers the player the option to change his decision, asking repeatedly things like ‘Are you sure?’, ‘Do you want to change your answer?’, and ‘Is that your final answer?’ Usually, there is a key question which ends the option to change. We believe it is more realistic to include both options in this game: the player can change his choice even if no door is opened, and the host is constrained in the number of times he can offer to open a door.
Finally, let us mention that this model is related to a more complex multi-agent problem, in which each agent is a driver choosing among a few roads. A real-time information source, such as a radio station, GPS, or an intelligent transport system, may or may not report the state of the roads, reducing the uncertainty in travel time. However, several small incidents on the same road (cars stopped due to flat tires or lack of gasoline) are not interesting enough to catch the attention of the media compared with a major accident, or their effect on the traffic cannot be measured quickly, even though the road starts to become congested. Here, the system acts as the host, and in the worst-case scenario it gives the minimum information about traffic, or the information reaches the agent when a change of roads is no longer possible due to communication delays. We refer the interested reader to [8–10] for models of route choice with real-time information and a discussion of the behavioral mechanisms of drivers in this kind of decision-making process.
The work is organized as follows: in Section ‘The Monty Hall problem and our model’, we introduce the parameters of the model and the rules of the game. In Section ‘Optimal strategies’, we formulate and solve an equivalent problem, and we find the optimal strategies of the host and the player. The equivalence of the problems is proved in Section ‘The dual problem’, where we solve the inverse problem, giving the optimal strategies as functions of the probability parameter α. In Section ‘A related model’, we compare the payoffs when the player is not allowed to change doors if the host does not open an extra door, and we conclude in Section ‘Conclusions’.
The Monty Hall problem and our model
A variant of the problem
 1.
There are three doors; behind one of them there is a car, and behind the other two there are goats.
 2.
The host keeps the player’s winning probability fixed at some α ∈ [1/3, 2/3].
 3.
The host knows where the car is, and his strategy is based on two numbers:

m = Probability of the host showing a door with a goat given the first choice of the player was right.

b = Probability of the host showing a door with a goat given the first choice of the player was wrong.
As in [6], we associate the letter m with malevolence, since the host is tempting the player to change his choice when the player has chosen the right door, and the letter b with benevolence, since the host is tempting the player to change his mind after he has chosen the wrong door.
 4.
The host wants to maximize the number of times he opens a door.
 5.
The host understands that in the long run both numbers m and b will be well estimated by the show’s followers, since in each TV show, when the game is over, all three doors are shown to the viewers in order to demonstrate the transparency of the game. So the host expects that the player will know m and b and will act so as to maximize his probability of winning.
 6.
The player’s strategy is based on two probabilities, say:

c = Probability that the player changes, given that the host has opened a door with a goat.

n = Probability that the player changes, given that the host has not opened a door with a goat.

The host and the player use the same values of m, b, c, and n in every game.
The problem is now the following: how must the host and the player choose the probabilities m, b, c, and n? Recall that the host is trying to maximize the expected number of times that he opens a door while keeping the probability α fixed, and the player is trying to win the car.
We have the following sequential game between the host and the player in each TV show:

A car is hidden behind one of three doors and remains there until the game is finished.

The player chooses a door.

The host, knowing whether this choice was right or wrong, decides whether to open a door, using the probabilities m or b, respectively. If he opens a door, he shows a door with a goat.

Knowing the host’s decision, that is, whether or not a goat has been shown, the player can change his initial choice according to the probabilities c or n.

The game finishes and the host opens all the doors.

The player wins if and only if the car is behind the door he finally chose.
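The sequential game can be simulated directly to estimate, for given m, b, c, and n, both the frequency with which the host opens a door and the frequency with which the player wins. The Python sketch below is only an illustration: `play_round` and `simulate` are hypothetical helper names, and we assume that a player who changes without a door being opened picks one of the two other doors uniformly at random.

```python
import random


def play_round(m, b, c, n, rng):
    """Simulate one show; return (host_opened_a_door, player_won)."""
    car = rng.randrange(3)
    first = rng.randrange(3)
    right = first == car
    # Host: open a goat door with probability m (first choice right) or b (wrong).
    opened = None
    if rng.random() < (m if right else b):
        opened = rng.choice([d for d in range(3) if d not in (first, car)])
    final = first
    if opened is not None:
        # After a goat is shown, change with probability c to the other closed door.
        if rng.random() < c:
            final = next(d for d in range(3) if d not in (first, opened))
    elif rng.random() < n:
        # No door opened: change with probability n. Assumption: the new door is
        # chosen uniformly among the two other doors.
        final = rng.choice([d for d in range(3) if d != first])
    return opened is not None, final == car


def simulate(m, b, c, n, trials=100_000, seed=1):
    """Return (fraction of shows with an opened door, fraction of wins)."""
    rng = random.Random(seed)
    opened = won = 0
    for _ in range(trials):
        o, w = play_round(m, b, c, n, rng)
        opened += o
        won += w
    return opened / trials, won / trials


if __name__ == "__main__":
    # Example: m = 1, b = 1/2, and the player always changes.
    print(simulate(1.0, 0.5, 1.0, 1.0))  # close to (2/3, 1/2)
```

With m = 1 and b = 1/2 the host opens a door with probability 1/3 + 2·(1/2)/3 = 2/3, and an always-changing player wins with probability 1/2, which is what the simulation approaches.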
Observe that the host must pay the full value of the car when the player wins. The law of large numbers enables him to estimate the total prize as the number of shows times the winning probability, although the variance introduces a serious risk when expensive prizes are involved. A classical way to deal with this uncertainty is to buy specialized coverage from an insurance company, whose cost depends on the player’s winning probability.
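To make the risk concrete, one can model the number of cars won over N shows as a binomial random variable with success probability α, assuming the shows are independent. The Python sketch below uses a hypothetical function name and purely illustrative numbers (not data from this paper) to compute the mean and standard deviation of the total prize paid.

```python
import math


def payout_stats(alpha: float, shows: int, car_value: float) -> tuple[float, float]:
    """Mean and standard deviation of the total prize over `shows` independent games.

    The number of wins is Binomial(shows, alpha), so the total prize has
    mean shows*alpha*car_value and standard deviation
    sqrt(shows*alpha*(1 - alpha))*car_value.
    """
    mean = shows * alpha * car_value
    std = math.sqrt(shows * alpha * (1 - alpha)) * car_value
    return mean, std


if __name__ == "__main__":
    # Illustrative numbers only: 100 shows, alpha = 1/2, a car worth 20,000 units.
    print(payout_stats(0.5, 100, 20_000))  # (1000000.0, 100000.0)
```

The relative fluctuation, std/mean = sqrt((1 − α)/(Nα)), shrinks only like 1/√N, so over a single season the total payout can differ noticeably from its expected value.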
Optimal strategies
Let us call $\mathbb{P}(m,b,c,n)$ the player’s probability to finally win the car when the host’s strategy is (m, b) and the player’s strategy is (c, n).
We write $\mathrm{PFW}(m,b)=\max_{(c,n)\in[0,1]^2}\mathbb{P}(m,b,c,n)$, which gives the best probability for the player to finally win the car if the host uses strategy (m, b).
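For concreteness, $\mathbb{P}(m,b,c,n)$ can be written out by conditioning on whether the first choice was right (probability 1/3) or wrong (probability 2/3). The Python sketch below is our own illustration, not the authors’ code, and the function names are hypothetical; it assumes, as in the rules above, that a player who changes without a door being opened picks one of the two other doors at random. Because the expression is affine in (c, n), the maximum defining PFW is attained at one of the four pure strategies.

```python
from fractions import Fraction
from itertools import product


def win_prob(m, b, c, n):
    """Exact P(m, b, c, n) for host strategy (m, b) and player strategy (c, n).

    First choice right (prob 1/3): the player wins only by not changing,
      which happens with prob m*(1 - c) + (1 - m)*(1 - n).
    First choice wrong (prob 2/3): after a goat is shown, changing always wins;
      with no door opened, changing wins half the time.
    """
    right = m * (1 - c) + (1 - m) * (1 - n)
    wrong = b * c + (1 - b) * n * Fraction(1, 2)
    return Fraction(1, 3) * right + Fraction(2, 3) * wrong


def pfw(m, b):
    """Best winning probability against (m, b): maximize over pure strategies."""
    return max(win_prob(m, b, c, n) for c, n in product((0, 1), repeat=2))


if __name__ == "__main__":
    print(pfw(1, 1))               # 2/3: the classical Monty Hall host
    print(pfw(1, Fraction(1, 2)))  # 1/2
```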
Let us now calculate the probability that the host shows a door with a goat, in terms of m and b. First, we name some events as follows:

O = ‘The host opens a door with a goat’.

F_{R} = ‘Player’s first choice was right’.

F_{W} = ‘Player’s first choice was wrong’.

PW = ‘The player finally wins using his best possible strategy’.

C = ‘The player changes his first choice’.

NC = ‘The player does not change his first choice’.
We will call O^{c} the complementary event of O.
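Since $\mathbb{P}(F_{\mathrm{R}})=\frac{1}{3}$ and $\mathbb{P}(F_{\mathrm{W}})=\frac{2}{3}$, we have $\mathbb{P}(O)=\mathbb{P}(O\mid F_{\mathrm{R}})\,\mathbb{P}(F_{\mathrm{R}})+\mathbb{P}(O\mid F_{\mathrm{W}})\,\mathbb{P}(F_{\mathrm{W}})=\frac{m}{3}+\frac{2b}{3}$, and the host’s problem is
$$\max_{(m,b)\in[0,1]^2}\ \frac{m}{3}+\frac{2b}{3}\quad\text{subject to}\quad\mathrm{PFW}(m,b)\le\alpha,\qquad(4)$$
that is, the host is maximizing the number of times that he opens a door, keeping the player’s winning probability bounded by α.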
A dual problem
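For $\beta\in[0,1]$, consider now the problem
$$\min_{(m,b)\in[0,1]^2}\ \mathrm{PFW}(m,b)\quad\text{subject to}\quad\frac{m}{3}+\frac{2b}{3}\ge\beta,\qquad(5)$$
namely, the host chooses his probabilities in order to minimize the winning probability of the player, constrained to open a door in at least 100·β% of the shows.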
As we will see in the next section (see Lemma 4.1), solving problem (4) is equivalent to solving problem (5). However, problem (5) has a very simple constraint, which allows us to write one of the variables explicitly in terms of the other, whereas problem (4) is constrained by PFW, which is a piecewise-defined function.
Now, if the host deviates from the optimal strategy, there are two possibilities. If he opens a door more often, he is playing with higher values of m and/or b, and the dual problem shows that the player will win more games than the host’s budget allows (recall that the player can detect the values of m and b). On the other hand, if he reduces the number of times he opens a door, the player’s winning probability decreases, and the host spends only part of the budget.
So we will now concentrate on solving problem (5), and in Section ‘The dual problem’ we show that if this problem has a solution, it must be a solution of problem (4).
An expression for PFW
First of all we compute PFW. Notice that when the host opens a door with a goat, the player has only two options: either he keeps his first choice, winning with probability $\mathbb{P}(F_{\mathrm{R}}\mid O)$, or he changes to the other closed door, winning with probability $1-\mathbb{P}(F_{\mathrm{R}}\mid O)=\mathbb{P}(F_{\mathrm{W}}\mid O)$. When the host does not open a door, the player can stay, winning with probability $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})$, or choose either of the other two doors, winning with probability $\frac{1-\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})}{2}$. Hence
$$\mathrm{PFW}(m,b)=\mathbb{P}(O)\max\{\mathbb{P}(F_{\mathrm{R}}\mid O),\mathbb{P}(F_{\mathrm{W}}\mid O)\}+\mathbb{P}(O^{\mathrm{c}})\max\Big\{\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}}),\tfrac{1-\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})}{2}\Big\},$$
since we take c = 0 (respectively, c = 1) when $\mathbb{P}(F_{\mathrm{R}}\mid O)>\mathbb{P}(F_{\mathrm{W}}\mid O)$ (resp., when $\mathbb{P}(F_{\mathrm{R}}\mid O)<\mathbb{P}(F_{\mathrm{W}}\mid O)$), and analogously for n. Clearly, the player is indifferent when $\mathbb{P}(F_{\mathrm{R}}\mid O)=\mathbb{P}(F_{\mathrm{W}}\mid O)$.
For other values of (m, b), using Equations 1, 2, 3, 6, and 7 we get
$$\mathrm{PFW}(m,b)=\frac{1}{3}\max\{m,2b\}+\frac{1}{3}\max\{1-m,1-b\}.$$
The function PFW
We now analyze the function PFW(m, b). The formula for PFW contains two maxima, so we can regard it as a piecewise-defined function. The pieces change along the lines $b=m$ and $b=\frac{m}{2}$.
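The two maxima can be checked numerically. The Python sketch below (our own illustration; function names are hypothetical) builds PFW from the conditional probabilities $\mathbb{P}(F_{\mathrm{R}}\mid O)=\frac{m}{m+2b}$ and $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})=\frac{1-m}{3-m-2b}$, following the stay-or-change argument for PFW, and compares it on a grid of exact rationals with $\frac{1}{3}\max\{m,2b\}+\frac{1}{3}\max\{1-m,1-b\}$. The first maximum switches branch on $b=\frac{m}{2}$ and the second on $b=m$.

```python
from fractions import Fraction as Fr


def pfw_conditional(m, b):
    """PFW built from P(O), P(F_R | O) and P(F_R | O^c)."""
    p_right = Fr(1, 3)
    p_open = p_right * m + (1 - p_right) * b        # P(O) = m/3 + 2b/3
    total = Fr(0)
    if p_open > 0:
        p_r_o = p_right * m / p_open                # P(F_R | O)
        total += p_open * max(p_r_o, 1 - p_r_o)     # stay vs. change
    if p_open < 1:
        p_r_oc = p_right * (1 - m) / (1 - p_open)   # P(F_R | O^c)
        total += (1 - p_open) * max(p_r_oc, (1 - p_r_oc) / 2)
    return total


def pfw_closed(m, b):
    """Closed form: (max{m, 2b} + max{1 - m, 1 - b}) / 3."""
    return (max(m, 2 * b) + max(1 - m, 1 - b)) / 3


if __name__ == "__main__":
    grid = [Fr(i, 12) for i in range(13)]
    assert all(pfw_conditional(m, b) == pfw_closed(m, b) for m in grid for b in grid)
    print("closed form agrees on the grid")
```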
We now replace the minimization problem (5) by a simpler one, which can be solved explicitly in terms of the function $h_{1,\beta}(m)=\mathrm{PFW}\big(m,\frac{3\beta-m}{2}\big)$.
The minimization problem
Observe that we are now minimizing on the boundary $m+2b=3\beta$ of the constraint of problem (5).
Observe that h_{1, β}(m) decreases when $0\le m\le \frac{3\beta}{2}$, and increases when $\frac{3\beta}{2}<m\le 1$. We then find the following:

If $\frac{3\beta}{2}\le 1$, then h_{1, β} has a unique minimum at $m=\frac{3\beta}{2}$.

If $\frac{3\beta}{2}\ge 1$, the function h_{1, β} decreases in the whole interval [ 0,1] and has its minimum at m = 1.
Hence, to compute the function $h_2(\beta)=\min_{m}h_{1,\beta}(m)$, let us note that:

If $\frac{3\beta}{2}\le 1$, then $\widehat{m}(\beta)=\frac{3}{2}\beta$, which implies that $\widehat{b}(\beta)=\frac{3\beta}{4}$.

If $\frac{3\beta}{2}\ge 1$, then $\widehat{m}(\beta)=1$, which implies that $\widehat{b}(\beta)=\frac{3}{2}\left(\beta-\frac{1}{3}\right)$.
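Substituting these values into $h_{1,\beta}$, we obtain
$$h_2(\beta)=\begin{cases}\frac{1}{3}+\frac{\beta}{4}, & 0\le\beta\le\frac{2}{3},\\ \frac{1}{6}+\frac{\beta}{2}, & \frac{2}{3}\le\beta\le 1,\end{cases}\qquad(9)$$
so $h_2$ is increasing in β, which implies that $(\tilde{m}(\beta),\tilde{b}(\beta))=(\widehat{m}(\beta),\widehat{b}(\beta))$ is the unique solution of (5).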
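The computation of $\widehat{m}(\beta)$ and $h_2$ can be double-checked numerically. The Python sketch below (an illustration; the names and grid size are our own choices) minimizes $h_{1,\beta}(m)=\mathrm{PFW}\big(m,\frac{3\beta-m}{2}\big)$ over the feasible values of m, that is, those with $0\le m\le 1$ and $0\le b\le 1$, and compares the result with $\widehat{m}(\beta)=\min\{\frac{3\beta}{2},1\}$ and with $h_2(\beta)$, equal to $\frac{1}{3}+\frac{\beta}{4}$ for $\beta\le\frac{2}{3}$ and $\frac{1}{6}+\frac{\beta}{2}$ for $\beta\ge\frac{2}{3}$.

```python
def pfw(m, b):
    """Player's best winning probability: (max{m, 2b} + max{1 - m, 1 - b}) / 3."""
    return (max(m, 2 * b) + max(1 - m, 1 - b)) / 3


def h1(beta, m):
    """PFW on the boundary m + 2b = 3*beta of the constraint P(O) >= beta."""
    return pfw(m, (3 * beta - m) / 2)


def h2_formula(beta):
    return 1 / 3 + beta / 4 if beta <= 2 / 3 else 1 / 6 + beta / 2


def minimize_h1(beta, steps=20_000):
    """Grid search over feasible m, i.e. 0 <= m <= 1 and 0 <= (3*beta - m)/2 <= 1."""
    lo, hi = max(0.0, 3 * beta - 2), min(1.0, 3 * beta)
    ms = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    best_m = min(ms, key=lambda m: h1(beta, m))
    return best_m, h1(beta, best_m)


if __name__ == "__main__":
    for beta in (0.2, 0.5, 0.8, 1.0):
        m, value = minimize_h1(beta)
        print(f"beta={beta}: m={m:.4f} (m_hat={min(1.5 * beta, 1.0):.4f}), "
              f"h2={value:.4f} (formula {h2_formula(beta):.4f})")
```

A grid search is used only because it is easy to read; since $h_{1,\beta}$ is piecewise linear, the grid minimum is within one step of $\widehat{m}(\beta)$.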
Host’s strategies
It is worth noting that for $0\le\beta\le\frac{2}{3}$ (or equivalently $\frac{1}{3}\le\alpha\le\frac{1}{2}$, as we show in Section ‘The dual problem’), the optimal strategy of the host consists of double malevolence, that is, $\tilde{m}(\beta)=2\tilde{b}(\beta)$. Whereas if $\frac{2}{3}<\beta\le 1$, or equivalently $\frac{1}{2}<\alpha\le\frac{2}{3}$, this is not possible, since $\tilde{m}(\beta)=1$ and $\tilde{b}(\beta)>\frac{1}{2}$.
The function h_{2} defined in (8) represents the winning probability of a player following his best strategy, given that the host is using his best strategy. It can be seen from (9) that this probability increases faster (with slope $\frac{1}{2}$ instead of $\frac{1}{4}$) once the host can no longer keep his malevolence at twice his benevolence.
Player’s strategies
We are assuming that, in the long run, the player estimates the host’s parameters m and b well. If the host opens an extra door, the player must compare $\mathbb{P}(F_{\mathrm{R}}\mid O)$ with $\frac{1}{2}$:

If $\mathbb{P}(F_{\mathrm{R}}\mid O)<\frac{1}{2}$, then the player must change door.

If $\mathbb{P}(F_{\mathrm{R}}\mid O)>\frac{1}{2}$, then the player must keep his first choice.

If $\mathbb{P}(F_{\mathrm{R}}\mid O)=\frac{1}{2}$, then the player is indifferent.
If the host does not open an extra door, the player must compare $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})$ with $\frac{1}{3}$:

If $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})<\frac{1}{3}$, then the player must change door.

If $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})>\frac{1}{3}$, then the player must keep his first choice.

If $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})=\frac{1}{3}$, then the player is indifferent.
If the host is using an optimal strategy, then there are two possibilities for these parameters:

If m = 2b, then when the host shows an extra door, $\mathbb{P}(F_{\mathrm{R}}\mid O)=\frac{1}{2}$, and the player is indifferent between changing and staying. In addition, if the host does not show an extra door, $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})=\frac{1-2b}{3-4b}<\frac{1}{3}$ (for b > 0), and the player must change door.

If m = 1 and $b>\frac{1}{2}$, then when the host opens an extra door, $\mathbb{P}(F_{\mathrm{R}}\mid O)=\frac{1}{1+2b}<\frac{1}{2}$, and the player must change. In addition, if the host does not open an extra door, then $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})=0<\frac{1}{3}$, so the player must change door.
In summary, if the host is using an optimal strategy, always changing is an optimal strategy for the player.
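The comparisons above reduce to simple inequalities in m and b. From $\mathbb{P}(F_{\mathrm{R}}\mid O)=\frac{m}{m+2b}$ and $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})=\frac{1-m}{3-m-2b}$ one checks that $\mathbb{P}(F_{\mathrm{R}}\mid O)\le\frac{1}{2}$ exactly when $m\le 2b$, and $\mathbb{P}(F_{\mathrm{R}}\mid O^{\mathrm{c}})\le\frac{1}{3}$ exactly when $m\ge b$; these are the lines $b=\frac{m}{2}$ and $b=m$ where PFW changes its form. The Python sketch below encodes the player’s rule; `best_response` is a hypothetical helper, and breaking ties towards changing is our own choice (at a tie either action is optimal).

```python
def best_response(m, b):
    """Player's optimal switching probabilities (c, n) against host strategy (m, b).

    P(F_R | O)   = m / (m + 2b)           <= 1/2  iff  m <= 2b
    P(F_R | O^c) = (1 - m) / (3 - m - 2b) <= 1/3  iff  m >= b
    At equality the player is indifferent; here ties are broken towards changing.
    (When P(O) = 0 or P(O^c) = 0 the corresponding choice is irrelevant.)
    """
    c = 1 if m <= 2 * b else 0
    n = 1 if m >= b else 0
    return c, n


if __name__ == "__main__":
    print(best_response(0.5, 0.25))  # (1, 1): optimal host with m = 2b
    print(best_response(1.0, 0.75))  # (1, 1): optimal host with m = 1, b > 1/2
    print(best_response(1.0, 0.0))   # (0, 1): purely malicious host
    print(best_response(0.0, 1.0))   # (1, 0): purely benevolent host
```

Against either form of the host’s optimal strategy the rule returns (1, 1), that is, always change.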
The dual problem
Our goal now is to show the equivalence of solving (4) and (5). This gives the desired optimal probabilities of the host, (m^{⋆}, b^{⋆}), as functions of α.
Lemma 4.1.
Proof
Indeed, we get the classical Monty Hall problem and in this case f(m, b) = 2/3.
Let us consider now the case β < 1.
Take (m, b) such that g(m, b) > β; then f(m, b) > α, since otherwise (m^{⋆}(α), b^{⋆}(α)) would not maximize (11).
If g(m, b) = β < 1, then (m, b) ≠ (1,1). Since g is continuous and (m, b) is not a local maximum, there exists a sequence {(m_{n}, b_{n})}_{n≥1} converging to (m, b) with g(m_{n}, b_{n}) > β. In particular, f(m_{n}, b_{n}) > α.
So, by continuity of f, we have f(m, b) ≥ α. We have thus shown that, in every case, g(m, b) ≥ β implies f(m, b) ≥ α.
Then, by inequality (13), we get f(m^{⋆}(α), b^{⋆}(α)) = α, so (m^{⋆}(α), b^{⋆}(α)) is a minimizer of (12); since this minimizer is unique, $(m^{\star}(\alpha),b^{\star}(\alpha))=(\tilde{m}(\beta),\tilde{b}(\beta))$.
since we know from Lemma 4.1 that $(m^{\star}(\alpha),b^{\star}(\alpha))=(\tilde{m}(\beta(\alpha)),\tilde{b}(\beta(\alpha)))$. The next lemma will show that β(α) is the inverse function of h_{2} defined in (8).
Lemma 4.2.
are well defined and that one of them (say Ψ) is strictly increasing in the interval I, then $\Psi:I\to\operatorname{Im}(\Psi)$ is invertible and its inverse is $\Phi:\operatorname{Im}(\Psi)\to I$.
Proof.
Let α = Ψ(β). Then there exists an element k_{0} ∈ K such that g(k_{0}) ≥ β and f(k_{0}) = α. Actually g(k_{0}) = β, because if g(k_{0}) = β′ > β, then Ψ(x) = α for every x in the interval [β, β′], which contradicts the fact that Ψ is strictly increasing.
Now, to prove the lemma it is enough to show that Φ(α) = β. Since f(k_{0}) ≤ α, we have Φ(α) ≥ g(k_{0}) = β. Suppose Φ(α) = β′ > β. Then there exists k_{1} such that f(k_{1}) ≤ α and g(k_{1}) = β′ > β. Therefore Ψ(β′) ≤ α = Ψ(β) with β < β′, which is absurd because Ψ is strictly increasing.
Then h_{2} = Ψ, and formula (9) shows that $\Psi:[0,1]\to[\frac{1}{3},\frac{2}{3}]$ is strictly increasing. By Lemma 4.2, $\beta(\alpha):[\frac{1}{3},\frac{2}{3}]\to[0,1]$ is its inverse.
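Inverting the two linear pieces of $h_2$, namely $\frac{1}{3}+\frac{\beta}{4}$ on $[0,\frac{2}{3}]$ and $\frac{1}{6}+\frac{\beta}{2}$ on $[\frac{2}{3},1]$, gives $\beta(\alpha)=4\alpha-\frac{4}{3}$ for $\frac{1}{3}\le\alpha\le\frac{1}{2}$ and $\beta(\alpha)=2\alpha-\frac{1}{3}$ for $\frac{1}{2}\le\alpha\le\frac{2}{3}$. Substituting into $(\tilde{m}(\beta),\tilde{b}(\beta))$, a short calculation of ours yields $m^{\star}(\alpha)=\min\{6\alpha-2,1\}$ and $b^{\star}(\alpha)=3\alpha-1$. The Python sketch below (hypothetical function names) implements these formulas and compares β(α) with a brute-force grid search of problem (4), using $\mathrm{PFW}(m,b)=\frac{1}{3}\max\{m,2b\}+\frac{1}{3}\max\{1-m,1-b\}$.

```python
def pfw(m, b):
    """Player's best winning probability against host strategy (m, b)."""
    return (max(m, 2 * b) + max(1 - m, 1 - b)) / 3


def beta_of_alpha(alpha):
    """Inverse of h2(beta) = 1/3 + beta/4 (beta <= 2/3), 1/6 + beta/2 (beta >= 2/3)."""
    if not 1 / 3 <= alpha <= 2 / 3:
        raise ValueError("alpha must lie in [1/3, 2/3]")
    return 4 * alpha - 4 / 3 if alpha <= 1 / 2 else 2 * alpha - 1 / 3


def host_strategy(alpha):
    """(m*, b*) = (m~(beta), b~(beta)) with beta = beta(alpha)."""
    beta = beta_of_alpha(alpha)
    if 3 * beta / 2 <= 1:
        return 3 * beta / 2, 3 * beta / 4
    return 1.0, (3 * beta - 1) / 2


def brute_force_beta(alpha, steps=200):
    """Grid search for problem (4): max m/3 + 2b/3 subject to PFW(m, b) <= alpha."""
    grid = [i / steps for i in range(steps + 1)]
    return max(m / 3 + 2 * b / 3
               for m in grid for b in grid
               if pfw(m, b) <= alpha + 1e-12)  # small slack for rounding


if __name__ == "__main__":
    for alpha in (0.4, 0.5, 0.6):
        m, b = host_strategy(alpha)
        print(f"alpha={alpha}: beta={beta_of_alpha(alpha):.4f}, "
              f"grid={brute_force_beta(alpha):.4f}, m*={m:.3f}, b*={b:.3f}")
```

At the optimum the budget constraint is tight: $\mathrm{PFW}(m^{\star}(\alpha),b^{\star}(\alpha))=\alpha$, as the proof of Lemma 4.1 requires.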
A related model
So we have the following:

If $\frac{3\beta}{2}\le 1$, then any $m\ge\frac{3\beta}{2}$ is a global minimum of $h_{1,\beta}^{c}(m)$, and its minimum value is $\frac{1}{3}$.

If $\frac{3\beta}{2}\ge 1$, then $h_{1,\beta}^{c}(m)$ has its minimum at m = 1, and its minimum value is $\beta-\frac{1}{3}$.
This formula can be compared with (15).
We see that the player’s chances are worse than in the previous version of the game. From the host’s perspective, in this version he can pay the same prizes as before while opening an extra door more often. However, this is achieved by reducing the number of times the host opens a door when the player misses with his first choice, in which case the player has no option to change doors, which is a very unfriendly policy.
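The comparison can be reproduced numerically. Assuming, as in this related model, that the player may change only after a goat has been shown (so n = 0), his best winning probability is $\frac{1}{3}\big(1-m+\max\{m,2b\}\big)$: staying wins whenever the first choice was right, and changing after a goat is shown wins exactly when it was wrong. The Python sketch below (our own illustration; function names are hypothetical) minimizes this on the boundary $m+2b=3\beta$ and compares it with $h_2$ from the original game.

```python
def pfw_no_change(m, b):
    """Best winning probability when the player cannot change if no door is opened."""
    return (1 - m + max(m, 2 * b)) / 3


def h2(beta):
    """Optimal value in the original game: 1/3 + beta/4 or 1/6 + beta/2."""
    return 1 / 3 + beta / 4 if beta <= 2 / 3 else 1 / 6 + beta / 2


def h2_no_change(beta, steps=20_000):
    """Minimize pfw_no_change over feasible m on the boundary m + 2b = 3*beta."""
    lo, hi = max(0.0, 3 * beta - 2), min(1.0, 3 * beta)
    ms = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(pfw_no_change(m, (3 * beta - m) / 2) for m in ms)


if __name__ == "__main__":
    for beta in (0.3, 0.6, 0.8, 1.0):
        print(f"beta={beta}: related model {h2_no_change(beta):.4f}, "
              f"original {h2(beta):.4f}")
```

For every β the related model gives the player a winning probability no larger than in the original game, and the two coincide only at β = 1, the classical Monty Hall problem.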
Conclusions
We have analyzed a different version of the Monty Hall problem, in which the player faces a host who has his own objectives (maximizing the audience) and a limited budget. From the host’s perspective, he must solve an inverse problem in zero-sum game theory: to determine the payoffs of the game with a given minimax equilibrium. From the player’s point of view, this is a toy model of a decision process in which the agent does not know whether the information received is beneficial or not (i.e., whether the host is benevolent or malevolent), although the player knows the probability of each behavior.
We have shown that the problem can be formulated both in terms of the expected prize that the host will pay (α) and in terms of the proportion of times he opens a door (β). The latter formulation is more convenient for computations.
Declarations
Acknowledgements
AA is a fellow of the University of Buenos Aires, and JPP is a member of CONICET. This research was partially supported by grants W276 and 20020100100400 from the University of Buenos Aires and by CONICET (Argentina) PIP 5478/1438.
References
[1] Selvin S: A problem in probability. Am. Statistician 1975, 29(1):67.
[2] Gill R: The Monty Hall problem is not a probability puzzle (it’s a challenge in mathematical modelling). Statistica Neerlandica 2011, 65: 58–71. 10.1111/j.1467-9574.2010.00474.x
[3] Rosenhouse J: The Monty Hall Problem. New York: Oxford University Press; 2009.
[4] Granberg D: To switch or not to switch. In: vos Savant M (ed.) The Power of Logical Thinking. New York: St. Martin’s Press; 1996.
[5] Tierney J: Behind Monty Hall’s doors: puzzle, debate and answer? The New York Times, July 21, 1991.
[6] Schuller JC: The malicious host: a minimax solution of the Monty Hall problem. J. Appl. Stat 2012, 39: 215–221. 10.1080/02664763.2011.580337
[7] Fernandez L, Piron R: Should she switch? A game-theoretic analysis of the Monty Hall problem. Math. Mag 1999, 72: 214–217. 10.2307/2690884
[8] Abdel-Aty MA, Abdalla MF: Examination of multiple mode/route-choice paradigms under ATIS. IEEE Trans. Intell. Transportation Syst 2006, 7: 332–348. 10.1109/TITS.2006.880634
[9] Ben-Elia E, Shiftan Y: Which road do I take? A learning-based model of route choice with real-time information. Transportation Res. Part A Policy Pract 2010, 44: 249–264. 10.1016/j.tra.2010.01.007
[10] Gao S, Frejinger E, Ben-Akiva E: Adaptive route choices in risky traffic networks: a prospect theory approach. Transportation Res. Part C: Emerg. Technol 2010, 18: 727–740. 10.1016/j.trc.2009.08.001
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.