Does anyone know where I could get information on game theory applied to deterministic, perfect knowledge games with more than 2 players? Think 3player chess, for example. In particular, I'm curious how one would define an optimal strategy.
On the one hand, there will often be a Nash equilibrium in which no player can unilaterally improve their outcome. But if you specifically choose that strategy, you're assuming perfect play on the part of the other players. That's fine in a twoplayer game, because you'll only ever be pleasantly surprised when your opponent makes a mistake. But in 3+ player games, there can easily be circumstances in which one opponent's best move for himself is also helpful for you, and he can hurt you by making a mistake. So it can be dangerous to assume the other players will play perfectly.
You can revert to minimax strategy, which amounts to assuming that all the other players will gang up on you cooperatively. But that doesn't seem like a rational strategy either, since it often won't be a Nash equilibrium, and many games will be hopelessly lost if you assume everyone is conspiring against you.
Is there any means of defining optimal strategy in this scenario that doesn't involve introducing some probabilistic model of your opponents' strategies?
Nplayer combinatorial game theory
Moderators: gmalivuk, Moderators General, Prelates

 Posts: 340
 Joined: Wed Sep 21, 2011 3:44 am UTC
Re: Nplayer combinatorial game theory
This is the only paper I've ever seen that does a decent job of exploring this kind of question. (In case of link rot: the paper is "Stable Winning Coalitions" by Daniel Loeb, and is part of Games of No Chance.)
Zµ«VjÕ«ZµjÖZµ«VµjÕZµkVZÕ«VµjÖZµ«VjÕ«ZµjÖZÕ«VµjÕZµkVZÕ«VµjÖZµ«VjÕ«ZµjÖZÕ«VµjÕZµkVZÕ«ZµjÖZµ«VjÕ«ZµjÖZÕ«VµjÕZ

 Posts: 340
 Joined: Wed Sep 21, 2011 3:44 am UTC
Re: Nplayer combinatorial game theory
Thanks, notzeb! That was a very interesting read.
Although the author still assumes perfect play on the part of all players, he also makes the useful observation that a player will typically prefer a situation in which victory is still *possible*, even if not forceable, over a situation in which victory is completely impossible. He further generalizes this to the principle that game A is preferred to B if it's possible for A to have a better outcome than B, and impossible for B to have a better outcome than A, where 'better outcome' refers to the immediately following situation, rather than the final outcome of the game. This can be applied to 2player games as well.
So a game end state has only two values: victory and defeat. A game of one move has three possible values: guaranteed victory, guaranteed defeat, and conditional victory depending on the move. A game of two moves has at least 5 values: guaranteed victory, guaranteed defeat, possibly guaranteed victory after 1 right move with 1 mistake allowed, possibly guaranteed defeat after 1 mistake, and guaranteed possibility of victory after the first move. The list of preferred scenarios gets more and more complex the further you get from the end state, and this provides a principled way of defining a best strategy even when you're doomed to lose against a perfect opponent.
Having a wider range of preferences also gives a reasonable way to use weaker assumptions of player skill, rather than assuming perfect play. Game A might be better than B if the outcome after 1 move in A is sometimes better than B and never worse, but what if each has the potential to be better than the other? This is where it makes sense to start assuming what the player will do with their turn. Look at all the possible moves that can be made in either game A or B, rank all the possible outcomes by what you would prefer if you were the player making the move, and disqualify all outcomes that are tied for last place on that ranking. This is a weak assumption of player skill, in which we assume that a given player can at least avoid the worst possible outcome, but still might not play perfectly. A' and B' are then the equivalent games with the relevant options removed, and can be compared with the original method. If there's still no clear preference, repeat the process of eliminating the worst remaining option (thereby strengthening the assumption of player skill) as many times as necessary until a preference becomes clear. If all options end up getting eliminated, then A and B are equally preferred.
This obviously goes against the reduction rule in the paper, which assumes perfect play by all players, but I think this might be a more realistic basis for strategy in many situations.
But if the game is already as simple as the reduced forms in the paper, or if we do assume perfect play, then it comes down to how we model incentives to give victory to one opponent or the other, when the value to you in the game is identical for both outcomes (you lose). Something like the iterated prisoner's dilemma is probably applicable in a lot of situations here. If you want to maximize your long term rate of victories, it'll come down to forming alliances with other players and taking turns getting wins. So in the figure 2 example game on page 4 of the pdf (page 454 of the book), A's long term strategy is to alternate between giving victory directly to B, and letting B choose to give the win to A or C. As long as B keeps choosing A on his turn, A can reward him with a win in the next game. If B rebels and chooses C, A can punish B by making B the kingmaker a second time in a row, or by giving victory directly to C.
This is another violation of the suggested reduction rule though, as the assumption was that A would always prefer the option in which B is kingmaker and A has a chance of winning. But the introduction of external incentives, in the form of multiple playthroughs and long term win rates, changes that, because now it's possible to form a meaningful alliance that benefits both members. So the 'fine classification' given in the paper is not a full classification under this assumption. But maybe there are other reasonable models of the incentives for a kingmaker, which still agree with the proposed reduction rules.
ETA:
I suppose you can model the longterm payoff of several games by just using a longer game with a greater range of possible payoffs. Then all the reduction rules based on preferences can still apply, and the introduction of a 'revenge rule' explains the formation of alliances. A revenge rule doesn't make much sense when the payoffs are only victory or defeat, because there's never an incentive for a player to throw away the possibility of victory, even if they know you'll punish them for their moves. But if there are multiple payoffs, a revenge rule can make sense, by cutting off one player's chance at a high payoff and giving them a low payoff, they can be better off by caving to your will and accepting a medium payoff. So the optimal strategy between two equally valuable outcomes is whichever one retroactively would have helped you most if your opponents had anticipated your retribution and acted rationally in response to that anticipation. Similar dynamics should also arise from formalizing the value of coming in 2nd vs. coming in last. The existence of a mediumvalue payoff makes all the difference to the incentives to cooperate.
Although the author still assumes perfect play on the part of all players, he also makes the useful observation that a player will typically prefer a situation in which victory is still *possible*, even if not forceable, over a situation in which victory is completely impossible. He further generalizes this to the principle that game A is preferred to B if it's possible for A to have a better outcome than B, and impossible for B to have a better outcome than A, where 'better outcome' refers to the immediately following situation, rather than the final outcome of the game. This can be applied to 2player games as well.
So a game end state has only two values: victory and defeat. A game of one move has three possible values: guaranteed victory, guaranteed defeat, and conditional victory depending on the move. A game of two moves has at least 5 values: guaranteed victory, guaranteed defeat, possibly guaranteed victory after 1 right move with 1 mistake allowed, possibly guaranteed defeat after 1 mistake, and guaranteed possibility of victory after the first move. The list of preferred scenarios gets more and more complex the further you get from the end state, and this provides a principled way of defining a best strategy even when you're doomed to lose against a perfect opponent.
Having a wider range of preferences also gives a reasonable way to use weaker assumptions of player skill, rather than assuming perfect play. Game A might be better than B if the outcome after 1 move in A is sometimes better than B and never worse, but what if each has the potential to be better than the other? This is where it makes sense to start assuming what the player will do with their turn. Look at all the possible moves that can be made in either game A or B, rank all the possible outcomes by what you would prefer if you were the player making the move, and disqualify all outcomes that are tied for last place on that ranking. This is a weak assumption of player skill, in which we assume that a given player can at least avoid the worst possible outcome, but still might not play perfectly. A' and B' are then the equivalent games with the relevant options removed, and can be compared with the original method. If there's still no clear preference, repeat the process of eliminating the worst remaining option (thereby strengthening the assumption of player skill) as many times as necessary until a preference becomes clear. If all options end up getting eliminated, then A and B are equally preferred.
This obviously goes against the reduction rule in the paper, which assumes perfect play by all players, but I think this might be a more realistic basis for strategy in many situations.
But if the game is already as simple as the reduced forms in the paper, or if we do assume perfect play, then it comes down to how we model incentives to give victory to one opponent or the other, when the value to you in the game is identical for both outcomes (you lose). Something like the iterated prisoner's dilemma is probably applicable in a lot of situations here. If you want to maximize your long term rate of victories, it'll come down to forming alliances with other players and taking turns getting wins. So in the figure 2 example game on page 4 of the pdf (page 454 of the book), A's long term strategy is to alternate between giving victory directly to B, and letting B choose to give the win to A or C. As long as B keeps choosing A on his turn, A can reward him with a win in the next game. If B rebels and chooses C, A can punish B by making B the kingmaker a second time in a row, or by giving victory directly to C.
This is another violation of the suggested reduction rule though, as the assumption was that A would always prefer the option in which B is kingmaker and A has a chance of winning. But the introduction of external incentives, in the form of multiple playthroughs and long term win rates, changes that, because now it's possible to form a meaningful alliance that benefits both members. So the 'fine classification' given in the paper is not a full classification under this assumption. But maybe there are other reasonable models of the incentives for a kingmaker, which still agree with the proposed reduction rules.
ETA:
I suppose you can model the longterm payoff of several games by just using a longer game with a greater range of possible payoffs. Then all the reduction rules based on preferences can still apply, and the introduction of a 'revenge rule' explains the formation of alliances. A revenge rule doesn't make much sense when the payoffs are only victory or defeat, because there's never an incentive for a player to throw away the possibility of victory, even if they know you'll punish them for their moves. But if there are multiple payoffs, a revenge rule can make sense, by cutting off one player's chance at a high payoff and giving them a low payoff, they can be better off by caving to your will and accepting a medium payoff. So the optimal strategy between two equally valuable outcomes is whichever one retroactively would have helped you most if your opponents had anticipated your retribution and acted rationally in response to that anticipation. Similar dynamics should also arise from formalizing the value of coming in 2nd vs. coming in last. The existence of a mediumvalue payoff makes all the difference to the incentives to cooperate.
Who is online
Users browsing this forum: No registered users and 3 guests