
Prisoner-Dilemma-Genetic-Algorithm

Replication of genetic algorithms in the Prisoner’s Dilemma

The Prisoner’s Dilemma is a social, economic decision-making game invented by Merrill Flood and Melvin Dresher in the 1950s. The premise of the game is that you and your partner in crime are arrested and held in separate cells with no communication. Your sentence depends on your and your partner’s cooperation. If you both agree to cooperate and stay silent, you each get 2 years in prison. If you cooperate and your partner defects (i.e., testifies against you), you get 5 years in prison and your partner gets zero, and vice versa if you defect and they cooperate. If you both defect, you each get 4 years in prison.

Here, the game has been changed slightly so that participants play for points rather than artificial years in prison. Points are motivating, easier to incentivize as a dollar amount, and gain meaning as they accumulate in an iterated Prisoner’s Dilemma game. In this version, if both players cooperate, they each get 3 points. If one player cooperates and the other defects, the cooperator gets 0 points and the defector gets 5 points. If both players defect, they each get 1 point. The Prisoner’s Dilemma is therefore classified as a social decision-making game because of the continuous interaction with one partner when the game is played iteratively, and as an economic decision-making game because of the money one can potentially earn.

What is the “rational” decision to earn the most points? If the game lasts for only one decision, the optimal choice is to defect: you can earn the most points (5) if your partner cooperates, and defecting guarantees you at least 1 point. Once the game becomes iterated, however, both players can earn more points by cooperating than by constantly defecting for 1 point each. What strategies earn the most points? How is it possible to get both players to cooperate? The current project is a replication of Robert Axelrod’s genetic algorithm for evolving Prisoner’s Dilemma strategies (no code was used from other sources, only the conceptualization). The extension in this project is to manipulate the fitness function and evaluate how that impacts the evolved strategies and whether it does or does not lead to strategies that cooperate.
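As a concrete illustration of the point values above, here is a minimal sketch of the payoff table; the names are illustrative and not taken from the repository's code:

```python
# Payoff table for one decision, keyed by (my move, partner's move).
# 'C' = cooperate, 'D' = defect. Values are (my points, partner's points).
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # I cooperate, partner defects
    ("D", "C"): (5, 0),  # I defect, partner cooperates
    ("D", "D"): (1, 1),  # mutual defection
}

def score(my_move: str, partner_move: str) -> tuple[int, int]:
    """Return (my points, partner's points) for a single decision."""
    return PAYOFFS[(my_move, partner_move)]

print(score("C", "D"))  # -> (0, 5)
```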

Robert Axelrod previously used a genetic algorithm to evolve strategies for the Prisoner’s Dilemma (Axelrod, 1984; Axelrod & Dion, 1988). He included eight different strategies, which he evolved over 50 runs of 40 simulations. Each strategy had access to the last 3 decisions it and its partner strategy made. Axelrod used a 70-character string, where the first 64 characters represented the decision a given strategy would make for each of the 64 possible combinations of the 3 previous joint decisions (e.g., CC CC CC; CC CC CD; CC CC DD), and the last 6 characters represented the assumed memory of the 3 previous decisions at the start of a game. In my setup I used a 64-character string and a dictionary that maps each combination of the last 3 games to an index of that string. I stored each strategy I evolved as a dictionary in an array, so I could keep extra information: the original strategy, the previous 3 decisions, the points earned for each decision, the total points, and the original versus final string, to see how the strategy evolved. The strategies that I used for my environment can be found in Table 1. In each tournament a given strategy played every other strategy one time, and the order in which the strategies played each other was randomized. Lastly, I ran the tournaments without mutation but with a set recombination value, to get a better sense of how the fitness function alone impacted the evolution of the strategies.
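The 64-character encoding and the history-to-index dictionary described above can be sketched as follows; the names and the exact dictionary layout are assumptions for illustration, not the repository's code:

```python
from itertools import product

# Each of the 4**3 = 64 possible histories of the last three joint decisions
# maps to one index of a 64-character strategy string; the character at that
# index ('C' or 'D') is the move the strategy plays next.
OUTCOMES = ["CC", "CD", "DC", "DD"]  # (my move, partner's move) for one decision
HISTORY_TO_INDEX = {
    history: i for i, history in enumerate(product(OUTCOMES, repeat=3))
}

def next_move(strategy_string: str, last_three: tuple[str, str, str]) -> str:
    """Look up the move a 64-character strategy plays for a given 3-decision history."""
    return strategy_string[HISTORY_TO_INDEX[last_three]]

# Example: a strategy that always defects, queried after three rounds of
# mutual cooperation.
always_defect = "D" * 64
print(next_move(always_defect, ("CC", "CC", "CC")))  # -> "D"
```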

The first fitness function evaluated which strategy had the most points after each game (i.e., in strategy 1 vs. strategy 2, who scored more points). After each game, the strategy with fewer points underwent recombination, potentially taking on the genes (i.e., the decision choices at the string indices that map onto specific combinations of the last 3 decisions) of the winning strategy. A downside of this fitness function is that losing strategies may already have gone through recombination by the time they played their second game of a tournament. At the end of the simulation, all of the strategies had gone through recombination. On average, each strategy was scoring just over 50 points per game. Looking closely at the final strategy strings, the ratio of cooperation to defection leans strongly toward defecting. There is no pattern resembling Tit for Tat, where a strategy cooperates if the other player is cooperating, even though Tit for Tat has previously been the “most fit” strategy in other simulations. The strategies also averaged fewer points per game than they started with, which further suggests they were not reciprocating cooperation. Because this fitness function operates at the game level, a strategy that does badly in a single game goes through recombination despite potentially winning more games overall, and this seems to greatly influence the evolved strategies.

The third fitness function (skipping the second fitness function in the code due to space) evaluated how many wins each strategy had after one tournament. The strategy with the most wins “infected” all other strategies by having them go through potential recombination, so recombination did not happen until after each tournament. At the end of the simulation, all of the strategies had gone through recombination. On average, each strategy was scoring around 80 points per game. Looking closely at the final strategy strings, the ratio of cooperation to defection was about 50:50, suggesting that the evolved strategies were cooperating and defecting based on previous decisions. The strategies were similar to Tit for Tat but did not always base their decision on the partner’s last decision. The strategies also had a higher average score, suggesting that some level of cooperation was elicited (two strategies that always defect would not score this many points). In conclusion, this fitness function was able to replicate Axelrod’s genetic algorithm. How the simulation is set up, and specifically which fitness function is used, greatly impacts the evolved strategies.
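A rough sketch of how the two fitness functions described above could drive recombination; the function names, the recombination rate, and the population layout are hypothetical, and the project's actual code may differ:

```python
import random

def recombine(loser: str, winner: str, rate: float = 0.5) -> str:
    """At each index, the losing strategy takes the winner's gene (decision)
    with a fixed probability; no mutation is applied."""
    return "".join(
        w if random.random() < rate else l for l, w in zip(loser, winner)
    )

def update_after_game(strategies: dict[str, str], loser: str, winner: str) -> None:
    """Fitness function 1: after every game, the strategy with fewer points
    is recombined toward that game's winner."""
    strategies[loser] = recombine(strategies[loser], strategies[winner])

def infect_after_tournament(strategies: dict[str, str], winner: str) -> None:
    """Fitness function 3: after a full tournament, the strategy with the
    most game wins "infects" every other strategy."""
    for name in strategies:
        if name != winner:
            strategies[name] = recombine(strategies[name], strategies[winner])
```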

Table 1. Strategies

| Strategy | Description |
| --- | --- |
| Tit for Tat | Next decision will be the same as the partner's previous decision. |
| Random Decision | Randomly assigned to cooperate or defect at each index of the strategy string. |
| Defect | Always defect. |
| Cooperate | Always cooperate. |
| Bias for Cooperation | If the partner has cooperated in at least 1 of the 3 previous decisions, cooperate on the next decision. |
| Copy Majority | If the partner cooperated on 2 of the last 3 decisions, cooperate. If the partner defected on 2 of the last 3 decisions, defect. |
Table 2. Terminology used

| Term | Definition |
| --- | --- |
| Decision | One choice to cooperate or defect in a game of 50 decisions. |
| Game | Two strategies playing 50 iterative decisions against one another. |
| Tournament | Each strategy plays every other strategy in one game. |
| Runs | How many tournaments are played in one simulation. |
| Simulations | How many simulations of x runs were done to find the winner on average; there is some randomness in who plays whom first, which impacts how the strategies evolve. |
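For illustration, here is a sketch of two of the strategies from Table 1, written as functions of the last three joint outcomes; the representation (each outcome as "my move, partner's move") is an assumption for this sketch, not the repository's exact implementation:

```python
def tit_for_tat(last_three: tuple[str, str, str]) -> str:
    """Play whatever the partner played in the most recent decision."""
    return last_three[-1][1]  # partner's move is the second character of an outcome

def copy_majority(last_three: tuple[str, str, str]) -> str:
    """Cooperate if the partner cooperated in at least 2 of the last 3
    decisions, otherwise defect."""
    cooperations = sum(outcome[1] == "C" for outcome in last_three)
    return "C" if cooperations >= 2 else "D"

print(tit_for_tat(("CC", "CC", "CD")))    # partner defected last round -> "D"
print(copy_majority(("CC", "CD", "DD")))  # partner cooperated only once  -> "D"
```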

References

Axelrod, R. (1984). The evolution of cooperation. New York: Basic Books.

Axelrod, R., & Dion, D. (1988). The further evolution of cooperation. Science, 242(4884), 1385–1390. https://doi.org/10.1126/science.242.4884.1385
