An attempt at replicating AlphaGo by DeepMind.
Taken from the Rochester-NRT/RocAlphaGo project.
Implements the data generation functionality for the RL and value iteration stages.
Functions RL_Playout(numGames, policyModel, filename=None, opponentModel)
and Value_Playout(numGames, sl_model, rl_model, filename, U_MAX)
wrap the corresponding single-game data generators, running them numGames
times and storing the results in an .hdf5
file specified via filename.
Functions Gym_DataGen(policyModel)
, RL_DataGen(policyModel, opponentModel)
, and valueDataGen(sl_model, rl_model, U_MAX)
each implement one pass through a simulation and return the appropriate data for that simulation.
.hdf5 file contents for each function are as follows:
*RL_Playout()
- 'states', 'actions', 'rewards' (actions stored as integer move indices, not 1-hot encoded)
*Value_Playout()
- 'states' 'rewards'
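Since the 'actions' dataset stores flat integer move indices rather than 1-hot vectors, a consumer typically expands them before training a policy network. A minimal sketch of that conversion (the board size and function name here are assumptions, not part of the project):

```python
import numpy as np

# Assumed 19x19 Go board; each action is a flat index into the board.
BOARD_SIZE = 19
NUM_MOVES = BOARD_SIZE * BOARD_SIZE

def actions_to_one_hot(actions):
    """Convert integer move indices (as stored in 'actions') to 1-hot rows."""
    actions = np.asarray(actions)
    one_hot = np.zeros((len(actions), NUM_MOVES), dtype=np.float32)
    one_hot[np.arange(len(actions)), actions] = 1.0
    return one_hot
```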
Implements the Go player class.
Important Fields:
self.states
- A list of all states encountered while playing
self.actions
- A list of all actions made
self.nnmodel
- NN backend that makes the decisions
self.color
- NNGoPlayer.BLACK or NNGoPlayer.WHITE
self.rocColor
- Rocgo.BLACK or Rocgo.WHITE
self.pachiColor
- pachi_py.BLACK or pachi_py.WHITE
Important Functions:
makemoveGym()
makemoveRL(playRandom)
makeRandomValidMove()
nn_vs_nnGame(rocEnv, playBlack, nnBlack, nnWhite)
is also implemented; it plays out a game between two NNGoPlayer instances starting from the board configuration specified in rocEnv.
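The internals of makeRandomValidMove() are not shown here, but it presumably samples uniformly from the legal moves for the current position. A minimal sketch under that assumption (the standalone function and its parameter are hypothetical; the real method lives on NNGoPlayer):

```python
import random

def make_random_valid_move(legal_coords):
    """Pick a move uniformly at random from the legal coordinates.

    legal_coords is assumed to be a non-empty sequence of flat move
    indices, such as what get_legal_coords(rocEnv) would return.
    """
    return random.choice(list(legal_coords))
```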
Implements I/O related functions.
Useful Functions:
write2hdf5(filename, dict2store)
hdf52dict(hdf5Filename)
hdf5Augment(filename, outfilename)
pachiGameRecorder(filename)
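The implementations of these helpers are not reproduced here, but a plausible h5py-based sketch of what write2hdf5() and hdf52dict() might look like is below (the exact dataset handling is an assumption, not the project's verified behavior):

```python
import h5py
import numpy as np

def write2hdf5(filename, dict2store):
    """Store each (key, array) pair of dict2store as an HDF5 dataset."""
    with h5py.File(filename, 'w') as f:
        for key, value in dict2store.items():
            f.create_dataset(key, data=np.asarray(value))

def hdf52dict(hdf5Filename):
    """Load every top-level dataset back into a {name: array} dict."""
    with h5py.File(hdf5Filename, 'r') as f:
        return {key: f[key][()] for key in f.keys()}
```

Round-tripping a dict through these two functions should reproduce the stored arrays.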
Wrapper functions for the Rochester Go Board implementations.
Useful Functions:
initRocBoard()
rocBoard2State(rocEnv)
printRocBoard(rocEnv)
returnRocBoard(rocEnv)
get_legal_coords(rocEnv)
intMove2rocMove(rocEnv)
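intMove2rocMove() suggests a conversion between flat integer moves and board coordinates. A hedged sketch of such a mapping, assuming a 19x19 row-major board and a sentinel index for pass (all names and the pass encoding here are illustrative, not the project's actual API):

```python
# Assumed 19x19 board with row-major flat indexing.
BOARD_SIZE = 19
PASS_MOVE = BOARD_SIZE * BOARD_SIZE  # hypothetical pass encoding

def int_to_coord(move):
    """Flat move index -> (row, col), or None for a pass."""
    if move == PASS_MOVE:
        return None
    return divmod(move, BOARD_SIZE)

def coord_to_int(coord):
    """(row, col) -> flat move index; None encodes a pass."""
    if coord is None:
        return PASS_MOVE
    row, col = coord
    return row * BOARD_SIZE + col
```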
A Monte-Carlo Tree Search implementation. Class MCNode
represents a node in the search tree. MCTreeSearch()
can be called to initiate the search.
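The fields of MCNode and the selection rule used by MCTreeSearch() are not detailed here, but the standard UCT-style selection step such a tree search relies on can be sketched as follows (the field names and exploration constant are assumptions, not the project's actual implementation):

```python
import math

class MCNode:
    """A node in the search tree: tracks visit count and total reward."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}      # move -> MCNode
        self.visits = 0
        self.total_reward = 0.0

    def ucb1(self, c=1.4):
        # Unvisited nodes score infinity so they are explored first.
        if self.visits == 0:
            return float('inf')
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def select_child(node, c=1.4):
    """Pick the (move, child) pair maximizing the UCB1 score."""
    return max(node.children.items(), key=lambda kv: kv[1].ucb1(c))
```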