================= Chess-Ant Article ================= :author: Akihiro Kuroiwa :date: 2021/12/25 :abstract: Like AlphaFold developed by DeepMind, MCTS can be applied to bioinformatics and cheminformatics. Chess-Ant is influenced by AlphaZero, but tries a different method. I would like to show the possibility that the combination of MCTS Solver and Genetic Programming can be put to practical use. Influence from AlphaZero ======================== A variant of the Polynomial Upper Confidence Tree (PUCT) introduced in AlphaGo Zero and AlphaZero complements the exploration rate :math:`C(s)` with the prior probability :math:`P(s,a)`. My chess-ant uses the conventional Upper Confidence Tree (UCT) and replaces the adjustment of the constant :math:`C` with Genetic Programming (GP) [#]_. From :math:`1/\sqrt{1}` to :math:`1/\sqrt{9}`, GP chooses the value of the constant :math:`C` according to the conditions. Depending on the value of the constant :math:`C`, :file:`chess_ant.py` will either actively or passively search. AlphaZero's PUCT [#]_ [#]_ [#]_ [#]_: .. math:: a_{t} = arg\, max_{a}(Q(s_{t}, a) + U(s_{t}, a)) U(s,a)=C(s)P(s,a)\frac{\sqrt{N(s)}}{{1+N(s,a)}} C(s)=log\frac{1+N(s)+c_{base}}{c_{base}}+c_{init} A deep neural network :math:`(p, v) = f_{\theta}(s)` with parameters :math:`\theta` evaluates the leaf node :math:`s_{L}` : .. math:: (p, v) = f_{\theta}(s_{L}) Initialize: .. math:: N(s_{L} , a) = 0 W(s_{L} , a) = 0 Q(s_{L} , a) = 0 P(s_{L} , a) = p_{a} Update: .. math:: t \leq L N(s_{t} , a_{t}) = N(s_{t} , a_{t}) + 1 W (s_{t} , a_{t}) = W(s_{t} , a_{t}) + v Q(s_{t} , a_{t}) = \frac{W(s_{t} , a_{t})}{N(s_{t} , a_{t})} +-----------------+---------------------------------------------------------------------+ | |Details | +=================+=====================================================================+ |(s,a) |Each state-action pair. | +-----------------+---------------------------------------------------------------------+ |N(s) |The parent visit count. | +-----------------+---------------------------------------------------------------------+ |N(s,a) |The visit count. | +-----------------+---------------------------------------------------------------------+ |W(s,a) |The total action-value. | +-----------------+---------------------------------------------------------------------+ |Q(s,a) |The mean action-value. | +-----------------+---------------------------------------------------------------------+ |P(s,a) |The prior probability of selecting a in s. | +-----------------+---------------------------------------------------------------------+ |C(s) |The exploration rate. | +-----------------+---------------------------------------------------------------------+ |p |A vector of Move probabilities (predicted by a deep neural network). | +-----------------+---------------------------------------------------------------------+ |v |A scalar value (predicted by a deep neural network). | +-----------------+---------------------------------------------------------------------+ Chess-Ant's UCT [#]_ [#]_ [#]_ [#]_ [#]_: .. math:: a_{t} = arg\, max_{a}(Q(s_{t}, a) + C_{gp}\sqrt\frac{2lnN(s_{t})}{N(s_{t},a)}) C_{gp} = \begin{cases} 1/\sqrt{1}, \\ 1/\sqrt{2}, \\ 1/\sqrt{3}, \\ 1/\sqrt{4}, \\ 1/\sqrt{5}, & \mbox{If the previously selected node state is under certain conditions} \\ 1/\sqrt{6}, \\ 1/\sqrt{7}, \\ 1/\sqrt{8}, \\ 1/\sqrt{9}, \end{cases} The judgment conditions are as follows: +---------------+----------------------------------------------------------+ |Conditions |Details | +===============+==========================================================+ |if_improvement |If UCT increases from the previous time | +---------------+----------------------------------------------------------+ |if_shortcut |If move_stack is shorter than last time | +---------------+----------------------------------------------------------+ |if_result |Whether the last time rollout was a win, a loss or a draw | +---------------+----------------------------------------------------------+ |if_pruning |Whether UCT was infinity, -infinity or something else | +---------------+----------------------------------------------------------+ |if_is_check |Is the board of the node selected last time checked? | +---------------+----------------------------------------------------------+ Also, since GP can be redone in each generation, it is possible to prevent the UCT from being highly evaluated at the beginning of the search based on the number of visits. It may get the effect of this paper [#]_. MCTS Solver =========== With the introduction of MCTS Solver [#]_ [#]_, chess-ant will cut branches to speed up and reduce wrong answers. The reason for the significant changes to the pseudocode in the paper is that the code for `the package `__ referenced by the inherited classes is written differently. In addition, there is no ``goto`` in Python, so there is extra code. In the pseudo code, everything is completed inside the MCTS Solver, but in :file:`chess_ant.py`, it is written separately in ``_executeRound()``. Execute rollout in ``mctsSolver()``, perform processing such as pruning, output reward like rollout, and input it to ``backpropogate()``. Like negamax, the MCTS solver is a recursive function and requires a stop condition. Parallelization =============== In :mod:`mcts-solver` 0.0.5, with the help of OpenAI's ChatGPT and Google Bard, I modified the code to allow parallel processing [#]_ [#]_. Tree parallelization is more difficult than root parallelization because it requires the use of locks. I'm not sure if these changes are working correctly, so they may change significantly in the future. `Context managers `__ are useful, but they can get you into trouble if you use them incorrectly. Change History ============== Since both :mod:`chess-classification` and :mod:`chem-classification` use the same algorithm, I made similar changes at the same time. In :mod:`chess-classification` 0.0.4, fen is separated into tokens to create a dataset. The fen is separated by columns instead of letter by letter. Corrected by adding ``math.sqrt(2)`` since there was an error in the UCB1 algorithm until :mod:`chess-ant` 0.0.5 manual. It seems that the author of :mod:`mcts` initially set ``explorationConstant = 1 / math.sqrt(2)`` in :file:`mcts.py` to cancel out ``math.sqrt(2)``. I added a new terminal called ``selectNodeEphemeralConstant`` to replace the ephemeral constant in :mod:`chess-ant` 0.0.6. According to `Issues #658 `_, Deap does not work with Python 3.10, but with version 3.11. In :file:`~/.bashrc` or :file:`~/.bash_aliases`: .. code-block:: bash # Python version alias python3='/usr/bin/python3.11' and execute :command:`source` command: .. code-block:: bash source ~/.bash_aliases Another method is :command:`update-alternatives`. .. code-block:: bash update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 2 update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 update-alternatives --config python3 python3 --version To revert to the original version that was set as top priority: .. code-block:: bash update-alternatives --auto python3 Please note that as of December 2022, gnome-terminal does not start with Python 3.11 on Ubuntu 22.04.1 LTS. A more common method is to use :mod:`venv`: .. code-block:: bash sudo apt install python3.11-venv python3.11 -m venv ~/.venv3.11 source ~/.venv3.11/bin/activate which pip3 To finish: .. code-block:: bash deactivate In :mod:`chess-classification` 0.0.5, you can load a local save. In order to shorten the training time, [#]_ the conventional model was changed to the checkpoint google/electra-small-discriminator [#]_ of the electra model. My pgn files are more like tactics [#]_ [#]_ than chess problems, so it's more efficient to create a dataset from tactics. More than 300 rows of pgn data are required for model training and model evaluation in order to increase the accuracy rate. Development Plan ================ .. todo:: As another project, I will introduce deep learning [#]_ [#]_ for Natural Language Processing (NLP) in FEN’s win / loss evaluation. :file:`genPgn.py` automatically plays with stockfish and outputs PGN files. By the way, version 0.0.1 of :file:`genPgn.py` contains the walrus operator, so it only works with Python 3.8 or higher. :file:`importPgn.py` creates a dataset from PGN files. :file:`chess_classification.py` generates a trained model with simple transformers. I plan to use this trained model to replace the rollout of :file:`chess_ant.py`, like AlphaZero. Reference ========= .. [#] `Cazenave, Tristan. “Evolving Monte-Carlo Tree Search Algorithms.” (2007). `__ .. [#] `AlphaZero: Shedding new light on chess, shogi, and Go `__ .. [#] `Silver, David et al. “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.” Science 362 (2018): 1140 - 1144. `__ .. [#] `Foster, David. (2017). AlphaGo Zero Explained In One Diagram `__ .. [#] `Tadao Yamaoka’s diary `__ .. [#] `Auer, Peter et al. “Finite-time Analysis of the Multiarmed Bandit Problem.” Machine Learning 47 (2004): 235-256. `__ .. [#] `Kocsis, Levente and Csaba Szepesvari. “Bandit Based Monte-Carlo Planning.” ECML (2006). `__ .. [#] `Swiechowski, Maciej et al. “Monte Carlo Tree Search: A Review of Recent Modifications and Applications.” ArXiv abs/2103.04931 (2021): n. pag. `__ .. [#] `Wikipedia contributors. "Monte Carlo tree search." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 18 Oct. 2021. Web. 25 Dec. 2021. `__ .. [#] `Wikipedia contributors. "モンテカルロ木探索." Wikipedia. Wikipedia, 8 Oct. 2021. Web. 25 Dec. 2021. `__ .. [#] `Imagawa, Takahisa and Tomoyuki Kaneko. “Improvement of State’s Value Estimation for Monte Carlo Tree Search.” (2017). `__ .. [#] `Winands, Mark & Björnsson, Yngvi & Saito, Jahn-Takeshi. (2008). Monte-Carlo Tree Search Solver. 25-36. 10.1007/978-3-540-87608-3_3. `__ .. [#] `Baier, Hendrik & Winands, Mark. (2015). MCTS-Minimax Hybrids. IEEE Transactions on Computational Intelligence and AI in Games. 7. 167-179. 10.1109/TCIAIG.2014.2366555. `__ .. [#] `Chaslot, Guillaume & Winands, Mark & Herik, H.. (2008). Parallel Monte-Carlo Tree Search. 60-71. 10.1007/978-3-540-87608-3_6. `__ .. [#] `Soejima, Yusuke & Kishimoto, Akihiro & Watanabe, Osamu. (2009). Root Parallelization of Monte Carlo Tree Search and Its Effectiveness in Computer Go. `__ .. [#] `Rajapakse, Thilina. (2020). Battle of the Transformers: ELECTRA, BERT, RoBERTa, or XLNet `__ .. [#] `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators `__ .. [#] `Download tactics database `__ .. [#] `Gorgonian's Chess Site `__ .. [#] `Savransky, Dmitriy. (2020). How to Use GPT-2 for Custom Data Generation. INTERSOG Inc. `__ .. [#] `Noever, David et al. “The Chess Transformer: Mastering Play using Generative Language Models.” arXiv: Artificial Intelligence (2020): n. pag. `__ Bibliography ============ - `Home Page of John R. Koza `__ - `Astro Teller \| Technical Papers `__ - `Lones, Michael. (2014). Genetic Programming: Memory, Loops and Modules. David Corne: Open Courseware. `__ - `Alpha“Othello” Zero `__ - `Czech, Johannes et al. “Monte-Carlo Graph Search for AlphaZero.” ArXiv abs/2012.11045 (2020): n. pag. `__ - `Prasad, Aditya. (2018). Lessons From Implementing AlphaZero `__ - `chess-alpha-zero `__ - `Yao, Yao. (2018). API Python Chess: Distribution of Chess Wins based on random moves `__ - `Stöckl, Andreas. (2018). Writing a chess program in one day `__ - `Stöckl, Andreas. (2019). An incremental evaluation function and a test-suite for computer chess `__ - `Stöckl, Andreas. (2019). Reconstructing chess positions `__ - `Python Chess `__ - `Fiekas, Niklas. (2015). An implementation of the Bratko-Kopec Test using python-chess `__ - `Bratko-Kopec Test `__ - `Kurt & Rolf Chess Homepage of Kurt Utzinger `__ - `PGN Mentor `__ - `Hart, Alex. (2011). Alpha Beta pruning on a Minimax tree in Python `__ - `PythonChessAi `__ - `easyAI `__ - `Shrott, Ryan. (2017). Genetic Programming applied to AI Heuristic Optimization `__ - `Alpha-Beta Pruning `__ - `Hartikka, Lauri. (2017). A step-by-step guide to building a simple chess AI `__ - `Simplified Evaluation Function `__ - `Brynjulfsson, Erlingur. A Genetic Minimax Game-Playing Strategy `__ - `Öberg, Viktor. “EVOLUTIONARY AI IN BOARD GAMES : An evaluation of the performance of an evolutionary algorithm in two perfect information board games with low branching factor.” (2015). `__ - `Agapitos, Alexandros & Lucas, Simon. (2006). Learning Recursive Functions with Object Oriented Genetic Programming. 3905. 166-177. 10.1007/11729976_15. `__ - `Yu, Tina and Christopher D. Clack. “Recursion , Lambda Abstractions and Genetic Programming.” . `__ - `YouTube channel of David Beazley `__ - `Welcome to AntWiki `__