Chess-Ant Article

author: Akihiro Kuroiwa

date: 2021/12/25

abstract:

As with AlphaFold, developed by DeepMind, MCTS can be applied to bioinformatics and cheminformatics. Chess-Ant is influenced by AlphaZero but tries a different method. I would like to show that the combination of the MCTS Solver and Genetic Programming can be put to practical use.

Influence from AlphaZero

A variant of the Polynomial Upper Confidence Tree (PUCT) introduced in AlphaGo Zero and AlphaZero supplements the exploration rate \(C(s)\) with the prior probability \(P(s,a)\). Chess-Ant instead uses the conventional Upper Confidence Tree (UCT) and replaces the manual tuning of the constant \(C\) with Genetic Programming (GP) [1]. GP chooses the value of the constant \(C\) from \(1/\sqrt{1}\) to \(1/\sqrt{9}\) according to the conditions described below. Depending on the chosen value, chess_ant.py searches either more aggressively or more conservatively.

AlphaZero’s PUCT [2] [3] [4] [5]:

\[ \begin{align}\begin{aligned}a_{t} = \arg\max_{a}(Q(s_{t}, a) + U(s_{t}, a))\\U(s,a)=C(s)P(s,a)\frac{\sqrt{N(s)}}{1+N(s,a)}\\C(s)=\log\frac{1+N(s)+c_{base}}{c_{base}}+c_{init}\end{aligned}\end{align} \]

A deep neural network \((p, v) = f_{\theta}(s)\) with parameters \(\theta\) evaluates the leaf node \(s_{L}\):

\[(p, v) = f_{\theta}(s_{L})\]

Initialize:

\[ \begin{align}\begin{aligned}N(s_{L} , a) = 0\\W(s_{L} , a) = 0\\Q(s_{L} , a) = 0\\P(s_{L} , a) = p_{a}\end{aligned}\end{align} \]

Update:

\[ \begin{align}\begin{aligned}\text{for each } t \leq L:\\N(s_{t}, a_{t}) = N(s_{t}, a_{t}) + 1\\W(s_{t}, a_{t}) = W(s_{t}, a_{t}) + v\\Q(s_{t}, a_{t}) = \frac{W(s_{t}, a_{t})}{N(s_{t}, a_{t})}\end{aligned}\end{align} \]

Details

(s,a): each state-action pair.

N(s): the parent visit count.

N(s,a): the visit count.

W(s,a): the total action-value.

Q(s,a): the mean action-value.

P(s,a): the prior probability of selecting a in s.

C(s): the exploration rate.

p: a vector of move probabilities (predicted by the deep neural network).

v: a scalar value (predicted by the deep neural network).
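As a rough illustration, the PUCT score can be computed as follows. This is a minimal sketch, not AlphaZero's actual implementation; the node fields are hypothetical, and c_base and c_init default to the values published for AlphaZero (19652 and 1.25).

import math

def puct_score(parent, child, c_base=19652, c_init=1.25):
    # Exploration rate C(s) grows slowly with the parent visit count N(s).
    c = math.log((1 + parent.visits + c_base) / c_base) + c_init
    # Mean action-value Q(s,a); zero before the first visit.
    q = child.total_value / child.visits if child.visits else 0.0
    # U(s,a) = C(s) * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
    u = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u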

Chess-Ant’s UCT [6] [7] [8] [9] [10]:

\[ \begin{align}\begin{aligned}a_{t} = \arg\max_{a}\left(Q(s_{t}, a) + C_{gp}\sqrt{\frac{2\ln N(s_{t})}{N(s_{t},a)}}\right)\\C_{gp} \in \left\{\frac{1}{\sqrt{1}}, \frac{1}{\sqrt{2}}, \ldots, \frac{1}{\sqrt{9}}\right\}\end{aligned}\end{align} \]

where GP chooses \(C_{gp}\) depending on the state of the previously selected node.

The judgment conditions are as follows:

if_improvement: whether the UCT value increased compared with the previous iteration.

if_shortcut: whether move_stack is shorter than in the previous iteration.

if_result: whether the previous rollout was a win, a loss, or a draw.

if_pruning: whether the UCT value was infinity, -infinity, or something else.

if_is_check: whether the board of the previously selected node is in check.

Also, since GP is rerun in each generation, it can prevent UCT from rating nodes highly at the beginning of the search merely because of their visit counts. This may achieve an effect similar to that of [11].
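As a rough sketch of the formula above (not the actual chess_ant.py code; the field names are illustrative), the UCT value with a GP-chosen constant looks like this:

import math

# The terminal set GP chooses from: 1/sqrt(1) ... 1/sqrt(9).
C_GP_CHOICES = [1 / math.sqrt(k) for k in range(1, 10)]

def uct_value(parent, child, c_gp):
    # Q(s,a) exploitation term plus the exploration term scaled by c_gp.
    # Assumes child.num_visits > 0 (unvisited children are handled separately).
    exploit = child.total_reward / child.num_visits
    explore = c_gp * math.sqrt(2 * math.log(parent.num_visits) / child.num_visits)
    return exploit + explore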

MCTS Solver

With the introduction of the MCTS Solver [12] [13], chess-ant prunes branches to speed up the search and reduce wrong answers.

The pseudocode from the paper required significant changes because the package whose classes chess-ant inherits is written differently.

In addition, Python has no goto, so extra code is needed. In the pseudocode everything is completed inside the MCTS Solver, but in chess_ant.py it is written separately in _executeRound(): mctsSolver() executes the rollout, performs processing such as pruning, outputs a reward as a rollout would, and passes it to backpropogate().

Like negamax, the MCTS Solver is a recursive function and requires a termination condition.
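The following is a minimal sketch of this recursive structure, based on the MCTS Solver paper rather than on chess_ant.py itself; the node fields (value, num_visits, total_reward, children) and the helpers select_uct() and rollout() are hypothetical. Rewards follow the negamax convention, from the point of view of the player to move at each node.

WIN, LOSS = float("inf"), float("-inf")

def mcts_solver(node):
    # Stop condition: a terminal node is a proven result (mate or draw).
    if node.is_terminal():
        node.value = node.terminal_value()  # WIN, LOSS, or 0 for a draw
        return node.value

    child = select_uct(node)                # ordinary UCT selection
    if child.value in (WIN, LOSS):
        reward = -child.value               # child already proven: prune
    elif child.num_visits == 0:
        reward = -rollout(child)            # first visit: simulate once
    else:
        reward = -mcts_solver(child)        # recurse down the tree

    if reward == WIN:
        node.value = WIN                    # one winning move proves a win
    elif reward == LOSS:
        # Proven loss only if every move loses (assumes full expansion).
        if all(c.value == WIN for c in node.children):
            node.value = LOSS
        else:
            reward = -1.0                   # not proven: ordinary loss signal

    node.num_visits += 1                    # backpropagation statistics
    node.total_reward += reward
    return reward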

Parallelization

In mcts-solver 0.0.5, with the help of OpenAI's ChatGPT and Google Bard, I modified the code to allow parallel processing [14] [15]. Tree parallelization is more difficult than root parallelization because it requires locks. I am not sure whether these changes work correctly, so they may change significantly in the future. Context managers are useful, but they can cause trouble when used incorrectly.
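For example, a per-node lock used as a context manager might look like the following. This is a minimal sketch of tree parallelization, not the actual mcts-solver 0.0.5 code; the Node class and field names are illustrative.

import threading

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.lock = threading.Lock()   # per-node lock (tree parallelization)
        self.num_visits = 0
        self.total_reward = 0.0

def backpropagate(node, reward):
    # The `with` statement guarantees the lock is released even if an
    # exception is raised while the statistics are being updated.
    while node is not None:
        with node.lock:
            node.num_visits += 1
            node.total_reward += reward
        reward = -reward               # negamax: flip the sign at each ply
        node = node.parent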

Change History

Since both chess-classification and chem-classification use the same algorithm, I made similar changes at the same time.

In chess-classification 0.0.4, the FEN is separated into tokens to create a dataset. The FEN is split by columns instead of letter by letter.
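One plausible reading of this tokenization is sketched below: the piece-placement field is split at '/' and the remaining space-separated fields are kept whole. The exact scheme in chess-classification 0.0.4 may differ.

def tokenize_fen(fen):
    # Split a FEN into coarse tokens instead of single characters.
    placement, *fields = fen.split(" ")
    return placement.split("/") + fields

tokens = tokenize_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
# ['rnbqkbnr', 'pppppppp', '8', '8', '8', '8', 'PPPPPPPP', 'RNBQKBNR',
#  'w', 'KQkq', '-', '0', '1']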

There was an error in the UCB1 algorithm in the manual up to chess-ant 0.0.5; it was corrected by adding math.sqrt(2). It seems that the author of mcts initially set explorationConstant = 1 / math.sqrt(2) in mcts.py to cancel out the math.sqrt(2). In chess-ant 0.0.6, I added a new terminal called selectNodeEphemeralConstant to replace the ephemeral constant.
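The corrected UCB1 term looks roughly like this (an illustrative sketch; the field names are not the mcts.py internals):

import math

def ucb1(child, parent_visits, exploration_constant):
    # With explorationConstant = 1 / math.sqrt(2), the constant cancels
    # the sqrt(2) factor inside the radical.
    exploit = child.total_reward / child.num_visits
    explore = exploration_constant * math.sqrt(
        2 * math.log(parent_visits) / child.num_visits)
    return exploit + explore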

According to Issues #658, DEAP does not work with Python 3.10, but it does work with Python 3.11.

In ~/.bashrc or ~/.bash_aliases:

# Python version
alias python3='/usr/bin/python3.11'

and execute the source command:

source ~/.bash_aliases

Another method is update-alternatives.

sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 2
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1
sudo update-alternatives --config python3
python3 --version

To revert to the original version that was set as top priority:

sudo update-alternatives --auto python3

Please note that, as of December 2022, gnome-terminal does not start when python3 is switched to Python 3.11 on Ubuntu 22.04.1 LTS.

A more common method is to use venv:

sudo apt install python3.11-venv
python3.11 -m venv ~/.venv3.11
source ~/.venv3.11/bin/activate
which pip3

To finish:

deactivate

In chess-classification 0.0.5, you can load a locally saved model.

To shorten the training time [16], the previous model was replaced with the google/electra-small-discriminator checkpoint [17] of the ELECTRA model.
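Loading that checkpoint with Simple Transformers looks roughly like this; num_labels and the other arguments are illustrative, not the chess_classification.py defaults.

from simpletransformers.classification import ClassificationModel

model = ClassificationModel(
    "electra",                             # model type
    "google/electra-small-discriminator",  # pretrained checkpoint
    num_labels=3,                          # e.g. win / loss / draw (assumed)
    use_cuda=False,                        # set True if a GPU is available
)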

My PGN files are more like tactics [18] [19] than chess problems, so it is more efficient to create a dataset from tactics. More than 300 rows of PGN data are required for model training and evaluation in order to increase the accuracy rate.

Development Plan

Todo

As another project, I will introduce deep learning [20] [21] for Natural Language Processing (NLP) to the win/loss evaluation of FEN strings.

genPgn.py automatically plays games with Stockfish and outputs PGN files. Note that version 0.0.1 of genPgn.py contains the walrus operator, so it only works with Python 3.8 or higher. importPgn.py creates a dataset from the PGN files. chess_classification.py generates a trained model with Simple Transformers. I plan to use this trained model to replace the rollout of chess_ant.py, as AlphaZero does.
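A minimal sketch of this self-play step with python-chess and Stockfish is shown below; genPgn.py's actual options, paths, and time controls may differ.

import chess
import chess.engine
import chess.pgn

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
board = chess.Board()
while not board.is_game_over():
    # Let Stockfish play both sides with a short time limit per move.
    result = engine.play(board, chess.engine.Limit(time=0.1))
    board.push(result.move)
engine.quit()

game = chess.pgn.Game.from_board(board)  # rebuild the game record
with open("selfplay.pgn", "w") as f:
    print(game, file=f)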

Reference

Bibliography