Chem-Ant Article

author:

Akihiro Kuroiwa

date:

2022/07/08

abstract:

I started writing chess-ant in 2019, but at first I was fixated on minimax and the work proceeded slowly. In 2020, with the COVID-19 outbreak on the cruise ship Diamond Princess, when the pandemic was finally beginning to attract attention, I decided to use the chess-ant algorithm for the development of therapeutic agents. After that, I read the MCTS solver paper and the performance improved. At the same time, I learned how to use cheminformatics software. Even after the SARS-CoV-2 pandemic has subsided, the next pandemic is waiting. Let’s contribute to society with our skills.

UCSF Chimera

On my old laptop, a Fujitsu LIFEBOOK AH42/C, I get the following error when I try to install UCSF ChimeraX:

ERROR: ChimeraX requires OpenGL graphics version 3.3.
Your computer graphics driver provided version 2.1
Try updating your graphics driver.

Therefore, in this experiment, I use UCSF Chimera. The advantage is that you can specify the binding site with the mouse.

chmod u+x chimera-alpha-linux_x86_64.bin
./chimera-alpha-linux_x86_64.bin

In ~/.profile on Ubuntu:

if [ -d "$HOME/.local/bin" ] ; then
    PATH="$HOME/.local/bin:$PATH"
fi

In my case, there are two installation locations:

~/.local/bin/
~/.local/UCSF-Chimera64-2022-05-18/

Run chimera from the terminal, or right-click the desktop icon, give it execute permission, and double-click it to launch UCSF Chimera.

Prior to this experiment, download the latest release of AutoDock Vina from GitHub and install it according to the manual. The following commands create a symlink named vina and check where the binary is found:

cd ~/.local/bin/
ln -s vina_1.2.3_linux_x86_64 vina
which vina
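
If later steps are scripted, the same check can be done from Python. This is a minimal sketch using only the standard library; the helper name find_vina is hypothetical and not part of chem-ant.

```python
import shutil
from typing import Optional


def find_vina(name: str = "vina") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_vina()
if path is None:
    print("vina not found on PATH; check the symlink in ~/.local/bin/")
else:
    print(f"vina found at {path}")
```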

To verify the operation, proceed with the experiment as follows:

  1. Create fragments of the target molecule Nirmatrelvir and generate some molecules from them.

  2. Let similarity-mcts select material candidates, including the target molecule, and generate some molecules from them.

  3. Get the set difference.

similarity-genMols -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -m "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -b70 -p "gen_smiles" -f "gen1-1.csv"
similarity-mcts -l2 -e3 -r10 -b100 -p "gen_smiles" -f "gen1-2.csv"

After running, you would see something like this:

Material candidates: {'CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C', 'CCCC1=NC(=C(N1CC2=CC=C(C=C2)C3=CC=CC=C3C4=NNN=N4)C(=O)O)C(C)(C)O'}

There are some things to keep in mind when running similarity-genMols. Python doesn’t distinguish between single and double quotes, but bash and dash do. In addition, multiple values are separated by spaces on the command line, so you don’t need the commas of a Python list:

similarity-genMols -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -m "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" "CCCC1=NC(=C(N1CC2=CC=C(C=C2)C3=CC=CC=C3C4=NNN=N4)C(=O)O)C(C)(C)O" -b100 -f gen1-2.csv
similarity-genMols -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -m "CCCC1=NC(=C(N1CC2=CC=C(C=C2)C3=CC=CC=C3C4=NNN=N4)C(=O)O)C(C)(C)O" -b100 -f gen1-3.csv
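
To see what happens to quotes and commas, here is a small sketch using Python's standard shlex module, which splits a string according to POSIX shell quoting rules. The short SMILES strings are placeholders, not molecules from this experiment.

```python
import shlex

# Two space-separated, double-quoted values after -m: the shell passes them
# as two separate arguments and strips the quotes.
good = shlex.split('similarity-genMols -m "CC(C)O" "CC#N" -b100')
print(good)

# A Python-style comma is not a separator for the shell; it simply becomes
# part of the first argument.
bad = shlex.split('similarity-genMols -m "CC(C)O", "CC#N" -b100')
print(bad)
```

In a real shell the quotes also protect the parentheses of a SMILES string, which bash and dash would otherwise try to interpret; shlex does not model that part.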

The set difference is then taken with pandas:

import pandas as pd
df1_1 = pd.read_csv("gen_smiles/gen1-1.csv", header=0, index_col=0)
df1_2 = pd.read_csv("gen_smiles/gen1-2.csv", header=0, index_col=0)
df1_3 = pd.read_csv("gen_smiles/gen1-3.csv", header=0, index_col=0)
df1_4 = pd.concat([df1_1, df1_1, df1_2, df1_3, df1_3], axis=0)
df1_4.drop_duplicates(subset="smiles", keep=False, inplace=True)
df1_4.sort_values(["lipinski", "dice_similarity"], inplace=True, ascending=False)
df1_4.reset_index(drop=True).to_csv("gen_smiles/gen1-4.csv")
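
The pandas code above takes the set difference with a counting trick: gen1-1.csv and gen1-3.csv are concatenated twice, so every row they contain occurs at least twice and drop_duplicates(keep=False) removes it, leaving only the rows unique to gen1-2.csv. The same logic in plain Python, as a sketch with made-up placeholder SMILES:

```python
from collections import Counter

# Placeholder SMILES lists standing in for the "smiles" columns of the CSVs.
gen1_1 = ["CCO", "CCN"]          # generated from the target alone
gen1_2 = ["CCO", "CCC", "CCCl"]  # generated from both materials
gen1_3 = ["CCC"]                 # generated from the second material alone

# Listing gen1_1 and gen1_3 twice guarantees their entries have count >= 2.
counts = Counter(gen1_1 + gen1_1 + gen1_2 + gen1_3 + gen1_3)

# Keeping only entries seen exactly once mirrors drop_duplicates(keep=False).
unique_to_gen1_2 = [s for s in gen1_2 if counts[s] == 1]
print(unique_to_gen1_2)
```

Only molecules that need fragments from both materials survive this filter, which is the point of step 3.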

Create a ligand file with Open Babel. Open gen3.csv and pick the SMILES of a molecule that scored highly in similarity-mcts. Don’t forget to add hydrogen atoms and assign partial charges. On Ubuntu:

sudo apt install openbabel
obabel -L
obabel -L charges
obabel -h -c -ican -:"CCCC1C2C(CN1C(=O)C1C3C(CN1C(=O)C(F)(F)F)C3(C)C)C2(C)C" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger
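
When several candidates have to be converted, it can help to build the same obabel call from Python. The sketch below only assembles the argument list, using exactly the options shown above; the helper name obabel_args is hypothetical, and actually running the command requires Open Babel to be installed.

```python
import shlex
from typing import List


def obabel_args(smiles: str, out_path: str = "ligand.pdbqt") -> List[str]:
    """Build the argument list for the obabel call shown above."""
    return [
        "obabel", "-h", "-c", "-ican",
        f"-:{smiles}",  # SMILES passed inline; no shell quoting is needed here
        "-opdbqt", "-O", out_path,
        "--gen3D", "--partialcharge", "gasteiger",
    ]


args = obabel_args("CCCC1C2C(CN1C(=O)C1C3C(CN1C(=O)C(F)(F)F)C3(C)C)C2(C)C")
# shlex.join adds the quoting a shell would need, for display purposes.
print(shlex.join(args))
# To run it: import subprocess; subprocess.run(args, check=True)
```

Passing a list to subprocess.run avoids the shell entirely, so the parentheses in the SMILES string need no quoting.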

Let’s go back to UCSF Chimera. Fetch the receptor, open the ligand file created above, and start the AutoDock Vina tool from the menus:

  1. File ‣ Fetch Structure by ID ‣ PDB(mmCIF) ‣ 7tll

  2. File ‣ Open ‣ ligand.pdbqt ‣ file type PDB

  3. Tools ‣ Surface/Binding Analysis ‣ AutoDock Vina

Set Output file to all. Specify Receptor and Ligand. Check "Resize search volume using" and choose a mouse button. Enter the path to vina in Executable location.

In my case, when I specified the binding site with the mouse, the box outline was not displayed until I switched presets in the Presets menu. To review the experimental results later, open all.receptor.pdb and:

  1. Tools ‣ Surface/Binding Analysis ‣ ViewDock ‣ all.pdbqt

  2. Movie ‣ Play

AutoDock Vina

Reuse the receptor file output by UCSF Chimera and run the experiment on the command line. Prepare your own ligand file. The contents of conf.txt are as follows:

receptor = all.receptor.pdbqt
ligand = ligand.pdbqt

out = all.pdbqt

center_x = -2.68714
center_y = -1.23572
center_z = 13.8821

size_x = 25.747
size_y = 22.6627
size_z = 22.1881
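
The center and size values define the search box in ångströms. As a sketch, the following code parses the configuration above and computes the box limits along each axis as center ± size/2. The function names are my own and are not part of AutoDock Vina.

```python
from typing import Dict, Tuple

CONF_TEXT = """\
receptor = all.receptor.pdbqt
ligand = ligand.pdbqt

out = all.pdbqt

center_x = -2.68714
center_y = -1.23572
center_z = 13.8821

size_x = 25.747
size_y = 22.6627
size_z = 22.1881
"""


def parse_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines."""
    conf = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def box_bounds(conf: Dict[str, str]) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) of the search box along x, y and z."""
    bounds = {}
    for axis in "xyz":
        center = float(conf[f"center_{axis}"])
        half = float(conf[f"size_{axis}"]) / 2
        bounds[axis] = (center - half, center + half)
    return bounds


conf = parse_conf(CONF_TEXT)
for axis, (low, high) in box_bounds(conf).items():
    print(f"{axis}: {low:.3f} to {high:.3f}")
```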

This time similarity-mcts chose Catechin and the mysterious molecule Gnididin [1]:

similarity-mcts -l2 -e3 -r10 -b100 -p "gen_smiles" -f "gen2-2.csv"
Material candidates: {'C1C(C(OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O', 'CCCCCC=CC=CC(=O)OC1C(C23C4C=C(C(=O)C4(C(C5(C(C2C6C1(OC(O6)(O3)C7=CC=CC=C7)C(=C)C)O5)CO)O)O)C)C'}

gen2-2.csv:

,smiles,dice_similarity,lipinski
0,C=C(C)C12OC3(CO)OC1C1C4OC4(CO)C(O)C4(O)C(=O)C(C)=CC4C1(O3)C(C)C2CO,0.19672131147540983,1.0

Unfortunately, this molecule is made up solely of fragments from Gnididin:

obabel -h -c -ican -:"C=C(C)C12OC3(CO)OC1C1C4OC4(CO)C(O)C4(O)C(=O)C(C)=CC4C1(O3)C(C)C2CO" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger

Execute AutoDock Vina:

vina --config conf.txt
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -8.765          0          0
   2        -8.31       1.82      6.828
   3       -8.252      2.441      4.152
   4       -8.086      1.664      7.041
   5        -7.85      2.301      7.148
   6       -7.825      1.726      6.693
   7       -7.797      3.008      6.184
   8       -7.412      2.183      7.011
   9       -7.339      2.426      4.168
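
When comparing several ligands, it is convenient to read this table programmatically. The sketch below parses a copy of the first rows of the output above and picks the mode with the lowest (most negative) affinity, which is the strongest predicted binding. The function name parse_modes is hypothetical.

```python
from typing import List, Tuple

VINA_OUTPUT = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -8.765          0          0
   2        -8.31       1.82      6.828
   3       -8.252      2.441      4.152
"""


def parse_modes(text: str) -> List[Tuple[int, float, float, float]]:
    """Return (mode, affinity, rmsd_lb, rmsd_ub) for each result row."""
    modes = []
    for line in text.splitlines():
        parts = line.split()
        # Result rows have exactly four fields and start with the mode number;
        # the header and separator lines do not match this shape.
        if len(parts) == 4 and parts[0].isdigit():
            affinity, rmsd_lb, rmsd_ub = (float(p) for p in parts[1:])
            modes.append((int(parts[0]), affinity, rmsd_lb, rmsd_ub))
    return modes


modes = parse_modes(VINA_OUTPUT)
best = min(modes, key=lambda m: m[1])
print(f"best mode {best[0]}: {best[1]} kcal/mol")
```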

Mold for Smiles Casting

The amino acid interactions described in this paper [2] are based on PDB ID: 6LU7, while our experiments are based on the SARS-CoV-2 Mpro Omicron P132H contained in PDB ID: 7TLL. In the name of an active site amino acid, the first three letters are the abbreviation of the amino acid and the rest is its position in the sequence. Let’s check with UCSF Chimera:

  1. In Chimera, choose Presets ‣ Interactive 1 (ribbons).

  2. Hover your cursor over the receptor’s active site amino acid on the binding site to see its location.

  3. In Chimera, display the sequence from Tools ‣ Sequence ‣ Sequence and save it in FASTA format.

  4. To check an active site amino acid, right-click on the relevant part of the sequence.
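
The naming rule described above can also be checked mechanically. The sketch below splits a label such as Phe140 into abbreviation and position and compares it with the sequence fragment used later in this article (Phe140 to Glu166). The helper names are hypothetical; the three-letter to one-letter table is the standard amino acid code.

```python
import re
from typing import Tuple

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Sequence fragment from this article; its first residue is position 140.
FRAGMENT = "FLNGSCGSVGFNIDYDCVSFCYMHHME"
FIRST_POSITION = 140


def parse_label(label: str) -> Tuple[str, int]:
    """Split 'Phe140' into ('Phe', 140)."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)", label)
    if match is None:
        raise ValueError(f"not a residue label: {label!r}")
    return match.group(1).capitalize(), int(match.group(2))


def matches_fragment(label: str) -> bool:
    """Check that the label agrees with the residue found in FRAGMENT."""
    name, position = parse_label(label)
    return FRAGMENT[position - FIRST_POSITION] == THREE_TO_ONE[name]


for label in ("Phe140", "Glu166"):
    print(label, parse_label(label), matches_fragment(label))
```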

If you cast from a mold, the casting should fit the original mold. That’s why I added amino acids and nucleotides to the file smiles.csv [3]. To see whether the same relationship holds between the binding site and the ligand in a docking simulation, let’s experiment with the following method:

  1. Convert the relevant part of the sequence to SMILES with RDKit. The range is Phe140 to Glu166.

  2. The SMILES string is very long, so break it down into fragments and build some molecules from them.

from rdkit import Chem
from rdkit.Chem import BRICS
Chem.MolToSmiles(Chem.MolFromFASTA("FLNGSCGSVGFNIDYDCVSFCYMHHME"))
smiles = 'CC[C@H](C)[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](Cc1ccccc1)NC(=O)CNC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)CNC(=O)[C@H](CS)NC(=O)[C@H](CO)NC(=O)CNC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)Cc1ccccc1)C(C)C)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CS)C(=O)N[C@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CS)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(=O)O)C(=O)O)C(C)C'
allfrags = set()
allfrags.update(BRICS.BRICSDecompose(Chem.MolFromSmiles(smiles), returnMols=True))
builder = BRICS.BRICSBuild(allfrags)
generated_smiles = []
for i in range(30):
    mol = next(builder)
    mol.UpdatePropertyCache(strict=True)
    generated_smiles.append(Chem.MolToSmiles(mol))
generated_smiles
['CSCC[C@H](SC)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CSCC[C@H](SC)C(=O)Nc1c[nH]cn1', 'CSCC[C@H](SC)C(=O)Nc1ccc(O)cc1', 'CSCC[C@H](SC)C(=O)Nc1ccccc1', 'CS[C@@H](CC(C)C)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](N[C@@H](CCC(=O)O)C(=O)O)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](Nc1ccccc1)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](Nc1c[nH]cn1)C(=O)Nc1ccc(O)cc1', 'CS[C@@H](CC(C)C)C(=O)Nc1c[nH]cn1', 'CS[C@@H](CC(C)C)C(=O)Nc1ccccc1', 'CS[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CS[C@@H](CC(=O)O)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](SC)C(C)C', 'CS[C@@H](CC(N)=O)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@H](CS)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](N)Cc1c[nH]cn1', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](N)Cc1ccc(O)cc1', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](N)Cc1ccccc1', 'CS[C@@H](CO)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CC[C@H](C)[C@H](SC)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CSCC(=O)NC(=O)[C@H](CC(C)C)SC', 'CC(C)C[C@H](N[C@@H](CCC(=O)O)C(=O)O)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1c[nH]cn1)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1ccccc1)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CC(C)C[C@H](Nc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CC(C)C[C@H](N[C@@H](CCC(=O)O)C(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O']

  3. From the output molecules, select those that score well in the docking simulation. Of course, “well” here means relative to the other options.

  4. Run similarity-mcts targeting that molecule.

obabel -h -c -ican -:"CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger
vina --config conf.txt
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -7.192          0          0
   2       -7.014      2.837      4.843
   3       -7.002      1.339      2.292
   4       -7.001      2.143      4.417
   5       -6.894      1.303       2.67
   6       -6.759      2.539       6.62
   7       -6.578      2.329      7.121
   8       -6.547      3.004      7.767
   9        -6.51        1.3      2.908
similarity-mcts -i -l2 -e3 -r10 -b100 -p "gen_smiles" -f "gen3-2.csv" -t "CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O"
Material candidates: {'CC1CCC2C(C(OC3C24C1CCC(O3)(OO4)C)OC)C', 'C(CCN)CC(C(=O)O)N', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O'}
,smiles,dice_similarity,lipinski
0,CCNC1CCNC1=O,0.5454545454545454,1.0
1,CC(Nc1ccccc1)C(=O)Oc1ccccc1,0.5269607843137255,1.0
2,COC(=O)[C@H](CC(C)C)Nc1ccc(O)cc1,0.5084427767354597,1.0
3,CC(C)C(=O)OC(=O)C(C)C,0.5078125,1.0
4,O=C1NCCC1Nc1ccc(F)cc1,0.5066162570888468,1.0
5,COC(=O)[C@H](CC(C)C)OC,0.5040983606557377,1.0
6,C(c1nn[nH]n1)c1nn[nH]n1,0.4921875,1.0
7,CCOc1nn[nH]n1,0.4765625,1.0
8,COc1ccc(O)cc1,0.46875,1.0
9,CO[C@@H](CCC(=O)O)C(=O)O,0.4609053497942387,1.0
10,CC(C(=O)N1Cc2ccccc2CC1C(=O)O)N1Cc2ccccc2CC1c1ccccc1,0.4494649227110582,1.0
11,CC(C(=O)N1Cc2ccccc2CC1C(=O)O)N1Cc2ccccc2CC1C(=O)O,0.4436183395291202,1.0
12,c1ccc(C2CC3CCCC3N2c2ccccc2)cc1,0.4426666666666666,1.0
13,COC(=O)[C@H](CC(C)C)NC1OC2OC3(C)CCC4C(C)CCC(C1C)C24OO3,0.43861607142857145,1.0
14,O=C(O)C1Cc2ccccc2CN1c1ccccc1,0.4348387096774193,1.0
15,CC1CCC2C(C)C(Nc3ccc(O)cc3)OC3OC4(C)CCC1C32OO4,0.4311717861205916,1.0
16,CO[C@@H](CC(C)C)C(=O)NC1OC2OC3(C)CCC4C(C)CCC(C1C)C24OO3,0.4242761692650334,1.0
17,CC(C(=O)Oc1ccccc1)N1C(c2ccccc2)CC2CCCC21,0.4230287859824781,1.0
18,CCOc1ccccc1C(=O)O,0.4228971962616822,1.0
19,Cc1cc(NC2CCNC2=O)no1,0.4113924050632911,1.0

Ignoring the short SMILES, dock one of the longer candidates:

obabel -h -c -ican -:"CC(C(=O)N1Cc2ccccc2CC1C(=O)O)N1Cc2ccccc2CC1c1ccccc1" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger
vina --config conf.txt
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -8.417          0          0
   2       -8.302      2.467      6.754
   3       -8.003       4.96      7.503
   4       -7.624      3.907      6.517
   5       -7.506      5.171      7.983
   6       -7.485      2.979      5.385
   7       -7.433      5.416      9.353
   8       -7.375      2.857      5.204
   9       -7.296      2.998      5.526
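
One way to make “ignore short SMILES” concrete is to filter the CSV rows by string length. This is only a sketch: the cutoff of 30 characters is an assumption of mine, the rows are a small excerpt of the output above, and string length is a crude stand-in for molecule size.

```python
import csv
import io

CSV_TEXT = """\
,smiles,dice_similarity,lipinski
0,CCNC1CCNC1=O,0.5454545454545454,1.0
1,CC(Nc1ccccc1)C(=O)Oc1ccccc1,0.5269607843137255,1.0
8,COc1ccc(O)cc1,0.46875,1.0
10,CC(C(=O)N1Cc2ccccc2CC1C(=O)O)N1Cc2ccccc2CC1c1ccccc1,0.4494649227110582,1.0
"""

MIN_LENGTH = 30  # hypothetical cutoff, not a value used by chem-ant

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
long_rows = [row for row in rows if len(row["smiles"]) >= MIN_LENGTH]

for row in long_rows:
    print(row["smiles"], row["dice_similarity"])
```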

If you need to come up with the target molecule itself, the above method may be useful.

Todo

  • Separate MCTS solver as another package.

  • Is it possible to get a high score by docking simulation without the target molecule?

  • Type bool is output as 1.0 or 0.0 in a csv file.

  • similarity-ant is so slow that it is far from practical.

  • Is similarity-mcts working properly in the first place?

  • Version 0.0.3 of similarity-mcts now imports MCTS solver, so the output is slightly different from this document.

  • Unravel the entangled spaghetti code.

Reference

Bibliography