================ Chem-Ant Article ================ :author: Akihiro Kuroiwa :date: 2022/07/08 :abstract: I started writing :mod:`chess-ant` in 2019, but at first I was particular about minimax and the work did not proceed slowly. With the COVID-19 outbreak of the cruise ship Diamond Princess in 2020, when the pandemic was finally beginning to attract attention, I decided to use the :mod:`chess-ant` algorithm for the development of therapeutic agents. After that, I read the paper of MCTS solver and the performance improved. At the same time, I learned how to use the cheminformatics software. Even after the SARS-CoV-2 pandemic has converged, the next pandemic is waiting. Let's contribute to society with our skills. UCSF Chimera ============ On my old laptop Fujitsu LIFEBOOK AH42/C, when I try to install `UCSF ChimeraX `__, I get the following error: :: ERROR: ChimeraX requires OpenGL graphics version 3.3. Your computer graphics driver provided version 2.1 Try updating your graphics driver. Therefore, in this experiment, I use `UCSF Chimera `__. The advantage is that you can specify the binding site with the mouse. .. code-block:: bash chmod u+x chimera-alpha-linux_x86_64.bin ./chimera-alpha-linux_x86_64.bin In :file:`~/.profile` on Ubuntu: .. code-block:: bash if [ -d "$HOME/.local/bin" ] ; then PATH="$HOME/.local/bin:$PATH" fi In my case, there are two installation locations: .. code-block:: bash ~/.local/bin/ ~/.local/UCSF-Chimera64-2022-05-18/ Run :command:`chimera` from the terminal on the command line, or right-click the desktop icon to give execute permission and double-click to launch UCSF Chimera. Prior to this experiment, get the latest release of the `AutoDock Vina installer from GitHub `__ and install it according to the manual. You can check the location of the binary file with the following command: .. code-block:: bash cd ~/.local/bin/ ln -s vina_1.2.3_linux_x86_64 vina which vina To verify the operation, proceed with the experiment as follows: #. Create fragments of the target molecule Nirmatrelvir and output some molecules. #. Material candidates including the target molecule are selected by :command:`similarity-mcts`, and some molecules are output. #. Get the `set difference `__. .. code-block:: bash similarity-genMols -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -m "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -b70 -p "gen_smiles" -f "gen1-1.csv" similarity-mcts -l2 -e3 -r10 -b100 -p "gen_smiles" -f "gen1-2.csv" After running, you would see something like this: :: Material candidates: {'CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C', 'CCCC1=NC(=C(N1CC2=CC=C(C=C2)C3=CC=CC=C3C4=NNN=N4)C(=O)O)C(C)(C)O'} There are some things to keep in mind when running :command:`similarity-genMols`. Python doesn't distinguish between single and double quotes, but bash and dash do. In addition, you don't need commas on the command line: .. code-block:: bash similarity-genMols -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -m "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" "CCCC1=NC(=C(N1CC2=CC=C(C=C2)C3=CC=CC=C3C4=NNN=N4)C(=O)O)C(C)(C)O" -b100 -f gen1-2.csv similarity-genMols -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -m "CCCC1=NC(=C(N1CC2=CC=C(C=C2)C3=CC=CC=C3C4=NNN=N4)C(=O)O)C(C)(C)O" -b100 -f gen1-3.csv .. code-block:: python import pandas as pd df1_1 = pd.read_csv("gen_smiles/gen1-1.csv", header=0, index_col=0) df1_2 = pd.read_csv("gen_smiles/gen1-2.csv", header=0, index_col=0) df1_3 = pd.read_csv("gen_smiles/gen1-3.csv", header=0, index_col=0) df1_4 = pd.concat([df1_1, df1_1, df1_2, df1_3, df1_3], axis=0) df1_4.drop_duplicates(subset="smiles", keep=False, inplace=True) df1_4.sort_values(["lipinski", "dice_similarity"], inplace=True, ascending=False) df1_4.reset_index(drop=True).to_csv("gen_smiles/gen1-4.csv") Create a ligand file with `Open Babel `__. Open :file:`gen3.csv` and specify the smiles of high-scoring molecule with :command:`similarity-mcts`. Don't forget to add hydrogen atoms and assign partial charges. On Ubuntu: .. code-block:: bash sudo apt install openbabel obabel -L obabel -L charges obabel -h -c -ican -:"CCCC1C2C(CN1C(=O)C1C3C(CN1C(=O)C(F)(F)F)C3(C)C)C2(C)C" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger Let's go back to UCSF Chimera. Open the above file and follow the menu as follows: #. :menuselection:`File --> Fetch Structure by ID --> PDB(mmCIF) --> 7tll` #. :menuselection:`File --> Open --> ligand.pdbqt --> file type PDB` #. :menuselection:`Tools --> Surface/Binding Analysis --> AutoDock Vina` Our :guilabel:`Output file` is :file:`all`. Specify :guilabel:`Receptor` and :guilabel:`Ligand`. Check :guilabel:`Resize search volume using` for your mouse. Write vina path in :guilabel:`Executable location`. In my case, when I specified the binding site with the mouse, the frame was not displayed unless I switched it with the :guilabel:`Presets` menu. When reconfirming the experimental results, open :file:`all.receptor.pdb` and: #. :menuselection:`Tools --> Surface/Binding Analysis --> ViewDock --> all.pdbqt` #. :menuselection:`Move --> Play` AutoDock Vina ============= Reuse the receptor file output by UCSF Chimera and experiment on the command line. You will prepare your own ligand file. The contents of :file:`conf.txt` are as follows: .. code-block:: receptor = all.receptor.pdbqt ligand = ligand.pdbqt out = all.pdbqt center_x = -2.68714 center_y = -1.23572 center_z = 13.8821 size_x = 25.747 size_y = 22.6627 size_z = 22.1881 :command:`similarity-mcts` now chose Catechin and the mysterious molecule Gnididin [#]_: .. code-block:: bash similarity-mcts -l2 -e3 -r10 -b100 -p "gen_smiles" -f "gen2-2.csv" :: Material candidates: {'C1C(C(OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O', 'CCCCCC=CC=CC(=O)OC1C(C23C4C=C(C(=O)C4(C(C5(C(C2C6C1(OC(O6)(O3)C7=CC=CC=C7)C(=C)C)O5)CO)O)O)C)C'} :file:`gen2-2.csv`: :: ,smiles,dice_similarity,lipinski 0,C=C(C)C12OC3(CO)OC1C1C4OC4(CO)C(O)C4(O)C(=O)C(C)=CC4C1(O3)C(C)C2CO,0.19672131147540983,1.0 Unfortunately, this molecule is made up of fragments produced solely by Gnididin: .. code-block:: bash obabel -h -c -ican -:"C=C(C)C12OC3(CO)OC1C1C4OC4(CO)C(O)C4(O)C(=O)C(C)=CC4C1(O3)C(C)C2CO" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger Execute AutoDock Vina: .. code-block:: bash vina --config conf.txt :: mode | affinity | dist from best mode | (kcal/mol) | rmsd l.b.| rmsd u.b. -----+------------+----------+---------- 1 -8.765 0 0 2 -8.31 1.82 6.828 3 -8.252 2.441 4.152 4 -8.086 1.664 7.041 5 -7.85 2.301 7.148 6 -7.825 1.726 6.693 7 -7.797 3.008 6.184 8 -7.412 2.183 7.011 9 -7.339 2.426 4.168 Mold for Smiles Casting ======================= The amino acid interaction described in this paper [#]_ is based on `PDB ID: 6LU7 `__, while our experiments are based on the SARS-CoV-2 Mpro Omicron P132H contained in `PDB ID: 7TLL `__. The first three letters of active site amino acid are abbreviations for amino acids, and the rest represent the positions of sequences. Let's check with UCSF Chimera: #. :menuselection:`Presets --> Interactive 1 (ribbons)` with :command:`chimera`. #. Hover your cursor over the receptor's active site amino acid on the binding site to see its location. #. Display a nucleotide or amino acid sequence alignment with :command:`chimera` from :menuselection:`Tools --> Sequence --> Sequence` and save it in fast format. #. If you want to check the Active site amino acid, right-click on the relevant part of the sequence. If you cast from a mold, the casting should fit the original mold. That's why I added amino acids and nucleotides to the file :file:`smiles.csv` [#]_. Whether the relationship between the binding site and the ligand in docking simulation can be said to be the same, let's experiment with the following method: #. Convert the relevant part to smiles with :mod:`rdkit`. The range is from Phe140 to Glu166 in sequence. #. The smiles string is so long, let's break it down into fragments and outputs them to some molecules. .. code-block:: python from rdkit import Chem from rdkit.Chem import BRICS Chem.MolToSmiles(Chem.MolFromFASTA("FLNGSCGSVGFNIDYDCVSFCYMHHME")) smiles = 'CC[C@H](C)[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](Cc1ccccc1)NC(=O)CNC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)CNC(=O)[C@H](CS)NC(=O)[C@H](CO)NC(=O)CNC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)Cc1ccccc1)C(C)C)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CS)C(=O)N[C@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CS)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(=O)O)C(=O)O)C(C)C' allfrags = set() allfrags.update(BRICS.BRICSDecompose(Chem.MolFromSmiles(smiles), returnMols=True)) builder = BRICS.BRICSBuild(allfrags) generated_smiles = [] for i in range(30): mol = next(builder) mol.UpdatePropertyCache(strict=True) generated_smiles.append(Chem.MolToSmiles(mol)) generated_smiles ['CSCC[C@H](SC)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CSCC[C@H](SC)C(=O)Nc1c[nH]cn1', 'CSCC[C@H](SC)C(=O)Nc1ccc(O)cc1', 'CSCC[C@H](SC)C(=O)Nc1ccccc1', 'CS[C@@H](CC(C)C)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](N[C@@H](CCC(=O)O)C(=O)O)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](Nc1ccccc1)C(=O)Nc1ccc(O)cc1', 'CC(C)C[C@H](Nc1c[nH]cn1)C(=O)Nc1ccc(O)cc1', 'CS[C@@H](CC(C)C)C(=O)Nc1c[nH]cn1', 'CS[C@@H](CC(C)C)C(=O)Nc1ccccc1', 'CS[C@@H](CC(C)C)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CS[C@@H](CC(=O)O)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](SC)C(C)C', 'CS[C@@H](CC(N)=O)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@H](CS)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](N)Cc1c[nH]cn1', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](N)Cc1ccc(O)cc1', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@@H](N)Cc1ccccc1', 'CS[C@@H](CO)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CS[C@@H](CC(C)C)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CC[C@H](C)[C@H](SC)C(=O)NC(=O)[C@H](CC(C)C)SC', 'CSCC(=O)NC(=O)[C@H](CC(C)C)SC', 'CC(C)C[C@H](N[C@@H](CCC(=O)O)C(=O)O)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1c[nH]cn1)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1ccccc1)C(=O)Nc1ccccc1', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CC(C)C[C@H](Nc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O', 'CC(C)C[C@H](N[C@@H](CCC(=O)O)C(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O'] #. From the output molecules, select the molecules with good results by docking simulation. Of course, it's a good result among the options. #. Run :command:`similarity-mcts` targeting that molecule. .. code-block:: bash obabel -h -c -ican -:"CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger vina --config conf.txt :: mode | affinity | dist from best mode | (kcal/mol) | rmsd l.b.| rmsd u.b. -----+------------+----------+---------- 1 -7.192 0 0 2 -7.014 2.837 4.843 3 -7.002 1.339 2.292 4 -7.001 2.143 4.417 5 -6.894 1.303 2.67 6 -6.759 2.539 6.62 7 -6.578 2.329 7.121 8 -6.547 3.004 7.767 9 -6.51 1.3 2.908 .. code-block:: bash similarity-mcts -i -l2 -e3 -r10 -b100 -p "gen_smiles" -f "gen3-2.csv" -t "CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O" :: Material candidates: {'CC1CCC2C(C(OC3C24C1CCC(O3)(OO4)C)OC)C', 'C(CCN)CC(C(=O)O)N', 'CC(C)C[C@H](Nc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O'} :: ,smiles,dice_similarity,lipinski 0,CCNC1CCNC1=O,0.5454545454545454,1.0 1,CC(Nc1ccccc1)C(=O)Oc1ccccc1,0.5269607843137255,1.0 2,COC(=O)[C@H](CC(C)C)Nc1ccc(O)cc1,0.5084427767354597,1.0 3,CC(C)C(=O)OC(=O)C(C)C,0.5078125,1.0 4,O=C1NCCC1Nc1ccc(F)cc1,0.5066162570888468,1.0 5,COC(=O)[C@H](CC(C)C)OC,0.5040983606557377,1.0 6,C(c1nn[nH]n1)c1nn[nH]n1,0.4921875,1.0 7,CCOc1nn[nH]n1,0.4765625,1.0 8,COc1ccc(O)cc1,0.46875,1.0 9,CO[C@@H](CCC(=O)O)C(=O)O,0.4609053497942387,1.0 10,CC(C(=O)N1Cc2ccccc2CC1C(=O)O)N1Cc2ccccc2CC1c1ccccc1,0.4494649227110582,1.0 11,CC(C(=O)N1Cc2ccccc2CC1C(=O)O)N1Cc2ccccc2CC1C(=O)O,0.4436183395291202,1.0 12,c1ccc(C2CC3CCCC3N2c2ccccc2)cc1,0.4426666666666666,1.0 13,COC(=O)[C@H](CC(C)C)NC1OC2OC3(C)CCC4C(C)CCC(C1C)C24OO3,0.43861607142857145,1.0 14,O=C(O)C1Cc2ccccc2CN1c1ccccc1,0.4348387096774193,1.0 15,CC1CCC2C(C)C(Nc3ccc(O)cc3)OC3OC4(C)CCC1C32OO4,0.4311717861205916,1.0 16,CO[C@@H](CC(C)C)C(=O)NC1OC2OC3(C)CCC4C(C)CCC(C1C)C24OO3,0.4242761692650334,1.0 17,CC(C(=O)Oc1ccccc1)N1C(c2ccccc2)CC2CCCC21,0.4230287859824781,1.0 18,CCOc1ccccc1C(=O)O,0.4228971962616822,1.0 19,Cc1cc(NC2CCNC2=O)no1,0.4113924050632911,1.0 Ignore short smiles: .. code-block:: bash obabel -h -c -ican -:"CC(C(=O)N1Cc2ccccc2CC1C(=O)O)N1Cc2ccccc2CC1c1ccccc1" -opdbqt -O ligand.pdbqt --gen3D --partialcharge gasteiger vina --config conf.txt :: mode | affinity | dist from best mode | (kcal/mol) | rmsd l.b.| rmsd u.b. -----+------------+----------+---------- 1 -8.417 0 0 2 -8.302 2.467 6.754 3 -8.003 4.96 7.503 4 -7.624 3.907 6.517 5 -7.506 5.171 7.983 6 -7.485 2.979 5.385 7 -7.433 5.416 9.353 8 -7.375 2.857 5.204 9 -7.296 2.998 5.526 If you need the target molecule itself, the above method may be useful. .. todo:: - Separate MCTS solver as another package. - Is it possible to get a high score by docking simulation without the target molecule? - Type bool is output as 1.0 or 0.0 in a csv file. - :command:`similarity-ant` is so slow that it is far from practical. - Is :command:`similarity-mcts` working properly in the first place? - Version 0.0.3 of :command:`similarity-mcts` now imports MCTS solver, so the output is slightly different from this document. - Unravel the entangled spaghetti code. Reference ========= .. [#] `Sisakht, M., Mahmoodzadeh, A., & Darabian, M. (2021). Plant-derived chemicals as potential inhibitors of SARS-CoV-2 main protease (6LU7), a virtual screening study. Phytotherapy research : PTR, 35(6), 3262–3274. https://doi.org/10.1002/ptr.7041 `__ .. [#] `SAMANT, L., & Javle, V. (2020). Comparative Docking Analysis of Rational Drugs Against COVID-19 Main Protease. ChemRxiv. doi:10.26434/chemrxiv.12136002.v1 This content is a preprint and has not been peer-reviewed. `__ .. [#] `PubChem `__ Bibliography ============ - `化学の新しいカタチ `__ - `Python for chemoinformatics `__ - `English version of Python for Chemoinformatics (pdf) `__ - `Sharif, Suliman. Understanding drug-likeness filters with RDKit and exploring the WITHDRAWN database. (2020). `__ - `Panikar, S., Shoba, G., Arun, M., Sahayarayan, J. J., Usha Raja Nanthini, A., Chinnathambi, A., Alharbi, S. A., Nasif, O., & Kim, H. J. (2021). Essential oils as an effective alternative for the treatment of COVID-19: Molecular interaction analysis of protease (Mpro) with pharmacokinetics and toxicological properties. Journal of infection and public health, 14(5), 601–610. https://doi.org/10.1016/j.jiph.2020.12.037 `__ - `@cat_lover. 構造生成メモ. (2021). `__ - `GB-GM `__ - `Jensen, J. (2019). Graph-based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space. ChemRxiv. doi:10.26434/chemrxiv.7240751.v2 This content is a preprint and has not been peer-reviewed. `__