Chem-Ant Introduction
Chem-Ant selects material candidates for producing molecules similar to a target molecule, using MCTS Solver and genetic programming.
similarity_ant.py is based on the code of deap/examples/gp/ant.py.
Requirements
On Ubuntu:
sudo -H -s
apt install python3-pip
pip3 install -r requirements.txt
exit
Or:
pip3 install deap
pip3 install mcts
pip3 install rdkit
pip3 install global_chem_extensions
pip3 install mcts-solver
Or:
pip3 install chem-ant
If you want to use chem-ant with chem-classification:
pip3 install simpletransformers
Or:
pip3 install chem-classification
Note
The new package rdkit supports Python 3.8 through 3.12, whereas rdkit-pypi supports only Python 3.7 through 3.11.
Chem-ant depends on global-chem-extensions, and both depend on rdkit-pypi. Chem-ant 0.1.0 will depend on rdkit, but global-chem-extensions will not support rdkit until v2.0. Therefore, to install chem-ant on Python 3.12, follow these steps:
1. Get the git repository of global-chem.
2. Manually edit global-chem/global_chem_extensions/requirements.txt.
3. Build and install it.
git clone git@github.com:akuroiwa/global-chem.git
cd global-chem/global_chem_extensions/
Replace rdkit-pypi with rdkit in requirements.txt, then build and install:
sed -i 's/rdkit-pypi/rdkit/g' requirements.txt
pip install .
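The note above can be sketched in Python as a quick check of which RDKit package covers a given interpreter, plus a Python equivalent of the sed substitution. This is only an illustration: the version ranges are copied from the note, and the names `SUPPORTED`, `usable_packages`, and `patch_requirements` are hypothetical helpers, not part of chem-ant.

```python
import sys

# Python ranges (inclusive) as stated in the note above.
SUPPORTED = {
    "rdkit": ((3, 8), (3, 12)),
    "rdkit-pypi": ((3, 7), (3, 11)),
}


def usable_packages(version=None):
    """Return the package names whose stated range covers `version`.

    `version` is a (major, minor) tuple; it defaults to the running interpreter.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return [
        name
        for name, (low, high) in SUPPORTED.items()
        if low <= major_minor <= high
    ]


def patch_requirements(text):
    """Python equivalent of: sed 's/rdkit-pypi/rdkit/g' requirements.txt"""
    return text.replace("rdkit-pypi", "rdkit")
```

Calling `usable_packages()` with no argument reports the options for the interpreter in use; on Python 3.12 only `rdkit` remains, which is why requirements.txt has to be patched.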
General Usage
By default, the list of molecules is read from smiles.csv and the target is Nirmatrelvir. From that list, the best material for the fragments is selected. The output CSV file also contains molecules created during the MCTS run. To reuse the CSV file as a SMILES list, add the --select option. To run the commands without installing the packages, call the scripts directly, for example python3 similarity_mcts.py --help. Otherwise:
similarity-mcts --help
similarity-mcts -i -l1 -e3 -r10 -b500 -p train_smiles
similarity-mcts -i -l1 -e3 -r10 -b500 -p eval_smiles
To specify a target explicitly:
similarity-mcts -i -l1 -e3 -r10 -b500 -p train_smiles -t "CC(C)(C)C(NC(=O)C(F)(F)F)C(=O)N1CC2C(C1C1CCNC1=O)C2(C)C"
similarity-mcts -i -l1 -e3 -r10 -b500 -p eval_smiles -t "CC(C)(C)C(NC(=O)C(F)(F)F)C(=O)N1CC2C(C1C1CCNC1=O)C2(C)C"
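When the same run has to be repeated for several output paths, the commands above can be assembled from Python. The sketch below only reuses the flags shown in this section (their meanings are given by `similarity-mcts --help`); `BASE_ARGS` and `build_command` are hypothetical names introduced for illustration.

```python
import shlex

# Flags copied verbatim from the commands above.
BASE_ARGS = ["similarity-mcts", "-i", "-l1", "-e3", "-r10", "-b500"]


def build_command(path, target=None):
    """Return the argument list for one similarity-mcts run."""
    args = BASE_ARGS + ["-p", path]
    if target is not None:
        args += ["-t", target]
    return args


if __name__ == "__main__":
    target = "CC(C)(C)C(NC(=O)C(F)(F)F)C(=O)N1CC2C(C1C1CCNC1=O)C2(C)C"
    for path in ("train_smiles", "eval_smiles"):
        # shlex.join quotes the SMILES string so the line can be pasted into a shell.
        print(shlex.join(build_command(path, target)))
        # To execute instead of printing (requires chem-ant to be installed):
        # subprocess.run(build_command(path, target), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the parentheses and equals signs in SMILES strings.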
similarity-mcts selects, from the SMILES list, the candidates that can serve as material for the fragments and outputs them. To output target-like molecules from the SMILES list without running MCTS:
similarity-genMols --help
similarity-genMols -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -m "CC1=CC=CC=C1C(C)C" "Cc1ccccc1CC(C#N)NC1CCNC1=O" -f "gen2.csv"
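This README does not state which similarity measure chem-ant uses internally. As a rough illustration of what ranking "target-like" molecules can mean, the sketch below assumes a Tanimoto (Jaccard) coefficient over sets of feature labels. The feature labels and molecule names are hypothetical toy data, and `tanimoto` and `rank_candidates` are not chem-ant functions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(target_features, candidates):
    """Return (name, score) pairs sorted by decreasing similarity to the target."""
    scored = [
        (name, tanimoto(target_features, features))
        for name, features in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical feature labels, for illustration only.
    target = {"amide", "nitrile", "lactam", "CF3"}
    candidates = {
        "mol_a": {"amide", "nitrile", "lactam"},
        "mol_b": {"aryl", "alkyl"},
    }
    for name, score in rank_candidates(target, candidates):
        print(name, score)
```

In practice the feature sets would come from molecular fingerprints (for example via RDKit) rather than hand-written labels.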
The StopIteration problem has been fixed as of chem-ant 0.1.0, so the similarity-ant command now runs without stopping. I plan to keep working on this issue.
In addition, a new --GlobalChem option has been added. It retrieves SMILES from the global-chem database to use as material for the fragments:
similarity-ant -n20 -g10 -b 1 -p train_smiles -e1 -c electrophilic_warheads_for_kinases
Chem-Classification
Output a dataset in JSON format for chem-classification:
importSmiles -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -p "train_smiles"
importSmiles -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -p "eval_smiles"
To output the dataset for a regression model, add -r:
importSmiles -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -p "train_smiles" -r
importSmiles -t "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C" -p "eval_smiles" -r
Train the classification model and predict the similarity between Nirmatrelvir and YH-53:
from chem_classification.similarity_classification import SimilarityClassification
# SMILES strings of the two molecules to compare
nirmatrelvir = "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C"
yh_53 = "CC(C)CC(C(=O)NC(CC1CCNC1=O)C(=O)C2=NC3=CC=CC=C3S2)NC(=O)C4=CC5=C(N4)C=CC=C5OC"
s = SimilarityClassification()
# Train on the training dataset and evaluate on the evaluation dataset
s.train_and_eval("train_smiles/smiles.json", "eval_smiles/smiles.json")
s.predict_smiles_pair([nirmatrelvir, yh_53])
Loading a local save:
s = SimilarityClassification("local-path/your-outputs")
Train a regression model to predict the similarity between Nirmatrelvir and YH-53:
from chem_classification.similarity_classification import SimilarityRegression
# SMILES strings of the two molecules to compare
nirmatrelvir = "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C"
yh_53 = "CC(C)CC(C(=O)NC(CC1CCNC1=O)C(=O)C2=NC3=CC=CC=C3S2)NC(=O)C4=CC5=C(N4)C=CC=C5OC"
s = SimilarityRegression()
# Train on the training dataset and evaluate on the evaluation dataset
s.train_and_eval("train_smiles/smiles.json", "eval_smiles/smiles.json")
s.predict_smiles_pair([nirmatrelvir, yh_53])
Another regression model, trained on the JSON files output by similarity-mcts, can predict similarity to the target molecule from the material candidates and can cooperate with similarity-ant:
similarity-mcts -i -l2 -e3 -r10 -b100 -p "train_smiles" -f "smiles.json" -j
similarity-mcts -i -l2 -e3 -r10 -b100 -p "eval_smiles" -f "smiles.json" -j
Note
As of chem-ant 0.0.7, I changed dataset creation to use molecular fragments as tokens, so there is no longer any difference between the two regression models.
Cooperation between chem-classification and similarity-ant (currently not working):
similarity-ant -n20 -g5 -b 1 -p gen_smiles -d -o "local-path/your-outputs"
Cooperation between the regression model of chem-classification and similarity-ant:
similarity-ant -n20 -g5 -b 1 -p gen_smiles -r -o "local-path/your-outputs"