Graph Neural Network for ALK inhibitors

Published:

Abstract

Updating…

Workflow

  • Molecular representation with node and edge attributes
  • Model architecture:
    • 3 layers of GCN
    • 3 layers of fully connected neural networks
  • Cross validation and external validation
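
The first two workflow steps can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the toy molecule, the two-column node features, the weight matrix, and the helper names `gcn_layer` and `mean_pool` are all assumptions made for illustration. The real model takes 30 features per atom and presumably relies on a graph library rather than hand-written loops.

```python
import math

# Hypothetical toy molecule: ethanol (C-C-O), heavy atoms only.
# Node features here are [atomic_number, degree]; the repository's real
# featurization (30 features per atom) is not shown in this README.
node_features = [
    [6.0, 1.0],  # atom 0: carbon
    [6.0, 2.0],  # atom 1: carbon
    [8.0, 1.0],  # atom 2: oxygen
]

# Each undirected bond is stored as two directed pairs, with one edge
# attribute per pair (here: bond order). This plain GCN step ignores
# edge attributes and only uses connectivity.
edge_index = [(0, 1), (1, 0), (1, 2), (2, 1)]
edge_attr = [1.0, 1.0, 1.0, 1.0]


def gcn_layer(x, edges, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(x)
    n_feat = len(x[0])
    # Adjacency matrix with self-loops.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for src, dst in edges:
        adj[dst][src] = 1.0
    deg = [sum(row) for row in adj]
    # Symmetric degree normalization.
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate features from each node's neighborhood.
    agg = [[sum(norm[i][k] * x[k][f] for k in range(n)) for f in range(n_feat)]
           for i in range(n)]
    # Linear transform followed by ReLU.
    out_dim = len(weight[0])
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_feat)))
             for o in range(out_dim)]
            for i in range(n)]


def mean_pool(x):
    """Average node embeddings into a single graph-level vector."""
    n = len(x)
    return [sum(row[f] for row in x) / n for f in range(len(x[0]))]


# Arbitrary illustrative weights (2 input features -> 2 hidden features).
weight = [[0.1, -0.2], [0.3, 0.4]]
hidden = gcn_layer(node_features, edge_index, weight)
graph_vector = mean_pool(hidden)
print(hidden)
print(graph_vector)
```

Stacking three such layers and passing the pooled vector through three fully connected layers would mirror the architecture listed above, under the assumptions stated.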

Requirements

This module requires the following modules:

Installation

Clone this repository to use it.

Note

Updating…

Folder structure

The folder structure should look like this:

Mol2DSimi (project root)
|__  README.md
|__  MolGNN
|    |__ GNN_architect.py
|    |__ GNN_ext_valid.py
|    |__ GNN_int_valid.py
|    |__ GNN_train.py
|    |__ MolData.py
|    |__ savemodel.py
|    |__ seed_everything.py
|    |__ visualization.py
|__  Image (saved images)
|__  Tutorial_GNN.ipynb 
|__  Model
|__  data
|__  LICENSE
|......

Usage


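import sys

import numpy as np
import torch
# Assumption: MoleculeDataset is a PyTorch Geometric dataset, so DataLoader is
# taken from torch_geometric (older releases expose it in torch_geometric.data).
from torch_geometric.loader import DataLoader

sys.path.append('./MolGNN')  # make the project modules importable
from seed_everything import seed_everything
from visualization import historical_plot
from savemodel import SaveModel
from GNN_architect import GNN
from GNN_train import train
from GNN_int_valid import evaluate
from GNN_ext_valid import external_evaluate
from MolData import MoleculeDataset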


# 1. Data integration

## 1a. Creating dataset
seed_everything(42)
train_dataset = MoleculeDataset(root = "./data/",
                                filename = "ALK_train.csv")
test_dataset = MoleculeDataset(root = "./data/",
                                filename = "ALK_test.csv", test = True)
valid_dataset = MoleculeDataset(root = "./data/",
                                filename = "ALK_val.csv", valid = True)
                                
## 1b. Creating data loaders
def seed_worker(worker_id):
    # worker_init_fn must be a callable; np.random.seed(42) runs once and returns None.
    np.random.seed(42 + worker_id)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, worker_init_fn=seed_worker)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=True, worker_init_fn=seed_worker)
valid_loader = DataLoader(valid_dataset, batch_size=64, shuffle=True, worker_init_fn=seed_worker)

# 2. Train
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # fall back to CPU without a GPU
seed_everything(42)
model = GNN(feature_size=30) 
model = model.to(device)
print(model)


import warnings

# Silence library warnings unless the interpreter was started with -W options.
if not sys.warnoptions:
    warnings.simplefilter("ignore")
train_losses = []
validation_losses = []
train_f1s = []
val_f1s = []
train_aps = []
val_aps = []
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), 
                        lr=0.01,momentum=0.8,weight_decay=0.0001)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)
save_best_model = SaveModel()
epochs = 50
for epoch in range(epochs):
    train1_loss,train1_f1, train1_ap = train(model, criterion, optimizer, train_loader)
    val1_loss,val1_f1, val1_ap,validation_loss = evaluate(model, criterion,valid_loader)
    train_losses.append(train1_loss)
    validation_losses.append(val1_loss)
    train_f1s.append(train1_f1)
    val_f1s.append(val1_f1)
    train_aps.append(train1_ap)
    val_aps.append(val1_ap)
    lr_scheduler.step()  # ExponentialLR decays on a fixed schedule and takes no metric
    #save_best_model(val1_ap, epoch, model, optimizer, criterion)
    
    if (epoch+1) % 5 == 0:
        print("Epoch: {}/{}.. ".format(epoch+1, epochs),
              "Training Loss: {:.3f}.. ".format(train1_loss),
              "validation Loss: {:.3f}.. ".format(val1_loss),
              "validation f1_score: {:.3f}.. ".format(val1_f1),
              "validation average precision: {:.3f}.. ".format(val1_ap))
    
save_best_model(epochs, model, optimizer, criterion)        

# 3. Evaluation
external_evaluate(model, criterion, test_loader)
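
The training loop above tracks an F1 score and an average precision for each epoch. How `train`, `evaluate`, and `external_evaluate` compute them is not shown in this README, so the following is only a minimal sketch of the two metrics. It assumes binary labels and raw logits (consistent with `BCEWithLogitsLoss`) and a decision threshold of 0.5. The function names and sample values are hypothetical, and tied scores are not given any special treatment.

```python
import math


def sigmoid(z):
    """Map a raw logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical labels (1 = active) and model logits for five molecules.
y_true = [1, 0, 1, 1, 0]
logits = [2.0, -1.0, 0.5, -0.3, 0.1]

probs = [sigmoid(z) for z in logits]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

f1 = f1_score(y_true, y_pred)
ap = average_precision(y_true, probs)
print("F1: {:.3f}  AP: {:.3f}".format(f1, ap))
```

Average precision depends only on how the molecules are ranked, so it does not change with the threshold, whereas F1 does.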


Contributing

Please visit the GNN-ALK repository. Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.