MolCluster


MolecularClustering

  • Cluster molecules into groups based on their molecular fingerprints, using the Butina or K-means algorithm
  • Select a diverse subset of compounds for pharmacophore modeling, for retrospective docking controls, or simply for HTS
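The bullets above mention K-means alongside Butina, but this README does not show how the repository implements it. As a rough illustration only, the sketch below runs plain Lloyd's K-means on 0/1 fingerprint vectors with NumPy. The function `kmeans_fingerprints` and its parameters are hypothetical and not part of MolCluster; in practice a library routine such as scikit-learn's `KMeans` would usually be used instead.

```python
import numpy as np


# Hypothetical helper for illustration; not part of MolCluster.
def kmeans_fingerprints(fps, k, n_iter=100, seed=0):
    """Lloyd's K-means on rows of a 0/1 fingerprint matrix.

    Returns (labels, centers); centers are real-valued cluster means.
    """
    X = np.asarray(fps, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct molecules picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every molecule to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers


# Toy 8-bit fingerprints: two groups with no on-bits in common.
fps = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
])
labels, centers = kmeans_fingerprints(fps, k=2)
print(labels)
```

K-means needs the number of clusters up front and averages bit vectors into real-valued centers, whereas Butina only needs a distance cutoff and always uses real molecules as centroids.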

Use the Butina algorithm to cluster molecules
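The README does not spell out how `Butina_clustering` works internally. The sketch below shows the general Taylor-Butina idea in plain Python, assuming fingerprints are stored as sets of on-bit indices and that `dis_cutoff` in the usage example is a Tanimoto distance cutoff. `tanimoto` and `butina_cluster` are hypothetical helpers, not MolCluster functions; for real data, RDKit provides `Butina.ClusterData` in `rdkit.ML.Cluster`.

```python
# Hypothetical helpers for illustration; not part of MolCluster.
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints -> 0.0 (arbitrary choice)


def butina_cluster(fps, cutoff):
    """Taylor-Butina clustering.

    Molecules i and j are neighbours when 1 - tanimoto(fps[i], fps[j]) <= cutoff.
    Returns a list of clusters; each cluster is a tuple of indices whose first
    element is the centroid.
    """
    n = len(fps)
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if 1.0 - tanimoto(fps[i], fps[j]) <= cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)

    # Visit molecules from most to fewest neighbours (stable sort keeps index order on ties).
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # i becomes a centroid and absorbs every neighbour not yet in a cluster.
        members = [i] + [j for j in neighbours[i] if j not in assigned]
        assigned.update(members)
        clusters.append(tuple(members))
    return clusters


# Toy fingerprints: three similar molecules, a similar pair, and a singleton.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}, {20}]
clusters = butina_cluster(fps, cutoff=0.65)
print(clusters)  # [(0, 1, 2), (3, 4), (5,)]
```

The greedy order matters: densely populated regions claim their neighbours first, so a larger cutoff yields fewer, bigger clusters and a smaller one leaves more singletons.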

Requirements

This module requires the following modules:

Installation

Clone this repository to use the module.

Folder structure

The final folder structure should look like this:

Molph4 (project root)
|__ README.md
|__ MolecularClustering
|   |__ cluster_visualize
|   |__ molecules_clustering
|   |__ diversesubset
|__ utility
|__ HIV_integrase.csv
|
|......

Usage

import os
import sys
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm # progress bar
tqdm.pandas()

sys.path.append('./MolecularClustering')
from molecules_clustering import Butina_clustering, Molecule_clustering
from diversesubset import distance_maxtrix, diverse_subset
from cluster_visualize import cluster_heat_map, cluster_scatter_plot


sys.path.append('./utility')
from standardize import standardization

# Load the example dataset
df = pd.read_csv("HIV_integrase.csv", index_col=None)
df.head(2)

# Standardize molecules (BlockLogs silences RDKit log messages while `block` exists)
from rdkit.rdBase import BlockLogs
block = BlockLogs()
std = standardization(data=df, ID='ID', smiles_col='Canomical_smiles', active_col='Activity', ro5=4)
data = std.filter_data()
data.head(2)


# Butina clustering
butina = Butina_clustering(df=data, ID="ID", smiles_col="StandSmiles", active_col="Activity",
                           mol_col="Molecule", activity_thresh=7, radius=2, nBits=2048,
                           dis_cutoff=0.65, cps=5)
active_set, cluster_centers, df_active = butina.data_processing()

# Heatmap of the cluster centers
plot = cluster_heat_map(cls_cps=cluster_centers)
plot.visualize_triangle()

# Chemical-space scatter plot
plot = cluster_scatter_plot(data=df_active, no_cls=8, mol_col='Molecule', algo='Butina', cluster_col='Cluster')
plot.visualize()
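
The usage above imports `distance_maxtrix` and `diverse_subset` but does not call them, and the README does not say which selection algorithm they implement. One common way to pick a diverse subset from fingerprints is greedy MaxMin picking; the sketch below assumes that approach. `tanimoto_distance` and `maxmin_pick` are hypothetical helpers, not the MolCluster API, and fingerprints are again assumed to be sets of on-bit indices. RDKit ships a `MaxMinPicker` in `rdkit.SimDivFilters` for larger sets.

```python
# Hypothetical helpers for illustration; not part of MolCluster.
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def maxmin_pick(fps, n_pick, first=0):
    """Greedy MaxMin selection: each new pick is the molecule farthest from
    its nearest already-picked molecule. Returns indices in pick order."""
    n_pick = min(n_pick, len(fps))
    picked = [first]  # the seed molecule is an arbitrary choice
    # min_dist[i]: distance from molecule i to its nearest picked molecule
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[best]))
    return picked


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}, {20}]
picked = maxmin_pick(fps, n_pick=3)
print(picked)  # [0, 3, 5]
```

Starting from molecule 0, the picker skips its near-duplicates and jumps to the unrelated pair and then to the singleton, which is the behavior wanted for a diverse HTS or docking-control subset.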

List of centroids:

Contributing

Please visit the MolCluster repository. Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.