API
Chem Module
The Chem module provides tools for handling input and output operations related to the chemical converter. It allows seamless interaction with various chemical data formats.
- class synkit.Chem.Reaction.canon_rsmi.CanonRSMI(backend: str = 'wl', wl_iterations: int = 3, morgan_radius: int = 3, node_attrs: List[str] = ('element', 'aromatic', 'charge', 'hcount'))
Bases:
object
A pure-Python / pure-NetworkX utility for canonicalizing reaction SMILES by expanding atom-maps and deterministically reindexing reaction graphs.
Workflow
Expand atom-maps on reactants to ensure each atom has a unique map ID.
Convert reaction SMILES to reactant/product NetworkX graphs.
Canonicalize the reactant graph using GraphCanonicaliser (generic or WL backend).
Match atom-map IDs to compute pairwise indices between reactants and products.
Remap the product graph to align with the canonical reactant ordering.
Sync each node’s atom_map attribute to its new graph index.
Reassemble the reaction SMILES from the canonical graphs.
Classes
CanonRSMI – Main interface for transforming any reactants>>products SMILES into a canonicalized form, preserving all node and edge attributes.
Example
>>> from synkit.Chem.Reaction.canon_rsmi import CanonRSMI
>>> canon = CanonRSMI(backend='wl', wl_iterations=5)
>>> result = canon.canonicalise('[CH3:3][CH2:5][OH:10]>>[CH2:3]=[CH2:5].[OH2:10]')
>>> print(result.canonical_rsmi)
[OH:1][CH2:3][CH3:2]>>[CH2:2]=[CH2:3].[OH2:1]
- property canonical_hash: str | None
Reaction-level hash combining reactant and product canonical hashes.
- property canonical_product_graph: Graph | None
NetworkX graph of canonicalised products.
- property canonical_reactant_graph: Graph | None
NetworkX graph of canonicalised reactants.
- property canonical_rsmi: str | None
Canonical SMILES after processing.
- canonicalise(rsmi: str) CanonRSMI
- Full pipeline returning self with properties populated:
raw_rsmi
raw_reactant_graph, raw_product_graph
mapping_pairs
canonical_reactant_graph, canonical_product_graph
canonical_rsmi
- expand_aam(rsmi: str) str
Assign new atom-map IDs to unmapped reactant atoms in ‘reactants>>products’ SMILES.
New IDs start at max(existing maps)+1.
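The numbering rule can be illustrated without any chemistry toolkit. The following is a minimal sketch, not the library's implementation: the helper name `assign_missing_maps` is hypothetical, and it assumes that an atom-map value of 0 marks an unmapped atom.

```python
from typing import List


def assign_missing_maps(atom_maps: List[int]) -> List[int]:
    """Give every unmapped atom a fresh ID above the current maximum.

    Assumption: a map value of 0 means "unmapped".
    """
    next_id = max(atom_maps, default=0) + 1
    result = []
    for atom_map in atom_maps:
        if atom_map == 0:
            # Unmapped atom: hand out the next free ID.
            result.append(next_id)
            next_id += 1
        else:
            # Existing maps are kept unchanged.
            result.append(atom_map)
    return result


new_maps = assign_missing_maps([0, 3, 0, 5, 10])
```

Existing map numbers are preserved, so the pairing between reactant and product atoms that were already mapped is not disturbed.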
- static get_aam_pairwise_indices(G: Graph, H: Graph, aam_key: str = 'atom_map') List[Tuple[int, int]]
Return sorted list of (G_node, H_node) for shared atom-map IDs.
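As a rough illustration of the pairing, the sketch below works on plain dictionaries instead of NetworkX graphs. The function name `pairwise_indices` and the dictionary layout (node ID mapped to atom-map ID) are assumptions made for this example only.

```python
from typing import Dict, List, Tuple


def pairwise_indices(
    g_maps: Dict[int, int], h_maps: Dict[int, int]
) -> List[Tuple[int, int]]:
    """Pair nodes of G and H that carry the same atom-map ID.

    g_maps / h_maps: node ID -> atom-map ID (0 is treated as unmapped).
    """
    # Invert H so that each atom-map ID points at its node.
    h_by_map = {amap: node for node, amap in h_maps.items() if amap > 0}
    pairs = [
        (g_node, h_by_map[amap])
        for g_node, amap in g_maps.items()
        if amap in h_by_map
    ]
    return sorted(pairs)
```

Nodes whose map ID appears on only one side are simply left out of the result.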
- help() None
Pretty-print the class doc and public methods with signatures.
- property mapping_pairs: List[Tuple[int, int]] | None
List of atom-map index pairs between reactants and products.
- property raw_product_graph: Graph | None
NetworkX graph of raw products.
- property raw_reactant_graph: Graph | None
NetworkX graph of raw reactants.
- property raw_rsmi: str | None
Original SMILES before canonicalisation.
- static remap_graph(G: Graph, node_map: List[int] | List[Tuple[int, int]]) Graph
Remap a product graph to match a canonical reactant ordering:
- Parameters:
G (nx.Graph) – graph to remap (the product graph)
node_map (List[int] or List[Tuple[int, int]]) – mapping from old product node IDs to new IDs
- Returns:
remapped product graph
- Return type:
nx.Graph
- static sync_atom_map_with_index(G: Graph) None
In-place: set each node’s ‘atom_map’ attribute to its node ID.
- class synkit.Chem.Reaction.standardize.Standardize
Bases:
object
Utilities to normalize and filter reaction and molecule SMILES.
This class provides methods to remove atom‑mapping, filter invalid molecules, canonicalize reaction SMILES, and a full pipeline via fit.
- Variables:
None – Stateless helper class.
- static categorize_reactions(reactions: List[str], target_reaction: str) Tuple[List[str], List[str]]
Partition reactions into those matching a target and those not.
- Parameters:
reactions (List[str]) – List of reaction SMILES to categorize.
target_reaction (str) – Benchmark reaction SMILES for comparison.
- Returns:
Tuple of (matches, non_matches):
matches: reactions equal to standardized target
non_matches: all others
- Return type:
Tuple[List[str], List[str]]
- static filter_valid_molecules(smiles_list: List[str]) List[Mol]
Filter and sanitize a list of SMILES, returning only valid Mol objects.
- Parameters:
smiles_list (List[str]) – List of SMILES strings to validate.
- Returns:
List of sanitized RDKit Mol objects.
- Return type:
List[rdkit.Chem.Mol]
- fit(rsmi: str, remove_aam: bool = True, ignore_stereo: bool = True) str | None
Full standardization pipeline: strip atom‑mapping, normalize SMILES, fix hydrogen notation.
- Parameters:
rsmi (str) – Reaction SMILES to process.
remove_aam (bool) – If True, remove atom‑mapping annotations. Defaults to True.
ignore_stereo (bool) – If True, drop stereochemistry. Defaults to True.
- Returns:
The standardized reaction SMILES, or None if standardization fails.
- Return type:
Optional[str]
- static remove_atom_mapping(reaction_smiles: str, symbol: str = '>>') str
Remove atom‑map numbers from a reaction SMILES string.
- Parameters:
reaction_smiles (str) – Reaction SMILES with atom maps, e.g. ‘C[CH3:1]>>C’.
symbol (str) – Separator between reactants and products. Defaults to ‘>>’.
- Returns:
Reaction SMILES without atom‑mapping annotations.
- Return type:
str
- Raises:
ValueError – If the input format is invalid or contains invalid SMILES.
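To make the effect concrete, here is a string-level sketch that only deletes the `:n` suffix inside bracket atoms. It is an approximation under stated assumptions: the name `strip_atom_maps` is hypothetical, no SMILES validation is performed, and bracket atoms are not simplified the way a toolkit-based implementation may do.

```python
import re

# Matches ":<digits>" only when it directly precedes a closing bracket.
_MAP_SUFFIX = re.compile(r":\d+(?=\])")


def strip_atom_maps(reaction_smiles: str, symbol: str = ">>") -> str:
    """Remove atom-map numbers from a reaction SMILES (string-level sketch)."""
    if reaction_smiles.count(symbol) != 1:
        raise ValueError(f"expected exactly one {symbol!r} separator")
    return _MAP_SUFFIX.sub("", reaction_smiles)


unmapped = strip_atom_maps("C[CH3:1]>>C")
```

The lookahead keeps ring-closure digits and charges intact, because only a colon-number pair immediately before `]` is removed.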
- static standardize_rsmi(rsmi: str, stereo: bool = False) str | None
Normalize a reaction SMILES: validate molecules, sort fragments, optionally keep stereo.
- Parameters:
rsmi (str) – Reaction SMILES in ‘reactants>>products’ format.
stereo (bool) – If True, include stereochemistry in the output. Defaults to False.
- Returns:
Standardized reaction SMILES or None if no valid molecules remain.
- Return type:
Optional[str]
- Raises:
ValueError – If the input format is invalid.
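The fragment-sorting step can be sketched on plain strings. This example is an assumption-laden simplification: `sort_fragments` is a hypothetical name, it sorts lexicographically, and it skips the molecule validation and canonicalization that the method description mentions.

```python
def sort_fragments(rsmi: str) -> str:
    """Sort the dot-separated fragments on each side of a reaction SMILES."""
    sides = rsmi.split(">>")
    if len(sides) != 2:
        raise ValueError("expected 'reactants>>products'")
    # Sorting makes the output independent of the input fragment order.
    return ">>".join(".".join(sorted(side.split("."))) for side in sides)


ordered = sort_fragments("CO.CC(=O)O>>O.CC(=O)OC")
```

Two reactions that differ only in the order of their fragments then compare equal as strings.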
- class synkit.Chem.Reaction.aam_validator.AAMValidator
Bases:
object
A utility class for validating atom‐atom mappings (AAM) in reaction SMILES.
Provides methods to compare mapped SMILES against ground truth by using reaction‐center (RC) or ITS‐graph isomorphism checks, including tautomer enumeration support and batch validation over tabular data.
Quick start
>>> from synkit.Chem.Reaction import AAMValidator
>>> validator = AAMValidator()
>>> rsmi_1 = (
...     '[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]'
...     '>>'
...     '[CH3:1][C:2](=[O:3])[O:6][CH3:5].[OH2:4]')
>>> rsmi_2 = (
...     '[CH3:5][C:1](=[O:2])[OH:3].[CH3:6][OH:4]'
...     '>>'
...     '[CH3:5][C:1](=[O:2])[O:4][CH3:6].[OH2:3]')
>>> is_eq = validator.smiles_check(rsmi_1, rsmi_2, check_method='ITS')
>>> print(is_eq)
True
- static check_equivariant_graph(its_graphs: List[Graph]) Tuple[List[Tuple[int, int]], int]
Identify all pairs of isomorphic ITS graphs.
- Parameters:
its_graphs (list of networkx.Graph) – A list of ITS graphs to compare.
- Returns:
A list of index‐pairs (i, j) where its_graphs[i] is isomorphic to its_graphs[j].
The total count of such isomorphic pairs.
- Return type:
tuple (list of tuple of (int, int), int)
- static check_pair(mapping: Dict[str, str], mapped_col: str, ground_truth_col: str, check_method: str = 'RC', ignore_aromaticity: bool = False, ignore_tautomers: bool = True) bool
Validate a single record (dict) entry for equivalence.
- Parameters:
mapping (dict of str→str) – A record containing both mapped and ground‐truth SMILES.
mapped_col (str) – Key for the mapped SMILES in mapping.
ground_truth_col (str) – Key for the ground-truth SMILES in mapping.
check_method (str) – “RC” or “ITS”.
ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.
ignore_tautomers (bool) – If True, skip tautomer enumeration.
- Returns:
Validation result for this single pair.
- Return type:
bool
- static smiles_check(mapped_smile: str, ground_truth: str, check_method: str = 'RC', ignore_aromaticity: bool = False) bool
Validate a single mapped SMILES string against ground truth.
- Parameters:
mapped_smile (str) – The mapped SMILES to validate.
ground_truth (str) – The reference SMILES string.
check_method (str) – Which method to use: “RC” for reaction‐center graph or “ITS” for full ITS‐graph isomorphism.
ignore_aromaticity (bool) – If True, ignore aromaticity differences in ITS construction.
- Returns:
True if exactly one isomorphic match is found; False otherwise.
- Return type:
bool
- static smiles_check_tautomer(mapped_smile: str, ground_truth: str, check_method: str = 'RC', ignore_aromaticity: bool = False) bool | None
Validate against all tautomers of a ground truth SMILES.
- Parameters:
mapped_smile (str) – The mapped SMILES to test.
ground_truth (str) – The reference SMILES for generating tautomers.
check_method (str) – “RC” or “ITS” as in smiles_check.
ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.
- Returns:
True if any tautomer matches.
False if none match.
None if an error occurs.
- Return type:
bool or None
- static validate_smiles(data: DataFrame | List[Dict[str, str]], ground_truth_col: str = 'ground_truth', mapped_cols: List[str] = ['rxn_mapper', 'graphormer', 'local_mapper'], check_method: str = 'RC', ignore_aromaticity: bool = False, n_jobs: int = 1, verbose: int = 0, ignore_tautomers: bool = True) List[Dict[str, str | float | List[bool]]]
Batch-validate mapped SMILES in tabular or list-of-dicts form.
- Parameters:
data (pandas.DataFrame or list of dict) – A pandas DataFrame or list of dicts, each row containing at least ground_truth_col and each entry in mapped_cols.
ground_truth_col (str) – Column/key name for the ground-truth SMILES.
mapped_cols (list of str) – List of column/key names for mapped SMILES to validate.
check_method (str) – “RC” or “ITS” validation method.
ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.
n_jobs (int) – Number of parallel jobs to use (joblib).
verbose (int) – Verbosity level for parallel execution.
ignore_tautomers (bool) – If True, use simple pairwise check; otherwise enumerate tautomers.
- Returns:
A list of dicts, one per mapper, with keys:
“mapper”: the mapper name
“accuracy”: percentage correct (float)
“results”: list of individual bool results
“success_rate”: mapping success rate metric
- Return type:
list of dict
- Raises:
ValueError – If data is not a DataFrame or list of dicts.
- class synkit.Chem.Reaction.balance_check.BalanceReactionCheck(n_jobs: int = 4, verbose: int = 0)
Bases:
object
Check elemental balance of chemical reactions in SMILES format.
Supports checking single reactions, reaction dictionaries, or lists in parallel.
- Variables:
n_jobs – Number of parallel jobs for batch checking.
verbose – Verbosity level for joblib.
- static dict_balance_check(reaction_dict: Dict[str, str], rsmi_column: str) Dict[str, Any]
Check balance for a single reaction dict, preserving original keys.
- Parameters:
reaction_dict (Dict[str, str]) – Dict containing at least a rsmi_column key.
rsmi_column (str) – Key for reaction SMILES in reaction_dict.
- Returns:
Original dict augmented with “balanced”: bool.
- Return type:
Dict[str, Any]
- dicts_balance_check(input_data: str | List[str | Dict[str, str]], rsmi_column: str = 'reactions') Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]
Batch‐check balance for multiple reactions, in parallel.
- Parameters:
input_data (Union[str, List[Union[str, Dict[str, str]]]]) – Single reaction SMILES, list of SMILES, or list of dicts.
rsmi_column (str) – Key for reaction SMILES in each dict. Defaults to “reactions”.
- Returns:
Tuple (balanced_list, unbalanced_list) of dicts each including “balanced”.
- Return type:
Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]
- static get_combined_molecular_formula(smiles: str) str
Compute the molecular formula of a SMILES.
- Parameters:
smiles (str) – SMILES string of the molecule.
- Returns:
Elemental formula (e.g., “C6H6”) or empty string if invalid.
- Return type:
str
- static parse_input(input_data: str | List[str | Dict[str, str]], rsmi_column: str = 'reactions') List[Dict[str, str]]
Normalize input into a list of reaction‐dicts.
- Parameters:
input_data (str or List[Union[str, Dict[str, str]]]) – A single SMILES, list of SMILES, or list of dicts containing rsmi_column.
rsmi_column (str) – Key in dicts for the reaction SMILES. Defaults to “reactions”.
- Returns:
List of dicts with a single key rsmi_column mapping to each reaction.
- Return type:
List[Dict[str, str]]
- Raises:
ValueError – If input_data is neither str nor list.
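The normalization described above might look roughly like the following. This is a sketch based only on the documented behaviour; `normalize_input` is a hypothetical stand-in, and dropping extra keys from input dicts is inferred from the "single key" wording of the return description.

```python
from typing import Dict, List, Union

ReactionInput = Union[str, List[Union[str, Dict[str, str]]]]


def normalize_input(
    input_data: ReactionInput, rsmi_column: str = "reactions"
) -> List[Dict[str, str]]:
    """Turn a SMILES, a list of SMILES, or a list of dicts into reaction dicts."""
    if isinstance(input_data, str):
        return [{rsmi_column: input_data}]
    if isinstance(input_data, list):
        return [
            # Dicts are reduced to the reaction column; strings are wrapped.
            {rsmi_column: item[rsmi_column] if isinstance(item, dict) else item}
            for item in input_data
        ]
    raise ValueError("input_data must be a str or a list")
```

Downstream code can then iterate over one uniform shape regardless of how the caller supplied the reactions.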
- static parse_reaction(reaction_smiles: str) Tuple[str, str]
Split a reaction SMILES into reactant and product SMILES strings.
- Parameters:
reaction_smiles (str) – Reaction SMILES in ‘reactants>>products’ format.
- Returns:
Tuple of (reactants, products) SMILES.
- Return type:
Tuple[str, str]
- static rsmi_balance_check(reaction_smiles: str) bool
Determine if a reaction SMILES is elementally balanced.
- Parameters:
reaction_smiles (str) – Reaction SMILES in ‘reactants>>products’ format.
- Returns:
True if reactant and product formulas match, else False.
- Return type:
bool
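The balance criterion is that the element counts on both sides agree. The sketch below shows that comparison starting from molecular formulas rather than SMILES, so it needs no chemistry toolkit. The helper names are hypothetical, and the parser assumes simple formulas without charges, parentheses, or isotopes.

```python
import re
from collections import Counter
from typing import Iterable

# An element symbol followed by an optional count, e.g. "C2", "H", "Cl3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formulas: Iterable[str]) -> Counter:
    """Sum the element counts over several molecular formulas."""
    total: Counter = Counter()
    for formula in formulas:
        for element, count in _TOKEN.findall(formula):
            total[element] += int(count) if count else 1
    return total


def is_balanced(reactants: Iterable[str], products: Iterable[str]) -> bool:
    """True if both sides contain the same number of atoms of every element."""
    return element_counts(reactants) == element_counts(products)


# Esterification: acetic acid + methanol -> methyl acetate + water
balanced = is_balanced(["C2H4O2", "CH4O"], ["C3H6O2", "H2O"])
```

Leaving the water out of the product list makes the same reaction unbalanced, which is the typical way missing by-products are detected.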
- class synkit.Chem.Fingerprint.fp_calculator.FPCalculator(n_jobs: int = 1, verbose: int = 0)
Bases:
object
Calculate fingerprint vectors for chemical reactions represented by SMILES strings.
- Variables:
fps (TransformationFP) – Shared fingerprint engine instance.
VALID_FP_TYPES (List[str]) – Supported fingerprint type identifiers.
- Parameters:
n_jobs (int) – Number of parallel jobs to use for batch processing.
verbose (int) – Verbosity level for parallel execution.
- VALID_FP_TYPES: List[str] = ['drfp', 'avalon', 'maccs', 'torsion', 'pharm2D', 'ecfp2', 'ecfp4', 'ecfp6', 'fcfp2', 'fcfp4', 'fcfp6', 'rdk5', 'rdk6', 'rdk7', 'ap']
- static dict_process(data_dict: Dict[str, Any], rsmi_key: str, symbol: str = '>>', fp_type: str = 'ecfp4', absolute: bool = True) Dict[str, Any]
Compute a fingerprint for a single reaction SMILES entry and add it to the dict.
- Parameters:
data_dict (dict) – Dictionary containing reaction data.
rsmi_key (str) – Key in data_dict for the reaction SMILES string.
symbol (str) – Delimiter between reactant and product in the SMILES.
fp_type (str) – Fingerprint type to compute.
absolute (bool) – Whether to take absolute values of the fingerprint difference.
- Returns:
The input dictionary with a new key fp_{fp_type} holding the fingerprint vector.
- Return type:
dict
- Raises:
ValueError – If rsmi_key is missing in data_dict.
- fps: TransformationFP = <TransformationFP>
- help() None
Print details about supported fingerprint types and usage.
- Returns:
None
- Return type:
NoneType
- parallel_process(data_dicts: List[Dict[str, Any]], rsmi_key: str, symbol: str = '>>', fp_type: str = 'ecfp4', absolute: bool = True) List[Dict[str, Any]]
Compute fingerprints for a batch of reaction dictionaries in parallel.
- Parameters:
data_dicts (list of dict) – List of dictionaries, each containing a reaction SMILES.
rsmi_key (str) – Key in each dict for the reaction SMILES string.
symbol (str) – Delimiter between reactant and product in the SMILES.
fp_type (str) – Fingerprint type to compute.
absolute (bool) – Whether to take absolute values of the fingerprint difference.
- Returns:
A list of dictionaries augmented with fp_{fp_type} entries.
- Return type:
list of dict
- Raises:
ValueError – If fp_type is unsupported or any dict is missing rsmi_key.
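The `absolute` flag refers to a fingerprint difference. As a hedged illustration, the sketch below forms an element-wise difference of two count vectors; the direction (product minus reactant) and the function name `reaction_fingerprint` are assumptions, not something this reference confirms.

```python
from typing import List, Sequence


def reaction_fingerprint(
    reactant_fp: Sequence[int], product_fp: Sequence[int], absolute: bool = True
) -> List[int]:
    """Element-wise difference of two fingerprint vectors (sketch)."""
    if len(reactant_fp) != len(product_fp):
        raise ValueError("fingerprints must have the same length")
    # Assumed direction: product minus reactant.
    diff = [p - r for r, p in zip(reactant_fp, product_fp)]
    return [abs(d) for d in diff] if absolute else diff
```

With `absolute=True` the vector only records where the two sides differ; with `absolute=False` the sign also shows whether a feature was gained or lost.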
- class synkit.Chem.Cluster.butina.ButinaCluster
Bases:
object
Cluster chemical fingerprint vectors using the Butina algorithm from RDKit, with integrated t-SNE visualization of clusters.
Key features
Butina clustering – fast hierarchical clustering with a similarity cutoff.
t-SNE visualization – 2D embedding of fingerprints, highlighting top‑k clusters.
NumPy support – accepts 2D arrays of 0/1 fingerprint data.
Configurable – user‑defined cutoff, perplexity, and top‑k highlight.
Quick start
>>> from synkit.Chem.Cluster.butina import ButinaCluster
>>> clusters = ButinaCluster.cluster(arr, cutoff=0.3)
>>> ButinaCluster.visualize(arr, clusters, k=5)
- static cluster(arr: ndarray, cutoff: float = 0.2) List[List[int]]
Perform Butina clustering on fingerprint bit-vectors.
- Parameters:
arr (np.ndarray) – 2D array of shape (n_samples, n_bits) with 0/1 dtype.
cutoff (float) – Distance cutoff (1 – similarity) to form clusters. Defaults to 0.2.
- Returns:
List of clusters, each a list of sample indices.
- Return type:
list of list of int
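For readers unfamiliar with the algorithm, the following pure-Python sketch shows the general idea of Butina clustering with a Tanimoto distance cutoff. It is an illustration only: the class above delegates to RDKit, the function names here are hypothetical, and tie-breaking between equally sized neighbourhoods may differ from RDKit's.

```python
from typing import List, Sequence


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - Tanimoto similarity of two 0/1 vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def butina_sketch(fps: Sequence[Sequence[int]], cutoff: float = 0.2) -> List[List[int]]:
    """Greedy clustering: the best-connected points become centroids first."""
    n = len(fps)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto_distance(fps[i], fps[j]) <= cutoff:
                neighbours[i].add(j)
                neighbours[j].add(i)

    # Visit points with the most neighbours first (index breaks ties).
    order = sorted(range(n), key=lambda idx: (-len(neighbours[idx]), idx))
    assigned = set()
    clusters: List[List[int]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        # A cluster is the centroid plus its still-unassigned neighbours.
        members = [centroid] + sorted(neighbours[centroid] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters
```

A smaller `cutoff` demands higher similarity within a cluster and therefore tends to produce more, smaller clusters.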
- help() None
Print usage summary for clustering and visualization.
- Returns:
None
- Return type:
NoneType
- static visualize(arr: ndarray, clusters: List[List[int]], k: int | None = None, perplexity: float = 30.0, random_state: int = 42) None
Visualize clusters in 2D via t-SNE embedding.
- Parameters:
arr (np.ndarray) – 2D array of shape (n_samples, n_features) with fingerprint data.
clusters (list of list of int) – Clusters as returned by cluster().
k (int or None) – If provided, highlight only the top‑k largest clusters; others shown as ‘Other’.
perplexity (float) – t-SNE perplexity parameter. Defaults to 30.0.
random_state (int) – Random seed for reproducibility. Defaults to 42.
- Returns:
None
- Return type:
NoneType
- Example:
>>> clusters = ButinaCluster.cluster(arr, cutoff=0.3)
>>> ButinaCluster.visualize(arr, clusters, k=5)
Synthesis Module
- class synkit.Synthesis.Reactor.syn_reactor.SynReactor(substrate: str | Graph | SynGraph, template: str | Graph | SynRule, invert: bool = False, canonicaliser: GraphCanonicaliser | None = None, explicit_h: bool = True, implicit_temp: bool = False, strategy: Strategy | str = Strategy.ALL, partial: bool = False, embed_threshold: int | None = None, embed_pre_filter: bool = False, automorphism: bool = False)
Bases:
object
A hardened and typed re-write of the original SynReactor, preserving API compatibility while offering safer, faster, and cleaner behavior.
- Parameters:
substrate (Union[str, nx.Graph, SynGraph]) – The input reaction substrate, as a SMILES string, a raw NetworkX graph, or a SynGraph.
template (Union[str, nx.Graph, SynRule]) – Reaction template, provided as SMILES/SMARTS, a raw NetworkX graph, or a SynRule.
invert (bool) – Whether to invert the reaction (predict precursors). Defaults to False.
canonicaliser (Optional[GraphCanonicaliser]) – Optional canonicaliser for intermediate graphs. If None, a default GraphCanonicaliser is used.
explicit_h (bool) – If True, render all hydrogens explicitly in the reaction-center SMARTS. Defaults to True.
implicit_temp (bool) – If True, treat the input template as implicit-H (forces explicit_h=False). Defaults to False.
strategy (Strategy or str) – Matching strategy, one of Strategy.ALL, ‘comp’, or ‘bt’. Defaults to Strategy.ALL.
partial (bool) – If True, use a partial matching fallback. Defaults to False.
- Variables:
_graph (Optional[SynGraph]) – Cached SynGraph for the substrate.
_rule (Optional[SynRule]) – Cached SynRule for the template.
_mappings (Optional[List[MappingDict]]) – Cached list of subgraph-mapping dicts.
_its (Optional[List[nx.Graph]]) – Cached list of ITS graphs.
_smarts (Optional[List[str]]) – Cached list of SMARTS strings.
_flag_pattern_has_explicit_H (bool) – Internal flag indicating explicit-H constraints.
- automorphism: bool = False
- canonicaliser: GraphCanonicaliser | None = None
- embed_pre_filter: bool = False
- embed_threshold: int | None = None
- explicit_h: bool = True
- classmethod from_smiles(smiles: str, template: str | Graph | SynRule, *, invert: bool = False, canonicaliser: GraphCanonicaliser | None = None, explicit_h: bool = True, implicit_temp: bool = False, strategy: Strategy | str = Strategy.ALL) SynReactor
Alternate constructor: build a SynReactor directly from SMILES.
- Parameters:
smiles (str) – SMILES string for the substrate.
template (str or networkx.Graph or SynRule) – Reaction template (SMILES/SMARTS string, Graph, or SynRule).
invert (bool) – If True, perform backward prediction (target→precursors). Defaults to False (forward prediction).
canonicaliser (GraphCanonicaliser or None) – Optional GraphCanonicaliser to use for internal graphs.
explicit_h (bool) – If True, keep explicit hydrogens in the reaction center.
implicit_temp (bool) – If True, treat the template as implicit-H (forces explicit_h=False).
strategy (Strategy or str) – Matching strategy: ALL, ‘comp’, or ‘bt’. Defaults to ALL.
- Returns:
A new SynReactor instance.
- Return type:
SynReactor
- property graph: SynGraph
Lazily wrap the substrate into a SynGraph.
- Returns:
The reaction substrate as a SynGraph.
- Return type:
SynGraph
- help(print_results=False) None
- implicit_temp: bool = False
- invert: bool = False
- property its
- property its_list: List[Graph]
Build ITS graphs for each subgraph mapping.
- Returns:
A list of ITS (Imaginary Transition State) graphs.
- Return type:
list of networkx.Graph
- property mapping_count
Number of mappings.
- property mappings: List[Dict[Any, Any]]
Return unique sub‑graph mappings, optionally pruned via automorphisms.
- partial: bool = False
- property rule: SynRule
Lazily wrap the template into a SynRule.
- Returns:
The reaction template as a SynRule.
- Return type:
SynRule
- property smarts
- property smarts_list: List[str]
Serialise each ITS graph to a reaction-SMARTS string.
- Returns:
A list of SMARTS strings (inverted if invert=True).
- Return type:
list of str
- property smiles_list
- strategy: Strategy | str = 'all'
- substrate: str | Graph | SynGraph
- property substrate_smiles
- template: str | Graph | SynRule
- class synkit.Synthesis.Reactor.mod_reactor.MODReactor(substrate: str | List[str], rule_file: str | Path, *, invert: bool = False, strategy: str | Strategy = Strategy.BACKTRACK, verbosity: int = 0, print_results: bool = False)
Bases:
object
Lazy, ergonomic wrapper around the MØD toolkit’s derivation pipeline.
Workflow
Instantiate: give substrate SMILES and a rule GML (path or string).
Call .run() to execute the reaction strategy.
Inspect results via .get_reaction_smiles(), .product_sets, .get_dg(), etc.
Attributes
- initial_smiles : List[str]
List of SMILES strings for reactants (or products, if inverted).
- rule_file : Path
Filesystem path, raw GML string, or raw atom-mapped SMARTS string for the reaction rule.
- invert : bool
If True, apply the rule in reverse (products → reactants).
- strategy : Strategy
One of ALL, COMPONENT, or BACKTRACK.
- verbosity : int
Verbosity level for the MØD DG.apply() call.
- print_results : bool
If True, prints the derivation graph to stdout.
- static generate_reaction_smiles(temp_results: List[List[str]], base_smiles: str, *, invert: bool = False, arrow: str = '>>', separator: str = '.') List[str]
Build reaction SMILES of the form “A>>B”, where A and B swap roles if invert=True.
Parameters
- temp_results : List[List[str]]
Batches of product (or reactant) SMILES.
- base_smiles : str
The “other side” of the reaction: the reactant side when invert=False, or the product side when invert=True.
- invert : bool
If False, generates “base_smiles>>joined_batch”; if True, generates “joined_batch>>base_smiles”.
- arrow : str
The reaction arrow to use (default “>>”).
- separator : str
How to join multiple SMILES in a batch (default “.”).
Returns
- List[str]
Reaction SMILES strings, one per batch.
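The parameter descriptions above fully determine the string assembly, which can be sketched as follows. The function name `build_reaction_smiles` is a hypothetical stand-in written from those descriptions, not a copy of the library code.

```python
from typing import List


def build_reaction_smiles(
    temp_results: List[List[str]],
    base_smiles: str,
    *,
    invert: bool = False,
    arrow: str = ">>",
    separator: str = ".",
) -> List[str]:
    """Join each batch and place it opposite base_smiles."""
    reactions = []
    for batch in temp_results:
        joined = separator.join(batch)
        if invert:
            # Backward mode: the batch holds precursors, base is the product side.
            reactions.append(f"{joined}{arrow}{base_smiles}")
        else:
            reactions.append(f"{base_smiles}{arrow}{joined}")
    return reactions
```

Each inner list yields exactly one reaction string, so the output length equals the number of batches.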
- get_dg() None
Access the underlying derivation graph.
Returns
- DG
The MØD derivation graph constructed during .run().
Raises
- RuntimeError
If .run() has not yet been called.
- get_reaction_smiles() List[str]
Retrieve the reaction SMILES strings (lazy).
Returns
- List[str]
List of reaction SMILES, in “A>>B” format.
- help() None
Print a one-page summary of reactor configuration and results.
- property prediction_count: int
Number of distinct prediction batches generated.
- property product_sets: List[List[str]]
Raw product sets (lists of SMILES) before joining into full reactions.
- property product_smiles: List[str]
Flattened list of all product SMILES (may contain duplicates).
- property reaction_smiles: List[str]
Lazy-loaded reaction SMILES strings of form “A>>B”.
Returns
List[str]
- run() MODReactor
Execute the chosen strategy once and return self so you can chain:
>>> r = MODReactor(...).run()
>>> smiles = r.get_reaction_smiles()
- class synkit.Synthesis.Reactor.mod_aam.MODAAM(substrate: str | List[str], rule_file: str | Path, *, invert: bool = False, strategy: str | Strategy = Strategy.BACKTRACK, verbosity: int = 0, print_results: bool = False, check_isomorphic: bool = True)
Bases:
object
Runs MØD (via MODReactor) then a full AAM/ITS post-processing pipeline.
Parameters
- substrate : Union[str, List[str]]
Dot-delimited SMILES or list of SMILES for reactants.
- rule_file : Union[str, Path]
GML rule file path or raw GML/SMARTS string.
- invert : bool, optional
If True, apply the rule in reverse (default False).
- strategy : Union[str, Strategy], optional
Matching strategy: ALL, COMPONENT, or BACKTRACK (default BACKTRACK).
- verbosity : int, optional
Verbosity for MODReactor (default 0).
- print_results : bool, optional
If True, print the derivation graph (default False).
- check_isomorphic : bool, optional
If True, deduplicate results by isomorphism (default True).
- property dg: Any
The MØD derivation graph (DG).
- get_reaction_smiles() List[str]
Alias for accessing the processed reaction SMILES.
- get_smarts() List[str]
Synonym for .get_reaction_smiles().
- help() None
Print a summary of inputs and outputs.
- property product_count: int
Number of product SMILES generated.
- property reaction_smiles: List[str]
The post-processed reaction SMILES.
- run() List[str]
Re-run the entire pipeline (MØD + AAM) and return fresh results.
- synkit.Synthesis.Reactor.mod_aam.expand_aam(rsmi: str, rule: str) List[str]
Expand Atom–Atom Mapping (AAM) for a given reaction SMARTS/SMILES (rsmi) using a pre‐sanitized GML rule string.
Parameters
- rsmi : str
Reaction SMILES/SMARTS in ‘reactants>>products’ form.
- rule : str
A GML rule string (already sanitized upstream).
Returns
- List[str]
All reaction SMILES from MODAAM whose standardized form matches rsmi.
Graph Module
ITS Submodule
The ITS submodule provides tools for constructing, decomposing, and validating ITS (Imaginary Transition State) graphs.
its_construction: Functions for constructing an ITS graph.
its_decompose: Functions for decomposing an ITS graph and extracting reaction center.
its_expand: Functions for expanding partial ITS graphs into full ITS graphs.
- class synkit.Graph.ITS.its_construction.ITSConstruction
Bases:
object
- CORE_EDGE_DEFAULTS: Dict[str, Any] = {'bond_type': '', 'conjugated': False, 'ez_isomer': '', 'in_ring': False, 'order': 0.0}
- CORE_NODE_DEFAULTS: Dict[str, Any] = {'aromatic': False, 'atom_map': 0, 'charge': 0, 'element': '*', 'hcount': 0, 'neighbors': <function ITSConstruction.<lambda>>}
- static ITSGraph(G: Graph, H: Graph, ignore_aromaticity: bool = False, attributes_defaults: Dict[str, Any] | None = None, balance_its: bool = False, store: bool = False) Graph
Backward-compatible wrapper that replicates the original ITSGraph signature while delegating to the improved construct implementation.
- Parameters:
G (nx.Graph) – The first input graph (reactant).
H (nx.Graph) – The second input graph (product).
ignore_aromaticity (bool) – If True, small order differences are treated as zero.
attributes_defaults (dict[str, Any] or None) – Defaults to use when a node attribute is missing.
balance_its (bool) – If True, base selection is balanced toward the smaller graph.
store (bool) – If True, keep full per-attribute tuples; if False, keep only G-side values.
- Returns:
Constructed ITS graph with legacy node attribute ordering.
- Return type:
nx.Graph
- static construct(G: Graph, H: Graph, *, ignore_aromaticity: bool = False, balance_its: bool = True, store: bool = True, node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, attributes_defaults: Dict[str, Any] | None = None) Graph
Construct an ITS graph by merging nodes and edges from G and H, preserving nodes present only in one graph and filling missing-side attributes with defaults.
Node-level attributes are always reflected in typesGH as ((G_tuple), (H_tuple)) over node_attrs. If store=True, the individual attributes are stored as (G_value, H_value) tuples under their own keys; if store=False, only the G-side value is stored under each attribute key.
- Parameters:
G (nx.Graph) – The first input graph (reactant-like).
H (nx.Graph) – The second input graph (product-like).
ignore_aromaticity (bool) – If True, small differences in bond order (<1) are treated as zero.
balance_its (bool) – If True, choose the smaller graph (by node count) as the base; otherwise the larger.
store (bool) – If True, keep per-attribute (G,H) tuples; if False, keep only the G-side value per attribute.
node_attrs (list[str] or None) – Ordered list of node attribute names to include in the node-level typesGH tuples. Defaults to [“element”, “aromatic”, “hcount”, “charge”, “neighbors”].
edge_attrs (list[str] or None) – (Legacy) ordered list of edge attribute names; not used for core behavior.
attributes_defaults (dict[str, Any] or None) – Optional overrides for default node attribute values.
- Returns:
ITS graph with merged node and edge annotations, including typesGH, order, and standard_order.
- Return type:
nx.Graph
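The merge can be pictured with plain dictionaries in place of NetworkX graphs. This is a simplified sketch, not the library's construct: `merge_its` and the data layout are hypothetical, the `neighbors` attribute is left out, and the default values are taken from CORE_NODE_DEFAULTS and CORE_EDGE_DEFAULTS listed above.

```python
from typing import Any, Dict, Tuple

NODE_ATTRS = ("element", "aromatic", "hcount", "charge")
NODE_DEFAULTS = {"element": "*", "aromatic": False, "hcount": 0, "charge": 0}

Nodes = Dict[int, Dict[str, Any]]
Edges = Dict[Tuple[int, int], float]


def merge_its(nodes_g: Nodes, edges_g: Edges, nodes_h: Nodes, edges_h: Edges):
    """Merge two attribute dictionaries into ITS-style composite annotations."""

    def side(attrs):
        # A node missing on one side is filled with the default values.
        attrs = attrs or {}
        return tuple(attrs.get(key, NODE_DEFAULTS[key]) for key in NODE_ATTRS)

    its_nodes = {
        node: {"typesGH": (side(nodes_g.get(node)), side(nodes_h.get(node)))}
        for node in sorted(set(nodes_g) | set(nodes_h))
    }

    # Undirected edges: normalise (u, v) and (v, u) to one key.
    g = {tuple(sorted(edge)): order for edge, order in edges_g.items()}
    h = {tuple(sorted(edge)): order for edge, order in edges_h.items()}
    its_edges = {
        edge: {"order": (g.get(edge, 0.0), h.get(edge, 0.0))}
        for edge in sorted(set(g) | set(h))
    }
    return its_nodes, its_edges
```

A bond that exists on only one side ends up with order 0.0 on the other, which is how bond formation and cleavage become visible in the merged graph.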
- static get_node_attribute(graph: Graph, node: Hashable, attribute: str, default: Any) Any
Retrieve a node attribute or return a default if missing.
- static get_node_attributes_with_defaults(graph: Graph, node: Hashable, attributes_defaults: Dict[str, Any] = None) Tuple
Retrieve multiple node attributes applying provided simple defaults.
- static typesGH_info(node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None) Dict[str, Dict[str, Tuple[type, Any]]]
Provide expected types and default values for interpreting typesGH tuples.
- Parameters:
node_attrs (list[str] or None) – List of node attributes used in the node-level typesGH.
edge_attrs (list[str] or None) – List of edge attributes used in the edge-level typesGH.
- Returns:
Nested dict describing (type, default) for each selected attribute.
- Return type:
dict[str, dict[str, tuple[type, Any]]]
- synkit.Graph.ITS.its_decompose.get_rc(ITS: Graph, element_key: List[str] = ['element', 'charge', 'typesGH', 'atom_map'], bond_key: str = 'order', standard_key: str = 'standard_order', disconnected: bool = False, keep_mtg: bool = False) Graph
Extract the reaction-center (RC) subgraph from an ITS graph.
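Conceptually, the reaction center consists of the bonds whose order differs between the reactant and product side, together with their end atoms. The sketch below shows only that bond-based criterion on a dictionary of edges; `reaction_center` is a hypothetical name, and get_rc may apply further criteria (for example changes in node attributes) that are not modelled here.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def reaction_center(its_edges: Dict[Edge, Dict[str, Tuple[float, float]]]):
    """Keep edges whose (order_G, order_H) differ, plus their end nodes."""
    rc_edges = {
        edge: data
        for edge, data in its_edges.items()
        if data["order"][0] != data["order"][1]
    }
    rc_nodes: List[int] = sorted({node for edge in rc_edges for node in edge})
    return rc_nodes, rc_edges
```

Bonds that keep the same order on both sides are spectators and drop out of the result.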
- synkit.Graph.ITS.its_decompose.its_decompose(its_graph: Graph, nodes_share='typesGH', edges_share='order')
Decompose an ITS graph into two separate reactant (G) and product (H) graphs.
- Nodes and edges in its_graph carry composite attributes:
Each node has its_graph.nodes[nodes_share] = (node_attrs_G, node_attrs_H).
Each edge has its_graph.edges[edges_share] = (order_G, order_H).
This function splits those tuples to reconstruct the original G and H graphs.
- Parameters:
its_graph (nx.Graph) – The ITS graph with composite node/edge attributes.
nodes_share (str) – Node attribute key storing (G_attrs, H_attrs) tuples.
edges_share (str) – Edge attribute key storing (order_G, order_H) tuples.
- Returns:
A tuple of two graphs (G, H) reconstructed from the ITS.
- Return type:
Tuple[nx.Graph, nx.Graph]
- Example:
>>> its = nx.Graph()
>>> # ... set its.nodes[n]['typesGH'] and its.edges[e]['order'] ...
>>> G, H = its_decompose(its)
>>> isinstance(G, nx.Graph) and isinstance(H, nx.Graph)
True
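The tuple splitting can also be shown on plain dictionaries. This sketch is illustrative only: `decompose_sketch`, the attribute order in `NODE_ATTRS`, and the rule that a bond of order 0 is absent on that side are assumptions made for the example.

```python
from typing import Any, Dict, Tuple

NODE_ATTRS = ("element", "aromatic", "hcount", "charge")


def decompose_sketch(
    its_nodes: Dict[int, Dict[str, Any]],
    its_edges: Dict[Tuple[int, int], Dict[str, Any]],
    node_attrs: Tuple[str, ...] = NODE_ATTRS,
):
    """Split composite (G, H) annotations back into two separate sides."""
    sides = []
    for idx in (0, 1):  # 0 = reactant side G, 1 = product side H
        nodes = {
            node: dict(zip(node_attrs, data["typesGH"][idx]))
            for node, data in its_nodes.items()
        }
        # Assumption: order 0 means the bond does not exist on this side.
        edges = {
            edge: data["order"][idx]
            for edge, data in its_edges.items()
            if data["order"][idx] != 0
        }
        sides.append((nodes, edges))
    return sides[0], sides[1]
```

Each side receives every node, while a bond only appears on the side where its order is non-zero.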
- class synkit.Graph.ITS.its_expand.ITSExpand
Bases:
object
Partially expand a reaction SMILES (RSMI) by reconstructing imaginary transition state (ITS) graphs and applying transformation rules based on the reaction center graph.
This class identifies the reaction center from an RSMI, builds and reconstructs the ITS graph, decomposes it back into reactants and products, and standardizes atom mappings to produce a fully mapped AAM RSMI.
- Variables:
std – Standardize instance for reaction SMILES standardization.
- static expand_aam_with_its(rsmi: str, relabel: bool = False, use_G: bool = True) str
Expand a partial reaction SMILES to a full AAM RSMI using ITS reconstruction.
- Parameters:
rsmi (str) – Reaction SMILES string in the format ‘reactant>>product’.
use_G (bool) – If True, expand using the reactant side; otherwise use the product side.
light_weight (bool) – Flag indicating whether to apply a lighter-weight standardization.
- Returns:
Fully atom-mapped reaction SMILES after ITS expansion and standardization.
- Return type:
str
- Raises:
ValueError – If input RSMI format is invalid or ITS reconstruction fails.
- Example:
>>> expander = ITSExpand()
>>> expander.expand_aam_with_its("CC[CH2:3][Cl:1].[N:2]>>CC[CH2:3][N:2].[Cl:1]")
'[CH3:1][CH2:2][CH2:3][Cl:4].[N:5]>>[CH3:1][CH2:2][CH2:3][N:5].[Cl:4]'
Matcher Submodule
The synkit.Graph.Matcher package provides comprehensive tools for graph comparison, subgraph search, and clustering. It is organized into four main areas:
Matching Engines: perform graph-to-graph and subgraph isomorphism checks:
- GraphMatcherEngine
- SubgraphSearchEngine
Single-Graph Clustering: cluster a single graph’s nodes or components:
- graph_cluster
Batch Clustering: process and cluster multiple graphs in parallel:
- batch_cluster
High-Throughput Isomorphism: specialized routines for multi-pattern searches in a host graph:
- sing
- turbo_iso
Matching Engines
- class synkit.Graph.Matcher.graph_matcher.GraphMatcherEngine(*, backend: str = 'nx', node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, wl1_filter: bool = False, max_mappings: int | None = 1)
Bases:
object
Reusable engine for (sub‑)graph isomorphism checks & embeddings.
Parameters
- backend:
"nx" (default) – pure-Python implementation that relies on GraphMatcher. "rule" – optional, requires the third-party mod package.
- node_attrs, edge_attrs:
Lists of attribute keys that must match exactly between candidate nodes/edges. hcount is treated specially – the host value must be ≥ the pattern value (to allow aggregated counts).
- wl1_filter:
If True, a fast WL-based colour refinement pre-filter discards host graphs that cannot possibly contain the pattern.
- max_mappings:
Upper bound on the number of mappings to enumerate in get_mappings(). None means “no limit”.
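The matching rule for node attributes described above (exact equality, except that the host hcount only needs to be at least the pattern hcount) can be sketched as follows. The helper `node_matches` is hypothetical and works on plain attribute dictionaries; it illustrates the rule and is not the engine's own code.

```python
from typing import Any, Dict, List


def node_matches(host_attrs: Dict[str, Any], pattern_attrs: Dict[str, Any],
                 node_attrs: List[str]) -> bool:
    """Return True if a host node is compatible with a pattern node."""
    for key in node_attrs:
        if key == "hcount":
            # Special case: the host may carry more hydrogens than the pattern.
            if host_attrs.get("hcount", 0) < pattern_attrs.get("hcount", 0):
                return False
        elif host_attrs.get(key) != pattern_attrs.get(key):
            # All other listed attributes must match exactly.
            return False
    return True


host = {"element": "C", "charge": 0, "hcount": 3}
pattern = {"element": "C", "charge": 0, "hcount": 1}
keys = ["element", "charge", "hcount"]

forward = node_matches(host, pattern, keys)
backward = node_matches(pattern, host, keys)
```

The check is asymmetric by design: a CH3 host node can satisfy a pattern node that asks for one hydrogen, but not the other way round.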
- static available_backends() List[str]
- get_mappings(host: Any, pattern: Any) List[Dict[int, int]]
- help() str
- isomorphic(obj1: Any, obj2: Any) bool
- class synkit.Graph.Matcher.subgraph_matcher.SubgraphMatch
Bases:
object
Boolean-only checks for graph isomorphism and subgraph (induced or monomorphic) matching.
Provides static methods for NetworkX-based checks and optional GML “rule” backend.
- static is_subgraph(pattern: Graph | str, host: Graph | str, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', use_filter: bool = False, check_type: str = 'induced', backend: str = 'nx') bool
Unified API for subgraph/isomorphism either via NX or GML backend.
- static rule_subgraph_morphism(rule_1: str, rule_2: str, use_filter: bool = False) bool
Evaluates if two GML-formatted rule representations are isomorphic or one is a subgraph of the other (monomorphic).
Parameters:
- rule_1 (str): GML string of the first rule.
- rule_2 (str): GML string of the second rule.
- use_filter (bool, optional): Whether to filter by node/edge labels and vertex counts.
Returns:
- bool: True if the monomorphism condition is met, False otherwise.
- static subgraph_isomorphism(child_graph: Graph, parent_graph: Graph, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', use_filter: bool = False, check_type: str = 'induced', node_comparator: Callable[[Any, Any], bool] | None = None, edge_comparator: Callable[[Any, Any], bool] | None = None) bool
Check whether the child graph is subgraph-isomorphic to the parent graph, using customizable node and edge attributes.
- class synkit.Graph.Matcher.subgraph_matcher.SubgraphSearchEngine
Bases:
object
Static helper routines for sub-graph monomorphism search.
- Variables:
DEFAULT_THRESHOLD – default cap on embedding enumeration (5000)
- DEFAULT_THRESHOLD: int = 5000
- static find_subgraph_mappings(host: Graph, pattern: Graph, *, node_attrs: List[str], edge_attrs: List[str], strategy: str | Strategy = Strategy.COMPONENT, max_results: int | None = None, strict_cc_count: bool = True, threshold: int | None = None, pre_filter: bool = False) List[Dict[int, int]]
Dispatch to a subgraph-matching strategy with optional guards.
Parameters
- host, pattern
NetworkX graphs (host ≥ pattern).
- node_attrs, edge_attrs
Keys of attributes to match exactly (plus hcount ≥).
- strategy
Matching strategy code or enum (“all”, “comp”, “bt”).
- max_results
Stop after this many embeddings (None = no limit).
- strict_cc_count
If True, the host connected-component (CC) count must be ≤ the pattern CC count for COMPONENT/BACKTRACK.
- threshold
Override the default cap (DEFAULT_THRESHOLD) on embeddings.
- pre_filter
If True, run a cheap Cartesian-product pre-filter against the threshold.
Returns
List of dictionaries mapping pattern node→host node. Empty if none or if any guard (pre-filter or enumeration) exceeds the threshold.
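One way to picture the Cartesian-product pre-filter is to count, for each pattern node, how many host nodes share its attributes, and multiply those counts to get an upper bound on the number of candidate assignments. The sketch below is an assumption about how such a guard could work, written on plain dictionaries; `exceeds_threshold` is a hypothetical name and the real guard may count differently.

```python
from math import prod
from typing import Any, Dict, Hashable, List

Attrs = Dict[Hashable, Dict[str, Any]]


def exceeds_threshold(host_nodes: Attrs, pattern_nodes: Attrs,
                      node_attrs: List[str], threshold: int = 5000) -> bool:
    """Cheap upper bound on the number of embeddings, compared to a cap."""
    counts = []
    for p_attrs in pattern_nodes.values():
        # Number of host nodes whose listed attributes equal the pattern node's.
        n_candidates = sum(
            all(h_attrs.get(k) == p_attrs.get(k) for k in node_attrs)
            for h_attrs in host_nodes.values()
        )
        counts.append(n_candidates)
    # Product of per-node candidate counts bounds the search space.
    return prod(counts) > threshold


host_nodes = {i: {"element": "C"} for i in range(10)}
small_pattern = {i: {"element": "C"} for i in range(3)}   # 10**3 candidates
large_pattern = {i: {"element": "C"} for i in range(4)}   # 10**4 candidates

small_blocked = exceeds_threshold(host_nodes, small_pattern, ["element"])
large_blocked = exceeds_threshold(host_nodes, large_pattern, ["element"])
```

The bound is deliberately loose: it ignores edges and injectivity, so it can only reject searches early and never proves that an embedding exists.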
- property help: str
Return the full module docstring.
Clustering
- class synkit.Graph.Matcher.graph_cluster.GraphCluster(node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', backend: str = 'nx')
Bases:
object
- available_backends() List[str]
Return available backends: always includes ‘nx’; adds ‘mode’ if the ‘mod’ package is installed.
- fit(data: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash', strip: bool = False) List[Dict]
Automatically clusters the rules and assigns them cluster indices based on the similarity, potentially using provided templates for clustering, or generating new templates.
Parameters:
- data (List[Dict]): A list of dictionaries, each representing a rule along with metadata.
- rule_key (str): The key in each dictionary under which the rule data is stored.
- attribute_key (str): The key in each dictionary under which rule attributes are stored.
Returns:
- List[Dict]: Updated list of dictionaries with an added ‘class’ key for cluster identification.
- iterative_cluster(rules: List[str], attributes: List[Any] | None = None, nodeMatch: Callable | None = None, edgeMatch: Callable | None = None) Tuple[List[Set[int]], Dict[int, int]]
Clusters rules based on their similarities, which could include structural or attribute-based similarities depending on the given attributes.
Parameters:
- rules (List[str]): List of rules, potentially serialized strings of rule representations.
- attributes (Optional[List[Any]]): Attributes associated with each rule for preliminary comparison, e.g., labels or properties.
Returns:
- Tuple[List[Set[int]], Dict[int, int]]: A tuple containing a list of sets (clusters), where each set contains indices of rules in the same cluster, and a dictionary mapping each rule index to its cluster index.
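The return shape described above (a list of index sets plus an index-to-cluster dictionary) can be illustrated with a small greedy clustering loop. This is a sketch under assumptions: `cluster_by_equivalence` is a hypothetical helper, the equivalence test is a plain callable standing in for an isomorphism check, and the attribute comparison plays the role of the cheap preliminary filter.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


def cluster_by_equivalence(
    items: List[Any],
    attributes: Optional[List[Any]] = None,
    same: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> Tuple[List[Set[int]], Dict[int, int]]:
    """Greedily assign each item to the first cluster whose representative matches."""
    clusters: List[Set[int]] = []
    representatives: List[int] = []      # one item index per cluster
    index_to_cluster: Dict[int, int] = {}
    for i, item in enumerate(items):
        for c, rep in enumerate(representatives):
            # Cheap pre-filter: skip the expensive test if attributes differ.
            if attributes is not None and attributes[i] != attributes[rep]:
                continue
            if same(items[rep], item):
                clusters[c].add(i)
                index_to_cluster[i] = c
                break
        else:
            # No existing cluster matched: start a new one.
            representatives.append(i)
            clusters.append({i})
            index_to_cluster[i] = len(clusters) - 1
    return clusters, index_to_cluster


clusters, mapping = cluster_by_equivalence(["a", "b", "a", "c", "b"])
```

Comparing each new item only against one representative per cluster keeps the number of expensive checks proportional to the number of clusters rather than the number of items.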
- class synkit.Graph.Matcher.batch_cluster.BatchCluster(node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', backend: str = 'nx')
Bases:
object
- available_backends() List[str]
Return available backends: always includes ‘nx’; adds ‘rule’ if the ‘mod’ package is installed.
- static batch_dicts(input_list, batch_size)
Splits a list of dictionaries into batches of a specified size.
Args:
- input_list (list of dict): The list of dictionaries to be batched.
- batch_size (int): The size of each batch.
Returns:
- list of list of dict: A list where each element is a batch (sublist) of dictionaries.
Raises:
- ValueError: If batch_size is less than 1.
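The batching behaviour documented above can be reproduced in a few lines. This is a minimal sketch of the documented contract, not the library's source; the name `batch_dicts_sketch` is chosen to make that explicit.

```python
from typing import Dict, List


def batch_dicts_sketch(input_list: List[Dict], batch_size: int) -> List[List[Dict]]:
    """Split a list of dictionaries into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Slice the list in steps of batch_size; the last batch may be shorter.
    return [input_list[i:i + batch_size]
            for i in range(0, len(input_list), batch_size)]


records = [{"id": i} for i in range(5)]
batches = batch_dicts_sketch(records, 2)
```

With five records and a batch size of two, the result holds three batches, the last containing a single record.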
- cluster(data: List[Dict], templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash') Tuple[List[Dict], List[Dict]]
Processes a list of graph data entries, classifying each based on existing templates.
Parameters:
- data (List[Dict]): A list of dictionaries, each representing a graph or rule to be classified.
- templates (List[Dict]): Dynamic templates used for categorization.
Returns:
- Tuple[List[Dict], List[Dict]]: A tuple containing the list of classified data and the updated templates.
- fit(data: List[Dict], templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash', batch_size: int | None = None) Tuple[List[Dict], List[Dict]]
Processes and classifies data in batches. Uses GraphCluster for initial processing and a stratified sampling technique to update templates if there is only one batch and no initial templates are provided.
Parameters:
- data (List[Dict]): Data to process.
- templates (List[Dict]): Templates for categorization.
- rule_key (str): Key to access rule or graph data.
- attribute_key (str): Key to access attributes used for filtering.
- batch_size (Optional[int]): Size of batches for processing; if not provided, all data is processed at once.
Returns:
- Tuple[List[Dict], List[Dict]]: The processed data and the potentially updated templates.
- lib_check(data: Dict, templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'signature', nodeMatch: Callable | None = None, edgeMatch: Callable | None = None) Dict
Checks and classifies a graph or rule based on existing templates using either graph or rule isomorphism.
Parameters:
- data (Dict): A dictionary representing a graph or rule with its attributes and classification.
- templates (List[Dict]): Dynamic templates used for categorization. If None, initializes to an empty list.
- rule_key (str): Key to access the graph or rule data within the dictionary.
- attribute_key (str): An attribute used to filter templates before the isomorphism check.
- nodeMatch (Optional[Callable]): A function to match nodes; defaults to a predefined generic_node_match.
- edgeMatch (Optional[Callable]): A function to match edges; defaults to a predefined generic_edge_match.
Returns:
- Dict: The updated dictionary with its classification.
High-Throughput Isomorphism
- class synkit.Graph.Matcher.sing.SING(graph: Graph, max_path_length: int = 3, node_att: str | List[str] = ['element', 'charge'], edge_att: str | List[str] | None = 'order')
Bases:
object
Subgraph search In Non-homogeneous Graphs (SING)
A lightweight Python implementation adopting a filter-and-refine strategy with path-based features. This version supports heterogeneous graphs through flexible node and edge attribute selections.
- class synkit.Graph.Matcher.turbo_iso.TurboISO(graph: Graph, node_label: str | List[str] = 'label', edge_label: str | List[str] | None = None, distance_threshold: int = 5000)
Bases:
object
TurboISO with pragmatic speed‑ups for many small queries.
Pre‑indexes the host graph by node‑signature → nodes bucket.
Uses lazy, radius‑bounded BFS instead of a pre‑computed all‑pairs matrix (saving both startup time and memory).
Skips distance consistency if the total candidate pool is already smaller than a configurable threshold (defaults to 5 000).
- search(Q: Graph, prune: bool = False) List[Dict[Any, Any]] | bool
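Two of the speed-ups listed for TurboISO, the node-signature index and the radius-bounded BFS, can be sketched on a plain adjacency dictionary. The helpers below are hypothetical, and the choice of (label, degree) as the node signature is an assumption made for illustration; the class may use a different signature.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Tuple

Adjacency = Dict[Hashable, List[Hashable]]


def build_signature_index(labels: Dict[Hashable, str],
                          adjacency: Adjacency) -> Dict[Tuple[str, int], List[Hashable]]:
    """Bucket host nodes by an assumed signature of (label, degree)."""
    index = defaultdict(list)
    for node, label in labels.items():
        index[(label, len(adjacency.get(node, ())))].append(node)
    return dict(index)


def nodes_within_radius(adjacency: Adjacency, start: Hashable,
                        radius: int) -> Dict[Hashable, int]:
    """BFS that stops expanding once the given radius is reached."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not explore beyond the radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Toy host graph: a path 0 - 1 - 2 - 3 with labels C, C, O, C.
labels = {0: "C", 1: "C", 2: "O", 3: "C"}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

index = build_signature_index(labels, adjacency)
near = nodes_within_radius(adjacency, 0, 2)
```

The index is built once per host graph and reused across queries, while the BFS runs only for the nodes a query actually touches, which avoids an all-pairs distance matrix.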
MTG Submodule
- class synkit.Graph.MTG.mtg.MTG(sequences: List[Graph] | List[str], mappings: List[Dict[int, int]] | None = None, *, node_label_names: List[str] | None = None, canonicaliser: GraphCanonicaliser | None = None, mcs_mol: bool = False, mcs: bool = False)
Bases:
object
Fuse a chronological series of ITS graphs into a Mechanistic Transition Graph.
- Parameters:
sequences – A list of ITS-format NetworkX graphs or RSMI strings.
mappings – Optional list of precomputed mappings; computed via MCS if None.
node_label_names – Keys for node-label matching.
canonicaliser – Optional GraphCanonicaliser for snapshot canonicalisation.
- Raises:
ValueError – On invalid sequence or mapping lengths.
RuntimeError – On mapping failures.
- static describe() str
- get_aam(*, directed: bool = False, explicit_h: bool = False) str
- get_compose_its(*, directed: bool = False) Graph
- get_mtg(*, directed: bool = False) Graph
- property k: int
- property node_mapping: Dict[Tuple[int, int], int]
- to_dataframe()
Rule Module
The synkit.Rule package provides a flexible framework for reaction rule manipulation, composition, and application in retrosynthesis and forward-prediction workflows. It is organized into three main subpackages:
Compose: Build new reaction rules by composing existing ones, supporting both SMARTS-based and GML workflows.
Apply: Apply rules to molecule or reaction graphs for retro-prediction or forward-simulation (e.g., in reactor contexts).
Modify: Generate artificial rules and edit or adjust rule templates, for example by adding or removing explicit hydrogens, adjusting contexts, and fine-tuning matching behavior.
- class synkit.Rule.Compose.rule_compose.RuleCompose
Bases:
object
- static filter_smallest_vertex(combo: List[object]) List[object]
Filters and returns the elements from a list that have the smallest number of vertices in their context.
Parameters: - combo (List[object]): A list of objects, each with a ‘context’ attribute that has a ‘numVertices’ attribute.
Returns: - List[object]: A list of objects from the input list that have the minimum number of vertices in their context.
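The selection rule described above can be sketched with stand-in objects. `SimpleNamespace` is used here only to mimic objects that expose `context.numVertices`; the helper name `filter_smallest_vertex_sketch` is hypothetical and this is not the library's source.

```python
from types import SimpleNamespace
from typing import List


def filter_smallest_vertex_sketch(combo: List[object]) -> List[object]:
    """Keep only the objects whose context has the fewest vertices."""
    if not combo:
        return []
    smallest = min(obj.context.numVertices for obj in combo)
    return [obj for obj in combo if obj.context.numVertices == smallest]


def make_rule(name: str, n_vertices: int) -> SimpleNamespace:
    # Stand-in for a rule object with a 'context' that reports its size.
    return SimpleNamespace(name=name, context=SimpleNamespace(numVertices=n_vertices))


rules = [make_rule("a", 5), make_rule("b", 3), make_rule("c", 3)]
kept = [r.name for r in filter_smallest_vertex_sketch(rules)]
```

Ties are all kept, so the result can contain more than one object.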
- static rule_cluster(graphs: List) List
Clusters graphs based on their isomorphic relationship and returns a list of graphs, each from a different cluster.
Parameters: - graphs: A list of graph objects.
Returns: - List: A list of graphs where each graph is a representative from a different cluster.
- static save_gml_from_text(gml_content: str, gml_file_path: str, rule_id: str, parent_ids: List[str]) bool
Save a text string to a GML file by modifying the ‘ruleID’ line to include parent rule names. This function parses the given GML content, identifies any lines starting with ‘ruleID’, and replaces these lines with a new ruleID that incorporates identifiers from parent rules.
Parameters:
- gml_content (str): The content to be saved to the GML file. This should be the entire textual content of a GML file.
- gml_file_path (str): The file path where the GML file should be saved. If the path does not exist or is inaccessible, the function returns False and prints an error message.
- rule_id (str): The original rule ID from the content. This is the identifier that will be modified to include parent IDs in the new ruleID.
- parent_ids (List[str]): List of parent rule IDs to prepend to the original rule ID. These are combined into a new identifier to reflect the hierarchical relationship in rule IDs.
Returns:
- bool: True if the file was successfully saved, False otherwise.
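The ruleID rewriting step can be pictured as a line-by-line pass over the GML text. The sketch below only transforms the string and does not write a file; `rewrite_rule_id` is a hypothetical helper, and joining the parent IDs and the rule ID with an underscore is an assumption, since the documentation does not state the exact format of the combined identifier.

```python
from typing import List


def rewrite_rule_id(gml_content: str, rule_id: str, parent_ids: List[str],
                    sep: str = "_") -> str:
    """Replace every line starting with 'ruleID' by a combined identifier."""
    # Assumed naming scheme: parent IDs first, then the original rule ID.
    new_id = sep.join([*parent_ids, rule_id])
    rewritten = []
    for line in gml_content.splitlines():
        if line.strip().startswith("ruleID"):
            indent = line[: len(line) - len(line.lstrip())]
            rewritten.append(f'{indent}ruleID "{new_id}"')
        else:
            rewritten.append(line)
    return "\n".join(rewritten)


gml = 'rule [\n  ruleID "r3"\n  left [\n  ]\n]'
result = rewrite_rule_id(gml, "r3", ["r1", "r2"])
```

All other lines, including indentation, are passed through unchanged.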
- class synkit.Rule.Apply.reactor_rule.ReactorRule
Bases:
object
Handles the transformation of SMILES strings to reaction SMILES (RSMI) by applying chemical reaction rules defined in GML strings.
It can optionally reverse the reaction, exclude atom mappings, and include unchanged reagents in the output.
- class synkit.Rule.Modify.molecule_rule.MoleculeRule
Bases:
object
A class for generating molecule rules, atom-mapped SMILES, and GML representations from SMILES strings.
- static generate_atom_map(smiles: str) str | None
Generate atom-mapped SMILES by assigning unique map numbers to each atom in the molecule.
Parameters: - smiles (str): The SMILES string representing the molecule.
Returns: - Optional[str]: The atom-mapped SMILES string, or None if the SMILES string is invalid.
- generate_molecule_rule(smiles: str, name: str = 'molecule', explicit_hydrogen: bool = True, sanitize: bool = True) str | None
Generate a GML representation of the molecule rule from SMILES.
Parameters:
- smiles (str): The SMILES string representing the molecule.
- name (str, optional): The rule name used in GML generation. Defaults to ‘molecule’.
- explicit_hydrogen (bool, optional): Whether to include explicit hydrogen atoms in GML. Defaults to True.
- sanitize (bool, optional): Whether to sanitize the molecule before conversion. Defaults to True.
Returns:
- Optional[str]: The GML representation of the molecule rule, or None if invalid.
- static generate_molecule_smart(smiles: str) str | None
Generate a SMARTS-like string from atom-mapped SMILES.
Parameters: - smiles (str): The SMILES string representing the molecule.
Returns: - Optional[str]: The SMARTS-like string derived from atom-mapped SMILES, or None if the SMILES is invalid.
- static remove_edges_from_left_right(input_str: str) str
Remove all contents from the ‘left’ and ‘right’ sections of a chemical rule description.
Parameters: - input_str (str): The string representation of the rule.
Returns: - str: The modified string with cleared ‘left’ and ‘right’ sections.
Vis Module
The synkit.Vis package offers a suite of visualization utilities for both chemical reactions and graph structures, enabling clear interpretation of mechanisms, templates, and network architectures:
RXNVis: Render full reaction schemes with mapped atom-colors, curved arrows, and publication-quality layouts.
RuleVis: Display rule templates (SMARTS/GML) as annotated graph transformations, highlighting bond changes.
GraphVisualizer: General-purpose NetworkX graph plotting, with support for ITS, MTG, and custom node/edge styling.
- class synkit.Vis.rxn_vis.RXNVis(width: int = 800, height: int = 450, dpi: int = 96, background_colour: Tuple[float, float, float, float] | None = None, highlight_by_reactant: bool = True, bond_line_width: float = 2.0, atom_label_font_size: int = 12, show_atom_map: bool = False)
Bases:
object
- render(smiles: str, return_bytes: bool = False) Image | bytes
Render a molecule or reaction SMILES to a cropped PNG.
Parameters
- smiles : str
Molecule or reaction SMARTS/SMILES. Reactions must contain ‘>>’.
- return_bytes : bool
If True, return raw PNG bytes instead of a PIL.Image.
Returns
- PIL.Image.Image or bytes
Cropped image (or raw PNG bytes) of the molecule/reaction.
- class synkit.Vis.rule_vis.RuleVis(backend: str = 'nx')
Bases:
object
- help() None
- mod_vis(gml: str, path: str = './') None
Simple MOD visualization via mod_post CLI.
- nx_vis(input: str | Tuple[Graph, Graph, Graph], sanitize: bool = False, figsize: Tuple[int, int] = (18, 5), orientation: str = 'horizontal', show_titles: bool = True, show_atom_map: bool = False, titles: Tuple[str, str, str] = ('Reactant', 'Imaginary Transition State', 'Product'), add_gridbox: bool = False, rule: bool = False) Figure
Visualize reactants, ITS, and products side-by-side or vertically, with interactive plotting turned off to prevent double-display, and correct handling of matplotlib axes arrays.
- post() None
Generate an external report via the mod_post CLI.
- vis(input: str | Tuple[Graph, Graph, Graph], **kwargs)
Wrapper to select between nx_vis and mod_vis based on backend and input type.
Converts input as needed.
- class synkit.Vis.graph_visualizer.GraphVisualizer(node_attributes: Dict[str, str] | None = None, edge_attributes: Dict[str, str] | None = None)
Bases:
object
High‑level wrapper around Weinbauer’s plotting utilities.
- property edge_attributes: Dict[str, str]
Mapping of edge keys used for RDKit conversion.
- help() None
Print a summary of GraphVisualizer methods and usage.
- property node_attributes: Dict[str, str]
Mapping of node keys used for RDKit conversion.
- plot_as_mol(g: Graph, ax: Axes, use_mol_coords: bool = True, node_color: str = '#FFFFFF', node_size: int = 500, edge_color: str = '#000000', edge_width: float = 2.0, label_color: str = '#000000', font_size: int = 12, show_atom_map: bool = False, bond_char: Dict[int | None, str] | None = None, symbol_key: str = 'element', bond_key: str = 'order', aam_key: str = 'atom_map') None
Core molecular plotting on a given Axes.
- plot_its(its: Graph, ax: Axes, use_mol_coords: bool = True, title: str | None = None, node_color: str = '#FFFFFF', node_size: int = 500, edge_color: str = '#000000', edge_weight: float = 2.0, show_atom_map: bool = False, use_edge_color: bool = False, symbol_key: str = 'element', bond_key: str = 'order', aam_key: str = 'atom_map', standard_order_key: str = 'standard_order', font_size: int = 12, og: bool = False, rule: bool = False, title_font_size: str = 20, title_font_weight: str = 'bold', title_font_style: str = 'italic') None
- save_molecule(g: Graph, path: str, **kwargs) None
Save molecular graph plot to file.
- visualize_its(its: Graph, **kwargs) Figure
Return a Matplotlib Figure plotting the ITS graph without duplicate display.
- visualize_its_grid(its_list: list[Graph], subplot_shape: tuple[int, int] | None = None, use_edge_color: bool = True, og: bool = False, figsize: tuple[float, float] = (12, 6), **kwargs) tuple[Figure, list[list[Axes]]]
Plot multiple ITS graphs in a grid layout.
Parameters
- its_list : list[nx.Graph]
List of ITS graphs to visualize.
- subplot_shape : tuple[int, int] | None, optional
Grid shape (rows, cols). If None, determined by list length (supports up to 6).
- use_edge_color : bool, default True
Whether to color edges based on ‘standard_order’.
- og : bool, default False
Flag for original graph mode when coloring.
- figsize : tuple[float, float], default (12, 6)
Figure size.
- **kwargs
Additional parameters passed to plot_its (e.g. title, show_atom_map).
Returns
- fig : plt.Figure
The Matplotlib figure containing the grid.
- axes : list of list of plt.Axes
2D list of Axes objects for each subplot.
- visualize_molecule(g: Graph, **kwargs) Figure
Return a Figure plotting the molecular graph.
IO Module
The IO module provides tools for handling input and output operations related to the chemical converter. It allows seamless interaction with various chemical data formats.
Chemical Conversion
- synkit.IO.chem_converter.gml_to_its(gml: str) Graph
Convert a GML string representation of a reaction back into an ITS graph.
- Parameters:
gml (str) – The GML string representing the reaction.
- Returns:
The resulting ITS graph.
- Return type:
networkx.Graph
- synkit.IO.chem_converter.gml_to_smart(gml: str, sanitize: bool = True, explicit_hydrogen: bool = False, useSmiles: bool = True) Tuple[str, Graph]
Convert a GML string back to a SMARTS string and ITS graph.
- Parameters:
gml (str) – The GML string to convert.
sanitize (bool) – Whether to sanitize molecules upon conversion.
explicit_hydrogen (bool) – Whether hydrogens are explicitly represented.
useSmiles (bool) – If True, output SMILES; otherwise SMARTS.
- Returns:
A tuple of (SMARTS string, ITS graph).
- Return type:
tuple of (str, networkx.Graph)
- synkit.IO.chem_converter.graph_to_rsmi(r: Graph, p: Graph, its: Graph | None = None, sanitize: bool = True, explicit_hydrogen: bool = False) str | None
Convert reactant and product graphs into a reaction SMILES string.
- Parameters:
r (networkx.Graph) – Graph representing the reactants.
p (networkx.Graph) – Graph representing the products.
its (networkx.Graph or None) – Imaginary transition state graph. If None, it will be constructed.
sanitize (bool) – Whether to sanitize molecules during conversion.
explicit_hydrogen (bool) – Whether to preserve explicit hydrogens in the SMILES.
- Returns:
Reaction SMILES string in ‘reactants>>products’ format or None on failure.
- Return type:
str or None
- synkit.IO.chem_converter.graph_to_smi(graph: Graph, sanitize: bool = True, preserve_atom_maps: List[int] | None = None) str | None
Convert a NetworkX molecular graph to a SMILES string.
- Parameters:
graph (networkx.Graph) – Graph representation of the molecule. Nodes must carry chemical attributes (e.g. ‘element’, atom maps).
sanitize (bool) – Whether to perform RDKit sanitization on the resulting molecule.
preserve_atom_maps (list of int or None) – List of atom-map numbers for which hydrogens remain explicit.
- Returns:
SMILES string, or None if conversion fails.
- Return type:
str or None
- synkit.IO.chem_converter.its_to_gml(its: Graph, core: bool = True, rule_name: str = 'rule', reindex: bool = True, explicit_hydrogen: bool = False) str
Convert an ITS graph (reaction graph) to GML format.
- Parameters:
its (networkx.Graph) – The input ITS graph representing the reaction.
core (bool) – If True, focus only on the reaction center. Defaults to True.
rule_name (str) – Name of the reaction rule. Defaults to “rule”.
reindex (bool) – If True, reindex graph nodes. Defaults to True.
explicit_hydrogen (bool) – If True, include explicit hydrogens. Defaults to False.
- Returns:
The GML representation of the ITS graph.
- Return type:
str
- synkit.IO.chem_converter.its_to_rsmi(its: Graph, sanitize: bool = True, explicit_hydrogen: bool = False, clean_wildcards: bool = False) str
Convert an ITS graph into a reaction SMILES (rSMI) string.
- Parameters:
its (networkx.Graph) – A fully annotated ITS graph (nodes with atom-map attributes).
sanitize (bool) – If True, sanitize prior to SMILES generation.
explicit_hydrogen (bool) – If True, include explicit hydrogens.
- Returns:
A canonical reaction-SMILES string (‘reactants>agents>products’).
- Return type:
str
- Raises:
ValueError – If graph cannot be decomposed or sanitisation fails.
- synkit.IO.chem_converter.rsmarts_to_rsmi(rsmarts: str) str
Convert a reaction SMARTS to a reaction SMILES string.
- Parameters:
rsmarts (str) – Reaction SMARTS input.
- Returns:
Reaction SMILES string.
- Return type:
str
- Raises:
ValueError – If conversion fails.
- synkit.IO.chem_converter.rsmi_to_graph(rsmi: str, drop_non_aam: bool = True, sanitize: bool = True, use_index_as_atom_map: bool = True, node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order']) Tuple[Graph | None, Graph | None]
Convert a reaction SMILES (RSMI) into reactant and product graphs.
- Parameters:
rsmi (str) – Reaction SMILES string in “reactants>>products” format.
drop_non_aam (bool) – If True, drop nodes without atom mapping numbers.
light_weight (bool) – If True, create a light-weight graph.
sanitize (bool) – If True, sanitize molecules during conversion.
use_index_as_atom_map (bool) – Whether to use atom indices as atom-map numbers.
- Returns:
A tuple (reactant_graph, product_graph), each a NetworkX graph or None.
- Return type:
tuple of (networkx.Graph or None, networkx.Graph or None)
- synkit.IO.chem_converter.rsmi_to_its(rsmi: str, drop_non_aam: bool = True, sanitize: bool = True, use_index_as_atom_map: bool = True, core: bool = False, node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order'], explicit_hydrogen: bool = False) Graph
Convert a reaction SMILES (rSMI) to an ITS (Imaginary Transition State) graph.
- Parameters:
rsmi (str) – The reaction SMILES string, optionally containing atom-map labels.
drop_non_aam (bool) – If True, discard any molecular fragments without atom-atom maps.
sanitize (bool) – If True, perform molecule sanitization (valence checks, kekulization).
use_index_as_atom_map (bool) – If True, override atom-map labels by atom indices.
core (bool) – If True, return only the reaction-center subgraph of the ITS.
node_attrs (list[str]) – Node attributes to include in the ITS graph (e.g., element, charge).
edge_attrs (list[str]) – Edge attributes to include in the ITS graph (e.g., order).
explicit_hydrogen (bool) – If True, convert implicit hydrogens to explicit nodes.
- Returns:
A NetworkX graph representing the complete or core ITS.
- Return type:
networkx.Graph
- Raises:
ValueError – If the SMILES string is invalid or graph construction fails.
- synkit.IO.chem_converter.rsmi_to_rsmarts(rsmi: str) str
Convert a mapped reaction SMILES to a reaction SMARTS string.
- Parameters:
rsmi (str) – Reaction SMILES input.
- Returns:
Reaction SMARTS string.
- Return type:
str
- Raises:
ValueError – If conversion fails.
- synkit.IO.chem_converter.smart_to_gml(smart: str, core: bool = True, sanitize: bool = True, rule_name: str = 'rule', reindex: bool = False, explicit_hydrogen: bool = False, useSmiles: bool = True) str
Convert a reaction SMARTS (or SMILES) template into a GML‐encoded DPO rule.
- Parameters:
smart (str) – The reaction SMARTS or SMILES string.
core (bool) – If True, include only the reaction core in the GML. Defaults to True.
sanitize (bool) – If True, sanitize molecules during conversion. Defaults to True.
rule_name (str) – Identifier for the output rule. Defaults to “rule”.
reindex (bool) – If True, reindex graph nodes before exporting. Defaults to False.
explicit_hydrogen (bool) – If True, include explicit hydrogen atoms. Defaults to False.
useSmiles (bool) – If True, treat input as SMILES; if False, as SMARTS. Defaults to True.
- Returns:
The GML representation of the reaction rule.
- Return type:
str
- synkit.IO.chem_converter.smiles_to_graph(smiles: str, drop_non_aam: bool = False, sanitize: bool = True, use_index_as_atom_map: bool = False, node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order']) Graph | None
Helper function to convert a SMILES string to a NetworkX graph.
- Parameters:
smiles (str) – SMILES representation of the molecule.
drop_non_aam (bool) – Whether to drop nodes without atom mapping numbers.
light_weight (bool) – Whether to create a light-weight graph.
sanitize (bool) – Whether to sanitize the molecule during conversion.
use_index_as_atom_map (bool) – Whether to use atom indices as atom-map numbers.
- Returns:
The NetworkX graph representation, or None if conversion fails.
- Return type:
networkx.Graph or None
- class synkit.IO.mol_to_graph.MolToGraph(node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order'])
Bases:
object
RDKit → NetworkX helper with attribute selection
This class converts RDKit molecules into NetworkX graphs. The original conversion methods (_create_light_weight_graph, _create_detailed_graph, and mol_to_graph) are preserved for full-featured graph creation. The new transform method builds a NetworkX graph including only a specified subset of node and edge attributes.
- Parameters:
node_attrs (List[str]) – List of node attribute names to retain. If empty or None, all are included.
edge_attrs (List[str]) – List of edge attribute names to retain. If empty or None, all are included.
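The attribute-selection rule stated above (keep only the listed names, or keep everything when the list is empty or None) can be sketched on a plain attribute dictionary. `select_attrs` is a hypothetical helper used for illustration and does not involve RDKit or NetworkX.

```python
from typing import Any, Dict, List, Optional


def select_attrs(all_attrs: Dict[str, Any],
                 keep: Optional[List[str]]) -> Dict[str, Any]:
    """Return only the requested attributes; an empty or None list keeps all."""
    if not keep:
        return dict(all_attrs)
    return {key: value for key, value in all_attrs.items() if key in keep}


atom = {"element": "C", "aromatic": False, "hcount": 3, "charge": 0, "atom_map": 1}

subset = select_attrs(atom, ["element", "charge"])
everything = select_attrs(atom, None)
```

Names in the list that the atom does not carry are simply ignored in this sketch.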
- static add_partial_charges(mol: Mol) None
Compute and assign Gasteiger charges to all atoms in the molecule.
- Parameters:
mol (Chem.Mol) – The RDKit molecule.
- static get_bond_stereochemistry(bond: Bond) str
Determine the stereochemistry (E/Z) of a double bond.
- Parameters:
bond (Chem.Bond) – The RDKit Bond object.
- Returns:
‘E’, ‘Z’, or ‘N’ for non-stereospecific or non-double bond.
- Return type:
str
- static get_stereochemistry(atom: Atom) str
Determine the stereochemistry (R/S) of a chiral atom.
- Parameters:
atom (Chem.Atom) – The RDKit Atom object.
- Returns:
‘R’, ‘S’, or ‘N’ for non-chiral.
- Return type:
str
- static has_atom_mapping(mol: Mol) bool
Check if any atom in the molecule has an atom mapping number.
- Parameters:
mol (Chem.Mol) – The RDKit molecule.
- Returns:
True if at least one atom has a mapping number.
- Return type:
bool
- classmethod mol_to_graph(mol: Mol, drop_non_aam: bool = False, light_weight: bool = False, use_index_as_atom_map: bool = False) Graph
Convert a molecule to a full-featured NetworkX graph.
- Parameters:
mol (Chem.Mol) – The RDKit molecule to convert.
drop_non_aam (bool) – If True, drop atoms without mapping numbers (requires use_index_as_atom_map=True). Defaults to False.
light_weight (bool) – If True, create a lightweight graph with minimal attributes. Defaults to False.
use_index_as_atom_map (bool) – If True, prefer atom maps as node IDs. Defaults to False.
- Returns:
A NetworkX graph of the molecule with all attributes.
- Return type:
nx.Graph
- static random_atom_mapping(mol: Mol) Mol
Assign random atom mapping numbers to all atoms in the molecule.
- Parameters:
mol (Chem.Mol) – The RDKit molecule.
- Returns:
The molecule with new random atom mapping numbers.
- Return type:
Chem.Mol
- transform(mol: Mol, drop_non_aam: bool = False, use_index_as_atom_map: bool = False) Graph
Build a graph directly from a molecule, including only selected attributes.
- Parameters:
mol (Chem.Mol) – The RDKit molecule to convert.
drop_non_aam (bool) – If True, skips atoms without atom mapping numbers (requires use_index_as_atom_map=True). Defaults to False.
use_index_as_atom_map (bool) – If True, uses atom mapping numbers as node IDs when present; otherwise uses atom index+1. Defaults to False.
- Returns:
A NetworkX graph containing only the specified node and edge attributes.
- Return type:
nx.Graph
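The interplay between drop_non_aam and use_index_as_atom_map described above can be illustrated without RDKit. The following is a minimal sketch, assuming atoms are given as (atom_index, atom_map) pairs in which a map number of 0 means "unmapped" (RDKit's convention). The helper name choose_node_ids is hypothetical and not part of synkit, and raising ValueError for an unsupported flag combination is this sketch's choice; the library may handle that case differently.

```python
from typing import List, Tuple


def choose_node_ids(
    atoms: List[Tuple[int, int]],
    drop_non_aam: bool = False,
    use_index_as_atom_map: bool = False,
) -> List[int]:
    """Pick node IDs for (atom_index, atom_map) pairs following the documented rule."""
    if drop_non_aam and not use_index_as_atom_map:
        # The docs state that drop_non_aam requires use_index_as_atom_map=True.
        # Raising here is an assumption of this sketch, not confirmed library behaviour.
        raise ValueError("drop_non_aam requires use_index_as_atom_map=True")
    node_ids = []
    for idx, amap in atoms:
        if drop_non_aam and amap == 0:
            continue  # skip atoms without a mapping number
        if use_index_as_atom_map and amap != 0:
            node_ids.append(amap)  # mapped atom: the map number becomes the node ID
        else:
            node_ids.append(idx + 1)  # fallback: 1-based atom index
    return node_ids


# Three atoms: indices 0..2, of which the second carries no atom map.
atoms = [(0, 3), (1, 0), (2, 10)]
default_ids = choose_node_ids(atoms)
mapped_ids = choose_node_ids(atoms, use_index_as_atom_map=True)
strict_ids = choose_node_ids(atoms, drop_non_aam=True, use_index_as_atom_map=True)
```

Note that mixing map numbers with index-based fallbacks can in principle produce colliding node IDs, so fully mapped input is the safer case when use_index_as_atom_map is enabled.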
- class synkit.IO.graph_to_mol.GraphToMol(node_attributes: Dict[str, str] = {'atom_map': 'atom_map', 'charge': 'charge', 'element': 'element'}, edge_attributes: Dict[str, str] = {'order': 'order'})
Bases:
object
Converts a NetworkX graph representation of a molecule into an RDKit molecule object.
This class reconstructs RDKit molecules from node and edge attributes in a graph, correctly interpreting atom types, charges, mapping numbers, bond orders, and optionally explicit hydrogen counts.
- Parameters:
node_attributes (Dict[str, str]) – Mapping of expected attribute names to node keys in the graph. For example, {“element”: “element”, “charge”: “charge”, “atom_map”: “atom_map”}.
edge_attributes (Dict[str, str]) – Mapping of expected attribute names to edge keys in the graph. For example, {“order”: “order”}.
- static get_bond_type_from_order(order: float) BondType
Converts a numerical bond order into the corresponding RDKit BondType.
- Parameters:
order (float) – The numerical bond order (typically 1, 2, or 3).
- Returns:
The corresponding RDKit bond type (single, double, triple, or aromatic).
- Return type:
Chem.BondType
- graph_to_mol(graph: Graph, ignore_bond_order: bool = False, sanitize: bool = True, use_h_count: bool = False) Mol
Converts a NetworkX graph into an RDKit molecule.
- Parameters:
graph (nx.Graph) – The NetworkX graph representing the molecule.
ignore_bond_order (bool) – If True, all bonds are created as single bonds regardless of edge attributes. Defaults to False.
sanitize (bool) – If True, the resulting RDKit molecule will be sanitized after construction. Defaults to True.
use_h_count (bool) – If True, the ‘hcount’ attribute (if present) will be used to set explicit hydrogen counts on atoms. Defaults to False.
- Returns:
An RDKit molecule constructed from the graph’s nodes and edges.
- Return type:
Chem.Mol
- class synkit.IO.nx_to_gml.NXToGML
Bases:
object
Converts NetworkX graph representations of chemical reactions to GML (Graph Modelling Language) strings. Useful for exporting reaction rules in a standard graph format.
This class provides static methods for converting individual graphs, sets of reaction graphs, and managing charge/attribute changes in the export process.
- static transform(graph_rules: Tuple[Graph, Graph, Graph], rule_name: str = 'Test', reindex: bool = False, attributes: List[str] = ['charge'], explicit_hydrogen: bool = False) str
Processes a triple of reaction graphs to generate a GML string rule, with options for node reindexing and explicit hydrogen expansion.
- Parameters:
graph_rules (tuple[nx.Graph, nx.Graph, nx.Graph]) – Tuple of (L, R, K) reaction graphs, i.e. the left, right, and context graphs of the rule.
rule_name (str) – The rule name to use in the output.
reindex (bool) – Whether to reindex node IDs based on the L graph sequence.
attributes (list[str]) – List of attribute names to check for node changes.
explicit_hydrogen (bool) – Whether to explicitly include hydrogen atoms in the output.
- Returns:
The GML string representing the chemical rule.
- Return type:
str
- class synkit.IO.gml_to_nx.GMLToNX(gml_text: str)
Bases:
object
Parses GML-like text and transforms it into three NetworkX graphs representing the left, right, and context graphs of a chemical reaction step.
- Parameters:
gml_text (str) – The GML-like text to parse.
- Variables:
graphs (dict[str, nx.Graph]) – A dictionary containing ‘left’, ‘right’, and ‘context’ NetworkX graphs.
- transform() Tuple[Graph, Graph, Graph]
Transforms the GML-like text into three NetworkX graphs: left, right, and context.
- Returns:
A tuple of (left_graph, right_graph, context_graph), each a NetworkX graph.
- Return type:
tuple[nx.Graph, nx.Graph, nx.Graph]
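To make the left/right/context layout concrete, the sketch below parses a small hand-written rule in the GML dialect used by MØD. It is an illustration only, not GMLToNX's implementation: it assumes single-line node and edge entries, returns plain dictionaries instead of NetworkX graphs so that it needs only the standard library, and the names RULE_TEXT, section_body, and parse_rule are hypothetical.

```python
import re
from typing import Dict

# A hand-written rule: a C-O single bond becomes a double bond.
RULE_TEXT = '''rule [
   ruleID "demo"
   left [
      edge [ source 1 target 2 label "-" ]
   ]
   context [
      node [ id 1 label "C" ]
      node [ id 2 label "O" ]
   ]
   right [
      edge [ source 1 target 2 label "=" ]
   ]
]'''

NODE_RE = re.compile(r'node\s*\[\s*id\s+(\d+)\s+label\s+"([^"]*)"\s*\]')
EDGE_RE = re.compile(
    r'edge\s*\[\s*source\s+(\d+)\s+target\s+(\d+)\s+label\s+"([^"]*)"\s*\]'
)


def section_body(text: str, name: str) -> str:
    """Return the text inside `name [ ... ]` using bracket matching, or '' if absent."""
    match = re.search(rf"\b{name}\s*\[", text)
    if match is None:
        return ""
    depth, start = 1, match.end()
    for pos in range(start, len(text)):
        if text[pos] == "[":
            depth += 1
        elif text[pos] == "]":
            depth -= 1
            if depth == 0:
                return text[start:pos]
    return ""  # unbalanced brackets: treat the section as empty


def parse_rule(text: str) -> Dict[str, dict]:
    """Collect nodes and edges of the left, right, and context sections."""
    parsed = {}
    for name in ("left", "right", "context"):
        body = section_body(text, name)
        nodes = {int(i): label for i, label in NODE_RE.findall(body)}
        edges = [(int(s), int(t), label) for s, t, label in EDGE_RE.findall(body)]
        parsed[name] = {"nodes": nodes, "edges": edges}
    return parsed


sections = parse_rule(RULE_TEXT)
```

Here the two atoms sit in the context section because they are unchanged by the rule, while the bond appears with different labels in left and right.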
IO Functions
- synkit.IO.data_io.collect_data(num_batches: int, temp_dir: str, file_template: str) List[Any]
Collects and aggregates data from multiple pickle files into a single list.
- Parameters:
num_batches (int) – The number of batch files to process.
temp_dir (str) – The directory where the batch files are stored.
file_template (str) – The template string for batch file names, containing a placeholder for the integer batch index.
- Returns:
A list of aggregated data items from all batch files.
- Return type:
list
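The batch aggregation described above can be sketched with the standard library alone. This is a minimal illustration, not the synkit source: it assumes the template is filled with str.format, that batches are numbered from 0, and that each file holds a pickled list. The name collect_batches and the file names are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, List


def collect_batches(num_batches: int, temp_dir: str, file_template: str) -> List[Any]:
    """Concatenate the lists stored in batch pickle files 0..num_batches-1."""
    collected: List[Any] = []
    for i in range(num_batches):
        # Assumption: the template takes the batch index via str.format.
        path = os.path.join(temp_dir, file_template.format(i))
        with open(path, "rb") as handle:
            collected.extend(pickle.load(handle))
    return collected


# Write two small batches to a temporary directory, then merge them.
with tempfile.TemporaryDirectory() as tmp:
    for i, batch in enumerate([[{"id": 1}], [{"id": 2}, {"id": 3}]]):
        with open(os.path.join(tmp, f"batch_{i}.pkl"), "wb") as handle:
            pickle.dump(batch, handle)
    merged = collect_batches(2, tmp, "batch_{}.pkl")
```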
- synkit.IO.data_io.load_compressed(filename: str) ndarray
Loads a NumPy array from a compressed .npz file.
- Parameters:
filename (str) – The path of the .npz file to load.
- Returns:
The loaded NumPy array.
- Return type:
numpy.ndarray
- Raises:
KeyError – If the .npz file does not contain an array with the key ‘array’.
- synkit.IO.data_io.load_database(pathname: str = './Data/database.json') List[Dict]
Load a database (a list of dictionaries) from a JSON file.
- Parameters:
pathname (str) – The path from where the database will be loaded. Defaults to ‘./Data/database.json’.
- Returns:
The loaded database.
- Return type:
list[dict]
- Raises:
ValueError – If there is an error reading the file.
- synkit.IO.data_io.load_dg(path: str, graph_db: list, rule_db: list)
Load a DG instance from a dumped file.
- Parameters:
path (str) – The file path of the dumped graph.
graph_db (list) – List of Graph objects representing the graph database.
rule_db (list) – List of Rule objects required for loading the DG.
- Returns:
The loaded derivation graph instance.
- Return type:
DG
- Raises:
Exception – If loading fails.
- synkit.IO.data_io.load_dict_from_json(file_path: str) dict | None
Load a dictionary from a JSON file.
- Parameters:
file_path (str) – The path to the JSON file from which to load the dictionary.
- Returns:
The dictionary loaded from the JSON file, or None if an error occurs.
- Return type:
dict or None
- synkit.IO.data_io.load_from_pickle(filename: str) List[Any]
Load data from a pickle file.
- Parameters:
filename (str) – The name of the pickle file to load data from.
- Returns:
The data loaded from the pickle file.
- Return type:
list
- synkit.IO.data_io.load_from_pickle_generator(file_path: str) Generator[Any, None, None]
A generator that yields items from a pickle file where each pickle load returns a list of dictionaries.
- Parameters:
file_path (str) – The path to the pickle file to load.
- Yields:
A single item from the list of dictionaries stored in the pickle file.
- Return type:
Any
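A generator of this kind is useful when a file was written with several successive pickle dumps and should not be loaded into memory at once. The sketch below shows one way to implement the documented behaviour with the standard library; it assumes that reading continues until the end of the file, and the name iter_pickled_items is hypothetical rather than the library's code.

```python
import os
import pickle
import tempfile
from typing import Any, Generator


def iter_pickled_items(file_path: str) -> Generator[Any, None, None]:
    """Yield items one by one from a file holding consecutive pickled lists."""
    with open(file_path, "rb") as handle:
        while True:
            try:
                chunk = pickle.load(handle)  # each load returns one list
            except EOFError:
                break  # no more pickled objects in the file
            yield from chunk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunks.pkl")
    with open(path, "wb") as handle:
        pickle.dump([{"id": 1}, {"id": 2}], handle)  # first chunk
        pickle.dump([{"id": 3}], handle)             # second chunk
    items = list(iter_pickled_items(path))
```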
- synkit.IO.data_io.load_gml_as_text(gml_file_path: str) str | None
Load the contents of a GML file as a text string.
- Parameters:
gml_file_path (str) – The file path to the GML file.
- Returns:
The text content of the GML file, or None if the file does not exist or an error occurs.
- Return type:
str or None
- synkit.IO.data_io.load_list_from_file(file_path: str) list
Load a list from a JSON-formatted file.
- Parameters:
file_path (str) – The path to the file to read the list from.
- Returns:
The list loaded from the file.
- Return type:
list
- synkit.IO.data_io.load_model(filename: str) Any
Load a machine learning model from a file using joblib.
- Parameters:
filename (str) – The path to the file from which the model will be loaded.
- Returns:
The loaded machine learning model.
- Return type:
object
- synkit.IO.data_io.save_compressed(array: ndarray, filename: str) None
Saves a NumPy array to a compressed .npz file.
- Parameters:
array (numpy.ndarray) – The NumPy array to be saved.
filename (str) – The file path or name to save the array to, with a ‘.npz’ extension.
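The KeyError documented for load_compressed suggests that the array is stored in the archive under the key ‘array’. The sketch below reproduces that round trip with plain NumPy calls; it assumes the key name from the docs and does not call synkit itself, so the exact behaviour of save_compressed may differ.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.npz")
    original = np.arange(6).reshape(2, 3)

    # Store the array under the key 'array' (assumed from the load_compressed docs).
    np.savez_compressed(path, array=original)

    with np.load(path) as archive:
        restored = archive["array"]
        try:
            archive["missing"]  # a key that was never written
            missing_key_raises = False
        except KeyError:
            missing_key_raises = True
```

An archive written by other code under a different key would therefore fail to load in the same way as the "missing" lookup here.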
- synkit.IO.data_io.save_database(database: List[Dict], pathname: str = './Data/database.json') None
Save a database (a list of dictionaries) to a JSON file.
- Parameters:
database (list[dict]) – The database to be saved.
pathname (str) – The path where the database will be saved. Defaults to ‘./Data/database.json’.
- Raises:
TypeError – If the database is not a list of dictionaries.
ValueError – If there is an error writing the file.
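The validation and error contract above (TypeError for malformed input, ValueError for I/O problems) can be sketched as follows. This is a minimal stand-in built on the standard library, assuming the checks happen before any file is written; save_records and load_records are hypothetical names, not the synkit functions.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_records(database: List[Dict], pathname: str) -> None:
    """Write a list of dictionaries to JSON, mirroring the documented errors."""
    if not isinstance(database, list) or not all(isinstance(r, dict) for r in database):
        raise TypeError("database must be a list of dictionaries")
    try:
        with open(pathname, "w", encoding="utf-8") as handle:
            json.dump(database, handle)
    except OSError as exc:
        raise ValueError(f"error writing {pathname}: {exc}") from exc


def load_records(pathname: str) -> List[Dict]:
    """Read the list back, converting read or parse failures to ValueError."""
    try:
        with open(pathname, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        raise ValueError(f"error reading {pathname}: {exc}") from exc


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "database.json")
    save_records([{"R-id": 1, "rsmi": "CCO>>C=C.O"}], path)
    loaded = load_records(path)

try:
    save_records({"not": "a list"}, "unused.json")
    rejected = False
except TypeError:
    rejected = True  # input is rejected before any file is created
```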
- synkit.IO.data_io.save_dg(dg, path: str) str
Save a DG instance to disk using MØD’s dump method.
- Parameters:
dg (DG) – The derivation graph to save.
path (str) – The file path where the graph will be dumped.
- Returns:
The path of the dumped file.
- Return type:
str
- Raises:
Exception – If saving fails.
- synkit.IO.data_io.save_dict_to_json(data: dict, file_path: str) None
Save a dictionary to a JSON file.
- Parameters:
data (dict) – The dictionary to be saved.
file_path (str) – The path to the file where the dictionary should be saved.
- synkit.IO.data_io.save_list_to_file(data_list: list, file_path: str) None
Save a list to a file in JSON format.
- Parameters:
data_list (list) – The list to save.
file_path (str) – The path to the file where the list will be saved.
- synkit.IO.data_io.save_model(model: Any, filename: str) None
Save a machine learning model to a file using joblib.
- Parameters:
model (object) – The machine learning model to save.
filename (str) – The path to the file where the model will be saved.
- synkit.IO.data_io.save_text_as_gml(gml_text: str, file_path: str) bool
Save a GML text string to a file.
- Parameters:
gml_text (str) – The GML content as a text string.
file_path (str) – The file path where the GML text will be saved.
- Returns:
True if saving was successful, False otherwise.
- Return type:
bool
- synkit.IO.data_io.save_to_pickle(data: List[Dict[str, Any]], filename: str) None
Save a list of dictionaries to a pickle file.
- Parameters:
data (list[dict]) – A list of dictionaries to be saved.
filename (str) – The name of the file where the data will be saved.