API

Chem Module

The Chem module provides tools for handling input and output operations related to the chemical converter, and works with a range of chemical data formats.

class synkit.Chem.Reaction.canon_rsmi.CanonRSMI(backend: str = 'wl', wl_iterations: int = 3, morgan_radius: int = 3, node_attrs: List[str] = ('element', 'aromatic', 'charge', 'hcount'))

Bases: object

A pure-Python / pure-NetworkX utility for canonicalizing reaction SMILES by expanding atom-maps and deterministically reindexing reaction graphs.

Workflow

  1. Expand atom-maps on reactants to ensure each atom has a unique map ID.

  2. Convert reaction SMILES to reactant/product NetworkX graphs.

  3. Canonicalize the reactant graph using GraphCanonicaliser (generic or WL backend).

  4. Match atom-map IDs to compute pairwise indices between reactants and products.

  5. Remap the product graph to align with the canonical reactant ordering.

  6. Sync each node’s atom_map attribute to its new graph index.

  7. Reassemble the reaction SMILES from the canonical graphs.

Classes

  • CanonRSMI – Main interface for transforming any reactants>>products SMILES into a canonicalized form, preserving all node and edge attributes.

Example

>>> from synkit.Chem.Reaction.canon_rsmi import CanonRSMI
>>> canon = CanonRSMI(backend='wl', wl_iterations=5)
>>> result = canon.canonicalise('[CH3:3][CH2:5][OH:10]>>[CH2:3]=[CH2:5].[OH2:10]')
>>> print(result.canonical_rsmi)
[OH:1][CH2:3][CH3:2]>>[CH2:2]=[CH2:3].[OH2:1]
property canonical_hash: str | None

Reaction-level hash combining reactant and product canonical hashes.

property canonical_product_graph: Graph | None

NetworkX graph of canonicalised products.

property canonical_reactant_graph: Graph | None

NetworkX graph of canonicalised reactants.

property canonical_rsmi: str | None

Canonical SMILES after processing.

canonicalise(rsmi: str) CanonRSMI
Full pipeline returning self with properties populated:
  • raw_rsmi

  • raw_reactant_graph, raw_product_graph

  • mapping_pairs

  • canonical_reactant_graph, canonical_product_graph

  • canonical_rsmi

expand_aam(rsmi: str) str

Assign new atom-map IDs to unmapped reactant atoms in ‘reactants>>products’ SMILES.

New IDs start at max(existing maps)+1.
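The numbering rule for expand_aam can be illustrated with a small standard-library sketch. It is not synkit's implementation: the function name expand_bracket_maps is hypothetical, it only handles atoms already written in brackets, and it takes the maximum over the whole reaction string, which is an assumption.

```python
import re


def expand_bracket_maps(rsmi: str) -> str:
    """Give unmapped bracket atoms on the reactant side fresh atom-map IDs.

    Hypothetical helper: new IDs start at max(existing maps) + 1.
    """
    reactants, products = rsmi.split(">>")
    existing = [int(m) for m in re.findall(r":(\d+)\]", rsmi)]
    next_id = max(existing, default=0) + 1

    def assign(match: re.Match) -> str:
        nonlocal next_id
        body = match.group(1)
        if ":" in body:  # atom already carries a map number
            return match.group(0)
        mapped = f"[{body}:{next_id}]"
        next_id += 1
        return mapped

    reactants = re.sub(r"\[([^\]]+)\]", assign, reactants)
    return f"{reactants}>>{products}"


expanded = expand_bracket_maps("[CH3][CH2:5][OH:10]>>[CH2:5]=[CH2].[OH2:10]")
print(expanded)
```

Only the reactant side is rewritten here, matching the description above; a real implementation would also need to handle atoms written without brackets.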

static get_aam_pairwise_indices(G: Graph, H: Graph, aam_key: str = 'atom_map') List[Tuple[int, int]]

Return sorted list of (G_node, H_node) for shared atom-map IDs.
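The pairing logic can be sketched without NetworkX by standing in for each graph with a plain dict of node ID to attribute dict. The name aam_pairwise_indices and the dict representation are assumptions made for illustration; treating a map value of 0 as "unmapped" is also an assumption.

```python
from typing import Dict, List, Tuple


def aam_pairwise_indices(
    g_nodes: Dict[int, dict],
    h_nodes: Dict[int, dict],
    aam_key: str = "atom_map",
) -> List[Tuple[int, int]]:
    """Return sorted (G_node, H_node) pairs that share an atom-map ID."""
    # Index each side by atom-map ID, skipping unmapped (0 or missing) atoms.
    g_by_map = {a[aam_key]: n for n, a in g_nodes.items() if a.get(aam_key)}
    h_by_map = {a[aam_key]: n for n, a in h_nodes.items() if a.get(aam_key)}
    shared = g_by_map.keys() & h_by_map.keys()
    return sorted((g_by_map[m], h_by_map[m]) for m in shared)


G = {0: {"atom_map": 3}, 1: {"atom_map": 5}, 2: {"atom_map": 10}}
H = {0: {"atom_map": 10}, 1: {"atom_map": 3}, 2: {"atom_map": 0}}
pairs = aam_pairwise_indices(G, H)
print(pairs)
```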

help() None

Pretty-print the class doc and public methods with signatures.

property mapping_pairs: List[Tuple[int, int]] | None

List of atom-map index pairs between reactants and products.

property raw_product_graph: Graph | None

NetworkX graph of raw products.

property raw_reactant_graph: Graph | None

NetworkX graph of raw reactants.

property raw_rsmi: str | None

Original SMILES before canonicalisation.

static remap_graph(G: Graph, node_map: List[int] | List[Tuple[int, int]]) Graph

Remap a product graph to match a canonical reactant ordering:

Parameters:
  • G (nx.Graph) – product graph to remap

  • node_map (List[int] or List[Tuple[int, int]]) – mapping from old product node IDs to new IDs

Returns:

remapped product graph

Return type:

nx.Graph

static sync_atom_map_with_index(G: Graph) None

In-place: set each node’s ‘atom_map’ attribute to its node ID.
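Remapping followed by syncing can be sketched on a dict-based stand-in for a graph (node ID to attribute dict). The helper names remap_nodes and sync_atom_map are hypothetical, and the (old, new) pair form of node_map is the only form handled here.

```python
from typing import Dict, List, Tuple


def remap_nodes(nodes: Dict[int, dict], node_map: List[Tuple[int, int]]) -> Dict[int, dict]:
    """Return a copy of `nodes` with IDs renamed according to (old, new) pairs."""
    mapping = dict(node_map)
    return {mapping.get(old, old): dict(attrs) for old, attrs in nodes.items()}


def sync_atom_map(nodes: Dict[int, dict]) -> None:
    """In place: overwrite each node's 'atom_map' with its node ID."""
    for node_id, attrs in nodes.items():
        attrs["atom_map"] = node_id


product = {0: {"element": "O", "atom_map": 10}, 1: {"element": "C", "atom_map": 3}}
remapped = remap_nodes(product, [(0, 3), (1, 1)])
sync_atom_map(remapped)
print(remapped)
```

Copying the attribute dicts before syncing keeps the original product stand-in unchanged, which makes the two steps easy to inspect separately.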

class synkit.Chem.Reaction.standardize.Standardize

Bases: object

Utilities to normalize and filter reaction and molecule SMILES.

This class provides methods to remove atom‑mapping, filter invalid molecules, canonicalize reaction SMILES, and a full pipeline via fit.

Variables:

None – Stateless helper class.

static categorize_reactions(reactions: List[str], target_reaction: str) Tuple[List[str], List[str]]

Partition reactions into those matching a target and those not.

Parameters:
  • reactions (List[str]) – List of reaction SMILES to categorize.

  • target_reaction (str) – Benchmark reaction SMILES for comparison.

Returns:

Tuple of (matches, non_matches):

  • matches: reactions equal to standardized target

  • non_matches: all others

Return type:

Tuple[List[str], List[str]]
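The partitioning idea can be sketched with a toy normalizer in place of a real standardizer. Everything here is illustrative: normalize only sorts the dot-separated fragments on each side, and whether the library also standardizes each candidate before comparing is an assumption.

```python
from typing import List, Tuple


def normalize(rsmi: str) -> str:
    """Toy stand-in for standardization: sort fragments on each side."""
    reactants, products = rsmi.split(">>")

    def side(smiles: str) -> str:
        return ".".join(sorted(smiles.split(".")))

    return f"{side(reactants)}>>{side(products)}"


def categorize(reactions: List[str], target: str) -> Tuple[List[str], List[str]]:
    """Split reactions into those matching the normalized target and the rest."""
    target_std = normalize(target)
    matches, non_matches = [], []
    for rsmi in reactions:
        bucket = matches if normalize(rsmi) == target_std else non_matches
        bucket.append(rsmi)
    return matches, non_matches


matches, non_matches = categorize(
    ["CO.CC(=O)O>>CC(=O)OC.O", "CC(=O)O.CO>>O.CC(=O)OC", "CCO>>C=C.O"],
    "CC(=O)O.CO>>CC(=O)OC.O",
)
print(matches, non_matches)
```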

static filter_valid_molecules(smiles_list: List[str]) List[Mol]

Filter and sanitize a list of SMILES, returning only valid Mol objects.

Parameters:

smiles_list (List[str]) – List of SMILES strings to validate.

Returns:

List of sanitized RDKit Mol objects.

Return type:

List[rdkit.Chem.Mol]

fit(rsmi: str, remove_aam: bool = True, ignore_stereo: bool = True) str | None

Full standardization pipeline: strip atom‑mapping, normalize SMILES, fix hydrogen notation.

Parameters:
  • rsmi (str) – Reaction SMILES to process.

  • remove_aam (bool) – If True, remove atom‑mapping annotations. Defaults to True.

  • ignore_stereo (bool) – If True, drop stereochemistry. Defaults to True.

Returns:

The standardized reaction SMILES, or None if standardization fails.

Return type:

Optional[str]

static remove_atom_mapping(reaction_smiles: str, symbol: str = '>>') str

Remove atom‑map numbers from a reaction SMILES string.

Parameters:
  • reaction_smiles (str) – Reaction SMILES with atom maps, e.g. ‘C[CH3:1]>>C’.

  • symbol (str) – Separator between reactants and products. Defaults to ‘>>’.

Returns:

Reaction SMILES without atom‑mapping annotations.

Return type:

str

Raises:

ValueError – If the input format is invalid or contains invalid SMILES.

static standardize_rsmi(rsmi: str, stereo: bool = False) str | None

Normalize a reaction SMILES: validate molecules, sort fragments, optionally keep stereo.

Parameters:
  • rsmi (str) – Reaction SMILES in ‘reactants>>products’ format.

  • stereo (bool) – If True, include stereochemistry in the output. Defaults to False.

Returns:

Standardized reaction SMILES or None if no valid molecules remain.

Return type:

Optional[str]

Raises:

ValueError – If the input format is invalid.

class synkit.Chem.Reaction.aam_validator.AAMValidator

Bases: object

A utility class for validating atom‐atom mappings (AAM) in reaction SMILES.

Provides methods to compare mapped SMILES against ground truth by using reaction‐center (RC) or ITS‐graph isomorphism checks, including tautomer enumeration support and batch validation over tabular data.

Quick start

>>> from synkit.Chem.Reaction import AAMValidator
>>> validator = AAMValidator()
>>> rsmi_1 = (
...    '[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]'
...    '>>'
...    '[CH3:1][C:2](=[O:3])[O:6][CH3:5].[OH2:4]')
>>> rsmi_2 = (
...    '[CH3:5][C:1](=[O:2])[OH:3].[CH3:6][OH:4]'
...    '>>'
...    '[CH3:5][C:1](=[O:2])[O:4][CH3:6].[OH2:3]')
>>> is_eq = validator.smiles_check(rsmi_1, rsmi_2, check_method='ITS')
>>> print(is_eq)
True
static check_equivariant_graph(its_graphs: List[Graph]) Tuple[List[Tuple[int, int]], int]

Identify all pairs of isomorphic ITS graphs.

Parameters:

its_graphs (list of networkx.Graph) – A list of ITS graphs to compare.

Returns:

  • A list of index‐pairs (i, j) where its_graphs[i] is isomorphic to its_graphs[j].

  • The total count of such isomorphic pairs.

Return type:

tuple (list of tuple of (int, int), int)

static check_pair(mapping: Dict[str, str], mapped_col: str, ground_truth_col: str, check_method: str = 'RC', ignore_aromaticity: bool = False, ignore_tautomers: bool = True) bool

Validate a single record (dict) entry for equivalence.

Parameters:
  • mapping (dict of str→str) – A record containing both mapped and ground‐truth SMILES.

  • mapped_col (str) – Key for the mapped SMILES in mapping.

  • ground_truth_col (str) – Key for the ground-truth SMILES in mapping.

  • check_method (str) – “RC” or “ITS”.

  • ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.

  • ignore_tautomers (bool) – If True, skip tautomer enumeration.

Returns:

Validation result for this single pair.

Return type:

bool

static smiles_check(mapped_smile: str, ground_truth: str, check_method: str = 'RC', ignore_aromaticity: bool = False) bool

Validate a single mapped SMILES string against ground truth.

Parameters:
  • mapped_smile (str) – The mapped SMILES to validate.

  • ground_truth (str) – The reference SMILES string.

  • check_method (str) – Which method to use: “RC” for reaction‐center graph or “ITS” for full ITS‐graph isomorphism.

  • ignore_aromaticity (bool) – If True, ignore aromaticity differences in ITS construction.

Returns:

True if exactly one isomorphic match is found; False otherwise.

Return type:

bool

static smiles_check_tautomer(mapped_smile: str, ground_truth: str, check_method: str = 'RC', ignore_aromaticity: bool = False) bool | None

Validate against all tautomers of a ground truth SMILES.

Parameters:
  • mapped_smile (str) – The mapped SMILES to test.

  • ground_truth (str) – The reference SMILES for generating tautomers.

  • check_method (str) – “RC” or “ITS” as in smiles_check.

  • ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.

Returns:

  • True if any tautomer matches.

  • False if none match.

  • None if an error occurs.

Return type:

bool or None

static validate_smiles(data: DataFrame | List[Dict[str, str]], ground_truth_col: str = 'ground_truth', mapped_cols: List[str] = ['rxn_mapper', 'graphormer', 'local_mapper'], check_method: str = 'RC', ignore_aromaticity: bool = False, n_jobs: int = 1, verbose: int = 0, ignore_tautomers: bool = True) List[Dict[str, str | float | List[bool]]]

Batch-validate mapped SMILES in tabular or list-of-dicts form.

Parameters:
  • data (pandas.DataFrame or list of dict) – A pandas DataFrame or list of dicts, each row containing at least ground_truth_col and each entry in mapped_cols.

  • ground_truth_col (str) – Column/key name for the ground-truth SMILES.

  • mapped_cols (list of str) – List of column/key names for mapped SMILES to validate.

  • check_method (str) – “RC” or “ITS” validation method.

  • ignore_aromaticity (bool) – If True, ignore aromaticity in ITS construction.

  • n_jobs (int) – Number of parallel jobs to use (joblib).

  • verbose (int) – Verbosity level for parallel execution.

  • ignore_tautomers (bool) – If True, use simple pairwise check; otherwise enumerate tautomers.

Returns:

A list of dicts, one per mapper, with keys:

  • “mapper”: the mapper name

  • “accuracy”: percentage correct (float)

  • “results”: list of individual bool results

  • “success_rate”: mapping success rate metric

Return type:

list of dict

Raises:

ValueError – If data is not a DataFrame or list of dicts.

class synkit.Chem.Reaction.balance_check.BalanceReactionCheck(n_jobs: int = 4, verbose: int = 0)

Bases: object

Check elemental balance of chemical reactions in SMILES format.

Supports checking single reactions, reaction dictionaries, or lists in parallel.

Variables:
  • n_jobs – Number of parallel jobs for batch checking.

  • verbose – Verbosity level for joblib.

static dict_balance_check(reaction_dict: Dict[str, str], rsmi_column: str) Dict[str, Any]

Check balance for a single reaction dict, preserving original keys.

Parameters:
  • reaction_dict (Dict[str, str]) – Dict containing at least a rsmi_column key.

  • rsmi_column (str) – Key for reaction SMILES in reaction_dict.

Returns:

Original dict augmented with “balanced”: bool.

Return type:

Dict[str, Any]

dicts_balance_check(input_data: str | List[str | Dict[str, str]], rsmi_column: str = 'reactions') Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]

Batch‐check balance for multiple reactions, in parallel.

Parameters:
  • input_data (Union[str, List[Union[str, Dict[str, str]]]]) – Single reaction SMILES, list of SMILES, or list of dicts.

  • rsmi_column (str) – Key for reaction SMILES in each dict. Defaults to “reactions”.

Returns:

Tuple (balanced_list, unbalanced_list) of dicts each including “balanced”.

Return type:

Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]

static get_combined_molecular_formula(smiles: str) str

Compute the molecular formula of a SMILES.

Parameters:

smiles (str) – SMILES string of the molecule.

Returns:

Elemental formula (e.g., “C6H6”) or empty string if invalid.

Return type:

str

static parse_input(input_data: str | List[str | Dict[str, str]], rsmi_column: str = 'reactions') List[Dict[str, str]]

Normalize input into a list of reaction‐dicts.

Parameters:
  • input_data (str or List[Union[str, Dict[str, str]]]) – A single SMILES, list of SMILES, or list of dicts containing rsmi_column.

  • rsmi_column (str) – Key in dicts for the reaction SMILES. Defaults to “reactions”.

Returns:

List of dicts with a single key rsmi_column mapping to each reaction.

Return type:

List[Dict[str, str]]

Raises:

ValueError – If input_data is neither str nor list.
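The normalization described for parse_input can be sketched as below. This is an illustration under assumptions, not the library code: the name normalize_input is hypothetical, and reducing each input dict to the single rsmi_column key follows the "Returns" text above.

```python
from typing import Dict, List, Union


def normalize_input(
    input_data: Union[str, List[Union[str, Dict[str, str]]]],
    rsmi_column: str = "reactions",
) -> List[Dict[str, str]]:
    """Turn a SMILES string, list of SMILES, or list of dicts into a list of dicts."""
    if isinstance(input_data, str):
        return [{rsmi_column: input_data}]
    if isinstance(input_data, list):
        return [
            {rsmi_column: item[rsmi_column]} if isinstance(item, dict) else {rsmi_column: item}
            for item in input_data
        ]
    raise ValueError("input_data must be a str or a list")


print(normalize_input("CCO>>C=C.O"))
print(normalize_input(["CCO>>C=C.O", {"reactions": "CC>>C=C", "id": "r2"}]))
```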

static parse_reaction(reaction_smiles: str) Tuple[str, str]

Split a reaction SMILES into reactant and product SMILES strings.

Parameters:

reaction_smiles (str) – Reaction SMILES in ‘reactants>>products’ format.

Returns:

Tuple of (reactants, products) SMILES.

Return type:

Tuple[str, str]

static rsmi_balance_check(reaction_smiles: str) bool

Determine if a reaction SMILES is elementally balanced.

Parameters:

reaction_smiles (str) – Reaction SMILES in ‘reactants>>products’ format.

Returns:

True if reactant and product formulas match, else False.

Return type:

bool
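The balance test reduces to comparing element counts on both sides. The sketch below assumes the molecular formulas have already been computed (for example by get_combined_molecular_formula) and are simple Hill-style strings without charges or brackets; the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable


def formula_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H6O'."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(reactant_formulas: Iterable[str], product_formulas: Iterable[str]) -> bool:
    """True when both sides contain the same number of each element."""
    left = sum((formula_counts(f) for f in reactant_formulas), Counter())
    right = sum((formula_counts(f) for f in product_formulas), Counter())
    return left == right


# Ethanol -> ethene + water is balanced; dropping the water is not.
print(is_balanced(["C2H6O"], ["C2H4", "H2O"]))
print(is_balanced(["C2H6O"], ["C2H4"]))
```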

class synkit.Chem.Fingerprint.fp_calculator.FPCalculator(n_jobs: int = 1, verbose: int = 0)

Bases: object

Calculate fingerprint vectors for chemical reactions represented by SMILES strings.

Variables:
  • fps (TransformationFP) – Shared fingerprint engine instance.

  • VALID_FP_TYPES (List[str]) – Supported fingerprint type identifiers.

Parameters:
  • n_jobs (int) – Number of parallel jobs to use for batch processing.

  • verbose (int) – Verbosity level for parallel execution.

VALID_FP_TYPES: List[str] = ['drfp', 'avalon', 'maccs', 'torsion', 'pharm2D', 'ecfp2', 'ecfp4', 'ecfp6', 'fcfp2', 'fcfp4', 'fcfp6', 'rdk5', 'rdk6', 'rdk7', 'ap']
static dict_process(data_dict: Dict[str, Any], rsmi_key: str, symbol: str = '>>', fp_type: str = 'ecfp4', absolute: bool = True) Dict[str, Any]

Compute a fingerprint for a single reaction SMILES entry and add it to the dict.

Parameters:
  • data_dict (dict) – Dictionary containing reaction data.

  • rsmi_key (str) – Key in data_dict for the reaction SMILES string.

  • symbol (str) – Delimiter between reactant and product in the SMILES.

  • fp_type (str) – Fingerprint type to compute.

  • absolute (bool) – Whether to take absolute values of the fingerprint difference.

Returns:

The input dictionary with a new key fp_{fp_type} holding the fingerprint vector.

Return type:

dict

Raises:

ValueError – If rsmi_key is missing in data_dict.
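The effect of the absolute flag can be shown on plain count vectors. This sketch assumes the reaction fingerprint is the element-wise difference product minus reactant; the direction of the subtraction and the name difference_fp are assumptions, not taken from the library.

```python
from typing import List


def difference_fp(reactant_fp: List[int], product_fp: List[int], absolute: bool = True) -> List[int]:
    """Element-wise product-minus-reactant difference, optionally as absolute values."""
    diff = [p - r for r, p in zip(reactant_fp, product_fp)]
    return [abs(d) for d in diff] if absolute else diff


reactant_fp = [1, 0, 2, 1]
product_fp = [0, 1, 2, 3]
print(difference_fp(reactant_fp, product_fp, absolute=False))
print(difference_fp(reactant_fp, product_fp, absolute=True))
```

With absolute=True the sign information (feature gained versus feature lost) is discarded, which gives a non-negative vector.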

fps: TransformationFP = <TransformationFP>
help() None

Print details about supported fingerprint types and usage.

Returns:

None

Return type:

NoneType

parallel_process(data_dicts: List[Dict[str, Any]], rsmi_key: str, symbol: str = '>>', fp_type: str = 'ecfp4', absolute: bool = True) List[Dict[str, Any]]

Compute fingerprints for a batch of reaction dictionaries in parallel.

Parameters:
  • data_dicts (list of dict) – List of dictionaries, each containing a reaction SMILES.

  • rsmi_key (str) – Key in each dict for the reaction SMILES string.

  • symbol (str) – Delimiter between reactant and product in the SMILES.

  • fp_type (str) – Fingerprint type to compute.

  • absolute (bool) – Whether to take absolute values of the fingerprint difference.

Returns:

A list of dictionaries augmented with fp_{fp_type} entries.

Return type:

list of dict

Raises:

ValueError – If fp_type is unsupported or any dict is missing rsmi_key.

class synkit.Chem.Cluster.butina.ButinaCluster

Bases: object

Cluster chemical fingerprint vectors using the Butina algorithm from RDKit, with integrated t-SNE visualization of clusters.

Key features

  • Butina clustering – fast, non-hierarchical clustering with a distance cutoff (1 – similarity).

  • t-SNE visualization – 2D embedding of fingerprints, highlighting top‑k clusters.

  • NumPy support – accepts 2D arrays of 0/1 fingerprint data.

  • Configurable – user‑defined cutoff, perplexity, and top‑k highlight.

Quick start

>>> from synkit.Chem.Cluster.butina import ButinaCluster
>>> clusters = ButinaCluster.cluster(arr, cutoff=0.3)
>>> ButinaCluster.visualize(arr, clusters, k=5)
static cluster(arr: ndarray, cutoff: float = 0.2) List[List[int]]

Perform Butina clustering on fingerprint bit-vectors.

Parameters:
  • arr (np.ndarray) – 2D array of shape (n_samples, n_bits) with 0/1 dtype.

  • cutoff (float) – Distance cutoff (1 – similarity) to form clusters. Defaults to 0.2.

Returns:

List of clusters, each a list of sample indices.

Return type:

list of list of int
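For readers unfamiliar with the algorithm, the following pure-Python sketch shows the general Butina procedure on 0/1 vectors: build neighbour lists within the distance cutoff, rank samples by neighbour count, then peel off clusters around the best-connected unassigned sample. It is not the RDKit or synkit code; the Tanimoto distance, the inclusive cutoff comparison, and the tie-breaking by index are assumptions.

```python
from typing import List, Sequence


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - Tanimoto similarity for two 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def butina_sketch(fps: Sequence[Sequence[int]], cutoff: float = 0.2) -> List[List[int]]:
    """Cluster fingerprints; each cluster lists its centroid index first."""
    n = len(fps)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto_distance(fps[i], fps[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Visit samples with the most neighbours first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in neighbors[i] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


fps = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
print(butina_sketch(fps, cutoff=0.2))
print(butina_sketch(fps, cutoff=0.4))
```

Raising the cutoff admits more distant neighbours, so clusters grow larger and fewer.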

help() None

Print usage summary for clustering and visualization.

Returns:

None

Return type:

NoneType

static visualize(arr: ndarray, clusters: List[List[int]], k: int | None = None, perplexity: float = 30.0, random_state: int = 42) None

Visualize clusters in 2D via t-SNE embedding.

Parameters:
  • arr (np.ndarray) – 2D array of shape (n_samples, n_features) with fingerprint data.

  • clusters (list of list of int) – Clusters as returned by cluster().

  • k (int or None) – If provided, highlight only the top‑k largest clusters; others shown as ‘Other’.

  • perplexity (float) – t-SNE perplexity parameter. Defaults to 30.0.

  • random_state (int) – Random seed for reproducibility. Defaults to 42.

Returns:

None

Return type:

NoneType

Example:

>>> clusters = ButinaCluster.cluster(arr, cutoff=0.3)
>>> ButinaCluster.visualize(arr, clusters, k=5)

Synthesis Module

class synkit.Synthesis.Reactor.syn_reactor.SynReactor(substrate: str | Graph | SynGraph, template: str | Graph | SynRule, invert: bool = False, canonicaliser: GraphCanonicaliser | None = None, explicit_h: bool = True, implicit_temp: bool = False, strategy: Strategy | str = Strategy.ALL, partial: bool = False, embed_threshold: int | None = None, embed_pre_filter: bool = False, automorphism: bool = False)

Bases: object

A hardened and typed re-write of the original SynReactor, preserving API compatibility while offering safer, faster, and cleaner behavior.

Parameters:
  • substrate (Union[str, nx.Graph, SynGraph]) – The input reaction substrate, as a SMILES string, a raw NetworkX graph, or a SynGraph.

  • template (Union[str, nx.Graph, SynRule]) – Reaction template, provided as SMILES/SMARTS, a raw NetworkX graph, or a SynRule.

  • invert (bool) – Whether to invert the reaction (predict precursors). Defaults to False.

  • canonicaliser (Optional[GraphCanonicaliser]) – Optional canonicaliser for intermediate graphs. If None, a default GraphCanonicaliser is used.

  • explicit_h (bool) – If True, render all hydrogens explicitly in the reaction-center SMARTS. Defaults to True.

  • implicit_temp (bool) – If True, treat the input template as implicit-H (forces explicit_h=False). Defaults to False.

  • strategy (Strategy or str) – Matching strategy, one of Strategy.ALL, ‘comp’, or ‘bt’. Defaults to Strategy.ALL.

  • partial (bool) – If True, use a partial matching fallback. Defaults to False.

Variables:
  • _graph (Optional[SynGraph]) – Cached SynGraph for the substrate.

  • _rule (Optional[SynRule]) – Cached SynRule for the template.

  • _mappings (Optional[List[MappingDict]]) – Cached list of subgraph-mapping dicts.

  • _its (Optional[List[nx.Graph]]) – Cached list of ITS graphs.

  • _smarts (Optional[List[str]]) – Cached list of SMARTS strings.

  • _flag_pattern_has_explicit_H (bool) – Internal flag indicating explicit-H constraints.

automorphism: bool = False
canonicaliser: GraphCanonicaliser | None = None
embed_pre_filter: bool = False
embed_threshold: int | None = None
explicit_h: bool = True
classmethod from_smiles(smiles: str, template: str | Graph | SynRule, *, invert: bool = False, canonicaliser: GraphCanonicaliser | None = None, explicit_h: bool = True, implicit_temp: bool = False, strategy: Strategy | str = Strategy.ALL) SynReactor

Alternate constructor: build a SynReactor directly from SMILES.

Parameters:
  • smiles (str) – SMILES string for the substrate.

  • template (str or networkx.Graph or SynRule) – Reaction template (SMILES/SMARTS string, Graph, or SynRule).

  • invert (bool) – If True, perform backward prediction (target→precursors). Defaults to False (forward prediction).

  • canonicaliser (GraphCanonicaliser or None) – Optional GraphCanonicaliser to use for internal graphs.

  • explicit_h (bool) – If True, keep explicit hydrogens in the reaction center.

  • implicit_temp (bool) – If True, treat the template as implicit-H (forces explicit_h=False).

  • strategy (Strategy or str) – Matching strategy: ALL, ‘comp’, or ‘bt’. Defaults to ALL.

Returns:

A new SynReactor instance.

Return type:

SynReactor

property graph: SynGraph

Lazily wrap the substrate into a SynGraph.

Returns:

The reaction substrate as a SynGraph.

Return type:

SynGraph

help(print_results=False) None
implicit_temp: bool = False
invert: bool = False
property its
property its_list: List[Graph]

Build ITS graphs for each subgraph mapping.

Returns:

A list of ITS (imaginary transition state) graphs.

Return type:

list of networkx.Graph

property mapping_count

Number of mappings.

property mappings: List[Dict[Any, Any]]

Return unique sub‑graph mappings, optionally pruned via automorphisms.

partial: bool = False
property rule: SynRule

Lazily wrap the template into a SynRule.

Returns:

The reaction template as a SynRule.

Return type:

SynRule

property smarts
property smarts_list: List[str]

Serialise each ITS graph to a reaction-SMARTS string.

Returns:

A list of SMARTS strings (inverted if invert=True).

Return type:

list of str

property smiles_list
strategy: Strategy | str = 'all'
substrate: str | Graph | SynGraph
property substrate_smiles
template: str | Graph | SynRule
class synkit.Synthesis.Reactor.mod_reactor.MODReactor(substrate: str | List[str], rule_file: str | Path, *, invert: bool = False, strategy: str | Strategy = Strategy.BACKTRACK, verbosity: int = 0, print_results: bool = False)

Bases: object

Lazy, ergonomic wrapper around the MØD toolkit’s derivation pipeline.

Workflow

  1. Instantiate: give substrate SMILES and a rule GML (path or string).

  2. Call .run() to execute the reaction strategy.

  3. Inspect results via .get_reaction_smiles(), .product_sets, .get_dg(), etc.

Attributes

initial_smiles : List[str]

List of SMILES strings for reactants (or products, if inverted).

rule_file : Path

Filesystem path, raw GML string, or raw SMARTS with AAM for the reaction rule.

invert : bool

If True, apply the rule in reverse (products → reactants).

strategy : Strategy

One of ALL, COMPONENT, or BACKTRACK.

verbosity : int

Verbosity level for the MØD DG.apply() call.

print_results : bool

If True, prints the derivation graph to stdout.

property dg: None

DG or None – cached derivation graph.

See also

get_dg

static generate_reaction_smiles(temp_results: List[List[str]], base_smiles: str, *, invert: bool = False, arrow: str = '>>', separator: str = '.') List[str]

Build reaction SMILES of the form “A>>B”, where A and B swap roles if invert=True.

Parameters

temp_results : List[List[str]]

Batches of product (or reactant) SMILES.

base_smiles : str

The “other side” of the reaction: the reactant side when invert=False, or the product side when invert=True.

invert : bool

If False, generates “base_smiles>>joined_batch”; if True, generates “joined_batch>>base_smiles”.

arrow : str

The reaction arrow to use (default “>>”).

separator : str

How to join multiple SMILES in a batch (default “.”).

Returns

List[str]

Reaction SMILES strings, one per batch.
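The joining rule described above is simple enough to sketch directly. The function below follows the documented behaviour of the parameters but is an illustrative re-implementation under a hypothetical name, not the library's source.

```python
from typing import List


def build_reaction_smiles(
    temp_results: List[List[str]],
    base_smiles: str,
    *,
    invert: bool = False,
    arrow: str = ">>",
    separator: str = ".",
) -> List[str]:
    """Join each batch and place it on the product side (or reactant side if inverted)."""
    reactions = []
    for batch in temp_results:
        joined = separator.join(batch)
        if invert:
            reactions.append(f"{joined}{arrow}{base_smiles}")
        else:
            reactions.append(f"{base_smiles}{arrow}{joined}")
    return reactions


batches = [["C=C", "O"], ["CC=O", "[H][H]"]]
print(build_reaction_smiles(batches, "CCO"))
print(build_reaction_smiles(batches, "CCO", invert=True))
```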

get_dg() None

Access the underlying derivation graph.

Returns

DG

The MØD derivation graph constructed during .run().

Raises

RuntimeError

If .run() has not yet been called.

get_reaction_smiles() List[str]

Retrieve the reaction SMILES strings (lazy).

Returns

List[str]

List of reaction SMILES, in “A>>B” format.

help() None

Print a one-page summary of reactor configuration and results.

property prediction_count: int

Number of distinct prediction batches generated.

property product_sets: List[List[str]]

Raw product sets (lists of SMILES) before joining into full reactions.

property product_smiles: List[str]

Flattened list of all product SMILES (may contain duplicates).

property reaction_smiles: List[str]

Lazy-loaded reaction SMILES strings of form “A>>B”.

Returns

List[str]

run() MODReactor

Execute the chosen strategy once and return self so you can chain:

>>> r = MODReactor(...).run()
>>> smiles = r.get_reaction_smiles()

property temp_results: List[List[str]]

Lazy-loaded raw product lists.

Returns

List[List[str]]

class synkit.Synthesis.Reactor.mod_aam.MODAAM(substrate: str | List[str], rule_file: str | Path, *, invert: bool = False, strategy: str | Strategy = Strategy.BACKTRACK, verbosity: int = 0, print_results: bool = False, check_isomorphic: bool = True)

Bases: object

Runs MØD (via MODReactor) then a full AAM/ITS post-processing pipeline.

Parameters

substrate : Union[str, List[str]]

Dot-delimited SMILES or list of SMILES for reactants.

rule_file : Union[str, Path]

GML rule file path or raw GML/SMARTS string.

invert : bool, optional

If True, apply the rule in reverse (default False).

strategy : Union[str, Strategy], optional

Matching strategy: ALL, COMPONENT, or BACKTRACK (default BACKTRACK).

verbosity : int, optional

Verbosity for MODReactor (default 0).

print_results : bool, optional

If True, print the derivation graph (default False).

check_isomorphic : bool, optional

If True, deduplicate results by isomorphism (default True).

property dg: Any

The MØD derivation graph (DG).

get_reaction_smiles() List[str]

Alias for accessing the processed reaction SMILES.

get_smarts() List[str]

Synonym for .get_reaction_smiles().

help() None

Print a summary of inputs and outputs.

property product_count: int

Number of product SMILES generated.

property reaction_smiles: List[str]

The post-processed reaction SMILES.

run() List[str]

Re-run the entire pipeline (MØD + AAM) and return fresh results.

synkit.Synthesis.Reactor.mod_aam.expand_aam(rsmi: str, rule: str) List[str]

Expand Atom–Atom Mapping (AAM) for a given reaction SMARTS/SMILES (rsmi) using a pre‐sanitized GML rule string.

Parameters

rsmi : str

Reaction SMILES/SMARTS in ‘reactants>>products’ form.

rule : str

A GML rule string (already sanitized upstream).

Returns

List[str]

All reaction SMILES from MODAAM whose standardized form matches rsmi.

Graph Module

ITS Submodule

The ITS submodule provides tools for constructing, decomposing, and validating ITS (imaginary transition state) graphs.

  • its_construction: Functions for constructing an ITS graph.

  • its_decompose: Functions for decomposing an ITS graph and extracting the reaction center.

  • its_expand: Functions for expanding partial ITS graphs into full ITS graphs.

class synkit.Graph.ITS.its_construction.ITSConstruction

Bases: object

CORE_EDGE_DEFAULTS: Dict[str, Any] = {'bond_type': '', 'conjugated': False, 'ez_isomer': '', 'in_ring': False, 'order': 0.0}
CORE_NODE_DEFAULTS: Dict[str, Any] = {'aromatic': False, 'atom_map': 0, 'charge': 0, 'element': '*', 'hcount': 0, 'neighbors': <function ITSConstruction.<lambda>>}
static ITSGraph(G: Graph, H: Graph, ignore_aromaticity: bool = False, attributes_defaults: Dict[str, Any] | None = None, balance_its: bool = False, store: bool = False) Graph

Backward-compatible wrapper that replicates the original ITSGraph signature while delegating to the improved construct implementation.

Parameters:
  • G (nx.Graph) – The first input graph (reactant).

  • H (nx.Graph) – The second input graph (product).

  • ignore_aromaticity (bool) – If True, small order differences are treated as zero.

  • attributes_defaults (dict[str, Any] or None) – Defaults to use when a node attribute is missing.

  • balance_its (bool) – If True, base selection is balanced toward the smaller graph.

  • store (bool) – If True, keep full per-attribute tuples; if False, keep only G-side values.

Returns:

Constructed ITS graph with legacy node attribute ordering.

Return type:

nx.Graph

static construct(G: Graph, H: Graph, *, ignore_aromaticity: bool = False, balance_its: bool = True, store: bool = True, node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, attributes_defaults: Dict[str, Any] | None = None) Graph

Construct an ITS graph by merging nodes and edges from G and H, preserving nodes present only in one graph and filling missing-side attributes with defaults.

Node-level attributes are always reflected in typesGH as ((G_tuple), (H_tuple)) over node_attrs. If store=True, the individual attributes are stored as (G_value, H_value) tuples under their own keys; if store=False, only the G-side value is stored under each attribute key.

Parameters:
  • G (nx.Graph) – The first input graph (reactant-like).

  • H (nx.Graph) – The second input graph (product-like).

  • ignore_aromaticity (bool) – If True, small differences in bond order (<1) are treated as zero.

  • balance_its (bool) – If True, choose the smaller graph (by node count) as the base; otherwise the larger.

  • store (bool) – If True, keep per-attribute (G,H) tuples; if False, keep only the G-side value per attribute.

  • node_attrs (list[str] or None) – Ordered list of node attribute names to include in the node-level typesGH tuples. Defaults to [“element”, “aromatic”, “hcount”, “charge”, “neighbors”].

  • edge_attrs (list[str] or None) – (Legacy) ordered list of edge attribute names; not used for core behavior.

  • attributes_defaults (dict[str, Any] or None) – Optional overrides for default node attribute values.

Returns:

ITS graph with merged node and edge annotations, including typesGH, order, and standard_order.

Return type:

nx.Graph
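The edge side of ITS construction can be sketched with dicts that map an atom-map pair to a bond order. This is a conceptual illustration only: the name merge_its_edges is hypothetical, a missing bond is represented as order 0.0, and computing standard_order as order_G minus order_H is an assumption about the convention.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def merge_its_edges(g_edges: Dict[Edge, float], h_edges: Dict[Edge, float]) -> Dict[Edge, dict]:
    """Merge reactant (G) and product (H) bond orders into ITS-style edge annotations."""
    merged = {}
    for edge in sorted(set(g_edges) | set(h_edges)):
        order_g = g_edges.get(edge, 0.0)  # 0.0 means the bond is absent in G
        order_h = h_edges.get(edge, 0.0)  # 0.0 means the bond is absent in H
        merged[edge] = {
            "order": (order_g, order_h),
            "standard_order": order_g - order_h,  # assumed sign convention
        }
    return merged


# Ethanol -> ethene + water, with atoms labelled C1, C2, O3.
g_edges = {(1, 2): 1.0, (2, 3): 1.0}
h_edges = {(1, 2): 2.0}
its_edges = merge_its_edges(g_edges, h_edges)
print(its_edges)
```

Edges whose two orders differ (here both of them) are the ones that change during the reaction.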

static get_node_attribute(graph: Graph, node: Hashable, attribute: str, default: Any) Any

Retrieve a node attribute or return a default if missing.

static get_node_attributes_with_defaults(graph: Graph, node: Hashable, attributes_defaults: Dict[str, Any] = None) Tuple

Retrieve multiple node attributes applying provided simple defaults.

static typesGH_info(node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None) Dict[str, Dict[str, Tuple[type, Any]]]

Provide expected types and default values for interpreting typesGH tuples.

Parameters:
  • node_attrs (list[str] or None) – List of node attributes used in the node-level typesGH.

  • edge_attrs (list[str] or None) – List of edge attributes used in the edge-level typesGH.

Returns:

Nested dict describing (type, default) for each selected attribute.

Return type:

dict[str, dict[str, tuple[type, Any]]]

synkit.Graph.ITS.its_decompose.get_rc(ITS: Graph, element_key: List[str] = ['element', 'charge', 'typesGH', 'atom_map'], bond_key: str = 'order', standard_key: str = 'standard_order', disconnected: bool = False, keep_mtg: bool = False) Graph

Extract the reaction-center (RC) subgraph from an ITS graph.
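As a rough illustration of what a reaction center is, the sketch below keeps only the edges whose bond order changes between the reactant and product sides, plus the atoms they touch. It works on a plain dict of edge to (order_G, order_H) rather than a NetworkX graph, and the real get_rc may apply further criteria (for example charge changes); the helper name is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Orders = Tuple[float, float]


def reaction_center(its_edges: Dict[Edge, Orders]) -> Tuple[List[int], Dict[Edge, Orders]]:
    """Return (nodes, edges) restricted to bonds whose order changes."""
    rc_edges = {edge: orders for edge, orders in its_edges.items() if orders[0] != orders[1]}
    rc_nodes = sorted({node for edge in rc_edges for node in edge})
    return rc_nodes, rc_edges


edges = {(1, 2): (1.0, 2.0), (2, 3): (1.0, 0.0), (1, 4): (1.0, 1.0)}
rc_nodes, rc_edges = reaction_center(edges)
print(rc_nodes, rc_edges)
```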

synkit.Graph.ITS.its_decompose.its_decompose(its_graph: Graph, nodes_share='typesGH', edges_share='order')

Decompose an ITS graph into two separate reactant (G) and product (H) graphs.

Nodes and edges in its_graph carry composite attributes:
  • Each node n has its_graph.nodes[n][nodes_share] = (node_attrs_G, node_attrs_H).

  • Each edge (u, v) has its_graph.edges[u, v][edges_share] = (order_G, order_H).

This function splits those tuples to reconstruct the original G and H graphs.

Parameters:
  • its_graph (nx.Graph) – The ITS graph with composite node/edge attributes.

  • nodes_share (str) – Node attribute key storing (G_attrs, H_attrs) tuples.

  • edges_share (str) – Edge attribute key storing (order_G, order_H) tuples.

Returns:

A tuple of two graphs (G, H) reconstructed from the ITS.

Return type:

Tuple[nx.Graph, nx.Graph]

Example:

>>> its = nx.Graph()
>>> # ... set its.nodes[n]['typesGH'] and its.edges[e]['order'] ...
>>> G, H = its_decompose(its)
>>> isinstance(G, nx.Graph) and isinstance(H, nx.Graph)
True
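The tuple-splitting step can be sketched on a plain dict of edge to (order_G, order_H). This is a simplified illustration under assumptions: the name split_its_edges is hypothetical, node attributes are ignored, and an order of 0 is taken to mean the bond does not exist on that side.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def split_its_edges(its_edges: Dict[Edge, Tuple[float, float]]):
    """Split (order_G, order_H) tuples back into separate G and H edge dicts."""
    g_edges = {edge: og for edge, (og, oh) in its_edges.items() if og}
    h_edges = {edge: oh for edge, (og, oh) in its_edges.items() if oh}
    return g_edges, h_edges


its_edges = {(1, 2): (1.0, 2.0), (2, 3): (1.0, 0.0)}
g_edges, h_edges = split_its_edges(its_edges)
print(g_edges, h_edges)
```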
class synkit.Graph.ITS.its_expand.ITSExpand

Bases: object

Partially expand a reaction SMILES (RSMI) by reconstructing the imaginary transition state (ITS) graph and applying transformation rules based on the reaction center graph.

This class identifies the reaction center from an RSMI, builds and reconstructs the ITS graph, decomposes it back into reactants and products, and standardizes atom mappings to produce a fully atom-mapped (AAM) RSMI.

Variables:

std – Standardize instance for reaction SMILES standardization.

static expand_aam_with_its(rsmi: str, relabel: bool = False, use_G: bool = True) str

Expand a partial reaction SMILES to a full AAM RSMI using ITS reconstruction.

Parameters:
  • rsmi (str) – Reaction SMILES string in the format ‘reactant>>product’.

  • use_G (bool) – If True, expand using the reactant side; otherwise use the product side.

  • light_weight (bool) – Flag indicating whether to apply a lighter-weight standardization.

Returns:

Fully atom-mapped reaction SMILES after ITS expansion and standardization.

Return type:

str

Raises:

ValueError – If input RSMI format is invalid or ITS reconstruction fails.

Example:

>>> expander = ITSExpand()
>>> expander.expand_aam_with_its("CC[CH2:3][Cl:1].[N:2]>>CC[CH2:3][N:2].[Cl:1]")
'[CH3:1][CH2:2][CH2:3][Cl:4].[N:5]>>[CH3:1][CH2:2][CH2:3][N:5].[Cl:4]'

Matcher Submodule

The synkit.Graph.Matcher package provides comprehensive tools for graph comparison, subgraph search, and clustering. It is organized into four main areas:

  • Matching Engines – Perform graph‐to‐graph and subgraph isomorphism checks: GraphMatcherEngine, SubgraphSearchEngine.

  • Single-Graph Clustering – Cluster a single graph’s nodes or components: graph_cluster.

  • Batch Clustering – Process and cluster multiple graphs in parallel: batch_cluster.

  • High-Throughput Isomorphism – Specialized routines for multi-pattern searches in a host graph: sing, turbo_iso.

Matching Engines

class synkit.Graph.Matcher.graph_matcher.GraphMatcherEngine(*, backend: str = 'nx', node_attrs: List[str] | None = None, edge_attrs: List[str] | None = None, wl1_filter: bool = False, max_mappings: int | None = 1)

Bases: object

Reusable engine for (sub‑)graph isomorphism checks & embeddings.

Parameters
backend:
  • "nx" (default) – pure‑Python implementation that relies on GraphMatcher.

  • "rule" – optional, requires the third‑party mod package.

node_attrs, edge_attrs:

Lists of attribute keys that must match exactly between candidate nodes/edges. hcount is treated specially: the host value must be ≥ the pattern value (to allow aggregated counts).

wl1_filter:

If True, a fast WL‑based colour refinement pre‑filter discards host graphs that cannot possibly contain the pattern.

max_mappings:

Upper bound on the number of mappings to enumerate in get_mappings(). None means “no limit”.

static available_backends() List[str]
get_mappings(host: Any, pattern: Any) List[Dict[int, int]]
help() str
isomorphic(obj1: Any, obj2: Any) bool
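
The attribute rule described for node_attrs can be sketched as a small predicate. This is an illustration only, based on the "exact match, except hcount ≥" wording in this reference; the function name node_compatible and the use of plain attribute dicts are assumptions, not part of GraphMatcherEngine.

```python
from typing import Any, Dict, Iterable


def node_compatible(
    host: Dict[str, Any],
    pattern: Dict[str, Any],
    node_attrs: Iterable[str],
) -> bool:
    """Return True if a host node may be matched to a pattern node."""
    for key in node_attrs:
        if key == "hcount":
            # The host may carry more hydrogens than the pattern requires.
            if host.get("hcount", 0) < pattern.get("hcount", 0):
                return False
        elif host.get(key) != pattern.get(key):
            # Every other attribute must match exactly.
            return False
    return True


host_atom = {"element": "C", "charge": 0, "hcount": 3}
pattern_atom = {"element": "C", "charge": 0, "hcount": 1}
attrs = ["element", "charge", "hcount"]

ok_forward = node_compatible(host_atom, pattern_atom, attrs)
ok_reverse = node_compatible(pattern_atom, host_atom, attrs)
```

The check is deliberately asymmetric: a CH3 host atom satisfies a pattern that asks for at least one hydrogen, but not the other way round.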
class synkit.Graph.Matcher.subgraph_matcher.SubgraphMatch

Bases: object

Boolean-only checks for graph isomorphism and subgraph (induced or monomorphic) matching.

Provides static methods for NetworkX-based checks and optional GML “rule” backend.

static is_subgraph(pattern: Graph | str, host: Graph | str, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', use_filter: bool = False, check_type: str = 'induced', backend: str = 'nx') bool

Unified API for subgraph and isomorphism checks, using either the NX or the GML backend.

static rule_subgraph_morphism(rule_1: str, rule_2: str, use_filter: bool = False) bool

Evaluates if two GML-formatted rule representations are isomorphic or one is a subgraph of the other (monomorphic).

Parameters:
  • rule_1 (str) – GML string of the first rule.

  • rule_2 (str) – GML string of the second rule.

  • use_filter (bool, optional) – Whether to filter by node/edge labels and vertex counts.

Returns:

True if the monomorphism condition is met, False otherwise.

Return type:

bool

static subgraph_isomorphism(child_graph: Graph, parent_graph: Graph, node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', use_filter: bool = False, check_type: str = 'induced', node_comparator: Callable[[Any, Any], bool] | None = None, edge_comparator: Callable[[Any, Any], bool] | None = None) bool

Check whether the child graph is subgraph-isomorphic to the parent graph, based on customizable node and edge attributes.

class synkit.Graph.Matcher.subgraph_matcher.SubgraphSearchEngine

Bases: object

Static helper routines for sub-graph monomorphism search.

Variables:

DEFAULT_THRESHOLD – default cap on embedding enumeration (5000)

DEFAULT_THRESHOLD: int = 5000
static find_subgraph_mappings(host: Graph, pattern: Graph, *, node_attrs: List[str], edge_attrs: List[str], strategy: str | Strategy = Strategy.COMPONENT, max_results: int | None = None, strict_cc_count: bool = True, threshold: int | None = None, pre_filter: bool = False) List[Dict[int, int]]

Dispatch to a subgraph-matching strategy with optional guards.

Parameters
host, pattern

NetworkX graphs (host ≥ pattern).

node_attrs, edge_attrs

Keys of attributes to match exactly (plus hcount ≥).

strategy

Matching strategy code or enum (“all”, “comp”, “bt”).

max_results

Stop after this many embeddings (None = no limit).

strict_cc_count

If True, the host connected-component (CC) count must be ≤ the pattern CC count for COMPONENT/BACKTRACK.

threshold

Override the default cap (DEFAULT_THRESHOLD) on embeddings.

pre_filter

If True, run a cheap Cartesian-product pre-filter against the threshold.

Returns

List of dictionaries mapping pattern node→host node. Empty if none or if any guard (pre-filter or enumeration) exceeds the threshold.
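
One plausible reading of the "cheap Cartesian-product pre-filter" is an upper bound on the number of embeddings: multiply, over all pattern nodes, the number of host nodes with identical attributes, and give up if the product exceeds the threshold. The sketch below assumes that reading; the function name exceeds_threshold and the dict-based node tables are hypothetical.

```python
from math import prod
from typing import Any, Dict, List

DEFAULT_THRESHOLD = 5000  # value documented for SubgraphSearchEngine


def exceeds_threshold(
    host_nodes: Dict[int, Dict[str, Any]],
    pattern_nodes: Dict[int, Dict[str, Any]],
    node_attrs: List[str],
    threshold: int = DEFAULT_THRESHOLD,
) -> bool:
    """Bound the embedding count by the product of per-node candidate counts."""
    counts = []
    for p_data in pattern_nodes.values():
        key = tuple(p_data.get(a) for a in node_attrs)
        n_candidates = sum(
            1
            for h_data in host_nodes.values()
            if tuple(h_data.get(a) for a in node_attrs) == key
        )
        counts.append(n_candidates)
    return prod(counts) > threshold


host = {i: {"element": "C"} for i in range(10)}
small = {i: {"element": "C"} for i in range(3)}  # bound: 10**3 = 1000
large = {i: {"element": "C"} for i in range(4)}  # bound: 10**4 = 10000

small_blocked = exceeds_threshold(host, small, ["element"])
large_blocked = exceeds_threshold(host, large, ["element"])
```

The bound ignores edges and injectivity, so it overestimates the true count; that is acceptable for a guard whose only job is to avoid starting a search that would be cut off anyway.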

property help: str

Return the full module docstring.

Clustering

class synkit.Graph.Matcher.graph_cluster.GraphCluster(node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', backend: str = 'nx')

Bases: object

available_backends() List[str]

Return available backends: always includes ‘nx’; adds ‘mode’ if the ‘mod’ package is installed.

fit(data: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash', strip: bool = False) List[Dict]

Automatically clusters the rules and assigns them cluster indices based on the similarity, potentially using provided templates for clustering, or generating new templates.

Parameters:
  • data (List[Dict]) – A list of dictionaries, each representing a rule along with metadata.

  • rule_key (str) – The key in the dictionaries under data where the rule data is stored.

  • attribute_key (str) – The key in the dictionaries under data where rule attributes are stored.

Returns:

Updated list of dictionaries with an added ‘class’ key for cluster identification.

Return type:

List[Dict]

iterative_cluster(rules: List[str], attributes: List[Any] | None = None, nodeMatch: Callable | None = None, edgeMatch: Callable | None = None) Tuple[List[Set[int]], Dict[int, int]]

Clusters rules based on their similarities, which could include structural or attribute-based similarities depending on the given attributes.

Parameters:
  • rules (List[str]) – List of rules, potentially serialized strings of rule representations.

  • attributes (Optional[List[Any]]) – Attributes associated with each rule for preliminary comparison, e.g., labels or properties.

Returns:

A tuple containing a list of sets (clusters), where each set contains indices of rules in the same cluster, and a dictionary mapping each rule index to its cluster index.

Return type:

Tuple[List[Set[int]], Dict[int, int]]
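
The return shape (a list of index sets plus an index-to-cluster dictionary) can be illustrated with a generic representative-based clustering loop. This is a sketch, not the library's algorithm: the name cluster_by_equivalence is hypothetical, and plain string equality stands in for the rule or graph isomorphism test that the real method would apply.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


def cluster_by_equivalence(
    items: List[Any],
    attributes: Optional[List[Any]] = None,
    equivalent: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> Tuple[List[Set[int]], Dict[int, int]]:
    """Group item indices by comparing each item with cluster representatives."""
    clusters: List[Set[int]] = []
    representatives: List[int] = []
    index_to_cluster: Dict[int, int] = {}
    for i, item in enumerate(items):
        for c, rep in enumerate(representatives):
            # Cheap attribute comparison first; skip the costly test on mismatch.
            if attributes is not None and attributes[i] != attributes[rep]:
                continue
            if equivalent(item, items[rep]):
                clusters[c].add(i)
                index_to_cluster[i] = c
                break
        else:
            # No representative matched: open a new cluster.
            clusters.append({i})
            representatives.append(i)
            index_to_cluster[i] = len(clusters) - 1
    return clusters, index_to_cluster


rules = ["a", "b", "a", "c", "b"]
hashes = [1, 2, 1, 3, 2]  # stand-in for a precomputed attribute such as a WL hash
clusters, mapping = cluster_by_equivalence(rules, hashes)
```

Comparing only against one representative per cluster keeps the number of expensive checks proportional to the number of clusters rather than the number of items.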

class synkit.Graph.Matcher.batch_cluster.BatchCluster(node_label_names: List[str] = ['element', 'charge'], node_label_default: List[Any] = ['*', 0], edge_attribute: str = 'order', backend: str = 'nx')

Bases: object

available_backends() List[str]

Return available backends: always includes ‘nx’; adds ‘rule’ if the ‘mod’ package is installed.

static batch_dicts(input_list, batch_size)

Splits a list of dictionaries into batches of a specified size.

Parameters:
  • input_list (list of dict) – The list of dictionaries to be batched.

  • batch_size (int) – The size of each batch.

Returns:

A list where each element is a batch (sublist) of dictionaries.

Return type:

list of list of dict

Raises:

ValueError – If batch_size is less than 1.
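
The documented behaviour (fixed-size slices, ValueError for sizes below 1) can be reproduced with list slicing. The sketch below is an independent illustration; the name batch_dicts_sketch is hypothetical and chosen to avoid implying that it is the library's code.

```python
from typing import Dict, List


def batch_dicts_sketch(input_list: List[Dict], batch_size: int) -> List[List[Dict]]:
    """Split a list of dictionaries into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        input_list[i : i + batch_size]
        for i in range(0, len(input_list), batch_size)
    ]


records = [{"id": i} for i in range(5)]
batches = batch_dicts_sketch(records, 2)
```

With five records and a batch size of two, the last batch holds the single remaining record.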

cluster(data: List[Dict], templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash') Tuple[List[Dict], List[Dict]]

Processes a list of graph data entries, classifying each based on existing templates.

Parameters:
  • data (List[Dict]) – A list of dictionaries, each representing a graph or rule to be classified.

  • templates (List[Dict]) – Dynamic templates used for categorization.

Returns:

A tuple containing the list of classified data and the updated templates.

Return type:

Tuple[List[Dict], List[Dict]]

fit(data: List[Dict], templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'WLHash', batch_size: int | None = None) Tuple[List[Dict], List[Dict]]

Processes and classifies data in batches. Uses GraphCluster for initial processing and a stratified sampling technique to update templates if there is only one batch and no initial templates are provided.

Parameters:
  • data (List[Dict]) – Data to process.

  • templates (List[Dict]) – Templates for categorization.

  • rule_key (str) – Key to access rule or graph data.

  • attribute_key (str) – Key to access attributes used for filtering.

  • batch_size (Optional[int]) – Size of batches for processing. If not provided, all data are processed at once.

Returns:

The processed data and the potentially updated templates.

Return type:

Tuple[List[Dict], List[Dict]]

lib_check(data: Dict, templates: List[Dict], rule_key: str = 'gml', attribute_key: str = 'signature', nodeMatch: Callable | None = None, edgeMatch: Callable | None = None) Dict

Checks and classifies a graph or rule based on existing templates using either graph or rule isomorphism.

Parameters:
  • data (Dict) – A dictionary representing a graph or rule with its attributes and classification.

  • templates (List[Dict]) – Dynamic templates used for categorization. If None, initializes to an empty list.

  • rule_key (str) – Key to access the graph or rule data within the dictionary.

  • attribute_key (str) – An attribute used to filter templates before the isomorphism check.

  • nodeMatch (Optional[Callable]) – A function to match nodes; defaults to a predefined generic_node_match.

  • edgeMatch (Optional[Callable]) – A function to match edges; defaults to a predefined generic_edge_match.

Returns:

The updated dictionary with its classification.

Return type:

Dict
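
The filter-then-compare flow described for lib_check might look roughly as follows. This is a sketch under assumptions: the name classify_against_templates is hypothetical, string equality replaces the isomorphism test, and the rule that an unmatched entry becomes a new template with a fresh class number is assumed rather than documented.

```python
from typing import Any, Callable, Dict, List


def classify_against_templates(
    entry: Dict[str, Any],
    templates: List[Dict[str, Any]],
    rule_key: str = "gml",
    attribute_key: str = "signature",
    equivalent: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> Dict[str, Any]:
    """Assign entry['class'] from a matching template, or register a new one."""
    for template in templates:
        # Cheap attribute filter before the expensive equivalence test.
        if template.get(attribute_key) != entry.get(attribute_key):
            continue
        if equivalent(entry[rule_key], template[rule_key]):
            entry["class"] = template["class"]
            return entry
    # Assumption: an unmatched entry becomes a new template with a fresh class.
    entry["class"] = max((t["class"] for t in templates), default=-1) + 1
    templates.append(dict(entry))
    return entry


templates: List[Dict[str, Any]] = []
first = classify_against_templates({"gml": "rule A", "signature": "h1"}, templates)
second = classify_against_templates({"gml": "rule B", "signature": "h2"}, templates)
repeat = classify_against_templates({"gml": "rule A", "signature": "h1"}, templates)
```

Because the template list grows as new classes appear, later entries are compared against everything seen so far.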

High-Throughput Isomorphism

class synkit.Graph.Matcher.sing.SING(graph: Graph, max_path_length: int = 3, node_att: str | List[str] = ['element', 'charge'], edge_att: str | List[str] | None = 'order')

Bases: object

Subgraph search In Non-homogeneous Graphs (SING)

A lightweight Python implementation adopting a filter-and-refine strategy with path-based features. This version supports heterogeneous graphs through flexible node and edge attribute selections.

search(query_graph: Graph, prune: bool = False) List[Dict[Any, Any]] | bool

Find subgraph isomorphisms.

Parameters
query_graph : nx.Graph

Pattern graph to match.

prune : bool, default False

If True, returns a boolean indicating existence of at least one mapping. Otherwise returns a list of all mappings.
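
The filter stage of a filter-and-refine search with path-based features can be sketched as follows. This is a simplified illustration, not the SING class itself: graphs are adjacency dictionaries, edge attributes are ignored, features are collected per graph rather than per vertex, and the names path_features and may_contain are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Adjacency = Dict[Hashable, List[Hashable]]
Labels = Dict[Hashable, str]


def path_features(adj: Adjacency, labels: Labels, max_path_length: int = 3) -> Set[Tuple]:
    """Collect label sequences of all simple paths with up to max_path_length edges."""
    features: Set[Tuple] = set()

    def extend(path: List[Hashable]) -> None:
        features.add(tuple(labels[n] for n in path))
        if len(path) - 1 == max_path_length:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only
                extend(path + [nxt])

    for start in adj:
        extend([start])
    return features


def may_contain(host_feats: Set[Tuple], query_feats: Set[Tuple]) -> bool:
    """Necessary condition: every query path feature also occurs in the host."""
    return query_feats <= host_feats


# Host: C-C-O chain. Queries: C-O (present) and C-N (absent).
host_adj, host_labels = {0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"}
q_adj = {0: [1], 1: [0]}
host_feats = path_features(host_adj, host_labels)
keep_co = may_contain(host_feats, path_features(q_adj, {0: "C", 1: "O"}))
keep_cn = may_contain(host_feats, path_features(q_adj, {0: "C", 1: "N"}))
```

The filter can only rule hosts out; a host that passes still needs an exact matching step (the refine stage) to confirm an embedding.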

class synkit.Graph.Matcher.turbo_iso.TurboISO(graph: Graph, node_label: str | List[str] = 'label', edge_label: str | List[str] | None = None, distance_threshold: int = 5000)

Bases: object

TurboISO with pragmatic speed‑ups for many small queries.

  1. Pre‑indexes the host graph by node‑signature → nodes bucket.

  2. Uses lazy, radius‑bounded BFS instead of a pre‑computed all‑pairs matrix (saving both startup time and memory).

  3. Skips distance consistency if the total candidate pool is already smaller than a configurable threshold (defaults to 5000).

search(Q: Graph, prune: bool = False) List[Dict[Any, Any]] | bool
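
The first two speed-ups can be sketched in isolation. The code below is an illustration under assumptions, not TurboISO's implementation: the node signature is taken to be (label, degree), graphs are adjacency dictionaries, and the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Tuple

Adjacency = Dict[Hashable, List[Hashable]]


def build_signature_index(
    adj: Adjacency, labels: Dict[Hashable, str]
) -> Dict[Tuple[str, int], List[Hashable]]:
    """Bucket host nodes by an assumed (label, degree) signature."""
    index: Dict[Tuple[str, int], List[Hashable]] = defaultdict(list)
    for node, nbrs in adj.items():
        index[(labels[node], len(nbrs))].append(node)
    return index


def bounded_bfs(adj: Adjacency, source: Hashable, radius: int) -> Dict[Hashable, int]:
    """Return {node: distance} for nodes within `radius` hops of `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Host: C-C-C-O chain.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "C", 1: "C", 2: "C", 3: "O"}
index = build_signature_index(adj, labels)
near = bounded_bfs(adj, 0, radius=2)
```

Computing distances on demand and only up to the needed radius avoids the memory and startup cost of an all-pairs matrix when many small queries hit the same host.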

MTG Submodule

class synkit.Graph.MTG.mtg.MTG(sequences: List[Graph] | List[str], mappings: List[Dict[int, int]] | None = None, *, node_label_names: List[str] | None = None, canonicaliser: GraphCanonicaliser | None = None, mcs_mol: bool = False, mcs: bool = False)

Bases: object

Fuse a chronological series of ITS graphs into a Mechanistic Transition Graph.

Parameters:
  • sequences – A list of ITS-format NetworkX graphs or RSMI strings.

  • mappings – Optional list of precomputed mappings; computed via MCS if None.

  • node_label_names – Keys for node-label matching.

  • canonicaliser – Optional GraphCanonicaliser for snapshot canonicalisation.

Raises:
  • ValueError – On invalid sequence or mapping lengths.

  • RuntimeError – On mapping failures.

static describe() str
get_aam(*, directed: bool = False, explicit_h: bool = False) str
get_compose_its(*, directed: bool = False) Graph
get_mtg(*, directed: bool = False) Graph
property k: int
property node_mapping: Dict[Tuple[int, int], int]
to_dataframe()

Rule Module

The synkit.Rule package provides a flexible framework for reaction rule manipulation, composition, and application in retrosynthesis and forward‐prediction workflows. It is organized into three main subpackages:

  • Compose – Build new reaction rules by composing existing ones, supporting both SMARTS‐based and GML workflows.

  • Apply – Apply rules to molecule or reaction graphs for retro‐prediction or forward‐simulation (e.g., in reactor contexts).

  • Modify – Generate artificial rules and edit rule templates: add or remove explicit hydrogens, adjust contexts, and fine‐tune matching behavior.

class synkit.Rule.Compose.rule_compose.RuleCompose

Bases: object

static filter_smallest_vertex(combo: List[object]) List[object]

Filters and returns the elements from a list that have the smallest number of vertices in their context.

Parameters:
  • combo (List[object]) – A list of objects, each with a ‘context’ attribute that has a ‘numVertices’ attribute.

Returns:

A list of objects from the input list that have the minimum number of vertices in their context.

Return type:

List[object]
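
The selection can be sketched with stand-in objects. In the example below, SimpleNamespace instances imitate rule objects that expose context.numVertices; the names smallest_context and fake_rule are hypothetical, and returning an empty list for empty input is an assumption.

```python
from types import SimpleNamespace
from typing import List


def smallest_context(combo: List[object]) -> List[object]:
    """Keep the objects whose context has the fewest vertices."""
    if not combo:
        return []  # assumption: empty input yields empty output
    fewest = min(obj.context.numVertices for obj in combo)
    return [obj for obj in combo if obj.context.numVertices == fewest]


def fake_rule(name: str, n_vertices: int) -> SimpleNamespace:
    """Build a stand-in for a rule object with a context.numVertices attribute."""
    return SimpleNamespace(name=name, context=SimpleNamespace(numVertices=n_vertices))


rules = [fake_rule("a", 4), fake_rule("b", 2), fake_rule("c", 2)]
kept = [r.name for r in smallest_context(rules)]
```

Ties are preserved, so more than one object can be returned when several contexts share the minimum size.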

static rule_cluster(graphs: List) List

Clusters graphs based on their isomorphic relationship and returns a list of graphs, each from a different cluster.

Parameters:
  • graphs (List) – A list of graph objects.

Returns:

A list of graphs where each graph is a representative from a different cluster.

Return type:

List

static save_gml_from_text(gml_content: str, gml_file_path: str, rule_id: str, parent_ids: List[str]) bool

Save a text string to a GML file by modifying the ‘ruleID’ line to include parent rule names. This function parses the given GML content, identifies any lines starting with ‘ruleID’, and replaces these lines with a new ruleID that incorporates identifiers from parent rules.

Parameters:
  • gml_content (str) – The content to be saved to the GML file. This should be the entire textual content of a GML file.

  • gml_file_path (str) – The file path where the GML file should be saved. If the path does not exist or is inaccessible, the function will return False and print an error message.

  • rule_id (str) – The original rule ID from the content. This is the identifier that will be modified to include parent IDs in the new ruleID.

  • parent_ids (List[str]) – List of parent rule IDs to prepend to the original rule ID. These are combined into a new identifier to reflect the hierarchical relationship in rule IDs.

Returns:

True if the file was successfully saved, False otherwise. The function attempts to write the modified GML content to the specified file path.

Return type:

bool

class synkit.Rule.Apply.reactor_rule.ReactorRule

Bases: object

Handles the transformation of SMILES strings to reaction SMILES (RSMI) by applying chemical reaction rules defined in GML strings.

It can optionally reverse the reaction, exclude atom mappings, and include unchanged reagents in the output.

class synkit.Rule.Modify.molecule_rule.MoleculeRule

Bases: object

A class for generating molecule rules, atom-mapped SMILES, and GML representations from SMILES strings.

static generate_atom_map(smiles: str) str | None

Generate atom-mapped SMILES by assigning unique map numbers to each atom in the molecule.

Parameters:
  • smiles (str) – The SMILES string representing the molecule.

Returns:

The atom-mapped SMILES string, or None if the SMILES string is invalid.

Return type:

Optional[str]

generate_molecule_rule(smiles: str, name: str = 'molecule', explicit_hydrogen: bool = True, sanitize: bool = True) str | None

Generate a GML representation of the molecule rule from SMILES.

Parameters:
  • smiles (str) – The SMILES string representing the molecule.

  • name (str, optional) – The rule name used in GML generation. Defaults to ‘molecule’.

  • explicit_hydrogen (bool, optional) – Whether to include explicit hydrogen atoms in GML. Defaults to True.

  • sanitize (bool, optional) – Whether to sanitize the molecule before conversion. Defaults to True.

Returns:

The GML representation of the molecule rule, or None if invalid.

Return type:

Optional[str]

static generate_molecule_smart(smiles: str) str | None

Generate a SMARTS-like string from atom-mapped SMILES.

Parameters:
  • smiles (str) – The SMILES string representing the molecule.

Returns:

The SMARTS-like string derived from atom-mapped SMILES, or None if the SMILES is invalid.

Return type:

Optional[str]

static remove_edges_from_left_right(input_str: str) str

Remove all contents from the ‘left’ and ‘right’ sections of a chemical rule description.

Parameters:
  • input_str (str) – The string representation of the rule.

Returns:

The modified string with cleared ‘left’ and ‘right’ sections.

Return type:

str

Vis Module

The synkit.Vis package offers a suite of visualization utilities for both chemical reactions and graph structures, enabling clear interpretation of mechanisms, templates, and network architectures:

  • RXNVis – Render full reaction schemes with mapped atom‐colors, curved arrows, and publication‐quality layouts.

  • RuleVis – Display rule templates (SMARTS/GML) as annotated graph transformations, highlighting bond changes.

  • GraphVisualizer – General‐purpose NetworkX graph plotting, with support for ITS, MTG, and custom node/edge styling.

class synkit.Vis.rxn_vis.RXNVis(width: int = 800, height: int = 450, dpi: int = 96, background_colour: Tuple[float, float, float, float] | None = None, highlight_by_reactant: bool = True, bond_line_width: float = 2.0, atom_label_font_size: int = 12, show_atom_map: bool = False)

Bases: object

render(smiles: str, return_bytes: bool = False) Image | bytes

Render a molecule or reaction SMILES to a cropped PNG.

Parameters

smiles : str

Molecule or reaction SMARTS/SMILES. Reactions must contain ‘>>’.

return_bytes : bool

If True, return raw PNG bytes instead of a PIL.Image.

Returns

PIL.Image.Image or bytes

Cropped image (or raw PNG bytes) of the molecule/reaction.

save_pdf(smiles: str, path: str, resolution: float = 300.0) None

Render and save as a single‐page PDF.

Parameters

smiles : str

Molecule or reaction SMARTS/SMILES.

path : str

Output filename ending in .pdf.

resolution : float

DPI metadata for the PDF.

save_png(smiles: str, path: str) None

Render and save as a PNG file.

Parameters

smiles : str

Molecule or reaction SMARTS/SMILES.

path : str

Output filename ending in .png.

class synkit.Vis.rule_vis.RuleVis(backend: str = 'nx')

Bases: object

help() None
mod_vis(gml: str, path: str = './') None

Simple MOD visualization via mod_post CLI.

nx_vis(input: str | Tuple[Graph, Graph, Graph], sanitize: bool = False, figsize: Tuple[int, int] = (18, 5), orientation: str = 'horizontal', show_titles: bool = True, show_atom_map: bool = False, titles: Tuple[str, str, str] = ('Reactant', 'Imaginary Transition State', 'Product'), add_gridbox: bool = False, rule: bool = False) Figure

Visualize reactants, ITS, and products side-by-side or vertically, with interactive plotting turned off to prevent double-display, and correct handling of matplotlib axes arrays.

post() None

Generate an external report via the mod_post CLI.

vis(input: str | Tuple[Graph, Graph, Graph], **kwargs)

Wrapper to select between nx_vis and mod_vis based on backend and input type.

Converts input as needed.

class synkit.Vis.graph_visualizer.GraphVisualizer(node_attributes: Dict[str, str] | None = None, edge_attributes: Dict[str, str] | None = None)

Bases: object

High‑level wrapper around Weinbauer’s plotting utilities.

property edge_attributes: Dict[str, str]

Mapping of edge keys used for RDKit conversion.

help() None

Print a summary of GraphVisualizer methods and usage.

property node_attributes: Dict[str, str]

Mapping of node keys used for RDKit conversion.

plot_as_mol(g: Graph, ax: Axes, use_mol_coords: bool = True, node_color: str = '#FFFFFF', node_size: int = 500, edge_color: str = '#000000', edge_width: float = 2.0, label_color: str = '#000000', font_size: int = 12, show_atom_map: bool = False, bond_char: Dict[int | None, str] | None = None, symbol_key: str = 'element', bond_key: str = 'order', aam_key: str = 'atom_map') None

Core molecular plotting on a given Axes.

plot_its(its: Graph, ax: Axes, use_mol_coords: bool = True, title: str | None = None, node_color: str = '#FFFFFF', node_size: int = 500, edge_color: str = '#000000', edge_weight: float = 2.0, show_atom_map: bool = False, use_edge_color: bool = False, symbol_key: str = 'element', bond_key: str = 'order', aam_key: str = 'atom_map', standard_order_key: str = 'standard_order', font_size: int = 12, og: bool = False, rule: bool = False, title_font_size: str = 20, title_font_weight: str = 'bold', title_font_style: str = 'italic') None
save_molecule(g: Graph, path: str, **kwargs) None

Save molecular graph plot to file.

visualize_its(its: Graph, **kwargs) Figure

Return a Matplotlib Figure plotting the ITS graph without duplicate display.

visualize_its_grid(its_list: list[Graph], subplot_shape: tuple[int, int] | None = None, use_edge_color: bool = True, og: bool = False, figsize: tuple[float, float] = (12, 6), **kwargs) tuple[Figure, list[list[Axes]]]

Plot multiple ITS graphs in a grid layout.

Parameters

its_list : list[nx.Graph]

List of ITS graphs to visualize.

subplot_shape : tuple[int, int] | None, optional

Grid shape (rows, cols). If None, determined by list length (supports up to 6).

use_edge_color : bool, default True

Whether to color edges based on ‘standard_order’.

og : bool, default False

Flag for original graph mode when coloring.

figsize : tuple[float, float], default (12, 6)

Figure size.

**kwargs

Additional parameters passed to plot_its (e.g. title, show_atom_map).

Returns

fig : plt.Figure

The Matplotlib figure containing the grid.

axes : list of list of plt.Axes

2D list of Axes objects for each subplot.

visualize_molecule(g: Graph, **kwargs) Figure

Return a Figure plotting the molecular graph.

IO Module

The IO module provides tools for handling input and output operations related to the chemical converter. It allows seamless interaction with various chemical data formats.

Chemical Conversion

synkit.IO.chem_converter.gml_to_its(gml: str) Graph

Convert a GML string representation of a reaction back into an ITS graph.

Parameters:

gml (str) – The GML string representing the reaction.

Returns:

The resulting ITS graph.

Return type:

networkx.Graph

synkit.IO.chem_converter.gml_to_smart(gml: str, sanitize: bool = True, explicit_hydrogen: bool = False, useSmiles: bool = True) Tuple[str, Graph]

Convert a GML string back to a SMARTS string and ITS graph.

Parameters:
  • gml (str) – The GML string to convert.

  • sanitize (bool) – Whether to sanitize molecules upon conversion.

  • explicit_hydrogen (bool) – Whether hydrogens are explicitly represented.

  • useSmiles (bool) – If True, output SMILES; otherwise SMARTS.

Returns:

A tuple of (SMARTS string, ITS graph).

Return type:

tuple of (str, networkx.Graph)

synkit.IO.chem_converter.graph_to_rsmi(r: Graph, p: Graph, its: Graph | None = None, sanitize: bool = True, explicit_hydrogen: bool = False) str | None

Convert reactant and product graphs into a reaction SMILES string.

Parameters:
  • r (networkx.Graph) – Graph representing the reactants.

  • p (networkx.Graph) – Graph representing the products.

  • its (networkx.Graph or None) – Imaginary transition state graph. If None, it will be constructed.

  • sanitize (bool) – Whether to sanitize molecules during conversion.

  • explicit_hydrogen (bool) – Whether to preserve explicit hydrogens in the SMILES.

Returns:

Reaction SMILES string in ‘reactants>>products’ format or None on failure.

Return type:

str or None

synkit.IO.chem_converter.graph_to_smi(graph: Graph, sanitize: bool = True, preserve_atom_maps: List[int] | None = None) str | None

Convert a NetworkX molecular graph to a SMILES string.

Parameters:
  • graph (networkx.Graph) – Graph representation of the molecule. Nodes must carry chemical attributes (e.g. ‘element’, atom maps).

  • sanitize (bool) – Whether to perform RDKit sanitization on the resulting molecule.

  • preserve_atom_maps (list of int or None) – List of atom-map numbers for which hydrogens remain explicit.

Returns:

SMILES string, or None if conversion fails.

Return type:

str or None

synkit.IO.chem_converter.its_to_gml(its: Graph, core: bool = True, rule_name: str = 'rule', reindex: bool = True, explicit_hydrogen: bool = False) str

Convert an ITS graph (reaction graph) to GML format.

Parameters:
  • its (networkx.Graph) – The input ITS graph representing the reaction.

  • core (bool) – If True, focus only on the reaction center. Defaults to True.

  • rule_name (str) – Name of the reaction rule. Defaults to “rule”.

  • reindex (bool) – If True, reindex graph nodes. Defaults to True.

  • explicit_hydrogen (bool) – If True, include explicit hydrogens. Defaults to False.

Returns:

The GML representation of the ITS graph.

Return type:

str

synkit.IO.chem_converter.its_to_rsmi(its: Graph, sanitize: bool = True, explicit_hydrogen: bool = False, clean_wildcards: bool = False) str

Convert an ITS graph into a reaction SMILES (rSMI) string.

Parameters:
  • its (networkx.Graph) – A fully annotated ITS graph (nodes with atom-map attributes).

  • sanitize (bool) – If True, sanitize prior to SMILES generation.

  • explicit_hydrogen (bool) – If True, include explicit hydrogens.

Returns:

A canonical reaction-SMILES string (‘reactants>agents>products’).

Return type:

str

Raises:

ValueError – If graph cannot be decomposed or sanitisation fails.

synkit.IO.chem_converter.rsmarts_to_rsmi(rsmarts: str) str

Convert a reaction SMARTS to a reaction SMILES string.

Parameters:

rsmarts (str) – Reaction SMARTS input.

Returns:

Reaction SMILES string.

Return type:

str

Raises:

ValueError – If conversion fails.

synkit.IO.chem_converter.rsmi_to_graph(rsmi: str, drop_non_aam: bool = True, sanitize: bool = True, use_index_as_atom_map: bool = True, node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order']) Tuple[Graph | None, Graph | None]

Convert a reaction SMILES (RSMI) into reactant and product graphs.

Parameters:
  • rsmi (str) – Reaction SMILES string in “reactants>>products” format.

  • drop_non_aam (bool) – If True, drop nodes without atom mapping numbers.

  • light_weight (bool) – If True, create a light-weight graph.

  • sanitize (bool) – If True, sanitize molecules during conversion.

  • use_index_as_atom_map (bool) – Whether to use atom indices as atom-map numbers.

Returns:

A tuple (reactant_graph, product_graph), each a NetworkX graph or None.

Return type:

tuple of (networkx.Graph or None, networkx.Graph or None)
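
The “reactants>>products” layout is the general reaction SMILES form reactants>agents>products with an empty agents field. The sketch below shows only that string-level split using the standard library; it does not build graphs and is not how rsmi_to_graph is implemented. The helper name split_reaction_smiles is hypothetical, and the code assumes plain reaction SMILES without trailing extensions.

```python
from typing import List, Tuple


def split_reaction_smiles(rsmi: str) -> Tuple[List[str], List[str], List[str]]:
    """Split 'reactants>agents>products' into lists of component SMILES."""
    parts = rsmi.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got {rsmi!r}")
    # '.' separates the individual molecules on each side.
    reactants, agents, products = (p.split(".") if p else [] for p in parts)
    return reactants, agents, products


reactants, agents, products = split_reaction_smiles(
    "CC[CH2:3][Cl:1].[N:2]>>CC[CH2:3][N:2].[Cl:1]"
)
```

Each component string could then be converted separately, which is one reason a failed conversion on either side can be reported as None independently of the other.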

synkit.IO.chem_converter.rsmi_to_its(rsmi: str, drop_non_aam: bool = True, sanitize: bool = True, use_index_as_atom_map: bool = True, core: bool = False, node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order'], explicit_hydrogen: bool = False) Graph

Convert a reaction SMILES (rSMI) to an ITS (Imaginary Transition State) graph.

Parameters:
  • rsmi (str) – The reaction SMILES string, optionally containing atom-map labels.

  • drop_non_aam (bool) – If True, discard any molecular fragments without atom-atom maps.

  • sanitize (bool) – If True, perform molecule sanitization (valence checks, kekulization).

  • use_index_as_atom_map (bool) – If True, override atom-map labels by atom indices.

  • core (bool) – If True, return only the reaction-center subgraph of the ITS.

  • node_attrs (list[str]) – Node attributes to include in the ITS graph (e.g., element, charge).

  • edge_attrs (list[str]) – Edge attributes to include in the ITS graph (e.g., order).

  • explicit_hydrogen (bool) – If True, convert implicit hydrogens to explicit nodes.

Returns:

A NetworkX graph representing the complete or core ITS.

Return type:

networkx.Graph

Raises:

ValueError – If the SMILES string is invalid or graph construction fails.

synkit.IO.chem_converter.rsmi_to_rsmarts(rsmi: str) str

Convert a mapped reaction SMILES to a reaction SMARTS string.

Parameters:

rsmi (str) – Reaction SMILES input.

Returns:

Reaction SMARTS string.

Return type:

str

Raises:

ValueError – If conversion fails.

synkit.IO.chem_converter.smart_to_gml(smart: str, core: bool = True, sanitize: bool = True, rule_name: str = 'rule', reindex: bool = False, explicit_hydrogen: bool = False, useSmiles: bool = True) str

Convert a reaction SMARTS (or SMILES) template into a GML‐encoded DPO rule.

Parameters:
  • smart (str) – The reaction SMARTS or SMILES string.

  • core (bool) – If True, include only the reaction core in the GML. Defaults to True.

  • sanitize (bool) – If True, sanitize molecules during conversion. Defaults to True.

  • rule_name (str) – Identifier for the output rule. Defaults to “rule”.

  • reindex (bool) – If True, reindex graph nodes before exporting. Defaults to False.

  • explicit_hydrogen (bool) – If True, include explicit hydrogen atoms. Defaults to False.

  • useSmiles (bool) – If True, treat input as SMILES; if False, as SMARTS. Defaults to True.

Returns:

The GML representation of the reaction rule.

Return type:

str

synkit.IO.chem_converter.smiles_to_graph(smiles: str, drop_non_aam: bool = False, sanitize: bool = True, use_index_as_atom_map: bool = False, node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order']) Graph | None

Helper function to convert a SMILES string to a NetworkX graph.

Parameters:
  • smiles (str) – SMILES representation of the molecule.

  • drop_non_aam (bool) – Whether to drop nodes without atom mapping numbers.

  • light_weight (bool) – Whether to create a light-weight graph.

  • sanitize (bool) – Whether to sanitize the molecule during conversion.

  • use_index_as_atom_map (bool) – Whether to use atom indices as atom-map numbers.

Returns:

The NetworkX graph representation, or None if conversion fails.

Return type:

networkx.Graph or None

class synkit.IO.mol_to_graph.MolToGraph(node_attrs: List[str] | None = ['element', 'aromatic', 'hcount', 'charge', 'neighbors', 'atom_map'], edge_attrs: List[str] | None = ['order'])

Bases: object

RDKit → NetworkX helper with attribute selection

This class converts RDKit molecules into NetworkX graphs. The original conversion methods (_create_light_weight_graph, _create_detailed_graph, and mol_to_graph) are preserved for full-featured graph creation. The new transform method builds a NetworkX graph including only a specified subset of node and edge attributes.

Parameters:
  • node_attrs (List[str]) – List of node attribute names to retain. If empty or None, all are included.

  • edge_attrs (List[str]) – List of edge attribute names to retain. If empty or None, all are included.

static add_partial_charges(mol: Mol) None

Compute and assign Gasteiger charges to all atoms in the molecule.

Parameters:

mol (Chem.Mol) – The RDKit molecule.

static get_bond_stereochemistry(bond: Bond) str

Determine the stereochemistry (E/Z) of a double bond.

Parameters:

bond (Chem.Bond) – The RDKit Bond object.

Returns:

‘E’ or ‘Z’ for a stereospecific double bond; ‘N’ if the bond is not a double bond or has no defined stereochemistry.

Return type:

str

static get_stereochemistry(atom: Atom) str

Determine the stereochemistry (R/S) of a chiral atom.

Parameters:

atom (Chem.Atom) – The RDKit Atom object.

Returns:

‘R’ or ‘S’ for a chiral atom; ‘N’ if the atom is not chiral.

Return type:

str

static has_atom_mapping(mol: Mol) bool

Check if any atom in the molecule has an atom mapping number.

Parameters:

mol (Chem.Mol) – The RDKit molecule.

Returns:

True if at least one atom has a mapping number.

Return type:

bool

classmethod mol_to_graph(mol: Mol, drop_non_aam: bool = False, light_weight: bool = False, use_index_as_atom_map: bool = False) Graph

Convert a molecule to a full-featured NetworkX graph.

Parameters:
  • mol (Chem.Mol) – The RDKit molecule to convert.

  • drop_non_aam (bool) – If True, drop atoms without mapping numbers (requires use_index_as_atom_map=True). Defaults to False.

  • light_weight (bool) – If True, create a lightweight graph with minimal attributes. Defaults to False.

  • use_index_as_atom_map (bool) – If True, prefer atom maps as node IDs. Defaults to False.

Returns:

A NetworkX graph of the molecule with all attributes.

Return type:

nx.Graph

static random_atom_mapping(mol: Mol) Mol

Assign random atom mapping numbers to all atoms in the molecule.

Parameters:

mol (Chem.Mol) – The RDKit molecule.

Returns:

The molecule with new random atom mapping numbers.

Return type:

Chem.Mol

transform(mol: Mol, drop_non_aam: bool = False, use_index_as_atom_map: bool = False) Graph

Build a graph directly from a molecule, including only selected attributes.

Parameters:
  • mol (Chem.Mol) – The RDKit molecule to convert.

  • drop_non_aam (bool) – If True, skips atoms without atom mapping numbers (requires use_index_as_atom_map=True). Defaults to False.

  • use_index_as_atom_map (bool) – If True, uses atom mapping numbers as node IDs when present; otherwise uses atom index+1. Defaults to False.

Returns:

A NetworkX graph containing only the specified node and edge attributes.

Return type:

nx.Graph
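The attribute selection described for transform can be pictured with a small stand-in that needs neither RDKit nor NetworkX. This is only a sketch under stated assumptions, not synkit's implementation: the atom records, the helper name select_node_attrs, and the use of index+1 as the node ID are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical per-atom records standing in for RDKit atoms.
# atom_map == 0 marks an unmapped atom, as RDKit reports it.
ATOMS: List[dict] = [
    {"element": "C", "charge": 0, "hcount": 3, "aromatic": False, "atom_map": 1},
    {"element": "C", "charge": 0, "hcount": 2, "aromatic": False, "atom_map": 2},
    {"element": "O", "charge": 0, "hcount": 1, "aromatic": False, "atom_map": 0},
]


def select_node_attrs(
    atoms: List[dict],
    node_attrs: Optional[List[str]] = None,
    drop_non_aam: bool = False,
) -> Dict[int, dict]:
    """Keep only the requested attributes; an empty or None list keeps them all."""
    nodes: Dict[int, dict] = {}
    for index, atom in enumerate(atoms):
        if drop_non_aam and atom["atom_map"] == 0:
            continue  # skip atoms without an atom-map number
        keep = node_attrs or list(atom)
        # Assumption: node IDs are atom index + 1.
        nodes[index + 1] = {key: atom[key] for key in keep if key in atom}
    return nodes


slim = select_node_attrs(ATOMS, ["element", "charge"])
full = select_node_attrs(ATOMS, None)
mapped_only = select_node_attrs(ATOMS, ["element"], drop_non_aam=True)
print(slim)
print(sorted(mapped_only))
```

Restricting the attribute list keeps graphs small when only a few properties matter for matching, while passing None retains everything.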

class synkit.IO.graph_to_mol.GraphToMol(node_attributes: Dict[str, str] = {'atom_map': 'atom_map', 'charge': 'charge', 'element': 'element'}, edge_attributes: Dict[str, str] = {'order': 'order'})

Bases: object

Converts a NetworkX graph representation of a molecule into an RDKit molecule object.

This class reconstructs RDKit molecules from node and edge attributes in a graph, correctly interpreting atom types, charges, mapping numbers, bond orders, and optionally explicit hydrogen counts.

Parameters:
  • node_attributes (Dict[str, str]) – Mapping of expected attribute names to node keys in the graph. For example, {“element”: “element”, “charge”: “charge”, “atom_map”: “atom_map”}.

  • edge_attributes (Dict[str, str]) – Mapping of expected attribute names to edge keys in the graph. For example, {“order”: “order”}.

static get_bond_type_from_order(order: float) BondType

Converts a numerical bond order into the corresponding RDKit BondType.

Parameters:

order (float) – The numerical bond order (typically 1, 2, or 3).

Returns:

The corresponding RDKit bond type (single, double, triple, or aromatic).

Return type:

Chem.BondType
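The order-to-bond-type mapping can be illustrated without RDKit. The sketch below uses a hypothetical BondKind enum in place of Chem.BondType, assumes the common convention that aromatic bonds carry order 1.5, and raises on unknown orders; how synkit itself treats unexpected values is not stated here.

```python
from enum import Enum


class BondKind(Enum):
    """Hypothetical stand-in for RDKit's bond types, so the sketch needs no RDKit."""

    SINGLE = 1.0
    DOUBLE = 2.0
    TRIPLE = 3.0
    AROMATIC = 1.5  # assumption: aromatic bonds are encoded as order 1.5


def bond_kind_from_order(order: float) -> BondKind:
    """Map a numeric bond order to a bond kind; unknown orders raise ValueError."""
    try:
        return BondKind(float(order))
    except ValueError:
        raise ValueError(f"unsupported bond order: {order!r}") from None


kinds = [bond_kind_from_order(o).name for o in (1, 2, 3, 1.5)]
print(kinds)
```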

graph_to_mol(graph: Graph, ignore_bond_order: bool = False, sanitize: bool = True, use_h_count: bool = False) Mol

Converts a NetworkX graph into an RDKit molecule.

Parameters:
  • graph (nx.Graph) – The NetworkX graph representing the molecule.

  • ignore_bond_order (bool) – If True, all bonds are created as single bonds regardless of edge attributes. Defaults to False.

  • sanitize (bool) – If True, the resulting RDKit molecule will be sanitized after construction. Defaults to True.

  • use_h_count (bool) – If True, the ‘hcount’ attribute (if present) will be used to set explicit hydrogen counts on atoms. Defaults to False.

Returns:

An RDKit molecule constructed from the graph’s nodes and edges.

Return type:

Chem.Mol

class synkit.IO.nx_to_gml.NXToGML

Bases: object

Converts NetworkX graph representations of chemical reactions to GML (Graph Modelling Language) strings. Useful for exporting reaction rules in a standard graph format.

This class provides static methods for converting individual graphs, sets of reaction graphs, and managing charge/attribute changes in the export process.

static transform(graph_rules: Tuple[Graph, Graph, Graph], rule_name: str = 'Test', reindex: bool = False, attributes: List[str] = ['charge'], explicit_hydrogen: bool = False) str

Processes a triple of reaction graphs to generate a GML string rule, with options for node reindexing and explicit hydrogen expansion.

Parameters:
  • graph_rules (tuple[nx.Graph, nx.Graph, nx.Graph]) – Tuple containing (L, R, K) reaction graphs.

  • rule_name (str) – The rule name to use in the output.

  • reindex (bool) – Whether to reindex node IDs based on the L graph sequence.

  • attributes (list[str]) – List of attribute names to check for node changes.

  • explicit_hydrogen (bool) – Whether to explicitly include hydrogen atoms in the output.

Returns:

The GML string representing the chemical rule.

Return type:

str
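To give a feel for the kind of string such an export produces, the following sketch serializes an (L, R, K) triple into a rule in the left/context/right layout used by MØD-style GML. It is a simplified illustration, not synkit's code: the dictionary-based graph representation and the helper names section and rule_to_gml are hypothetical, and the reindexing, attribute-change detection, and hydrogen expansion described above are left out.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for a graph: {"nodes": {id: label}, "edges": {(u, v): label}}
Graph = Dict[str, dict]


def section(name: str, graph: Graph) -> str:
    """Render one rule section (left, context, or right) as GML text."""
    lines = [f"   {name} ["]
    for node_id, label in sorted(graph["nodes"].items()):
        lines.append(f'      node [ id {node_id} label "{label}" ]')
    for (u, v), label in sorted(graph["edges"].items()):
        lines.append(f'      edge [ source {u} target {v} label "{label}" ]')
    lines.append("   ]")
    return "\n".join(lines)


def rule_to_gml(graph_rules: Tuple[Graph, Graph, Graph], rule_name: str = "Test") -> str:
    """Assemble a rule string from an (L, R, K) triple."""
    left, right, context = graph_rules
    parts = [
        "rule [",
        f'   ruleID "{rule_name}"',
        section("left", left),
        section("context", context),
        section("right", right),
        "]",
    ]
    return "\n".join(parts)


# A single bond between atoms 1 and 2 becomes a double bond; both atoms are unchanged.
left = {"nodes": {}, "edges": {(1, 2): "-"}}
right = {"nodes": {}, "edges": {(1, 2): "="}}
context = {"nodes": {1: "C", 2: "O"}, "edges": {}}

gml = rule_to_gml((left, right, context), rule_name="demo")
print(gml)
```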

class synkit.IO.gml_to_nx.GMLToNX(gml_text: str)

Bases: object

Parses GML-like text and transforms it into three NetworkX graphs representing the left, right, and context graphs of a chemical reaction step.

Parameters:

gml_text (str) – The GML-like text to parse.

Variables:

graphs (dict[str, nx.Graph]) – A dictionary containing ‘left’, ‘right’, and ‘context’ NetworkX graphs.

transform() Tuple[Graph, Graph, Graph]

Transforms the GML-like text into three NetworkX graphs: left, right, and context.

Returns:

A tuple of (left_graph, right_graph, context_graph), each a NetworkX graph.

Return type:

tuple[nx.Graph, nx.Graph, nx.Graph]
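The reverse direction, splitting rule text into left, right, and context parts, can be sketched with the standard library alone. The example below is an assumption-laden illustration rather than the parser synkit uses: it expects one-line node and edge entries with id/label and source/target/label fields, stores each graph as a plain dictionary, and uses the hypothetical helpers extract_block and parse_rule.

```python
import re
from typing import Dict

GML_RULE = '''rule [
   ruleID "demo"
   left [
      edge [ source 1 target 2 label "-" ]
   ]
   context [
      node [ id 1 label "C" ]
      node [ id 2 label "O" ]
   ]
   right [
      edge [ source 1 target 2 label "=" ]
   ]
]'''

NODE_RE = re.compile(r'node\s*\[\s*id\s+(\d+)\s+label\s+"([^"]*)"\s*\]')
EDGE_RE = re.compile(
    r'edge\s*\[\s*source\s+(\d+)\s+target\s+(\d+)\s+label\s+"([^"]*)"\s*\]'
)


def extract_block(text: str, name: str) -> str:
    """Return the text inside the brackets that follow `name`, or '' if absent."""
    match = re.search(r"\b" + name + r"\s*\[", text)
    if match is None:
        return ""
    depth, start = 1, match.end()
    # Walk forward until the opening bracket is balanced.
    # Assumption: labels never contain square brackets.
    for pos in range(start, len(text)):
        if text[pos] == "[":
            depth += 1
        elif text[pos] == "]":
            depth -= 1
            if depth == 0:
                return text[start:pos]
    raise ValueError(f"unbalanced brackets in section {name!r}")


def parse_rule(text: str) -> Dict[str, dict]:
    """Build {'left': ..., 'right': ..., 'context': ...} from rule text."""
    graphs = {}
    for name in ("left", "right", "context"):
        block = extract_block(text, name)
        graphs[name] = {
            "nodes": {int(i): label for i, label in NODE_RE.findall(block)},
            "edges": [(int(s), int(t), label) for s, t, label in EDGE_RE.findall(block)],
        }
    return graphs


graphs = parse_rule(GML_RULE)
print(graphs["context"]["nodes"])
```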

IO Functions

synkit.IO.data_io.collect_data(num_batches: int, temp_dir: str, file_template: str) List[Any]

Collects and aggregates data from multiple pickle files into a single list.

Parameters:
  • num_batches (int) – The number of batch files to process.

  • temp_dir (str) – The directory where the batch files are stored.

  • file_template (str) – The template string for batch file names; it must contain a placeholder that is filled with the integer batch index.

Returns:

A list of aggregated data items from all batch files.

Return type:

list
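As a rough illustration of the batching pattern described above, the stdlib-only sketch below writes a few pickle batches and gathers them back into one list. It is not synkit's implementation: the helper name collect_batches, the batch_{}.pkl template, and zero-based batch numbering are all assumptions made for the example.

```python
import os
import pickle
import tempfile
from typing import Any, List


def collect_batches(num_batches: int, temp_dir: str, file_template: str) -> List[Any]:
    """Hypothetical stand-in: concatenate the lists stored in numbered pickle files."""
    collected: List[Any] = []
    for i in range(num_batches):  # assumption: batches are numbered from 0
        path = os.path.join(temp_dir, file_template.format(i))
        with open(path, "rb") as fh:
            collected.extend(pickle.load(fh))
    return collected


with tempfile.TemporaryDirectory() as tmp:
    batches = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
    # Write each batch to its own file, then read them all back in order.
    for i, batch in enumerate(batches):
        with open(os.path.join(tmp, "batch_{}.pkl".format(i)), "wb") as fh:
            pickle.dump(batch, fh)
    merged = collect_batches(len(batches), tmp, "batch_{}.pkl")

print(merged)
```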

synkit.IO.data_io.load_compressed(filename: str) ndarray

Loads a NumPy array from a compressed .npz file.

Parameters:

filename (str) – The path of the .npz file to load.

Returns:

The loaded NumPy array.

Return type:

numpy.ndarray

Raises:

KeyError – If the .npz file does not contain an array with the key ‘array’.

synkit.IO.data_io.load_database(pathname: str = './Data/database.json') List[Dict]

Load a database (a list of dictionaries) from a JSON file.

Parameters:

pathname (str) – The path from where the database will be loaded. Defaults to ‘./Data/database.json’.

Returns:

The loaded database.

Return type:

list[dict]

Raises:

ValueError – If there is an error reading the file.

synkit.IO.data_io.load_dg(path: str, graph_db: list, rule_db: list)

Load a DG instance from a dumped file.

Parameters:
  • path (str) – The file path of the dumped graph.

  • graph_db (list) – List of Graph objects representing the graph database.

  • rule_db (list) – List of Rule objects required for loading the DG.

Returns:

The loaded derivation graph instance.

Return type:

DG

Raises:

Exception – If loading fails.

synkit.IO.data_io.load_dict_from_json(file_path: str) dict | None

Load a dictionary from a JSON file.

Parameters:

file_path (str) – The path to the JSON file from which to load the dictionary.

Returns:

The dictionary loaded from the JSON file, or None if an error occurs.

Return type:

dict or None
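Because this loader signals failure by returning None rather than raising, callers need an explicit check before using the result. The sketch below reproduces that contract with the standard library; load_dict_or_none is a hypothetical stand-in, and which errors synkit actually swallows is an assumption.

```python
import json
import os
import tempfile
from typing import Optional


def load_dict_or_none(file_path: str) -> Optional[dict]:
    """Hypothetical stand-in: return the parsed dict, or None on a read/parse error."""
    try:
        with open(file_path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, json.JSONDecodeError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "config.json")
    with open(good, "w", encoding="utf-8") as fh:
        json.dump({"radius": 2}, fh)
    loaded = load_dict_or_none(good)
    missing = load_dict_or_none(os.path.join(tmp, "absent.json"))

# Check for None before indexing into the result.
radius = loaded["radius"] if loaded is not None else None
print(radius, missing)
```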

synkit.IO.data_io.load_from_pickle(filename: str) List[Any]

Load data from a pickle file.

Parameters:

filename (str) – The name of the pickle file to load data from.

Returns:

The data loaded from the pickle file.

Return type:

list

synkit.IO.data_io.load_from_pickle_generator(file_path: str) Generator[Any, None, None]

A generator that yields items from a pickle file where each pickle load returns a list of dictionaries.

Parameters:

file_path (str) – The path to the pickle file to load.

Yields:

A single item from the list of dictionaries stored in the pickle file.

Return type:

Any
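A generator is useful here because items can be processed one at a time instead of holding a whole dataset in a list. The following sketch shows one way such a generator might work; iter_pickled_items is a hypothetical name, and the idea that several lists were dumped back to back into the same file is an assumption read from the description above.

```python
import os
import pickle
import tempfile
from typing import Any, Generator


def iter_pickled_items(file_path: str) -> Generator[Any, None, None]:
    """Hypothetical stand-in: yield items one at a time from pickled lists."""
    with open(file_path, "rb") as fh:
        while True:
            try:
                batch = pickle.load(fh)  # each load is expected to return a list
            except EOFError:
                break  # no more pickled objects in the file
            for item in batch:
                yield item


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reactions.pkl")
    with open(path, "wb") as fh:
        # Assumption: several lists were dumped consecutively into one file.
        pickle.dump([{"id": 1}, {"id": 2}], fh)
        pickle.dump([{"id": 3}], fh)
    ids = [item["id"] for item in iter_pickled_items(path)]

print(ids)
```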

synkit.IO.data_io.load_gml_as_text(gml_file_path: str) str | None

Load the contents of a GML file as a text string.

Parameters:

gml_file_path (str) – The file path to the GML file.

Returns:

The text content of the GML file, or None if the file does not exist or an error occurs.

Return type:

str or None

synkit.IO.data_io.load_list_from_file(file_path: str) list

Load a list from a JSON-formatted file.

Parameters:

file_path (str) – The path to the file to read the list from.

Returns:

The list loaded from the file.

Return type:

list

synkit.IO.data_io.load_model(filename: str) Any

Load a machine learning model from a file using joblib.

Parameters:

filename (str) – The path to the file from which the model will be loaded.

Returns:

The loaded machine learning model.

Return type:

object

synkit.IO.data_io.save_compressed(array: ndarray, filename: str) None

Saves a NumPy array in a compressed format using .npz extension.

Parameters:
  • array (numpy.ndarray) – The NumPy array to be saved.

  • filename (str) – The file path or name to save the array to, with a ‘.npz’ extension.

synkit.IO.data_io.save_database(database: List[Dict], pathname: str = './Data/database.json') None

Save a database (a list of dictionaries) to a JSON file.

Parameters:
  • database (list[dict]) – The database to be saved.

  • pathname (str) – The path where the database will be saved. Defaults to ‘./Data/database.json’.

Raises:
  • TypeError – If the database is not a list of dictionaries.

  • ValueError – If there is an error writing the file.
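The documented error contract (TypeError for malformed input, ValueError for file problems) can be sketched with the json module. The helpers save_db and load_db below are hypothetical stand-ins written for illustration; the exact checks and messages in synkit may differ.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_db(database: List[Dict], pathname: str) -> None:
    """Hypothetical stand-in mirroring the documented TypeError/ValueError contract."""
    if not isinstance(database, list) or not all(isinstance(r, dict) for r in database):
        raise TypeError("database must be a list of dictionaries")
    try:
        with open(pathname, "w", encoding="utf-8") as fh:
            json.dump(database, fh)
    except OSError as exc:
        raise ValueError(f"error writing {pathname}: {exc}") from exc


def load_db(pathname: str) -> List[Dict]:
    """Hypothetical stand-in: read the list back, wrapping failures in ValueError."""
    try:
        with open(pathname, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, json.JSONDecodeError) as exc:
        raise ValueError(f"error reading {pathname}: {exc}") from exc


records = [{"id": 1, "rsmi": "CCO>>C=C.O"}, {"id": 2, "rsmi": "CC=O>>CCO"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "database.json")
    save_db(records, path)
    restored = load_db(path)
    try:
        save_db({"id": 1}, path)  # a bare dict is not a list of dicts
        rejected = False
    except TypeError:
        rejected = True

print(restored == records, rejected)
```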

synkit.IO.data_io.save_dg(dg, path: str) str

Save a DG instance to disk using MØD’s dump method.

Parameters:
  • dg (DG) – The derivation graph to save.

  • path (str) – The file path where the graph will be dumped.

Returns:

The path of the dumped file.

Return type:

str

Raises:

Exception – If saving fails.

synkit.IO.data_io.save_dict_to_json(data: dict, file_path: str) None

Save a dictionary to a JSON file.

Parameters:
  • data (dict) – The dictionary to be saved.

  • file_path (str) – The path to the file where the dictionary should be saved.

synkit.IO.data_io.save_list_to_file(data_list: list, file_path: str) None

Save a list to a file in JSON format.

Parameters:
  • data_list (list) – The list to save.

  • file_path (str) – The path to the file where the list will be saved.

synkit.IO.data_io.save_model(model: Any, filename: str) None

Save a machine learning model to a file using joblib.

Parameters:
  • model (object) – The machine learning model to save.

  • filename (str) – The path to the file where the model will be saved.

synkit.IO.data_io.save_text_as_gml(gml_text: str, file_path: str) bool

Save a GML text string to a file.

Parameters:
  • gml_text (str) – The GML content as a text string.

  • file_path (str) – The file path where the GML text will be saved.

Returns:

True if saving was successful, False otherwise.

Return type:

bool

synkit.IO.data_io.save_to_pickle(data: List[Dict[str, Any]], filename: str) None

Save a list of dictionaries to a pickle file.

Parameters:
  • data (list[dict]) – A list of dictionaries to be saved.

  • filename (str) – The name of the file where the data will be saved.
