
taps.apps.moldesign.tasks

train_model()

train_model(train_data: DataFrame) -> Pipeline

Train a machine learning model (k-nearest-neighbors regression) on Morgan fingerprints.

Parameters:

  • train_data (DataFrame) –

    DataFrame with 'smiles' and 'ie' columns that contain the molecule structures and the property to predict, respectively.

Returns:

  • Pipeline

    A trained model.

Source code in taps/apps/moldesign/tasks.py
def train_model(train_data: pd.DataFrame) -> Pipeline:
    """Train a machine learning model using Morgan fingerprints.

    Args:
        train_data: DataFrame with 'smiles' and 'ie' columns that contain
            the molecule structures and the property to predict, respectively.

    Returns:
        A trained model.
    """
    # Imports for Python functions that run remotely must be defined inside
    # the function body.
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.pipeline import Pipeline

    from taps.apps.moldesign.chemfunctions import MorganFingerprintTransformer

    model = Pipeline(
        [
            ('fingerprint', MorganFingerprintTransformer()),
            (
                'knn',
                KNeighborsRegressor(
                    n_neighbors=4,
                    weights='distance',
                    metric='jaccard',
                    n_jobs=-1,
                ),
            ),
        ],
    )

    # Ray arrays are immutable, so copy the inputs before fitting.
    return model.fit(train_data['smiles'].copy(), train_data['ie'].copy())
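The pipeline above predicts a molecule's property from its 4 nearest training molecules under the Jaccard distance, weighting each neighbor by inverse distance. The following is a minimal pure-Python sketch of that prediction rule, assuming each fingerprint is represented as a set of "on" bit indices (a stand-in for the output of `MorganFingerprintTransformer`, whose actual format is not shown here). The zero-distance rule mirrors scikit-learn's `weights='distance'` behavior, where exact matches receive all of the weight.

```python
from collections.abc import Sequence


def jaccard_distance(a: frozenset[int], b: frozenset[int]) -> float:
    """Jaccard distance between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def knn_predict(
    train_fps: Sequence[frozenset[int]],
    train_y: Sequence[float],
    query_fp: frozenset[int],
    n_neighbors: int = 4,
) -> float:
    """Distance-weighted k-NN regression (sketch; fingerprints as bit sets)."""
    # Keep the n_neighbors training molecules closest to the query.
    nearest = sorted(
        (jaccard_distance(query_fp, fp), y) for fp, y in zip(train_fps, train_y)
    )[:n_neighbors]
    # Exact matches (distance 0) get all the weight, as in scikit-learn.
    exact = [y for d, y in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    # Otherwise weight each neighbor by the inverse of its distance.
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)
```

With `n_neighbors=4`, a query that shares most of its bits with one training molecule is pulled strongly toward that molecule's property value, while more distant neighbors contribute proportionally less.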

run_model()

run_model(model: Pipeline, smiles: list[str]) -> DataFrame

Run a model on a list of SMILES strings.

Parameters:

  • model (Pipeline) –

    Trained model that takes SMILES strings as inputs.

  • smiles (list[str]) –

    List of molecules to evaluate.

Returns:

  • DataFrame

    A dataframe with the molecules and their predicted outputs.

Source code in taps/apps/moldesign/tasks.py
def run_model(model: Pipeline, smiles: list[str]) -> pd.DataFrame:
    """Run a model on a list of SMILES strings.

    Args:
        model: Trained model that takes SMILES strings as inputs.
        smiles: List of molecules to evaluate.

    Returns:
        A dataframe with the molecules and their predicted outputs.
    """
    pred_y = model.predict(smiles)
    return pd.DataFrame({'smiles': smiles, 'ie': pred_y})
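`run_model` pairs each input string with its prediction positionally and returns them under the same 'smiles' and 'ie' column names that `train_model` expects. The sketch below shows that contract without pandas, assuming only that the model exposes a `predict` method returning one value per input; `run_model_records` and `LengthModel` are hypothetical names used for illustration, and the length check is an extra safeguard not present in the original.

```python
from collections.abc import Sequence
from typing import Protocol


class Predictor(Protocol):
    """Anything with a scikit-learn-style predict method."""

    def predict(self, smiles: Sequence[str]) -> Sequence[float]: ...


def run_model_records(model: Predictor, smiles: list[str]) -> list[dict]:
    """Return one {'smiles', 'ie'} row per input molecule (hypothetical helper)."""
    pred_y = list(model.predict(smiles))
    # Predictions are matched to inputs by position, so lengths must agree.
    if len(pred_y) != len(smiles):
        raise ValueError("model returned a different number of predictions")
    return [{"smiles": s, "ie": y} for s, y in zip(smiles, pred_y)]


class LengthModel:
    """Hypothetical stand-in model: 'predicts' the length of each string."""

    def predict(self, smiles: Sequence[str]) -> list[float]:
        return [float(len(s)) for s in smiles]


rows = run_model_records(LengthModel(), ["C", "CCO"])
```

Because the output rows reuse the training column names, predictions can be filtered or merged with training data without renaming anything.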

combine_inferences()

combine_inferences(*inputs: DataFrame) -> DataFrame

Concatenate a series of inferences into a single DataFrame.

Parameters:

  • inputs (DataFrame, default: () ) –

    The component DataFrames, passed as separate positional arguments.

Returns:

  • DataFrame

    A single DataFrame containing all input rows, in argument order, with a fresh 0-based index.

Source code in taps/apps/moldesign/tasks.py
def combine_inferences(*inputs: pd.DataFrame) -> pd.DataFrame:
    """Concatenate a series of inferences into a single DataFrame.

    Args:
        inputs: The component DataFrames, passed as separate positional
            arguments.

    Returns:
        A single DataFrame containing all input rows, in argument order,
            with a fresh 0-based index.
    """
    return pd.concat(inputs, ignore_index=True)
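Because `combine_inferences` takes a variable number of DataFrames, a plausible use (an assumption; the source shows only the concatenation) is a fan-out/fan-in pattern: split the candidate molecules into chunks, run inference on each chunk in parallel, then merge the results. The sketch below models DataFrames as lists of row dicts; `chunk`, `predict_chunk`, and `combine` are hypothetical helpers, and `combine` imitates `pd.concat(..., ignore_index=True)` by keeping rows in argument order and renumbering them from 0.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items: list[str], size: int) -> list[list[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def predict_chunk(smiles: list[str]) -> list[dict]:
    """Hypothetical stand-in for run_model: 'ie' is just the string length."""
    return [{"smiles": s, "ie": float(len(s))} for s in smiles]


def combine(*inputs: list[dict]) -> list[dict]:
    """Concatenate row lists in order and assign a fresh 0-based index."""
    rows = (row for part in inputs for row in part)
    return [dict(row, index=i) for i, row in enumerate(rows)]


smiles = ["C", "CC", "CCO", "c1ccccc1", "O"]
with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map returns results in input order, so chunk order is preserved.
    parts = list(pool.map(predict_chunk, chunk(smiles, 2)))
combined = combine(*parts)
```

Ignoring the per-chunk indices matters here: each chunk would otherwise start its own index at 0, producing duplicate labels in the merged result.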