taps.apps.docking.train
Protein docking model training.
Module adapted from ParslDock.
MorganFingerprintTransformer
Bases: BaseEstimator, TransformerMixin
Class that converts SMILES strings to fingerprint vectors.
Source code in taps/apps/docking/train.py
                  
fit()
fit(
    X: list[str], y: NDArray[bool] | None = None
) -> MorganFingerprintTransformer
Train model.
Parameters:
- X (list[str]) – List of SMILES strings.
- y (NDArray[bool] | None, default: None) – Array of true fingerprints.
Returns:
- MorganFingerprintTransformer – The trained model.
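Because `fit()` returns the transformer itself, calls can be chained in the usual scikit-learn style. The sketch below illustrates only that call shape. `ToyFingerprintTransformer` is a hypothetical stand-in written in plain Python, it assumes that fitting learns nothing from the data, and its bigram hashing is a placeholder rather than a Morgan fingerprint.

```python
from __future__ import annotations

import hashlib


class ToyFingerprintTransformer:
    """Hypothetical stand-in that mimics the fit/transform call shape."""

    def __init__(self, length: int = 16) -> None:
        self.length = length

    def fit(self, X: list[str], y=None) -> ToyFingerprintTransformer:
        # Assumption: the featurizer is stateless, so fit only returns self.
        return self

    def transform(self, X: list[str], y=None) -> list[list[bool]]:
        # One fixed-length bit vector per input string.
        return [self._toy_bits(smiles) for smiles in X]

    def _toy_bits(self, smiles: str) -> list[bool]:
        # NOT a Morgan fingerprint: character bigrams are hashed into bits.
        bits = [False] * self.length
        for i in range(len(smiles) - 1):
            digest = hashlib.sha256(smiles[i:i + 2].encode()).digest()
            bits[int.from_bytes(digest[:4], "big") % self.length] = True
        return bits


smiles = ["CCO", "c1ccccc1"]
transformer = ToyFingerprintTransformer(length=16)
fitted = transformer.fit(smiles)       # same object as `transformer`
features = fitted.transform(smiles)    # two bit vectors of length 16
```

Returning `self` from `fit()` is what lets a pipeline call `fit` and `transform` back to back on the same object.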
 
transform()
Compute the fingerprints.
Parameters:
- X (list[str]) – List of SMILES strings.
- y (NDArray[bool] | None, default: None) – Array of true fingerprints.
 
Returns:
compute_morgan_fingerprints()
compute_morgan_fingerprints(
    smiles: str,
    fingerprint_length: int,
    fingerprint_radius: int,
) -> NDArray[bool]
Get Morgan Fingerprint of a specific SMILES string.
Adapted from: https://github.com/google-research/google-research/blob/dfac417/mol_dqn/chemgraph/dqn/deep_q_networks.py#L750
Parameters:
- smiles (str) – The molecule as a SMILES string.
- fingerprint_length (int) – Bit-length of fingerprint.
- fingerprint_radius (int) – Radius used to compute fingerprint.
Returns:
- NDArray[bool] – Array containing the Morgan fingerprint with shape [hparams, fingerprint_length].
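To make the two numeric parameters concrete: `fingerprint_length` fixes the size of the bit vector, and `fingerprint_radius` controls how many bonds away from each atom the encoded neighborhood extends. The sketch below is a deliberately simplified illustration of the circular-fingerprint idea in plain Python, not the algorithm this function runs. The helper `toy_circular_fingerprint` is hypothetical, takes a hand-built atom and bond list instead of parsing SMILES, and ignores bond orders, charges, and the other invariants a real cheminformatics library would use.

```python
import hashlib


def _stable_hash(text: str) -> int:
    """Deterministic integer hash (Python's built-in hash() is salted)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def toy_circular_fingerprint(atoms, bonds, length, radius):
    """Fold hashed atom neighborhoods into a bit vector of size `length`."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Radius 0: each atom is identified by its element symbol alone.
    identifiers = [_stable_hash(symbol) for symbol in atoms]
    bits = [False] * length
    for ident in identifiers:
        bits[ident % length] = True

    # Each extra round merges an atom's identifier with its neighbors',
    # so the encoded environment grows by one bond per round.
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            neighbor_ids = sorted(identifiers[j] for j in neighbors[i])
            updated.append(_stable_hash(f"{identifiers[i]}|{neighbor_ids}"))
        identifiers = updated
        for ident in identifiers:
            bits[ident % length] = True
    return bits


# Ethanol (SMILES "CCO") written out by hand as a graph.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

fp_r0 = toy_circular_fingerprint(ethanol_atoms, ethanol_bonds, length=64, radius=0)
fp_r2 = toy_circular_fingerprint(ethanol_atoms, ethanol_bonds, length=64, radius=2)
```

In this toy version a larger radius can only switch on additional bits, because every round adds identifiers for wider neighborhoods on top of the earlier ones.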
train_model()
Train a machine learning model using Morgan Fingerprints.
Parameters:
- training_data (DataFrame) – DataFrame with 'smiles' and 'score' columns that contain the molecule structure and docking score, respectively.
Returns:
- Pipeline – A trained model.
 
run_model()
Run a model on a list of SMILES strings.
Parameters:
- model (Pipeline) – Trained model that takes SMILES strings as inputs.
- smiles (list[str]) – List of molecules to evaluate.
Returns:
- DataFrame – A dataframe with the molecules and their predicted outputs.
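`train_model()` and `run_model()` form a train-then-predict pair: the first consumes a table with 'smiles' and 'score' columns, the second returns a table of molecules and predictions. The sketch below mimics that round trip in plain Python so it runs without pandas or scikit-learn. Everything in it is an assumption made for illustration: column dictionaries stand in for the DataFrames, `ToyNearestNeighborModel`, `train_toy_model`, and `run_toy_model` are hypothetical names, character bigrams replace Morgan fingerprints, and the scores are made-up values.

```python
from __future__ import annotations


def bigrams(smiles: str) -> set[str]:
    """Crude stand-in for a fingerprint: the set of character bigrams."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two feature sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class ToyNearestNeighborModel:
    """Hypothetical model: predicts the score of the most similar molecule."""

    def fit(self, smiles: list[str], scores: list[float]) -> ToyNearestNeighborModel:
        self._features = [bigrams(s) for s in smiles]
        self._scores = list(scores)
        return self

    def predict(self, smiles: list[str]) -> list[float]:
        predictions = []
        for s in smiles:
            query = bigrams(s)
            best = max(
                range(len(self._scores)),
                key=lambda i: jaccard(query, self._features[i]),
            )
            predictions.append(self._scores[best])
        return predictions


def train_toy_model(training_data: dict[str, list]) -> ToyNearestNeighborModel:
    # Mirrors the documented input: a 'smiles' column and a 'score' column.
    return ToyNearestNeighborModel().fit(training_data["smiles"], training_data["score"])


def run_toy_model(model: ToyNearestNeighborModel, smiles: list[str]) -> dict[str, list]:
    # Mirrors the documented output: molecules next to their predictions.
    return {"smiles": list(smiles), "score": model.predict(smiles)}


# Made-up docking scores, for illustration only.
training_data = {"smiles": ["CCO", "CCN", "c1ccccc1"], "score": [-4.1, -3.8, -5.6]}
model = train_toy_model(training_data)
results = run_toy_model(model, ["CCO", "c1ccccc1O"])
```

Because the model accepts raw SMILES strings and does its own featurization, callers of the prediction step never handle fingerprints directly, which matches the `model` parameter description above.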