3. Main API
-
class PReLIM.CpGBin(matrix, binStartInc=None, binEndInc=None, cpgPositions=None, sequence='', encoding=None, missingToken=-1, chromosome=None, binSize=100, species='MM10', verbose=True, tag1=None, tag2=None)
A class that contains information about a CpG bin. It does not need to be used directly; PReLIM uses this class internally.
-
class PReLIM.PReLIM(cpgDensity=2)
Class for a PReLIM model.
Example usage:
from PReLIM import PReLIM
import numpy as np
# Collect methylation matrices, 1 is methylated, 0 is unmethylated, -1 is unknown
# Each column is a cpg site, each row is a read
bin1 = np.array([[1,0],[0,-1],[-1,1],[0,0]],dtype=float)
bin2 = np.array([[1,0],[1,0],[-1,1],[0,0],[0,1],[1,1],[0,0]],dtype=float)
bin3 = np.array([[-1,1],[0,-1],[-1,1],[0,0]],dtype=float)
# etc.
bin1000 = np.array([[1,-1],[0,1],[-1,1],[1,0]],dtype=float)
bin1001 = np.array([[1,1],[0,0],[0,1],[1,1]],dtype=float)
bin1002 = np.array([[1,1],[1,1],[0,1],[1,0]],dtype=float)
bin1003 = np.array([[0,0],[1,0],[0,1],[1,1]],dtype=float)
# Collection of bins
bins = [bin1, bin2, bin3, bin1000, bin1001, bin1002, bin1003]  # ... plus the bins in between
model = PReLIM(cpgDensity=2)
# Options for training/saving model
model.train(bins, model_file="no")  # don't want a model file, must use "no"
# Use model for imputation
imputed_bin1 = model.impute(bin1)
# You can also use batch imputation to impute on many bins at once
imputed_bins = model.impute_many(bins)
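Before training, it can help to check how much of each bin is actually unknown. The sketch below is not part of PReLIM: `missing_fraction` is a hypothetical helper that relies only on the encoding used in the example above (1 methylated, 0 unmethylated, -1 unknown).

```python
import numpy as np

def missing_fraction(matrix, missing_token=-1):
    """Return the fraction of entries in a CpG matrix equal to the missing token.

    Hypothetical helper for illustration; not a PReLIM function.
    """
    matrix = np.asarray(matrix, dtype=float)
    return float(np.mean(matrix == missing_token))

# Same matrix as bin1 in the usage example
bin1 = np.array([[1, 0], [0, -1], [-1, 1], [0, 0]], dtype=float)
print(missing_fraction(bin1))  # 2 of 8 entries are unknown -> 0.25
```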
-
fit(X_train, y_train, n_estimators=[10, 50, 100, 500, 1000], cores=-1, max_depths=[1, 5, 10, 20, 30], model_file=None, verbose=False)
Train a random forest model using grid search on a feature matrix (X) and class labels (y).
Usage: model.fit(X_train, y_train)
Parameters: - X_train – numpy array, Contains feature vectors.
- y_train – numpy array, Contains labels for training data.
- n_estimators – list, the number of estimators to try during a grid search.
- max_depths – list, the maximum depths of trees to try during a grid search.
- cores – integer, the number of cores to use during training, helpful for grid search.
- model_file – string, the name of the file to save the model to. If None, a file name that includes a timestamp is created. To skip saving a file, set this to "no".
Returns: The trained sklearn model
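To get a feel for the cost of the default grid search, the sketch below enumerates the candidate settings implied by the default `n_estimators` and `max_depths` lists. It only illustrates the size of the grid; how many times each candidate is fit (for example, the number of cross-validation folds) is not stated on this page and is not assumed here.

```python
from itertools import product

# Default grids copied from the fit signature above
n_estimators = [10, 50, 100, 500, 1000]
max_depths = [1, 5, 10, 20, 30]

# Every (n_estimators, max_depth) pair is one candidate model
grid = list(product(n_estimators, max_depths))

print(len(grid))          # 25 candidates
print(grid[0], grid[-1])  # (10, 1) (1000, 30)
```

Passing shorter lists to `fit` shrinks this grid proportionally, which is the simplest way to reduce training time.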
-
get_X_y(bin_matrices, verbose=False)
Parameters: - bin_matrices – list of CpG matrices
- verbose – prints more info if true
Returns: feature matrix (X) and class labels (y)
-
impute(matrix)
Impute the missing values in a CpG matrix. Values are filled with the predicted probability of methylation.
Parameters: matrix – a 2d numpy array, dtype=float, representing a CpG matrix, 1=methylated, 0=unmethylated, -1=unknown
Returns: A 2d numpy array with predicted probabilities of methylation
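The fill-in behaviour described here can be sketched without PReLIM itself. In the example below, `fill_missing` and `column_mean_predictor` are hypothetical names, and the column-mean predictor is a deliberately naive stand-in for the trained model, not PReLIM's method. The sketch also assumes that observed entries are returned unchanged and only the -1 entries are replaced.

```python
import numpy as np

def fill_missing(matrix, predict_fn, missing_token=-1):
    """Return a copy of `matrix` with missing entries replaced by probabilities.

    `predict_fn(matrix, mask)` must return one value per missing entry,
    in row-major order. Hypothetical helper, not a PReLIM function.
    """
    matrix = np.asarray(matrix, dtype=float)
    mask = matrix == missing_token
    filled = matrix.copy()
    filled[mask] = predict_fn(matrix, mask)  # boolean assignment is row-major
    return filled

def column_mean_predictor(matrix, mask):
    """Naive stand-in model: mean of the observed values in each column.

    A column with no observed values would yield NaN.
    """
    observed = np.where(mask, np.nan, matrix)
    col_means = np.nanmean(observed, axis=0)
    cols = np.nonzero(mask)[1]  # column index of each missing entry, row-major
    return col_means[cols]

bin1 = np.array([[1, 0], [0, -1], [-1, 1], [0, 0]], dtype=float)
filled = fill_missing(bin1, column_mean_predictor)
print(filled)  # both unknown entries become 1/3; observed entries are unchanged
```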
-
impute_many(matrices)
Impute many matrices in a single call to reduce imputation time.
Parameters: matrices – list of CpG matrices, where each matrix is a 2d numpy array, dtype=float, representing a CpG matrix, 1=methylated, 0=unmethylated, -1=unknown
Returns: A list of 2d numpy arrays with predicted probabilities of methylation for unknown values.
-
loadWeights(model_file)
Load the saved model in model_file into self.model.
Parameters: model_file – string, name of file with a saved model
-
predict(X)
Predict the probability of methylation for each sample in the given feature matrix.
Usage: y_pred = model.predict(X)
Parameters: - X – numpy array, contains feature vectors
- verbose – prints more info if true
Returns: 1-d numpy array of prediction values
-
predict_classes(X)
Predict the classes of the samples in the given feature matrix.
Usage: y_pred = model.predict_classes(X)
Parameters: - X – numpy array, contains feature vectors
- verbose – prints more info if true
Returns: 1-d numpy array of predicted classes
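The relationship between predicted probabilities and predicted classes can be illustrated with a simple threshold. This is a sketch under an assumption: the 0.5 cutoff and the `to_classes` helper are hypothetical, and this page does not state which rule `predict_classes` actually applies.

```python
import numpy as np

def to_classes(probabilities, threshold=0.5):
    """Map methylation probabilities to 0/1 calls.

    The threshold is an assumed value, not taken from PReLIM.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    return (probabilities >= threshold).astype(int)

y_prob = np.array([0.1, 0.5, 0.9])
print(to_classes(y_prob))  # [0 1 1]
```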
-
predict_proba(X)
Predict the probability of methylation for each sample in the given feature matrix. Same as predict; provided as a convenience for different calling styles.
Usage: y_pred = model.predict_proba(X)
Parameters: - X – numpy array, contains feature vectors
- verbose – prints more info if true
Returns: 1-d numpy array of prediction values
-
train(bin_matrices, model_file='no', verbose=False)
Train a PReLIM model using CpG matrices.
Parameters: - bin_matrices – list of CpG matrices
- model_file – the name of the file to save the model to. If None, a file name that includes a timestamp is created. To skip saving a file, set this to "no".
- verbose – prints more info if true
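The three cases for `model_file` can be summarised in a few lines. The function below is a hypothetical illustration of the documented behaviour, not PReLIM's code; in particular, the timestamped file name format is an assumption, since this page does not specify it.

```python
from datetime import datetime

def resolve_model_file(model_file):
    """Return the path a model would be saved to, or None if saving is skipped.

    Hypothetical helper mirroring the documented model_file semantics.
    """
    if model_file == "no":
        return None  # the caller asked not to save a file
    if model_file is None:
        # Assumed naming scheme; the real format is not documented here
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        return f"PReLIM_model_{stamp}"
    return model_file  # explicit name chosen by the caller

print(resolve_model_file("no"))        # None
print(resolve_model_file("my_model"))  # my_model
```

Note that `train` defaults to `model_file='no'` while `fit` defaults to `model_file=None`, so only `fit` writes a file when the argument is omitted.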
-