3. Main API

class PReLIM.CpGBin(matrix, binStartInc=None, binEndInc=None, cpgPositions=None, sequence='', encoding=None, missingToken=-1, chromosome=None, binSize=100, species='MM10', verbose=True, tag1=None, tag2=None)[source]

A class that contains information about a CpG bin. It does not need to be used directly; PReLIM uses this class internally.

class PReLIM.PReLIM(cpgDensity=2)[source]

Class for a PReLIM model.

Example usage:

from PReLIM import PReLIM

import numpy as np

# Collect methylation matrices, 1 is methylated, 0 is unmethylated, -1 is unknown

# Each column is a cpg site, each row is a read

bin1 = np.array([[1,0],[0,-1],[-1,1],[0,0]],dtype=float)

bin2 = np.array([[1,0],[1,0],[-1,1],[0,0],[0,1],[1,1],[0,0]],dtype=float)

bin3 = np.array([[-1,1],[0,-1],[-1,1],[0,0]],dtype=float)

# ... more bins ...

bin1000 = np.array([[1,-1],[0,1],[-1,1],[1,0]],dtype=float)

bin1001 = np.array([[1,1],[0,0],[0,1],[1,1]],dtype=float)

bin1002 = np.array([[1,1],[1,1],[0,1],[1,0]],dtype=float)

bin1003 = np.array([[0,0],[1,0],[0,1],[1,1]],dtype=float)

# Collection of bins

bins = [bin1, bin2, bin3, bin1000, bin1001, bin1002, bin1003]  # plus the bins elided above

model = PReLIM(cpgDensity=2)

# Options for training/saving model

model.train(bins, model_file="no")  # pass "no" to skip saving a model file

# Use model for imputation

imputed_bin1 = model.impute(bin1)

# You can also use batch imputation to impute on many bins at once

imputed_bins = model.impute_many(bins)

fit(X_train, y_train, n_estimators=[10, 50, 100, 500, 1000], cores=-1, max_depths=[1, 5, 10, 20, 30], model_file=None, verbose=False)[source]

Train a random forest model using grid search on a feature matrix (X) and class labels (y).

Usage: model.fit(X_train, y_train)

Parameters:
  • X_train – numpy array, contains feature vectors.
  • y_train – numpy array, contains labels for training data.
  • n_estimators – list, the numbers of estimators to try during a grid search.
  • max_depths – list, the maximum depths of trees to try during a grid search.
  • cores – integer, the number of cores to use during training, helpful for grid search.
  • model_file – string, the name of the file to save the model to. If None (the default), a file name that includes a timestamp is generated. If you don't want to save a file, set this to "no".
Returns:

The trained sklearn model
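
To get a feel for the cost of the default grid, the sketch below enumerates the candidate settings. It assumes the search is exhaustive over both lists; the text above does not say how candidates are scored or how many cross-validation folds are used, so those are left out. The variable names are illustrative only.

```python
from itertools import product

# Default grids copied from the fit() signature above.
n_estimators = [10, 50, 100, 500, 1000]
max_depths = [1, 5, 10, 20, 30]

# Assumption: the grid search tries every (n_estimators, max_depth) pair.
candidates = list(product(n_estimators, max_depths))

# Trees built if each candidate forest is fitted exactly once
# (any cross-validation would multiply this figure).
total_trees = sum(n for n, _ in candidates)

print(len(candidates))  # 25
print(total_trees)      # 8300
```

Passing shorter lists for n_estimators and max_depths shrinks the grid accordingly, which can help when training time matters more than squeezing out the best setting.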

get_X_y(bin_matrices, verbose=False)[source]
Parameters:
  • bin_matrices – list of CpG matrices
  • verbose – prints more info if true
Returns:

feature matrix (X) and class labels (y)

impute(matrix)[source]

Impute the missing values in a CpG matrix. Values are filled with the predicted probability of methylation.

Parameters:
  • matrix – a 2D numpy array, dtype=float, representing a CpG matrix: 1=methylated, 0=unmethylated, -1=unknown
Returns:

A 2D numpy array with predicted probabilities of methylation

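
Because impute fills unknown entries with probabilities rather than 0/1 calls, downstream code often needs to turn them back into binary values. The following is a minimal sketch of one way to do that. It assumes the imputed matrix has the same shape as the input; call_methylation, the 0.5 threshold, and the hand-written imputed_bin1 values are all hypothetical and are not part of PReLIM.

```python
import numpy as np

def call_methylation(original, imputed, threshold=0.5, missing_token=-1):
    """Return a copy of `original` with unknown entries replaced by 0/1 calls."""
    original = np.asarray(original, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    if original.shape != imputed.shape:
        raise ValueError("original and imputed must have the same shape")
    unknown = original == missing_token
    called = original.copy()
    # Only positions that were unknown in the input are overwritten.
    called[unknown] = (imputed[unknown] >= threshold).astype(float)
    return called

bin1 = np.array([[1, 0], [0, -1], [-1, 1], [0, 0]], dtype=float)
# Stand-in for model.impute(bin1); these probabilities are made up.
imputed_bin1 = np.array([[1, 0], [0, 0.2], [0.9, 1], [0, 0]], dtype=float)

called_bin1 = call_methylation(bin1, imputed_bin1)
print(called_bin1)
```

Observed entries are taken from the original matrix, so the result does not depend on what the imputed matrix holds at positions that were already known.
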
impute_many(matrices)[source]

Impute a list of matrices in one batch to reduce total imputation time.

Parameters:
  • matrices – list of CpG matrices, where each matrix is a 2D numpy array, dtype=float, representing a CpG matrix: 1=methylated, 0=unmethylated, -1=unknown
Returns:

A list of 2D numpy arrays with predicted probabilities of methylation for unknown values.

loadWeights(model_file)[source]

Load a saved model from the given file into self.model.

Parameters:
  • model_file – string, name of file with a saved model

predict(X)[source]

Predict the probability of methylation for each sample in the given feature matrix

Usage: y_pred = model.predict(X)

Parameters:
  • X – numpy array, contains feature vectors
Returns:

1-d numpy array of prediction values

predict_classes(X)[source]

Predict the classes of the samples in the given feature matrix

Usage: y_pred = model.predict_classes(X)

Parameters:
  • X – numpy array, contains feature vectors
Returns:

1-d numpy array of predicted classes

predict_proba(X)[source]

Predict the probability of methylation for each sample in the given feature matrix. Same as predict; provided as a convenience for callers used to a different naming style.

Usage: y_pred = model.predict_proba(X)

Parameters:
  • X – numpy array, contains feature vectors
Returns:

1-d numpy array of prediction values

train(bin_matrices, model_file='no', verbose=False)[source]

Train a PReLIM model using CpG matrices.

Parameters:
  • bin_matrices – list of CpG matrices
  • model_file – string, the name of the file to save the model to. If None, a file name that includes a timestamp is generated. If you don't want to save a file, set this to "no" (the default).
  • verbose – prints more info if true
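
Before training on a large collection of bins, it can help to check that every matrix has the layout described above. The helper below is a hypothetical sketch and not part of PReLIM. It assumes that the number of columns in each bin equals the model's cpgDensity, as in the usage example where every bin has two columns and cpgDensity=2.

```python
import numpy as np

def check_bins(bins, cpg_density=2, missing_token=-1):
    """Raise ValueError on the first bin that does not look like a CpG matrix."""
    allowed = np.array([0.0, 1.0, float(missing_token)])
    for i, b in enumerate(bins):
        b = np.asarray(b, dtype=float)
        if b.ndim != 2:
            raise ValueError(f"bin {i}: expected a 2-D matrix, got {b.ndim}-D")
        # Assumption: one column per CpG site, cpg_density sites per bin.
        if b.shape[1] != cpg_density:
            raise ValueError(
                f"bin {i}: expected {cpg_density} columns, got {b.shape[1]}"
            )
        if not np.isin(b, allowed).all():
            raise ValueError(f"bin {i}: values must be 0, 1 or {missing_token}")
    return len(bins)

bins = [
    np.array([[1, 0], [0, -1], [-1, 1], [0, 0]], dtype=float),
    np.array([[1, 1], [0, 0], [0, 1], [1, 1]], dtype=float),
]
n_checked = check_bins(bins, cpg_density=2)
print(n_checked)  # 2
```

Running a check like this first turns a malformed bin into an explicit error that names the offending index, instead of a failure somewhere inside training.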