-
w : int
default : 10000
Length of sliding window
-
dw : int
default : 100
Step size of sliding window
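
As a rough illustration of how `w` and `dw` interact, the sketch below cuts a sequence into overlapping windows. The helper name `sliding_windows` is hypothetical, and dropping a trailing partial window is an assumption, not necessarily what the tool does.

```python
def sliding_windows(sequence, w=10000, dw=100):
    """Yield (start, window) pairs of length w, advancing dw positions each step.

    Assumption: a trailing stretch shorter than w is simply dropped.
    """
    for start in range(0, len(sequence) - w + 1, dw):
        yield start, sequence[start:start + w]


# Toy values keep the result readable; the defaults are w=10000, dw=100.
seq = "ATGCGATACGCTTAGC"
windows = list(sliding_windows(seq, w=8, dw=4))
for start, window in windows:
    print(start, window)
```

A smaller `dw` gives more windows and finer positional resolution at the cost of more computation.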
-
karlin_mode : str {'normalized', 'original', 'raw'}
default : 'normalized'
Mode used to compute the Karlin dinucleotide (2-mer) features
-
If normalized
$$\text{2-mers} = \frac{\mathrm{freq}(XY)}{\sqrt{\mathrm{freq}(X) \times \mathrm{freq}(Y)}}$$
-
If original
$$\text{2-mers} = \frac{\mathrm{freq}(XY)}{\mathrm{freq}(X) \times \mathrm{freq}(Y)}$$
-
If raw
$$\text{2-mers} = \mathrm{freq}(XY)$$
Here,
$$X \in \{A,T,C,G\}$$
$$Y \in \{A,T,C,G\}$$
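
The three modes can be sketched as follows. This is a minimal illustration rather than the tool's implementation: the function name `dinucleotide_features` is hypothetical, dinucleotides are assumed to be counted with overlap, and `raw` is assumed to return the plain dinucleotide frequency freq(XY).

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ATCG"


def dinucleotide_features(seq, karlin_mode="normalized"):
    """Return a dict mapping each of the 16 dinucleotides to its feature value."""
    if karlin_mode not in ("normalized", "original", "raw"):
        raise ValueError(f"unknown karlin_mode: {karlin_mode!r}")
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")

    n_mono = len(seq)
    n_di = len(seq) - 1  # number of overlapping dinucleotides (assumption)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n_di))

    features = {}
    for x, y in product(BASES, repeat=2):
        f_xy = di[x + y] / n_di
        if karlin_mode == "raw":
            features[x + y] = f_xy
            continue
        denom = (mono[x] / n_mono) * (mono[y] / n_mono)
        if karlin_mode == "normalized":
            denom = sqrt(denom)
        # Guard against bases that never occur in the window.
        features[x + y] = f_xy / denom if denom > 0 else 0.0
    return features


feats = dinucleotide_features("ATATGCGC", karlin_mode="normalized")
print(len(feats), round(feats["AT"], 3))
```

Each window yields 16 values, which matches the upper bound of 16 on `pca_dn`.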
-
pca_dn : int [1,16]
default : 2
Number of PCA components for dinucleotide features
-
pca_amino_acid : int [1,20]
default : 2
Number of PCA components for amino acid features
-
pca_kmer4 : int [1,256]
default : 2
Number of PCA components for 4-mer features
-
entropy_features : bool
default : True
Whether to use entropy features
-
contamination_model1 : float [0.0,0.99]
default : 0.15
The expected proportion of outliers in the first anomaly detection scan.
Passed to sklearn.covariance.EllipticEnvelope as its contamination parameter.
-
support_fraction_model1 : float [0.0,0.99]
default : 0.75
The proportion of points to be included in the support of the raw MCD estimate in the first
anomaly detection scan.
Passed to sklearn.covariance.EllipticEnvelope as its support_fraction parameter.
-
contamination_model2 : float [0.0,0.99]
default : 0.05
The expected proportion of outliers in the second anomaly detection scan.
Passed to sklearn.covariance.EllipticEnvelope as its contamination parameter.
-
support_fraction_model2 : float [0.0,0.99]
default : 0.9
The proportion of points to be included in the support of the raw MCD estimate in the second
anomaly detection scan.
Passed to sklearn.covariance.EllipticEnvelope as its support_fraction parameter.
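
To make the role of the contamination and support-fraction settings concrete, the sketch below fits one `EllipticEnvelope` with the first-scan defaults on synthetic data. The feature matrix `X` is a made-up stand-in for per-window features, and how the tool chains its two scans is not shown here.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-window feature vectors: 200 windows, 2 features.
X = rng.normal(size=(200, 2))
X[:10] += 6.0  # shift a few rows so that they look anomalous

# First-scan defaults from the parameter list above.
model1 = EllipticEnvelope(
    contamination=0.15,     # expected proportion of outliers
    support_fraction=0.75,  # share of points used for the raw MCD estimate
    random_state=0,
)
labels = model1.fit(X).predict(X)  # 1 = inlier, -1 = outlier

n_outliers = int((labels == -1).sum())
print(n_outliers, "of", len(labels), "windows flagged")
```

Raising `contamination` flags more windows; raising `support_fraction` bases the covariance estimate on a larger share of the data. Recent scikit-learn releases document the accepted `contamination` range as (0, 0.5], so values above 0.5 may be rejected depending on the installed version.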
-
median_filter_window_len : int
default : 400
Window size used in median filtering.
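
A minimal sketch of median filtering over a sequence of per-window flags is shown below. It assumes the filter is applied to 0/1 anomaly flags and that the window is centred and truncated at the edges; the helper `median_filter` is hypothetical, not the tool's own function.

```python
from statistics import median


def median_filter(values, window_len):
    """Replace each value with the median of a centred neighbourhood.

    Assumption: the neighbourhood covers window_len // 2 samples on each
    side and is truncated at the ends of the sequence.
    """
    half = window_len // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Toy example: an isolated 1 (index 2) and an isolated 0 (index 8).
flags = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
smoothed = median_filter(flags, window_len=3)
print(smoothed)
```

The isolated flag at index 2 is removed and the one-sample gap at index 8 is filled, which is the usual motivation for smoothing noisy per-window predictions. A larger window removes longer spurious runs but also blurs island boundaries.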
-
min_island_len : int
default : 10000
Minimum length of predicted genomic islands. Shorter islands are discarded.
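
The length filter can be pictured as below. This is a sketch under assumptions: candidate islands are represented as half-open `(start, end)` coordinate pairs, and `filter_islands` is a hypothetical helper name.

```python
def filter_islands(islands, min_island_len=10000):
    """Keep only candidate islands at least min_island_len long.

    Assumption: each island is a half-open (start, end) interval,
    so its length is end - start.
    """
    return [(start, end) for start, end in islands
            if end - start >= min_island_len]


candidates = [(0, 12000), (20000, 25000), (40000, 50000)]
kept = filter_islands(candidates, min_island_len=10000)
print(kept)
```

Here the 5000-long candidate is dropped, while the candidate whose length equals the threshold is kept.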