SSG-LUGIA: Single Sequence based Genome Level Unsupervised Genomic Island Prediction Algorithm

Model Parameters

SSG-LUGIA combines several sequence-based features to infer genomic islands (GIs) using an unsupervised anomaly detection pipeline.

Default Models

We present three variants of SSG-LUGIA: SSG-LUGIAF, SSG-LUGIAP and SSG-LUGIAR, which try to optimize the F1-score, precision and recall, respectively. The parameter values used in these models are as follows:
Parameter                  SSG-LUGIAF     SSG-LUGIAP     SSG-LUGIAR
w                          10000          10000          10000
dw                         100            100            100
karlin_mode                'normalized'   'normalized'   'normalized'
pca_dn                     2              2              2
pca_amino_acid             2              2              2
pca_kmer4                  2              2              2
contamination_model1       0.15           0.075          0.2
support_fraction_model1    0.75           0.75           0.75
contamination_model2       0.05           0.075          0.25
support_fraction_model2    0.9            0.9            0.9
median_filter_window_len   400            400            400
min_island_len             10000          10000          10000
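
For reference, the table above can be transcribed into Python dictionaries of the same form as the custom models used in this document. This is only a sketch: the names SHARED, CONTAMINATION and DEFAULT_MODELS are hypothetical and are not part of the SSG-LUGIA package; the values are copied from the table.

```python
# Parameters shared by all three default variants (values copied from the table).
SHARED = {
    "w": 10000,
    "dw": 100,
    "karlin_mode": "normalized",
    "pca_dn": 2,
    "pca_amino_acid": 2,
    "pca_kmer4": 2,
    "support_fraction_model1": 0.75,
    "support_fraction_model2": 0.9,
    "median_filter_window_len": 400,
    "min_island_len": 10000,
}

# (contamination_model1, contamination_model2) for each variant.
CONTAMINATION = {
    "SSG-LUGIAF": (0.15, 0.05),
    "SSG-LUGIAP": (0.075, 0.075),
    "SSG-LUGIAR": (0.2, 0.25),
}

# DEFAULT_MODELS is a hypothetical name used for illustration only.
DEFAULT_MODELS = {
    name: {**SHARED, "contamination_model1": c1, "contamination_model2": c2}
    for name, (c1, c2) in CONTAMINATION.items()
}

# List the parameters whose values differ between the variants.
differing = sorted(
    key
    for key in DEFAULT_MODELS["SSG-LUGIAF"]
    if len({model[key] for model in DEFAULT_MODELS.values()}) > 1
)
print(differing)  # ['contamination_model1', 'contamination_model2']
```

Written this way, it is easy to see that the three variants differ only in their two contamination settings.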

Creating Custom Models

Users can create custom models using Python dictionaries containing the parameters.
custom_model = {}
custom_model['w'] = 15000
custom_model['dw'] = 250
custom_model['karlin_mode'] = "normalized"
custom_model['pca_dn'] = 2
custom_model['pca_amino_acid'] = 2
custom_model['pca_kmer4'] = 2
custom_model['entropy_features'] = True
custom_model['contamination_model1'] = 0.25
custom_model['support_fraction_model1'] = 0.75
custom_model['contamination_model2'] = 0.125
custom_model['support_fraction_model2'] = 0.9
custom_model['median_filter_window_len'] = 1000
custom_model['min_island_len'] = 4000
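
Because a custom model is a plain dictionary, a misspelled key is easy to miss. The following is a minimal sketch of a sanity check that compares a dictionary against the parameter names in the default-model table. The helpers compare_with_table and fraction_problems are hypothetical and not part of SSG-LUGIA, and the (0, 1) range check assumes that the contamination and support-fraction values are proportions, which this document does not state explicitly.

```python
# Parameter names listed in the default-model table.
TABLE_KEYS = {
    "w", "dw", "karlin_mode", "pca_dn", "pca_amino_acid", "pca_kmer4",
    "contamination_model1", "support_fraction_model1",
    "contamination_model2", "support_fraction_model2",
    "median_filter_window_len", "min_island_len",
}


def compare_with_table(model):
    """Return (missing, extra) parameter names relative to the table."""
    keys = set(model)
    return sorted(TABLE_KEYS - keys), sorted(keys - TABLE_KEYS)


def fraction_problems(model):
    """Return contamination_* / support_fraction_* names with values outside (0, 1).

    Assumption: these parameters are proportions; their valid ranges are
    not documented here.
    """
    return sorted(
        name
        for name, value in model.items()
        if name.startswith(("contamination_", "support_fraction_"))
        and not 0 < value < 1
    )


# Same values as the custom model defined above.
custom_model = {
    "w": 15000, "dw": 250, "karlin_mode": "normalized",
    "pca_dn": 2, "pca_amino_acid": 2, "pca_kmer4": 2,
    "entropy_features": True,
    "contamination_model1": 0.25, "support_fraction_model1": 0.75,
    "contamination_model2": 0.125, "support_fraction_model2": 0.9,
    "median_filter_window_len": 1000, "min_island_len": 4000,
}

missing, extra = compare_with_table(custom_model)
print("missing:", missing)
print("extra:", extra)
print("out of range:", fraction_problems(custom_model))
```

An "extra" key is not necessarily an error: entropy_features appears in the custom model above but not in the default-model table, so it is reported here only as a difference to review.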
Custom models can be used by passing them as the model_parameters argument.
from main import SSG_LUGIA

# Run the pipeline with the custom parameter dictionary defined above.
SSG_LUGIA(sequence_fasta_file_path='sample_data/NC_003198.1.fasta', model_parameters=custom_model)
Such models can be stored as .json files and loaded for future use.
import json

with open('custom_model.json', 'w') as outfile:
    json.dump(custom_model, outfile)
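
The stored file can also be read back into a dictionary with the standard json module. The sketch below uses a shortened example dictionary and a temporary directory (both chosen for illustration) to show that the parameters survive the round trip unchanged.

```python
import json
import os
import tempfile

# Shortened example model; a real model would carry the full parameter set.
custom_model = {
    "w": 15000,
    "dw": 250,
    "karlin_mode": "normalized",
    "entropy_features": True,
    "contamination_model1": 0.25,
}

# A temporary directory keeps the sketch from touching the working directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "custom_model.json")

    # Store the model as JSON.
    with open(path, "w") as outfile:
        json.dump(custom_model, outfile)

    # Load it back into a plain Python dictionary.
    with open(path) as infile:
        loaded_model = json.load(infile)

# String keys and int/float/str/bool values round-trip through JSON unchanged.
print(loaded_model == custom_model)  # True
```

A dictionary loaded this way can be passed as model_parameters in the same way as a dictionary built in code.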
                
A model stored in .json format can be used by passing the path to the .json file as model_name.
from main import SSG_LUGIA

# Run the pipeline with the parameters stored in custom_model.json.
SSG_LUGIA(sequence_fasta_file_path='sample_data/NC_003198.1.fasta', model_name='custom_model.json')