Supplementary data and code
This page contains supplementary data for the paper
Renqiang Min, Anthony Bonner, and Zhaolei Zhang. "Improved Kernels Using Label Information for Protein Classification."
The summary of the results can be found in the paper above. The supplementary data is as follows:
- ROC-50 scores for all families and all detection methods from the paper in plain text format.
- Plain text table specifying the positive and negative training and test sets for each family. Each row is one sequence, and each column is one family. (0 = not present; 1 = positive train; 2 = negative train; 3 = positive test; 4 = negative test).
- Summary of data splits giving the number of positive and negative training and test set examples and amount of unlabeled data for each family.
- Names of the SCOP families.
- 7329x7329 Kernel matrices for methods used in the experiments: (here are the IDs by row or column)
- Mismatch Kernel, m=5, k=1, ascii text file, gzipped (76 MB).
- Profile Kernel, m=5, delta=7.5, ascii text file, gzipped (86 MB).
- precomputed homologs (PSI-BLAST)
- input sequence
- The Spider software used in the experiments, a Matlab-based library of machine learning tools.
- Matlab code to run the semi-supervised experiments (using the Spider software).
- Results in .mat files
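
The split table described above (one row per sequence, one column per family, codes 0 to 4) can be read with a few lines of code. The following is a minimal Python sketch, not the authors' code. It assumes the table is whitespace-separated integers with no header row and no sequence-ID column, which should be checked against the actual file; the function names `parse_split_table` and `family_split` are hypothetical.

```python
# Codes as documented in the split-table description.
CODES = {
    0: "not present",
    1: "positive train",
    2: "negative train",
    3: "positive test",
    4: "negative test",
}


def parse_split_table(lines):
    """Parse rows of integer codes, one row per sequence.

    Assumption: fields are whitespace-separated and every field is a code.
    """
    rows = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = [int(f) for f in fields]
        for code in row:
            if code not in CODES:
                raise ValueError(f"line {line_no}: unknown code {code}")
        rows.append(row)
    return rows


def family_split(rows, family):
    """Group 0-based sequence (row) indices by split for one family column."""
    split = {name: [] for code, name in CODES.items() if code != 0}
    for seq_index, row in enumerate(rows):
        code = row[family]
        if code != 0:
            split[CODES[code]].append(seq_index)
    return split


if __name__ == "__main__":
    # Toy table: 4 sequences (rows) x 2 families (columns).
    toy = ["1 0", "2 3", "3 4", "4 1"]
    rows = parse_split_table(toy)
    for name, indices in family_split(rows, 1).items():
        print(name, indices)
```

For the real table, the lines could be passed in from an open file handle, for example `parse_split_table(open(path))`, where `path` points to the downloaded file. The resulting index lists can then be used to select the matching rows and columns of a 7329x7329 kernel matrix, provided the row order of the table matches the ID order of the kernel matrix.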