DIGEP-Pred: Prediction of drug-induced changes of gene expression profile


* The leave-one-out cross-validation (LOO CV) procedure uses the whole PASS training set to validate prediction quality. Each compound in turn is excluded from the training set, its activities are predicted from the remaining compounds, and the prediction is compared with the known experimental data for that compound. The procedure is repeated for all compounds in the PASS training set; the average Invariant Accuracy of Prediction (IAP = 1 - IEP) is then calculated for each biological activity and across all biological activities.
IAP is numerically equal to ROC AUC.
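
The validation loop described above can be sketched as follows. This is a minimal illustration, not the PASS algorithm: `toy_scorer` is a hypothetical stand-in that scores a held-out compound by set similarity of made-up descriptor sets, and the function names are assumptions introduced here. The only parts taken from the text are the leave-one-out scheme and the use of ROC AUC as the accuracy measure for one activity.

```python
from typing import Callable, FrozenSet, List

Descriptors = FrozenSet[str]


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as the probability that a random active outscores a random
    inactive; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loo_scores(features: List[Descriptors], labels: List[int],
               scorer: Callable[[List[Descriptors], List[int], Descriptors], float]
               ) -> List[float]:
    """Score every compound using only the other compounds as training data."""
    scores = []
    for i, query in enumerate(features):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        scores.append(scorer(train_x, train_y, query))
    return scores


def toy_scorer(train_x: List[Descriptors], train_y: List[int],
               query: Descriptors) -> float:
    """Hypothetical scorer (NOT the PASS method): mean Jaccard similarity to
    actives minus mean Jaccard similarity to inactives."""
    def jaccard(a: Descriptors, b: Descriptors) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else 0.0

    actives = [jaccard(query, x) for x, y in zip(train_x, train_y) if y]
    inactives = [jaccard(query, x) for x, y in zip(train_x, train_y) if not y]
    return mean(actives) - mean(inactives)


# Made-up descriptor sets for six compounds and one activity.
features = [frozenset("ab"), frozenset("ac"), frozenset("abc"),
            frozenset("xy"), frozenset("xz"), frozenset("yz")]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(loo_scores(features, labels, toy_scorer), labels)
```

Repeating this for every activity and averaging the per-activity values would give the overall figure reported for a training set.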


The training sets were built from data on drug-induced changes in mRNA expression and protein concentration recorded in the Comparative Toxicogenomics Database. They include the structures of single, electroneutral organic molecules with a molecular weight of 50 to 1250 Da, together with data on drug-induced changes in the expression of human genes.
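
As an illustration of the selection criteria above, the sketch below filters compound records. The `Compound` record and its field names are hypothetical, and treating the 50 and 1250 Da bounds as inclusive is an assumption; the text does not say how the service itself applies the criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record; field names are assumptions for this sketch."""
    name: str
    mol_weight: float    # molecular weight in Da
    formal_charge: int   # net charge of the structure
    n_components: int    # 1 means a single molecule (no salts or mixtures)
    is_organic: bool


MW_MIN, MW_MAX = 50.0, 1250.0  # bounds from the text, assumed inclusive


def is_eligible(c: Compound) -> bool:
    """Single, electroneutral, organic, and within the weight window."""
    return (c.n_components == 1
            and c.formal_charge == 0
            and c.is_organic
            and MW_MIN <= c.mol_weight <= MW_MAX)


candidates = [
    Compound("aspirin", 180.16, 0, 1, True),
    Compound("methane", 16.04, 0, 1, True),            # too light
    Compound("vancomycin", 1449.3, 0, 1, True),        # too heavy
    Compound("sodium chloride", 58.44, 0, 2, False),   # salt, inorganic
]

eligible = [c.name for c in candidates if is_eligible(c)]
```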

The mRNA-based training set consists of 1756 compounds and supports prediction of drug-induced changes of gene expression for 1802 genes (1069 upregulations and 733 downregulations). The average accuracy calculated by the leave-one-out cross-validation procedure (ROC AUC) is 0.853.

The protein-based training set consists of 1736 compounds and supports prediction of drug-induced changes of gene expression for 123 genes (78 upregulations and 45 downregulations). The average accuracy calculated by the leave-one-out cross-validation procedure (ROC AUC) is 0.89.

The MCF7-based training set consists of 1024 compounds and supports prediction of drug-induced changes of gene expression for 3900 genes (1769 upregulations and 2131 downregulations). The average accuracy calculated by the leave-one-out cross-validation procedure (ROC AUC) is 0.89.

The VCAP_6h-based training set consists of 6614 compounds and supports prediction of drug-induced changes of gene expression for 16124 genes (10687 upregulations and 5437 downregulations). The average accuracy calculated by the leave-one-out cross-validation procedure (ROC AUC) is 0.80.

The VCAP_24h-based training set consists of 6534 compounds and supports prediction of drug-induced changes of gene expression for 9716 genes (6078 upregulations and 3638 downregulations). The average accuracy calculated by the leave-one-out cross-validation procedure (ROC AUC) is 0.78.
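
For readers who want the five sets side by side, the figures quoted above can be collected into one structure and checked for internal consistency (upregulations plus downregulations should equal the number of genes). The `TrainingSet` type and its field names are assumptions made for this sketch; the numbers are copied from the descriptions above.

```python
from typing import Dict, NamedTuple


class TrainingSet(NamedTuple):
    compounds: int
    genes: int
    up: int
    down: int
    roc_auc: float  # average LOO CV accuracy as quoted in the text


TRAINING_SETS: Dict[str, TrainingSet] = {
    "mRNA":     TrainingSet(1756, 1802, 1069, 733, 0.853),
    "Protein":  TrainingSet(1736, 123, 78, 45, 0.89),
    "MCF7":     TrainingSet(1024, 3900, 1769, 2131, 0.89),
    "VCAP_6h":  TrainingSet(6614, 16124, 10687, 5437, 0.80),
    "VCAP_24h": TrainingSet(6534, 9716, 6078, 3638, 0.78),
}


def is_consistent(ts: TrainingSet) -> bool:
    """Up- and downregulation counts must add up to the gene count."""
    return ts.up + ts.down == ts.genes


for name, ts in TRAINING_SETS.items():
    status = "ok" if is_consistent(ts) else "MISMATCH"
    print(f"{name:9s} compounds={ts.compounds:5d} genes={ts.genes:6d} "
          f"AUC={ts.roc_auc:.3f} {status}")
```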

Curated chemical–gene interaction data in the training sets were retrieved from the Comparative Toxicogenomics Database (CTD), Mount Desert Island Biological Laboratory, Salisbury Cove, Maine. http://ctdbase.org/ [October, 2015].
Lagunin A., Ivanov S., Rudik A., Filimonov D., Poroikov V. DIGEP-Pred: web-service for in silico prediction of drug-induced gene expression profiles based on structural formula. Bioinformatics, 2013, 29, 2062-2063.