The leave-one-out cross-validation (LOO CV) procedure is performed on the whole PASS training set to validate prediction quality. Each compound in turn is excluded from the training set, its activities are predicted from the remaining compounds, and the prediction result is compared with the known experimental data for that compound. The procedure is repeated for all compounds in the PASS training set; then the average Invariant Accuracy of Prediction (IAP = 1 - IEP) values are calculated for each biological activity and across all biological activities.
IAP is numerically equal to the ROC AUC (area under the receiver operating characteristic curve).
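The relationship between the leave-one-out procedure and IAP can be illustrated with a minimal Python sketch. It is not the PASS implementation: the function names (`iap`, `leave_one_out_scores`, `toy_fit_predict`), the one-dimensional toy descriptors and the toy scoring rule are all assumptions made for illustration. The only property taken from the text is that IAP is numerically equal to ROC AUC, which is computed here as the fraction of (active, inactive) pairs in which the active compound receives the higher score, with ties counted as one half.

```python
from typing import Callable, List, Sequence


def iap(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC computed pairwise: share of (active, inactive) pairs
    where the active compound is scored higher (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y]
    inactives = [s for s, y in zip(scores, labels) if not y]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for n in inactives:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def leave_one_out_scores(
    x: List[float],
    y: List[int],
    fit_predict: Callable[[List[float], List[int], float], float],
) -> List[float]:
    """Score each compound with a model trained on all the others."""
    scores = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(fit_predict(train_x, train_y, x[i]))
    return scores


def toy_fit_predict(train_x: List[float], train_y: List[int], query: float) -> float:
    """Hypothetical stand-in for a predictor: how much closer the query
    is to the mean of the actives than to the mean of the inactives."""
    act = [v for v, lab in zip(train_x, train_y) if lab]
    inact = [v for v, lab in zip(train_x, train_y) if not lab]
    mean_act = sum(act) / len(act)
    mean_inact = sum(inact) / len(inact)
    return abs(query - mean_inact) - abs(query - mean_act)


# Toy data: one made-up descriptor per compound, 1 = active, 0 = inactive.
descriptors = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
activity = [0, 0, 0, 1, 1, 1]

loo_scores = leave_one_out_scores(descriptors, activity, toy_fit_predict)
loo_iap = iap(loo_scores, activity)
```

In a multi-activity setting, the same calculation would be repeated for each biological activity and the resulting values averaged, as described above.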
The training sets are based on data for 1011 compounds tested in the standard two-year rodent carcinogenicity bioassay, taken from the Carcinogenic Potency Database (CPDB). An SD file with the CPDB data is available from the EPA Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network (ftp://ftp.epa.gov/dsstoxftp/DSSTox_Archive_20150930/CPDBAS_DownloadFiles/). Small inorganic compounds (e.g. NO2), oils, paraffins and mixtures of compounds were excluded from the CPDB data when the training sets were created. Click on the links on the left to see the number of carcinogens and the accuracy of prediction for each organ of the corresponding species and sex. Organ-specific carcinogenicity for a given species and sex is treated as a particular type of biological activity describing the action of the compound.
Click to download the final training set.