The impact of training set data distributions for modelling of passive intestinal absorption
journal contribution
Posted on 2023-06-09, 03:25, authored by Taravat Ghafourian, Alex A Freitas, Danielle Newby

This study presents regression and classification models that predict human intestinal absorption for 645 drug and drug-like compounds. The models use percentage human intestinal absorption values from the published dataset of Hou et al. (2007c). This dataset, like other datasets in the literature, contains more highly absorbed than poorly absorbed compounds. Any models developed using these datasets will be biased towards highly absorbed compounds. They will also be of limited use in industry, where more compounds are now likely to be poorly absorbed. The study compared two training sets. TS1 is a balanced (50:50) distribution of highly and poorly absorbed compounds, created by under-sampling the majority class of highly absorbed compounds. TS2 is a randomly selected training set with a distribution biased towards highly absorbed compounds. The regression results indicate that the best models were those developed using the balanced training set (TS1). For classification, TS1 also produced the most accurate models and the highest specificity (0.949). In comparison, TS2 produced the highest sensitivity (0.939). Under-sampling the majority class of highly absorbed compounds therefore yields a balanced training set (TS1), and that set supports in silico regression and classification models that are more applicable for use in industry. © 2012 Elsevier B.V. All rights reserved.
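The balancing step and the two classification metrics named in the abstract can be illustrated with a short sketch. It assumes plain random under-sampling, although the abstract does not say how the retained highly absorbed compounds were chosen, so the paper's procedure may differ. It also assumes binary labels in which 1 means "highly absorbed" and 0 means "poorly absorbed". Treating highly absorbed as the positive class is another assumption; it fits the report that the high-absorption-biased TS2 gave the highest sensitivity. The helper names `undersample_to_balance` and `sensitivity_specificity` are hypothetical and are not taken from the paper.

```python
import random
from typing import Sequence


def undersample_to_balance(labels: Sequence[int], seed: int = 0) -> list[int]:
    """Return indices of a 50:50 subset built by random under-sampling.

    Assumed encoding (hypothetical): 1 = highly absorbed, 0 = poorly absorbed.
    All minority-class indices are kept; the majority class is sampled
    without replacement down to the size of the minority class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # seeded so the selected subset is reproducible
    kept_majority = rng.sample(majority, len(minority))
    return sorted(minority + kept_majority)


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Class 1 (highly absorbed) is treated as positive, an assumption.
    Assumes both classes occur in y_true, otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy, imbalanced labels (not data from the paper): 8 high, 2 poor.
    labels = [1] * 8 + [0] * 2
    idx = undersample_to_balance(labels, seed=42)
    balanced = [labels[i] for i in idx]
    print(balanced.count(1), balanced.count(0))  # prints: 2 2

    # A degenerate model that always predicts "highly absorbed" looks
    # perfect on sensitivity but never identifies a poorly absorbed compound.
    always_high = [1] * len(labels)
    print(sensitivity_specificity(labels, always_high))  # prints: (1.0, 0.0)
```

The degenerate "always high" predictor shows why a training set skewed towards highly absorbed compounds tends to favour sensitivity at the expense of specificity. It also shows why a balanced set such as TS1 can shift that trade-off.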
History
Publication status
- Published
File Version
- Accepted version
Journal
International Journal of Pharmaceutics
ISSN
0378-5173
Publisher
Elsevier
External DOI
Issue
1-2
Volume
436
Page range
711-720
Department affiliated with
- Biochemistry Publications
Full text available
- Yes
Peer reviewed?
- Yes
Legacy Posted Date
2017-12-01
First Open Access (FOA) Date
2017-12-01
First Compliant Deposit (FCD) Date
2017-12-01
Keywords
article; comparative study; drug absorption; drug distribution; drug industry; human; intestine absorption; medical literature; priority journal; regression analysis; Humans; Intestinal Absorption; Models, Biological; Models, Statistical; Pharmaceutical Preparations; Regression Analysis; Reproducibility of Results