Dr. Frank Dieterle, Ph. D. Thesis

2.4.1.   Crossvalidation

The most popular subsampling technique is crossvalidation. For an n-fold crossvalidation, the data are partitioned into n parts of equal size. The first part is used as the test data set, and the remaining parts are used as the calibration data set. Then the second part is used as the test data set and the rest for a new calibration. This procedure is repeated until each of the n parts has served as test data once, and the prediction results for the n test data sets are averaged. It is essential that no knowledge of the models is transferred from fold to fold. There are no clear rules for how many folds to use. The simplest and clearest way of performing crossvalidation is to leave out one sample at a time. This special variant is also called full crossvalidation, leave-one-out or jackknifing; because only one such partition of the data exists, it gives a unique and therefore reproducible result. Yet it has been shown that increasing the number of crossvalidation groups results in lower root mean square errors of prediction, which gives overly optimistic estimates of predictivity [13]-[16]. This deficiency is known in the literature as asymptotic inconsistency [17].
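The procedure described above can be illustrated with a minimal sketch in Python. It is not the implementation used in this work: the straight-line least-squares model in `fit_line`, the function names, the random assignment of samples to folds and the example data are all assumptions made for illustration only. A new model is fitted in every fold, so nothing is transferred between folds, and the root mean square error of prediction is computed over all held-out samples.

```python
import math
import random


def fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds nearly equal parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def fit_line(x, y):
    """Least-squares straight line y = a + b*x (illustrative calibration model)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def crossvalidate(x, y, n_folds, seed=0):
    """Root mean square error of prediction over all held-out samples."""
    squared_errors = []
    for test_idx in fold_indices(len(x), n_folds, seed):
        held_out = set(test_idx)
        calib = [i for i in range(len(x)) if i not in held_out]
        # A fresh model is fitted in every fold; nothing carries over.
        a, b = fit_line([x[i] for i in calib], [y[i] for i in calib])
        squared_errors.extend((y[i] - (a + b * x[i])) ** 2 for i in test_idx)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example data: a noisy straight line.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8, 16.1, 17.9]

rmse_5fold = crossvalidate(x, y, n_folds=5)
rmse_loo = crossvalidate(x, y, n_folds=len(x))  # leave-one-out
```

With `n_folds` equal to the number of samples, every fold holds exactly one sample, so the result no longer depends on the random seed. This corresponds to the reproducibility of leave-one-out noted above. For fewer folds the result varies with the random partition.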

© Dr. Frank Dieterle, 14.08.2006