
2.4.6. Conclusions

A comparison of the advantages and disadvantages of the different subsampling algorithms shows that bootstrapping and random subsampling are best suited for splitting the data into calibration, test and validation subsets. Because the user-definable ratio between the sizes of the subsets allows high flexibility, the random subsampling procedure was used in this work to split the data into calibration, test and monitor data sets, whereas for most data sets a static external validation set was recorded and used. The monitor set for the early-stopping procedure of the neural networks (see section 2.7.3) was generated by a modified full crossvalidation procedure, which speeds up learning and is described in detail in [28].
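The random subsampling with a user-definable size ratio can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name `random_subsample`, the 60/20/20 ratio and the sample count of 100 are assumptions chosen only for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def random_subsample(n_samples, fractions=(0.6, 0.2, 0.2), seed=None):
    """Randomly partition sample indices into disjoint subsets.

    `fractions` gives the relative subset sizes (hypothetical default:
    calibration, test, monitor) and must sum to 1.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)  # random order of all sample indices
    # Cut points between subsets; the last subset takes the remainder.
    bounds = np.round(np.cumsum(fractions)[:-1] * n_samples).astype(int)
    return np.split(perm, bounds)

# Illustrative call: 100 samples split 60/20/20 (values are assumptions).
calib_idx, test_idx, monitor_idx = random_subsample(100, (0.6, 0.2, 0.2), seed=0)
```

Calling the function repeatedly with different seeds yields different random partitions with the same size ratio, which is the flexibility referred to above.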

Besides the averaging effect of the subsampling procedure, comparing the standard deviations of the test-data predictions across the different subsets also allows the robustness of the calibration method to be estimated. A high standard deviation indicates that the calibration depends on the random partitioning of the data. If the quality of the calibration and prediction depends significantly on the perturbation of the data subsets, the calibration method is not very robust.
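The robustness estimate can be sketched as follows, under several assumptions: the spread is measured here as the standard deviation of the test-set root mean square error over repeated random splits (one possible reading of the text), ordinary least squares stands in for the calibration method, and the data are synthetic. The function name `repeated_split_rmse` and all parameter values are hypothetical.

```python
import numpy as np

def repeated_split_rmse(X, y, n_repeats=50, test_fraction=0.3, seed=0):
    """Mean and standard deviation of the test RMSE over random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_fraction * n))
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    errors = np.empty(n_repeats)
    for r in range(n_repeats):
        perm = rng.permutation(n)
        test, calib = perm[:n_test], perm[n_test:]
        # Stand-in calibration: ordinary least squares (an assumption).
        coef, *_ = np.linalg.lstsq(A[calib], y[calib], rcond=None)
        resid = A[test] @ coef - y[test]
        errors[r] = np.sqrt(np.mean(resid ** 2))
    # The mean reflects the averaging effect; the standard deviation
    # reflects sensitivity to the random partitioning.
    return errors.mean(), errors.std(ddof=1), errors

# Synthetic data for illustration only (not from the thesis).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + data_rng.normal(scale=0.1, size=60)

mean_rmse, std_rmse, errors = repeated_split_rmse(X, y)
print(f"test RMSE: mean = {mean_rmse:.3f}, std = {std_rmse:.3f}")
```

A standard deviation that is large relative to the mean error would suggest that the calibration depends strongly on which samples end up in which subset.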

© Dr. Frank Dieterle, 14.08.2006