Dr. Frank Dieterle, Ph. D. Thesis, 6. Results – Multivariate Calibrations

6.6.   Model Trees

Model trees are very similar to CART and are often applied in economic research [9],[251]. In contrast to CART, each leaf contains a local linear regression model instead of a single discrete value for the samples passed to this leaf. As with CART, an oversized tree is built in a first step. The criterion for splitting a node is the minimization of the two standard deviations of the response variable for the samples assigned to the two child nodes. In a second step, the subtrees are pruned: as in the CART procedure, nodes and leaves are removed if their removal increases the error of the calibration data by less than a specified "size-corrected" value.
For the calibration data of the refrigerant data set, a tree with 33 nodes and 35 leaves was built for R22 and a tree with 29 nodes and 32 leaves was built for R134a. Both the predictions of the calibration data, with relative RMSE of 7.19% for R22 and 7.59% for R134a, and the predictions of the validation data, with relative RMSE of 10.29% for R22 and 11.20% for R134a, were disappointing. In principle, model trees should be superior to regression trees, as many local regression models are used instead of single discrete values. The true-predicted plots in figure 40 show that the predictions of the different concentration levels are rather inconsistent, indicating that the quality of the local linear regression models varies. This means that not all of the more than 30 local regression models per analyte are calibrated well. The data set might be too small to calibrate 30 linear regression models successfully, with single local models spoilt by noise and outliers. Some local models are therefore overfitted, resulting in the significant increase of the prediction error for the validation data. In agreement with the statistical tests, no significant bias of the residuals can be detected in figure 40.
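The two building blocks of a model tree described above, the standard-deviation split criterion and the local linear models in the leaves, can be illustrated with a minimal sketch. It is not the implementation used in the thesis: it assumes a single predictor, a tree of depth one, and a size-weighted sum of the two child standard deviations as the quantity to minimize (one common choice, taken here as an assumption). All function names are hypothetical, and pruning is omitted.

```python
from statistics import pstdev


def best_split(x, y):
    """Return (threshold, score) minimising the size-weighted sum of the
    standard deviations of y in the two child nodes (assumed criterion)."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        score = (len(left) * pstdev(left) + len(right) * pstdev(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0  # constant model if x does not vary
    return my - b * mx, b


def fit_stump_model_tree(x, y):
    """One split, one local linear model per leaf."""
    threshold, _ = best_split(x, y)
    if threshold is None:
        raise ValueError("need at least two distinct x values to split")
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= threshold]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > threshold]
    return threshold, fit_line(*zip(*left)), fit_line(*zip(*right))


def predict(model, xi):
    """Route the sample to its leaf and evaluate that leaf's linear model."""
    threshold, left, right = model
    a, b = left if xi <= threshold else right
    return a + b * xi
```

On a noise-free piecewise linear response the sketch places the split at the break and each leaf recovers its own line. With only a handful of noisy samples per leaf, the same local fits become unstable, which is the overfitting effect discussed above.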
Locally weighted regression (LWR) also uses the principle of many local linear regression models. In contrast to model trees, which partition the sample space into local regions by means of a tree, LWR generates a local model at prediction time by giving more weight to the samples in the neighborhood of the sample to be predicted. As the principle of local regression models does not seem to work for this highly correlated nonlinear refrigerant data set, LWR and other methods based on local models are not investigated any further.
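For comparison, the LWR principle can be sketched in a few lines. This is an illustration under stated assumptions, not a method evaluated in the thesis: it uses a single predictor, a Gaussian weighting kernel, and a hypothetical `bandwidth` parameter that controls the size of the neighborhood.

```python
import math


def lwr_predict(x, y, x0, bandwidth=1.0):
    """Predict at x0 with a weighted least-squares line fitted at query time."""
    # Gaussian kernel: samples close to x0 get weights near 1 (assumed kernel)
    w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)
```

Unlike the model tree, nothing is fitted in advance: every call builds a new local model around `x0`. A small bandwidth follows nonlinearities more closely but leaves fewer effective samples per local fit, so it is subject to the same sensitivity to noise and outliers.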

 

figure 40:  True-predicted plots of the model trees for the validation data.

© Dr. Frank Dieterle, 14.08.2006