Dr. Frank Dieterle, Ph. D. Thesis

2.3. Data Preprocessing

Data preprocessing systematically modifies the raw signals of the device, in the expectation that the altered signals provide more useful input to the calibration method. Unfortunately, no general guidelines exist for choosing the appropriate preprocessing technique. As a result, the merits of the different techniques are discussed controversially in the literature [7],[8].

In this work, the input variables are preprocessed by autoscaling according to:

$$x_{ij}^{\mathrm{AS}} = \frac{x_{ij} - \bar{x}_j}{s_j}$$

with $x_{ij}$ as the response of the $i$th sample at the $j$th variable, $\bar{x}_j$ as the mean of the $j$th variable and $s_j$ as the standard deviation of the $j$th variable. Autoscaling thus combines mean-centering of the data with a division by the standard deviation of all responses of a particular input variable. Afterwards, each variable has a mean of zero and a standard deviation of one. For some calibration methods, autoscaling can improve the calibration because it lets all variables influence the calibration equally. This matters especially when different variables show different magnitudes of variation.
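The autoscaling equation can be illustrated with a minimal NumPy sketch. It assumes the data are arranged as a matrix with one row per sample and one column per input variable. The text does not state whether the sample (n − 1) or the population (n) standard deviation was used, so the sketch assumes the sample version (`ddof=1`). The function name `autoscale` and the example matrix are hypothetical and only illustrate the arithmetic.

```python
import numpy as np


def autoscale(X, mean=None, std=None):
    """Autoscale each column (variable) of X to zero mean and unit standard deviation.

    X has shape (n_samples, n_variables). If mean/std are not given, they are
    estimated column-wise from X, using the sample standard deviation (ddof=1,
    an assumption). Passing them explicitly applies the same scaling to new data.
    """
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=0)        # mean of the jth variable over all samples
    if std is None:
        std = X.std(axis=0, ddof=1)  # standard deviation of the jth variable
    return (X - mean) / std, mean, std


# Hypothetical data: two variables whose variation differs by a factor of 200
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
X_as, mean, std = autoscale(X)
# Both columns become [-1, 0, 1], so neither variable dominates by magnitude alone
print(X_as)
```

The function also returns the column means and standard deviations. This makes it possible to apply the identical transformation to further samples, for example `autoscale(X_new, mean, std)`, instead of re-estimating the parameters from each new batch.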

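The dependent variables were range-scaled to the interval from -0.9 to 0.9 according to:

$$y_{ij}^{\mathrm{RS}} = -0.9 + 1.8\,\frac{y_{ij} - y_{j,\min}}{y_{j,\max} - y_{j,\min}}$$

with $y_{ij}$ as the value of the $j$th dependent variable for the $i$th sample, and $y_{j,\min}$ and $y_{j,\max}$ as the minimum and maximum of the $j$th dependent variable. This scaling is essential for calibration by neural networks with hyperbolic tangent activation functions. The output of a hyperbolic tangent is confined to the open interval between -1 and 1, so the targets must lie within this range. Limiting them to ±0.9 rather than ±1 avoids targets that the network could only approach asymptotically.
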
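Before the prediction errors were calculated and the true-versus-predicted plots were drawn, the range-scaling was reversed. Errors and plots therefore refer to the original units of the dependent variables.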
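The range-scaling of the dependent variables and its reversal can be sketched in NumPy as follows. The function names `range_scale` and `inverse_range_scale`, the example concentrations and the "predicted" values are all hypothetical and serve only to illustrate the arithmetic. The root mean squared error is used here merely as one example of a prediction error computed after the scaling has been undone.

```python
import numpy as np


def range_scale(y, lo=-0.9, hi=0.9, y_min=None, y_max=None):
    """Linearly map each column of y from [y_min, y_max] onto [lo, hi]."""
    y = np.asarray(y, dtype=float)
    if y_min is None:
        y_min = y.min(axis=0)  # minimum of the jth dependent variable
    if y_max is None:
        y_max = y.max(axis=0)  # maximum of the jth dependent variable
    y_rs = lo + (hi - lo) * (y - y_min) / (y_max - y_min)
    return y_rs, y_min, y_max


def inverse_range_scale(y_rs, y_min, y_max, lo=-0.9, hi=0.9):
    """Undo range_scale, returning values in the original units."""
    y_rs = np.asarray(y_rs, dtype=float)
    return y_min + (y_rs - lo) * (y_max - y_min) / (hi - lo)


# Hypothetical true values of two dependent variables (one column each)
y_true = np.array([[0.0, 200.0],
                   [500.0, 600.0],
                   [1000.0, 1000.0]])
y_rs, y_min, y_max = range_scale(y_true)  # both columns become [-0.9, 0.0, 0.9]

# Hypothetical network outputs on the scaled range (invented for illustration)
y_pred_rs = np.array([[-0.85, -0.90],
                      [0.02, 0.05],
                      [0.90, 0.88]])

# Reverse the range-scaling with the stored minima/maxima, then compute errors
y_pred = inverse_range_scale(y_pred_rs, y_min, y_max)
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2, axis=0))  # one value per variable
print(rmse)
```

The reversal is only possible because the minima and maxima used for scaling are kept. Predictions for new samples must be mapped back with the same `y_min` and `y_max`, so that their errors are expressed in the original units.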

© Dr. Frank Dieterle, 14.08.2006