2.3. Data Preprocessing (Dr. Frank Dieterle)

Frank Dieterle

Ph. D. Thesis

2. Theory – Fundamentals of the Multivariate Data Analysis

2.3. Data Preprocessing

Data preprocessing systematically modifies the raw signals of the device in the hope that the transformed signals provide more useful input to the calibration method. Unfortunately, there are no general guidelines for choosing an appropriate preprocessing technique, and the merits of the different techniques are discussed controversially in the literature [7],[8].

In this work, the input variables are preprocessed by autoscaling according to:

$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j} \qquad (2)$$

Here $x_{ij}$ is the response of the i-th sample at the j-th variable, $x'_{ij}$ is the autoscaled response, $\bar{x}_j$ is the mean of the j-th variable and $s_j$ is the standard deviation of the j-th variable. Autoscaling therefore combines mean-centering with a division by the standard deviation of all responses of the respective input variable, so that each variable ends up with a mean of zero and a standard deviation of one. For some calibration methods, autoscaling can improve the calibration because it lets all variables influence the calibration equally. This matters especially when the variations of the different variables differ in magnitude.
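The autoscaling of Equation (2) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work. The function name `autoscale`, the row-per-sample list layout and the use of the sample standard deviation (denominator n - 1) are assumptions.

```python
from statistics import mean, stdev


def autoscale(X):
    """Autoscale the input variables (columns) of X.

    X[i][j] is the response of sample i at variable j (layout assumed for illustration).
    Returns the autoscaled data together with the per-variable means and standard
    deviations, so that new samples could be scaled with the same parameters.
    """
    n_vars = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_vars)]
    means = [mean(col) for col in columns]
    # Sample standard deviation (n - 1 denominator); this convention is an assumption.
    stds = [stdev(col) for col in columns]
    for j, s in enumerate(stds):
        if s == 0:
            raise ValueError(f"variable {j} is constant and cannot be autoscaled")
    # Mean-center each variable, then divide by its standard deviation.
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(n_vars)] for row in X]
    return scaled, means, stds


# Two hypothetical variables whose variations differ by a factor of 100.
X = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled, means, stds = autoscale(X)
print(scaled)
```

In this example both variables end up on the same scale after autoscaling (values -1, 0 and 1), so neither dominates the calibration merely because of its larger raw magnitude.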

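The dependent variables were range-scaled to the interval [-0.9, 0.9]. This is essential for calibration by neural networks with hyperbolic tangent activation functions: the output of the hyperbolic tangent is bounded to (-1, 1), so the targets must lie inside this range. A margin to ±1 is kept because the hyperbolic tangent approaches these limits only asymptotically. The range scaling is performed according to:

$$y'_i = -0.9 + 1.8\,\frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \qquad (3)$$

Here $y_i$ is the value of the dependent variable for the i-th sample, $y'_i$ is the range-scaled value, and $y_{\min}$ and $y_{\max}$ are the minimum and maximum of the dependent variable.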
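A minimal sketch of the range scaling of Equation (3) follows. It assumes Python lists and a hypothetical helper `range_scale`. The limits default to -0.9 and 0.9 as described above, and the function returns the minimum and maximum so that the scaling can later be reversed.

```python
def range_scale(y, low=-0.9, high=0.9):
    """Linearly map the values in y onto [low, high].

    Returns the scaled values plus (y_min, y_max), which are needed to undo the scaling.
    """
    y_min, y_max = min(y), max(y)
    span = y_max - y_min
    if span == 0:
        raise ValueError("all values are equal; range scaling is undefined")
    # y' = low + (high - low) * (y - y_min) / (y_max - y_min), i.e. Equation (3) for the defaults.
    scaled = [low + (high - low) * (v - y_min) / span for v in y]
    return scaled, y_min, y_max


# Hypothetical concentrations of one dependent variable.
scaled, y_min, y_max = range_scale([0.0, 5.0, 10.0])
print(scaled, y_min, y_max)
```

The smallest value maps to -0.9 and the largest to 0.9. All targets therefore stay strictly inside the output range of a hyperbolic tangent unit.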

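For the calculation of the prediction errors and for the plots of true versus predicted values, the range scaling was reversed, so that errors and plots are expressed in the original units of the dependent variables.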
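Solving Equation (3) for the original value gives y_i = y_min + (y'_i + 0.9)(y_max - y_min)/1.8. The sketch below applies this inverse to hypothetical network outputs and then computes an error in the original units. The helper names are hypothetical, and the root-mean-square error (RMSE) is used purely as an illustrative error measure.

```python
import math


def inverse_range_scale(scaled, y_min, y_max, low=-0.9, high=0.9):
    """Undo range scaling: map values from [low, high] back to [y_min, y_max]."""
    return [y_min + (v - low) * (y_max - y_min) / (high - low) for v in scaled]


def rmse(true, predicted):
    """Root-mean-square error between true and predicted values (illustrative measure)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, predicted)) / len(true))


# Hypothetical scaled network outputs for a variable with y_min = 0 and y_max = 10.
predicted = inverse_range_scale([-0.85, 0.05, 0.9], y_min=0.0, y_max=10.0)
print(predicted, rmse([0.0, 5.0, 10.0], predicted))
```

Computing the error after the back-transformation keeps it independent of the arbitrary target interval [-0.9, 0.9].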