6.10. Neural Networks and Pruning (Dr. Frank Dieterle)

Frank Dieterle

Ph. D. Thesis

6. Results – Multivariate Calibrations

6.10. Neural Networks and Pruning

For the pruning of neural networks, which is described in detail in section 2.8.8, separate neural networks were trained for each of the two analytes using the calibration data set. Fully connected networks with 8 hidden neurons served as reference networks for the pruning algorithms. The two pruning algorithms, Magnitude Based Pruning (MP) and Optimal Brain Surgeon (OBS), were then used to remove network links until the estimated increase of the error for the calibration data reached 2%, after which the networks were retrained. This prune-and-retrain procedure was carried out 3 times in total. Finally, the calibration data and the external validation data were predicted. For each pruning algorithm, 50 networks were trained and optimized by this procedure using different initial random weights.
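The prune-and-retrain cycle described above can be sketched roughly as follows. This is only an illustration under strong assumptions and not the implementation used in the thesis: a linear least-squares model stands in for the neural network, each coefficient plays the role of a link, the error increase is measured directly on the calibration data instead of being estimated, and all function names (`retrain`, `magnitude_prune`, `prune_and_retrain`) are hypothetical.

```python
import numpy as np

def rmse(X, y, w):
    """Root mean square error of the linear stand-in model."""
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

def retrain(X, y, mask):
    """Refit only the links that are still active; pruned links stay at zero."""
    w = np.zeros(X.shape[1])
    if mask.any():
        w[mask] = np.linalg.lstsq(X[:, mask], y, rcond=None)[0]
    return w

def magnitude_prune(X, y, w, mask, max_rel_increase=0.02):
    """Remove the smallest-magnitude links one by one until the next
    removal would raise the calibration error by more than 2%."""
    w, mask = w.copy(), mask.copy()
    base = rmse(X, y, w)  # error before this pruning round
    while mask.sum() > 1:
        active = np.flatnonzero(mask)
        q = active[np.argmin(np.abs(w[active]))]  # weakest remaining link
        trial = w.copy()
        trial[q] = 0.0
        if rmse(X, y, trial) > base * (1.0 + max_rel_increase):
            break  # this removal would exceed the allowed error increase
        w = trial
        mask[q] = False
    return w, mask

def prune_and_retrain(X, y, rounds=3):
    """Train a fully connected reference model, then prune and retrain
    `rounds` times (3 in the text above)."""
    mask = np.ones(X.shape[1], dtype=bool)
    w = retrain(X, y, mask)
    for _ in range(rounds):
        w, mask = magnitude_prune(X, y, w, mask)
        w = retrain(X, y, mask)
    return w, mask
```

Calling `prune_and_retrain(X, y)` on a calibration matrix `X` and a target vector `y` returns the retrained weights together with a boolean mask of the surviving links. Because every round starts from the retrained model, the 2% limit applies per round, not to the procedure as a whole.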

Magnitude Based Pruning
For R22, the network with the smallest crossvalidated calibration error consisted of 20 input neurons, 2 hidden neurons and 25 links. This network predicted the calibration data with a rel. RMSE of 2.34% and the validation data with a rel. RMSE of 2.48% (see table 2). For R134a, the network with the smallest crossvalidated calibration error consisted of 33 input neurons, 3 hidden neurons and 64 links. The predictions by this network showed relative errors of 3.16% for the calibration data and 3.34% for the validation data. Compared with the fully connected neural networks, the number of adjustable parameters (27 for R22 and 67 for R134a) was dramatically reduced, resulting in a smaller gap between the prediction errors of the calibration data and those of the validation data. Yet the predictions of the validation data are worse than those of the fully connected neural networks, which renders Magnitude Based Pruning useless as a means of improving the generalization ability of the networks.

Optimal Brain Surgeon
For R22, the network with the smallest crossvalidated calibration error consisted of 25 input neurons, 3 hidden neurons and 37 links. This network predicted the calibration data with a rel. RMSE of 2.10% and the validation data with a rel. RMSE of 2.12% (see table 2). For R134a, the network with the smallest crossvalidated calibration error consisted of 17 input neurons, 4 hidden neurons and 24 links. The predictions by this network showed relative errors of 3.22% for the calibration data and 3.32% for the validation data. The low number of adjustable parameters (40 for R22 and 26 for R134a) successfully helped to prevent overfitting, with practically no gap visible between the predictions of the calibration data and of the validation data. Compared with the fully connected neural networks, the predictions of the validation data are slightly better for R22 and slightly worse for R134a. This demonstrates that the relationship between the concentrations of the analytes and the time-resolved sensor responses can be modeled with far fewer adjustable parameters. It is also visible that the sophisticated Optimal Brain Surgeon (OBS) algorithm performs better than the simple Magnitude Based Pruning (MP) approach.
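To illustrate why OBS can choose different links than a purely magnitude-based criterion, the following minimal sketch performs a single OBS step. It uses the standard OBS expressions: the saliency of weight q is w_q^2 / (2 [H^-1]_qq), and removing that weight is compensated by shifting the remaining weights along column q of the inverse Hessian. The setting is an assumption made for brevity and is not taken from the thesis: a linear model with a half mean squared error, for which the Hessian is exact. The names `half_mse` and `obs_step` are hypothetical.

```python
import numpy as np

def half_mse(X, y, w):
    """Error function E = 1/(2n) * ||Xw - y||^2 of the linear stand-in model."""
    r = X @ w - y
    return float(r @ r) / (2 * len(y))

def obs_step(X, w):
    """One OBS step: pick the weight with the smallest saliency, set it to
    zero and adjust all other weights to compensate for its removal."""
    n = X.shape[0]
    H = X.T @ X / n                       # exact Hessian of half_mse
    H_inv = np.linalg.inv(H)
    diag = np.diag(H_inv)
    saliency = w ** 2 / (2.0 * diag)      # predicted error increase per weight
    q = int(np.argmin(saliency))
    w_new = w - (w[q] / diag[q]) * H_inv[:, q]
    w_new[q] = 0.0                        # remove floating-point residue
    return q, float(saliency[q]), w_new
```

For this quadratic error surface, and for weights `w` that minimize the error, the returned saliency equals the actual increase of `half_mse` after the step. For a neural network the Hessian and therefore the saliency are only approximations, so the predicted increase is an estimate.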

Summary
The predictions of both pruning algorithms did not show unmodeled nonlinearities, and the true-predicted plots were similar to those of the fully connected networks (see figure 43). The most severe drawback of both pruning algorithms is their instability, which results in a totally different network topology for each run with different initial weights. For example, the 50 networks created by OBS for R22 used 7 to 27 input neurons, 1 to 4 hidden neurons and 8 to 40 links and showed prediction errors for the external validation data between 2.12% and 3.38%. The 50 networks for R134a used 8 to 36 input neurons, 2 to 6 hidden neurons and 12 to 49 links, with no repeated topology. The prediction errors for the validation data varied between 3.32% and 5.48%. The variation among the networks created by the MP algorithm was even larger. Although the pruning algorithms demonstrated that significantly sparser network topologies are sufficient for modeling the relationships between the time-resolved sensor responses and the concentrations of the analytes, the high variations of the network topologies and of the prediction quality render the pruning approach useless for an easily reproducible application.
