February 25, 2020 at 7:56 am

Harrington Designs Enhanced Zippy Restricted Boltzmann Machine for Smaller Datasets

Dr. Peter de B. Harrington

Dr. Peter de B. Harrington authored an article titled “Enhanced zippy restricted Boltzmann machine for feature expansion and improved classification of analytical data,” published in the Journal of Chemometrics.

“Restricted Boltzmann machines are a component of deep learning, and many people think that they are limited to ‘big data,’ for which the number of objects is large compared with the number of measurements or variables by a factor of 1 million or greater. However, RBMs can work quite well for smaller dataset sizes that are typically encountered by analytical chemists,” says Harrington, Professor of Chemistry & Biochemistry at Ohio University.

“The goal of our research is to develop algorithms that are robust and easy to use,” Harrington says.

He devised the enhanced zippy RBM (EZRBM) as a “general‐purpose RBM that can accept real‐valued inputs with good convergence properties and robust behavior with respect to tuning the parameters.”
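The article does not spell out the EZRBM's specific modifications. As a point of reference only, the sketch below shows how a textbook Gaussian-Bernoulli RBM (the GRBM family the abstract uses as a baseline) accepts real-valued inputs and is trained with one step of contrastive divergence (CD-1). It is a minimal sketch, not the EZRBM and not Harrington's MATLAB code; it assumes autoscaled inputs, unit-variance visible units, and a mean-field reconstruction, and the class and method names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function for the binary hidden units."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class GaussianBernoulliRBM:
    """Hypothetical textbook GRBM: real-valued (Gaussian, unit-variance) visible
    units and binary hidden units. Assumes each input variable is autoscaled.
    This is NOT the EZRBM."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; W[i][j] connects visible unit i to hidden unit j
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_mean(self, h):
        # E[v_i | h] = b_i + sum_j W_ij * h_j for unit-variance Gaussian visibles
        return [self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h)))
                for i in range(len(self.b))]

    def cd1_step(self, v0, lr=0.01):
        """One CD-1 update on a single object; returns its squared reconstruction error."""
        h0 = self.hidden_probs(v0)
        # Sample binary hidden states from their probabilities
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_mean(h0_sample)  # mean-field reconstruction of the input
        h1 = self.hidden_probs(v1)
        # Update = positive (data) statistics minus negative (reconstruction) statistics
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])
        return sum((a - r) ** 2 for a, r in zip(v0, v1))


# Usage on a tiny hypothetical autoscaled dataset (3 objects x 3 variables)
data = [[0.5, -1.2, 0.7], [-0.3, 0.9, -0.6], [1.1, -0.4, 0.2]]
rbm = GaussianBernoulliRBM(n_visible=3, n_hidden=4)
for epoch in range(100):
    total_error = sum(rbm.cd1_step(x) for x in data)
features = [rbm.hidden_probs(x) for x in data]  # nonlinear RBM outputs
```

The learning rate, number of hidden units, and epoch count here are arbitrary; sensitivity to exactly these kinds of settings is what the article says makes real-valued RBMs hard to train.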

Harrington wrote the code in MathWorks MATLAB 2018a and performed his calculations on a home‐built PC with an Intel processor running Microsoft Windows 10 Enterprise.

Abstract: Restricted Boltzmann machines (RBMs) are components found in many deep learning algorithms. RBMs originally were designed for binary image data. Some advances in RBM algorithms have been made so that they may accept real‐valued inputs that are typical for analytical chemistry measurements. However, these algorithms are difficult to train and require fine‐tuning of the parameters. The RBM algorithm was modified to furnish the enhanced zippy RBM (EZRBM) that trains reliably and robustly with respect to the parameters. In addition, feature augmentation (ie, fusing the RBM linear inputs and nonlinear outputs) improves the classification rate while reducing the dependence on the RBM training parameters. Two different classifiers, the support vector classifier (SVC) and super partial least squares–discriminant analysis (sPLS‐DA), were used to evaluate the performance. Classifiers built from the EZRBM outputs performed better than those built from the continuous RBM and the Gaussian RBM (GRBM) outputs when validated using 100 bootstraps with two Latin partitions. Three datasets were used. The first was an overdetermined set of eight fatty acid concentrations for 572 olive oils from nine regions of Italy. The second was 75 UV spectra of 15 Cannabis extracts with 101 measurements made from 200 to 400 nm. The third set comprised 60 proton nuclear magnetic resonance (NMR) spectra of 12 tea extracts that had 1000 chemical shift measurements from 0.5 to 7.0 ppm. In every evaluation, the augmented EZRBM had better classification performance than the classifiers without the RBM. The classifiers built with EZRBM outperformed the other RBM algorithms except for a single instance. Recently, RBMs have been considered as a transform into a dual feature space.
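The abstract credits part of the improvement to feature augmentation, that is, fusing the RBM's linear inputs with its nonlinear outputs. The sketch below assumes that "fusing" means plain column-wise concatenation of each object's input vector with its RBM hidden-unit probabilities; the paper's exact fusion scheme may differ. The weights `W` and hidden biases `c` would come from a trained RBM; the toy values and function names here are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, c):
    """RBM hidden-unit probabilities p(h_j = 1 | v); W is n_visible x n_hidden."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def augment_features(X, W, c):
    """Fuse each object's linear inputs with its nonlinear RBM outputs
    (assumed here to be simple concatenation)."""
    return [list(x) + hidden_probs(x, W, c) for x in X]


# Hypothetical trained parameters: 2 input variables, 3 hidden units
W = [[2.0, -1.0, 0.0],
     [0.0, 1.0, -2.0]]
c = [0.0, 0.0, 0.0]
X = [[1.0, 0.0], [0.0, 1.0]]
Z = augment_features(X, W, c)
# Each row of Z holds the 2 original variables followed by 3 RBM features
# and could then be passed to a classifier such as an SVC or sPLS-DA.
```

Keeping the raw inputs alongside the hidden-unit outputs means the classifier can still draw on the linear information even when the RBM features are weak or poorly tuned.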