Ensemble Systems

In matters of great importance that have financial, medical, social, or other implications, we often seek a second opinion before making a decision, sometimes a third, and sometimes many more. In doing so, we weigh the individual opinions and combine them through some thought process to reach a final decision that is presumably the most informed one. The process of consulting “several experts” before making a final decision, which may be second nature to us, has recently been rediscovered by the computational intelligence community for automated decision making applications, and it has emerged as a popular and heavily researched area. Also known under various other names, such as multiple classifier systems, committee of classifiers, or mixture of experts, ensemble systems have been shown to produce favorable results compared to those of single expert systems for a broad range of applications and under a variety of scenarios.

 

Now Available: For a review of ensemble systems, please see the new review / tutorial papers published by IEEE Circuits and Systems Magazine and IEEE Signal Processing Magazine.

 

· Polikar R., “Ensemble Based Systems in Decision Making,”
IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.

· Polikar R., “Bootstrap Inspired Techniques in Computational Intelligence,”
IEEE Signal Processing Magazine, vol. 24, no. 4, pp. 56-72, 2007.


Also see the proceedings of the International Workshop on Multiple Classifier Systems:

 

· Oza N., Polikar R., Kittler J., Roli F., Editors, Proc. of the 5th Int. Workshop on
Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 3541,
Berlin, Germany: Springer, 2005.

 

 

==========================================================

For a complete list of our works please see List of Publications

==========================================================

 

At SPPRL we are looking at various applications and novel uses of ensemble systems, such as:

Incremental Learning

One of the biggest frustrations that many researchers working on classifiers face is that most classifiers cannot be “further trained” with new data without forgetting what has been learned earlier. This is particularly true for many of the most common neural network schemes, such as the multilayer perceptron, the radial basis function network, etc. Learning additional information provided by new data requires discarding the old network, combining old and new data, and re-training from scratch. This solution not only discards what has been learned earlier (a phenomenon known as catastrophic forgetting), but it is also useless if the original data is no longer available. The purpose of our NSF funded work is to develop an algorithm that will allow any classification algorithm to learn incrementally from new data in the absence of old data. The idea is in part inspired by Schapire’s boosting algorithm, which was originally developed for improving the accuracy of weak learning algorithms. The new algorithm, called Learn++, uses an ensemble of classifiers instead of a single classifier to learn incrementally, and it has been shown to be effective in incremental learning, even when additional data introduce new classes. Scroll down to see a figure for algorithm pseudocode and block diagram.

 

The following papers describe the Learn++ algorithm, and its recent variations.

 

1. Syed-Mohammed H., Leander J., Marbach M., and Polikar R., “Can AdaBoost.M1 learn incrementally? A comparison to Learn++ under different combination rules,” Int. Conf. on Artificial Neural Networks (ICANN 2006), Lecture Notes in Computer Science (LNCS), vol. 4131, pp. 254-263, Athens, Greece. Berlin: Springer, 2006.

2. Erdem Z., Polikar R., Gurgen F., Yumusak N., “Ensemble of SVM Classifiers for Incremental Learning,” 6th Int. Workshop on Multiple Classifier Systems (MCS 2005), Springer Lecture Notes in Computer Science (LNCS), vol. 3541, pp. 246-255, Seaside, CA, June 2005.

3. Gangardiwala A. and Polikar R., “Dynamically weighted majority voting for incremental learning and comparison of three boosting based approaches,” Proc. of Int. Joint Conf. on Neural Networks (IJCNN 2005), pp. 1131-1136, Montreal, QC, Canada, July 2005.

4. Muhlbaier M., Topalis A., Polikar R., “Incremental learning from unbalanced data,” Proc. of Int. Joint Conference on Neural Networks (IJCNN 2004), vol. *, pp. 1057-1062, Budapest, Hungary, July 2004.

5. Muhlbaier M., Topalis A., Polikar R., “Learn++.MT: A new approach to incremental learning,” 5th Int. Workshop on Multiple Classifier Systems (MCS 2004), Springer LNCS vol. 3077, pp. 52-61, Cagliari, Italy, June 2004.

6. Polikar R., Udpa L., Udpa S., Honavar V., “An incremental learning algorithm with confidence estimation for automated identification of NDE signals,” IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, vol. 51, no. 8, pp. 990-1001, 2004.

7. Polikar R., Udpa L., Udpa S., Honavar V., “Learn++: An incremental learning algorithm for supervised neural networks,” IEEE Transactions on Systems, Man and Cybernetics (C), Special Issue on Knowledge Management, vol. 31, no. 4, pp. 497-508, 2001 (ORIGINAL PAPER).

8. Polikar R., Byorick J., Krause S., Marino A., Moreton M., “Learn++: A classifier independent incremental learning algorithm for supervised neural networks,” Proc. of Int. Joint Conference on Neural Networks (IJCNN 2002), vol. 2, pp. 1742-1747, Honolulu, HI, 12-17 May 2002.

9. Polikar R., Krause S., Burd L., “Dynamic weight update in weighted majority voting for Learn++,” Proc. of Int. Joint Conference on Neural Networks (IJCNN 2003), vol. 4, pp. 2770-2775, Portland, OR, 20-24 July 2003.
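
The incremental learning idea described above can be illustrated with a minimal Python sketch. It is not the Learn++ algorithm itself: it omits the boosting-style instance weighting, and the nearest-centroid base classifier, the class names, and the AdaBoost-style voting weight log((1 - err) / err) are assumptions chosen only for illustration. Each new batch adds one classifier to the ensemble, earlier batches are never revisited, and the final decision is a weighted majority vote.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Tiny base classifier (an assumption): predicts the class with the closest mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {c: [sum(col) / len(rows) for col in zip(*rows)]
                          for c, rows in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


class IncrementalEnsemble:
    """Grows by one classifier per batch; old data is never needed again."""

    def __init__(self):
        self.members = []  # list of (classifier, voting weight)

    def add_batch(self, X, y):
        clf = NearestCentroid().fit(X, y)
        wrong = sum(clf.predict(x) != label for x, label in zip(X, y))
        err = min(max(wrong / len(y), 1e-6), 1 - 1e-6)  # keep the log finite
        self.members.append((clf, math.log((1 - err) / err)))

    def predict(self, x):
        votes = defaultdict(float)
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)


# Three batches arrive one after another; class "c" first appears in batch 2.
ensemble = IncrementalEnsemble()
ensemble.add_batch([(0.0, 0.0), (0.5, 0.2), (5.0, 0.0), (5.2, 0.3)], ["a", "a", "b", "b"])
before = ensemble.predict((0.1, 5.0))  # class "c" is still unknown here
ensemble.add_batch([(5.0, 0.1), (4.8, 0.0), (0.0, 5.0), (0.2, 5.1)], ["b", "b", "c", "c"])
ensemble.add_batch([(0.1, 0.0), (0.0, 0.3), (0.1, 4.9), (0.0, 5.2)], ["a", "a", "c", "c"])
after = ensemble.predict((0.1, 5.0))
print(before, after)
```

In this toy setup a classifier trained before a class existed can only vote for the classes it has seen, so a new class wins the vote only once enough later classifiers know it. A single later classifier can be out-voted or tied by earlier ones.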

 

The pseudocode of the original Learn++ algorithm, its block diagram, and its .m file are also available to the community. 

 

Data Fusion

In many applications that call for automated decision making, it is not unusual to receive data obtained from different sources that may provide complementary information. A suitable combination of such information is usually referred to as data or information fusion, and can lead to improved accuracy and confidence of the classification decision compared to a decision based on any of the individual data sources alone. Both incremental learning and data fusion involve learning from different sets of data. If the consecutive datasets that later become available are obtained from different sources and/or consist of different features, the incremental learning problem turns into a data fusion problem. Therefore, a suitable modification of Learn++ can be used for data fusion, where a new ensemble of classifiers is generated for each source that generates a different database. The following is a list of data fusion related papers on Learn++.

 

1. Parikh D. and Polikar R., “An Ensemble based incremental learning approach to data fusion,” IEEE Transactions on Systems, Man and Cybernetics, vol. 37, no. 2, pp. 437-500, 2007.

2. Polikar R., Topalis A., Green D., Kounios J., Clark C.M., “Comparative multiresolution analysis and ensemble of classifiers approach for early diagnosis of Alzheimer’s disease,” Computers in Biology and Medicine, vol. 37, no. 4, pp. 542-558, 2007.

3. Polikar R., Topalis A., Green D., Kounios J., Clark C.M., “Ensemble based data fusion for early diagnosis of Alzheimer’s disease,” Information Fusion, accepted / in print for 2007.

4. Lewitt M. and Polikar R., “An ensemble approach for data fusion with Learn++,” 4th Int. Workshop on Multiple Classifier Systems (MCS 2003), Springer LNCS vol. 2709, pp. 176-185, Surrey, England, June 11-13 2003.

5. Parikh D. and Polikar R., “A Multiple Classifier Approach for Multisensor Data Fusion,” Proc. of IEEE FUSION 2005, vol. 1, pp. 453-460, Philadelphia, PA, July 2005.

6. Parikh D., Kim M., Oagaro J., Mandayam S. and Polikar R., “Combining classifiers for multisensor data fusion,” Proc. of Int. IEEE Conf. on System Man Cybernetics (SMC 04), pp. 1232-1237, The Hague, The Netherlands, October 2004.

7. Parikh D., Kim M., Oagaro J., Mandayam S. and Polikar R., “Ensemble of classifiers approach for NDE data fusion,” Proc. of 2004 IEEE Int. Ultrasonics, Ferroelectrics and Frequency Control Joint Conf (UFFC2004), vol. 2, pp. 1062-1065, Montreal, Canada, August 2004.
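
The data fusion setting described above, where each source supplies its own feature set, can be sketched in Python as follows. This is an illustrative sketch only, not the modified Learn++ from the papers: it trains a single nearest-centroid expert per source rather than an ensemble per source, uses training accuracy as the voting weight, and the source names `sensor_a` and `sensor_b` are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: mean feature vector} for one data source."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def nearest(centroids, x):
    """Label of the centroid closest to x."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def train_fusion(sources, y):
    """sources: {name: list of feature vectors}; all sources share the labels y."""
    experts = {}
    for name, X in sources.items():
        centroids = fit_centroids(X, y)
        acc = sum(nearest(centroids, x) == t for x, t in zip(X, y)) / len(y)
        experts[name] = (centroids, acc)  # training accuracy used as voting weight
    return experts


def fuse_predict(experts, sample):
    """sample: {name: feature vector}; sources absent from the sample are skipped."""
    votes = defaultdict(float)
    for name, (centroids, weight) in experts.items():
        if name in sample:
            votes[nearest(centroids, sample[name])] += weight
    return max(votes, key=votes.get)


labels = [0, 0, 0, 1, 1, 1]
sources = {
    # A reliable one-feature source and a noisier two-feature source.
    "sensor_a": [[0.0], [0.2], [0.1], [1.0], [0.9], [1.1]],
    "sensor_b": [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9],
                 [1.0, 0.1], [0.0, 1.0], [0.9, 0.0]],
}
experts = train_fusion(sources, labels)
fused = fuse_predict(experts, {"sensor_a": [0.95], "sensor_b": [0.0, 1.0]})
print(fused)
```

When the two sources disagree, the source with the higher training accuracy carries the vote. A more faithful version would replace each expert with an ensemble and estimate the weights on held-out data.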

 

Confidence Estimation

Ensemble based systems can also be used to estimate the confidence of a classifier’s decision on any given instance. Intuitively, given a group of classifiers that classify a given instance, if all, or most, classifiers agree on the decision, we can reasonably assume that the classification system has high confidence in its decision. On the other hand, if the decision is made by just a slight majority, then there is less confidence in the decision. This idea can be formalized to estimate the confidence of a decision. In fact, we can use this approach to estimate the posterior probability of a class given a measurement vector. The following papers describe how Learn++ can be used for confidence estimation.

 

1. Muhlbaier M., Topalis A., Polikar R., “Ensemble confidence estimates posterior probability,” 6th Int. Workshop on Multiple Classifier Systems (MCS 2005), Springer Lecture Notes in Computer Science (LNCS), vol. 3541, pp. 326-335, Seaside, CA, June 2005.

2. Byorick J. and Polikar R., “Confidence estimation using incremental learning algorithm, Learn++,” Int. Conf. on Artificial Neural Networks (ICANN 2003), Springer LNCS vol. 2714, pp. 181-188, Istanbul, Turkey, 26-29 June 2003.
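
The agreement-based intuition described above can be made concrete with a short Python sketch. The normalization used here, each class's share of the total voting weight, is an assumption chosen for simplicity and is not necessarily the formula used in the papers listed; the labels are hypothetical.

```python
from collections import Counter


def vote_confidence(predictions, weights=None):
    """Return (winning label, {label: share of the total voting weight})."""
    if weights is None:
        weights = [1.0] * len(predictions)  # plain majority voting
    totals = Counter()
    for label, w in zip(predictions, weights):
        totals[label] += w
    total = sum(totals.values())
    shares = {label: v / total for label, v in totals.items()}
    winner = max(shares, key=shares.get)
    return winner, shares


# Ten classifiers nearly agree: the decision carries high confidence.
strong = vote_confidence(["defect"] * 9 + ["clean"])
# A slight majority: same decision, much lower confidence.
weak = vote_confidence(["defect"] * 6 + ["clean"] * 4)
print(strong, weak)
```

Both calls select the same class, but the share of votes behind the winner differs (0.9 versus 0.6), which is the quantity interpreted as confidence in this sketch.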

 

Missing Feature

Many classification algorithms, including most popular neural network architectures, require that the number and nature of the features be set before training. Since the underlying operation for most classifiers is a matrix multiplication, instances missing even a single feature cannot be processed by such classifiers, due to the missing number(s) in the vectors/matrices to be multiplied. Hence, to make a valid classification, the field or test data to be evaluated by the classifier must contain exactly the same set and number of features as the training data used to create the neural network. Missing data is not an uncommon occurrence in real-world applications, however. It is not unusual for training, validation or field data to have missing features in some (or even all) of their instances, as bad sensors, failed pixels, malfunctioning equipment, unexpected noise causing signal saturation, data corruption, etc. are all familiar scenarios in many practical applications. The missing feature problem can also be addressed by an ensemble based approach. Under the assumption that the feature set is redundant, we can train a sufficiently large number of classifiers, each with a random subset of the features. Instances with missing features are then classified by the majority voting of those classifiers whose training data did not include the missing features. We call the resulting algorithm Learn++.MF (for missing feature). The following papers describe the Learn++.MF algorithm.

 

1. DePasquale, J. and Polikar R., “Random feature subset selection for ensemble based classification of data with missing features,” 7th Int. Workshop on Multiple Classifier Systems, in Lecture Notes in Computer Science, vol. 4472, pp. 251-260, Springer, 2007.

2. Syed-Mohammed H., Stepenosky N., and Polikar R., “An Ensemble Technique to Handle Missing Data from Sensors,” IEEE Sensor Applications Symposium, Houston, TX, February 2006.

3. Krause S. and Polikar R., “An ensemble of classifiers for the missing feature problem,” Proc. of Int. Joint Conference on Neural Networks (IJCNN 2003), vol. 1, pp. 553-558, Portland, OR, 20-24 July 2003.
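
The random feature subset idea described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published Learn++.MF algorithm: the base classifier is a nearest-centroid rule, missing values are marked with `None`, and all function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def random_subsets(n_features, subset_size, n_classifiers, rng):
    """Draw one random subset of feature indices per ensemble member."""
    return [tuple(sorted(rng.sample(range(n_features), subset_size)))
            for _ in range(n_classifiers)]


def train_member(X, y, subset):
    """Nearest-centroid classifier restricted to the features in `subset`."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append([x[i] for i in subset])
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def train_ensemble(X, y, subsets):
    """Train one member per feature subset."""
    return [(subset, train_member(X, y, subset)) for subset in subsets]


def predict_with_missing(ensemble, x):
    """Majority vote of usable members; None if every member needs a missing feature."""
    votes = Counter()
    for subset, centroids in ensemble:
        if any(x[i] is None for i in subset):
            continue  # this member was trained on a feature that is missing in x
        z = [x[i] for i in subset]
        votes[min(centroids, key=lambda c: math.dist(z, centroids[c]))] += 1
    return votes.most_common(1)[0][0] if votes else None


# Redundant toy data: every feature alone separates the two classes.
X = [[0.1, 0.0, 0.2, 0.1], [0.0, 0.2, 0.1, 0.0],
     [0.9, 1.0, 0.8, 1.0], [1.0, 0.9, 1.0, 0.8]]
y = [0, 0, 1, 1]
subsets = random_subsets(n_features=4, subset_size=2, n_classifiers=20,
                         rng=random.Random(0))
ensemble = train_ensemble(X, y, subsets)
print(predict_with_missing(ensemble, [None, 0.9, 1.0, 0.9]))
```

The approach depends on the redundancy assumption stated above: if too many features are missing, no member remains usable and the sketch returns `None` instead of guessing.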

 

Robi Polikar, Research: Ensemble Systems & Learn++