|
Abstract:
|
Active learning techniques aim to reduce the amount of labeled data required for a supervised learner to achieve a certain level of performance. This can be very useful in domains where unlabeled data is easy to obtain but labeling data is costly. In this dissertation, I introduce methods for creating computationally efficient active learning techniques that handle different misclassification costs, different evaluation metrics, and different label acquisition costs. This is accomplished in part by developing techniques from utility-based data mining that are typically not studied in conjunction with active learning. I first address supervised learning problems where labeled data may be scarce, especially for one particular class. I revisit claims about cost-sensitive learning and resampling, a particularly popular approach to handling imbalanced data. The presented research shows that while resampling and cost-sensitive learning can be equivalent in some cases, the two approaches are not identical. This work on resampling and cost-sensitive learning motivates a need for active learners that can handle different misclassification costs. After presenting a cost-sensitive active learning algorithm, I show that this algorithm can be combined with a proposed framework for analyzing evaluation metrics to create an active learning approach that can optimize any evaluation metric expressible as a function of terms in a confusion matrix. Finally, I address methods for active learning under different utility costs incurred when labeling different types of points, particularly when label acquisition costs are spatially driven. |