ml classification


Classification is the process of predicting the class of given data sets. Classes are called as targets/ labels / categories. Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y).

A classification problem is when the output variable is a category, such as "red"/ "blue" / green or "disease" / "no disease". A classification model attempts to draw some conclusion from observed values. Given one or more inputs a classification model will try to predict the value of one or more outcomes. For example, when filtering emails "spam" or "not spam", when looking at transaction data, "fraudulent", or "authorized". In short Classification either predicts categorical class labels or classifies data (construct a model) based on the training set and the values (class labels) in classifying attributes and uses it in classifying new data.

Classification Use Cases

  • To find whether an email received is a spam or ham
  • To identify customer segments
  • To find if a bank loan is granted
  • To identify if a kid will pass or fail in an examination

Examples of Regression and Classification

  • Predicting the gender of a person by his/her handwriting style - Classification
  • Predicting house price based on area - Regression
  • Predicting whether monsoon will be normal next year - Classification
  • Predict the number of copies a music album will be sold next month - Regression

There are two types of learners in classification as lazy learners and eager learners.

1. Lazy learners

Lazy learners simply store the training data and wait until a testing data appear. When it does, classification is conducted based on the most related data in the stored training data. Compared to eager learners, lazy learners have less training time but more time in predicting.
Ex. k-nearest neighbor, Case-based reasoning

2. Eager learners

Eager learners construct a classification model based on the given training set before receiving data for classification. It must be able to commit to a single hypothesis that covers the entire instance space. Due to the model construction, eager learners take a long time for train and less time to predict.
Ex. Decision Tree, Naive Bayes, Artificial Neural Networks