Fri 13 January 2017 — under Bayes, Python, Machine Learning, Pandas
Naive Bayes is one of the simplest machine learning classification algorithms. As the name suggests, it is based on Bayes' theorem.
While doing my thesis on Probabilistic Programming, I often read about models and how they compare with the Naive Bayes classifier. Despite its simplicity, Naive Bayes is a solid baseline classifier that every machine learning student should know. But I never had the opportunity to fully understand this simple tool, mainly because I used it as a black box through the many implementations available, the most famous being the one from scikit-learn.
I was inspired by Sebastian Raschka's argument for implementing machine learning algorithms from scratch. I completely agree that it improves your learning. So here I start with a simple implementation of Naive Bayes, and later I will look at it from the Probabilistic Programming perspective.
Bayes' theorem states:
$$ P(A | B) = \frac{P(B | A) P(A)}{P(B)}$$
This formula can be reinterpreted in machine learning terms of features and class label. Classification is the problem of assigning a class based on the features provided.
$$ P(class | features ) = \frac{P(features | class) P(class)}{P(features)}$$
For example, suppose we need to classify a person's sex based on height and weight. Here class = {male, female} and features = {height, weight}, and the formula can be re-written as:
$$ P(sex | height, weight ) = \frac{P(height, weight | sex) P(sex)}{P(height, weight)}$$
Based on this information, let's go ahead and implement the algorithm on a discretized problem, making use of vectorization. To understand what vectorization means here, please read From Python to Numpy.
# Dataset
import pandas as pd
data = pd.read_csv ('./naive_bayes_dataset.csv')
print (data)
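If you don't have the CSV at hand, a small stand-in frame with the same columns lets you run the rest of the post. The rows below are made up purely for illustration; they are not the author's actual dataset.
# Illustrative stand-in for naive_bayes_dataset.csv (values are made up,
# but use the same columns and vocabulary as the rest of the post)
data = pd.DataFrame({
    'Age':           ['<=30', '<=30', '31-40', '>40', '>40', '31-40'],
    'Income':        ['high', 'medium', 'high', 'medium', 'medium', 'low'],
    'Student':       ['no', 'yes', 'no', 'yes', 'yes', 'yes'],
    'Credit_Rating': ['fair', 'fair', 'excellent', 'fair', 'excellent', 'excellent'],
    'Buys_Computer': ['no', 'yes', 'yes', 'yes', 'no', 'yes'],
})
print(data)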
The naive part of the Naive Bayes algorithm is the assumption that the features are mutually independent given the class label (which is rarely true in practice). But it allows us to simplify the mathematics:
$$ P(Age, Income, Student, Credit rating | Buys computer) = P(Age | Buys computer) * P(Income | Buys computer) * P(Student | Buys computer) * P(Credit rating | Buys computer)$$
The prior is P(Buys computer): the number of times each class (yes/no) appears divided by the total number of observations. We need both P(Buys computer = Yes) and P(Buys computer = No), and we compute them with the groupby function from pandas:
prior = data.groupby('Buys_Computer').size().div(len(data))  # class counts / total observations
print(prior)
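As a quick sanity check (not part of the original post), the same priors can be obtained with value_counts:
# Equivalent computation of the class priors; normalize=True divides the
# counts by the total number of rows
print(data['Buys_Computer'].value_counts(normalize=True))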
The likelihood is generated for each feature of the dataset. Basically, the likelihood is the probability of observing each feature value given the class label.
$$ P(Age | Buys computer) $$
$$ P(Income | Buys computer) $$
$$ P(Student | Buys computer) $$
$$ P(Credit rating | Buys computer) $$
likelihood = {}
likelihood['Credit_Rating'] = data.groupby(['Buys_Computer', 'Credit_Rating']).size().div(len(data)).div(prior)
likelihood['Age'] = data.groupby(['Buys_Computer', 'Age']).size().div(len(data)).div(prior)
likelihood['Income'] = data.groupby(['Buys_Computer', 'Income']).size().div(len(data)).div(prior)
likelihood['Student'] = data.groupby(['Buys_Computer', 'Student']).size().div(len(data)).div(prior)
print (likelihood)
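Dividing the joint frequencies by the prior is what turns P(feature, class) into the conditional P(feature | class). As a sanity check (again, not part of the original post), the same table for one feature can be produced directly with a grouped value_counts:
# P(Age | Buys_Computer) computed directly: value_counts(normalize=True)
# normalizes the counts within each Buys_Computer group
print(data.groupby('Buys_Computer')['Age'].value_counts(normalize=True))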
We need to predict whether a person will buy a computer given the following new information:
{"Age":'<=30', "Income":"medium", "Student":'yes' , "Credit_Rating":'fair'}
Substituting these values into the likelihood tables and the Bayes formula, we get:
# Probability that the person will buy
p_yes = likelihood['Age']['yes']['<=30'] * likelihood['Income']['yes']['medium'] * \
likelihood['Student']['yes']['yes'] * likelihood['Credit_Rating']['yes']['fair'] \
* prior['yes']
# Probability that the person will NOT buy
p_no = likelihood['Age']['no']['<=30'] * likelihood['Income']['no']['medium'] * \
likelihood['Student']['no']['yes'] * likelihood['Credit_Rating']['no']['fair'] \
* prior['no']
print ('Yes : ', p_yes)
print ('No : ', p_no)
As we can see, there is a higher probability that the person will buy the computer.
Note: we don't need to calculate the denominator of the Bayes formula, because in the end we only compare probabilities between classes, and dividing both by the same number doesn't change the comparison.
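If we do want actual posterior probabilities rather than just a comparison, we can normalize the two unnormalized scores, since for the two classes the denominator P(features) is exactly their sum. A minimal sketch:
# Normalizing the unnormalized scores recovers the posteriors, because
# P(features) = p_yes + p_no when summed over both classes
evidence = p_yes + p_no
print('P(yes | features) : ', p_yes / evidence)
print('P(no  | features) : ', p_no / evidence)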
We now try to solve the same problem using the Naive Bayes classifier implemented in the sklearn library.
from sklearn.preprocessing import LabelEncoder
encoded_data = data.apply(LabelEncoder().fit_transform)
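# Note: DataFrame.apply refits the LabelEncoder on every column, so each
# categorical value is mapped to an integer in sorted order of that column's values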
from sklearn.naive_bayes import MultinomialNB
import numpy as np
clf = MultinomialNB()
clf.fit(encoded_data.drop(['Buys_Computer'], axis=1), encoded_data['Buys_Computer'])
# {"Age":'<=30', "Income":"medium", "Student":'yes' , "Credit_Rating":'fair'}
# The data is encoded as [1,2,1,1]
X = np.array([1,2,1,1])
print (clf._joint_log_likelihood(X.reshape(1,-1)))
print ("Prediction : ", clf.predict(X.reshape(1,-1)))
Thus, even with sklearn the answer is YES. Note that sklearn works with the log-likelihood rather than the likelihood itself.
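To relate these log-likelihoods back to probabilities, predict_proba exponentiates and normalizes them, mirroring the p_yes / (p_yes + p_no) normalization done by hand above. The numbers will not match the hand-computed ones exactly, because MultinomialNB treats the label-encoded integers as counts, but the predicted class is the same here.
# Posterior class probabilities: exp of the joint log-likelihoods,
# normalized so they sum to one
print(clf.predict_proba(X.reshape(1, -1)))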
The beauty of Naive Bayes for a discretized feature set is that it only involves counting and multiplication to get the answer. The algorithm can also be extended to continuous feature variables; for those we need to decide which probability distribution to use for each feature's likelihood.
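For instance, for the height/weight example from the beginning, a common choice is a Gaussian likelihood per feature and class, which is what sklearn's GaussianNB implements. A minimal sketch with made-up numbers (the data here is purely illustrative):
# Gaussian Naive Bayes for continuous features (illustrative data only)
import numpy as np
from sklearn.naive_bayes import GaussianNB

# [height_cm, weight_kg] pairs and the corresponding sex labels
X_hw = np.array([[180, 80], [175, 72], [168, 60], [160, 55], [172, 65], [185, 90]])
y_sex = np.array(['male', 'male', 'female', 'female', 'female', 'male'])

gnb = GaussianNB().fit(X_hw, y_sex)
print(gnb.predict([[170, 68]]))        # predicted sex for a new person
print(gnb.predict_proba([[170, 68]]))  # posterior probability per class (order: gnb.classes_)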