1 Intro
Besides content-based filtering, collaborative filtering is another commonly used recommender algorithm.
Collaborative Filtering
Definition from Wikipedia
Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. By locating peer users/items with a rating history similar to the current user or item, they generate recommendations using this neighborhood.
While content-based filtering focuses on the similarity between the choices themselves, collaborative filtering relies on the people who make those choices.
2 Getting Started
I’ll use a restaurant ratings dataset from Kaggle: UCI Restaurant.
2.1 Loading data
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
import pprint
from collections import Counter
### Loading Data
records = pd.read_csv("datasets_2719_4497_rating_final.csv")
print(records)
userID placeID rating food_rating service_rating
0 U1077 135085 2 2 2
1 U1077 135038 2 2 1
2 U1077 132825 2 2 2
3 U1077 135060 1 2 2
4 U1068 135104 1 1 2
... ... ... ... ... ...
1156 U1043 132630 1 1 1
1157 U1011 132715 1 1 0
1158 U1068 132733 1 1 0
1159 U1068 132594 1 1 1
1160 U1068 132660 0 0 0
2.2 Pivot Table
Similar to count vectorizing in content-based filtering, we want to convert the rows of records into a vector characterizing each customer’s preferences. To do that, we create a pivot table holding each customer’s rating of each restaurant, with 0 for restaurants the customer has not rated.
# One row per user, one column per restaurant; unrated restaurants become 0
pt = records.pivot(index='userID', columns='placeID', values='rating').fillna(0)
print(pt)
placeID 132560 132561 132564 132572 132583 132584 132594 ... 135085 135086 135088 135104 135106 135108 135109
userID ...
U1001 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
U1002 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0 0.0 1.0 0.0 0.0
U1003 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
U1004 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 2.0 0.0 0.0
U1005 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
U1134 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.0 0.0 0.0 0.0 0.0 0.0 0.0
U1135 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
U1136 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
U1137 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.0 0.0 0.0 0.0 0.0 0.0 0.0
U1138 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
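To see what the pivot does on a small scale, here is a minimal sketch on a made-up three-user table (the IDs and ratings are invented for illustration, not taken from the dataset). One caveat worth keeping in mind: this dataset uses 0 as a valid rating, so after `fillna(0)` a "not rated" cell looks the same as a rating of 0.

```python
import pandas as pd

# Toy ratings, invented for illustration only
toy = pd.DataFrame({
    "userID":  ["A", "A", "B", "B", "C"],
    "placeID": [10, 20, 10, 30, 20],
    "rating":  [2, 1, 2, 1, 2],
})

# One row per user, one column per restaurant; unrated cells become 0
toy_pt = toy.pivot(index="userID", columns="placeID", values="rating").fillna(0)
print(toy_pt)
```

Each row of `toy_pt` is now a fixed-length vector, so any two users can be compared position by position.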
2.3 Similar Customers
Let’s use the first user as the recommendation target. The first step is to calculate every customer’s similarity to the target and sort the customers by similarity score. A higher score means the two customers rated more of the same restaurants in a similar way.
# Pairwise cosine similarity between all customers' rating vectors
similarity = cosine_similarity(pt)
target = 0  # row position of the first customer
customerID = pt.index.to_list()[target]  # first customer's ID
score = zip(pt.index.to_list(), similarity[target, :].tolist())  # (userID, similarity score) pairs
scoreSorted = sorted(score, key=lambda t: t[1], reverse=True)  # most similar first
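For intuition about the scores, cosine similarity between two rating vectors u and v is u·v / (|u| |v|). Below is a minimal sketch with NumPy on made-up vectors; the helper name `cosine` is hypothetical, and it is only meant to illustrate the formula, not to replace `cosine_similarity` from scikit-learn.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two 1-D rating vectors (hypothetical helper)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    # A user with no ratings has a zero vector; treat the similarity as 0
    return float(u @ v / denom) if denom else 0.0

# Made-up ratings for three restaurants
a = [2, 1, 0]
b = [2, 0, 1]
c = [0, 0, 2]

print(cosine(a, a))  # identical vectors: approximately 1
print(cosine(a, b))  # one restaurant in common: approximately 0.8
print(cosine(a, c))  # no restaurant in common: 0
```

Two users with no rated restaurant in common score 0, and a user compared with themself scores 1, which is why the target customer always tops the sorted list.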
Let’s filter out the near-perfect matches (>99%), simply because they are too similar to offer anything new; this also drops the target customer, whose similarity to themself is 1. From the rest, we select the top users (the top 10 here).
# Keep the top 10 users with a similarity above 10% and below 99% (nothing new)
similarCU = [s for s in scoreSorted if 0.1 < s[1] < 0.99][:10]
similarCU_ID = [s[0] for s in similarCU]
pprint.pprint(similarCU)
[('U1036', 0.4173919355648411),
('U1054', 0.4173919355648411),
('U1092', 0.40406101782088427),
('U1116', 0.3970333335883721),
('U1055', 0.3954372976473721),
('U1071', 0.3940552031195504),
('U1104', 0.390094748802747),
('U1024', 0.38188130791298674),
('U1045', 0.3585685828003181),
('U1132', 0.35355339059327373)]
2.4 Making Recommendations
Now we have the top customers who share the target’s preferences. The next step is to find out what is popular among them:
1) Select the restaurants that received high scores from these users
2) Recommend the ones our target customer hasn’t rated yet
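# Restaurants that the similar customers rated 2 or higher
match = records.loc[(records['rating'] >= 2) & (records['userID'].isin(similarCU_ID)), 'placeID'].tolist()
# Order restaurants by how many similar customers rated them highly
matchSorted = [s[0] for s in Counter(match).most_common()]
# Drop restaurants the target customer has already rated
rated = set(records.loc[records['userID'] == customerID, 'placeID'])
recommended = [place for place in matchSorted if place not in rated]
print(recommended[:3])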
[135025, 132825, 135085]
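The counting logic is easier to follow on a handful of rows. This is a minimal sketch in plain Python with made-up records; the user names, place IDs, and the neighbour set are all invented for illustration.

```python
from collections import Counter

# Made-up (userID, placeID, rating) records for illustration
toy_records = [
    ("T", 1, 2),                                # target already rated place 1
    ("N1", 1, 2), ("N1", 2, 2), ("N1", 3, 1),
    ("N2", 2, 2), ("N2", 4, 2),
    ("X", 5, 2),                                # not a neighbour, ignored
]
neighbours = {"N1", "N2"}
target = "T"

# Places the neighbours rated highly (rating >= 2)
liked = [p for u, p, r in toy_records if u in neighbours and r >= 2]
# Places the target has already rated
seen = {p for u, p, r in toy_records if u == target}

# Most frequently liked first, skipping anything the target has seen
ranked = [p for p, _ in Counter(liked).most_common() if p not in seen]
print(ranked)  # [2, 4]
```

Place 2 ranks first because both neighbours rated it highly; place 1 is dropped because the target has already rated it, and place 3 never qualifies because its rating is below the threshold.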
3 Summary
Collaborative filtering is often more practical in e-commerce, because analyzing item descriptions (content-based filtering) is resource-intensive and can be inaccurate, whereas vectorizing purchase records is quick and simple.
Building a collaborative recommender:
– 1. Convert the records to a decisions table (the pivot table)
– 2. Calculate similarity scores
– 3. Filter similar people
– 4. Recommend popular choices among them
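The four steps can be gathered into one function. What follows is a sketch under stated assumptions rather than the post's exact code: the function name `recommend` and all its parameters are hypothetical, the cosine similarity is computed with plain NumPy instead of scikit-learn to keep the example self-contained, and `pivot` assumes each (userID, placeID) pair appears at most once.

```python
import numpy as np
import pandas as pd
from collections import Counter

def recommend(records, user_id, k=10, lo=0.1, hi=0.99, min_rating=2, n=3):
    """Hypothetical helper: user-based recommendations from a ratings table."""
    # 1. Records -> user x place table (unrated = 0)
    pt = records.pivot(index="userID", columns="placeID", values="rating").fillna(0)

    # 2. Cosine similarity between the target and every user
    m = pt.to_numpy(dtype=float)
    norms = np.linalg.norm(m, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with no ratings
    unit = m / norms[:, None]
    sims = pd.Series(unit @ unit[pt.index.get_loc(user_id)], index=pt.index)

    # 3. Keep the k most similar users with a score strictly between lo and hi
    neighbours = sims[(sims > lo) & (sims < hi)].nlargest(k).index

    # 4. Rank places the neighbours rated highly, minus those already rated
    liked = records.loc[(records["rating"] >= min_rating)
                        & records["userID"].isin(neighbours), "placeID"]
    seen = set(records.loc[records["userID"] == user_id, "placeID"])
    ranked = [p for p, _ in Counter(liked.tolist()).most_common() if p not in seen]
    return ranked[:n]

# Made-up data for illustration
toy = pd.DataFrame({
    "userID":  ["T", "T", "N", "N", "N", "X"],
    "placeID": [1, 2, 1, 2, 3, 4],
    "rating":  [2, 2, 2, 1, 2, 2],
})
print(recommend(toy, "T"))
```

In the toy data, user N overlaps with T on two restaurants and so becomes the only neighbour, while X shares nothing with T and is ignored; the only place N liked that T has not rated is place 3.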