1 Intro

Besides content-based filtering, collaborative filtering is another commonly used recommender algorithm.

Collaborative Filtering

Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like items similar to the ones they liked before. The system generates recommendations using only the rating profiles of users or items. It locates peer users or items whose rating history is similar to that of the current user or item, then generates recommendations from this neighborhood.

While content-based filtering focuses on the similarity of the choices themselves, collaborative filtering relies on the people who make those choices.

2 Getting Started

I’ll use a restaurant rating dataset from Kaggle: UCI Restaurant.

2.1 Loading data

from sklearn.metrics.pairwise import cosine_similarity 
import pandas as pd
import pprint
from collections import Counter
### Loading Data
records = pd.read_csv("datasets_2719_4497_rating_final.csv")
print(records)

     userID  placeID  rating  food_rating  service_rating
0     U1077   135085       2            2               2
1     U1077   135038       2            2               1
2     U1077   132825       2            2               2
3     U1077   135060       1            2               2
4     U1068   135104       1            1               2
...     ...      ...     ...          ...             ...
1156  U1043   132630       1            1               1
1157  U1011   132715       1            1               0
1158  U1068   132733       1            1               0
1159  U1068   132594       1            1               1
1160  U1068   132660       0            0               0

2.2 Pivot Table

Similar to count vectorizing in content-based filtering, we want to convert the rows of records into vectors characterizing each customer’s preferences. To do that, we create a pivot table holding each customer’s rating of each restaurant.

# One row per user, one column per restaurant; unrated restaurants become 0
pt = records.pivot(index='userID', columns='placeID', values='rating').fillna(0)
print(pt)

placeID  132560  132561  132564  132572  132583  132584  132594  ...  135085  135086  135088  135104  135106  135108  135109
userID                                                           ...                                                    
U1001       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0
U1002       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     1.0     0.0     0.0     0.0     1.0     0.0     0.0
U1003       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0
U1004       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     0.0     0.0     0.0     0.0     2.0     0.0     0.0
U1005       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0
...         ...     ...     ...     ...     ...     ...     ...  ...     ...     ...     ...     ...     ...     ...     ...
U1134       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     2.0     0.0     0.0     0.0     0.0     0.0     0.0
U1135       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0
U1136       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0
U1137       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     2.0     0.0     0.0     0.0     0.0     0.0     0.0
U1138       0.0     0.0     0.0     0.0     0.0     0.0     0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0
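
One caveat worth keeping in mind: ratings in this dataset can themselves be 0 (see the last row of the records above), so after fillna(0) an unrated restaurant and a restaurant rated 0 look identical. The sketch below uses a made-up three-row table (the toy values and the names toy, wide, rated and filled are assumptions for illustration, not part of the dataset) to show how a boolean mask taken before filling keeps that distinction.

```python
import pandas as pd

# Toy records, invented for illustration only
toy = pd.DataFrame({
    "userID":  ["A", "A", "B"],
    "placeID": [1, 2, 1],
    "rating":  [2, 0, 1],
})

wide = toy.pivot(index="userID", columns="placeID", values="rating")
rated = wide.notna()      # True where the user actually rated the place
filled = wide.fillna(0)   # same layout as pt in the post

print(filled)  # A's explicit 0 for place 2 and B's missing rating both show as 0
print(rated)   # the mask still tells them apart
```

Whether the distinction matters depends on how the scores are used later; the rest of this post keeps the simpler zero-filled table.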

2.3 Similar Customers

Let’s use the first user as the recommendation target. The first step is to calculate how similar every other customer is to the target, then sort the customers by similarity score. A higher similarity score means they share more favorite restaurants.

similarity = cosine_similarity(pt)  # user-by-user cosine similarity matrix
target = 0  # first customer
customerID = pt.index.to_list()[target]  # first customer's ID
score = zip(pt.index.to_list(), similarity[target, :].tolist())  # (userID, similarity score) pairs
scoreSorted = sorted(score, key=lambda t: t[1], reverse=True)  # most similar first
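
For readers unfamiliar with the cosine_similarity function imported at the top, here is a minimal sketch of the measure it is named after: the dot product of two rating vectors divided by the product of their lengths. The helper name cosine, the zero-vector convention, and the three toy users are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention used here: an all-zero vector matches nobody
    return dot / (norm_u * norm_v)

# Toy users: one value per restaurant, 0 where nothing was rated
alice = [2, 0, 1, 0]
bob   = [2, 0, 0, 0]
carol = [0, 2, 0, 1]

print(cosine(alice, bob))    # high: both gave the first restaurant a 2
print(cosine(alice, carol))  # 0.0: no restaurant in common
```

Because the score depends only on the angle between the vectors, two customers who rated the same places get a high score even if one of them rated many more places overall.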

Let’s filter out the perfect matches (>99%), simply because they are too similar to offer anything new. This also drops the target customer, whose similarity to themselves is 1. Then we select the top users (the top 10 here).

# Keep the top 10 users with a similarity above 10% and below 99% (nothing new)
similarCU = [s for s in scoreSorted if 0.1 < s[1] < 0.99][:10]
similarCU_ID = [s[0] for s in similarCU]
pprint.pprint(similarCU)

[('U1036', 0.4173919355648411),
 ('U1054', 0.4173919355648411),
 ('U1092', 0.40406101782088427),
 ('U1116', 0.3970333335883721),
 ('U1055', 0.3954372976473721),
 ('U1071', 0.3940552031195504),
 ('U1104', 0.390094748802747),
 ('U1024', 0.38188130791298674),
 ('U1045', 0.3585685828003181),
 ('U1132', 0.35355339059327373)]

2.4 Making Recommendations

Now we have the top customers who share the target’s preferences. The next step is to find out what’s popular among them:
1) Select the restaurants that received high scores from these users
2) Recommend the ones our target customer hasn’t rated yet

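# Restaurants rated 2 or higher by any of the similar customers
match = records.loc[(records['rating'] >= 2) & (records['userID'].isin(similarCU_ID)), 'placeID'].tolist()
# Order them by how many similar customers gave that score
matchSorted = [s[0] for s in Counter(match).most_common()]
# Drop restaurants the target customer has already rated
rated = set(records.loc[records['userID'] == customerID, 'placeID'].tolist())
recommended = [place for place in matchSorted if place not in rated]
print(recommended[:3])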

[135025, 132825, 135085]

3 Summary

Collaborative filtering is usually more practical in e-commerce, because analyzing item descriptions (content-based filtering) can be resource-consuming and inaccurate, whereas vectorizing purchase records is quick and simple.
Building a collaborative recommender:
– 1. Convert the records to a decision table
– 2. Calculate similarity scores
– 3. Filter similar people
– 4. Recommend popular choices among them
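
The four steps above can be sketched as a single function. This is one possible arrangement rather than the exact code used earlier: the function name recommend, its parameters, and the toy table are assumptions, and cosine similarity is computed directly with NumPy instead of through scikit-learn so the block only needs pandas and NumPy.

```python
from collections import Counter

import numpy as np
import pandas as pd


def recommend(records, user, top_users=10, low=0.1, high=0.99, min_rating=2, n=3):
    """Recommend up to n placeIDs for `user` from a userID/placeID/rating table."""
    # 1. Records -> decision table (users x places, unrated = 0)
    pt = records.pivot(index="userID", columns="placeID", values="rating").fillna(0)

    # 2. Cosine similarity between the target and every user
    m = pt.to_numpy(dtype=float)
    norms = np.linalg.norm(m, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = m / norms[:, None]
    scores = unit @ unit[pt.index.get_loc(user)]

    # 3. Keep the most similar users whose score lies strictly between low and high
    ranked = sorted(zip(pt.index, scores), key=lambda t: t[1], reverse=True)
    peers = [u for u, s in ranked if low < s < high][:top_users]

    # 4. Popular, highly rated places among the peers that the target has not rated
    liked = records[(records["rating"] >= min_rating) & records["userID"].isin(peers)]
    seen = set(records.loc[records["userID"] == user, "placeID"].tolist())
    popular = [p for p, _ in Counter(liked["placeID"].tolist()).most_common()]
    return [p for p in popular if p not in seen][:n]


# Toy data, invented for illustration only
toy = pd.DataFrame({
    "userID":  ["A", "A", "B", "B", "B", "C", "C"],
    "placeID": [1, 2, 1, 2, 3, 4, 5],
    "rating":  [2, 1, 2, 2, 2, 2, 2],
})
print(recommend(toy, "A"))  # B is A's only peer, and place 3 is the one A has not rated
```

Keeping the thresholds as parameters makes it easy to experiment with how wide the neighborhood should be for a given dataset.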
