5. Univariate sequential encoding
It is time to build the sequential mechanism that keeps track of user choices over time. The mechanism I devised works on two separate vectors (which, at the end of the process, merge into one, hence univariate): a historical vector and a caching vector.
The historical vector is the one used to perform kNN on the existing clusters. Once a session is concluded, we update the historical vector with the new user choices. At the same time, we adjust the existing values with a decay function that diminishes their weights over time. By doing so, we keep up with customer trends and give more weight to recent choices rather than older ones.
Rather than updating the vector each time a user makes a choice (which is not computationally efficient, and which also risks letting older choices decay too quickly, since every interaction would trigger the decay mechanism), we store a temporary vector that is only valid for the current session. Each user interaction, converted into a vector using the tag frequency as a one-hot weight, is summed into the existing cached vector.
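As a rough sketch of this caching step (assuming numpy, a tag_index mapping from tag name to dimension, and a tag_frequency dictionary of one-hot weights; these names are illustrative, not the notebook's API):

import numpy as np

def encode_interaction(tags, tag_index, tag_frequency, n_dims):
    # convert a single user choice (a list of tags) into a one-hot weighted vector
    v = np.zeros(n_dims)
    for tag in tags:
        v[tag_index[tag]] = tag_frequency[tag]
    return v

def update_cache(cached_vector, tags, tag_index, tag_frequency):
    # sum the new interaction into the session cache; the historical vector is untouched
    return cached_vector + encode_interaction(tags, tag_index, tag_frequency, len(cached_vector))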
Once the session is closed, we retrieve the historical vector from the database, merge it with the cached vector, and apply the adjustment mechanisms (such as the decay function and pruning, as we will see later). After the historical vector has been updated, it is stored in the database, replacing the old one.
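A minimal sketch of the session-close update, assuming a simple multiplicative decay (the default weights echo the values passed to the notebook's update_vector in the test further below, but the function here is only an illustration, not the notebook's implementation):

def merge_on_session_close(historical_vector, cached_vector, new_weight=1.0, decay=0.8):
    # decay the older weights, then add the new session at full weight
    return decay * historical_vector + new_weight * cached_vector

# end-of-session flow (the database helpers are placeholders):
# historical = load_historical_vector(user_id)
# historical = merge_on_session_close(historical, cached_vector)
# historical = prune(historical)   # see the pruning mechanism below
# store_historical_vector(user_id, historical)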
The two reasons to follow this approach are to minimize the weight difference between older and newer interactions and to make the entire process scalable and computationally efficient.
6. Pruning Mechanism
The system is now complete; however, one problem remains. Covariate encoding has a flaw: its base vector scales proportionally with the number of encoded tags. For example, if our database were to reach 100k tags, the vector would have an equivalent number of dimensions.
The original covariate encoding architecture already takes this problem into account, proposing a PCA compression mechanism as a solution. Applied to our recommender, however, PCA causes issues when iteratively summing vectors, resulting in information loss. Because every user choice triggers a summation of the existing vector with a new one, this solution is not advisable.
If we cannot compress the vector, however, we can prune the dimensions with the lowest scores. The system will then execute a kNN based only on the most relevant scores of the vector; this direct method of feature engineering does not negatively (or at least not excessively) affect the results of the final recommendation.
By pruning our vector, we can arbitrarily set a maximum number of dimensions. Without altering the tag indexes, we can then operate on sparse vectors rather than dense ones: a data structure that only stores the active indexes of our vectors and can therefore scale indefinitely. We can compare the recommendations obtained from a full (dense) vector against those from a pruned (sparse) vector.
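A possible sketch of the pruning step, keeping only the top max_dims scores and storing the result as a sparse index-to-weight mapping (the function name and the max_dims value are illustrative):

import numpy as np

def prune_to_sparse(vector, max_dims=100):
    # keep the max_dims highest-scoring dimensions; tag indexes are preserved,
    # so the pruned (sparse) vector stays comparable with the full (dense) one
    top = np.argsort(vector)[-max_dims:]
    return {int(i): float(vector[i]) for i in top if vector[i] > 0}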
As we can see, there are minor differences, but the overall integrity of the vector has been maintained in exchange for scalability. A very intuitive alternative to this process is to perform clustering at the tag level, keeping the vector size fixed. In that case, each new tag is assigned to the semantically closest existing tag and does not occupy a dedicated dimension.
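That alternative could look roughly like this, assuming some form of tag embeddings is available (the embeddings and the fixed anchor set are hypothetical, since the article does not specify how tags would be compared semantically):

import numpy as np

def fold_into_closest_tag(new_tag_embedding, anchor_embeddings):
    # anchor_embeddings: (n_anchor_tags, d) matrix of the fixed tag set;
    # the new tag is folded into the dimension of the closest anchor tag
    sims = anchor_embeddings @ new_tag_embedding
    sims /= (np.linalg.norm(anchor_embeddings, axis=1) * np.linalg.norm(new_tag_embedding) + 1e-9)
    return int(np.argmax(sims))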
7. Exemplar estimation
Now that you have fully grasped the theory behind this new approach, we can compare the two approaches more clearly. In a multivariate approach, the first step was to identify the top user preferences using clustering. As we can see, this process required us to store as many vectors as there were exemplars found.
In a univariate approach, however, because covariate encoding works on a transposed version of the encoded data, we can use sections of our historical vector to store user preferences, and thus only use a single vector for the entire process. We use the historical vector as a query to search through the encoded tags: the top-k results of a kNN search are equivalent to the top-k preferential clusters.
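A sketch of this exemplar estimation with scikit-learn (encoded_tags stands for the matrix of covariate-encoded tags; the real notebook may organize this step differently):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def top_k_preferential_clusters(historical_vector, encoded_tags, k=3):
    # the historical vector is the query; its k nearest encoded tags
    # correspond to the top-k preferential clusters
    knn = NearestNeighbors(n_neighbors=k, metric='cosine').fit(encoded_tags)
    _, indices = knn.kneighbors(np.asarray(historical_vector).reshape(1, -1))
    return indices[0]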
8. Recommendation approaches
Now that we have captured more than one preference, how do we plan to recommend items? This is the major difference between the two systems. The traditional multivariate recommender will use the exemplar to recommend k items to a user. However, our system has assigned our customer one supercluster and the top subclusters under it (depending on our level of tag segmentation, we can increase the number of levels). We will not recommend the top k items, but the top k subclusters.
Using groupby instead of vector search
So far, we have been using a vector to store data, but that does not mean we need to rely on vector search to perform recommendations: it would be much slower than a SQL operation. Note that obtaining exactly the same results using vector search on the user array is indeed possible.
If you are wondering why you would switch from a vector-based system to a count-based system, it is a legitimate question. The simple answer is that this is the most faithful replica of the multivariate system (as portrayed in the reference images), but much more scalable (it can reach up to 3000 recommendations/s on 16 CPU cores using pandas). Originally, the univariate recommender was designed to employ vector search, but, as showcased, there are simpler and better search algorithms for this step.
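A hedged sketch of the count-based version with pandas (the label_1 and label_2 column names are assumed from the cluster labels shown in the notebook output; compute_recommendation itself may be implemented differently):

import pandas as pd

def recommend_subclusters(user_interactions: pd.DataFrame, k=3):
    # count-based alternative to vector search: a simple groupby over the
    # cluster labels returns the user's top-k subclusters
    return (user_interactions
            .groupby(['label_1', 'label_2'])
            .size()
            .sort_values(ascending=False)
            .head(k))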
Let us run a full test that we can monitor. We can use the code from the sample notebook: in this simple example, the user selects at least one game labeled with the corresponding tags in each session.
# if no vector exists, the first choices are the historical vector
historical_vector = user_choices(5, tag_lists=[['Shooter', 'Fantasy']], tag_frequency=tag_frequency, display_tags=False)

# day1
cached_vector = user_choices(3, tag_lists=[['Puzzle-Platformer'], ['Dark Fantasy'], ['Fantasy']], tag_frequency=tag_frequency, display_tags=False)
historical_vector = update_vector(historical_vector, cached_vector, 1, 0.8)

# day2
cached_vector = user_choices(3, tag_lists=[['Puzzle'], ['Puzzle-Platformer']], tag_frequency=tag_frequency, display_tags=False)
historical_vector = update_vector(historical_vector, cached_vector, 1, 0.8)

# day3
cached_vector = user_choices(3, tag_lists=[['Adventure'], ['2D', 'Turn-Based']], tag_frequency=tag_frequency, display_tags=False)
historical_vector = update_vector(historical_vector, cached_vector, 1, 0.8)

compute_recommendation(historical_vector, label_1_max=3)
At the end of 3 sessions, these are the top 3 exemplars (label_1) extracted from our recommender:
In the notebook, you will find the option to perform Monte Carlo simulations, but there would be no easy way to validate them (mostly because Steam games are not tagged with the highest accuracy, and I noticed that most small games list too many unrelated or common tags).
The architectures of the most popular recommender systems still do not take session history into account, but with the development of new algorithms and the increase in computing power, it is now possible to tackle this higher level of complexity.
This new approach should offer a comprehensive alternative to the sequential recommender systems available on the market, but I am convinced that there is always room for improvement. To further enhance this architecture, it would be possible to switch from a clustering-based approach to a network-based one.
It is important to note that this recommender system can only excel when applied to a limited number of domains, but it has the potential to shine in conditions of scarce computational resources or extremely high demand.