Lots of sites use algorithms to suggest new content to users. Some people see this as a problem in itself, since these algorithms are often designed simply to keep users on the platform and distract them with shiny nonsense to generate ad revenue. In addition, a common complaint is that they lead to so-called 'filter bubbles', where people have their existing biases and prejudices constantly reinforced. I think that in both cases these objections miss the real issue, but that is outside the scope of this text. What I want to talk about is how we could drastically improve the effectiveness and usability of these algorithms on virtually every platform.
To start with I will use youtube as an example, but this discussion applies far more widely than just one platform. For a long time I have been using youtube in a non-standard way: rather than having an account and logging in, I have been relying on cookies to track what I watch for the recommendation algorithm. I do like recommendation algorithms in general, and youtube's in particular; I have discovered a lot of interesting new music and learned a lot of simple skills that I would otherwise never have stumbled across. At a basic level, if you only look for one type of thing, you get new suggestions for variations on that theme in quite a robust way. But therein lies the problem. Who only looks for one type of thing? If you use youtube for music and only listen to one genre, or a collection of related genres, you will likely get some interesting suggestions and find new stuff to listen to. Then one day someone links you a funny minecraft video and suddenly you are getting a wall of minecraft videos on your suggestion list instead of music. Even worse, if you are interested in more than one thing, you get a crazy mix of all of them, and the effectiveness of the suggestions is heavily diluted by trying to match too many different interests at once.
When I first started doing this, having a youtube account offered no advantage here; the suggestion quality was effectively the same with an account as without one. Over the years I have noticed, looking over the shoulders of people who do have accounts, that there have been some improvements. For one, youtube identifies categories in what you watch and lets you filter based on them. I do not have an account and do not use any of these features, so there may be a lot of cool things I am missing out on. If you love your youtube account and think everything I suggest here is already covered, then great, but there are reasons other than usability that make me not want one. I am trying to use google accounts as little as possible in general. They collect too much personal data. Their customer support is terrible. I make it a policy to boycott the biggest player in any market where possible. Feel free to try to convince me otherwise.
To solve this I used the cookie plug-in 'Cookie Swap' for firefox (chrome has one too, 'Swap My Cookies', though I can't tell you if it is any good). This means I can have one cookie for music, one for home improvement, one for space news, one for programming tutorials, and so on. If I want to watch some random videos I just choose one and get a clean list of sensible suggestions. I also keep a general profile that I don't train and use it whenever anyone links me a youtube video, to avoid polluting the training of the other profiles. To put it simply, I want some control over the recommendations to ensure that they actually work for me, and I also want to keep some control over my data and how it is used.
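For the programmatically inclined, the same idea can be sketched in a few lines of Python: each profile is nothing more than a separate cookie jar, and whichever jar you load determines which identity the site sees. This is purely a conceptual illustration (actually training youtube's recommendations needs a real browser); the directory layout and the URL are just examples I made up.

```python
# A minimal sketch of the cookie-swapping idea: one cookie file per topic,
# so each topic gets its own recommendation identity. Illustrative only.
import os
from http.cookiejar import LWPCookieJar

import requests

PROFILE_DIR = "profiles"  # hypothetical directory holding one cookie file per topic


def session_for(profile: str) -> requests.Session:
    """Return a requests session whose cookies live in profiles/<profile>.cookies."""
    os.makedirs(PROFILE_DIR, exist_ok=True)
    jar = LWPCookieJar(os.path.join(PROFILE_DIR, f"{profile}.cookies"))
    try:
        jar.load(ignore_discard=True)  # reuse the identity if it already exists
    except FileNotFoundError:
        pass  # first use of this profile: start with an empty jar
    session = requests.Session()
    session.cookies = jar
    return session


# Browse with the 'music' identity; the 'minecraft' jar never sees any of it.
music = session_for("music")
music.get("https://www.youtube.com/")    # tracked under the music profile only
music.cookies.save(ignore_discard=True)  # persist the identity for next time
```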
At first I thought youtube was saving my usage fingerprint locally in the cookie; I know that seems naive in hindsight. What youtube actually does is save a hashed code in the cookie that identifies me as a user, and keep a profile of what I watch on their server. This probably holds much the same data as a logged-in account, but it is somewhat anonymous and also less permanent. There are multiple different recommendation algorithms used by different platforms (see this quora answer for a deeper dive), and I do not know which one youtube uses specifically. In general, recommendation algorithms create a preferences/usage fingerprint for each user. This is done by setting up a matrix, or an n-dimensional space, where each aspect of the content has a dimension and where each user and each piece of content is a point in that space. The implementation details are not important for the purpose of this text; what matters is that in the end the algorithm can take a user and save their preferences as a relatively small piece of data, even if it is distilled from thousands of different interactions. This is why having a profile for each specific topic you are interested in, rather than one profile that is expected to encompass all your interests, is far more effective. A single fingerprint has trouble recommending things to someone who likes watching programming tutorials and Russian dash cam videos, because it is effectively trying to find videos that satisfy both of those categories at once.
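To make the dilution problem concrete, here is a toy sketch. It is emphatically not how youtube actually does it; the 'taste space' dimensions and the example videos are invented. Users and videos live in the same space, and the score is simply how well the two vectors line up.

```python
# Toy illustration: one fingerprint for everything ends up in the middle of the
# taste space, so bland content scores almost as well as either real interest.
import numpy as np

# made-up taste space dimensions: [programming, dash cams, music]
videos = {
    "rust tutorial":        np.array([0.9, 0.0, 0.1]),
    "dash cam compilation": np.array([0.0, 0.9, 0.1]),
    "generic viral clip":   np.array([0.4, 0.4, 0.4]),
}

def recommend(user_vector):
    scores = {name: float(vec @ user_vector) for name, vec in videos.items()}
    return max(scores, key=scores.get), scores

# A focused profile: only programming. The tutorial wins by a wide margin.
focused = np.array([1.0, 0.0, 0.0])
print(recommend(focused))

# One profile trained on both interests sits between them, and the generic
# clip now scores almost as high as either real interest.
mixed = np.array([0.5, 0.5, 0.0])
print(recommend(mixed))
```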
I have reluctantly come to the conclusion that my approach is not viable for most people, or even for myself long term. There are a few reasons for this. First, not many platforms even bother doing recommendations properly if you are not logged in. That is fine if youtube is the only website you use, but if you have a normal life, these profiles are very unreliable: on some sites they work, on others they just won't. In addition, youtube seems to be making changes that undermine this approach. The window of time that your identity is kept on the server if you don't use it is getting shorter and shorter; it appears to be less than a month now. If I am busy for whatever reason and don't watch any videos on a specific profile for a while, all preferences are forgotten and it starts recommending me 'Funny memes!!!11!' and coverage of the latest political controversy. Training a profile takes watching 20-50 videos and can take a few hours, so it is not very convenient.
I stumbled across two decentralised alternatives to youtube, peertube and dtube. "Aha!" I thought, "people that are into decentralisation and user self-determination. Surely their recommendation algorithms will be what I am looking for!" Yet again I show how naive I really am. It turns out that recommendation algorithms are hard, and these alternative sites focus on actual decentralised video hosting first (which is also hard) and have not yet done anything with regard to recommendations. Peertube just has 'trending', which means 'people watched this recently', and dtube has a ranking based on views and upvotes. This means you have to know exactly what you are looking for, or you get whatever content the majority of users are looking at, which means more minecraft videos.
Here is what I think we should do. I don't expect youtube/facebook/spotify to implement this, even I am not that naive, but I feel like something like this could eventually find its way into decentralised alternatives. This is also where I stop using youtube and video hosting as the example: these principles should apply to any site that uses recommendations.
I am not going to talk about specific algorithm implementations here; I am going to talk about user-facing features. Whether a platform uses alternating least squares, stochastic gradient descent or chicken entrails to decide what to recommend will depend on what kind of content it serves and what the specific math geeks in charge of programming it like best. Let software engineers do their thing; they know what they are doing. Except where UX is concerned; there they (we, technically I am both a math geek and a software engineer, don't tell anyone) have some blind spots.
1. The user's ability to control all aspects of the system should be maximised.
This one more or less covers all the others, but it is important to state it as the main goal of the whole exercise. There are probably a lot of ways I haven't thought of to achieve it.
2. Different profiles/moods/categories should be a built-in feature.
If you put on music at a party you don't want stand-up comedy or minecraft play-throughs getting added to the playlist, even if you love both of those things at other times. You probably don't even want other styles of music. This means the user should have multiple usage fingerprints and be able to select which one they are using at any time.
3. The user should be able to switch training off and on for a specific profile so that a well-trained profile can be preserved.
Sometimes you let the kids use your account. Sometimes you let drunk people use your account. Enough said.
4. The user should be able not just to 'like' and 'view' content but also to 'dislike' it.
This one is not critical, but the ability to say 'I never want to see recommendations like this again' is nice if the algorithm can swing it.
5. The user should be in control of all variables in the algorithm that can be tweaked for a given search.
For example, if the algorithm applies some fuzzing to recommendations to create more variety, there should be a slider for that, going from 'only show me specific content I have seen and liked and nothing new' to 'show me wild and crazy unrelated stuff all the time' and everything in between. If there is a weighting assigned to different factors, like how popular content is vs. how close it is to your profile fingerprint (popularity vs. affinity), there should be a slider from 'only show me things millions of people like' to 'only show me things that are totally obscure and unknown'. A sketch of what these sliders could look like in code follows after this list.
6. The algorithm should be open source.
Open source means people can see how it works, which enables them to invent new and more effective ways of using it.
7. The user's content preferences (fingerprint) should be saved on their local machine.
The user should be able to delete their profile, or share it with other people. Saving just an identity locally, as youtube's cookie does, is not sufficient; the preference data itself should live with the user. Rather than sending someone a link to an artist/channel/topic you like, you would be able to share trained recommendation profiles with friends and family. In some cases this might cause issues for the algorithm itself, since user preference data is often integral to training the server side. In such cases the data could exist both on the server and locally, with the local copy always taking precedence in a conflict. This would also enable users to opt out.
8. The user's content preferences (fingerprint) should be saved in a human-editable format.
Even if the algorithm just produces a huge matrix of numbers for a user profile, this should not be compressed or obfuscated. People might want to create external applications that interact with this data. People might just want to change the numbers for fun to see what happens. Experimentation is a good thing. A sketch of what such a local, editable profile could look like follows below.
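To make principle 5 a bit more tangible, here is a minimal sketch of a scoring function with the trade-offs exposed as plain parameters. Everything in it is invented for illustration; the names and the exact formula are my assumptions, not anyone's real implementation.

```python
# Sketch of principle 5: the 'popularity vs. affinity' weighting and the
# variety fuzzing become parameters the user controls, instead of being
# hard-coded on the server. All names and the formula are made up.
import random

def score(video, user_fingerprint, popularity_weight=0.3, explore=0.1):
    """Score one video for one user, with the trade-offs as user-facing sliders.

    popularity_weight: 0.0 = 'only things close to my fingerprint',
                       1.0 = 'only show me things millions of people like'.
    explore:           0.0 = 'only content like what I have seen and liked',
                       1.0 = 'wild and crazy unrelated stuff all the time'.
    """
    affinity = sum(a * b for a, b in zip(user_fingerprint, video["embedding"]))
    base = (1 - popularity_weight) * affinity + popularity_weight * video["popularity"]
    return base + random.uniform(-1, 1) * explore   # the variety fuzzing

# Example: a very popular video that is a poor match for a programming-only profile.
video = {"embedding": [0.1, 0.9, 0.0], "popularity": 0.95}
print(score(video, [1.0, 0.0, 0.0], popularity_weight=0.0, explore=0.0))  # 0.1
print(score(video, [1.0, 0.0, 0.0], popularity_weight=1.0, explore=0.0))  # 0.95
```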
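And here is a sketch of principles 2, 7 and 8 taken together: each profile ('mood') is a small, human-editable JSON file on the user's own disk. The file location, layout and every field name are my own invention, just to show how unexciting such a format could be.

```python
# Sketch of principles 2, 7 and 8: one plain JSON file per profile, stored
# locally, readable and editable by hand. All fields are illustrative.
import json
from pathlib import Path

PROFILE_PATH = Path.home() / ".recommendations" / "music.json"

profile = {
    "name": "music",
    "training_enabled": True,           # principle 3: freeze a well-trained profile
    "popularity_weight": 0.2,           # principle 5: the sliders live here too
    "explore": 0.4,
    # principle 8: the fingerprint itself is just numbers you can read and edit
    "fingerprint": [0.12, -0.53, 0.80, 0.05],
    "disliked": ["funny memes compilation #37"],   # principle 4
}

PROFILE_PATH.parent.mkdir(parents=True, exist_ok=True)
PROFILE_PATH.write_text(json.dumps(profile, indent=2))

# Because it is a plain local file you can delete it, send it to a friend,
# or open it in a text editor and tweak the numbers just to see what happens.
loaded = json.loads(PROFILE_PATH.read_text())
print(loaded["fingerprint"])
```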
If a system like this can be established for one platform, there may be some scope to investigate interoperability between different platforms, so that if I find a new music website I can copy my content fingerprint over from the old one and get similar suggestions. There are multiple reasons why this is extremely difficult and probably not viable, but it would be worth looking into.
Content recommendation is a powerful tool for users to help them find content that they are looking for. It is also a powerful tool for corporations and governments to manipulate people. We need to take these tools into our own hands.