Twitter open-sources recommendation algorithm code


Twitter announced on Friday that it’s open-sourcing the code behind the recommendation algorithm the platform uses to select the content of users’ For You timelines.

However, the code made public today doesn’t include the parts behind advertising recommendations, or any code that would endanger Twitter’s ability to keep threat actors’ attempts to manipulate the platform under control.

“For this release, we aimed for the highest possible degree of transparency, while excluding any code that would compromise user safety and privacy or the ability to protect our platform from bad actors, including undermining our efforts at combating child sexual exploitation and manipulation,” the company said.

“Today’s release also does not include the code that powers our ad recommendations. We also took additional steps to ensure that user safety and privacy would be protected, including our decision not to release training data or model weights associated with the Twitter algorithm at this point.”

Twitter has published two separate GitHub repositories containing the source code for its recommendation algorithm and some of the machine learning (ML) models powering it.

As the company’s engineering team revealed, tweets that end up in the For You timeline are chosen by a service known as Home Mixer that uses the following pipeline:

  1. Fetch the best Tweets from different recommendation sources in a process called candidate sourcing.
  2. Rank each Tweet using a machine learning model.
  3. Apply heuristics and filters, such as filtering out Tweets from users you’ve blocked, NSFW content, and Tweets you’ve already seen.
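As a rough illustration only, the three stages above can be sketched in a few lines of Python. This is not Twitter’s code: the names Tweet, source_candidates, rank, apply_filters and build_timeline are hypothetical, and a plain scoring function stands in for the machine learning model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical, minimal stand-in for a tweet record.
    tweet_id: int
    author: str
    nsfw: bool = False


def source_candidates(sources, limit=1500):
    """Step 1: gather candidates from every source, dropping duplicates."""
    seen_ids = set()
    candidates = []
    for source in sources:
        for tweet in source:
            if tweet.tweet_id not in seen_ids:
                seen_ids.add(tweet.tweet_id)
                candidates.append(tweet)
    return candidates[:limit]


def rank(candidates, score):
    """Step 2: order candidates by score, highest first.

    `score` is an assumed placeholder for the ML ranking model.
    """
    return sorted(candidates, key=score, reverse=True)


def apply_filters(ranked, blocked_authors, already_seen):
    """Step 3: drop blocked authors, NSFW content and already-seen tweets."""
    return [
        tweet
        for tweet in ranked
        if tweet.author not in blocked_authors
        and not tweet.nsfw
        and tweet.tweet_id not in already_seen
    ]


def build_timeline(sources, score, blocked_authors, already_seen):
    """Run the three stages in order and return the surviving tweets."""
    candidates = source_candidates(sources)
    ranked = rank(candidates, score)
    return apply_filters(ranked, blocked_authors, already_seen)
```

In this sketch the filters run after ranking, mirroring the order of the list above; the article does not say how the production system orders or combines its individual filters.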

“For each request, we attempt to extract the best 1500 Tweets from a pool of hundreds of millions through these sources,” Twitter explains.

“We find candidates from people you follow (In-Network) and from people you don’t follow (Out-of-Network).”

The end goal is for each user’s For You timeline to show 50% relevant and recent tweets from people they follow and the other 50% from people outside their network, chosen based on what the user would likely find interesting.
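To make the 50/50 target concrete, here is a minimal sketch that interleaves two already-ranked lists so each contributes roughly half of the timeline. The function name mix_half_and_half and the simple alternating strategy are assumptions made for illustration; the article does not describe how Twitter actually arrives at this mix.

```python
def mix_half_and_half(in_network, out_of_network, size):
    """Build a timeline of up to `size` items, about half from each list.

    Both inputs are assumed to be ranked best-first already. When `size`
    is odd, the extra slot goes to the in-network list (an arbitrary choice).
    """
    half = size // 2
    picked_in = in_network[: size - half]
    picked_out = out_of_network[:half]

    mixed = []
    # Alternate between the two selections, starting with in-network.
    for i in range(max(len(picked_in), len(picked_out))):
        if i < len(picked_in):
            mixed.append(picked_in[i])
        if i < len(picked_out):
            mixed.append(picked_out[i])
    return mixed
```

If one list runs short, this sketch simply returns fewer items rather than backfilling from the other list, which keeps neither source above its half.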

Twitter source code leaked online months ago

Earlier this month, Twitter took down proprietary source code and internal tools that had been leaked on GitHub and had remained publicly available for at least several months.

In a DMCA infringement notice, the company also asked GitHub to provide info on the access history for the leaked code, likely to find out who downloaded it while it was available online.

Twitter is also attempting to use a subpoena filed with the U.S. District Court for the Northern District of California to force GitHub to share identifying information on the FreeSpeechEnthusiasm user who first published the files, as well as on anyone who accessed and distributed the leaked Twitter source code. That information could likely also be used for further legal action.

Today’s announcement follows tweets by Twitter CEO Elon Musk promising to make the Twitter algorithm public: the first a poll asking users to vote on whether the “Twitter algorithm should be open source,” and the second saying that “Twitter will open source all code used to recommend tweets on March 31st.”




