Bluesky and Portability

How does the AT Protocol enable account portability and what might get in the way?

I recently read "Bluesky and the AT Protocol: Usable Decentralized Social Media" by members of the Bluesky team along with the University of Cambridge's Martin Kleppmann. The paper talks about Bluesky, a decentralized microblogging platform, and the AT Protocol, which serves as its foundation.

The AT Protocol is one of several new technologies in the decentralized social media space, sometimes loosely grouped under the label "fediverse". These technologies aim to allow interoperability between social media providers, resulting in a competitive market where users have more choice and freedom to find a platform that meets their needs. To support this, the AT Protocol is designed for portability: users can move from one social media provider to another without creating a new account and without losing their username, followers, or posts.

Traditional centralized social media platforms like Facebook become sticky and lock users in because their users can only see content from other users of the same platform. If someone were to leave Facebook and go to a competitor, their followers would no longer be able to see their posts and they'd lose the ability to see posts from the people they follow.

Decentralized social media attempts to solve this by decoupling a user's social network from the underlying provider. Users on one platform can follow users from another platform and vice versa. This gives users more choice when initially signing up for a service, but without account portability, platforms can still achieve a great deal of lock-in. Users would be reluctant to leave if their followers, username, posts, and other account data couldn't be moved to a new platform.

It's this key property of account portability I find most interesting about the AT Protocol. If moving a user's account to another social media provider is easy, then the switching cost is dramatically lower and there are more opportunities for competition in the marketplace. With a higher degree of competition we'd likely see more innovation or more platforms specialized to the needs of their users. For example, we might see platforms arise that cater to content creators separate from the platforms that cater to the consumers of that same content.

I'm a big supporter of decentralized social media and Bluesky in particular, but I want to point out some aspects of the AT Protocol that may lead to practical challenges when it comes to ensuring account portability. To explain these challenges, we first need to explore how user data and user identities work in the AT Protocol.

In the AT Protocol, each user's data is kept in a "user data repository" hosted on a "personal data server" (PDS). While users could operate their own personal data server, in practice most servers would likely be run by third-party providers that host many user data repositories. Each user data repository contains information such as the user's posts and who they're following.

In the paper, Kleppmann et al. use the analogy of running a website. In the case of the AT Protocol, each user data repository is analogous to a website and a personal data server is a hosting provider.

PDS providers hold a user's data, so to prevent lock-in it's important that users be able to take that data and move to a new provider. The AT Protocol goes about this by determining where a user's data is stored (their PDS) based on their identity, instead of tying a user's identity to where their data is stored.

The way this works from a technical standpoint is that a user's identity piggybacks off existing DNS infrastructure, using a subdomain as the user's "handle" or username. The domain name system already has technical and social policies in place that control ownership of domains, and it's fairly inexpensive and simple to register a domain for yourself. Some platforms may give subdomains to their users automatically in order to eliminate the complexity of changing DNS records. Having control of this subdomain is part of how the protocol identifies users.

A user's subdomain could simply point to the location of their PDS, but that would be insufficient. What if you wanted to change your handle? Your followers might lose the ability to find your PDS if your old handle no longer pointed to it. So, in addition to a handle, each user needs a unique identifier. More importantly, what if two handles pointed to the same user in the same PDS? There would need to be some way to distinguish which handle is actually authentic.

This is where the concept of a decentralized ID (DID) comes in. Instead of referring to a specific user data repository on a specific PDS, a user's handle (subdomain) refers to a DID. Each DID is immutable and uniquely identifies a user. This way, they can change their handle and still have it refer to the same DID.
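To make the handle-to-DID step concrete, here's a minimal Python sketch. It assumes, based on my reading of the AT Protocol handle specification rather than the paper, that a handle can publish its DID in a DNS TXT record named `_atproto.<handle>` with a value of the form `did=<DID>`. The `dns_txt` dictionary, the example handles and DID, and the function names are all hypothetical, and no real DNS lookup is performed.

```python
from typing import Dict, Optional

DID_PREFIX = "did="

def did_from_txt_record(txt_value: str) -> Optional[str]:
    """Extract a DID from a TXT value such as 'did=did:plc:abc', else None."""
    value = txt_value.strip().strip('"')
    if not value.startswith(DID_PREFIX):
        return None
    did = value[len(DID_PREFIX):]
    return did if did.startswith("did:") else None

# Hypothetical stand-in for DNS: record name -> TXT value.
dns_txt: Dict[str, str] = {
    "_atproto.alice.example.com": "did=did:plc:exampleexampleexample",
}

def resolve_handle(handle: str) -> Optional[str]:
    """Look up the DID a handle points to in the in-memory 'DNS' above."""
    record = dns_txt.get(f"_atproto.{handle}")
    return did_from_txt_record(record) if record is not None else None

old_did = resolve_handle("alice.example.com")

# Alice changes her handle: the same DID is published under the new name
# and the old record is removed.
dns_txt["_atproto.alice.example.org"] = dns_txt.pop("_atproto.alice.example.com")
new_did = resolve_handle("alice.example.org")
```

The handle changed, but `old_did` and `new_did` are the same string, so anything keyed on the DID (followers, for instance) would be unaffected by the rename.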

Distinguishing which handle goes to which DID is accomplished by having the DID refer back to the handle. This mutually referential setup helps validate that one handle goes with one DID. The way this is accomplished on a technical level is through the use of a DID document. A user's handle refers to a DID, and the DID refers to a DID document. Each DID document contains the actual information about which handle the DID is associated with and which PDS the user is using.

So to recap, a user stores their data in a user data repository hosted on a PDS. The user's handle is a subdomain that specifies the user's unique DID. The DID points to a DID document which refers back to the handle to verify it's correct, and specifies which PDS is hosting the user's data. This mechanism gives users an identity, and that identity determines where their data is stored.
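The recap above can be sketched as a small lookup with a two-way check. This is a simplified model, not the protocol's actual data format: the `DidDocument` fields, the dictionaries standing in for DNS and for DID document storage, and all example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class DidDocument:
    did: str      # the identifier this document describes
    handle: str   # the handle the DID claims as its own
    pds_url: str  # where the user's data repository is hosted

def locate_pds(
    handle: str,
    handle_to_did: Dict[str, str],
    did_to_document: Dict[str, DidDocument],
) -> Optional[str]:
    """Return the PDS for a handle only if handle and DID refer to each other."""
    did = handle_to_did.get(handle)
    if did is None:
        return None  # the handle does not point at any DID
    doc = did_to_document.get(did)
    if doc is None or doc.did != did:
        return None  # no document, or a document for a different DID
    if doc.handle != handle:
        return None  # the DID does not refer back to this handle
    return doc.pds_url

# Hypothetical data: Mallory points her handle at Alice's DID.
handles = {
    "alice.example.com": "did:plc:alice",
    "mallory.example.com": "did:plc:alice",
}
documents = {
    "did:plc:alice": DidDocument(
        did="did:plc:alice",
        handle="alice.example.com",
        pds_url="https://pds.example.net",
    ),
}

alice_pds = locate_pds("alice.example.com", handles, documents)
mallory_pds = locate_pds("mallory.example.com", handles, documents)
```

Alice's handle resolves to her PDS, while Mallory's lookup returns `None` because the DID document names only Alice's handle.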

Great, so at this point we know the user has control over their subdomain because DNS already has mechanisms in place to verify ownership. We know that the DID associated with their handle (subdomain) is valid because the DID document refers back to the user's handle (subdomain). But how do we know the DID document is authentic?

There are various types of DID, but for now Bluesky only supports two of them. The first, web, I won't get into, but it also uses existing DNS infrastructure to verify that a DID document is authentic. As the paper notes, typical users would not use web DIDs anyway; they're more likely to use the second supported type, plc.

plc DIDs use a hash value to refer to an entry in a central DID document repository. Each DID document in the central repository contains a public key and a digital signature in addition to the user's handle and PDS. When a DID document is added or changed in the central repository, it's signed with the private key associated with the document's public key. A plc DID document can be validated either by checking that it matches the plc hash or, if it's been modified, by verifying the digital signature with the public key.
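Here's a rough sketch of the hash half of that check. It is not the real plc algorithm: the JSON serialization, the base32 encoding, the 24-character truncation, and the document fields are assumptions chosen to keep the example self-contained. The signature half is left out because Python's standard library has no public-key signature support.

```python
import base64
import hashlib
import json

def derive_plc_style_did(initial_doc: dict) -> str:
    """Derive a hash-based identifier from the initial DID document."""
    # Serialize deterministically so the same content always hashes the same.
    canonical = json.dumps(
        initial_doc, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return "did:plc:" + encoded[:24]  # truncation length is an assumption

def matches_did(did: str, initial_doc: dict) -> bool:
    """Check that an unmodified document is the one the DID was derived from."""
    return derive_plc_style_did(initial_doc) == did

# Hypothetical initial document.
initial_doc = {
    "handle": "alice.example.com",
    "pds": "https://pds.example.net",
    "public_key": "placeholder-public-key",
}
did = derive_plc_style_did(initial_doc)

# Changing any field breaks the hash check, which is why modified
# documents have to be validated with a signature instead.
tampered_doc = dict(initial_doc, pds="https://evil.example.org")
original_ok = matches_did(did, initial_doc)
tampered_ok = matches_did(did, tampered_doc)
```

Because the identifier is derived from the initial document, the DID itself never changes even when later signed updates change the handle or PDS.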

Again, plc DIDs are what users will likely use, meaning that ownership and validity of a DID document are tied to a private key. So a user needs access to their private key in order to change their PDS or handle.

In a real-world use case, most users would not have direct access to this private key. If a user lost their private key, they wouldn't be able to make changes to their DID document. That means they wouldn't be able to change their PDS and move their data to a new provider, which would prevent account portability.

Instead, a user's private key would likely be stewarded by a service provider, possibly the same service provider as their PDS. The service provider would let the user authenticate with something more convenient like a username and password. That way if they forgot or lost their password, it could be reset through conventional means.

This is where I think malicious actors could prevent account portability by controlling a user's private key. There may be incentives to prevent a user from moving to a different platform, especially if the private key service provider was the same as a user's PDS. These service providers could refuse to give the user their private key, preventing them from updating their DID document. This would mean they could not change their handle or their PDS.

If this were to happen, a user could create a new DID and change their handle to point to it, but they would lose their old posts, followers, and other data. What's worse, if they didn't have control over their handle (subdomain) they would lose that as well. If a user's handle, DID document private key, and PDS are all through the same provider, there are opportunities for those providers to act in malicious ways to lock them in.
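The scenario I'm worried about can be modelled in a few lines. This is a toy model with no real cryptography: a matching `key_id` stands in for a valid signature, and the `Custodian` class, its `cooperative` flag, and the URLs are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DidDocument:
    handle: str
    pds_url: str
    key_id: str  # stands in for the document's public key

@dataclass(frozen=True)
class SignedUpdate:
    new_pds_url: str
    key_id: str  # stands in for a signature made with the private key

class Custodian:
    """A provider that holds the user's private key on their behalf."""

    def __init__(self, key_id: str, own_pds_url: str, cooperative: bool):
        self.key_id = key_id
        self.own_pds_url = own_pds_url
        self.cooperative = cooperative

    def sign_migration(self, new_pds_url: str) -> Optional[SignedUpdate]:
        # An uncooperative custodian refuses to sign moves away from its PDS.
        if not self.cooperative and new_pds_url != self.own_pds_url:
            return None
        return SignedUpdate(new_pds_url, self.key_id)

def apply_update(doc: DidDocument, update: Optional[SignedUpdate]) -> bool:
    """Accept the update only if it carries the document's key."""
    if update is None or update.key_id != doc.key_id:
        return False
    doc.pds_url = update.new_pds_url
    return True

def try_to_move(cooperative: bool) -> DidDocument:
    doc = DidDocument("alice.example.com", "https://pds.old.example", "key-1")
    custodian = Custodian("key-1", "https://pds.old.example", cooperative)
    apply_update(doc, custodian.sign_migration("https://pds.new.example"))
    return doc

moved = try_to_move(cooperative=True)
stuck = try_to_move(cooperative=False)
```

With a cooperative custodian the document ends up pointing at the new PDS. With an uncooperative one, nothing in the protocol is violated, yet the user's document still points at the old provider.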

I very well may have missed something. There may be other technical mechanisms that further prevent lock-in. Regardless, if there are financial incentives, I think bad actors will still find ways to lock users in. Moving to a world with decentralized social media and a competitive marketplace will require more than just technical solutions. We also need to be vigilant and develop social mechanisms to prevent bad behavior.

Make no mistake though, the AT Protocol and other decentralized social media protocols are a big step in the right direction.

I think Bluesky and the AT Protocol are a wonderful innovation and something we sorely need in the social media space. We've seen time and again that when centralized social media platforms gain significant scale, they lock in their users and content creators. When these platforms make changes that alienate their users, those users often have little recourse because the switching costs are too high. Take YouTube creators, for example. Many YouTube channel operators have complained about having videos demonetized or receiving copyright strikes for innocent behavior. My hope is that by embracing decentralized social media, we can lower the switching costs for users and content creators, creating a much more competitive market that better serves their needs.