A big question that had lots of discussion and debate on the mailing list, and on IRC, and on the old SocialHub, is: Where to host objects that are part of the management tools of a project and need access control and to be trusted by the project team? Do we host them on the author’s server or on the project’s server?
It’s time to make a final official decision on this, because the entire shape of the ForgeFed spec depends on it.
Basically, we’d like to pick one of two approaches for hosting objects. The situation is a bit more complicated than toots (which are simply hosted by their author). The approach we pick will affect the whole shape of the spec, so I really really want to make a careful choice here and hopefully reach consensus on it.
The decision is relevant to any object which needs access-controlled reliable updates/edits after its creation, but to make things clearer here, let’s focus on a specific type of object: issues. Issues that people open on repos, to report bugs or request new features. Issues aren’t always submitted to repos (e.g. there are issue trackers that just handle issues and don’t host any repos), but for simplicity, let’s assume a common case in which issues are submitted to repos and listed under repos.
The life cycle of issues works roughly like this:
(1) A user asks to open an issue. This is basically a piece of text describing a bug, a feature request, a task to be done, etc., possibly with some data attached, such as tags, screenshots, command output, or the user’s system info
(2) The issue is usually automatically opened and gets listed in the repo’s list of open issues
(3) The issue goes through a series of updates. There are primarily two kinds of updates: related events (comments, mentions in other issues, related Merge Requests being submitted, etc.) and edits (description text edits, tags added and removed, milestone added and removed, people assigned and unassigned to work on the issue, priority and severity set and edited, dependencies (i.e. other issues that this issue depends on) and reverse dependencies added and removed, issue resolution status edits, issue being closed and reopened, access control changes e.g. no more comments allowed or only 1 specific person given access to edit the issue, etc.). Related events just happen, and the repo only gets to decide whether to list them. Edits are direct manipulations of the issue, and they’re examined and applied only if access control approves them.
(4) The issue is resolved and closed. Possibly, at some point it switches into an “archived” state, in which further comments and edits aren’t allowed, and if anyone at all has access to un-archive the issue, it’s only the repo team members.
(5) The issue may get removed from the list of the repo’s issues, and/or entirely deleted. This rarely happens, but it’s possible.
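The life cycle above can be sketched as a small data model. This is only an illustration, not part of any spec: the `Issue` class, the `State` names, and the access policy (author and team may edit, only the team changes state) are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    OPEN = "open"
    CLOSED = "closed"
    ARCHIVED = "archived"


@dataclass
class Issue:
    author: str
    title: str
    description: str
    team: set = field(default_factory=set)       # repo team members (assumed)
    state: State = State.OPEN
    related: list = field(default_factory=list)  # related events the repo lists

    def can_edit(self, actor: str) -> bool:
        # Assumed policy: author and team may edit, nobody may once archived.
        if self.state is State.ARCHIVED:
            return False
        return actor == self.author or actor in self.team

    def edit_description(self, actor: str, text: str) -> bool:
        # An edit is applied only if access control approves it.
        if not self.can_edit(actor):
            return False
        self.description = text
        return True

    def list_related(self, event_url: str) -> None:
        # Related events just happen; the repo only decides whether to list them.
        self.related.append(event_url)

    def set_state(self, actor: str, new_state: State) -> bool:
        # Assumed policy: only team members close, archive, or un-archive.
        if actor not in self.team:
            return False
        self.state = new_state
        return True
```

The point of the sketch is the asymmetry between the two kinds of updates: `list_related` never touches the issue's own fields, while every edit goes through `can_edit` first.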
The big difference between toots and issues:
- Toots are expressions of the thoughts of the author, and if they ever get updates (Mastodon seems not to allow toot updates, but suppose it did allow them), then those updates can be made only by the author. It’s important that the author has access control over those updates, so that the toot continues to be an authentic expression of the author’s thoughts. If anyone else gets to edit the author’s words, we generally call that impersonation.
- Issues fill two roles simultaneously. One role is, like with toots, to be an authentic expression of the author’s thoughts and observations. For example, imagine you report a critical security bug, and it happens to actually be an intentional backdoor, and the repo ignores/deletes your issue. Or perhaps you and the repo team have different cultures/code-of-conduct/etc. and simply disagree on the issue description text. It’s important that when you publish something, it can remain your authentic expression and in your control. The other role that issues fill is to be a work item of the repo team, and serve as a project management tool, helping the repo team (and potential contributors) decide what to work on and track the work that remains to be done. Once an issue is listed under the repo, it also becomes an authentic expression of the repo team’s thoughts and intentions. They set the severity and priority, they may edit the description, they may decide to block further comments, they and other people may add new comments, they may delete comments that violate the code of conduct, and so on. For example imagine the author of a critical security issue decides to silently delete it, because the government paid them to keep it silent while they secretly work on turning the bug into a backdoor. It’s important that the repo doesn’t just lose the issue, if the repo team wants the issue there then it should remain there.
So, an issue begins its life as an agreement between the author and the repo, and later turns into 2 separate intentions: the author’s intention and the repo’s intention. These two things may align, but it’s very possible that they don’t.
In centralized forges, there’s a compromise between the two roles, generally leaning towards the repo team. The author gets to do their authentic expression by getting access to edit the issue, and perhaps to reopen it if it got closed. But otherwise, the repo is in control. If the repo decides to block comments on an issue you opened, because your comments reveal some bug they refuse to fix and refuse to admit to the user community, your only way to further discuss it is to post your words elsewhere, out-of-band. A blog post, a toot on the Fediverse, a separate issue in a separate repo, and so on.
On the Fediverse, we want to fill both roles, and one of the very first questions we asked ourselves when ForgeFed started is how to do it.
How I’m going to compare the approaches
I’d like to explain some points before I refer to them.
First of all, I must say, on the Fediverse we currently mix authentication with storage location. There’s some partial use of LD Sigs (and even that, only for announces iirc and Idk if anyone except Mastodon really uses them, maybe Pleroma too?) but otherwise basically if you wrote something you must be hosting it, and the only trustworthy way to get the content is to HTTP GET it from your server. If we switched to a p2p approach, then I suppose the controversy around the question “where to host issues” would be resolved, by having them use the p2p storage mechanism and do authentication using crypto signatures and not by relying on DNS hostnames matching. But in the current situation ActivityPub assumes HTTPS or at least non-p2p, and that’s how the Fediverse works, and we need to pick a solution for this existing situation.
For reference, quoting the AP spec talking about object identifier URIs:
Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).
I’ll be comparing the 2 approaches using the following criteria:
- Author expression role: This refers to how the proposed approach fills the role of issues being an authentic record of its author’s thoughts. And if other people made edits to the words, can we trust that the author approved those edits.
- Project management role: This refers to how the proposed approach fills the role of issues being a tool for the repo team to decide what to work on, be able to know what remains to be done, what’s already done, who’s working on which task, which tasks belong to which milestones, which tasks have close due dates, which issues have comments disabled or other specific access controls, and so on. Also, their ability to enforce their code-of-conduct for discussions within the project, which includes issues and their comments.
People have been commenting about whether some approach is centralized or decentralized, and I find those words too broad to be used without a closer look and explanation, so I’m adding these points too:
- Centralization: Which aspects of the given approach are centralized, i.e. concentrate control and authority in one place
- Decentralization: Which aspects of the given approach are decentralized, i.e. allow control and authority to be shared, and in which ways that sharing maps to the logical control and authority we want people and servers to have over the data
After those there’s an overview of each approach with the major pros and cons and open questions and challenges remaining to be resolved.
Approach 1: Hosted by author only
In this approach, the repo allows authors to host the issues. If `https://jane.tld` opens an issue on repo `https://repo.tld` then Jane hosts the issue on her server, and the repo’s list of issues simply lists that URL as an issue.
The proposed ActivityPub way to do this is that an issue is published using `Create` just like a toot, and the repo actor possibly sends an `Accept` or `Reject` referring to the `Create`.
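To make this concrete, here is a rough sketch of what such a `Create` could look like, built as a Python dict. The `Ticket` type name, the use of `context` to point at the repo, the `/create` ID suffix, and both helper functions are assumptions for illustration only. The second helper checks the property this approach relies on: the issue ID lives on the same host as its author.

```python
import json
from urllib.parse import urlparse


def make_create(author: str, repo: str, issue_id: str,
                title: str, description: str) -> dict:
    """Build a Create activity for an issue hosted by its author."""
    issue = {
        "id": issue_id,
        "type": "Ticket",        # assumed type name for issues
        "attributedTo": author,
        "context": repo,         # assumed way of pointing at the repo
        "name": title,
        "content": description,
    }
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": issue_id + "/create",  # hypothetical ID scheme
        "type": "Create",
        "actor": author,
        "to": [repo],
        "object": issue,
    }


def hosted_by_author(create: dict) -> bool:
    """True if the issue ID and the actor ID share the same authority."""
    issue_host = urlparse(create["object"]["id"]).netloc
    actor_host = urlparse(create["actor"]).netloc
    return issue_host == actor_host


if __name__ == "__main__":
    activity = make_create(
        "https://jane.tld/users/jane",
        "https://repo.tld/repos/app",
        "https://jane.tld/issues/1",
        "Crash on startup",
        "The app crashes when started without a config file.",
    )
    print(json.dumps(activity, indent=2))
```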
- Author expression role: This role of the issue is fully filled, similarly to toots. Since the issue is hosted on the author’s server, we can be similarly confident that the author approves the content they wrote and approves any edits made by other people.
- Project management role: It’s unclear how this role is filled, feedback welcome. The idea @jaywink mentioned is that whenever there’s an activity involving the issue, the repo actor can send an `Accept` to approve it, and if you receive that `Accept`, then you know the activity represents both the repo team’s wishes and the author’s wishes. The problem is that the author may approve changes without notifying the repo at all, and when you GET the issue you see the version the author chooses to give you, even if that version isn’t agreed upon by the repo team. Publicly verifiable crypto signed ocap invocations could help, but then we force ocaps to be public, we make the system rely on them, and we make it impossible to deny some content once you published it. And even then, the author may simply decide to omit some update activity and not apply it to the issue. We may also rely on `Accept`, but then the author may simply refuse to do inbox forwarding to make sure everyone gets it, and we’re again back at the problem: When people GET the issue, they may see a version the author approves but the repo team doesn’t. Which leads to the question: If the repo team rejects some edit to the issue, how do we as repo followers HTTP GET the version of the issue that the repo team approves? Another problem is that the author’s server may go offline, and suddenly the repo team members are stuck without their database of issues and unable to fix critical bugs because they can’t view the issues.
- Centralization: The centralized aspect here is that the issue is required to be hosted only in 1 place, despite the possibility that the author and the repo team have conflicting needs, wishes and beliefs, and contradicting thoughts they want to document. In Git (and other DVCS) you just have your copy of some repo and you can do there whatever you want. Note that nothing prevents the repo team from publishing an identical issue on their own server, and simply listing it in place of the author’s issue. The problem is that the authenticity of the issue’s authorship info is then lost. Approach 2, described below, suggests a way to preserve authenticity while allowing the repo team to host issues authored by others, without resulting in impersonation.
- Decentralization: The decentralized aspect here is that the git repo and its issues don’t have to be hosted in one place and controlled by one server, they can be hosted in different places. Somewhat like toots and their comments can be hosted in different places.
So this approach is great for preserving the author’s authentic expression and thoughts, but it’s unclear how it can fill the needs of the repo team members to feel and be confident they have an issue database they can reliably access and use to manage and track their work and allow followers to keep track in a trustworthy manner.
Approach 2: Hosted by repo team, possibly by author too
In this approach, the issues are hosted in a location that the repo team chooses, trusts and controls. Issue authors may host the issues they publish on their own servers too, but the repo team does issue edits (such as setting severity, priority, setting issue dependencies, etc. all the things I listed in the intro) on the copy that they host.
The proposed ActivityPub way is that the issue author sends an `Offer` activity referring to the issue they published (or the `Offer` simply contains the issue object, if the author doesn’t wish to publish on their server). The repo actor may send an `Accept` or `Reject`. If the offer is accepted, a new issue is opened and hosted on the repo team’s server.
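As a rough sketch of this exchange, the snippet below builds an `Offer` that embeds the issue, and shows how a repo might answer with an `Accept` while creating its own hosted copy. The `Ticket` type name, the `offer` back-link property, the `/accept` ID suffix, and both function names are hypothetical, chosen only for this example.

```python
AS2 = "https://www.w3.org/ns/activitystreams"


def make_offer(author: str, repo: str, offer_id: str,
               title: str, description: str) -> dict:
    """Author side: offer an issue to the repo, embedding the issue object."""
    return {
        "@context": AS2,
        "id": offer_id,
        "type": "Offer",
        "actor": author,
        "target": repo,
        "to": [repo],
        "object": {
            "type": "Ticket",      # assumed type name for issues
            "attributedTo": author,
            "name": title,
            "content": description,
        },
    }


def accept_offer(offer: dict, repo: str, issue_id: str) -> tuple[dict, dict]:
    """Repo side: open a repo-hosted copy and build the Accept reply."""
    issue = dict(offer["object"])  # shallow copy; the Offer stays untouched
    issue["id"] = issue_id         # the copy lives on the repo's server
    issue["context"] = repo
    issue["offer"] = offer["id"]   # hypothetical back-link to the Offer
    accept = {
        "@context": AS2,
        "id": issue_id + "/accept",  # hypothetical ID scheme
        "type": "Accept",
        "actor": repo,
        "to": [offer["actor"]],
        "object": offer["id"],
        "result": issue_id,
    }
    return accept, issue
```

The repo-hosted copy keeps `attributedTo` pointing at the author, so the claim "Jane opened this" is only as trustworthy as the back-link to her `Offer`.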
- Author expression role: It’s not instantly obvious how to fill this role; this approach does offer a solution, but feedback on it is welcome. If you browse an issue hosted at `https://repo.tld` and the issue claims that `https://jane.tld` opened this issue at 2016-12-19, how can you be sure that this is what really happened? Maybe the issue text contains opinions that Jane actually strongly opposes, right? Or maybe she didn’t open this issue at all. Maybe the domain name `jane.tld` doesn’t even exist! But tomorrow someone may launch a federated server at that domain, and obviously the users there haven’t interacted with `repo.tld` and didn’t open that issue. PROPOSAL: When an issue is opened on the repo team’s server, the issue links to the `Offer` activity (hosted by the person who opened the issue, of course) or perhaps to a copy of the issue hosted by the author. When you as a desktop/mobile client app HTTP GET the issue from the repo team’s server, you then also GET the `Offer` and make sure the author matches. If the title and description in the `Offer` are identical to the ones in the issue, you can safely display the author on the screen as “this person indeed wrote this content”. If there’s a mismatch, you can tell someone edited the content and the author just was historically involved in the issue but you can’t hold them responsible for the content. You can, however, display side by side the author’s original version (for which they’re responsible) and the repo team’s current version (which contains further edits made over time and the original author isn’t responsible for them if other people made those edits). If an issue on the repo team’s server doesn’t link to the `Offer` or perhaps the URL of the `Offer` doesn’t respond because the author’s server went offline, then there’s no proof the author really voluntarily opened that issue, and you can display the UI accordingly, marking author as unknown or as authenticity-isn’t-verified. Another option is that even if the `Offer` isn’t listed, you can GET the author’s outbox and search through it for an `Offer` of an issue with an identical title and description, and that’s another way to be sure of authenticity. If you’re interested in the issue author’s thoughts, which may conflict with the repo team’s preferences and thoughts, then check out the edits and comments the author has made and published on their own server; you can trust those. We could even be more strict and decide that an issue MUST refer to the `Offer`, and if the `Offer`’s ID URL resolves, then it MUST resolve to an activity attributed to the issue author.
- Project management role: This role of issues is fully filled, the issue is hosted on the repo team’s server so it has full control over any edits and can enforce all access control rules. OCAPs for the fediverse are also as possible here as they’d be for toots, because the entity that hosts the issue (repo team) and the entity that needs to enforce access control (repo team) is the same entity.
- Centralization: The centralized aspect is that the repo team hosts all the issues in one place, or at least in places that the repo team chooses and controls. Issues can additionally be hosted by authors, as well as the `Offer` activities that sent the issues to the repo team, but the repo team’s version, which is what people would usually be interested in, is always stored in a location the repo team controls, not spread around the fediverse like toots. But also remember that with Git repos, all the lines of text in the files in the repo are hosted in one place, and all the commits and all the tags etc. are all hosted together in one `.git` directory on one server. And all the characters in the text of a toot are stored in one place. In other words we can always deconstruct an object and say “why are all of these stored in 1 place, that’s centralization”. The answer is, you want the data to be stored like that, e.g. all the toot’s text characters hosted by the author of the toot, and all the lines of text in a repo stored by the person from whom you clone. Similarly, if you trust the repo team, you want to get their content from them.
- Decentralization: The decentralized aspect is that an issue doesn’t have to be hosted in one place. Much like in DVCS like Git you can just clone a repo and publish your own copy with your own changes, in this approach the author and the repo team can host their own version of the issue. If the author says “systemd is bad, let’s support other init systems” while the repo team says “systemd is awesome, let’s drop support for other init systems”, which are conflicting statements, then both get to host their version simply by allowing each to host an issue, there’s no requirement that one has to trust the other’s version, they can store their own if they disagree much like people fork git repos.
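The client-side authorship check proposed in the author expression bullet could look roughly like the sketch below. It is not a definitive implementation: the `offer` link property, the three result labels, and the `fetch` callable (standing in for an HTTP GET that returns parsed JSON, or `None` if the URL doesn't respond) are all assumptions. It also assumes the `Offer` embeds the issue object rather than referring to it by URL, and it skips the outbox search fallback.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# Stand-in for an HTTP GET returning parsed JSON, or None on failure.
Fetch = Callable[[str], Optional[dict]]


def check_authorship(issue: dict, fetch: Fetch) -> str:
    """Classify a repo-hosted issue as 'verified', 'edited' or 'unverified'."""
    author = issue.get("attributedTo")
    offer_url = issue.get("offer")  # hypothetical link to the author's Offer
    if not isinstance(author, str) or not isinstance(offer_url, str):
        return "unverified"

    # The Offer must be hosted on the author's own server.
    if urlparse(offer_url).netloc != urlparse(author).netloc:
        return "unverified"

    offer = fetch(offer_url)
    if offer is None:
        # Author's server is offline or the URL doesn't resolve.
        return "unverified"

    # The Offer must really be an Offer made by the claimed author.
    if offer.get("type") != "Offer" or offer.get("actor") != author:
        return "unverified"

    original = offer.get("object")
    if not isinstance(original, dict):
        return "unverified"

    same_text = (original.get("name") == issue.get("name")
                 and original.get("content") == issue.get("content"))
    # Identical text: the author indeed wrote this content.
    # Different text: the author opened the issue, but someone edited it since.
    return "verified" if same_text else "edited"
```

A client could map the three results to UI states: show the author normally for `verified`, show the original and current versions side by side for `edited`, and mark the author as unknown for `unverified`.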
So this approach needs careful details to make sure authorship authenticity is preserved, but it easily serves the repo team’s project management needs: their issue database is reliable and available for planning and doing their collaborative work.
We need your feedback