Where to host issues

A big question that had lots of discussion and debate on the mailing list, and on IRC, and on the old SocialHub, is: Where to host objects that are part of the management tools of a project and need access control and to be trusted by the project team? Do we host them on the author’s server or on the project’s server?

It’s time to make a final official decision on this, because the entire shape of the ForgeFed spec depends on it.

Intro

Basically, we’d like to pick one of two approaches for hosting objects. The situation is a bit more complicated than with toots (which are simply hosted by their author). The approach we pick will affect the whole shape of the spec, so I really really want to make a careful choice here and hopefully reach consensus on it.

The decision is relevant to any object which needs access-controlled reliable updates/edits after its creation, but to make things clearer here, let’s focus on a specific type of object: issues. Issues that people open on repos, to report bugs or request new features. Issues aren’t always submitted to repos (e.g. there are issue trackers that just handle issues and don’t host any repos), but for simplicity, let’s assume a common case in which issues are submitted to repos and listed under repos.

The life cycle of issues works roughly like this:

(1) A user asks to open an issue. This is basically a piece of text describing a bug, a feature request, a task to be done, etc., possibly with some data attached such as tags, screenshots, command output, or the user’s system info

(2) The issue is usually automatically opened and gets listed in the repo’s list of open issues

(3) The issue goes through a series of updates. There are primarily two kinds of updates: Related events (comments, mentions in other issues, related Merge Requests being submitted, etc.) and edits (description text edits, tags added and removed, milestone added and removed, people assigned and unassigned to work on the issue, priority and severity set and edited, dependencies (i.e. other issues that this issue depends on) and reverse dependencies added and removed, issue resolution status edits, issue being closed and reopened, access control changes e.g. no more comments allowed or only 1 specific person given access to edit the issue, etc.). Related events just happen, and the repo only gets to decide whether to list them. Edits are direct manipulation of the issue, and they’re examined and applied if access control approves them.

(4) The issue is resolved and closed. Possibly, at some point it switches into “archived” state, at which point further comments and edits aren’t allowed, and if anyone at all has access to un-archive the issue, it’s only the repo team members.

(5) The issue may get removed from the list of the repo’s issues, and/or entirely deleted. This rarely happens, but it’s possible.

The big difference between toots and issues:

  • Toots are expressions of the thoughts of the author, and if they ever get updates (Mastodon seems not to allow toot updates, but suppose it did), then those updates can be made only by the author. It’s important that the author has access control over those updates, so that the toot continues to be an authentic expression of the author’s thoughts. If anyone else gets to edit the author’s words, we generally call that impersonation.
  • Issues fill two roles simultaneously. One role is, like with toots, to be an authentic expression of the author’s thoughts and observations. For example, imagine you report a critical security bug, and it happens to actually be an intentional backdoor, and the repo ignores/deletes your issue. Or perhaps you and the repo team have different cultures/code-of-conduct/etc. and simply disagree on the issue description text. It’s important that when you publish something, it can remain your authentic expression and in your control. The other role that issues fill is to be a work item of the repo team, and serve as a project management tool, helping the repo team (and potential contributors) decide what to work on and track the work that remains to be done. Once an issue is listed under the repo, it also becomes an authentic expression of the repo team’s thoughts and intentions. They set the severity and priority, they may edit the description, they may decide to block further comments, they and other people may add new comments, they may delete comments that violate the code of conduct, and so on. For example imagine the author of a critical security issue decides to silently delete it, because the government paid them to keep it silent while they secretly work on turning the bug into a backdoor. It’s important that the repo doesn’t just lose the issue, if the repo team wants the issue there then it should remain there.

So, an issue begins its life as an agreement between the author and the repo, and later turns into 2 separate intentions: the author’s intention and the repo’s intention. These two things may align, but it’s very possible that they don’t.

In centralized forges, there’s a compromise between the two roles, generally leaning towards the repo team. The author gets to do their authentic expression by getting access to edit the issue, and perhaps to reopen it if it got closed. But otherwise, the repo is in control. If the repo decides to block comments on an issue you opened, because your comments reveal some bug they refuse to fix and refuse to admit to the user community, your only way to further discuss it is to post your words elsewhere, out-of-band. A blog post, a toot on the Fediverse, a separate issue in a separate repo, and so on.

On the Fediverse, we want to fill both roles, and one of the very first questions we asked ourselves when ForgeFed started is how to do it.

How I’m going to compare the approaches

I’d like to explain some points before I refer to them.

First of all, I must say, on the Fediverse we currently mix authentication with storage location. There’s some partial use of LD Sigs (and even that, only for announces iirc and Idk if anyone except Mastodon really uses them, maybe Pleroma too?) but otherwise basically if you wrote something you must be hosting it, and the only trustworthy way to get the content is to HTTP GET it from your server. If we switched to a p2p approach, then I suppose the controversy around the question “where to host issues” would be resolved, by having them use the p2p storage mechanism and do authentication using crypto signatures and not by relying on DNS hostname matching. But in the current situation ActivityPub assumes HTTPS or at least non-p2p, and that’s how the Fediverse works, and we need to pick a solution for this existing situation.

For reference, quoting the AP spec talking about object identifier URIs:

Publicly dereferencable URIs, such as HTTPS URIs, with their authority belonging to that of their originating server. (Publicly facing content SHOULD use HTTPS URIs).

I’ll be comparing the 2 approaches using the following criteria:

  • Author expression role: This refers to how the proposed approach fills the role of issues being an authentic record of their author’s thoughts, and, if other people made edits to the words, whether we can trust that the author approved those edits.
  • Project management role: This refers to how the proposed approach fills the role of issues being a tool for the repo team to decide what to work on, be able to know what remains to be done, what’s already done, who’s working on which task, which tasks belong to which milestones, which tasks have close due dates, which issues have comments disabled or other specific access controls, and so on. Also, their ability to enforce their code-of-conduct for discussions within the project, which includes issues and their comments.

People have been commenting about whether some approach is centralized or decentralized, and I find those words too broad to be used without a closer look and explanation, so I’m adding these points too:

  • Centralization: Which aspects of the given approach are centralized, i.e. concentrate control and authority in one place
  • Decentralization: Which aspects of the given approach are decentralized, i.e. allow control and authority to be shared, and in which ways that sharing maps to the logical control and authority we want people and servers to have over the data

After those there’s an overview of each approach with the major pros and cons and open questions and challenges remaining to be resolved.

Approach 1: Hosted by author only

In this approach, the repo allows authors to host the issues. If https://jane.tld opens an issue on repo https://repo.tld then Jane hosts the issue on her server, and the repo’s list of issues simply lists that URL as an issue.

The proposed ActivityPub way to do this is that an issue is published using Create just like a toot, and the repo actor possibly sends an Accept / Reject referring to the Create.
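
For concreteness, here’s a rough sketch of that exchange, with Python dict literals standing in for the JSON activities (all URLs, IDs and the choice of object type are illustrative, nothing here is specified yet):

```python
# Hypothetical Approach-1 exchange; every URL and field choice below is
# illustrative, not something the spec defines.

# Jane's server publishes the issue under an ID that SHE controls,
# and Creates it to the repo actor, just like a toot.
create = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "id": "https://jane.tld/activities/101",
    "actor": "https://jane.tld/users/jane",
    "to": ["https://repo.tld/repos/myrepo"],
    "object": {
        "type": "Note",  # or a dedicated Issue/Ticket type, to be decided
        "id": "https://jane.tld/issues/42",  # hosted on the author's server
        "attributedTo": "https://jane.tld/users/jane",
        "name": "Crash on startup",
        "content": "Steps to reproduce: ...",
    },
}

# The repo actor may answer with an Accept referring to the Create;
# the repo's list of issues then simply lists Jane's URL.
accept = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Accept",
    "id": "https://repo.tld/activities/77",
    "actor": "https://repo.tld/repos/myrepo",
    "object": "https://jane.tld/activities/101",
}
```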

Analysis:

  • Author expression role: This role of the issue is fully filled, similarly to toots. Since the issue is hosted on the author’s server, we can be similarly confident that the author approves the content they wrote and approves any edits made by other people.
  • Project management role: It’s unclear how this role is filled, feedback welcome. The idea @jaywink mentioned is that whenever there’s an activity involving the issue, the repo actor can send an Accept to approve it, and if you receive that Accept, then you know the activity represents both the repo team’s wishes and the author’s wishes. The problem is that the author may approve changes without notifying the repo at all, and when you GET the issue you see the version the author chooses to give you, even if that version isn’t agreed upon by the repo team. Publicly verifiable crypto signed ocap invocations could help, but then we force ocaps to be public, we make the system rely on them, and we make it impossible to deny some content once you published it. And even then, the author may simply decide to omit some update activity and not apply it to the issue. We may also rely on Reject instead of Accept, but then the author may simply refuse to do inbox forwarding to make sure everyone gets it, and we’re again back at the problem: When people GET the issue, they may see a version the author approves but the repo team doesn’t. Which leads to the question: If the repo team rejects some edit to the issue, how do we as repo followers HTTP GET the version of the issue that the repo team approves? Another problem is that the author’s server may go offline, and suddenly the repo team members are stuck without their database of issues and unable to fix critical bugs because they can’t view the issues.
  • Centralization: The centralized aspect here is that the issue is required to be hosted only in 1 place, despite the possibility that the author and the repo team have conflicting needs, wishes and beliefs, and contradicting thoughts they want to document. In Git (and other DVCS) you just have your copy of some repo and you can do there whatever you want. Note that nothing prevents the repo team from publishing an identical issue on their own server, and simply listing it in place of the author’s issue. The problem is that then issue authorship authenticity is lost. The 2nd Approach described below suggests a way to preserve authenticity while allowing the repo team to host issues authored by others, without resulting in impersonation.
  • Decentralization: The decentralized aspect here is that the git repo and its issues don’t have to be hosted in one place and controlled by one server, they can be hosted in different places. Somewhat like toots and their comments can be hosted in different places.

So this approach is great for preserving the author’s authentic expression and thoughts, but it’s unclear how it can fill the needs of the repo team members to feel and be confident they have an issue database they can reliably access and use to manage and track their work and allow followers to keep track in a trustworthy manner.

Approach 2: Hosted by repo team, possibly by author too

In this approach, the issues are hosted in a location that the repo team chooses, trusts and controls. Issue authors may host the issues they publish on their own servers too, but the repo team does issue edits (such as setting severity, priority, setting issue dependencies, etc. all the things I listed in the intro) on the copy that they host.

The proposed ActivityPub way is that the issue author sends an Offer activity referring to the issue they published (or the Offer simply contains the issue object, if the author doesn’t wish to publish on their server). The repo actor may send an Accept / Reject. A new issue is opened and hosted on the repo team’s server.
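
Again as an illustrative sketch (the result property is standard ActivityStreams vocabulary; all URLs and the object type are made up for the example):

```python
# Hypothetical Approach-2 exchange; the author Offers the issue,
# the repo hosts the accepted copy under its own ID.
offer = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Offer",
    "id": "https://jane.tld/activities/205",
    "actor": "https://jane.tld/users/jane",
    "to": ["https://repo.tld/repos/myrepo"],
    # The object may be embedded (if Jane doesn't publish a copy herself)
    # or be the ID of a copy she also hosts.
    "object": {
        "type": "Note",  # or a dedicated Issue/Ticket type, to be decided
        "attributedTo": "https://jane.tld/users/jane",
        "name": "Crash on startup",
        "content": "Steps to reproduce: ...",
    },
    "target": "https://repo.tld/repos/myrepo",
}

# If approved, the repo opens the issue under an ID on ITS server
# and can point to it in the Accept.
accept = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Accept",
    "id": "https://repo.tld/activities/300",
    "actor": "https://repo.tld/repos/myrepo",
    "object": "https://jane.tld/activities/205",
    "result": "https://repo.tld/issues/913",  # the newly hosted issue
}
```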

Analysis:

  • Author expression role: It’s not instantly obvious how to fill this role; this approach does offer a solution, but feedback on it is welcome. If you browse an issue hosted at https://repo.tld and the issue claims that https://jane.tld opened this issue at 2016-12-19, how can you be sure that this is what really happened? Maybe the issue text contains opinions that Jane actually strongly opposes, right? Or maybe she didn’t open this issue at all. Maybe the domain name jane.tld doesn’t even exist! But tomorrow someone may launch a federated server at that domain, and obviously the users there haven’t interacted with repo.tld and didn’t open that issue. PROPOSAL: When an issue is opened on the repo team’s server, the issue links to the Offer activity (hosted by the person who opened the issue, of course) or perhaps to a copy of the issue hosted by the author. When you as a desktop/mobile client app HTTP GET the issue from the repo team’s server, you then also GET the Offer and make sure the author matches (a sketch of this check appears after this list). If the title and description in the Offer are identical to the ones in the issue, you can safely display the author on the screen as “this person indeed wrote this content”. If there’s a mismatch, you can tell someone edited the content and the author just was historically involved in the issue but you can’t hold them responsible for the content. You can, however, display side-by-side the author’s original version (for which they’re responsible) and the repo team’s current version (which contains further edits made over time and the original author isn’t responsible for them if other people made those edits). If an issue on the repo team’s server doesn’t link to the Offer or perhaps the URL of the Offer doesn’t respond because the author’s server went offline, then there’s no proof the author really voluntarily opened that issue, and you can display the UI accordingly, marking author as unknown or as authenticity-isn’t-verified. Another option is that even if the Offer isn’t listed, you can GET the author’s outbox and search through it for an Offer of an issue with an identical title and description, and that’s another way to be sure of authenticity. If you’re interested in the issue author’s thoughts, which may conflict with the repo team’s preferences and thoughts, then check out the edits and comments the author has made and published on their own server; you can trust those. We could even be more strict and decide that an issue MUST refer to the Offer, and if the Offer's ID URL resolves, then it MUST resolve to an activity attributed to the issue author.
  • Project management role: This role of issues is fully filled, the issue is hosted on the repo team’s server so it has full control over any edits and can enforce all access control rules. OCAPs for the fediverse are also as possible here as they’d be for toots, because the entity that hosts the issue (repo team) and the entity that needs to enforce access control (repo team) is the same entity.
  • Centralization: The centralized aspect is that the repo team hosts all the issues in one place, or at least in places that the repo team chooses and controls. Issues can additionally be hosted by authors, as well as the Offer activities that sent the issues to the repo team, but the repo team’s version, which is what people would usually be interested in, is always stored in a location the repo team controls, not spread around the fediverse like toots. But also remember that with Git repos, all the lines of text in the files in the repo are hosted in one place, and all the commits and all the tags etc. are all hosted together in one .git directory on one server. And all the characters in the text of a toot are stored in one place. In other words we can always deconstruct an object and say “why are all of these stored in 1 place, that’s centralization”. The answer is, you want the data to be stored like that e.g. all the toot’s text characters hosted by the author of the toot, and all the lines of text in a repo stored by the person from whom you clone. Similarly if you trust the repo team, you want to get their content from them.
  • Decentralization: The decentralized aspect is that an issue doesn’t have to be hosted in one place. Much like in DVCS like Git you can just clone a repo and publish your own copy with your own changes, in this approach the author and the repo team can host their own version of the issue. If the author says “systemd is bad, let’s support other init systems” while the repo team says “systemd is awesome, let’s drop support for other init systems”, which are conflicting statements, then both get to host their version simply by allowing each to host an issue, there’s no requirement that one has to trust the other’s version, they can store their own if they disagree much like people fork git repos.
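
Here’s a minimal sketch of the client-side check described in the proposal above (the offeredBy link name is made up for illustration, and error handling is minimal):

```python
import requests  # assumes plain HTTP GETs with the ActivityPub media type

AP_ACCEPT = {
    "Accept": 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'
}

def fetch(url):
    """GET an ActivityPub object; return parsed JSON, or None on failure."""
    try:
        r = requests.get(url, headers=AP_ACCEPT, timeout=10)
        r.raise_for_status()
        return r.json()
    except (requests.RequestException, ValueError):
        return None

def verify_authorship(issue):
    """Check a repo-hosted issue against the author-hosted Offer it links to.
    The 'offeredBy' field is hypothetical; nothing like it is specified yet."""
    offer_url = issue.get("offeredBy")
    if offer_url is None:
        return "unknown-author"   # no proof offered at all
    offer = fetch(offer_url)
    if offer is None:
        return "unverified"       # e.g. the author's server went offline
    if offer.get("actor") != issue.get("attributedTo"):
        return "mismatch"         # the Offer wasn't sent by the claimed author
    original = offer.get("object", {})
    if (original.get("name") == issue.get("name")
            and original.get("content") == issue.get("content")):
        return "verified"         # the author indeed wrote exactly this content
    return "edited"               # show original and current versions side by side
```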

So this approach needs careful details to make sure authorship authenticity is preserved, but it easily serves the repo team’s project management needs: their issue database is reliable and available for planning and doing their collaborative work.

Summary

We need your feedback :wink:


For reference, there’s an open issue about this, issue #73. And previous discussion in issue #7 and issue #58.


I think some of our disagreement possibly comes from a few differences in thought.

  • I feel there is a lot of C2S related needs here, while I’m thinking of strictly S2S (since pretty much no one implements C2S and I would assume that to continue, especially re GitLab, Gitea, etc, which have an API already)
  • I think you assume the created issues would not be cached at remote servers, while in my experience content is always cached on remote servers in all implementations I’ve seen.

The last point makes the question “where is the issue hosted” irrelevant, since the ID is simply a unique identifier after the object is received. Issues are hosted, as per common delivery rules, most likely at:

  • the authoring server
  • any servers with to/cc/bcc
  • all servers with followers
  • (and for public issues) any server that happened to get the payload

Of course, a refetch would mean resolving the ID and fetching the content. But a cached version would mean the authoring server for example going down would not make the issue disappear - it would live on in a decentralized manner as other content on the federated web does when the authoring server disappears.

I think there are good and bad sides to each solution. Will try to draw some diagrams in a bit to map the weaknesses in either.

To make the decision it would be great to have a few other opinions too :confused: I won’t strongly oppose the Offer flow in any case; if no more comments are received, I shall yield to move on :wink: Btw, thanks for the great write-up on this and for taking my feedback seriously :+1:


I’d like to make sure we make a fair comparison. I’m in favor of approach 2, but I want to be sure to make an effort to solve the problems of approach 1 before making any decisions.

@jaywink, I saw your reply in the middle of writing this; I’ll post this comment as-is and then I’ll separately reply to yours :slight_smile:

Problems to resolve:

  • When viewing the issue, need to be able to tell, reliably, whether its whole content is agreed upon by the repo team
  • If not agreed upon, how does the repo team publish its own version?
  • How to verify OCAPs
  • Do we say no to issue forking? Is that a good idea?

Determining authenticity

If I browse a repo and it lists an issue hosted elsewhere, and I proceed to browse that issue, how can I tell whether the content I get is approved not only by the author, but also by the repo team?

The issue, hosted by its author, would need to refer to some proof, right? Ideas:

  • Link to the Accept activity
  • Link to an exact copy hosted by repo team
  • Link to a cryptographic signature made by repo team

Does anyone have other ideas? Here’s my analysis of the 3 I came up with.

Link to Accept

This has a problem. How do you tell whether the Accept applies to the latest version of the issue?

It would link to the activity it applies to. If that activity isn’t an Update on the whole issue, then we can’t verify it matches (without taking the entire history of edits of the issue, applying them one by one, making sure the result is exactly identical to the current latest version…). If it is an Update on the whole issue, most likely we’ll still have problems: Some of the fields of the issue object would be ID URIs pointing to separately (but locally) hosted objects and collections, whose changes aren’t directly reflected in the issue object. To prevent that we’d have to force them all to be embedded in the issue object, which only creates more trouble (not sure which creates more trouble: embedding them instead of providing them separately, or embedding them in addition :P)
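
For illustration, verification-by-replay would have to look roughly like this (purely a sketch: it assumes an ordered, complete history of Update activities and naive field-wise application, neither of which is guaranteed):

```python
def verify_via_history(initial_issue, update_history, accepted_ids, current_issue):
    """Replay every edit, check each one was Accepted by the repo team, and
    confirm the result equals what the author currently serves."""
    state = dict(initial_issue)
    for update in update_history:
        if update["id"] not in accepted_ids:
            return False                # an edit the repo team never approved
        state.update(update["object"])  # naive field-wise application
    return state == current_issue       # any divergence at all means no proof
```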

Link to exact copy

If there’s an exact copy hosted by the repo team, then this ends up being like approach 2, except without allowing author and repo team to have their versions diverge. If the repo team hosts a copy you may as well just get it from them directly, without going through the author at all…

Link to signature

Imagine that for each issue change, the repo team sends some activity that includes a crypto signature on the whole issue object. That would require LDSigs. But fine, suppose we require them. It still has the problem of nested objects. How do we deal with them? Imagine an issue with 1000 comments. Since the repo team may wish to block an issue from getting comments, or to remove a comment due to a CoC violation, it needs to sign the comments too… and everything else linked from the issue object, essentially… that could end up being a huge signature, even for the most trivial changes. Alternatively, every single linked object gets its own signature. Now you need to verify 1000 crypto signatures to be sure the issue you’re looking at is approved by repo team.

This may be possible, but I’m wondering what benefit it would have, compared to having the repo team just host and manage its own copy. I mean, that signature stuff is such a huge heavy complication, just to allow the issue author to host the ID URL of the issue.

Hmm there’s a problem, actually, with comments. CoC violation and moderation etc. generally involve manual action, which means many comments would get automatically accepted and later some of them may be manually rejected by repo team. How does the issue author prove to me, the viewer, that no provided object got rejected? I suppose I’d have to look at some repo team outbox and verify I can’t find any Reject there on any of the objects I’m looking at.
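
To illustrate how awkward that is, here’s a sketch of that scan (assuming a standard paged OrderedCollection outbox, and reusing fetch() from the earlier sketch; note that proving the absence of a Reject requires walking the entire outbox, and any fetch failure leaves you unable to prove anything):

```python
def find_rejects(outbox_url, object_ids):
    """Walk the repo actor's outbox and collect any Reject whose object
    is among the objects we're about to display."""
    rejected = set()
    outbox = fetch(outbox_url)
    page_url = outbox.get("first") if outbox else None
    while page_url:
        page = fetch(page_url)
        if page is None:
            break  # can't prove the absence of Rejects past this point!
        for activity in page.get("orderedItems", []):
            obj = activity.get("object")
            obj_id = obj.get("id") if isinstance(obj, dict) else obj
            if activity.get("type") == "Reject" and obj_id in object_ids:
                rejected.add(obj_id)
        page_url = page.get("next")
    return rejected
```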

Upon disagreement

If there’s a disagreement, repo team can publish its own copy. The problem is, the copy doesn’t have any of the comments or other attached objects. And the last-approved version may no longer be available, after going through many small disagreed-upon changes. To avoid that problem, repo team would have to cache the entire history of edits, including to linked sub-objects and collections…

OCAPs

When author receives an edit activity, the activity would have an OCAP attached. For the author to be able to verify the OCAP by themselves, the OCAP has to be publicly provided and publicly verifiable. So it would be something like OCAP-LD. It means we have to define a vocabulary for the roles and actions and exactly how to represent the access to each ForgeFed action in OCAPs. Maybe OcapPub has some ideas about that?

Alternatively, the author could wait for repo team to send its Accept. Now OCAPs don’t have to be publicly verifiable, no need to spec their entire vocabulary for this, but there’s a waiting period in which people would be unable to verify the repo team’s approval.

In approach 2, the ocap part would be trivial, since the entity managing access control (repo team) and the entity hosting the objects (repo team) are the same. The OCAPs could be something as simple as an HMAC, which the repo team’s server would internally verify and instantly apply the update to the edited issue.
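
As a sketch of that “OCAP as HMAC” idea (all names are illustrative; a real design would at least also need expiry and revocation):

```python
import hashlib
import hmac
import secrets

# The repo server's private key material; in reality loaded from
# persistent storage and never shared with anyone.
SECRET = secrets.token_bytes(32)

def grant_capability(actor_id, issue_id, action):
    """Mint an opaque token the repo server can later verify by itself."""
    msg = f"{actor_id} {issue_id} {action}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def check_capability(token, actor_id, issue_id, action):
    """Recompute and compare in constant time. No public verifiability is
    needed, because the granter and the enforcer are the same server."""
    expected = grant_capability(actor_id, issue_id, action)
    return hmac.compare_digest(token, expected)
```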

Issue forking

Approach 1 says, issue author hosts the issue. Repo team doesn’t publish a copy (because if it did, that’s already approach 2). But servers cannot prevent other servers from publishing stuff, and more generally, we don’t have any control over what happens on other instances. Repo teams would have a good reason to copy and host-by-themselves all the remotely opened issues: Full control with a much simpler implementation, guaranteed availability, ability to do anything non-standard that the issue author’s server couldn’t handle. Not depending on the author’s server being slow, or accidentally failing to list valuable comments and reviews, and so on. So we could end up with implementations automatically making copies of issues.

Making copies that way would also allow originally-centralized forges to continue to host all their issues, the way they already do, and not have to deal with all those Accept activities or scary LDSigs.

So, do we just state that making such copies is… forbidden? A plain simple “forking” of an issue, a copy of information from place A to place B?

If we don’t/can’t require that, then we’ll end up with some issues being copied and some not. So, a mixture of the 2 approaches. And then I wonder if it’s not much easier to just go with approach 2, and not have to handle all the problems of approach 1.

CONCLUSION

Approach 1 seems technically possible, but at a high cost of complexity whose benefit is unclear to me. The big difference between the approaches, in the criteria I examined above:

  • In approach 1, you have to examine all the edit actions or latest versions of the issue and all related objects, and approve them
  • In approach 2, all you need to do is to state who the original author is and point to their initial original version of the issue; there’s no need to do anything complicated with the next versions, because repo team hosts the issue and merely HTTP GETing it from them is the proof it’s the version they agree with

That makes authenticity verification in approach 1 much more complicated, while in approach 2 it’s easy.

I’m keeping my vote for approach 2. Now time to review what jaywink wrote :slight_smile:

It’s important to me that we don’t do anything that gets in the way of C2S. I’m actually planning to implement C2S in Vervis, and also, it’s a part of the ActivityPub spec and I’d like ForgeFed to work with it and not against it. I haven’t been assuming the use of C2S, but yes, I’ve been trying to make decisions in a way that works either way, with or without C2S or any other specific API.

Note that the fact that forges currently host all their issues doesn’t mean it would be that way in federated forges: You could browse some gitlab repo which has issues hosted remotely, and the JS client from gitlab in your web browser would HTTP GET the issue from wherever it thinks is the right place, whether it’s the actual remote ID URL or a cached version. ForgeFed just needs to work in such a way that either of those options can work. In other words, if the client GETs the issue from elsewhere, that place elsewhere needs to provide proof of the repo team’s approval (the challenge in approach 1), and if GETing from cache, gitlab needs to provide proof of authenticity of the original remote author (the challenge of approach 2).

So even if we don’t assume any C2S at all, the challenge of proving authenticity remains quite the same.

Hmmm nope :slight_smile: caching is definitely possible, it’s just that there’s no way to bypass the need to prove author authenticity. Obviously, you can skip that part and say “I claim this remote actor authored this issue but I won’t provide any proof”, but it’s not a magical solution. The question I want to have answered is how to prove authenticity when you do want to prove it. Displaying from cache is essentially like approach 2: You host a copy. The author’s version then doesn’t matter, and the fact that they host the issue’s ID URL is an illusion, since the issue is actually served to everyone via the cached copy on the repo team’s server.

The last point makes the question “where is the issue hosted” irrelevant, since the ID is simply an unique identifier after the object is received. Issues are hosted, as per common delivery rules […]

Of course, a refetch would mean resolving the ID and fetching the content. But a cached version would mean the authoring server for example going down would not make the issue disappear - it would live on in a decentralized manner as other content on the federated web does when the authoring server disappears.

Agreed. In other words, the ID URL makes a difference when you actually HTTP GET the object from it.

An issue would normally have a replies field, listing all the known replies. And more such lists/collections, for example the issue’s dependencies. Now we have 2 options:

  • The repo team’s server caches those lists too
  • The repo team’s server doesn’t, and you really just have to HTTP GET them from the author’s server

In the former case, the repo team server essentially has an entire copy of the issue and all related objects. That’s just approach 2 in disguise. The ID URL is on author’s server, yes, but nobody actually ever GETs that URL because they view the issue via the cache hosted by repo team. In practice, the repo team is providing the hosting for this issue, serving it to people who want to view it. There’s no need to bother to grab the issue from the author’s server and then do complicated repo-team-approval verification; it’s much easier to use that cache.

In the latter case, the cache is partial and now if author’s server goes down there’s real loss of data.

So, once the issue is cached and you HTTP GET it from the cache and can trust that cache, the ID URL indeed makes no difference, it’s just a cell in a DB table. So the question is: If you don’t have the issue cached, where do you get it from? My proposed answer: Both author and repo would have a copy of the issue. One of them is the canonical ID and the other is a (hopefully) identical copy. Except if you GET from the author, the process of authenticity verification is much more weird and complicated than if you GET from the repo team.


Btw a point I forgot to mention: If we force authors to host issues, then their servers will have to understand all the vocabulary of issue edits and apply the side effects. If we have repos host the issues, then sending an issue is trivial and any server that can publish toots can then also publish issues, with very little implementation work. If an issue is published in 1 activity and then some 1000 edit/related activities happen (comments, dependencies, due dates, tags, milestones…), we end up putting a huge burden on the author’s server: Handle all those 1000 activities merely because they wanted to send that 1 activity to open an issue. Not saying this as an argument for either approach, just sharing the thought here.

Where does the requirement about “agreed upon by the repo team” come from? This is not true even in centralized forges currently. I can log an issue into any public GitHub repository and the repo team does not get to approve my issue. I’m not sure why this would be the case in the decentralized world.

Information in a decentralized world can get out of sync easily (especially with push only protocols with no state like ActivityPub). If the author updates their copy but does not send it to the remote repository, then they make that choice for some reason. The repo team will likely have their cached copy and if they do end up GETting the author’s copy, they lose no more control than they lose with created issues.

I’m not sure I’ve said this, I can’t find any reference at least. What I believe I’ve said is that the Create should be Accepted, if we want repositories to indicate back whether they accepted the issue into their list. Even if they don’t the author can keep it in their copy of the repository.

The create and any updates should follow inbox forwarding rules to reach all the servers that are interacting with the target repository. But I don’t see why everything should be accepted. When one comments on a post, remote servers don’t normally accept the comment. This doesn’t even happen in centralized forges.

This is a major problem with the Offer flow. If I create an issue in a repository, I don’t want to lose my authorship on that issue to the target server. I want it to be authored by me.

I don’t believe “people can post issues easily from social networks” is a great reason for deciding one of the most important details of the spec. The primary concern IMHO should be “how to implement a forge” and IMHO we should assume that implementers of the spec are forges who implement the spec correctly.

Besides, even if say, Mastodon, added support for Create { Issue }, the side effects and such are pretty much the same for an issue as for Create { Note }, in the sense of inbox forwarding which is probably the most important detail so that everyone receives the activity.

I’m still not convinced Offer is better than Create. I think it goes against the idea of federating forges since it feels like forges become just API’s to each other instead of the forge objects (issues for example) being really decentralized. As soon as you Offer something, you lose control of it. You can’t even interact with it to fill in details until the remote server sends you an ID, which could be a huge problem with servers that are not reliably available (the real world of the federated web).

This brings me to the only bad thing I can think of the Create flow. Repository owners receiving a remote Issue would normally not be able to send an Update { Issue } out. This however could be written into the specification that receivers of an update to an issue should respect updates from the repository owners as well as the issue author. Any well behaving Forgefed server would then allow the repository owners to also update issues created by others.
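
A minimal sketch of that receiver-side rule (the shape of the repo object and its owners list is an illustrative assumption):

```python
def may_update_issue(actor_id, issue, repo):
    """Accept an Update{Issue} if it comes from the issue author or
    from one of the repository's owners."""
    return actor_id == issue["attributedTo"] or actor_id in repo["owners"]
```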

I hope this clarifies some thoughts on the issue. Unfortunately there is starting to be so much text for this it’s hard to keep track of things. I do believe Forgefed should try to keep as close as possible to standard ActivityPub behaviours. Federation and decentralization are hard problems. But we shouldn’t avoid these problems by watering down solutions by encouraging servers to avoid remote objects. I still fail to see any forge related requirements. I know there are some above, but I don’t think they are valid as (hopefully) explained above.

How to get forward, since we’re clearly not going to reach common ground here?

  1. Post out to the fediverse and forges and try to get more opinions on this
  2. Make a decision as main author and move forward

I would prefer 1) but I do realize it’s frustrating as there doesn’t seem to be much interest, or people are just scared to get involved. As such I won’t blame you for choosing 2), though I strongly feel it goes against the principles of federation and decentralization.

Some quick diagrams to dump some thoughts on the issue flows I propose which I believe would be more in line with the idea of federation.

[diagram: Selection_430]

i have not been involved much lately; and i can only assume that
i am out of sync with the direction things are going -
discussions like this look to me like the goal of this project is
going way off track from the original intention, reaching far
beyond the basics of what needs to be accomplished - i am having
a hard time understanding why this question needs to be asked -
it looks like yet another way in which the use of activity-pub
and its jargon is complicating something that should be very
simple - i would assume that everything in this post is already
plainly obvious - if it is not, then i have no idea what
forge-fed is trying to do anymore

forge-fed only needs to do three general things:

  1. allow all data to be migrated from one forge to another,
    just as pagure already does

  2. allow foreign users to initiate common forge operations,
    just as if they were a registered user of that forge,
    but using their own home-server credentials instead

  3. allow foreign users to receive notifications
    of subscribed forge events

that is the core functionality that most people will want -
anything beyond that should not be in the initial spec - there
is ample design space for special enhancements after the core
features are specified as a usable base set; but those could all
be defined in optional “sub-specs” for those who want the special
features - for example, the only special feature that i have
proposed is the ability to sign comments; but that is not one of
the “common forge operations” that any existing forge can handle
currently, so it does not need to be on the design table just
yet, or ever really

to any such questions like: “where do we store … [anything]?”,
the answer is always: “in the database of the forge where the
maintainers of that relevant code have total authority over it” -
in this case, if the ticket is to be posted against a remote
repo, then “the real actual ticket” will be hosted in the
database of the remote forge - as far as that project is
concerned, any data that is not in the forge database, does not
exist; and any representation of some data that is in the forge
database, which does not correspond exactly to what is in the
database, is presenting something that the maintainer is not
aware of or accountable for, and is therefore incorrect - if
some ticket is to be interesting to the maintainer of
my-server.net/my-repo, then that ticket must be stored in
my-server.net's database - any long-lived copies of
activity-pub objects, that correspond to something in some
forge’s database, must respect the authenticity of that
database, and update their state indiscriminately according to
the whims of the authoritative repo maintainer; or else the
“copy” floating about in the cloud is a fiction, not actually
corresponding to the reality of the state of the real issue
tracker

it looks like the proposal is that these tickets should have a
life of their own “in the cloud” - that would be a nifty thing
if the goal of forge-fed was to make a decentralized forge
system based entirely on activity-pub, with no database; but that
is not the goal - the goal is to allow forges, which are
inherently centralized, to operate just as they normally do, but
to allow them to inter-operate in whichever ways that is
sensible

that is what federation is: “inter-operability” - it has nothing
to do with shared authority or data - a federated service is
much more like a centralized server than a decentralized service;
because every admin has total autonomy and authority over the
users and data on the local system - the only difference from
centralization is the ability to share users and data; but with
no obligation to do so - the difference from decentralization is
essentially the obligation to share data or other resources - in
many decentralized softwares, the admin may have no knowledge of
which data or computations the local machine is handling; with
the only control being to kill the entire process

for example, the git-dit project is a decentralized issue
tracker - that could be a forge-fed peer, accepting the very
same ticket-related messages as forges do, and handling storage
and representation in its own unique way - forge-fed itself does
not need to extend into implementations or storage - it only
needs to send messages between interested parties - all such
messages are requests for something to happen, which may or may
not actually be honored by any party - if some state is to be kept
elsewhere than the target repo, it had better reflect what
actually did happen to that repo’s database, and not merely
what someone hoped would happen

a decentralized forge based on activity-pub, could be the goal
of a new novel forge like vervis; but if existing forges are
going to make any use of forge-fed, then activity-pub can be
used as nothing more than a stateless communication protocol, so
far as it concerns any forge operations - all AP objects should
be considered to be transient and possibly deleted by the
receiver forge, shortly after being received - any statefulness
of activity-pub objects is meaningful only to peers that are not
forges nor bug trackers; and they are inherently fragile, and
prone to presenting a false reality, if taken as authoritative

any data or state, existing only in the form of an activity-pub
object, could perhaps be authoritative with respect to itself;
but it would not be interesting (or even known) to any forge
or its maintainers, because it would not correspond to
anything in the forge database, and there would be no way to
handle it or to represent it

in the context of a distributed BBS service such as mastodon,
they are authoritative; but only about themselves - their only
consequence is that they are presented (or not) on one instance
or another - there is never any contention regarding their
semantics; and it is unimportant where they are hosted, or
whether they get deleted here or there - there is also no
confusion about whether or not they reflect the state of some
system, or what happened to put it into that state - the very
existence of the comment itself is its own state, and all that
ever “happens”

in the context of forges though, the messages are semantic
instructions that have consequences, and the forge database is
the single source of truth reflecting those consequences (and
thats not: a “should be”, but it is an irrefutable “is”) - any
long-lived copies of activity-pub objects must reflect the state
of the database of the canonical forge repo, or else they are
stale orphans, presenting inaccurate information - they can not
be the canonical source of truth themselves; but they are either
representations of something that is in the forge database, or
requests to modify the forge database somehow, which may or may
not actually happen - it makes no sense to author a “local”
ticket against a repo on someone else’s server, if the
maintainer of the remote repo is expected to ever see it - of
course it must be transmitted to its true destination - when
that happens, it will be logged into the database (maybe) and
only then, that will become the reality - after that, the repo
maintainer would have total control over it - any other
representations of it will simply not be genuine - surely, there
are some caveats regarding how comments can be updated over the
activity-pub channel; but the state of the forge database is
the final truth, and everything else must respect that, or it is
incorrect

the only important distinction to make, when some remote data is
replicated, is whether it is maintained as a fork or
un-maintained as a mirror

if the local server has a clone of some remote repo or its issue
tracker, then that is a distinct clone (a fork), not the
identical project - any tickets posted to the fork’s issue
tracker should never be propagated to the foreign repo - those
tickets belong to that fork, they exist in the database of the
fork’s forge, and they are not interesting or even known to the
maintainer of any other fork

if this is about mirroring repos and their tickets (not
forking), then the local user should have no authority over, and
no ability to directly modify, the state of any of the locally
mirrored data - all interactions with it would necessarily be
done on the server of canonical foreign forge, merely mediated
by the local forge-fed peer on behalf of the local operator -
only once the interaction has been handled by the forge of the
canonical foreign repo, then the result would be simply mirrored
back to be presented locally - that is how mirrors work - if the
local user would directly modify the state of the mirrored repo
or tickets, in any way, then it would no longer be a mirror, but
a fork; and would be no longer interesting to the original
maintainers, because they clearly have no authority over it -
that is what merge requests are for

the maintainer of the repo that the ticket was posted against
must have total authority over the life-cycle of the tickets and
all comments on the repo that they maintain, or else there is
chaos; and maintainers will not want to use forge-fed - they are
the ones who are actually handling the progress of that ticket
after-all - it is the only way to ensure that the state of that
ticket is an authentic representation of the actual activity that
is happening regarding that ticket

for example, the author of a ticket or comment against a foreign
repo, who is mirroring the tickets locally, should expect that
the ticket may be rejected, or closed, or their comments deleted
by the foreign maintainer, and their local copy would passively
and faithfully reflect the decision of the foreign maintainer -
any deviation from that would be a recipe for abuse and confusion


ok i just read this entire thread - wow-wee :slight_smile:

the two proposals A and B suggest to me that this is trying to
do way too much too soon - my vote is “none of the above” -
instead, keep the initial core spec as simple as possible,
without introducing any new, unfamiliar workflows - that is:
only the things that are absolutely necessary to allow forges to
inter-operate, using only the features that forges already
have in common - then anything complex or innovative can always
be added later, as an expanded revision or separate optional
workflow-specific specs

to accomplish the essential forge-to-forge communications, i
dont see any compelling reason to use activity-pub for anything
other than a data format for transmitting messages, then
discarding them - all of the other magic spells that
activity-pub could conjure up, including mastodon
inter-operability, sound wonderful; but i would consider them to
be non-essential special case workflows for future iterations of
the spec - right now they only seem to be impeding progress

regarding tickets specifically, i can only imagine the need for
six messages:

  • ‘post-ticket’
  • ‘post-ticket-comment’
  • ‘edit-ticket-comment’
  • ‘list-tickets’
  • ‘get-ticket’
  • ‘get-ticket-comment’

theres no need for anything more elaborate than that, in order to
make forges inter-operate - all messages are signed in the same
way by the sender, and correlated to one of the forge’s phantom
users; so there should be no doubt about authenticity or
authorization

both of those proposals require the activity-pub objects to be
stored indefinitely and treated as authoritative - as it is
today, all data related to a project is stored in the database
of the forge hosting that project - that is incontrovertible and
non-negotiable - its just how forges work - no existing forge is
equipped to verify the contents of some data, other than GPG
signed git commits; and most of them dont even do that - its a
great idea, that i have been proposing; but SSH or GPG would be
better tools for that - for now, the signature on the AP
messages only needs to correlate the sender with a phantom user
in the forge database; because that is all that the forge will
present anyways

the idea of the sender being the authoritative source for
hosting the data, which is intended for other forges to present
as authoritative, just smells wrong to me; and would not be very
convincing anyways - for one thing, it would not be presented in
the original context of the destination where it is actually
relevant; which makes it dubious evidence of anything

anyone elses copy of any data hosted by a forge is not
legitimate unless it can be verified as an exact replica,
and being presented in the original context, just as the upstream
maintainers saw it and handled it - the pagure approach of
keeping all issues under git could do that, but only if every
comment was signed by its author and the forge signed off the
entire repo, attesting it as being the actual data that was once
hosted on the forge - but again, no forge actually does that
today - so theres no reason to consider that for the core spec

and yes, the maintainer of the repo can modify everyone’s
comments; and the admin of the server can delete anything they
dont like - thats just how the cookie crumbles - its the only
way to control spam other than locking people out; and i dont
think that any project maintainer or sysadmin would want it any
other way - i have been proposing GPG-signed comments all along -
that would detect any changes made by others to the original
message; but i dont see it as anything to lose sleep over

an issue tracker is not a venue for personal expression of
opinion - it is an issue tracker, for tracking the progress of
work that is being done - nothing of a personal nature is
appropriate or relevant in that context - personal expression
belongs on forums, blogs, mailing lists or other more
appropriate venues - the “role” of the person who opens a ticket
is only to ask the maintainers to do something - the maintainers
may do that thing and they may not - its purely business - it
does not need to be glorified and theres no need for forge-fed to
try protecting freedom of speech - if the maintainers do not do
what you ask them to do, then thats really the end of the story -
“fork it” - the simple fact that one can host their own forge
and clone the freely licensed code to it, is all the freedom
that anyone needs regarding software


There is something which was not mentioned, it seems: privacy. Hosting the issues on the authors’ servers creates a privacy risk. When I visit the issue list of a repo, I don’t want my browser to make a lot of requests to various servers. Such an architecture would allow a bad guy to create an issue, just to be notified of visits to the repo.

@bortzmeyer this discussion is about the federation level, not the UI level. And even if someone wrote a platform that doesn’t cache anything, and someone wrote a client that fetches objects on demand based on their IDs - the whole point of federation is to have remote objects. That is how ActivityPub is designed: you have URL IDs which get resolved (usually server side, not browser side).

Let’s agree on a goal for decentralization. Here’s my proposal, everyone please comment. Stuff that should be possible to have on different servers:

  • User can be a member of a remote group/team/organization
  • User can be a collaborator on a remote project/repo

Why is “repo and issues on same server” centralized while “all files under .git kept in a dir on same server” is acceptable? Where do we draw the line? Even in the fediverse, author and their toots are on same server. Toot and its collection of ID URLs of replies are on same server. Author and their follower collection are on same server.

My proposal: Projects, users and groups decentralized; project can be considered an atomic entity which reserves right to store its sub objects (git objects, issues, wiki pages, CI build tracking, releases, kanban, milestones…) together on same server.

Stuff not in goals to decentralize:

  • A project/repo and the group/team managing it
  • A project/repo and its content objects e.g. the object files under a repo’s .git/ dir don’t have to support being split across servers, same for issues and wiki pages and CI builds and so on. The content of a project can just be together with the project on the server. Supporting splitting is OK, but only if there’s no authenticity and reliability compromise.

Where does the requirement about “agreed upon by the repo team” come from? This is not true even in centralized forges currently. I can log an issue into any public GitHub repository and the repo team does not get to approve my issue. I’m not sure why this would be the case in the decentralized world.

You’re free to open an issue, but repo team should be able to enforce access control and CoC rules. So in that sense, your issue listed under a repo means repo team is okay with it. Think about issues in general, not specifically in forges: an issue list is like a to-do list of your work plan. And you want the confidence that things don’t change or get edited out behind your back. Like the to-do list on a refrigerator. Repos in forges are a “wild” example where sometimes locking, banning etc. are done. And project team should, I think, have the ability to say “this issue’s text is of high importance, no edits allowed”.

Information in a decentralized world can get out of sync easily (especially with push only protocols with no state like ActivityPub). If the author updates their copy but does not send it to the remote repository, then they make that choice for some reason. The repo team will likely have their cached copy and if they do end up GETting the author’s copy, they lose no more control than they lose with created issues.

Hmm I’m not sure :-/ when people HTTP GET the issue, they’d see an edit the author made that the repo team isn’t even aware of. Is that a desired situation? Imagine author changes “critical issue, be careful” into “issue solved, it’s safe to install now” and people believe that lie for a whole week because repo team didn’t get notified about the malicious/accidental edit. Maybe it’s a silly point, I want to say honestly that I bring it from an emotional place: the wish to be confident about the content of my to-do lists. It’s not about issue creation (don’t like an issue someone opened? Just remove its ID from your list, problem solved), it’s about an issue you want, that maybe got many important comments, and you want to be sure you’re in the loop about any edits made, including the ability to forbid edits.

The create and any updates should follow inbox forwarding rules to reach all the servers that are interacting with the target repository. But I don’t see why everything should be accepted. When one comments on a post, remote servers don’t normally accept the comment. This doesn’t even happen in centralized forges.

Agreed. It’s also possible to just send Reject on CoC violating comments, or comments on issues where commenting is disabled. Trouble is, how can repo team be sure it’s enforced? I can imagine a heated discussion, repo team wants to lock comments, author feels like listing some new comments under the issue’s replies collection. Repo team finds themselves powerless to control the issue database of their own repo. How do we solve that in the 1st approach?

This is a major problem with the Offer flow. If I create an issue in a repository, I don’t want to lose my authorship on that issue to the target server. I want it to be authored by me.

No problem, I suggested a solution there. Even a copy of an issue can list you as author, and link to your Create/Offer/Ticket that you host as proof you’re the author, so in both approaches authorship isn’t lost.

I don’t believe “people can post issues easily from social networks” is a great reason for deciding one of the most important details of the spec. The primary concern IMHO should be “how to implement a forge” and IMHO we should assume that implementers of the spec are forges who implement the spec correctly.

Actually I’d like to allow e.g. gitolite to implement git repo federation stuff without having to implement federation of issues, wiki, CI… if you host a repo’s issue, suddenly you’re responsible for implementing all the issue stuff and managing it for the whole lifetime of the issue. I’m just wondering if it’s the right thing to do (repo team is the party with personal interest in issue authenticity and correctness and availability through its lifetime, feels right that it’s up to them to implement and manage stuff). Like email: you send it to report some bug, but you’re not required to host anything.

Besides, even if say, Mastodon, added support for Create { Issue }, the side effects and such are pretty much the same for an issue as for Create { Note }, in the sense of inbox forwarding which is probably the most important detail so that everyone receives the activity.

Consider not just issues, but also patches and MRs and wiki pages… Even issues alone: close, reopen, assign a person, unassign, add/remove a dependency, add/remove labels, milestones, edit access controls (e.g. forbid comments, edits, etc.), list related issues, list new comments, a related MR… Some of these are plain simple Update{Issue} but some aren’t. And each feature you’re missing is a feature the repo team can’t use with issues you open. But fair point, the basic stuff is just Create and Update on the issue.

I’m still not convinced Offer is better than Create. I think it goes against the idea of federating forges, since it feels like forges become just APIs to each other instead of the forge objects (issues for example) being really decentralized. As soon as you Offer something, you lose control of it. You can’t even interact with it to fill in details until the remote server sends you an ID, which could be a huge problem with servers that are not reliably available (the real world of the federated web).

You could keep those as drafts until you get the ID, or even apply the edits to your copy. I think this is a smaller problem, because if the repo is down, nobody can HTTP GET its list of issues anyway. And you can’t even toot some message to the repo’s followers, because the repo server isn’t online to do inbox forwarding. On the other hand, a repo with issues hosted on lower-reliability servers can be a big pain. The entity that needs the issue is basically the repo team, so that they can work on it.

I don’t think it goes against federation, because projects and users can be on different servers. When you send a bug report or patch by email, it’s visible and hosted on the project’s public mailing list, and proof of authenticity is via your PGP signature, and also via DKIM etc. of your mail server. This is not very different from what approach 2 does.

When you send a patch/MR to a git repo, should you host the git objects resulting from merging your code? And then when people run git clone on their laptop, should it actually fetch from multiple different servers? How deep do we want to dive with distributing the storage/hosting of sub objects, and why :slight_smile:

This brings me to the only bad thing I can think of about the Create flow. Repository owners receiving a remote Issue would normally not be able to send an Update { Issue } out. However, the specification could state that receivers of an update to an issue should respect updates from the repository owners as well as the issue author. Any well-behaving ForgeFed server would then allow the repository owners to also update issues created by others.

That’s actually not a problem; either way we need to do access control and will probably use OCAPs for this :slight_smile: However, I do see a problem in the Create flow: the authority on access control (the repo team) isn’t the entity hosting the controlled object (the issue). That’s common in P2P but unusual for “just” decentralization, and it complicates the whole handling of access control and the reliability of its enforcement, making the complexity resemble P2P. But Spritely isn’t here yet and it’s scary to me to commit to that complexity, especially since the Offer approach allows for “traditional” simple management, the way OCAPs would be used everywhere else (e.g. on Groups and Forums and Events etc. once we have those on the fediverse).

I hope this clarifies some thoughts on the issue. Unfortunately there is starting to be so much text here that it’s hard to keep track of things. I do believe ForgeFed should try to keep as close as possible to standard ActivityPub behaviours. Federation and decentralization are hard problems. But we shouldn’t avoid these problems by watering down solutions, e.g. by encouraging servers to avoid remote objects. I still fail to see any forge-related requirements. I know there are some above, but I don’t think they are valid, as (hopefully) explained above.

I hope I clarified some of that. With the Offer approach I’m actually trying to have a clean, elegant actor-model approach, while preserving the situation where servers host the objects on which they manage access control. In that sense, approach 2 more traditionally follows safe, established practice, the way I see it ^ _ ^

How do we move forward, since we’re clearly not going to reach common ground here?

I’ll post on Fediverse, forges and AP dev places.

I know it’s been a lot of text. I’ll try to make a summary of everything if we don’t make progress with what we have :slight_smile: and I’ll comment on your diagrams in a separate reply.

I posted a question on SocialHub. @jaywink, I hope we get some useful feedback there! I tried to simplify the debate into a simple question, I hope I did it well ^ _ ^


I don’t understand. Why would ForgeFed not allow repo team members to be decentralized? That would be very limiting if all project team members need to be on the same server.

Also I don’t understand the .git reference. Git doesn’t have a location, it’s decentralized by nature. You clone it and you have a copy. You fork it and you have a copy with your own url. I would expect fork to work similarly in ForgeFed. If I fork https://alice/repo to https://jaywink/repo then I have a fork pointing to my server.

Why would these not be goals?

They can. They can close the issue, edit it or even delete it - just like in centralized forges.

This is just one use case and it’s not even the GitHub one. This use case shouldn’t rule the whole spec.

I think this “attack vector” is being given too much space in this discussion. Also this is simply how the federated web works. If an object owner changes their object, it might be different than cached copies on the other servers. You can’t decentralize and centralize at the same time - you have to choose one.

Then just code a server that rejects any remote Create and just accepts Offer - done :slight_smile:

I’m starting to believe nightpool and aaronpk were right in the SocialCG room suggesting that both are valid cases. Didn’t even think of that, I’m glad they brought it up.

The Create flow I’m suggesting is closer to the traditional fediverse way of distributing content, and the Offer flow is something that will better suit workflows like your to-do list example. As there is no consensus in the working group on which one should be the one, and both clearly have advantages and problems, and neither solves all use cases, ForgeFed should document both. Server implementers are then free to choose which one applies to them, depending on whether they are building a replacement for GitHub or a project to-do list, which requires centralizing the issues under one server and acceptance of all created issues.

Opinions?

Isn’t there a contradiction between remote collaboration and organization membership and centralized management?


I don’t understand. Why would ForgeFed not allow repo team members to be decentralized? That would be very limiting if all project team members need to be on the same server.

Oh, sorry, I probably phrased it poorly. Project team members can be on different servers.

Also I don’t understand the .git reference. Git doesn’t have a location, it’s decentralized by nature. You clone it and you have a copy. You fork it and you have a copy with your own url. I would expect fork to work similarly in ForgeFed. If I fork https://alice/repo to https://jaywink/repo then I have a fork pointing to my server.

When something has a clone URL, then that’s the location. What I mean is that a project consists of many sub objects: The issues, patches, MRs, wiki pages, git (or other VCS) repos and so on. And even those have their own sub objects: Each git repo is many files in a .git/ dir. Each issue has its collection of replies, dependencies and so on. We have an assumption, that when you host a git repo, you host all the files in the repo together. Sure, people can clone your repo, but your copy that is in your authority has the .git folder on your server’s filesystem. We aren’t trying to decentralize it, allow each git commit/tag/branch to be on the server of the person who created it. So what I’m saying is, the choice of how deep we want to go with allowing the small sub objects of a project to be split across servers is a somewhat arbitrary/flexible choice. Conceptually, you could think of an issue as a git repo with 1 text file, and then supposedly now “issues don’t have a location and are decentralized by nature”, right?

I just wanted to ask the question: If a specific repo with a specific URL/owner and all its commits/branches/tags are together on the same server and that’s okay, why wouldn’t it be okay if a repo and its issues are on the same server? Why are these two cases different?

You’re free to open an issue, but repo team should be able to enforce access control and CoC rules.

They can. They can close the issue, edit it or even delete it - just like in centralized forges.

If they’re lucky, and the author’s server cooperates with them and applies the edits :slight_smile: Anyway, this point is covered, moving on.

Think about issues in general, not specifically in forges: an issue list is like the to-do list of your work plan. And you want the confidence that things don’t change or get edited out etc. without you being in control of things.

This is just one use case and it’s not even the GitHub one. This use case shouldn’t rule the whole spec.

Technically, a list of issues is a to-do list of stuff to do. Bugs to fix. Features to add. The issue part of ForgeFed is about task/issue/project management in general. I often use issues to manage and track work on a public project at a public spot, and I’m sure I’m not the only one. This use case shouldn’t rule the spec though, I agree, we just need to be sure the use case is supported and possible.

Hmm, I’m not sure :-/ When people HTTP GET the issue, they’d see an edit the author made that the repo team isn’t even aware of. Is that a desired situation? Imagine the author changes “critical issue, be careful” into “issue solved, it’s safe to install now” and people believe that lie for a whole week because the repo team didn’t get notified about the malicious/accidental edit.

I think this “attack vector” is being given too much space in this discussion. Also this is simply how the federated web works. If an object owner changes their object, it might be different than cached copies on the other servers. You can’t decentralize and centralize at the same time - you have to choose one.

The Fediverse doesn’t have that use case right now, as far as I can see. So it can’t be how it works. A toot’s purpose is to reflect the author’s thoughts, so toot editing would just mean it’s a more accurate, up-to-date reflection. And if someone relies on an outdated cache, the worst scenario is that they see something you said in the past and haven’t seen the new version yet. It’s like watching yesterday’s news, not being aware yet that there’s some more stuff. With issues, an edit by the author may sometimes not be desirable at all, from the repo team’s point of view.

I’m starting to believe nightpool and aaronpk were right in the SocialCG room suggesting that both are valid cases. Didn’t even think of that, I’m glad they brought it up.

Me too. Actually, I’d like to make a new proposal: Let’s support both.

The Create flow I’m suggesting is closer to the traditional fediverse way of distributing content, and the Offer flow is something that will better suit workflows like your to-do list example.

The Create flow is how purpose-is-expression-of-author-thoughts is distributed. But there’s something fundamentally different about objects that may get a long long lifetime of edits and updates outside of the control of the original author. However, if we remove the assumption that repo team is accountable for all the content and updates etc., I agree! I have some technical points though, I’ll write those at the bottom of the post.

As there is no consensus in the working group on which one should be the one, and both clearly have advantages and problems, and neither solves all use cases, ForgeFed should document both. Server implementers are then free to choose which one applies to them, depending on whether they are building a replacement for GitHub or a project to-do list, which requires centralizing the issues under one server and acceptance of all created issues.

Even if you build a forge like Gitea/GitLab/GitHub, you may wish to support hosting issues on the repo team’s side.

Opinions?

Let’s support both. Do we agree on that @jaywink? Both flows supported and documented in the spec, and all assumptions in the spec take both approaches into account, making sure they’re both possible.

Technical points to resolve

@jaywink, if we agree on supporting both flows, there are some little technical questions I’d like to answer.

Be patient with me for just a bit more, I hope we’re almost done with this huge thread ^ _ ^

(1)

On the Fediverse, if you reply on my toot, then my toot now lists the ID URL of your toot under its replies. Am I right? There’s no need to Offer anything; you merely Create a Note and that’s enough for my server to know that it can list your toot as a reply to mine, so that people reading my toot can see your reply, and all the other replies of course. Am I correct so far?
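To make this concrete, here’s a rough sketch of the reply mechanics I’m describing, in plain AS2 (all URLs here are made up):

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://bob.example/activities/1",
  "type": "Create",
  "actor": "https://bob.example/bob",
  "to": [
    "https://alice.example/alice",
    "https://bob.example/bob/followers"
  ],
  "object": {
    "id": "https://bob.example/notes/1",
    "type": "Note",
    "attributedTo": "https://bob.example/bob",
    "inReplyTo": "https://alice.example/notes/7",
    "content": "Nice toot!"
  }
}
```

My server sees inReplyTo pointing at my toot and adds https://bob.example/notes/1 to that toot’s replies collection.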

Now let’s look at the Create flow for issues. If I understand your proposal correctly, it works the same, right? You Create an issue, and the repo then lists its ID URL under its list of issues, right? There would be some property for that, say issues or whatever, much like replies is used for listing replies on a toot. Am I correct so far?
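And a matching sketch for the Create flow with an issue. Again all URLs are made up; Ticket is the ForgeFed type we’ve been using for issues, the forgefed.org context IRI is my assumption about where our vocabulary would live, and using context to point at the repo is illustrative, not decided:

```json
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://forgefed.org/ns"
  ],
  "id": "https://alice.example/activities/2",
  "type": "Create",
  "actor": "https://alice.example/alice",
  "to": [
    "https://dev.example/repos/treesim",
    "https://dev.example/repos/treesim/followers"
  ],
  "object": {
    "id": "https://alice.example/issues/42",
    "type": "Ticket",
    "attributedTo": "https://alice.example/alice",
    "context": "https://dev.example/repos/treesim",
    "summary": "Trees disappear when zooming out",
    "content": "Steps to reproduce: zoom out to 10% and all the trees vanish."
  }
}
```

The repo would then add https://alice.example/issues/42 to its issues collection, just like replies above.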

Now let’s look at the Offer approach for a moment. Imagine I want to Create an issue, then do some edits and tweaks, and then finally I want to Offer it to the repo, and let them make a copy and take it from there. In other words, before I Offer an issue, it’s possible I separately Created it earlier. Correct?

Now we have a little problem: When you Create an issue, it automatically implies to the repo that they should list your issue. Maybe I don’t want that; maybe I want to make some edits, and later I’ll Offer them the issue and not have to host it or worry about it ever again. This problem doesn’t exist on the Fediverse as far as I know, because once you publish a toot, it’s clear that it’s desired and safe to list it under the toot to which it replies (if there’s such a toot). When you Create a toot, you basically report its existence to the world. But here, since we have 2 flows, it’s not clear whether Creating an issue means you’re reporting a bug to the repo, or you’re just publishing some piece of text that you’ll want to edit and send later.

What do we do?

IDEA: When you Create an issue, even if you later want to edit it and Offer it, it’s okay if the repo already lists your issue under the repo’s list of issues. There’s no harm in that. Later, when you Offer the issue, repo can say “oh we already list your issue but now you want us to host it” and repo makes a copy and lists the copy instead of the original issue you’ve been hosting (but their copy may still link to your issue, so author authenticity proof is preserved etc.).
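A sketch of that Offer (made-up URLs again), pointing at the already-Created issue by ID, with the repo as target:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://alice.example/activities/3",
  "type": "Offer",
  "actor": "https://alice.example/alice",
  "to": ["https://dev.example/repos/treesim"],
  "object": "https://alice.example/issues/42",
  "target": "https://dev.example/repos/treesim"
}
```

On Accept, the repo fetches the issue, makes its copy, and swaps the copy’s ID into its list in place of the original.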

Thoughts? Objections?

(2)

Now there’s another little question. Someone Created an issue, but repo isn’t listing it for some reason. Maybe delivery failed. Maybe issue was accidentally treated as spam and ignored. Or idk, any reason. But people see the Create activity and they’d like to make sure the repo knows about this issue and lists it. Perhaps it would be useful to allow an issue to be reported even after its creation, e.g. using an Announce activity?

According to AS2 vocab spec, Announce can have a target which is the entity to whose attention you’d like to bring the object. So an Announce with target being the repo (or even without target, I suppose, simply detecting that the Announce object is an issue on the repo) would be treated as a way to report an issue, and the repo would list the issue on its list.

Supporting that also means you can Create an issue privately, discuss and write it gradually, maybe even collaboratively with other people etc., and then make it public and Announce it when you want to.
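A sketch of such a report-by-Announce (made-up URLs; the target usage follows the AS2 wording above, while treating the Announce as “please list this issue” would be our own ForgeFed rule):

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://bob.example/activities/9",
  "type": "Announce",
  "actor": "https://bob.example/bob",
  "to": ["https://dev.example/repos/treesim"],
  "object": "https://alice.example/issues/42",
  "target": "https://dev.example/repos/treesim"
}
```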

Thoughts? Objections?

(3)

Another point: When you Create a Note as a reply to another toot, and that toot lists yours as a reply, you don’t get an Accept. I guess there’s no need for that? I guess in your client UI you’re redirected back to the toot on which you replied, and you can see your reply there. So, with issues, when you Create an issue and the repo lists it, is there a need to send an Accept to let you know your issue got listed?

Unlike with toots and replies, for issues perhaps some person/project would like to manually review issues before listing them, and also, if you post some critical bug, you’d want to be confident that it’s listed and everyone can see it. There’s no harm in sending an Accept. I think it would be nice and useful. But in the spec, should we require that an Accept is sent once the issue is listed?

PROPOSAL: Let’s say in the spec that the repo’s server SHOULD send the Accept, since that’s also the statement for Accept on a Follow in the AP spec:

(From https://www.w3.org/TR/activitypub/#follow-activity-inbox)

Follow Activity

The side effect of receiving this in an inbox is that the server SHOULD generate either an Accept or Reject activity with the Follow as the object and deliver it to the actor of the Follow. The Accept or Reject MAY be generated automatically, or MAY be the result of user input

So we’d say the repo’s side SHOULD send an Accept if it lists the issue, and SHOULD send a Reject if it sees the issue but decides not to list it. We could also add those parts about manual review and “MAY” stuff about not sending a Reject to protect privacy and so on.
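Concretely, the Accept could be a sketch like this (made-up URLs), with the original Create as its object, mirroring the Follow/Accept pattern quoted above:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://dev.example/activities/50",
  "type": "Accept",
  "actor": "https://dev.example/repos/treesim",
  "to": ["https://alice.example/alice"],
  "object": "https://alice.example/activities/2"
}
```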

Thoughts? Objections?

(4)

Hmm, there’s also the question of how the issue author would know which issue edits/updates to apply and which to reject. They’d get some Update with an OCAP attached, and would need to figure out whether the OCAP gives the actor of the Update permission to perform it.

PROPOSAL: We don’t have any OCAP stuff in the spec and we haven’t made any decisions on OCAP usage. Even if it turns out to be complicated for the author to use OCAPs, we could decide to rely on the repo actor sending an Accept or an Announce on approved edits, which would make it very easy for the issue author to know which edits to trust. The repo actor would essentially do the OCAP verification for the issue author, and just give them the green light to apply the edit. So I’m proposing to leave this question undecided for now: to have this proposal documented, but finalize the decision only after more research into OCAPs (for example, if OCAPs are publicly verifiable and the role/permission system has a clear spec, then there’s no need for those Accept/Announce tricks).
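To illustrate the green-light idea, a sketch under the assumption that the repo actor Accepts the third party’s Update activity (all URLs made up): when the issue author sees this Accept from the repo actor, they apply the referenced edit to their copy:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://dev.example/activities/51",
  "type": "Accept",
  "actor": "https://dev.example/repos/treesim",
  "to": [
    "https://alice.example/alice",
    "https://dev.example/repos/treesim/followers"
  ],
  "object": "https://carol.example/activities/14"
}
```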

Thoughts? Objections?

(5)

And one last point. I propose the following rule: When you Offer an issue, that means you want the repo to host it. If they Accept it, that means they host a copy, and the result of the Accept is the ID URL of that newly created copy. When you Create an issue, that means you’re okay with hosting it, maybe you even want to host it. The repo’s server gets to choose whether it lists the ID URL of your issue, or makes a whole copy. We will recommend that by default the repo’s server should just list your issue and let you host it, but it’s also possible it instead makes a copy, there’s no guarantee. We could explain in the spec in which cases it would be desirable to make a copy automatically, and suggest that people don’t implement stuff like that unless there’s a real reason (e.g. to closely protect some critical security issue, or manage a to-do list, etc.), and remind them that they can always make a copy manually/later if something isn’t right.

So, when you Offer an issue and the repo wants to list it, it MUST make a copy. When you Create an issue and the repo wants it, it SHOULD just list the ID URL and let you host the issue, but it MAY decide to make a copy. Then comes text recommending not to copy, and explaining/giving examples of the cases in which a copy would probably be the desirable option.

Hmm but maybe “repo MUST make a copy” is too harsh? Perhaps just SHOULD make a copy? What does it mean, if you Offer an issue but repo lists the ID URL instead of making its own copy? Perhaps we should allow repo servers to never host their issues? That way simple repo-only servers don’t have to handle issue stuff, they always want the issue author to host it. So, upon Offer, repo SHOULD host a copy, and upon Create, repo should just list the ID URL, but neither is guaranteed. However, if repo sends an Accept, it MUST do one of those two things, and which one happened can be detected: If the Accept has a result field, then a copy was created; if not, just the ID URL got listed. Sounds better? :slight_smile:
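A sketch of the detection rule (made-up URLs): this Accept of an Offer carries a result, so the author knows a copy was created at that URL; an Accept without result would mean the original ID just got listed:

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://dev.example/activities/52",
  "type": "Accept",
  "actor": "https://dev.example/repos/treesim",
  "to": ["https://alice.example/alice"],
  "object": "https://alice.example/activities/3",
  "result": "https://dev.example/repos/treesim/issues/17"
}
```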

Thoughts? Objections?

It might be a flexible choice, but in my opinion this flexibility is effectively resolved by the requirement for ForgeFed to be easily integrable into existing forges.
This requirement effectively dictates that ForgeFed be as conservative as possible relative to existing forges (not relative to AP) while still providing basic federation functions (regarding issues) such as:

  • Opening an issue without a local account
  • Commenting on an issue without a local account
  • [Optional] Editing an issue without a local account
  • Closing an issue without a local account

Remote editing is optional because one can add/change info IRC-style with subsequent comments.

The requirement to support existing forges doesn’t dictate much, since the existing forges aren’t federating; they’d need to do some non-trivial dev work no matter what. Remote editing isn’t optional, because object updates aren’t just updates of text fields, they’re updates of object fields in general. An Update could do stuff like close/reopen an issue. Set its milestone. Set the assigned person. Modify the issue labels. A very very basic minimal spec could do without those things, but obviously we want to have them.
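For example, closing an issue could be a plain Update on a non-text field, roughly like this sketch (made-up URLs; isResolved is the kind of draft ForgeFed property I have in mind, not finalized vocabulary):

```json
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://forgefed.org/ns"
  ],
  "id": "https://dev.example/activities/53",
  "type": "Update",
  "actor": "https://dev.example/repos/treesim",
  "to": [
    "https://alice.example/alice",
    "https://dev.example/repos/treesim/followers"
  ],
  "object": {
    "id": "https://alice.example/issues/42",
    "type": "Ticket",
    "isResolved": true
  }
}
```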

Anyway, the discussion now is on the new proposal to support both flows. So we won’t need to make a compromise. Take the best of both worlds.

I think it’s an unnecessary question, because AFAICT we’re not trying to decentralize git, we’re trying to decentralize the social interactions on top, the collaboration level. Sure, it’s possible there are different levels on which ForgeFed can tackle things. I don’t think ForgeFed should try to replace a VCS, because that would be too much and unnecessary. Git is already decentralized - clone it and you own a copy! But the social interactions (merge requests, issues, discussion, likes, follows, etc.) are on centralized silos. That is what we need to allow to happen across many instances.

A specification cannot ever make it impossible for bad implementations to ruin the day :wink: A federation specification only works as well as the implementations that implement it. You can’t force any server to be a good actor.

No, not always. Repositories are often used as feedback channels or even for discussion boards - think of the original GitHub forgefed repo for example.

The fediverse is much more varied than Mastodon and microblogging - so I’d avoid using the word “toot” since it implies “something under 500 characters”. Notes are also longform, for example blog posts.

Yay! :+1: from me!

Replies to technical points:

(1)

I think you’re basically talking about a draft mode, right? Easy - your server should not send it to any remote servers (or even other local users) before you’re ready to send it out. This is a platform implementation concern, not an ActivityPub concern; even C2S doesn’t recognize drafts, so it needs to be implemented at the platform level.

(2)

I’ve never seen an Announce used on a Create, but I would certainly expect people to Announce an Issue - it’s basically sharing it with their followers. I’m not sure I understand the problem you describe. The normal way to deal with network problems IMHO is to implement retries, or to pull the object once it’s discovered (after being Announced by someone, for example).

Possibly your issue is with C2S while I’m thinking only of S2S, since I’ve never seen C2S in the wild.

(3)

I would be ok with indicating that servers SHOULD Accept an Issue that is created, though I would say only the server that is home to the repository where the issue is created should do so? Other servers might also have the Create { Issue } delivered to them.

Sounds good.

(4)

I agree on the Accept, but I’m not sure why Announce would be used? When delivering updates to other servers, one would just use inbox forwarding, right?

(5)

I’d encourage not encouraging copies, of anything. I thought the Offer was sent without an ID, and the repo owners then send the ID in a Create to their followers?

(1)

Actually no, I didn’t mean draft mode. I agree with you on draft mode: no sending to other servers until you’re done writing the draft :slight_smile: I mean the case where an object gets Created for some reason before it’s ready to be Offered to the repo server. I don’t have a really good example of that; I guess it wouldn’t happen often. Hmmm, some examples I can think of: Discourse stores drafts on the server AFAIK. What if you Created such a draft and one way or another it ended up reaching the repo server? Another example: some other kind of object, not an issue. Say, a patch that gets Created and discussed. Obviously, there’s no need to address the repo actor before your issue/patch/etc. is ready, so these examples are weird.

DECISION: When repo server sees a Create {Ticket} referring to the repo, it can just proceed to add the issue’s ID URL to the list. If you write a draft, you probably don’t send it to repo until you’re done anyway :slight_smile: If any problems arise in implementations, we’ll consider adding some hint to let the repo server know it shouldn’t make a copy, or something like that. Maybe specify target in the Create. For now, no need.

(2)

What I mean is that if Create is the only way to open an issue, then once an issue got Created, there’s no way to bring it to the repo’s attention again. If for any reason an issue got Created but not listed by the repo - for example someone accidentally pressed a Reject button, or the issue got automatically flagged as spam and ignored, or the author/client didn’t list the repo actor in the to field - then it’s now impossible to report the issue again. So I’m suggesting that Announcing an issue to the repo actor is an additional way to report it, i.e. if the repo actor sees an Announce of a Create{Ticket}, it proceeds to list the issue. That way, any mistake/bug/whatever when Creating the issue doesn’t prevent reporting it again. It’s just a little harmless thing.

DECISION: For now, for spec simplicity, no need for this at this point. I’ll open an issue about this and review it again in the future.

(3)

Agreed, only the server hosting the repo SHOULD send the Accept.

DECISION: As described in my earlier post, where “server” is the server hosting the repo / issue tracker.

(4)

Oh, I mentioned both Accept and Announce as potential activity types to use for this. I mean, basically the purpose of the activity would be repo server saying “I approve this edit”. I suppose Accept makes more sense for this than Announce does.

DECISION: Repo actor sends Accept to indicate access control green light. For now, omit from spec if not strictly needed for other stuff. But when we deal with OCAPs, this Accept is a simple way to authorize edits.

(5)

The Offer can be sent with or without an ID, but either way behavior is the same: Repo server “SHOULD” make a copy.

Hmmm idk about encouraging not to make copies, worried about not having an answer for those cases where reliable control of the content is important.

DECISION: For now, for simplicity, the Offer should contain the whole issue object, and the issue object shouldn’t have an ID. In the future, for completeness and/or if stuff arises in implementations, we can add the additional case where the Offer merely specifies an issue ID URL, in which case the repo server HTTP GETs that URL and copies the issue from there. As to encouraging copies: for now, write in the spec (or elsewhere) that the recommended behavior is to list the ID and not make copies, and link from the spec/note/wherever to this thread or to an issue where the potential cases for a copy would be listed. That way we can “test the Create flow in the wild”, and add recommendations for the Offer flow if challenges arise. Servers that support only the Create flow could probably just Reject when they get an Offer. We can add some smarter indication later if needed.
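So the default Offer would look roughly like this sketch (made-up URLs; note the embedded Ticket has no id, since the repo server mints one when it makes its copy and returns it in the Accept’s result):

```json
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://forgefed.org/ns"
  ],
  "id": "https://alice.example/activities/4",
  "type": "Offer",
  "actor": "https://alice.example/alice",
  "to": ["https://dev.example/repos/treesim"],
  "target": "https://dev.example/repos/treesim",
  "object": {
    "type": "Ticket",
    "attributedTo": "https://alice.example/alice",
    "summary": "Trees disappear when zooming out",
    "content": "Steps to reproduce: zoom out to 10% and all the trees vanish."
  }
}
```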

Ok, done :slight_smile:

I’m marking this thread as solved. If anyone has more comments/objections/questions, write them below. There will of course be a review round when all this stuff is documented in the spec. Big thanks to everyone who read and wrote stuff here! Especially @jaywink :slight_smile:


Definitely more focus needs to be put on the client-to-server API… as well as more documentation, so that ActivityPub people don’t have to scramble to find development avenues.