@criztovyl's ForgeFed web log - TFO Edition

This is the continuation of my ForgeFed web log from socialhub.

A friend and I are currently thinking about whether I could maybe write my bachelor thesis on federating software development collaboration platforms (i.e. ForgeFed). I am struggling a little bit with the academic PoV here (actually since FOSDEM 19), but let’s see. :slight_smile:
Also, I might attend a small (private context) hackathon soon, and I think I might try to hack on something for ForgeFed there. Maybe a little Rails app PoC for repository issue federation.

i suspect that struggle is a consequence of over-emphasizing the so-called “social” aspects of software development, casting it as intrinsically web-centric, and devaluing the practical merits of federation, as i often have seen people do

github, like sourceforge and others before it, are not collaboration platforms - they are code hosting and project management services - the fact that they are network services and not real toolkits or platforms is an important distinction on its own; but not germane to the clarification id like to make

more specifically, collaboration is just one aspect of project management; and one that is neither intrinsic nor peculiar to it - a software project with a single developer/maintainer may not desire collaboration of any kind; but project management is nonetheless essential, even if that is in the form of scribbled notes on napkins - for projects that desire collaboration, email is fully adequate for that purpose; and has served developers well since many years before integrated project management websites existed, and continues to be widely used today - when git came along, it was designed to integrate naturally with the well-established patches-via-email workflow - no website is or was ever needed to allow convenient, globally-accessible collaboration - the internet alone facilitates that

sourceforge for example, served the purpose of web-based project management and code hosting for years before github; and collaboration was not, and still is not among the explicit features of that website - that was not an oversight; but something that was simply not required, because most projects preferred to handle patches manually, and to discuss them on a mailing list (which sourceforge provides as a project management essential; but github does not) - sourceforge also provides issue tracking which is a concern that is quite distinct from code review - github provides only an issue tracker; and re-used that component to support code review - that has led people to conflate the two concerns, and often (mis)use the issue tracker as a substitute for a proper discussion forum or mailing list

the only new features that github added to that genre of website were the web-based, collaborative features such as pull-requests and an intentionally superficial messaging mechanism (rather than allowing users to communicate directly or privately), along with non-collaborative gamified “social” features such as “stars”; but that is far from sufficient to cast it as a collaborative platform - in contrast to “stars”, sourceforge has user reviews, which, just as a proper discussion forum, require participants to be thoughtful about their opinions and endorsements, rather than simply “drive-by” clicking a “+1” button with their mouse - sourceforge also allows users to communicate directly and privately - when seen in such a light, the practical value of those github-style appendages/constraints to software development appears to be more dubious

probably, the biggest reason for the popularity of the github-style workflow is that it made proposing patches and code review simpler for less-experienced programmers who were not so comfortable on the command line, or with the existing patching and code review tools; but it introduced the unfortunate side-effect of centralizing the collaboration workflow and legitimizing the reliance on third-party “walled-garden” infrastructure

one could argue that it contributed greatly to the explosion of the amount of free software available; but one could as well argue that it lowered the average software quality, and enticed people to treat software development as a fun social game among pals, rather than a skillful craft among a professionally-minded team of competent artisans, who treat their work more seriously than a friendly game of darts, or a competition to collect a fan-base of “followers”

all that has its historical value and social value; but has little technical value, such as allowing people to contribute who could not have done so otherwise, or increasing code-quality - probably, that is what obscures the technical merit of forge-fed, making it difficult to see anything academic about it - the “Motivation” section of the forge-fed documentation highlights its academic merits - in short, that git was designed to be de-centralized and to support self-sufficient, non-hierarchical project management, without impeding hierarchical management structures or constraining infrastructure options - later on, commercial websites centralized project management into the hierarchical “hub” paradigm, with a high degree of vendor lock-in - now forge-fed aims to restore de-centralized, non-hierarchical, self-sufficient project management and collaboration, while retaining all the webby bells and whistles that, for many, have invaded and over-shadowed the essentials of project management, and allowing migration of data to other compatible services

BTW, i started on some forge-fed boilerplate ruby code if you would like to add to it:

src/python/ruby made me laugh. ^^

wasnt intentional - that ‘python’ dir started as a separate git repo - i have not moved the ruby script only because ive not worked on it since before i merged it into forge-fed

I will write some comments, responses, thoughts to your longer reply; I just haven’t taken the time yet. :slight_smile:

A thing that comes to my mind when comparing mail to other forms of communication is that non-linear threads are a feature you do not have in other (web) communication; there, communication is mostly linearized. In linear(ized) communication it is, at least to me, harder to follow a sub-topic in a thread, for example. But that might come with the list+"pre"view structure most mail clients have.

I would boil this down to “quantity is not quality”. Sloppily said: “There is more, but is it any good? And even if it’s good, is it due to the lowered bar of entry?” [citation needed]

For the rest: I read it with interest but cannot type any useful response right now. But I guess my response is not required, just interesting. ^^

Before I spend two more days polishing this post for not that much more value, I’ll let it be as it is, at least for the moment. :slight_smile:

I keep spending time thinking about the challenge of adding ActivityPub to existing software.
Basically, what keeps going around in my head is the challenge of making stuff asynchronous. Like, I comment on an issue and now the forge has to federate it to the subscribers: doing this synchronously does not make much sense to me.

Looking into GitLab, there is no problem with asynchronicity; they already have a background processor.
During my quick look around in Gitea/Gogs I saw no background processor there. But Go has goroutines, so a real background worker might not be required.

Yep it doesn’t. It all starts falling apart once you need to send to hundreds or even thousands of remote servers. ActivityPub especially is pretty verbose as a protocol.

Background jobs using a queue system is the way to go :+1:
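
To make the queue idea a bit more concrete, here is a minimal sketch using only the Python standard library. It is not how any particular forge does it: the `deliver` function is a stand-in for the signed HTTP POST to a remote inbox, the inbox URLs are made up, and a real deployment would more likely reuse the forge's existing job system with persistence and retries.

```python
import queue
import threading

# Hypothetical in-process delivery queue (an assumption for illustration).
delivery_queue = queue.Queue()
delivered = []  # stands in for "the POST to the remote inbox succeeded"


def deliver(inbox_url, activity):
    """Placeholder for the signed HTTP POST to a remote inbox."""
    delivered.append((inbox_url, activity["type"]))


def worker():
    """Take jobs off the queue until the None sentinel arrives."""
    while True:
        job = delivery_queue.get()
        if job is None:
            delivery_queue.task_done()
            break
        inbox_url, activity = job
        try:
            deliver(inbox_url, activity)
        finally:
            delivery_queue.task_done()


def federate(activity, subscriber_inboxes):
    """Called from the request handler; only enqueues, so it returns at once."""
    for inbox_url in subscriber_inboxes:
        delivery_queue.put((inbox_url, activity))


thread = threading.Thread(target=worker, daemon=True)
thread.start()

federate(
    {"type": "Create", "object": {"type": "Note", "content": "LGTM"}},
    ["https://forge-a.example/inbox", "https://forge-b.example/inbox"],
)

delivery_queue.join()     # only for this demo; a web request would not wait
delivery_queue.put(None)  # shut the worker down
thread.join()
```

The point is only that the comment request finishes after `federate` has enqueued the jobs; how long the hundreds or thousands of deliveries take afterwards no longer matters to the person commenting.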

Regarding the problem of “Remote Users” (@fr33domlover) , I think the term “Users” is misleading here. I would rather stick with the AP term “Actor” (w/o “Remote”).

To me, a User’s most important property is the authentication information (for example a password hash); it tells the application: Everyone who can authenticate is allowed to use the given identity (for example by knowing the password).
An Actor on the other hand just describes an identity; an identity a User is allowed to use.

For example, imagine an issue discussion.

When thinking about the author of a comment, a status change and the author of the issue itself, the identity is important, not what is required to control it. In this context the author can be represented by an Actor.

The User only comes into play when trying to add a comment to the discussion (but it is not required, for example when using C2S [1]): There you first have to prove that you are worthy of controlling the identity (i.e. authenticate yourself). Afterwards the only important thing is which identity you now control.

So, my thoughts are to move everything that is Actor related to its own table (identity information like [preferred] username, AP ID, profile picture, …) and only keep properties that are really required for Users in the corresponding table (authentication information).
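
A rough sketch of what that split could look like, using an in-memory SQLite database. All table and column names here are assumptions for illustration and are not taken from any forge's real schema; the idea is just that comments reference `actors`, while `users` holds nothing but the authentication side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (
    id                 INTEGER PRIMARY KEY,
    iri                TEXT NOT NULL UNIQUE,   -- the AP ID
    preferred_username TEXT,
    avatar_url         TEXT
);
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    actor_id      INTEGER NOT NULL UNIQUE REFERENCES actors(id),
    password_hash TEXT NOT NULL                -- authentication only
);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES actors(id),
    body      TEXT NOT NULL
);
""")

# A local identity: an actor plus a user that may control it.
local = conn.execute(
    "INSERT INTO actors (iri, preferred_username) VALUES (?, ?)",
    ("https://forge.example/users/alice", "alice"),
).lastrowid
conn.execute(
    "INSERT INTO users (actor_id, password_hash) VALUES (?, ?)",
    (local, "not-a-real-hash"),
)

# An identity from another server: an actor only, nobody can log in as it here.
remote = conn.execute(
    "INSERT INTO actors (iri, preferred_username) VALUES (?, ?)",
    ("https://criztovyl.space/actor", "criztovyl"),
).lastrowid

for author, body in ((local, "I can reproduce this."), (remote, "Patch incoming.")):
    conn.execute("INSERT INTO comments (author_id, body) VALUES (?, ?)", (author, body))

# Authorship is resolved through actors; users only matter for "can log in".
rows = conn.execute(
    "SELECT a.iri, u.id IS NOT NULL "
    "FROM comments c "
    "JOIN actors a ON a.id = c.author_id "
    "LEFT JOIN users u ON u.actor_id = a.id "
    "ORDER BY c.id"
).fetchall()
```

With this layout the issue discussion never needs to know whether an author is local or not; that question only comes up at login time.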

The UX challenge here is then how to present Actors in the UI. I think of something like:

@criztovyl https://criztovyl.space/actor

One challenge there is handling preferredUsername collisions. Especially malicious ones.
Like this (very badly constructed):
“@f00user https://gitlab.com/f00user” vs. “@f00user https://gillab.com/f00user”
But maybe add some kind of short, 4 char thingy to the username, from some kind of hash (sha256 in this case)?
“@f00user 112c https://gitlab.com/f00user” vs. “@f00user 7999 https://gillab.com/f00user”

So, this then?

@criztovyl b1ee https://criztovyl.space/actor

But that’s only 16 bits of the 256 bits; with 4 RGB colors one could maybe get more?
Looks okay directly below each other, no idea if it helps if it’s not directly below each other.
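
For anyone who wants to play with the idea, a minimal sketch of the short tag. It assumes the tag is the first hex characters of the SHA-256 of the actor IRI; the post above does not pin down the exact input, so the output will not necessarily match the example tags shown earlier. `short_tag` and `display_name` are made-up helper names.

```python
import hashlib


def short_tag(actor_iri: str, length: int = 4) -> str:
    """First `length` hex characters of the SHA-256 of the actor IRI."""
    digest = hashlib.sha256(actor_iri.encode("utf-8")).hexdigest()
    return digest[:length]


def display_name(preferred_username: str, actor_iri: str) -> str:
    """Username, short tag and IRI, in the layout sketched above."""
    return f"@{preferred_username} {short_tag(actor_iri)} {actor_iri}"


for iri in ("https://gitlab.com/f00user", "https://gillab.com/f00user"):
    print(display_name("f00user", iri))
```

Four hex characters are only 16 bits, so somebody malicious could simply try look-alike URLs until the tag matches. A longer tag (the `length` parameter) makes that more expensive, at the cost of being harder to compare by eye.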

[Screenshot from 2019-06-30 15-20-05]
[Screenshot from 2019-06-30 15-26-03]
[Screenshot from 2019-06-30 15-36-06]
I like the •'s
[Screenshot from 2019-06-30 15-28-24]

Standalone Gitlab • Example (faked via Developer Tools):


[1] In case of C2S you only need control over the identity’s private key for signing your request.

So, here are the current challenges for building ForgeFed into existing software as I see them at the moment, boiled down to two bullet points:

  • asynchronous processing of events (Federating events)
  • separate User and Actor entities

The last point, separation of Users and Actors, would for example help prevent the problems described by @mikaela, even if in a completely different context ^^

@criztovyl, FYI I added federation to an existing web application (that I wrote). So if you’re wondering what the challenging points are, I may have hints :slight_smile:

I don’t think async processing is a challenge; I coded my own async delivery mechanism. And if instead you use an existing one, as @jaywink suggests, then it’s even easier.

Handling of users and actors too; you can just do the plain regular Fediverse stuff: Publish actor documents, publish public keys for HTTP Signatures. Nothing fancy. And you don’t need to care at all about preferredUsername. I think it’s there only because of WebFinger. You identify remote users by their ID URI, not by their username.
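
As an illustration of that "plain regular Fediverse stuff", here is a minimal sketch of an actor document with a public key attached. It assumes the `publicKey` layout that is commonly published alongside the security vocabulary context; the `/inbox` and `/outbox` paths, the `#main-key` fragment and the PEM placeholder are assumptions for the example, not something taken from Vervis or any other forge.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
SECURITY_CONTEXT = "https://w3id.org/security/v1"


def actor_document(actor_iri, preferred_username, public_key_pem):
    """Build the JSON document served at the actor's ID URI."""
    return {
        "@context": [AS_CONTEXT, SECURITY_CONTEXT],
        "id": actor_iri,                       # this is what identifies the actor
        "type": "Person",
        "preferredUsername": preferred_username,  # optional, display only
        "inbox": f"{actor_iri}/inbox",         # assumed URL layout
        "outbox": f"{actor_iri}/outbox",       # assumed URL layout
        "publicKey": {
            "id": f"{actor_iri}#main-key",
            "owner": actor_iri,
            "publicKeyPem": public_key_pem,
        },
    }


# Placeholder only; a real document carries the actor's actual public key.
PLACEHOLDER_PEM = (
    "-----BEGIN PUBLIC KEY-----\n"
    "(placeholder, not a real key)\n"
    "-----END PUBLIC KEY-----\n"
)

doc = actor_document("https://criztovyl.space/actor", "criztovyl", PLACEHOLDER_PEM)
print(json.dumps(doc, indent=2))
```

Remote servers would look up the actor by `id` and verify signatures against `publicKey`; `preferredUsername` plays no part in that.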

I posted a little while ago a post on the Gitea forum with an idea for how to implement federation. It’s based on my experience implementing federation in Vervis. It’s very very general, but take a look :slight_smile:

It is a challenge in the sense of not being part of the already present forge code. True, not a big challenge, but still a (small) (technical) problem to solve, something to have thought of, to be aware of. Adding AP is more than just adding a new REST endpoint.

For technical handling, yes, I don’t need to care, true.
But for UI/UX I think it is an (important) point to keep in mind. In the end UIs need to find something between plain usernames and full URIs/IRIs.
Either one stand-alone will lead to problems: Usernames are not “secure”, it would be too easy to impersonate somebody maliciously. Full IRIs on the other hand are not suitable for everyone and are not particularly nice-looking either. :smiley:

generally speaking, this seems like another example as i discussed with fr33domlover a few days ago, of how the language of AP confuses the thinking about integrating it into existing applications - the job of the forge-fed layer is to abstract away the quirks of AP so that the forge can behave as normally as it already does - for that reason, it would be far more helpful to think about it in terms of how to interpret AP messages into something that forges already do, rather than how to adapt the behavior of forges to conform to the AP paradigm - trying to change the way that forges behave would only add friction towards wide adoption

concretely, AP actors are not something that the forge needs to treat as first class - from the forge perspective, the relevant entities that need to be represented as AP actors are users, teams, repos, and tickets - all of these already exist in the forge database with the appropriate separation - an “AP actor”, from the forge perspective, equates to “anything that publishes a feed of events to which other users can subscribe, or that accepts analogous incoming events” - those are most likely the same set of forge entities: users, teams, repos, and tickets - for AP purposes, the forge would only need to expose an actor document for each of those entities, that are to be translated into AP actors at the forge-fed boundary - everything else is a matter of translating the standard forge events into AP messages that will be translated into the standard forge events on the remote forge, and vice-versa - its most likely that the forge-fed layer would require no storage or memory (other than the local actor documents) - ideally, the only intrusion into the forge database would be the HTTP signing keys - additionally, a new optional database field, indicating that foreign users are distinct from local login users, and an optional field to hold user’s GPG keys

one key implementation detail that was not noted here is that forges would need to add foreign users to their database, probably as a limited, non-login, phantom user - once you have that, the rest falls into place from the forge perspective
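
a minimal sketch of the phantom-user idea, assuming a single overloaded users table - the column names (`is_foreign`, `actor_iri`) and the helper names are hypothetical, chosen only for this example, and do not come from any real forge

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    username      TEXT NOT NULL,
    password_hash TEXT,                        -- NULL for phantom users
    is_foreign    INTEGER NOT NULL DEFAULT 0,  -- the new optional flag
    actor_iri     TEXT UNIQUE                  -- where the remote actor lives
)
""")


def find_or_create_phantom(conn, actor_iri, username):
    """Return the id of the phantom user for a remote actor, creating it once."""
    row = conn.execute(
        "SELECT id FROM users WHERE actor_iri = ?", (actor_iri,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO users (username, is_foreign, actor_iri) VALUES (?, 1, ?)",
        (username, actor_iri),
    )
    return cur.lastrowid


def can_log_in(conn, user_id):
    """Phantom users have no credentials, so they can never log in."""
    row = conn.execute(
        "SELECT password_hash, is_foreign FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row is not None and row[0] is not None and not row[1]


first = find_or_create_phantom(conn, "https://criztovyl.space/actor", "criztovyl")
second = find_or_create_phantom(conn, "https://criztovyl.space/actor", "criztovyl")
```

every incoming event would go through something like `find_or_create_phantom` first - after that, the rest of the forge can treat the author like any other row in the users table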

probably, most forges already post events asynchronously, such as webhooks and email notifications - there is no new magic in AP - it is just a messaging protocol - asynchronous messaging is common-place in large systems - it is most likely that all of the necessary parts or dependencies will already be present, except for HTTP signatures

the usernames are in the URI - that could be easily parsed from the URI to be correlated to a phantom user in the DB (or create a new one); as long as we mandate a conventional format for globally unique user-ids, such as ‘criztovyl@criztovyl.net’; and that the actor document be located in a directory named with the globally unique UID - IIRC the actor document is present in every AP message - that user’s URI would look something like: https://criztovyl.net/u/criztovyl@criztovyl.net/actor.json - globally unique user-ids are essential; and that is exactly what should be shown in the GUI to represent that user
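
a minimal sketch of that parsing step, assuming exactly the `/u/<user@host>/actor.json` layout proposed above - the layout is a proposal from this thread, not an existing convention, and the extra check that the host part of the UID matches the host serving the document is an assumption added for the example

```python
from urllib.parse import unquote, urlsplit


def uid_from_actor_uri(uri):
    """Return 'user@host' if the URI follows the proposed layout, else None."""
    parts = urlsplit(uri)
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) != 3 or segments[0] != "u" or segments[2] != "actor.json":
        return None
    uid = segments[1]
    name, sep, host = uid.rpartition("@")
    # Reject UIDs that claim a different host than the one serving the document.
    if not sep or not name or host.lower() != parts.hostname:
        return None
    return uid


print(uid_from_actor_uri("https://criztovyl.net/u/criztovyl@criztovyl.net/actor.json"))
print(uid_from_actor_uri("https://criztovyl.space/actor"))
```

any actor URI that does not follow the layout simply yields `None`, so the forge would need some other way to name such a user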

i dont remember this being proposed before; but imagine supporting an “@mention” feature - it would be absolutely necessary to mention another remote user as @criztovyl@criztovyl.net; because the local forge may have its own local user with the user-id criztovyl, who can be pinged as @criztovyl - impersonation is not possible; because of the signatures on the messages

This would not be compatible with any platform that doesn’t include usernames in the URI. Using a UUID is very common out there since not all platforms want to lock the username like for example Mastodon does. Of course anyone who wants to federate with Mastodon probably has to :slight_smile: But anyway, if folks want to expose a username I would strongly suggest doing it the way it is currently being done on many platforms, i.e. adding a preferredUsername to the Actor document.

The UX is a real issue in AP land, which is why preferredUsername is a popular thing. It’s much easier to expose that to end users than a URL ID.

Again, repeating some earlier concerns that ForgeFed tries to take on both forge related extensions and “fixing AP”. The username and UX issues are a part of the latter, nothing to do with forges. I’d suggest bundling username related recommendations concerning UX to a separate document outside ForgeFed extensions.

I don’t think it’s AP’s language that confuses here.
My main issue was plainly the, in my opinion, imperfect way to represent AP Actors as Remote Users.
What I surely did do, was mess around with, well, wrong words.

Just replace “Actor” (AP) with “Identity” and the AP-confusion is gone:

Let me repeat here: My concern was not about Users and Actors specifically but Users and Identities generally, although I messed up the wording pretty badly.

Anyway, let’s continue.

Please note that my concern is about directly adding to the core without an additional layer.

User-vs-Identity is not an AP paradigm but a Federation paradigm. And the forge will have to adjust to it in one way or another.
My last post just criticized the “Remote User” way and described an “Identity” way. (And includes some thoughts on how to UX/UI-wise represent Identities (which can be applied to Remote Users too))

Yeah, I guess I am over-concerned with overloading the users table / entity.
When overloading, the essential fields I would add would just be, expressed in my user-vs-identity thinking, an “identity-only” boolean field, the identity IRI and identity public key.

I think adding GPG keys should not require an additional field - forges that support keys will already have a relation for that; and forges not supporting it yet will be, in my opinion, better off with implementing it as a relation than a field.

Also, which role do the GPG keys play in this context?
Signing keys - sure, for verifying the signature. The indication field - obviously.
But GPG keys? Sure, for verifying commits, but that does not fit into how-to-represent-users-that-are-not-from-this-server context. :smiley:

As @jaywink said:

IIRC preferredUsername is even part of the AP spec!

The question is: What is the user ID?
To me, the ID is the actor’s IRI. So, for my examples, https://criztovyl.space/actor. That is globally unique. That is what can be presented to the user.

But, from a UI/UX PoV, that is not a good way. And there we are at my dilemma of how to present the IRI ID to the user.
I don’t want to hide it. I just want to add something for the user to make the UX a little bit nicer. And just IRIs everywhere is simply not particularly nice IMHO [1].
If you prefer IRIs everywhere - fine, there should be an option to hide the additional information. In the end the user must be able to decide what (s)he wants. But any application I would distribute would, by default, always show additional info for the IRI.

This very much depends on how you implement embedding mentions into and extracting them from text.

Internally, I would always embed the mentioned user by its internal ID, i.e. when the user mentions the local @criztovyl, a mention is embedded based on the id of that user in the users table. If it’s the https://criztovyl.space/actor, the mention is embedded using the id of the corresponding user in the users table. Extraction and rendering for display simply work the other way around.

Externally (i.e. AP), I suppose, without specifically knowing it, the mention won’t be extracted from text. There would be a mentions property that contains all the mentioned users and their IRI IDs. The text is purely visual and will contain the mention as produced in html (or whatever format) by the originating forge.
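
A minimal sketch of both directions. The `[[mention:N]]` token format, the `USERS` lookup table and the `forge.example` host are made up for the example. The outgoing side assumes the ActivityStreams `tag` array with objects of type `Mention` (each carrying an `href`), which, as far as I can tell, is what the "mentions property" would correspond to in practice.

```python
import re

# Hypothetical users table: internal id -> (display handle, actor IRI).
USERS = {
    1: ("@criztovyl", "https://forge.example/users/criztovyl"),
    2: ("@criztovyl@criztovyl.space", "https://criztovyl.space/actor"),
}
HANDLE_TO_ID = {handle: uid for uid, (handle, _iri) in USERS.items()}

# @name or @name@host; dots and dashes are only allowed between word characters.
MENTION = re.compile(r"@\w+(?:[.-]\w+)*(?:@\w+(?:[.-]\w+)*)?")
TOKEN = re.compile(r"\[\[mention:(\d+)\]\]")


def embed(text):
    """Replace known handles with tokens that carry the internal user id."""
    def repl(match):
        uid = HANDLE_TO_ID.get(match.group(0))
        return "[[mention:%d]]" % uid if uid is not None else match.group(0)
    return MENTION.sub(repl, text)


def render(stored):
    """Turn stored tokens back into handles for display."""
    return TOKEN.sub(lambda m: USERS[int(m.group(1))][0], stored)


def ap_tags(stored):
    """Build the list that would go into the outgoing object's `tag` property."""
    return [
        {"type": "Mention", "href": USERS[int(i)][1], "name": USERS[int(i)][0]}
        for i in TOKEN.findall(stored)
    ]


stored = embed("cc @criztovyl and @criztovyl@criztovyl.space.")
print(stored)
print(render(stored))
print(ap_tags(stored))
```

Because the stored text only contains internal ids, renaming a user or changing how handles are displayed does not require touching old comments.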

In the end it’s implementation detail how the forges deal with displaying mentions and is nothing I would include in the ForgeFed spec. The spec should only need to care about the technical problems of, for example, how to deliver the mention event.

On a technical level – yes. In UI sense – no.
A human seeing two @criztovyls cannot easily determine which one is the @criztovyl (s)he’s thinking of just because the message is signed.
Yes, nobody forces you to display just the username, but then you are again back in the dilemma: Show the full, possibly very ugly, hard to read IRI only? Show the IRI additionally? How to help the user to easily make a distinction between similar IRIs? etc

Yes, this is surely not an issue the ForgeFed spec has to deal with; but something general in regards to AP/Federation.

And, as a side note: Although this thread has ForgeFed in its title, I am writing anything here that comes into my mind that is related in any way. ^^

Thank you all for reading & responding. :slight_smile:


[1] humble, not honest

True, but it is established in the community already, so inventing yet another way to determine a username for an Actor seems like a bad idea for interoperation. Who knows, preferredUsername could well be documented in the next iteration of AP, as it becomes an established, not required thing.

Why “True, but …”? ^^ Sorry for nit-picking. ^^
I am completely with you on what you say, but I don’t understand how it applies to the sentence you quoted. :slight_smile:

Well, it’s “true”, “but” there is an important factor to consider about real world adoption :wink: I can’t say preferredUsername is in the spec, but I don’t think it matters at this point.