Properties for commit message

How to specify the commit message in ActivityPub?

(1) Unified or split

Should the entire text, i.e. title and description, go into a single string, or should they be separate?

(Client UI would very likely want to be able to have them split, display one without the other, display them in different styles, etc.)

(2) Property for title

Which property should we use for the commit message’s title? There’s name, which is plain text, but is a commit title really the name of the commit? There’s also summary, which would be a good fit, but it’s HTML. That would be reasonable if we already use summary in other places, such as for Ticket titles, MR titles and so on.

(3) Property for description

If using separate properties for title and description, which property to use for the description?

If we use summary for title, then using plain-text for description would be weird. Should probably either use content or an embedded object with a content property, and source set to the exact original plain-text message.
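To make the split option concrete, here is a minimal Python sketch of what such an object could look like, assuming summary carries the title as HTML, content carries the description as HTML, and source keeps the exact original plain-text message. The "Commit" type name, the commit_to_object helper and the example URL are made up for illustration; none of this is decided.

```python
import html
from typing import Any, Dict


def commit_to_object(commit_id: str, title: str, description: str) -> Dict[str, Any]:
    """Build a hypothetical ActivityPub-style object for one commit."""
    # Keep the untouched plain-text message so nothing is lost to HTML escaping.
    original = title if not description else f"{title}\n\n{description}"
    # One <p> element per blank-line-separated paragraph of the description.
    paragraphs = [p for p in description.split("\n\n") if p.strip()]
    return {
        "type": "Commit",  # assumed type name, not an agreed vocabulary term
        "id": commit_id,
        "summary": html.escape(title),
        "content": "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs),
        "source": {"mediaType": "text/plain", "content": original},
    }


example = commit_to_object(
    "https://forge.example/repo/commits/abc123",  # placeholder URL
    "Fix <br> handling",
    "Escape tags.\n\nAdd tests.",
)
```

A client that only wants the short form can read summary alone; one that wants the exact text the author typed can read source.content.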

Another option is to use entirely new custom properties, but is there really a good reason to deviate from the standard properties?

Thoughts / ideas / proposals?

NOTE: This isn’t git specific, so when you think about this stuff, consider other VCSs too such as Mercurial, Darcs, Monotone, etc.

the question as posed, reads like another example of trying to
bend the existing functionality of forges to fit into the
activity-pub paradigm; which leads to the wrong picture of what
actually will happen, and the simplest thing that forge-fed needs
to do to allow that to happen in the forge’s normal peculiar way

fundamentally, a pull request is a ticket, like any other
ticket, except that it has an associated checkout-id or patch
data - so the same question applies to any ticket, not only to
those associated with VCS commits - if it has already been
decided how to represent a ticket in AP; then this question
already has the same answer - the only additional caveat is
how to convey the associated commit-id, branch, or patch data -
that was discussed on the mailing list, as i remember - IIRC it
was decided that there needs to be a field for the commit-id or
branch-name, and an optional second one for the raw patch data -
the forge could use the source repo URL to fetch the sources and
look for something (commit/branch) that matches the
commit-id/branch-name field - otherwise when the optional raw
patch data field is present, the commit-id/branch-name field
would be set to ‘raw’
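
as a rough sketch of that two-field idea in python - the class
name, field names and return strings are all hypothetical, only
the "ref or raw patch" rule comes from the description above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedChange:
    """Hypothetical fields a merge-request message might carry."""
    repo_url: str                # source repo to fetch from
    ref: str                     # commit-id or branch-name, or "raw"
    patch: Optional[str] = None  # optional raw patch data


def plan(change: ProposedChange) -> str:
    """Describe how a receiving forge would obtain the change."""
    if change.patch is not None:
        # Raw patch data present: the ref field is expected to say "raw".
        if change.ref != "raw":
            raise ValueError('ref must be "raw" when patch data is present')
        return "apply-patch"
    # Otherwise fetch the sources and look for a matching commit or branch.
    return f"fetch {change.ref} from {change.repo_url}"
```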

a git commit has no “title” nor “description”; but only a
commit-id and a commit message - the commit message is always of
the form:

FIRST_LINE
… empty line …
OPTIONAL_DETAILS

when you ask about the “commit message” as having a “title” and
a “description”, you are referring to the forge ticket data,
which is not necessarily related to anything in the VCS commit
or patch data - the “title” is usually taken initially from the
“FIRST_LINE” of the commit message; but it is not intrinsically
related to it - it is usually modifiable afterward, just as the
title of any other ticket - the “description” is a normal ticket
comment - it is completely unrelated to the VCS commit

also, if the PR ticket has an associated git commit, then the
“title” in the AP payload would most likely be ignored; because
the forge will probably create that itself from the “FIRST_LINE”
of the commit message - the “title” in the AP payload would most
likely be used only for raw patch data type PR tickets; so the
ticket “description” (which is really just the same as the
comment of a ticket) must be a distinct field - another way to
see that, is that most ticket comments (other than the OP) would
have no title

I’d like to remind you that one of the ForgeFed features discussed many times is compatibility with the Fediverse. We’re using ActivityPub on purpose, not just a custom JSON format. So, yeah, I’m trying to figure out the best balance between the ActivityPub vocabulary and the needs of ForgeFed.

This question is about representing an existing commit in a repo, not a proposed patch or ticket or merge request. Indeed the usage of properties for tickets has been discussed separately.

We can’t use “raw patch data”. The only standard part of it, as far as I’m aware, is the unified diff format. But that applies only to the parts of a commit that are patches to text files! There can be many other changes. For example, a change of permissions on a file, or a new directory being added, or a file’s name changing. Different VCSs have different kinds of changes that a commit can contain. Potentially, you could have a smart VCS that supports programming-language-specific Abstract Syntax Tree patches. So, we’re probably going to have a vocabulary for these changes that a commit may contain. The forge’s job will be to take a commit, e.g. a Git commit, and to produce a ForgeFed description of it.

In git, a commit does have a title and a description. The title is the 1st line of the commit message, and the description is the rest. I agree with you that when you write a commit message, you write both parts in 1 piece of text, but that’s just a tiny internal detail of UI and storage: what we’re concerned about is the conceptual model of a commit. In some VCSs, the title and description are kept in a more separate way. But, again, the point is that they’re conceptually separate pieces which are often used separately, regardless of how each VCS stores them. That’s why to me it’s not obvious they should be stored together.

Suggestion: If you’d like to have an easier time avoiding making git-specific assumptions, try doing ForgeFed speccing with at least one more VCS :slight_smile: The websites of Mercurial, Darcs, Monotone, etc. have nice tutorials that can give hints about some differences between the various VCSs.

Indeed if you open a MR using an existing commit/branch, we could have a C2S feature where you omit the title and/or description and the forge auto-fills them from the commit, but this is C2S, not S2S federation.

> This question is about representing an existing commit in a
> repo, not a proposed patch or ticket or merge request.

ok - because you used the terms: “titles” and “descriptions”; i
assumed that was referring to pull request tickets, and not
arbitrary commits - because commits do not have titles nor
descriptions - they have only an id and a commit message -
tickets have titles and descriptions (the OP comment) - i was
just separating the terminology of tickets from commits

but then the question is, why would any information about
arbitrary commits need to be communicated or represented
remotely? - what feature or event is that related to? - is it
for notifications like:

“bob just pushed 42 commits to the repo
https://bobs-server.net/bob/bobs-repo”

why would any more detail need to be represented?

> We can’t use “raw patch data”

i won’t go into detail about that - it would be best in a
different topic thread; but i will say that we must support
that use-case in order to be maximally general and not
VCS-specific and not web specific, however difficult or
non-standard it may turn out to be - that is still a very
popular method of collaboration and review, even for projects
that will eventually merge the patches into a VCS

this would allow people to contribute email patches to projects
that only accept webby pull-requests; and conversely, it would
allow people to contribute using their web forge to projects
that only accept email patches

one could imagine a crazy new trend where some project would only
accept patches via mastodon toots - this would allow for such
future use-cases as well

i certainly can imagine wanting an email bridge for forge-fed -
including a raw patch text payload with a merge-request message
is essential for that

> In some VCSs, the title and description are kept in a
> more separate way. But, again, the point is that they’re
> conceptually separate pieces which are often used separately,
> regardless of how each VCS stores them. That’s why to me it’s
> not obvious they should be stored together.

there are either presumptions in that, or imprecise terminology -
most importantly, forge-fed has no business to be concerned with
presentation nor storage - it is purely a messaging protocol -
the forge will store the data and present the data however it
chooses, regardless of how it was received - for example, the
git forges will not have a separate DB field for
“commit.description” - they most likely would not store any data
about commits other than what is in the VCS itself

just nit-picking there of course; but appropriate terminology is
important so as not to cloud the concern - i assume what you
meant was: “should they be transmitted together in the same data
field” - just to be clear, that is transient message passing -
it’s entirely orthogonal to presentation and storage - the forge
is going to discard the message and then store the transmitted
data however it chooses, in its own way, possibly mutating it,
and possibly discarding it

i’d say the only relevant factor there, is that: if some VCSs do
in fact expose those strings differently, then we must account
for that - so probably the simplest generalized solution is to
transmit them separately; just in case the forge chooses to
treat them differently - again though, this is data that will
already be intrinsically present in the VCS storage; so it’s not
clear why this information would need to be duplicated in the
forge-fed message at all - a commit-id would be sufficient to
reference it

> Suggestion: If you’d like to have an easier time avoiding
> making git-specific assumptions, try doing ForgeFed speccing
> with at least one more VCS

agreed - i am trying to make no presumptions - not VCS-specific,
nor forge-specific, nor mastodon-specific - we only need to
concern ourselves with the things common to all forges and VCSs
that can be generalized

luckily, all of the relevant features of existing activity-pub
peers like mastodon are already accounted for in the major
forges - so any specifics of mastodon interoperability really do
not need to be considered at all - they would all be re-used
parts of the core forge-to-forge communication, that would be
present even if activity-pub were not used

i really see interoperability with the larger fediverse as a
bonus feature that we get for free for almost zero effort,
simply for using the activity-pub protocol to communicate what
forges already do - it’s not something that we even need to try
to support explicitly - we can abstract that away with the same
forge generalizations

Trying to split the commit message into different fields sounds a bit git-specific imho. I would just leave it as a single content field, i.e. make it very similar to a Note, but of course with the ref as an additional property.

Clients can and will be opinionated about display matters anyway, and trying to design an object for all VCSs is not going to work.

@bill-auger, have you looked into web hooks? Web hook handlers need certain info about commits to be available for programmatic usage by the handler software. And that includes things like the list of files removed/added/patched in the commit. The “X pushed a commit to Y” stuff, that minimal summary, is good indeed as a notification for human viewers, but bots/handlers/automation/etc. need more than that (just take a look at how Gogs web hooks work and how fpbot uses them).

@jaywink, in some VCSs the commit’s title and description are stored separately. But even in Git, where you enter them in 1 text file, they’re conceptually somewhat separate:

  • You don’t just write a paragraph with an 80-char line length and whatever the 1st line contains turns into a commit title; you intentionally write that 1st line to be a summary. Perhaps you write it in 1 file along with the longer description, but conceptually they have different roles. Git just happens (I guess) to store them together and call them “commit message”. Other VCSs give them separate names and store them separately in the VCS storage format (e.g. Darcs, which calls them “patch name” and “patch description”).
  • git log displays only the titles by default
  • Web forges usually display title and description in different sizes and styles, and log views usually display only titles by default, much like git log

I’m just saying that 99% of the time they’re treated separately, and I think it may be better to be more “structured” and put them in separate properties, than dump them into 1 string and have clients do string manipulation to split the 2 parts. Git’s format where the 1st line is title and the rest is extra description is Git-specific; we could adopt that as a standard format and have other VCSs convert to it when creating the commit’s AP JSON object, but I’m wondering if that would in any way be better than using separate fields.
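
For reference, the string manipulation a client would need under the single-string option is small. A minimal Python sketch, assuming the Git-style convention that the text up to the first blank line is the title; the function name is made up, and the sketch ignores CRLF line endings and blank lines that contain only whitespace:

```python
from typing import Tuple


def split_commit_message(message: str) -> Tuple[str, str]:
    """Split a Git-style message into (title, description).

    Everything up to the first blank line is the title; the rest is the
    description. Either part may come back as an empty string.
    """
    text = message.strip("\n")
    head, _, rest = text.partition("\n\n")
    # A title wrapped over several lines is folded into one line.
    title = " ".join(line.strip() for line in head.splitlines())
    return title, rest.strip("\n")
```

With separate properties this step would happen once, on the forge that produces the object, instead of in every client.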

@jaywink, are there VCSs that don’t have any concept of commit title and use only arbitrary-length descriptions spanning multiple lines and paragraphs, without a 1-line summary at the top etc.? If yes, then that would be a challenge to treating the message as title+description indeed (unless ForgeFed commit titles are clearly stated to be optional and can be empty, to have clients be ready for that scenario).

i’m still very confused about what is the use case proposed in
this thread - why would forge-fed ever need to convey information
for the sake of web-hooks?

web-hooks are a forge-specific feature that occur well after
the forge-fed message is relevant, and most likely
asynchronously - it sounds like this is implying that forges
would need to modify their web-hooks to accept this information
through a different channel than what they already do - even if
there were some information in the forge-fed message that was
relevant for the web-hook, and that could not be gotten some
other way, the forge would need to store that information
somewhere for whenever (and if) the asynchronous web-hooks
fire; and setup a garbage collector to reap the orphaned ones

aside from that caveat, web-hooks are pre-commit and post-commit
hooks - they are initiated by the VCS upon executing important
events, not by any user - regardless of how the VCS procedure was
initiated (e.g. user clicking a ‘merge-request’ button, or an
incoming forge-fed ‘merge-request’ message), the event that the
web-hook is bound to is a forge VCS event (or a DB event like a
new “follower”) - the forge VCS software is already handling the
merge before the web-hooks fire; so all of the information that
the forge or the VCS software needs to pass to its web-hooks,
will already be present in its VCS after the checkout - it seems
to me that forge-fed only needs to convey the repo URL and which
checkout-point to pull the changes from; and the rest will “just
happen” as it normally does

generally speaking, forges should not need to change their
behavior in order to accommodate forge-fed users - forges that
have web-hooks already know where to get all of the relevant
information for posting one - forge-fed messages only need to
mimic interactions that could have been initiated by a local
forge user; and the forge should be able to treat them as such
blindly - with that goal in mind, i don’t see why forge-fed would
need to send any information that the local user would not need
to specify in the GUI - for example, when initiating a
merge-request normally using the forge GUI (i assume this is
related to the ‘merge-request’ forge-fed message), the local
forge user does not need to explicitly list the files
removed/added/patched in the commit nor the commit message - the
forge already knows how to discover that - for that to happen,
the user only needs to specify a reference to the change in the
VCS

regarding web-hooks, no local forge user ever supplies any
information directly to any web-hook, except when they initially
configure its remote endpoint - web-hooks are entirely the
forge’s internal business, done automatically using the data it
already has, and are related to events that the forge has already
executed successfully

@bill-auger, I can give you a quick example of how web hooks are related to this. It’s slightly off-topic, but, here it goes anyway. Imagine you set up a bot account on the fediverse. This bot follows various projects and repos across the fediverse, from different forges, and it does stuff with the activities it receives. Say, it reports the activity to IRC channels. This bot wants to grab info about a commit, such as the title and author and a list of files that were changed and the name and branch of the repo, and format a colorful IRC message. Another bot could use this info for CI, or for building web pages. These things are the “next generation” of web hooks, implemented using ActivityPub actors, and they need the same data that “traditional” web hooks need. Obviously, forges are free to continue to provide their old forge-specific webhook mechanisms.
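
A rough Python sketch of the IRC-reporting bot’s formatting step, to show which fields it would need from the activity. The payload shape (object.repo, branch, author, title, files) is entirely hypothetical, since the vocabulary is what we’re still discussing; the colour bytes are the usual mIRC-style control codes:

```python
from typing import Any, Dict

GREEN, RESET = "\x0303", "\x0f"  # mIRC-style colour code and reset


def format_irc_line(activity: Dict[str, Any]) -> str:
    """Turn a (hypothetical) commit activity into one IRC line."""
    commit = activity["object"]
    files = commit.get("files", [])
    # Show at most three changed files so the line stays short.
    shown = ", ".join(files[:3]) + (", ..." if len(files) > 3 else "")
    return (
        f"{GREEN}{commit['repo']}/{commit['branch']}{RESET} "
        f"{commit['author']}: {commit['title']} [{shown}]"
    )


sample = {
    "type": "Push",  # assumed activity type
    "object": {
        "repo": "bobs-repo",
        "branch": "master",
        "author": "bob",
        "title": "Fix parser",
        "files": ["src/parser.c", "README"],
    },
}
line = format_irc_line(sample)
```

None of these fields can be recovered from a bare “X pushed a commit to Y” notification, which is why the commit object needs to carry them.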

Well, technically, Git :wink: From git commit --help:

Though not required, it’s a good idea to begin the commit message with a single short (less than 50 character) line summarizing the change, followed by a blank line and then a more thorough description. The text up to the first blank line in a commit message is treated as the commit title, and that title is used throughout Git. For example, git-format-patch(1) turns a commit into email, and it uses the title on the Subject line and the rest of the commit in the body.

Sure, convention and various Git related tools do treat it separately, but technically it’s just a text field.

I do get your point of aiming for the 99% usage case and for sure that can be a way to go, and could be the better choice. The minority can always do string splitting/joining where needed.
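
For what it’s worth, the joining direction is just as small. A minimal Python sketch, assuming a VCS that hands over a separate title and description and a consumer that wants one Git-style text field; the function name is made up:

```python
def join_commit_message(title: str, description: str = "") -> str:
    """Join a separate title and description into one Git-style message."""
    # Fold the title onto a single line, since it becomes the first line.
    first_line = " ".join(title.split())
    body = description.strip("\n")
    # A blank line separates the title from the optional details.
    return f"{first_line}\n\n{body}\n" if body else f"{first_line}\n"
```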