=============
== CSDUMMI ==
=============
I'm CSDUMMI (a.k.a. Joris Gutjahr). A software developer.

Hi

Hi, I thought I might want to service this blog a bit.

During the last two months I’ve been doing improvised work-and-travel across Europe. I got myself an Interrail pass for 2 months with 15 travel days and I got, through accident or random encounter, a remote job working with Serge on Babka.

By now my journey is coming to an end and I’m returning home for the festivities. I don’t know what I’m doing next month; Babka will probably still have a lot of work to do. But I’ll also be searching for other projects to keep me occupied, so if you want to, you can contact me.

My journey is private and I’ll not talk about it here. But I’d like to talk a bit about what I’ve been doing with Babka.

ActivityColander

The first project I worked on with Serge was ActivityColander. The pitch of this project was made by Serge at a conference talk in 2019. In short, ActivityColander is supposed to be akin to a spam filter for unwanted messages for ActivityPub.

Activities (the basic messages sent in ActivityPub) pass through ActivityColander before they reach the ActivityPub server, and ActivityColander decides whether an activity should be:

  • Rejected => the activity is not forwarded to the ActivityPub server.
  • Accepted => the activity is forwarded to the ActivityPub server.
  • Marked as Spam => the activity is forwarded, but with additional information that marks it as spam and gives the reason why.

ActivityColander is an extensible system. Every instance administrator can compose their own pipeline of “checks” that an activity needs to pass through before being rejected, accepted or marked as spam. Each check returns a score and optionally a note. The score must be in the range from -1 to +1, while the note can be any string explaining in human-readable form why a given score was returned.

The administrator also assigns each check a weight. The score a check returns is multiplied by this weight, and ActivityColander sums all the score*weight values to obtain a final score for an activity.

This final score is then compared with a blocking threshold. If it exceeds this threshold, the activity is blocked, i.e. withheld from the ActivityPub server. If the final score is below the blocking threshold but above the spam threshold, two headers are added to the request: one containing the final score (ActivityPub-Spam-Result) and one with a detailed listing of all checks, their resulting scores and the notes they returned (ActivityPub-Spam-Details).

If the final score is below both thresholds (which are of course configurable by the administrator), the activity is accepted and forwarded without comment to the ActivityPub server.

Notes about scoring:

  1. The spam threshold and blocking threshold are specified as values between 0 and 1, because the total score is divided by the sum of the weights of all the checks (the highest possible score an activity can receive in a given pipeline) to create a value that will always be in the range from -1 to +1.
  2. A negative score actually increases the likelihood that an activity will be accepted by ActivityColander. A possible use case for this would be a “Follower Check”, where an activity should be more likely to be accepted if the sender is part of the following set of the receiving user.
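
To make the scoring concrete, here is a minimal sketch in Python. It is not the actual Lua code of ActivityColander, only an illustration of the description above. The names WeightedCheck, final_score and decide are made up for this example, and two details are assumptions: scores outside -1..+1 are clamped, and a score exactly equal to a threshold does not count as exceeding it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A check takes an activity (here a plain dict) and returns (score, note).
Check = Callable[[dict], Tuple[float, str]]


@dataclass
class WeightedCheck:
    name: str
    check: Check
    weight: float  # set by the administrator; assumed to be positive


def final_score(activity: dict, pipeline: List[WeightedCheck]):
    """Return the normalized final score and the per-check details."""
    total = 0.0
    details = []
    for item in pipeline:
        score, note = item.check(activity)
        # Assumption: out-of-range scores are clamped to [-1, +1].
        score = max(-1.0, min(1.0, score))
        total += score * item.weight
        details.append((item.name, score, note))
    # Dividing by the sum of weights keeps the result within [-1, +1].
    weight_sum = sum(item.weight for item in pipeline)
    normalized = total / weight_sum if weight_sum else 0.0
    return normalized, details


def decide(score: float, spam_threshold: float, block_threshold: float) -> str:
    """Map a final score to one of the three outcomes."""
    if score > block_threshold:
        return "block"
    if score > spam_threshold:
        return "mark"  # forwarded, with the two spam headers added
    return "accept"
```

With one check of weight 3 returning +1 and a follower-style check of weight 1 returning -1, the final score is (3 - 1) / 4 = 0.5. With a spam threshold of 0.3 and a blocking threshold of 0.8, that activity would be marked as spam but still forwarded.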

How was it implemented?

ActivityColander uses OpenResty’s extension of NGINX, which allows us to write Lua scripts that are executed when NGINX processes a request and that can modify or even block the request. All of ActivityColander, including the checks and the surrounding software that calculates final scores and blocks or modifies requests, has been implemented in Lua.

This was a novel language for me - it’s the first language I’ve worked in that uses the end keyword instead of indentation or } - but after a few days I didn’t feel like I was referencing the manual and other resources any more than I would have done in any other language.

Besides the basic structure of ActivityColander - read a request, decide if it’s an activity, execute the pipeline, calculate the result, decide whether to block, mark or accept - we’ve also started implementing a few checks, including a check for bad keywords and one for bad domains. Importantly, none of these work as a substitute for the existing blocking mechanisms in ActivityPub server implementations. While a domain-wide block on e.g. Mastodon rejects all activities from that instance outright, the domain check in ActivityColander merely discourages them. Depending on the spam and blocking thresholds, it may only block or mark an activity in combination with other factors (such as problematic words, identified by the keyword check). The goal is to have many different checks that work together to allow an instance administrator to fine-tune their moderation policy.
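
As an illustration of what such checks might look like, here is a rough Python sketch (the real checks are written in Lua). The word list, the domain list, the function names and the choice of returning exactly 1.0 on a hit are all assumptions made for this example. It also assumes the activity is a dict whose actor is a URL string and whose object carries a content field.

```python
from urllib.parse import urlparse

# Hypothetical example lists; an administrator would configure their own.
BAD_KEYWORDS = {"cheap pills", "free crypto"}
BAD_DOMAINS = {"spam.example"}


def keyword_check(activity: dict):
    """Score 1.0 if the activity's content contains a listed keyword."""
    obj = activity.get("object")
    # The object may also be a bare URL, in which case there is no content.
    content = obj.get("content", "") if isinstance(obj, dict) else ""
    hits = sorted(k for k in BAD_KEYWORDS if k in content.lower())
    if hits:
        return 1.0, "contains: " + ", ".join(hits)
    return 0.0, ""


def domain_check(activity: dict):
    """Score 1.0 if the sender's domain is on the list."""
    actor = activity.get("actor", "")
    host = urlparse(actor).hostname if isinstance(actor, str) else None
    if host in BAD_DOMAINS:
        return 1.0, f"sender domain {host} is listed"
    return 0.0, ""
```

Neither check blocks anything on its own. Each only contributes a score and a note, and the weights and thresholds decide what happens to the activity.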

Another feature I’ve worked on in ActivityColander is persistence through a connection to a Redis server, along with an HTTP client library that allows checks to query external APIs. Some checks need to store and persist data about requests, e.g. a rate limiting check that returns a higher score the more activities an instance or account sends to the ActivityPub server. Others might want to connect to APIs to receive more accurate real-time data (for example, a database of slurs and coded discriminatory language).
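
A rate limiting check of this kind could be sketched as follows in Python. This is only an illustration: an in-memory dict stands in for the Redis server, and the class name, the sliding window and the linear scoring are assumptions, not the way ActivityColander does it.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class RateLimitCheck:
    """Score grows with the number of recent activities from one sender."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit            # activity count at which the score reaches 1.0
        self.window = window_seconds
        # Stand-in for Redis: sender -> timestamps of recent activities.
        self.seen = defaultdict(deque)

    def __call__(self, activity: dict, now: Optional[float] = None):
        now = time.time() if now is None else now
        sender = str(activity.get("actor", "unknown"))
        stamps = self.seen[sender]
        stamps.append(now)
        # Forget activities that fell out of the time window.
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()
        score = min(1.0, len(stamps) / self.limit)
        return score, f"{len(stamps)} activities in the last {self.window:g}s"
```

The in-memory store is lost on restart and not shared between worker processes, which is exactly why a check like this needs an external store such as Redis.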

How will it be deployed?

We have not yet deployed ActivityColander to babka.social but intend to do so by building a Docker image and running the container in front of the ActivityPub server and behind a reverse proxy handling SSL connections.

But this should not be the only deployment option for ActivityColander. I’ve just started working on a Makefile that is supposed to install OpenResty, migrate the NGINX settings to OpenResty and add ActivityColander to an existing ActivityPub server running on Debian/Ubuntu and using NGINX as its reverse proxy.

But my work on this project has been suspended to support the launch of babka.social.

Babka.social

Babka’s main component is its Mastodon instance. This instance serves as a home for Babka members on the Fediverse. But because Babka is not just a Mastodon instance, Babka uses a Keycloak server as identity provider for Single Sign-On with Mastodon and requires members to sign in through Keycloak. This has the benefit that Babka can add further services (such as a chat, for example) and use the same user database there as on Mastodon.

During testing we had to recognize that Single Sign-On (SSO) is not handled very well on Mastodon at the moment. Mastodon nominally supports the SAML, OIDC and CAS protocols, but I found that neither OIDC nor SAML supports single-logout on Mastodon. This means that if you log out of a Mastodon instance using SSO, you are not automatically logged out of the SSO provider. After logging out of Mastodon you can still click on ‘Sign in’ and be logged into Mastodon without having to reenter a username and password or complete 2FA.

I consider this a dangerous issue, especially because it is invisible to the users. But we currently do not have this issue fixed on Babka, and on the GitHub issue I raised, a few people mentioned that they believed users didn’t log out of their accounts very often - which is an argument about the severity of this bug, but not an argument against fixing it.

Another major issue I’m now working on is that Mastodon does not redirect back to the page a user came from after an SSO login. This has the curious implication that a user cannot use third-party apps: they need to authorize the app at the /oauth/authorize path, from which they are redirected to log in via SSO, and after logging in they are always taken to the Mastodon homepage instead of returning to /oauth/authorize. I’ve analyzed this bug and believe it to be caused by Mastodon resetting the session after a login - deleting the redirect path stored in the same session.

This too has an issue on GitHub where I publish my debug information and potential bug fixes.

Possible patches?

My work on Mastodon in this past month has led me to fork not only Mastodon but also several of its dependencies, like devise (pretty much the standard library in the Ruby on Rails ecosystem for doing user authentication) or gitlab-omniauth-openid-connect (an extension of the omniauth library that adds support for the OIDC protocol, which is used by Mastodon to turn it into an SSO client that nominally supports SAML, OIDC and CAS).

Not everything I did on these forks should be merged upstream. But a few features that I might consider compiling into a patch for either Mastodon or the glitch fork of Mastodon are:

  • Replacing the separate “Sign in” and “Sign up” buttons in the Mastodon Web UI by a single “Login or Register” button that directly redirects to the SSO endpoint, instead of going through a pointless form with a single entry.
  • Replacing the Account Settings in Mastodon by a link to the corresponding settings on the identity provider. (These settings are used for changing your email and password, which in an SSO-only setup is of course not possible within Mastodon.)
  • Allowing for single-logout when using OIDC. This would not only be a patch for Mastodon but also for gitlab-omniauth-openid-connect, because this library only recognizes a request as a sign out attempt if the path ends in /logout. That of course is not the case for Mastodon, where the sign out path ends in /auth/sign_out.
  • Fixing the no-redirect bug. I’m currently working on a solution (specific to the OIDC protocol, because Babka uses OIDC) that would append a query parameter to the OIDC redirect uri containing the path to redirect to after a successful login on the SSO. Work on this branch is not yet finished, but it’s currently the most promising solution.
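
The idea behind the last point can be sketched in a few lines of Python (the actual work happens in Ruby, in Mastodon and its dependencies). The parameter name return_to, the example URLs and the function names are hypothetical. The check that the target is a local path is an addition of this sketch, included because an unchecked redirect target would let a crafted link send users to another site after login.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

PARAM = "return_to"  # hypothetical name for the query parameter


def is_local_path(path: str) -> bool:
    """Accept only paths on the same site, e.g. /oauth/authorize."""
    return path.startswith("/") and not path.startswith("//") and "\\" not in path


def with_return_path(redirect_uri: str, path: str) -> str:
    """Append the path to return to after login to the OIDC redirect URI."""
    if not is_local_path(path):
        return redirect_uri
    parts = urlparse(redirect_uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((PARAM, path))
    return urlunparse(parts._replace(query=urlencode(query)))


def return_path_from(callback_url: str, default: str = "/") -> str:
    """Read the return path back when the SSO provider calls the redirect URI."""
    query = dict(parse_qsl(urlparse(callback_url).query))
    path = query.get(PARAM, default)
    return path if is_local_path(path) else default
```

Because the path travels in the redirect URI rather than in the session, it would survive the session reset that appears to cause the bug.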

Conclusion

My work on Babka has been great. I’ve gotten in contact with a lot of technology I’d never have touched on my own, including Lua, Ruby on Rails, ReactJS, OIDC and SAML. And it gave me an appreciation of the commonalities between different technologies and languages, because while I’d still rather start a new web project in Flask and Elm, I know that the other technologies are not so alien after all.

(When will we see an ActivityPub server in Python with a purely functional Elm frontend? Wouldn’t that be something?)

  • CSDUMMI