November 01, 2022

Balance of Power: A rematch served cold

There's an old video game the memory of which recently escalated itself to my attention: Chris Crawford's Balance of Power, a geopolitics simulator first released for the Macintosh in 1985. According to Wikipedia it sold about a quarter million units, which was a lot at the time, and I must've been somewhere in the impressionable age range of 10 to 12 years old when my father bought Incredible Technologies' port for the Commodore Amiga.

Go ahead and boop the nook

Its Workbench icon featured a mushroom cloud with a hand over it in the universal that's-not-what-I-ordered gesture (though possibly gently petting it — or simply shielding the observer's eyes?), but the game itself had a minimalist, academic look, and beyond a simple dissolve effect on the title screen it featured no explosions or indeed animations of any kind. This was unusual on the Amiga, a platform known for its ability to make the rubble bounce in Technicolor.

Crawford's maze

A strange game

Although I had an idea of what was going on in the world — I'd observed the cultural touchstones at friends' birthday parties, caught glimpses of something less watchable but much more memorable, and seen the inside of a bomb shelter — I didn't have the toolbox for nuclear brinkmanship, let alone internalizing what I now recognize to be a beautiful 87-page manual with a hardcover sleeve and an extensive bibliography. In the end, for all the sneaking into my dad's office to play this in the dead of night, I couldn't really figure it out.

So. Since Halloween seems like a good occasion to indulge in a little psychological horror (no other reason), I decided to do a rematch of sorts — this time with the help of fs-uae.

Crawford sez: RTFM

Crawford released an updated game in 1989 (simply called the 1990 edition), with the Amiga port credited to Elaine Ditton. It introduced the multipolar mode, which had been left out of the 1985 release:

Unfortunately, this multipolar view of the world is just too cerebral for most game players. It has something to do with our expectations of games; a mature, otherwise sophisticated adult will sit down with this game and ask, "How do I nuke the Commies?" Games, like stories, must have conflict, but people have been so inundated with the brutal, violent conflict standard in computer games that they are unable to grasp the subtle, indirect conflict arising from a multipolar world. This was a very painful discovery, and it forced a shift in the game from a multipolar view toward a more bipolar view. Minor countries had been able to pursue their own foreign policies; now they are passive pawns. Neutralist policies were entirely feasible; now minor countries choose up sides along left-wing/right-wing lines. The result is less realistic but more suited to the needs of the game-playing audience. Perhaps someday a more sophisticated game will be possible.

BOP manual, p. 74

Ok, so it's 2022 now, and we're all sophisticated down here. We want the multipolar. Furthermore, although the game didn't anticipate certain pivotal events of 1991, we'll play as the USA just to be on the safe side.

Here in the future, we also come armed with source code courtesy of the man himself: In this case, it's written in Pascal with a tiny morsel of assembler, the latter of which exists mostly to calculate square roots. BOP doesn't really have any spoilers apart from the obvious one, and a peek here and there should be interesting.

For the RAND analyst in you

Eh, close enough

The centerpiece of the game is, logically enough, the world map. It reflects some of the assumptions and technical limitations of the time: Many countries were removed because they'd make too small click targets or were thought to be strategically redundant. For instance, my own little corner of the world has been unceremoniously rolled into Sweden, which is fine (I, for one, welcome the Kalmar Union redux).

Next up, "Finlandization" is represented in the game, but Finland is not. This seems unfair to the Finnish. Of course, there's the unfairness of the entire, uh — (gesturing vaguely at the map) — to consider. It's very much not a game about curing the world's ills. But we'll live. Probably.

So neutral

Each country has a large number of variables attached. Some are directly manipulable, contingent on the limitations of physics and diplomacy. For instance, you could send military aid to the Swedish government but not the rebels, since there aren't any in Sweden to speak of. However, you could still intervene on behalf of these "rebels" and bootstrap an insurgency on some flimsy pretext. There are also subtle ways to change the existing government's mind and eventually sign a treaty that would let you station troops with them.

The important variables, such as government stability, security, affiliation and power, are only indirectly manipulable. Some of them are hidden, and most of them are obfuscated by natural language. There's no way you can break out your calculator to predict exactly what'll happen on the next turn, no big "Democracy" button to slam for a +2 bonus to semiconductors or whatever. You're forced to read reports and think in terms of… well, not exactly the real world, but at least a facet of it.

The game plays out over eight turns representing the passage of calendar years. On each turn, countries make their policy moves and optionally dispute each other's moves, and then the simulation ticks. The last part is where the magic happens. As an example, here's how country i's desire to Finlandize to superpower j is calculated:

y:=MilPowr[i]-InsgPowr[i];
FOR j:=1 TO 2 DO
  BEGIN
    x:=InsgIMax(j,i);
    ProjPowr[j]:=(IntvConv(x)*ord4(MilPowr[j])) div MilMen[j];
    x:=Treaty[3-j,i];
    x:=(Should(x)*ord4(MilPowr[3-j])) div 128;
    SelfPowr[j]:=y+(x*ord4(Integrty[3-j])) div 128;
    IF SelfPowr[j]<1 THEN SelfPowr[j]:=1;
    temp:=((ord4(Adventur[j]-DipAff^^[j,i])*ProjPowr[j]*(Pressure[j,i]+4))
           div SelfPowr[j]);
    IF temp<0 THEN temp:=0; 
    IF temp>2048 THEN temp:=2048;
    FinlProb[j,i]:=temp div 8;
  END;

This is an interesting function with a couple of variables in play:

  • The government's military strength relative to insurgents' (MilPowr[i]-InsgPowr[i]).
  • The threatening superpower's military spending (MilPowr[j]/MilMen[j]) and ability to intervene in the area. InsgIMax(j,i) considers troops stationed in countries geographically adjacent to i.
  • The supporting superpower's military strength, treaty level and history of honoring its commitments (Integrty[3-j]).
  • The level of any diplomatic pressure campaign — harsh words, basically — aimed at the country (Pressure[j,i]). There's a small constant added to it, so the multiplier can never be zero; posting lots of soldiers next door can be quite menacing even if it's done quietly.
  • The adventurism factor (Adventur[j]), proportional to the demonstrated combativeness (Pugnacty[j]) of the threatening superpower relative to that of the other's plus the overall political tension in the world (Nastiness). This goes up if the situation seems dangerous and uncertain.
  • The diplomatic relationship between the threatening superpower and country i. This is a signed integer with 0 being neutral. Effectively, being on good terms neutralizes the threat posed by soldiers, but not that of general craziness.

Another delightful wrinkle is what happens if the superpowers put boots on the ground on opposing sides of a conflict, meaning the US and USSR are in a shooting war:

IF ((IntvGovt^^[1,i]>0) AND (IntvRebl^^[2,i]>0)) OR
   ((IntvGovt^^[2,i]>0) AND (IntvRebl^^[1,i]>0)) THEN
  BEGIN	{USA fights with USSR}
    DipAff^^[1,2]:=-127; 
    DipAff^^[2,1]:=-127;
    Nastiness:=127;
    Pugnacty[1]:=127; 
    Pugnacty[2]:=127;
  END;

BOP treats this as an extremely serious situation and cranks up the Nastiness and Pugnacty factors, causing worldwide ripple effects (conflicts may erupt or worsen, weak governments may collapse, etc).

There's a lot more to it, but you get the idea: It's an interconnected world with gradual change, tipping points and cascading effects. I think few games pull this off — or dare attempt it — because it can leave a very narrow road between the extremes of boring (too stable) and incomprehensible (too chaotic). Throw in expectations of realism, and it's a tall order.

BOP manages fine, in part due to carefully chosen initial conditions and constraints on what it deems "minor" (non-superpower) countries. Here's Canada:

InitCountry(14,'Canada',2040,228,441,-10,-125,10000,0,36,20,9,56,255,77,99,12);
  InitCont(14,1);
  MinorSph^^[14,17]:=TRUE;
  MinorSph^^[14,18]:=TRUE;
  MinorSph^^[14,20]:=TRUE;
FiniCntry(14);

Defined as being contiguous with the USA (country #1), it adds Britain, France and West Germany to its sphere of influence, so when the AI hammers out its foreign policy, it will only consider these four counterparts. Britain for its part has 15, and France is up there with 12, but Sweden has only the one on its eastern border.

Frank exchange of opinions

FUNCTION Crisis;
{Returns a value of TRUE if the missiles fly}

Of course, countries don't get to undermine and invade each other willy-nilly. There are checks on this activity, such as the local neighborhood superpower getting on the phone and politely telling them to knock it off. When this happens to a minor country, it will pull back and just quietly stew for a bit, souring diplomatic relations, but with superpowers on both ends, things get exciting: It turns into a game of chicken. The superpowers take turns escalating, starting with a zero-stakes back-channel discussion, through diplomatic and military crises, until either one of them backs down or a nuclear war starts. The prestige penalty for backing down increases as the crisis escalates.

This can be frustrating, since it's easy to misjudge the computer's commitment (though it drops hints in its diplomatic missives, e.g. when it "categorically refuses" you may be about to have a bad day). The 1990 edition added advisors, in theory making it easier to get a read on the situation.

Mr. B has an idea

You can usually sort of interpolate their various opinions, but it's harder when they disagree wildly on an issue — like the above, where the Soviets just dispatched 20,000 soldiers to topple the government of Sudan. It sounds like a mean thing to do, and — more importantly — it would strengthen Soviet influence in the region and weaken ours. I don't think they're serious about this, so let's put our chips on advisor number four, the one with the mustache. I like his attitude.

No u

Mr. Gorbachev turns a deaf ear, so we go public with a diplomatic challenge. Still no. The world is watching now, and since there's an uncomfortable amount of prestige at risk, we press the issue and threaten a military response.

NUH UH

They escalate to DEFCON 4, we go to DEFCON 3, and they respond by escalating again. That's DEFCON 2, with bombers in the air and humanity collectively holding its breath. We're left with two choices: either let rip (and end the game) or back down and suffer crushing humiliation. This is bad. The prestige hit will affect our diplomatic relations, hamstringing us and effectively cutting our lead in half. Good thing we can absorb the loss without falling hopelessly behind. Otherwise, well…

Anyway. How could advisor #4 lead us astray? The answer lies in this excerpt from the GImpt() function:

x:=DipAff^^[i,Obj] div 4;
y:=(Should(Treaty[i,Obj]) div 4)+1;
z:=(ord4(DontMess[Obj])*1280) div SumDMess;
t:=Adventur[i] div 2;

CASE Bias OF
  0: BEGIN END;
  1: BEGIN x:=x*MySqrt(Abs(x)); y:=MySqrt(y); END;
  2: BEGIN y:=y*MySqrt(y); z:=MySqrt(z); END;
  3: BEGIN z:=z*MySqrt(z); t:=MySqrt(t); END;
  4: BEGIN t:=t*MySqrt(t);
       IF x>0 THEN x:=MySqrt(x) ELSE x:=-MySqrt(Abs(x));
     END;
END;

The enum values 1-4 correspond to the different advisors. Looking at the fourth one, our guy is giving more consideration to Adventur[i] — the adventurism factor again, which frankly must've been pretty high at this point in the game — and less to our relations with Sudan. Our belligerence emboldened him, and we got some bad advice in return.

Again there's a lot more going on under the hood, and a lesson for budding diplomats:

This system produces some behaviors that may surprise you. For example, suppose that you as an American player gave economic aid to Nigeria. The Soviets take objection to this and start a crisis. You escalate, they escalate, and a nuclear war starts. The question on your lips is, why would those idiots annihilate the world over economic aid to Nigeria? The answer is, because you were willing to annihilate the world over Nigeria. Remember, it takes two to make a crisis. The computer figured that this just wasn't an important issue for you, and that, while trivial, it was still a more important issue for itself. It therefore stuck to its guns. […]

This raises a very important point about geopolitics. You could protest loudly, "But I do care about Nigeria! The computer can't assume what I'm really thinking!" You are absolutely right. Real-world diplomats don't know what's really going on in the minds of their interlocutors. They know perfectly well that today's words are only an expression of today's exigencies. The only thing they can rely on are the substantial events of the past. If you have built up a record of close relations with Nigeria, your behavior in a crisis will have to be taken seriously. If your record is of weak relations, then your behavior will not be taken seriously. The computer treats it that way. If you want to convince people that you're serious, you've got to lay the groundwork.

BOP manual, p. 77-78

Enough of that. Are ya winning, son?

Yeah, about that rematch. This isn't tic-tac-toe, so you can play it and win. Sort of.

Everyone gets a trophy

The main difficulty lies in us and humanity getting to this screen — although "kept the peace" tends to belie some dismal stuff unleashed along the way — after which we can relax and judge the result for ourselves.

As you can see, mistakes were made in 1995. Lest you judge me, the (now former) government of Mexico forced my hand! What followed made a lot of people unhappy, and then the French admin collapsed amidst the general chaos.

I did keep the green line up, though, so we'll call it the good ending. It took a few tries.

Unforeseen consequences

resource 'STR ' (705, purgeable) {
	"2* response* reply* answer* reaction*"
};

resource 'STR ' (706, purgeable) {
	"2* is coming* is being sent* is on its way* will arrive*"
};

resource 'STR ' (707, purgeable) {
	"2* via the North Pole.* over your northern border.* by ICBM.* by m"
	"issile courier.*"
};

It's not an easy game. You could know the rules like the back of your hand, and it still wouldn't eliminate the risks.

The crisis dialogue has one final surprise in store for us. As you click through the DEFCONs, there's a random factor that becomes increasingly relevant. It represents the likelihood of an oopsie:

x:=0;
CASE CrisisLevel OF
  2: x:=16;
  3: x:=8;
  4: x:=2;
END;
IF DipAff^^[1,2]>0 THEN x:=0 ELSE x:=(x*(-DipAff^^[1,2])) div 64;
y:=Random div 128;
IF x>(Abs(y)) THEN BEGIN CrisisLevel:=1; ANWFlag:=TRUE; END;

Watch out for that ANWFlag! The risk is at its highest when going to DEFCON 2 with superpower relations bottomed out (as they'd be in a shooting war): (16 * 127 / 64) / 128 = 0.248, or roughly a 25% chance of unintended fireworks.

In most cases, the probability will be much lower. If your relations are merely kind of bad at -32, and you go to DEFCON 4, the risk is 1/128, or 0.8%. In the real world that's not so low, but for the purposes of BOP it's a low probability/high impact risk you'll be taking again and again.

Good thing, then, that we only have to make it through eight years. The risks add up over time, and even if you play quite carefully, you will come to fear a certain arresting memo:

Thanks, I'm good

The "END" button drops you back to the desktop. In order to succeed at this game you need patience, a cool head and plenty of luck.

October 31, 2022

ephemerons and finalizers

Good day, hackfolk. Today we continue the series on garbage collection with some notes on ephemerons and finalizers.

conjunctions and disjunctions

First described in a 1997 paper by Barry Hayes, which attributes the invention to George Bosworth, ephemerons are a kind of weak key-value association.

Thinking about the problem abstractly, consider that the garbage collector's job is to keep live objects and recycle memory for dead objects, making that memory available for future allocations. Formally speaking, we can say:

  • An object is live if it is in the root set

  • An object is live if it is referenced by any live object.

This circular definition uses the word any, indicating a disjunction: a single incoming reference from a live object is sufficient to mark a referent object as live.

Ephemerons augment this definition with a conjunction:

  • An object V is live if, for an ephemeron E containing an association between objects K and V, both E and K are live.

This is a more annoying property for a garbage collector to track. If you happen to mark K as live and then you mark E as live, then you can just continue to trace V. But if you see E first and then you mark K, you don't really have a direct edge to V. (Indeed this is one of the main purposes for ephemerons: associating data with an object, here K, without actually modifying that object.)

During a trace of the object graph, you can know if an object is definitely alive by checking if it was visited already, but if it wasn't visited yet that doesn't mean it's not live: we might just have not gotten to it yet. Therefore one common implementation strategy is to wait until tracing the object graph is done before tracing ephemerons. But then we have another annoying problem, which is that tracing ephemerons can result in finding more live ephemerons, requiring another tracing cycle, and so on. Mozilla's Steve Fink wrote a nice article on this issue earlier this year, with some mitigations.
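
To make that fixpoint concrete, here is a deliberately tiny mark phase in C that defers ephemerons until after the main trace and then rescans the ephemeron table until nothing new gets marked. The object model, the single-edge mark function and the idea of treating the ephemeron table itself as a root are all invented for illustration; this is a sketch of the strategy, not how any production collector is written.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Obj {
  bool marked;
  struct Obj *ref;            /* at most one strong outgoing edge, for brevity */
} Obj;

typedef struct {
  Obj *key;                   /* K: does not keep V alive by itself */
  Obj *value;                 /* V: live iff the ephemeron and K are both live */
} Ephemeron;

static void mark(Obj *o) {
  while (o && !o->marked) {   /* follow strong edges */
    o->marked = true;
    o = o->ref;
  }
}

static void mark_phase(Obj **roots, size_t nroots, Ephemeron *eph, size_t neph) {
  for (size_t i = 0; i < nroots; i++)
    mark(roots[i]);           /* ordinary trace first */

  bool progress = true;       /* then iterate: marking a value V may make    */
  while (progress) {          /* further ephemeron keys reachable            */
    progress = false;
    for (size_t i = 0; i < neph; i++)
      if (eph[i].key && eph[i].key->marked && eph[i].value && !eph[i].value->marked) {
        mark(eph[i].value);   /* conjunction satisfied: E (the table) and K are live */
        progress = true;
      }
  }
  /* anything still unmarked, including values whose keys died, is garbage */
}

int main(void) {
  Obj k = {0}, v = {0}, dead_k = {0}, dead_v = {0};
  Obj *roots[] = { &k };
  Ephemeron eph[] = { { &k, &v }, { &dead_k, &dead_v } };

  mark_phase(roots, 1, eph, 2);
  printf("v marked: %d, dead_v marked: %d\n", v.marked, dead_v.marked);  /* 1, 0 */
  return 0;
}

The repeated rescanning of the ephemeron table is exactly the extra-cycles annoyance mentioned above; the linked article discusses ways to mitigate it.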

finalizers aren't quite ephemerons

All that is by way of introduction. If you just have an object graph with strong references and ephemerons, our definitions are clear and consistent. However, if we add some more features, we muddy the waters.

Consider finalizers. The basic idea is that you can attach one or a number of finalizers to an object, and that when the object becomes unreachable (not live), the system will invoke a function. One way to imagine this is a global association from finalizable object O to finalizer F.

As it is, this definition is underspecified in a few ways. One, what happens if F references O? It could be a GC-managed closure, after all. Would that prevent O from being collected?

Ephemerons solve this problem, in a way; we could trace the table of finalizers like a table of ephemerons. In that way F would only be traced if O is live already, so that by itself it wouldn't keep O alive. But then if O becomes dead, you'd want to invoke F, so you'd need it to be live, so reachability of finalizers is not quite the same as ephemeron-reachability: indeed logically all F values in the finalizer table are live, because they all will be invoked at some point.

In the end, if F references O, then F actually keeps O alive. Whether this prevents O from being finalized depends on our definition for finalizability. We could say that an object is finalizable if it is found to be unreachable after a full trace, and the finalizers F are in the root set. Or we could say that an object is finalizable if it is unreachable after a partial trace, in which finalizers are not themselves in the initial root set, and instead we trace them after determining the finalizable set.

Having finalizers in the initial root set is unfortunate: there's no quick check you can make when adding a finalizer to signal this problem to the user, and it's very hard to convey to a user exactly how it is that an object is referenced. You'd have to add lots of gnarly documentation on top of the unavoidable gnarliness you already had to write. But, perhaps it is a local maximum.

Incidentally, you might think that you can get around these issues by saying "don't reference objects from their finalizers", and that's true in a way. However it's not uncommon for finalizers to receive the object being finalized as an argument; after all, it's that object which probably encapsulates the information necessary for its finalization. Of course this can lead to the finalizer prolonging the longevity of an object, perhaps by storing it to a shared data structure. This is a risk for correct program construction (the finalized object might reference live-but-already-finalized objects), but not really a burden for the garbage collector, except in that it's a serialization point in the collection algorithm: you trace, you compute the finalizable set, then you have to trace the finalizables again.

ephemerons vs finalizers

The gnarliness continues! Imagine that O is associated with a finalizer F, and also, via ephemeron E, some auxiliary data V. Imagine that at the end of the trace, O is unreachable and so will be dead. Imagine that F receives O as an argument, and that F looks up the association for O in E. Is the association to V still there?

Guile's documentation on guardians, a finalization-like facility, specifies that weak associations (i.e. ephemerons) remain in place when an object becomes collectable, though I think in practice this has been broken since Guile switched to the BDW-GC collector some 20 years ago or so and I would like to fix it.

One nice solution falls out if you prohibit resuscitation by not including finalizer closures in the root set and not passing the finalizable object to the finalizer function. In that way you will never be able to look up E×O → V, because you don't have O. This is the path that JavaScript has taken, for example, with WeakMap and FinalizationRegistry.

However if you allow for resuscitation, for example by passing finalizable objects as an argument to finalizers, I am not sure that there is an optimal answer. Recall that with resuscitation, the trace proceeds in three phases: first trace the graph, then compute and enqueue the finalizables, then trace the finalizables. When do you perform the conjunction for the ephemeron trace? You could do so after the initial trace, which might augment the live set, protecting some objects from finalization, but possibly missing ephemeron associations added in the later trace of finalizable objects. Or you could trace ephemerons at the very end, preserving all associations for finalizable objects (and their referents), which would allow more objects to be finalized at the same time.

Probably if you trace ephemerons early you will also want to trace them later, as you would do so because you think ephemeron associations are important, as you want them to prevent objects from being finalized, and it would be weird if they were not present for finalizable objects. This adds more serialization to the trace algorithm, though:

  1. (Add finalizers to the root set?)

  2. Trace from the roots

  3. Trace ephemerons?

  4. Compute finalizables

  5. Trace finalizables (and finalizer closures if not done in 1)

  6. Trace ephemerons again?

These last few paragraphs are the reason for today's post. It's not clear to me that there is an optimal way to compose ephemerons and finalizers in the presence of resuscitation. If you add finalizers to the root set, you might prevent objects from being collected. If you defer them until later, you lose the optimization that you can skip steps 5 and 6 if there are no finalizables. If you trace (not-yet-visited) ephemerons twice, that's overhead; if you trace them only once, the user could get what they perceive as premature finalization of otherwise reachable objects.

In Guile I think I am going to try to add finalizers to the root set, pass the finalizable to the finalizer as an argument, and trace ephemerons twice if there are finalizable objects. I think this will minimize incoming bug reports. I am bummed though that I can't eliminate them by construction.
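
Purely as a shape, that plan might look something like the following C skeleton. Every helper here is a made-up stand-in so the ordering is visible; none of this is Guile's (or any real collector's) API.

#include <stddef.h>

typedef struct Heap Heap;

/* Stand-in helpers so the skeleton compiles; none of these are real. */
static void   trace_finalizer_closures(Heap *h) { (void)h; }
static void   trace_roots(Heap *h)              { (void)h; }
static void   trace_ephemerons(Heap *h)         { (void)h; }
static size_t compute_finalizables(Heap *h)     { (void)h; return 0; }
static void   trace_finalizables(Heap *h)       { (void)h; }

static void mark_cycle(Heap *heap) {
  trace_finalizer_closures(heap);        /* 1. finalizers in the root set */
  trace_roots(heap);                     /* 2. trace from the roots */
  trace_ephemerons(heap);                /* 3. first ephemeron pass */

  if (compute_finalizables(heap) > 0) {  /* 4. enqueue newly finalizable objects */
    trace_finalizables(heap);            /* 5. keep them (and what they reference) alive */
    trace_ephemerons(heap);              /* 6. second ephemeron pass, so associations
                                               for finalizable objects stay visible */
  }
  /* with no finalizables, steps 5 and 6 are skipped, avoiding the extra passes */
}

int main(void) {
  mark_cycle(NULL);                      /* nothing dereferences the heap in this toy */
  return 0;
}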

Until next time, happy hacking!

October 30, 2022

Pixel Inktober

Just like last year, October was filled with quick pixel dailies. I decided to only post on mastodon, but due to the twitter exodus couldn’t quite post the 30kB images for the two remaining days. Good old blog post it is!

1. Gargoyle 2. Scurry 3. Bat 4. Scallop 5. Flame 6. Bouquet 7. Trip 8. Match 9. Nest 10. Crabby 11. Eagle 12. Forget 13. Kind 14. Empty 15. Armadillo 16. Fowl 17. Salty 18. Scrape 19. Ponytail 20. Bluff 21. Bad Dog 22. Heist 23. Booger 24. Fairy 25. Tempting 26. Ego 27. Snack 28. Camping 29. Uh-oh 30. Gear 31. Farm

Previously, Previously, Previously.

On deprecations

If you are paying attention to GTK’s git repository, you may have noticed a change in the last weeks.

We have a directory gtk/deprecations, which is destined to contain source files that implement deprecated APIs and will be dropped in the next major release. For the 4.0 release, we emptied it out, and it has been empty ever since. But recently, it started to accumulate files again.

This is a good opportunity to remind folks how we are using deprecations in GTK. But first, let’s take a look at the details.

The details, part 1: cell renderers

In GTK 4, we introduced a new family of list and grid widgets that are based around list models: GtkListView, GtkColumnView, GtkGridView. There is also a new combo box implementation using list models, called GtkDropDown. Taken together, these are meant to provide replacements for everything you can do with cell renderers in GTK 3.

The ultimate goal was to remove cell renderers, since they are a whole separate rendering and layout system that tends to interfere with GTK’s CSS and layout machinery, and makes everything more complicated.

But we did not quite get to the finish line for 4.0, mainly because we still had significant uses of treeviews in GTK itself. First and foremost, the file chooser. Since the file chooser is getting ported to use a GtkColumnView in 4.10, now is the right time to deprecate the cell renderer machinery and all the widgets that use them.

This is a significant amount of code, more than 75,000 lines.

The details, part 2: dialogs

In GTK 4, we dropped gtk_main() and gtk_dialog_run(), since recursive mainloops are best avoided. Again, we did not get to the finish line and could not remove GtkDialog itself, since it is used as the base class for all our complex dialogs.

GTK 4.10 introduces replacement APIs for our ‘Chooser’ dialogs. The new APIs follow the gio async pattern. Here is an example:

GtkFileDialog * gtk_file_dialog_new (void);

void            gtk_file_dialog_open (GtkFileDialog *self,
                                      GtkWindow *parent,
                                      GFile *current_file,
                                      GCancellable *cancellable,
                                      GAsyncReadyCallback callback,
                                      gpointer user_data);

GFile *        gtk_file_dialog_open_finish (GtkFileDialog *self,
                                            GAsyncResult *result,
                                            GError **error);

This may look a bit unwieldy in C, but it translates very nicely to languages that have a concept of promises and exceptions:

try {
  const file = await dialog.open(parent, ...);
  
  ...
} catch (e) {
  ...
};

To learn more about the new APIs, you can look at their online docs: GtkColorDialog, GtkFontDialog, GtkFileDialog, GtkAlertDialog.

With these replacements in place, we could deprecate the Chooser interfaces, their widget implementations, and their base class GtkDialog.

No need to panic

Deprecations in GTK are an early preview of changes that will appear in the next major release, which will break API compatibility. But the eventual GTK 5 release is still far away. We have not even made a plan for it yet.

There is absolutely no need to rush towards ‘deprecation cleanup’. You only need to remove all uses of deprecations when you want to port to GTK 5 – which does not exist yet.

There are still things you can do, though. We are introducing deprecations in 4.10 as a way to give our users time to adapt, and to provide feedback on our ideas. If you want to do so, you can file an issue in gitlab, start a discussion in discourse, or find us on matrix.

In the meantime…

Deprecation warnings can be annoying, but thankfully there are easy ways to turn them off. For the occasional call to a deprecated function, it is best to just wrap it in G_GNUC_BEGIN/END_IGNORE_DEPRECATIONS:

G_GNUC_BEGIN_IGNORE_DEPRECATIONS
gtk_dialog_add_button (dialog, "Apply", GTK_RESPONSE_APPLY);
G_GNUC_END_IGNORE_DEPRECATIONS

If you are sure that you never ever want to see any deprecation warnings, you can also just pass -Wno-deprecated-declarations to gcc.

October 28, 2022

#67 File Descriptors and Scopes

Update on what happened across the GNOME project in the week from October 21 to October 28.

Core Apps and Libraries

GLib

The low-level core library that forms the basis for projects such as GTK and GNOME.

Philip Withnall announces

Simon McVittie has just added a g_autofd attribute to GLib, which you can use to automatically close FDs when exiting a scope, just like g_autofree and g_autoptr() (https://gitlab.gnome.org/GNOME/glib/-/merge_requests/3007)
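
A minimal sketch of the intended usage, written before the release: I am assuming the attribute is pulled in via glib/gstdio.h next to g_open(), so check the linked merge request and the final documentation for the exact header and the GLib version it first ships in.

#include <glib.h>
#include <glib/gstdio.h>   /* assumed home of g_autofd; see the MR linked above */
#include <fcntl.h>
#include <unistd.h>

static gboolean
read_magic (const char *path, char *buf, gsize len)
{
  g_autofd int fd = g_open (path, O_RDONLY | O_CLOEXEC, 0);

  if (fd < 0)
    return FALSE;

  /* fd is closed automatically on every return path once it leaves scope */
  return read (fd, buf, len) == (gssize) len;
}

As with g_autofree and g_autoptr(), this only applies to automatic (stack) variables; longer-lived descriptors still need explicit management.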

Third Party Projects

Tagger

An easy-to-use music tag (metadata) editor.

Nick says

Tagger is now at V2022.10.5! This release added support for new audio file types. Here’s the changelog:

  • Added support for oga files
  • Added support for m4a files

Girens for Plex

Girens is a Plex Gtk client for playing movies, TV shows and music from your Plex library.

tijder says

Girens 2.0.0 is released. This is the biggest update of Girens since the first release. In this release the following things are done:

  • Migrated from GTK 3 to GTK 4
  • Migrated from Libhandy to Libadwaita
  • Migrated to Blueprint for ui files
  • Improved lists for large libraries (thanks to the new Gtk4 lists)
  • Redesigned the album/artist view
  • Redesigned the show view
  • Added French and Norwegian translations
  • Improved the windowed view
  • A lot of bug fixes

Also this week I added support for Weblate translation. If someone wants to help with translating, it is now a lot easier.

Login Manager Settings

A settings app for the login manager GDM.

Mazhar Hussain reports

Login Manager Settings v2.beta.0 has been released.

New Features

  • Power settings
  • Import/export to file
  • Enlarge welcome message shown on login

Bug Fixes

  • “Apply Current Display Settings” feature not working on Ubuntu and similar systems
  • Some text showing up untranslated even if translation existed (fix by Sabri Ünal)
  • Some typos (fix by Kian-Meng Ang)

Other Changes

  • The app is now adaptive

    • Main window is adaptive
    • Pop-up dialogs are adaptive
  • App uses the new “About” window

  • Terminal output is colored now

For full release notes visit https://github.com/realmazharhussain/gdm-settings/releases/tag/v2.beta.0.

Flare

An unofficial Signal GTK client.

schmiddi says

Flare 0.5.3 was released. Compared to the last TWIG update, Flare gained minor new features such as pasting attachments into the input box, opening attachments in the default program, and notifications for incoming calls. In 0.5.3 an additional critical bug has been fixed that rendered the application useless since 26 October 2022, because Signal updated their certificates. I urge everyone to update so that Flare works again. I am sorry for any inconvenience caused by this incident, and I hope we can learn from it so that it does not happen again.

That’s all for this week!

See you next week, and be sure to stop by #thisweek:gnome.org with updates on your own projects!

October 27, 2022

Kernel 6.1-rc# might break backlight control on old/weird laptops, please test

I have landed a large(ish) refactor of the ACPI/x86 backlight detection code in the kernel for 6.1. I have been very careful to try not to break things, but there is a special group of laptops where the ability to control the backlight brightness may disappear because of this.

The laptops most likely to be hit by this are ones which are either pretty old or weird in some other way (e.g. flashed with coreboot, did not ship with Windows as the factory OS, ...). Note that Chromebooks are affected by this too, but that special category has already been fixed.

You can check if your laptop is affected by this by running "ls /sys/class/backlight". If this shows only one entry, and that entry is named "intel_backlight", "nouveau_bl", "amdgpu_bl0" or "radeon_bl0", then your laptop might be affected.

Note that this is quite normal on modern(ish) laptops. A second check is to boot with "acpi_backlight=video" added to the kernel commandline and then run "ls /sys/class/backlight" again. If you now additionally have an "acpi_video0" entry, then your laptop should work fine with 6.1. If you don't have an "acpi_video0" entry, please first run "cat /proc/cmdline" and check that "acpi_backlight=video" is present there.

If you have e.g. only the "intel_backlight" entry and adding "acpi_backlight=video" does not cause an "acpi_video0" entry to appear then 6.1 will likely break backlight control!

If you have a laptop which is likely affected by this then please run the following commands:

  • ls /sys/class/backlight > ls-backlight.txt

  • sudo dmesg > dmesg.txt

  • sudo dmidecode > dmidecode.txt

  • sudo acpidump -o acpidump.txt

And send me an email about this at [email protected] with the 4 generated .txt files attached. If possible please also give an actual 6.1 kernel a try and see if that indeed breaks things. E.g. for Fedora you can find 6.1 kernel builds here and see here for some install instructions for these Fedora kernel builds.

October 23, 2022

Brave New Trusted Boot World

This document looks at the boot process of general purpose Linux distributions. It covers the status quo and how we envision Linux boot to work in the future with a focus on robustness and simplicity.

This document will assume that the reader has comprehensive familiarity with TPM 2.0 security chips and their capabilities (e.g., PCRs, measurements, SRK), boot loaders, the shim binary, Linux, initrds, UEFI Firmware, PE binaries, and SecureBoot.

Problem Description

Status quo ante of the boot logic on typical Linux distributions:

  • Most popular Linux distributions generate initrds locally, and they are unsigned, thus not protected through SecureBoot (since that would require local SecureBoot key enrollment, which is generally not done), nor TPM PCRs.

  • Boot chain is typically Firmware → shim → grub → Linux kernel → initrd (dracut or similar) → root file system

  • Firmware’s UEFI SecureBoot protects shim, shim’s key management protects grub and kernel. No code signing protects initrd. initrd acquires the key for encrypted root fs from the user (or TPM/FIDO2/PKCS11).

  • shim/grub/kernel is measured into TPM PCR 4, among other stuff

  • EFI TPM event log reports measured data into TPM PCRs, and can be used to reconstruct and validate state of TPM PCRs from the used resources.

  • No userspace components are typically measured, except for what IMA measures

  • New kernels require locally generating new boot loader scripts and generating a new initrd each time. OS updates thus mean fragile generation of multiple resources and copying multiple files into the boot partition.

Problems with the status quo ante:

  • initrd typically unlocks root file system encryption, but is not protected whatsoever, and trivial to attack and modify offline

  • OS updates are brittle: PCR values of grub are very hard to pre-calculate, as grub measures chosen control flow path, not just code images. PCR values vary wildly, and OS provided resources are not measured into separate PCRs. Grub’s PCR measurements might be useful up to a point to reason about the boot after the fact, for the most basic remote attestation purposes, but useless for calculating them ahead of time during the OS build process (which would be desirable to be able to bind secrets to future expected PCR state, for example to bind secrets to an OS in a way that it remain accessible even after that OS is updated).

  • Updates of a boot loader are not robust, require multi-file updates of ESP and boot partition, and regeneration of boot scripts

  • No rollback protection (no way to cryptographically invalidate access to TPM-bound secrets on OS updates)

  • Remote attestation of running software is needlessly complex since initrds are generated locally and thus basically are guaranteed to vary on each system.

  • Locking resources maintained by arbitrary user apps to TPM state (PCRs) is not realistic for general purpose systems, since PCRs will change on every OS update, and there’s no mechanism to re-enroll each such resource before every OS update, and remove the old enrollment after the update.

  • There is no concept to cryptographically invalidate/revoke secrets for an older OS version once updated to a new OS version. An attacker thus can always access the secrets generated on old OSes if they manage to exploit an old version of the OS — even if a newer version already has been deployed.

Goals of the new design:

  • Provide a fully signed execution path from firmware to userspace, no exceptions

  • Provide a fully measured execution path from firmware to userspace, no exceptions

  • Separate out TPM PCRs assignments, by “owner” of measured resources, so that resources can be bound to them in a fine-grained fashion.

  • Allow easy pre-calculation of expected PCR values based on booted kernel/initrd, configuration, local identity of the system

  • Rollback protection

  • Simple & robust updates: one updated file per concept

  • Updates without requiring re-enrollment/local preparation of the TPM-protected resources (no more “brittle” PCR hashes that must be propagated into every TPM-protected resource on each OS update)

  • System ready for easy remote attestation, to prove validity of booted OS, configuration and local identity

  • Ability to bind secrets to specific phases of the boot, e.g. the root fs encryption key should be retrievable from the TPM only in the initrd, but not after the host transitioned into the root fs.

  • Reasonably secure, automatic, unattended unlocking of disk encryption secrets should be possible.

  • “Democratize” use of PCR policies by defining PCR register meanings, and making binding to them robust against updates, so that external projects can safely and securely bind their own data to them (or use them for remote attestation) without risking breakage whenever the OS is updated.

  • Build around TPM 2.0 (with graceful fallback for TPM-less systems if desired, but TPM 1.2 support is out of scope)

Considered attack scenarios and considerations:

  • Evil Maid: neither online nor offline (i.e. “at rest”), physical access to a storage device should enable an attacker to read the user’s plaintext data on disk (confidentiality); neither online nor offline, physical access to a storage device should allow undetected modification/backdooring of user data or OS (integrity), or exfiltration of secrets.

  • TPMs are assumed to be reasonably “secure”, i.e. can securely store/encrypt secrets. Communication to TPM is not “secure” though and must be protected on the wire.

  • Similarly, the CPU is assumed to be reasonably “secure”

  • SecureBoot is assumed to be reasonably “secure” to permit validated boot up to and including shim+boot loader+kernel (but see discussion below)

  • All user data must be encrypted and authenticated. All vendor and administrator data must be authenticated.

  • It is assumed all software involved regularly contains vulnerabilities and requires frequent updates to address them, plus regular revocation of old versions.

  • It is further assumed that key material used for signing code by the OS vendor can reasonably be kept secure (via use of HSM, and similar, where secret key information never leaves the signing hardware) and does not require frequent roll-over.

Proposed Construction

Central to the proposed design is the concept of a Unified Kernel Image (UKI). These UKIs are the combination of a Linux kernel image, an initrd, and a UEFI boot stub program (and further resources, see below) into one single UEFI PE file that can either be directly invoked by the UEFI firmware (which is useful in particular in some cloud/Confidential Computing environments) or through a boot loader (which is generally useful to implement support for multiple kernel versions, with interactive or automatic selection of image to boot into, potentially with automatic fallback management to increase robustness).

UKI Components

Specifically, UKIs typically consist of the following resources:

  1. An UEFI boot stub that is a small piece of code still running in UEFI mode and that transitions into the Linux kernel included in the UKI (e.g., as implemented in sd-stub, see below)

  2. The Linux kernel to boot in the .linux PE section

  3. The initrd that the kernel shall unpack and invoke in the .initrd PE section

  4. A kernel command line string, in the .cmdline PE section

  5. Optionally, information describing the OS this kernel is intended for, in the .osrel PE section (derived from /etc/os-release of the booted OS). This is useful for presentation of the UKI in the boot loader menu, and ordering it against other entries, using the included version information.

  6. Optionally, information describing kernel release information (i.e. uname -r output) in the .uname PE section. This is also useful for presentation of the UKI in the boot loader menu, and ordering it against other entries.

  7. Optionally, a boot splash to bring to screen before transitioning into the Linux kernel in the .splash PE section

  8. Optionally, a compiled Devicetree database file, for systems which need it, in the .dtb PE section

  9. Optionally, the public key in PEM format that matches the signatures of the .pcrsig PE section (see below), in a .pcrpkey PE section.

  10. Optionally, a JSON file encoding expected PCR 11 hash values seen from userspace once the UKI has booted up, along with signatures of these expected PCR 11 hash values, matching a specific public key in the .pcrsig PE section. (Note: we use plural for “values” and “signatures” here, as this JSON file will typically carry a separate value and signature for each PCR bank for PCR 11, i.e. one pair of value and signature for the SHA1 bank, and another pair for the SHA256 bank, and so on. This ensures when enrolling or unlocking a TPM-bound secret we’ll always have a signature around matching the banks available locally (after all, which banks the local hardware supports is up to the hardware). For the sake of simplifying this already overly complex topic, we’ll pretend in the rest of the text there was only one PCR signature per UKI we have to care about, even if this is not actually the case.)

Given UKIs are regular UEFI PE files, they can thus be signed as one for SecureBoot, protecting all of the individual resources listed above at once, and their combination. Standard Linux tools such as sbsigntool and pesign can be used to sign UKI files.

UKIs wrap all of the above data in a single file, hence all of the above components can be updated in one go through single file atomic updates, which is useful given that the primary expected storage place for these UKIs is the UEFI System Partition (ESP), which is a vFAT file system, with its limited data safety guarantees.

UKIs can be generated via a single, relatively simple objcopy invocation, that glues the listed components together, generating one PE binary that then can be signed for SecureBoot. (For details on building these, see below.)

Note that the primary location to place UKIs in is the EFI System Partition (or an otherwise firmware accessible file system). This typically means a VFAT file system of some form. Hence an effective UKI size limit of 4GiB is in place, as that’s the largest file size a FAT32 file system supports.

Basic UEFI Stub Execution Flow

The mentioned UEFI stub program will execute the following operations in UEFI mode before transitioning into the Linux kernel that is included in its .linux PE section:

  1. The PE sections listed are searched for in the invoked UKI the stub is part of, and superficially validated (i.e. general file format is in order).

  2. All PE sections listed above of the invoked UKI are measured into TPM PCR 11. This TPM PCR is expected to be all zeroes before the UKI initializes. Pre-calculation is thus very straight-forward if the resources included in the PE image are known. (Note: as a single exception the .pcrsig PE section is excluded from this measurement, as it is supposed to carry the expected result of the measurement, and thus cannot also be input to it, see below for further details about this section.)

  3. If the .splash PE section is included in the UKI it is brought onto the screen

  4. If the .dtb PE section is included in the UKI it is activated using the Devicetree UEFI “fix-up” protocol

  5. If a command line was passed from the boot loader to the UKI executable it is discarded if SecureBoot is enabled and the command line from the .cmdline used. If SecureBoot is disabled and a command line was passed it is used in place of the one from .cmdline. Either way the used command line is measured into TPM PCR 12. (This of course removes any flexibility of control of the kernel command line of the local user. In many scenarios this is probably considered beneficial, but in others it is not, and some flexibility might be desired. Thus, this concept probably needs to be extended sooner or later, to allow more flexible kernel command line policies to be enforced via definitions embedded into the UKI. For example: allowing definition of multiple kernel command lines the user/boot menu can select one from; allowing additional allowlisted parameters to be specified; or even optionally allowing any verification of the kernel command line to be turned off even in SecureBoot mode. It would then be up to the builder of the UKI to decide on the policy of the kernel command line.)

  6. It will set a couple of volatile EFI variables to inform userspace about executed TPM PCR measurements (and which PCR registers were used), and other execution properties. (For example: the EFI variable StubPcrKernelImage in the 4a67b082-0a4c-41cf-b6c7-440b29bb8c4f vendor namespace indicates the PCR register used for the UKI measurement, i.e. the value “11”).

  7. An initrd cpio archive is dynamically synthesized from the .pcrsig and .pcrpkey PE section data (this is later passed to the invoked Linux kernel as additional initrd, to be overlaid with the main initrd from the .initrd section). These files are later available in the /.extra/ directory in the initrd context.

  8. The Linux kernel from the .linux PE section is invoked with a combined initrd that is composed from the blob from the .initrd PE section, the dynamically generated initrd containing the .pcrsig and .pcrpkey PE sections, and possibly some additional components like sysexts or syscfgs.

TPM PCR Assignments

In the construction above we take possession of two PCR registers previously unused on generic Linux distributions:

  • TPM PCR 11 shall contain measurements of all components of the UKI (with exception of the .pcrsig PE section, see above). This PCR will also contain measurements of the boot phase once userspace takes over (see below).

  • TPM PCR 12 shall contain measurements of the used kernel command line. (Plus potentially other forms of parameterization/configuration passed into the UKI, not discussed in this document)

On top of that we intend to define two more PCR registers like this:

  • TPM PCR 15 shall contain measurements of the volume encryption key of the root file system of the OS.

  • [TPM PCR 13 shall contain measurements of additional extension images for the initrd, to enable a modularized initrd – not covered by this document]

(See the Linux TPM PCR Registry for an overview how these four PCRs fit into the list of Linux PCR assignments.)

For all four PCRs the assumption is that they are zero before the UKI initializes, and only the data that the UKI and the OS measure into them is included. This makes pre-calculating them straightforward: given a specific set of UKI components, it is immediately clear what PCR values can be expected in PCR 11 once the UKI booted up. Given a kernel command line (and other parameterization/configuration) it is clear what PCR values are expected in PCR 12.

Note that these four PCRs are defined by the conceptual “owner” of the resources measured into them. PCR 11 only contains resources the OS vendor controls. Thus it is straight-forward for the OS vendor to pre-calculate and then cryptographically sign the expected values for PCR 11. The PCR 11 values will be identical on all systems that run the same version of the UKI. PCR 12 only contains resources the administrator controls, thus the administrator can pre-calculate PCR values, and they will be correct on all instances of the OS that use the same parameters/configuration. PCR 15 only contains resources inherently local to the local system, i.e. the cryptographic key material that encrypts the root file system of the OS.

Separating out these three roles does not imply these actually need to be separate when used. However the assumption is that in many popular environments these three roles should be separate.

By separating out these PCRs by the owner’s role, it becomes straightforward to remotely attest, individually, on the software that runs on a node (PCR 11), the configuration it uses (PCR 12) or the identity of the system (PCR 15). Moreover, it becomes straightforward to robustly and securely encrypt data so that it can only be unlocked on a specific set of systems that share the same OS, or the same configuration, or have a specific identity – or a combination thereof.

Note that the mentioned PCRs are so far not typically used on generic Linux-based operating systems, to our knowledge. Windows uses them, but given that Windows and Linux should typically not be included in the same boot process this should be unproblematic, as Windows’ use of these PCRs should thus not conflict with ours.

To summarize:

  • PCR 11: Measurement of UKI components and boot phases. Owner: OS Vendor. Expected value before UKI boot: Zero. Pre-calculable: Yes (at UKI build time).

  • PCR 12: Measurement of the kernel command line and additional kernel runtime configuration such as systemd credentials and systemd syscfg images. Owner: Administrator. Expected value before UKI boot: Zero. Pre-calculable: Yes (when the system configuration is assembled).

  • PCR 13: System Extension Images of the initrd (and possibly more). Owner: (Administrator). Expected value before UKI boot: Zero. Pre-calculable: Yes (when the set of extensions is assembled).

  • PCR 15: Measurement of the root file system volume key (possibly later more: measurement of root file system UUIDs and labels and of the machine ID /etc/machine-id). Owner: Local System. Expected value before UKI boot: Zero. Pre-calculable: Yes (after first boot, once all such IDs are determined).

Signature Keys

In the model above in particular two sets of private/public key pairs are relevant:

  • The SecureBoot key to sign the UKI PE executable with. This controls permissible choices of OS/kernel

  • The key to sign the expected PCR 11 values with. Signatures made with this key will end up in the .pcrsig PE section. The public key part will end up in the .pcrpkey PE section.

Typically the key pair for the PCR 11 signatures should be chosen with a narrow focus, reused for exactly one specific OS (e.g. “Fedora Desktop Edition”) and the series of UKIs that belong to it (all the way through all the versions of the OS). The SecureBoot signature key can be used with a broader focus, if desired. By keeping the PCR 11 signature key narrow in focus one can ensure that secrets bound to the signature key can only be unlocked on the narrow set of UKIs desired.

TPM Policy Use

Depending on the intended access policy to a resource protected by the TPM, one or more of the PCRs described above should be selected to bind TPM policy to.

For example, the root file system encryption key should likely be bound to TPM PCR 11, so that it can only be unlocked if a specific set of UKIs is booted (it should then, once acquired, be measured into PCR 15, as discussed above, so that later TPM objects can be bound to it, further down the chain). With the model described above this is reasonably straight-forward to do:

  • When userspace wants to bind disk encryption to a specific series of UKIs (“enrollment”), it looks for the public key passed to the initrd in the /.extra/ directory (which as discussed above originates in the .pcrpkey PE section of the UKI). The relevant userspace component (e.g. systemd) is then responsible for generating a random key to be used as symmetric encryption key for the storage volume (let’s call it the disk encryption key here, DEK). The TPM is then used to encrypt (“seal”) the DEK with its internal Storage Root Key (TPM SRK). A TPM2 policy is bound to the encrypted DEK. The policy enforces that the DEK may only be decrypted if a valid signature is provided that matches the state of PCR 11 and the public key provided in the /.extra/ directory of the initrd. The plaintext DEK is passed to the kernel to implement disk encryption (e.g. LUKS/dm-crypt). (Alternatively, hardware disk encryption can be used too, i.e. Intel MKTME, AMD SME or even OPAL, all of which are outside of the scope of this document.) The TPM-encrypted version of the DEK which the TPM returned is written to the encrypted volume’s superblock.

  • When userspace wants to unlock disk encryption on a specific UKI, it looks for the signature data passed to the initrd in the /.extra/ directory (which as discussed above originates in the .pcrsig PE section of the UKI). It then reads the encrypted version of the DEK from the superblock of the encrypted volume. The signature and the encrypted DEK are then passed to the TPM. The TPM then checks if the current PCR 11 state matches the supplied signature from the .pcrsig section and the public key used during enrollment. If all checks out it decrypts (“unseals”) the DEK and passes it back to the OS, where it is then passed to the kernel which implements the symmetric part of disk encryption.

Note that in this scheme the encrypted volume’s DEK is not bound to specific literal PCR hash values, but to a public key which is expected to sign PCR hash values.

Also note that the state of PCR 11 only matters during unlocking. It is not used or checked when enrolling.

In this scenario:

  • Input to the TPM part of the enrollment process are the TPM’s internal SRK, the plaintext DEK provided by the OS, and the public key later used for signing expected PCR values, also provided by the OS. – Output is the encrypted (“sealed”) DEK.

  • Input to the TPM part of the unlocking process are the TPM’s internal SRK, the current TPM PCR 11 values, the public key used during enrollment, a signature that matches both these PCR values and the public key, and the encrypted DEK. – Output is the plaintext (“unsealed”) DEK.

Note that sealing/unsealing is done entirely on the TPM chip, the host OS just provides the inputs (well, only the inputs that the TPM chip doesn’t know already on its own), and receives the outputs. With the exception of the plaintext DEK, none of the inputs/outputs are sensitive, and can safely be stored in the open. On the wire the plaintext DEK is protected via TPM parameter encryption (not discussed in detail here because though important not in scope for this document).

TPM PCR 11 is the most important of the mentioned PCRs, and its use is thus explained in detail here. The other mentioned PCRs can be used in similar ways, but signatures/public keys must be provided via other means.

This scheme builds on the LUKS2 functionality Linux provides, i.e. key management supporting multiple slots, and the ability to embed arbitrary metadata in the encrypted volume’s superblock. Note that this means the TPM2-based logic explained here doesn’t have to be the only way to unlock an encrypted volume. For example, in many setups it is wise to enroll both this TPM-based mechanism and an additional “recovery key” (i.e. a high-entropy computer generated passphrase the user can provide manually in case they lose access to the TPM and need to access their data), of which either can be used to unlock the volume.

Boot Phases

Secrets needed during boot-up (such as the root file system encryption key) should typically no longer be accessible afterwards, to protect them from access if a system is attacked during runtime. To implement this the scheme above is extended in one way: at certain milestones of the boot process additional fixed “words” should be measured into PCR 11. These milestones are placed at conceptual security boundaries, i.e. whenever code transitions from a higher-privileged context to a less privileged one.

Specifically:

  • When the initrd initializes (“initrd-enter”)

  • When the initrd transitions into the root file system (“initrd-leave”)

  • When the early boot phase of the OS on the root file system has completed, i.e. all storage and file systems have been set up and mounted, immediately before regular services are started (“sysinit”)

  • When the OS on the root file system has completed the boot process far enough to allow unprivileged users to log in (“complete”)

  • When the OS begins shut down (“shutdown”)

  • When the service manager is mostly finished with shutting down and is about to pass control to the final phase of the shutdown logic (“final”)

By measuring these additional words into PCR 11 the distinct phases of the boot process can be distinguished in a relatively straightforward fashion and the expected PCR values in each phase can be determined.

The phases are measured into PCR 11 (as opposed to some other PCR) mostly because available PCRs are scarce, and because the boot phases defined here are typically specific to a chosen OS and hence fit well with the other data measured into PCR 11: the UKI, which is also specific to the OS. The OS vendor both generates the UKI and defines the boot phases, and thus can safely and reliably pre-calculate/sign the expected PCR values for each phase of the boot.
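
For illustration, here is a toy Rust sketch (assuming the third-party sha2 crate) of how the expected PCR 11 value changes as phase words are measured on top of the UKI measurement, using the extend rule described in the Glossary at the end of this document: hash the data, then hash the concatenation of the previous PCR value and that digest. Real measurements are performed by the firmware, the stub and the kernel’s TPM driver; this merely mimics the pre-calculation of expected values, and the phase strings follow the names used in this document rather than any particular implementation.

// Toy model of the PCR extend rule: new_pcr = SHA256(old_pcr || SHA256(data)).
// Assumes the sha2 crate; only pre-calculates expected values, it does not
// talk to a TPM.
use sha2::{Digest, Sha256};

fn extend(pcr: [u8; 32], data: &[u8]) -> [u8; 32] {
    let digest = Sha256::digest(data);
    let mut hasher = Sha256::new();
    hasher.update(pcr);
    hasher.update(digest);
    hasher.finalize().into()
}

fn main() {
    // PCR 11 starts at zero; the UKI sections are measured first (elided),
    // then each boot phase word is measured on top.
    let mut pcr11 = [0u8; 32];
    pcr11 = extend(pcr11, b"...UKI sections, as measured by the stub...");
    for phase in ["initrd-enter", "initrd-leave", "sysinit", "complete"] {
        pcr11 = extend(pcr11, phase.as_bytes());
        let hex: String = pcr11.iter().map(|b| format!("{b:02x}")).collect();
        println!("expected PCR 11 after {phase}: {hex}");
    }
}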

Revocation/Rollback Protection

In order to secure secrets stored at rest, in particular in environments where unattended decryption shall be possible, it is essential that an attacker cannot use old, known-buggy – but properly signed – versions of software to access them.

Specifically, if disk encryption is bound to an OS vendor (via UKIs that include expected PCR values, signed by the vendor’s public key) there must be a mechanism to lock out old versions of the OS or UKI from accessing TPM-based secrets once it is determined that the old version is vulnerable.

To implement this we propose making use of one of the “counters” TPM 2.0 devices provide: integer registers that are persistent in the TPM and can only be increased on request of the OS, but never decreased. When sealing resources to the TPM, a policy may be declared to the TPM that restricts how the resources can later be unlocked: here we use one that requires that, in addition to the expected PCR values (as discussed above), a counter integer range is provided to the TPM chip, together with a suitable signature covering both that matches the public key provided during sealing. The sealing/unsealing mechanism described above is thus extended: the signature passed to the TPM during unsealing now covers both the expected PCR values and the expected counter range. To be able to use a signature associated with a UKI provided by the vendor to unseal a resource, the counter thus must be increased at least to the lower end of the range the signature is for. Doing so removes the ability to unseal the resource with signatures associated with older versions of the UKI, because the upper end of their range disables access once the counter has been increased far enough. By carefully choosing the upper and lower end of the counter range whenever the PCR values for a UKI shall be signed it is thus possible to ensure that updates can invalidate prior versions’ access to resources. By placing some space between the upper and lower end of the range it is possible to allow a controlled level of fallback UKI support, with clearly defined milestones after which fallback to older versions of a UKI is no longer permitted.

Example: a hypothetical distribution FooOS releases a regular stream of UKI kernels 5.1, 5.2, 5.3, … It signs the expected PCR values for these kernels with a key pair it maintains in an HSM. When signing UKI 5.1 it includes information directed at the TPM in the signed data declaring that the TPM counter must be above 100, and below 120, in order for the signature to be used. Thus, when the UKI is booted up and used for unlocking an encrypted volume the unlocking code must first increase the counter to 100 if needed, as the TPM will otherwise refuse to unlock the volume. The next release, UKI 5.2, is a feature release, i.e. reverting back to the old kernel locally is acceptable. It thus does not increase the lower bound, but it increases the upper bound for the counter in the signature payload, thus encoding a valid range 100…121 in the signed payload. Now a major security vulnerability is discovered in UKI 5.1. A new UKI 5.3 is prepared that fixes this issue. It is now essential that UKI 5.1 can no longer be used to unlock the TPM secrets. Thus UKI 5.3 will bump the lower bound to 121, and increase the upper bound by one, thus allowing a range 121…122. Or in other words: for each new UKI release the signed data shall include a counter range declaration where the upper bound is increased by one. The lower bound is left as-is between releases, except when an old version shall be cut off, in which case it is bumped to one above the upper bound used in that release.
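
The counter arithmetic of this example can be spelled out in a few lines. The sketch below is purely illustrative (the actual range check is part of the signed policy and enforced by the TPM itself) and treats both bounds as inclusive for simplicity.

// Toy model of the rollback counter logic from the FooOS example above; the
// real check is enforced by the TPM as part of the signed policy.

struct SignedUki {
    version: &'static str,
    counter_range: (u64, u64), // lower and upper bound, treated as inclusive
}

fn can_unseal(uki: &SignedUki, counter: u64) -> bool {
    counter >= uki.counter_range.0 && counter <= uki.counter_range.1
}

fn main() {
    let releases = [
        SignedUki { version: "5.1", counter_range: (100, 120) },
        SignedUki { version: "5.2", counter_range: (100, 121) }, // feature release
        SignedUki { version: "5.3", counter_range: (121, 122) }, // cuts off 5.1
    ];

    // After booting 5.3 the unlock code bumps the counter to 121, so 5.1 can
    // no longer unseal while 5.2 and 5.3 still can.
    let counter = 121;
    for uki in &releases {
        println!("UKI {} can unseal: {}", uki.version, can_unseal(uki, counter));
    }
}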

UKI Generation

As mentioned earlier, UKIs are the combination of various resources into one PE file. For most of these individual components there are pre-existing tools to generate them. For example, the included kernel image can be generated with the usual Linux kernel build system. The initrd included in the UKI can be generated with existing tools such as dracut and similar. Once the basic components (.linux, .initrd, .cmdline, .splash, .dtb, .osrel, .uname) have been acquired the combination process works roughly like this:

  1. The expected PCR 11 hashes (and signatures for them) for the UKI are calculated, for all selected TPM2 banks. The tool for that takes all basic UKI components and a signing key as input, and generates a JSON object as output that includes both the literal expected PCR hash values and a signature for them.

  2. The EFI stub binary is now combined with the basic components, the generated JSON PCR signature object from the first step (in the .pcrsig section) and the public key for it (in the .pcrpkey section). This is done via a simple “objcopy” invocation resulting in a single UKI PE binary.

  3. The resulting EFI PE binary is then signed for SecureBoot (via a tool such as sbsign or similar).
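
As a rough outline of steps 2 and 3, the sketch below shells out to objcopy and sbsign from Rust. It omits the section placement (VMA assignment) that real UKI builders perform and all file names are hypothetical, so treat it as an illustration of the idea rather than a working build script.

use std::process::Command;

// Rough outline of steps 2 and 3 above; section offsets/VMAs are omitted and
// all file names are placeholders.
fn build_uki() -> std::io::Result<()> {
    // Step 2: glue the components onto the EFI stub as named PE sections.
    Command::new("objcopy")
        .args(["--add-section", ".osrel=os-release"])
        .args(["--add-section", ".cmdline=cmdline.txt"])
        .args(["--add-section", ".linux=vmlinuz"])
        .args(["--add-section", ".initrd=initrd.cpio"])
        .args(["--add-section", ".pcrsig=pcr-signature.json"])
        .args(["--add-section", ".pcrpkey=pcr-public-key.pem"])
        .args(["linuxx64.efi.stub", "foo.unsigned.efi"])
        .status()?;

    // Step 3: SecureBoot-sign the resulting PE binary.
    Command::new("sbsign")
        .args(["--key", "db.key", "--cert", "db.crt"])
        .args(["--output", "foo.efi", "foo.unsigned.efi"])
        .status()?;

    Ok(())
}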

Note that the UKI model implies pre-built initrds. How to generate these (and securely extend and parameterize them) is outside of the scope of this document, but a related document will be provided highlighting these concepts.

Protection Coverage of SecureBoot Signing and PCRs

The scheme discussed here touches both SecureBoot code signing and TPM PCR measurements. These two distinct mechanisms cover separate parts of the boot process.

Specifically:

  • Firmware/Shim SecureBoot signing covers bootloader and UKI

  • TPM PCR 11 covers the UKI components and boot phase

  • TPM PCR 12 covers admin configuration

  • TPM PCR 15 covers the local identity of the host

Note that this means SecureBoot coverage ends once the system transitions from the initrd into the root file system. It is assumed that trust and integrity have been established before this transition by some means, for example LUKS/dm-crypt/dm-integrity, ideally bound to PCR 11 (i.e. UKI and boot phase).

A robust and secure update scheme for PCR 11 (i.e. UKI) has been described above, which allows binding TPM-locked resources to a UKI. For PCR 12 no such scheme is currently designed, but might be added later (use case: permit access to certain secrets only if the system runs with configuration signed by a specific set of keys). Given that resources measured into PCR 15 typically aren’t updated (or if they are updated loss of access to other resources linked to them is desired) no update scheme should be necessary for it.

This document focuses on the three PCRs discussed above. Disk encryption and other userspace may choose to also bind to other PCRs. However, doing so reintroduces the PCR brittleness issue this design is supposed to remove. The PCRs defined by the various UEFI/TPM firmware specifications generally have no concept of signatures over expected PCR values.

It is known that the industry-adopted SecureBoot signing keys are too broad to act as more than a denylist for known bad code. It is thus probably a good idea to enroll vendor SecureBoot keys wherever possible (e.g. in environments where the hardware is very well known, and VM environments), to raise the bar on preparing rogue UKI-like PE binaries that will result in PCR values that match expectations but actually contain bad code. Discussion about that is however outside of the scope of this document.

Whole OS embedded in the UKI

The above is written under the assumption that the UKI embeds an initrd whose job it is to set up the root file system: find it, validate it, cryptographically unlock it and similar. Once the root file system is found, the system transitions into it.

While this is the traditional design and likely what most systems will use, it is also possible to embed a regular root file system into the UKI and avoid any transition to an on-disk root file system. In this mode the whole OS would be encapsulated in the UKI, and signed/measured as one. In such a scenario the whole of the OS must be loaded into RAM and remain there, which typically restricts the general usability of such an approach. However, for specific purposes this might be the design of choice, for example to implement self-sufficient recovery or provisioning systems.

Proposed Implementations & Current Status

The toolset for most of the above is already implemented in systemd and related projects in one way or another. Specifically:

  1. The systemd-stub (or short: sd-stub) component implements the discussed UEFI stub program

  2. The systemd-measure tool can be used to pre-calculate expected PCR 11 values given the UKI components and can sign the result, as discussed in the UKI Generation section above.

  3. The systemd-cryptenroll and systemd-cryptsetup tools can be used to bind a LUKS2 encrypted file system volume to a TPM and PCR 11 public key/signatures, according to the scheme described above. (The two components also implement a “recovery key” concept, as discussed above)

  4. The systemd-pcrphase component measures specific words into PCR 11 at the discussed phases of the boot process.

  5. The systemd-creds tool may be used to encrypt/decrypt data objects called “credentials” that can be passed into services and booted systems, and are automatically decrypted (if needed) immediately before service invocation. Encryption is typically bound to the local TPM, to ensure the data cannot be recovered elsewhere.

Note that systemd-stub (i.e. the UEFI code glued into the UKI) is distinct from systemd-boot (i.e. the UEFI boot loader that can manage multiple UKIs and other boot menu items and implements automatic fallback, an interactive menu and a programmatic interface for the OS, among other things). One can be used without the other – both sd-stub without sd-boot and vice versa – though they integrate nicely if used in combination.

Note that the mechanisms described are relatively generic and can be implemented and consumed in other software too; systemd should be considered a reference implementation, though one that has found comprehensive adoption across Linux distributions.

Some concepts discussed above are currently not implemented. Specifically:

  1. The rollback protection logic is currently not implemented.

  2. The mentioned measurement of the root file system volume key to PCR 15 is implemented, but not merged into the systemd main branch yet.

The UAPI Group

We recently started a new group for discussing concepts and specifications of basic OS components, including UKIs as described above. It's called the UAPI Group. Please have a look at the various documents and specifications already available there, and expect more to come. Contributions welcome!

Glossary

TPM

Trusted Platform Module; a security chip found in many modern systems, both physical systems and increasingly also in virtualized environments. Traditionally a discrete chip on the mainboard but today often implemented in firmware, and lately directly in the CPU SoC.

PCR

Platform Configuration Register; a set of registers on a TPM that are initialized to zero at boot. The firmware and OS can “extend” these registers with hashes of data used during the boot process and afterwards. “Extension” means the supplied data is first cryptographically hashed. The resulting hash value is then combined with the previous value of the PCR and the combination hashed again. The result will become the new value of the PCR. By doing this iteratively for all parts of the boot process (always with the data that will be used next during the boot process) a concept of “Measured Boot” can be implemented: as long as every element in the boot chain measures (i.e. extends into the PCR) the next part of the boot like this, the resulting PCR values will prove cryptographically that only a certain set of boot components can have been used to boot up. A standards-compliant TPM usually has 24 PCRs, but more than half of those are already assigned specific meanings by the firmware. Some of the others may be used by the OS, of which we use four in the concepts discussed in this document.

Measurement

The act of “extending” a PCR with some data object.

SRK

Storage Root Key; a special cryptographic key generated by a TPM that never leaves the TPM, and can be used to encrypt/decrypt data passed to the TPM.

UKI

Unified Kernel Image; the concept this document is about. A combination of kernel, initrd and other resources. See above.

SecureBoot

A mechanism where every software component involved in the boot process is cryptographically signed and checked against a set of public keys stored in the mainboard hardware, implemented in firmware, before it is used.

Measured Boot

A boot process where each component measures (i.e., hashes and extends into a TPM PCR, see above) the next component it will pass control to before doing so. This serves two purposes: it can be used to bind security policy for encrypted secrets to the resulting PCR values (or signatures thereof, see above), and it can be used to reason about used software after the fact, for example for the purpose of remote attestation.

initrd

Short for “initial RAM disk”, which – strictly speaking – is a misnomer today, because no RAM disk is involved anymore, but rather a tmpfs file system instance. Also known as “initramfs”, which is also misleading, given the file system is not ramfs anymore, but tmpfs (both of which are in-memory file systems on Linux, with different semantics). The initrd is passed to the Linux kernel and is basically a file system tree serialized as a cpio archive. The kernel unpacks the image into a tmpfs (i.e., into an in-memory file system), and then executes a binary from it. It thus contains the binaries for the first userspace code the kernel invokes. Typically, the initrd’s job is to find the actual root file system, unlock it (if encrypted), and transition into it.

UEFI

Short for “Unified Extensible Firmware Interface”, it is a widely adopted standard for PC firmware, with native support for SecureBoot and Measured Boot.

EFI

More or less synonymous with UEFI, IRL.

Shim

A boot component originating in the Linux world, which in a way extends the public key database SecureBoot maintains (which is under the control of Microsoft) with a second layer (which is under the control of the Linux distributions and of the owner of the physical device).

PE

Portable Executable; a file format for executable binaries, originally from the Windows world, but also used by UEFI firmware. PE files may contain code and data, organized in labeled “sections”.

ESP

EFI System Partition; a special partition on a storage medium in which the firmware looks for UEFI PE binaries to execute at boot.

HSM

Hardware Security Module; a piece of hardware that can generate and store secret cryptographic keys, and execute operations with them, without the keys leaving the hardware (though this is configurable). TPMs can act as HSMs.

DEK

Disk Encryption Key; a symmetric cryptographic key used for unlocking disk encryption, i.e. passed to LUKS/dm-crypt for activating an encrypted storage volume.

LUKS2

Linux Unified Key Setup Version 2; a specification for a superblock for encrypted volumes widely used on Linux. LUKS2 is the default on-disk format for the cryptsetup suite of tools. It provides flexible key management with multiple independent key slots and allows embedding arbitrary metadata in a JSON format in the superblock.

Thanks

I’d like to thank Alain Gefflaut, Anna Trikalinou, Christian Brauner, Daan de Meyer, Luca Boccassi, Zbigniew Jędrzejewski-Szmek for reviewing this text.

Devhelp

Tobias Bernard recently published this article on Planet GNOME, and it mentions Devhelp. Since I made a fair amount of development contributions to Devhelp more or less recently (compared to the whole GNOME history), I would like to talk a bit more about it.

Devhelp has always been a Local-First app, not requiring an internet connection to work. It wasn't created by me; it existed long before I started developing with GTK, so the original authors and past contributors need to be thanked. The first commits were done by Johan Dahlin in 2001. In comparison, my first commit to Devhelp was in 2015.

Tobias writes: "maybe we could revive/modernize Devhelp?". And as a suggestion at the end of his article: "Revive or replace the Devhelp offline documentation app", among other items. Between 2015 and 2021, I contributed 1024 commits, and there were other contributors. Here is a summary of what was the roadmap, to give an idea.

On the roadmap, see especially the last two items (still in the todo state) which I think are very interesting:

  • Download the latest stable/unstable GNOME API references - Only for the GNOME profile
  • Implement a start page specific for GTK/GNOME - Only for the GNOME profile

The concept of a "profile" for Devhelp doesn't come from me, there was a prototype done by Aleksander Morgado. The idea is to have different book shelfs for Devhelp: one set of books for GLib/GTK/GNOME (possibly with the ability to choose the GNOME version), and the possiblity to create other profiles because Devhelp is a generic API documentation browser and can be used by other software communities.

The done items in the roadmap were basically for two purposes:

  • Reducing the backlog of maintenance tasks, with an eye towards GTK 4.
  • Doing almost all the preparation for a clean and robust solution to implement profiles.

I like to think of the profile system preparation that I did as similar to glvnd (see this post by Christian Schaller and search for "glvnd" and "hacks"), though for Devhelp it's at a much smaller scale, of course.

One more thing, see this commit by Emmanuele Bassi (2021). This didn't (and still doesn't) help my motivation to contribute to Devhelp again.

On a related note, this reminds me of this GTK to Qt migration developer feedback (I really wish things could change!).

I hope my work in Devhelp is or was appreciated, although under-the-hood changes are mostly invisible. Thank you.

Even though it's not possible to write comments on this blog, don't hesitate to drop me an email. I do read them, and I like to receive feedback and exchange ideas.

GAFAM to MAGMA

The GAFAM are evil, and the nice thing about it is that we can call them the MAGMA now (replace the F with M for Meta).

We can also call the MAGMA a form of hyper-capitalism: they are so big that they destroy any kind of competition, by either buying other companies, or creating something better. The "barrier to entry" to compete with them is just way too high.

So it's urgent that at some point the governments act to split these big corporations and stop the magma from rolling in any further.

Making Rust attractive for writing GTK applications

Rust, the programming language, has been gaining traction across many software disciplines - support for it has landed in the upstream Linux kernel, developers have been using it for games, websites, low-level OS components, and desktop applications.

The gtk-rs team has been doing an impressive amount of work during the last few years to make the experience of using GObject-based libraries in Rust enjoyable by providing high-quality, memory-safe bindings around those libraries, generated with gir from the introspection data.

Approximately two years ago and a few months before the release of GTK 4, I decided to take over the maintenance of gtk4-rs and push forward the initial work made by Xiang Fan during a Google Summer of Code internship. Nowadays, these are the most used GTK 4 bindings out there with probably more than 100 applications written in it, ranging from simple applications like Contrast to complex ones like Telegrand or Spot.

In this post, I will talk about the current status and what we have achieved since the first release of gtk4-rs.

Bindings

As mentioned above, a good portion of the bindings is generated automatically using gir. But sometimes, a manual implementation is needed, like in the following cases:

  • Making the code more idiomatic Rust
  • Handling cases that are too specific to be supported by gir, e.g. an x_get_type function being exposed only in a specific version.
  • Writing the necessary infrastructure code to allow developers to create custom GObjects, e.g. custom widgets or GStreamer plugins

Currently, gtk4-rs is composed of ~170 000 lines of code automatically generated and ~26 000 manually written, which is approximately 13% of manual code.

Subclassing

In the early days of gtk3-rs, the infrastructure for writing custom subclasses wasn't fully there yet; in particular, the amount of manual code that had to be written to support all the virtual functions of gtk::Widget (those that can be overridden by a sub-implementation) was significant. That caused people to avoid writing custom GTK widgets and to resort to hacks like having one single ObjectWrapper GObject that would serialize a Rust struct into JSON and store it as a string property, just so these Rust types could be stored in a gio::ListModel and used with gtk::ListBox::bind_model/gtk::FlowBox::bind_model.

Thankfully that is no longer the case for gtk4-rs, as a huge amount of work went into manually implementing the necessary traits to support almost all of the types that can be subclassed, with the exception of gtk::TreeModel (which will be deprecated starting with GTK 4.10); see https://github.com/gtk-rs/gtk4-rs/pull/169 for details on why that hasn't happened yet.

As more people started writing custom GTK widgets/GStreamer plugins, the number of people looking into simplifying the whole experience grew as well.

Creating a very simple and useless custom GTK widget looked like this in the days of gtk-rs-core 0.10:

mod imp {
    pub struct SimpleWidget;

    impl ObjectSubclass for SimpleWidget {
        const NAME: &'static str = "SimpleWidget";
        type ParentType = gtk::Widget;
        type Instance = subclass::simple::InstanceStruct<Self>;
        type Class = subclass::simple::ClassStruct<Self>;

        glib_object_subclass!();

        fn new() -> Self {
            Self
        }
    }

    impl ObjectImpl for SimpleWidget {
        glib_object_impl!();
    }
    impl WidgetImpl for SimpleWidget {}
}
glib::wrapper! {
    pub struct SimpleWidget(ObjectSubclass<imp::SimpleWidget>)
        @extends gtk::Widget;
}

Nowadays it looks like this:

mod imp {
    #[derive(Default)]
    pub struct SimpleWidget;

    #[glib::object_subclass]
    impl ObjectSubclass for SimpleWidget {
        const NAME: &'static str = "SimpleWidget";
        type ParentType = gtk::Widget;
    }

    impl ObjectImpl for SimpleWidget {}
    impl WidgetImpl for SimpleWidget {}
}
glib::wrapper! {
    pub struct SimpleWidget(ObjectSubclass<imp::SimpleWidget>)
        @extends gtk::Widget;
}

Of course, the overall code is still a bit too verbose, particularly when you have to define GObject properties. However, people have been experimenting with writing a derive macro to simplify the properties declaration part, as well as a gobject::class! macro for generating most of the remaining boilerplate code. Note that those macros are still experiments and would need more time to mature before eventually getting merged upstream.

Composite Templates

In short, composite templates allow you to make your custom GtkWidget subclass use GTK XML UI definitions for providing the widget structure, splitting the UI code from its logic. The UI part can be either inlined in the code, or written in a separate file, with optional runtime validation using the xml_validation feature (unless you are using GResources).

mod imp {
    #[derive(Default, gtk::CompositeTemplate)]
    #[template(string = r#"
    <interface>
      <template class="SimpleWidget" parent="GtkWidget">
        <child>
          <object class="GtkLabel" id="label">
            <property name="label">foobar</property>
          </object>
        </child>
      </template>
    </interface>
    "#)]
    pub struct SimpleWidget {
        #[template_child]
        pub label: TemplateChild<gtk::Label>,
    }

    #[glib::object_subclass]
    impl ObjectSubclass for SimpleWidget {
        const NAME: &'static str = "SimpleWidget";
        type ParentType = gtk::Widget;

        fn class_init(klass: &mut Self::Class) {
            klass.bind_template();
        }

        fn instance_init(obj: &gtk::glib::subclass::InitializingObject<Self>) {
            obj.init_template();
        }
    }

    impl ObjectImpl for SimpleWidget {
        fn dispose(&self) {
            // since gtk 4.8 and only if you are using composite templates
            self.obj().dispose_template(Self::Type);
            // before gtk 4.8, for each direct child widget
            // while let Some(child) = self.obj().first_child() {
            //     child.unparent();
            // }
        }
    }
    impl WidgetImpl for SimpleWidget {}
}
glib::wrapper! {
    pub struct SimpleWidget(ObjectSubclass<imp::SimpleWidget>)
        @extends gtk::Widget;
}

Composite templates also allow you to set a function to be called when a specific signal is emitted:

mod imp {
    #[derive(Default, gtk::CompositeTemplate)]
    #[template(string = r#"
    <interface>
      <template class="SimpleWidget" parent="GtkWidget">
        <child>
          <object class="GtkButton" id="button">
            <property name="label">Click me!</property>
            <signal name="clicked" handler="on_clicked" swapped="true" />
          </object>
        </child>
      </template>
    </interface>
    "#)]
    pub struct SimpleWidget {
        #[template_child]
        pub button: TemplateChild<gtk::Button>,
    }

    #[glib::object_subclass]
    impl ObjectSubclass for SimpleWidget {
        const NAME: &'static str = "SimpleWidget";
        type ParentType = gtk::Widget;

        fn class_init(klass: &mut Self::Class) {
            klass.bind_template();
            klass.bind_template_instance_callbacks();
        }

        fn instance_init(obj: &gtk::glib::subclass::InitializingObject<Self>) {
            obj.init_template();
        }
    }

    impl ObjectImpl for SimpleWidget {
        fn dispose(&self) {
            // since gtk 4.8 and only if you are using composite templates
            self.obj().dispose_template(Self::Type);
            // before gtk 4.8, for each direct child widget
            // while let Some(child) = self.obj().first_child() {
            //     child.unparent();
            // }
        }
    }
    impl WidgetImpl for SimpleWidget {}
}
glib::wrapper! {
    pub struct SimpleWidget(ObjectSubclass<imp::SimpleWidget>)
        @extends gtk::Widget;
}

#[gtk::template_callbacks]
impl SimpleWidget {
    #[template_callback]
    fn on_clicked(&self, _button: &gtk::Button) {
        println!("Clicked!");
    }
}

More details can be found in https://gtk-rs.org/gtk4-rs/stable/latest/docs/gtk4_macros/derive.CompositeTemplate.html and https://gtk-rs.org/gtk4-rs/stable/latest/docs/gtk4_macros/attr.template_callbacks.html.

Book, Documentations, Examples

Great bindings by themselves sadly won't help newcomers trying to learn GTK for the first time, despite the increasing number of apps written using gtk4-rs.

For that reason, Julian Hofer has been writing a "GUI development with Rust and GTK 4" book that you can find at https://gtk-rs.org/gtk4-rs/stable/latest/book/.

On top of that, we spend a good amount of time ensuring that the documentation we provide, which is based on the C library documentation, has valid intra-doc links, uses the images provided by the GTK C documentation to represent the various widgets, and that most of the types and functions are properly documented.

The gtk4-rs repository also includes ~35 examples that you can find in https://github.com/gtk-rs/gtk4-rs/tree/master/examples. Finally there's a GTK & Rust application repository template that you can use to get started with your next project: https://gitlab.gnome.org/World/Rust/gtk-rust-template.

Flatpak & Rust SDKs

Beyond the building blocks provided by the bindings, we have also worked on providing stable and nightly Rust SDKs that can be used to either distribute your application as a Flatpak or as a development environment. The stable SDK comes with the Mold linker pre-installed as well, which is recommended for improving build times.

Flatpak can be used as a development environment with either GNOME Builder, VSCode with the flatpak-vscode extension, or fenv, if you prefer the CLI for building/running your application.

Portals

Modern Linux applications should make use of the portals to request runtime permissions to access resources, such as capturing the camera feed, starting a screencast session or picking a file.

ASHPD is currently the way to go for using portals from Rust as it provides a convenient and idiomatic API on top of the DBus one.

Code example from ashpd-0.4.0-alpha.1 for using the color picker in a desktop environment agnostic way:

use ashpd::desktop::screenshot::ColorResponse;

async fn run() -> ashpd::Result<()> {
    let color = ColorResponse::builder().build().await?;
    println!("({}, {}, {})", color.red(), color.green(), color.blue());
    Ok(())
}

Keyring & oo7

Applications that need to store sensitive information, such as passwords, usually use the Secret service, a protocol supported by both gnome-keyring and kwallet nowadays. There were multiple attempts to provide a Rust wrapper for the DBus API but some common pitfalls were that they lacked async support, provided no integration with the secret portal, or they had no way to allow applications to migrate their secrets from the host keyring to the application sandboxed keyring.

Those were the primary reasons we started working on oo7. It is still in alpha stage, but should cover most of the use cases already.

GStreamer

Part of the experiments I did when porting Authenticator to GTK 4 was figuring out how I could replicate the internal GStreamer sink included in GTK to convert a video frame into a gdk::MemoryTexture in order to render it in some widget for QR code scanning purposes.

Jordan Petridis took over my WIP work and turned it into a proper GStreamer plugin written in Rust, see https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/tree/main/video/gtk4 for an example on how to use it in your application.

Missing Integrations

Better Gettext Support

Currently, gettext doesn't officially support Rust, and most importantly, it doesn't like the ! character, which is used by declarative macros in Rust such as println! and format!, so we can't use string formatting in translatable texts.

Kévin Commaille has submitted a patch for upstream gettext but sadly it hasn't been reviewed yet :( For now people are working around this by manually replacing variable names with https://doc.rust-lang.org/std/primitive.str.html#method.replace, which is not ideal, but it is what we have.

Reduced Boilerplate in Subclassing Code

As mentioned above, subclassing code is still too verbose in some cases. Ideally, we would simplify most of it, since it is probably one of the most confusing things you have to deal with as a beginner to the gtk-rs ecosystem.

Even Better Documentation

I personally think our documentation has gotten a lot better in the last couple of releases but there are always things to improve. Here is my wishlist of things that I hope to find the time to work on for the next release:

Feel free to pick any of the above issues if you would like to help.

Automatically-Generated Subclassing Traits

Presently, we have to manually write all the necessary traits needed for making it possible to subclass a type or implement an interface.

Sadly, we can't do much about it today as we would need new gobject-introspection annotations, see e.g. https://gitlab.gnome.org/GNOME/gobject-introspection/-/issues/411.

Fix GNOME CI Template to Avoid Duplicated Builds

Most of the GNOME applications hosted on https://gitlab.gnome.org are using https://gitlab.gnome.org/GNOME/citemplates/ as a CI template to provide a Flatpak job. The template is inefficient when building a Rust application as it removes the Flatpak repository between a regular build and a test build, which means rebuilding all the crates a second time, see https://gitlab.gnome.org/GNOME/citemplates/-/issues/5. flatpak-builder itself already provides an easy way to run the tests of a specific module, which is what flatpak-github-actions uses, but I have yet to find the time to apply the same thing to GNOME's CI template.

Special Thanks

  • Julian Hofer for the gtk4-rs book
  • Christopher Davis, Jason Francis, Paolo Borelli for their work on composite templates macros
  • Sophie Herold for her awesome work implementing the portal backend of oo7
  • Zeeshan for zbus/zvariant which unlocked plenty of use cases
  • wayland-rs developers for making it possible to integrate Wayland clients with ASHPD
  • Every other contributor to the gtk-rs ecosystem
  • Ivan Molodetskikh, Sebastian Dröge and Christopher Davis for reviewing this post

October 22, 2022

Making Visual Studio compilers directly runnable from any shell (yes, even plain cmd.exe)

The Visual Studio compiler toolchain behaves in peculiar ways. One of the weirdest is that you can't run the compiler from just any shell. Instead you have to run the compiler either from a special, blessed shell that comes with VS (the most common are the "x86 native tools shell" and "x64 native tools shell", there are also ARM shells as well as cross compilation shells) or by running a special bat file inside a pristine shell that sets things up. A commonly held misconception is that using the VS compiler only requires setting PATH correctly. That is not true; it requires a bunch of other stuff as well (I'm not sure if all of that is even documented).

To anyone who has used unixy toolchains, this is maddening. The classic Unix approach is to have compiler binaries with unique names, like a hypothetical armhf-linux-gcc-11, that can be run from any shell. Sadly this VS setup has been the status quo for decades now and it is unlikely to change. In fact, some time ago I had a discussion with a person from Microsoft where I told them about this problem and the response I got back was, effectively: "I don't understand what the problem is" followed by "just run the compiles from the correct shell".

So why is this a bad state of things then? There are two major issues. The first one is that you have to remember how every one of your build trees has been set up. If you accidentally run a compilation command using the wrong shell, the outcome is anyone's guess. This is the sort of thing that happens all the time, because human beings are terrible at remembering the states of complicated systems and the specific actions that need to be taken depending on those states (as opposed to computers, which are exceptionally good at those things). The second niggle is that you can't have two different compilers active in the same shell at the same time. So if, for example, you are cross compiling and you need to build and run a tool as part of that compilation (e.g. Protobuf) then you can't do that with the command line VS tools. Dunno if it can be done with solution files either.

Rolling up them sleeves

The best possible solution would be for Microsoft to provide compiler binaries that are standalone and parallel-runnable with unique names like cl-14-x64.exe. This seems unlikely to happen in the near future, so the only remaining option is to create them ourselves. At first this might seem infeasible, but the problem breaks down neatly into two pieces:

  • Introspect all changes that the vsenv setup bat file performs on the system.
  • Generate a simple executable that does the same setup and then invokes cl.exe with the same command line arguments as were given to it.

The code that implements this can be found in this repository. Most of it was swiped from Meson VS autoactivator. When you run the script in a VS dev tools shell (it needs access to VS to compile the exe) you get a cl-x64.exe that you can then use from any shell. Here we use it to compile itself for the second time:
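
For illustration, here is a minimal sketch of the wrapper idea in Rust (not the actual code from the linked repository): the generator bakes the environment captured from the vcvars bat file into the binary, and the wrapper re-applies it before forwarding its command line to cl.exe. The captured environment is shown as a placeholder constant.

// Minimal sketch of the wrapper idea; the environment captured from the VS
// setup bat file is baked in at build time (placeholder values here) and
// re-applied before forwarding all arguments to cl.exe.
use std::process::{exit, Command};

// Hypothetical: in the real generator this blob is produced by diffing the
// environment before and after running the vcvars bat file.
const CAPTURED_ENV: &str = "INCLUDE=C:\\placeholder\\include\nLIB=C:\\placeholder\\lib";

fn main() {
    let mut cmd = Command::new("cl.exe");
    for line in CAPTURED_ENV.lines() {
        if let Some((key, value)) = line.split_once('=') {
            cmd.env(key, value);
        }
    }
    // Forward our own command line untouched.
    cmd.args(std::env::args_os().skip(1));
    let status = cmd.status().expect("failed to spawn cl.exe");
    exit(status.code().unwrap_or(1));
}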

Downsides

Process invocation on Windows is not particularly fast and with this approach every compiler invocation becomes two process invocations. I don't know enough about Windows to know whether one could avoid that with dlopen trickery or the like.

For actual use you'd probably need to generate these wrappers for VS linkers too.

You have to regenerate the wrapper binary every time VS updates (at least for major releases, not sure about minor ones).

The end result has not been tested apart from simple tests. It is a PoC after all.

Post Collapse Computing Part 3: Building Resilience

Part 1 of this series looks at the state of the climate crisis, and how we can still get our governments to do something about it. Part 2 considers the collapse scenarios we’re likely to face if we fail in those efforts. In this part we’re looking at concrete things we could work towards to make our software more resilient in those scenarios.

The takeaway from part 2 was that if we fail to mitigate the climate crisis, we’re headed for a world where it’s expensive or impossible to get new hardware, where electrical power is scarce, internet access is not the norm, and cloud services don’t exist anymore or are largely inaccessible due to lack of internet.

What could we do to prepare our software for these risks? In this part of the series I’ll look at some ideas and relevant art for resilient technology, and how we could apply this to GNOME.

Local-First

Producing power locally is comparatively doable given the right equipment, but internet access is contingent on lots of infrastructure both locally and across the globe. This is why reducing dependence on connectivity is probably the most important challenge for resilience.

Unfortunately we’ve spent the past few decades making software ever more reliant on having fast internet access, all the time. Many of the apps people spend all day in are unusable without an internet connection. So what would be the opposite of that? Is anyone working in the direction of minimizing reliance on the network?

As it turns out, yes! It’s called “local-first”. The idea is that instead of the primary copy of your data being on a server and local apps acting as clients to it, the client is the primary source of truth. The network is only used optionally for syncing and collaboration, with potential conflicts automatically resolved using CRDTs. This allows for superior UX because you’re not waiting on the network, better privacy because you can end-to-end encrypt everything, and better handling of low-connectivity cases. All of this is of course technically very challenging, and there aren’t many implementations of it in production today, but the field is growing and maturing quickly.
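
To make the CRDT idea a bit more concrete, here is a minimal grow-only counter (G-Counter), one of the simplest CRDTs, sketched in Rust: each replica only increments its own entry, and merging two replicas takes the per-replica maximum, so syncs can happen in any order and still converge. This is illustrative only; real local-first stacks such as automerge use much richer structures.

// Minimal G-Counter CRDT: each replica increments only its own slot, and
// merging takes the per-replica maximum, so replicas converge regardless of
// the order in which they sync. Illustrative only.
use std::collections::HashMap;

#[derive(Default, Clone)]
struct GCounter {
    counts: HashMap<String, u64>, // replica id -> increments seen from it
}

impl GCounter {
    fn increment(&mut self, replica: &str) {
        *self.counts.entry(replica.to_string()).or_insert(0) += 1;
    }

    fn merge(&mut self, other: &GCounter) {
        for (replica, &count) in &other.counts {
            let entry = self.counts.entry(replica.clone()).or_insert(0);
            *entry = (*entry).max(count);
        }
    }

    fn value(&self) -> u64 {
        self.counts.values().sum()
    }
}

fn main() {
    let mut laptop = GCounter::default();
    let mut phone = GCounter::default();
    laptop.increment("laptop");
    laptop.increment("laptop");
    phone.increment("phone");

    // Sync later, in either order -- both replicas end up with the same value.
    laptop.merge(&phone);
    phone.merge(&laptop);
    assert_eq!(laptop.value(), 3);
    assert_eq!(phone.value(), 3);
}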

Among the most prominent proponents of the local-first idea are the community around the Ink & Switch research lab and Muse, a sketching/knowledge work app for Apple platforms. However, there’s also prior work in this direction from the GNOME community: There’s Christian Hergert’s Bonsai, the Endless content apps, and it’s actually one of the GNOME Foundation’s newly announced goals to enable more people to build local-first apps.

For more on local-first software, I recommend watching Rob’s GUADEC talk (Recording on Youtube), reading the original paper on local-first software (2019), or listening to this episode of the Metamuse podcast (2021) on the subject.

Other relevant art for local-first technology:

  • automerge, a library for building local-first software
  • Fullscreen, a web-based whiteboard app which allows saving to a custom file format that includes history and editing permissions
  • Magic Wormhole, a system to send files directly between computers without any servers
  • Earthstar, a local-first sync system with USB support

USB Fallback

Local-first often assumes it’s possible to sometimes use the network for syncing or transferring data between devices, but what if you never have an internet connection?

It’s possible to use the local network in some instances, but it's not very reliable in practice. Local networks are often weirdly configured, and things can fail in many ways that are hard to debug (Source: Endless tried it and decided it was not worth the hassle). In contrast, USB storage is reliable, flexible, and well-understood by most people, making it a much better fallback.

As a practical example, a photo management app built in this paradigm would

  • Store all photos locally so there’s never any spinners after first setup
  • Allow optionally syncing with other devices and collaborative album management with other people via local network or the internet
  • Automatically reconcile conflicts if something changed on other devices while they were disconnected
  • Allow falling back to USB, i.e. copying some of the albums to a USB drive and then importing them on another device (including all metadata, collaboration permissions, etc.)
Mockup for USB drive support in GNOME Software (2020)

Some concrete things we could work on in the local-first area:

  • Investigate existing local-first libraries, if/how they could be integrated into our stack, or if we’d need to roll our own
  • Prototype local-first sync in some real-world apps
  • Implement USB app installation and updates in GNOME Software (mockups)

Resource Efficiency

While power can be produced locally, it’s likely that in the future it will be far less abundant than today. For example, you may only have power a few hours a day (already a reality in parts of the global south), or only when there’s enough sun or wind at the moment. This makes power efficiency in software incredibly important.

Power Measurement is Hard

Improving power efficiency is not straightforward, since it’s not possible to measure it directly. Measuring the computer’s power consumption as a whole is trivial, but knowing which program caused how much of it is very difficult to pin down (for more on this check out Aditya Manglik’s GUADEC talk (Recording on Youtube) about power profiling tooling). Making progress in this area is important to allow developers to make their software more power-efficient.

However, while better measurements would be great to have, in practice there’s a lot developers can do even without it. Power is in large part a function of CPU, GPU, and memory use, so reducing each of these definitely helps, and we do have mature profiling tools for these.

Choose a Low-Power Stack

Different tech stacks and dependencies are not created equal when it comes to power consumption, so this is a factor to take into account when starting new projects. One area where there are actual comparative studies on this is programming languages: For example, according to this paper Python uses way more power than other languages commonly used for GNOME app development.

Relative energy use of different programming languages (Source: Pereira et al.)

Another important choice is user interface toolkit. Nowadays many applications just ship their own copy of Chrome (in the form of Electron) to render a web app, resulting in huge downloads, slow startup times, large CPU and memory footprints, and laggy interfaces. Using native toolkits instead of web technologies is a key aspect of making resilient software, and GTK4/Adwaita is actually in a really good position here given its performance, wide language support, modern feature set and widgets, and community-driven development model.

Schedule Power Use

It’s also important to actively consider the temporal aspect of power use. For example, if your power supply is a solar panel, the best time to charge batteries or do computing-intensive tasks is during the day, when there’s the most sunlight.

If we had a way for the system to tell apps that right now is a good/bad time to use a lot of power, they could adjust their behavior accordingly. We already do something similar for metered connections, e.g. Software doesn’t auto-download updates if your connection is metered. I could also imagine new user-facing features in this direction, e.g. a way to manually schedule certain tasks for when there will be more power so you can tell Builder to start compiling the long list of dependencies for a newly cloned Rust project tomorrow morning when the sun is back out.
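
Purely as a thought experiment, such a hint could look something like this from an app’s point of view. Every name below is invented; no such API exists today.

// Thought experiment only: all names are invented, no such API exists. The
// point is just that apps could branch on a system-provided power hint, the
// same way they already branch on metered connections.
enum PowerHint {
    Abundant,    // e.g. solar panel at midday, batteries full
    Constrained, // e.g. running on batteries at night
}

fn current_power_hint() -> PowerHint {
    // Hypothetically provided by the system, e.g. over D-Bus; faked here.
    if std::env::var_os("PLENTY_OF_POWER").is_some() {
        PowerHint::Abundant
    } else {
        PowerHint::Constrained
    }
}

fn maybe_build_project() {
    match current_power_hint() {
        PowerHint::Abundant => println!("starting the big compile job now"),
        PowerHint::Constrained => println!("deferring the compile until power is plentiful"),
    }
}

fn main() {
    maybe_build_project();
}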

Some concrete things we could work on in the area of resource efficiency:

  • Improve power efficiency across the stack
  • Explore a system API to tell apps whether now is a good time to use lots of power or not
  • Improve the developer story for GTK on Windows and macOS, to allow more people to choose it over Electron

Data Resilience

In hedging against loss of connectivity, it’s not enough to have software that works offline. In many cases what’s more important is the data we read/write using that software, and what we can do with it in resource-constrained scenarios.

The File System is Good, Actually

The 2010s saw lots of experimentation with moving away from the file system as the primary way to think about data storage, both within GNOME and across the wider industry. It makes a lot of sense in theory: Organizing everything manually in folders is shit work people don’t want to do, so they end up with messy folder hierarchies and it’s hard to find things. Bespoke content apps for specific kinds of data, with rich search and layouts custom-tailored to the data, are definitely a nicer, more human-friendly way to deal with content, in theory.

In practice we’ve seen a number of problems with the content app approach though, including

  • Flexibility: Files can be copied/pasted/deleted, stored on a secondary internal drive, sent as email attachments, shared via a USB key, opened/changed using other apps, and more. With content apps you usually don’t have all of these options.
  • Interoperability: The file system is a lowest common denominator across all OSes and apps.
  • Development Effort: Building custom viewers/editors for every type of content is a ton of work, in part because you have to reimplement all the common operations you get for free in a file manager.
  • Familiarity: While it’s messy and not that easy to learn, most people have a vague understanding of the file system by now, and the universality of this paradigm means it only has to be learned once.
  • Unmaintained Apps: Data living in a specific app’s database is useless if the app goes unmaintained. This is especially problematic in free software, where volunteer maintainers abandoning projects is not uncommon.

Due to the above reasons, we’ve seen in practice that the file system is not in fact dying. It’s actually making its way into places where it previously wasn’t present, including iPhones (which now come with a Files app) and the web (via Nextcloud, Google Drive, and company).

From a resilience point of view some of the shortcomings of content apps listed above are particularly important, such as the flexibility to be moved via USB when there’s no internet, and cross-platform interoperability. This is why I think user-accessible files should be the primary source of truth for user data in apps going forward.

Simple, Standardized Formats

With limited connectivity, a potential risk is that you don’t have the ability to download new software to open a file you’re encountering. This is why sticking to well-known standard formats that any computer is likely to have a viewer/editor for is generally preferable (plain text, standard image formats, PDF, and so on).

When starting a new app, ask yourself, is a whole new format needed or could it use/extend something pre-existing? Perhaps there’s a format you could use that already has an ecosystem of apps that support it, especially on other platforms?

For example, if you were to start a new notes app that can do inline media you could go with a custom binary format and a database, but you could also go with Markdown files in a user-accessible folder. In order to get inline media you could use Textbundle, an extension to Markdown implemented by a number of other Markdown apps on other platforms, which basically packs the contained media into an archive together with the Markdown file.

Side note: I really want a nice GTK app that supports Textbundle (more specifically, its compressed variant Textpack), if you want to make one I’d be delighted to help on the design side :)

Export as Fallback

Ideally data should be stored in standardized formats with wide support, and human-readable in a text editor as a fallback (if applicable). However, this isn’t possible in every case, for example if an app produces a novel kind of content there are no standardized formats for yet (e.g. a collaborative whiteboard app). In these cases it’s important to make sure the non-standard format is well-documented for people implementing alternative clients, and has support for exporting to more common formats, e.g. exporting the current state of a collaborative whiteboard as PDF or SVG.

Some concrete things we could work on towards better data resilience:

  • Explore new ways to do content apps with the file system as a backend
  • Look at where we’re using custom formats in our apps, and consider switching to standard ones
  • Consider how this fits in with local-first syncing

Keep Old Hardware Running

There are many reasons why old hardware stops being usable, including software built for newer, faster devices becoming too slow on older ones, vendors no longer providing updates for a device, some components (especially batteries) degrading with use over time, and of course planned obsolescence. Some of these factors are purely hardware-related, but some also only depend on software, so we can influence them.

Use old Hardware for Development

I already touched on this in the dedicated section above, but obviously using less CPU, RAM, etc. helps not only with power use, but also allows the software to run on older hardware for longer. Unfortunately most developers use top of the line hardware, so they are least impacted by inefficiencies in their personal use.

One simple way to ensure you keep an eye on performance and resource use: Don’t use the latest, most powerful hardware. Maybe keep your old laptop for a few years longer, and get it repaired instead of buying a new one when something breaks. Or if you’re really hardcore, buy an older device on purpose to use as your main machine. As we all know, the best way to get developers to care about something is to actually dogfood it :)

Hardware Enablement for Common Devices

In a world where it’s difficult to get new hardware, it’ll become increasingly important to reuse existing devices we have lying around. Unfortunately, a lot of this hardware is stuck on very old versions of proprietary software that are both slow and insecure.

With Windows devices there’s an easy solution: Just install an up-to-date free software OS. But while desktop hardware is fairly well-supported by mainline Linux, mobile is a huge mess in this regard. The Android world almost exclusively uses old kernels with lots of non-upstreamable custom patches. It takes years to mainline a device, and it has to be done for every device.

Projects like PostmarketOS are working towards making more Android devices usable, but as you can see from their device support Wiki, success is limited so far. One especially problematic aspect from a resilience point of view is that the devices that tend to be worked on are the ones that developers happen to have, which are generally not the models that sell the most units. Ideally we’d work strategically to mainline some of the most common devices, and make sure they actually fully work. Most likely that’d be mid-range Samsung phones and iPhones. For the latter there’s curiously little work in this direction, despite being a gigantic, relatively homogeneous pool of devices (for example, there are 224 million iPhone 6 devices out there which don’t get updates anymore).

Hack Bootloaders

Unfortunately, hardware enablement alone is not enough to make old mobile devices more long-lived by installing more up-to date free software. Most mobile devices come with locked bootloaders, which require contacting the manufacturer to get an unlock code to install alternative software – if they allow it at all. This means if the vendor company’s server goes away or you don’t have internet access there’s no way to repurpose a device.

What we’d probably need is a collection of exploits that allow unlocking bootloaders on common devices in a fully offline way, and a user-friendly automated unlocking tool using these exploits. I could imagine this being part of the system’s disk utility app or a separate third-party app, which allows unlocking the bootloader and installing a new OS onto a mobile device you plug in via USB.

Some concrete things we could work on to keep old hardware running:

  • Actively try to ensure older hardware keeps working with new versions of our software (and ideally getting faster with time rather than slower thanks to ongoing performance work)
  • Explore initiatives to do strategic hardware enablement for some of the most common mobile devices (including iPhones, potentially?)
  • Forge alliances with the infosec/Android modding community and build convenient offline bootloader unlocking tools

Build for Repair

In a less connected future it’s possible that substantial development of complex systems software will stop being a thing, because the necessary expertise will not be available in any single place. In such a scenario being able to locally repair and repurpose hardware and software for new uses and local needs is likely to become important.

Repair is a relatively clearly defined problem space for hardware, but for software it’s kind of a foreign concept. The idea of a centralized development team “releasing” software out into the world at scale is built into our tools, technologies, and culture at every level. You generally don’t repair software, because in most cases you don’t even have the source code, and even if you do (and the software doesn’t depend on some server component) there’s always going to be a steep learning curve to being able to make meaningful changes to an unfamiliar code base, even for seasoned programmers.

In a connected world it will therefore always be most efficient to have a centralized development team that maintains a project and makes releases for the general public to use. But with that possibly no longer an option in the future, someone else will end up having to make sure things work as best they can at the local level. I don’t think this will mean most people will start making changes to their own software, but I could see software repair becoming a role for specialized technicians, similar to electricians or car mechanics.

How could we build our software in a way that makes it most useful to people in such a future?

Use Well-Understood, Accessible Tech

One of the most important things we can do today to make life easier for potential future software repair technicians is using well-established technology, which they’re likely to already have experience with. Writing apps in Haskell may be a fun exercise, but if you want other people to be able to repair/repurpose them in the future, GJS is probably a better option, simply because so many more people are familiar with the language.

Another important factor determining a technology stack’s repairability is how accessible it is to get started with. How easy is it for someone to get a development environment up and running from scratch? Is there good (offline) documentation? Do you need to understand complex math or memory management concepts?

Local-First Development

Most modern development workflows assume a fast internet connection on a number of levels, including downloading and updating dependencies (e.g. npm modules or flatpak SDKs), documentation, tutorials, Stackoverflow, and so on.

In order to allow repair at the local level, we also need to rethink development workflows in a local-first fashion, meaning things like:

  • Ship all the source code and development tools needed to rebuild/modify the OS and apps with the system
  • Have a first-class flow for replacing parts of the system or apps with locally modified/repaired versions, allowing easy management of different versions, rollbacks, etc.
  • Have great offline documentation and tutorials, and maybe even something like a locally cached subset of Stackoverflow for a few technologies (e.g. the 1000 most popular questions with the “gtk” tag)

Getting the tooling and UX right for a fully integrated local-first software repair flow will be a lot of work, but there’s some interesting relevant art from Endless OS from a few years back. The basic idea was that you transform any app you’re running into an IDE editing the app’s source code (thanks to Will Thompson for the screencast below). The devil is of course in the details for making this a viable solution to local software repair, but I think this would be a very interesting direction to explore further.

Some concrete things we could work on to make our software more repairable:

  • Avoid using obscure languages and technologies for new projects
  • Avoid overly complex and brittle dependency trees
  • Investigate UX for a local-first software repair flow
  • Revive or replace the Devhelp offline documentation app
  • Look into ways to make useful online resources (tutorials, technical blog posts, Stackoverflow threads, etc.) usable offline

This was part three of a four-part series. In the fourth and final installment we’ll wrap up the series by looking at some of the hurdles in moving towards resilience and how we could overcome them.

the sticky mark-bit algorithm

Good day, hackfolk!

The Sticky Mark-Bit Algorithm

Also an intro to mark-sweep GC

7 Oct 2022 – Igalia

Andy Wingo

A funny post today; I gave an internal presentation at work recently describing the so-called "sticky mark bit" algorithm. I figured I might as well post it here, as a gift to you from your local garbage human.

Automatic Memory Management

“Don’t free, the system will do it for you”

Eliminate a class of bugs: use-after-free

Relative to bare malloc/free, qualitative performance improvements

  • cheap bump-pointer allocation
  • cheap reclamation/recycling
  • better locality

Continuum: bmalloc / tcmalloc grow towards GC

Before diving in though, we start with some broad context about automatic memory management. The term mostly means "garbage collection" these days, but really it describes a component of a system that provides fresh memory for new objects and automatically reclaims memory for objects that won't be needed in the program's future. This stands in contrast to manual memory management, which relies on the programmer to free their objects.

Of course, automatic memory management ensures some valuable system-wide properties, like lack of use-after-free vulnerabilities. But also by enlarging the scope of the memory management system to include full object lifetimes, we gain some potential speed benefits, for example eliminating any cost for free, in the case of e.g. a semi-space collector.

Automatic Memory Management

Two strategies to determine live object graph

  • Reference counting
  • Tracing

What to do if you trace

  • Mark, and then sweep or compact
  • Evacuate

Tracing O(n) in live object count

I should mention that reference counting is a form of automatic memory management. It's not enough on its own; unreachable cycles in the object reference graph have to be detected either by a heap tracer or broken by weak references.

It used to be that we GC nerds made fun of reference counting as being an expensive, half-assed solution that didn't work very well, but there have been some fundamental advances in the state of the art in the last 10 years or so.

But this talk is more about the other kind of memory management, which involves periodically tracing the graph of objects in the heap. Generally speaking, as you trace you can do one of two things: mark the object, simply setting a bit indicating that an object is live, or evacuate the object to some other location. If you mark, you may choose to then compact by sliding all objects down to lower addresses, squeezing out any holes, or you might sweep all holes into a free list for use by further allocations.

Mark-sweep GC (1/3)

freelist := []

allocate():
  if freelist is empty: collect()
  return freelist.pop()

collect():
  mark()
  sweep()
  if freelist is empty: abort

Concretely, let's look closer at mark-sweep. Let's assume for the moment that all objects are the same size. Allocation pops fresh objects off a freelist, and collects if there is none. Collection does a mark and then a sweep, aborting if sweeping yielded no free objects.

Mark-sweep GC (2/3)

mark():
  worklist := []
  for ref in get_roots():
    if mark_one(ref):
      worklist.add(ref)
  while worklist is not empty:
    for ref in trace(worklist.pop()):
      if mark_one(ref):
        worklist.add(ref)

sweep():
  for ref in heap:
    if marked(ref):
      unmark_one(ref)
    else:
      freelist.add(ref)

Going a bit deeper, here we have some basic implementations of mark and sweep. Marking starts with the roots: edges from outside the automatically-managed heap indicating a set of initial live objects. You might get these by maintaining a stack of objects that are currently in use. Then it traces references from these roots to other objects, until there are no more references to trace. It will visit each live object exactly once, and so is O(n) in the number of live objects.

Sweeping requires the ability to iterate the heap. With the precondition here that collect is only ever called with an empty freelist, it will clear the mark bit from each live object it sees, and otherwise add newly-freed objects to the global freelist. Sweep is O(n) in total heap size, but some optimizations can amortize this cost.
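
The canonical way to amortize the sweep (my own illustrative sketch in C, not something from the talk) is lazy sweeping: rather than walking the whole heap inside the pause, the allocator reclaims unmarked objects incrementally, on demand. The names heap_tags, marked, and lazy_sweep_next below are invented for this example.

/* Lazy sweeping sketch: reclaim unmarked objects on demand instead of
   sweeping the entire heap during the GC pause. */

#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 1024

extern uintptr_t heap_tags[HEAP_OBJECTS]; /* first word (tag) of each object */
extern uintptr_t marked;                  /* current mark-bit interpretation */

static size_t sweep_cursor;               /* reset to 0 after each mark()    */

/* Return the index of the next unmarked (hence free) object, sweeping
   incrementally, or (size_t)-1 once the whole heap has been swept. */
static size_t lazy_sweep_next(void)
{
  while (sweep_cursor < HEAP_OBJECTS) {
    size_t i = sweep_cursor++;
    if ((heap_tags[i] & 1) != marked)
      return i;                           /* unmarked: hand it out for reuse */
  }
  return (size_t)-1;                      /* heap exhausted: time to collect */
}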

Mark-sweep GC (3/3)

marked := 1

get_tag(ref):
  return *(uintptr_t*)ref
set_tag(ref, tag):
  *(uintptr_t*)ref = tag

marked(ref):
  return (get_tag(ref) & 1) == marked
mark_one(ref):
  if marked(ref): return false
  set_tag(ref, (get_tag(ref) & ~1) | marked)
  return true
unmark_one(ref):
  set_tag(ref, (get_tag(ref) ^ 1))

Finally, some details on how you might represent a mark bit. If a ref is a pointer, we could store the mark bit in the first word of the objects, as we do here. You can choose instead to store them in a side table, but it doesn't matter for today's example.
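
For concreteness, here is a rough sketch (mine, not from the talk) of the side-table variant in C, assuming a contiguous heap with 16-byte object alignment; heap_base, mark_bits, and the helper names are all invented for this illustration.

/* Side-table mark bitmap: one bit per possible object slot, indexed by
   the object's offset from the heap base. */

#include <stdint.h>
#include <string.h>

#define HEAP_SIZE  (1 << 20)
#define ALIGNMENT  16
#define NSLOTS     (HEAP_SIZE / ALIGNMENT)

static uint8_t heap_base[HEAP_SIZE];
static uint8_t mark_bits[NSLOTS / 8];

static size_t bit_index(void *ref)
{
  return (size_t)((uint8_t *)ref - heap_base) / ALIGNMENT;
}

static int marked(void *ref)
{
  size_t i = bit_index(ref);
  return (mark_bits[i / 8] >> (i % 8)) & 1;
}

static int mark_one(void *ref)
{
  size_t i = bit_index(ref);
  if ((mark_bits[i / 8] >> (i % 8)) & 1)
    return 0;                         /* already marked */
  mark_bits[i / 8] |= (uint8_t)(1 << (i % 8));
  return 1;
}

static void clear_all_marks(void)
{
  /* Clearing marks between collections is a single memset over the
     bitmap, instead of touching the header of every live object. */
  memset(mark_bits, 0, sizeof mark_bits);
}

Whether a header bit or a side table is better depends on the workload; the side table keeps mark traffic off the objects themselves, which can matter for cache behavior during marking.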

Observations

Freelist implementation crucial to allocation speed

Non-contiguous allocation suboptimal for locality

World is stopped during collect(): “GC pause”

mark O(n) in live data, sweep O(n) in total heap size

Touches a lot of memory

The salient point is that these O(n) operations happen when the world is stopped. This can be noticeable, even taking seconds for the largest heap sizes. It sure would be nice to have the benefits of GC, but with lower pause times.

Optimization: rotate mark bit

flip():
  marked ^= 1

collect():
  flip()
  mark()
  sweep()
  if freelist is empty: abort

unmark_one(ref):
  pass

Avoid touching mark bits for live data

Incidentally, before moving on, I should mention an optimization to mark bit representation: instead of clearing the mark bit for live objects during the sweep phase, we could just choose to flip our interpretation of what the mark bit means. This allows unmark_one to become a no-op.

Reducing pause time

Parallel tracing: parallelize mark. Clear improvement, but speedup depends on object graph shape (e.g. linked lists).

Concurrent tracing: mark while your program is running. Tricky, and not always a win (“Retrofitting Parallelism onto OCaml”, ICFP 2020).

Partial tracing: mark only a subgraph. Divide space into regions, record inter-region links, collect one region only. Overhead to keep track of inter-region edges.

Now, let's revisit the pause time question. What can we do about it? In general there are three strategies.

Generational GC

Partial tracing

Two spaces: nursery and oldgen

Allocations in nursery (usually)

Objects can be promoted/tenured from nursery to oldgen

Minor GC: just trace the nursery

Major GC: trace nursery and oldgen

“Objects tend to die young”

Overhead of old-to-new edges offset by less amortized time spent tracing

Today's talk is about partial tracing. The basic idea is that instead of tracing the whole graph, just trace a part of it, ideally a small part.

A simple and effective strategy for partitioning a heap into subgraphs is generational garbage collection. The idea is that objects tend to die young, and that therefore it can be profitable to focus attention on collecting objects that were allocated more recently. You therefore partition the heap graph into two parts, young and old, and you generally try to trace just the young generation.

The difficulty with partitioning the heap graph is that you need to maintain a set of inter-partition edges, and you do so by imposing overhead on the user program. But a generational partition minimizes this cost because you never do an only-old-generation collection, so you don't need to remember new-to-old edges, and mutations of old objects are less common than new.

Generational GC

Usual implementation: semispace nursery and mark-compact oldgen

Tenuring via evacuation from nursery to oldgen

Excellent locality in nursery

Very cheap allocation (bump-pointer)

But... evacuation requires all incoming edges to an object to be updated to new location

Requires precise enumeration of all edges

Usually the generational partition is reflected in the address space: there is a nursery and it is in these pages and an oldgen in these other pages, and never the twain shall meet. To tenure an object is to actually move it from the nursery to the old generation. But moving objects requires that the collector be able to enumerate all incoming edges to that object, and then to have the collector update them, which can be a bit of a hassle.

JavaScriptCore

No precise stack roots, neither in generated nor C++ code

Compare to V8’s Handle<> in C++, stack maps in generated code

Stack roots conservative: integers that happen to hold addresses of objects treated as object graph edges

(Cheaper implementation strategy, can eliminate some bugs)

Specifically in JavaScriptCore, the JavaScript engine of WebKit and the Safari browser, we have a problem. JavaScriptCore uses a technique known as "conservative root-finding": it just iterates over the words in a thread's stack to see if any of those words might reference an object on the heap. If they do, JSC conservatively assumes that it is indeed a reference, and keeps that object live.

Of course a given word on the stack could just be an integer which happens to be an object's address. In that case we would hold on to too much data, but that's not so terrible.

Conservative root-finding is again one of those things that GC nerds like to make fun of, but the pendulum seems to be swinging back its way; perhaps another article on that some other day.
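
To make that concrete, here is a rough sketch in C of what conservative stack scanning looks like (illustrative only, not JavaScriptCore's actual code); heap_start, heap_end, and maybe_mark are assumed helpers that don't appear in the talk.

/* Conservative root-finding: treat every word on the stack that falls
   inside the heap's address range as a potential object reference. */

#include <stdint.h>

extern uint8_t *heap_start, *heap_end;   /* bounds of the GC-managed heap  */
extern void maybe_mark(void *candidate); /* marks it if it's a real object */

static void scan_stack_conservatively(uintptr_t *stack_top,
                                      uintptr_t *stack_base)
{
  /* Assumes the stack grows down, so stack_top < stack_base. */
  for (uintptr_t *p = stack_top; p < stack_base; p++) {
    uintptr_t word = *p;
    if (word >= (uintptr_t)heap_start && word < (uintptr_t)heap_end)
      maybe_mark((void *)word);          /* might just be an integer, too */
  }
}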

JavaScriptCore

Automatic memory management eliminates use-after-free...

...except when combined with manual memory management

Prevent type confusion due to reuse of memory for object of different shape

addrof/fakeobj primitives: phrack.org/issues/70/3.html

Type-segregated heaps

No evacuation: no generational GC?

The other thing about JSC is that it is constantly under attack by malicious web sites, and that any bug in it is a step towards hackers taking over your phone. Besides bugs inside JSC, there are bugs also in the objects exposed to JavaScript from the web UI. Although use-after-free bugs are impossible with a fully traceable object graph, references to and from DOM objects might not be traceable by the collector, instead referencing GC-managed objects by reference counting or weak references or even manual memory management. Bugs in these interfaces are a source of exploitable vulnerabilities.

In brief, there seems to be a decent case for trying to mitigate use-after-free bugs. Beyond the nuclear option of not freeing, one step we could take would be to avoid re-using memory between objects of different shapes. So you have a heap for objects with 3 fields, another for objects with 4 fields, and so on.
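
A minimal sketch of what such shape-segregated allocation could look like (my own illustration in C, not JSC's actual implementation): keep a separate freelist per object size, so memory freed from one shape is only ever reused for that same shape. All the names here are invented for the example, and it assumes objects have at least one field so a freed cell can hold the freelist link.

/* Shape-segregated freelists: a dangling pointer to a freed 3-field
   object can only ever see another 3-field object, never a different
   shape. */

#include <stddef.h>
#include <stdlib.h>

#define MAX_FIELDS 8

struct free_cell { struct free_cell *next; };

static struct free_cell *freelists[MAX_FIELDS + 1];

static void *allocate_with_shape(size_t nfields)
{
  struct free_cell *cell = freelists[nfields];
  if (cell) {
    freelists[nfields] = cell->next;    /* reuse a cell of the same shape */
    return cell;
  }
  /* No recycled cell of this exact shape: take fresh memory rather than
     borrowing from another size class. */
  return malloc(sizeof(void *) * nfields);
}

static void reclaim_with_shape(void *obj, size_t nfields)
{
  struct free_cell *cell = obj;
  cell->next = freelists[nfields];
  freelists[nfields] = cell;
}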

But it would seem that this mitigation is at least somewhat incompatible with the usual strategy of generational collection, where we use a semi-space nursery. The nursery memory gets re-used all the time for all kinds of objects. So does that rule out generational collection?

Sticky mark bit algorithm

collect(is_major=false):
  if is_major: flip()
  mark(is_major)
  sweep()
  if freelist is empty:
    if is_major: abort
    collect(true)

mark(is_major):
  worklist := []
  if not is_major:
    worklist += remembered_set
    remembered_set := []
  ...

Turns out, you can generationally partition a mark-sweep heap.

Recall that to visit each live object, you trace the heap, setting mark bits. To visit them all again, you have to clear the mark bit between traces. Our first collect implementation did so in sweep, via unmark_one; then with the optimization we switched to clear them all before the next trace in flip().

Here, then, the trick is that you just don't clear the mark bit between traces for a minor collection (tracing just the nursery). In that way all objects that were live at the previous collection are considered the old generation. Marking an object is tenuring, in-place.

There are just two tiny modifications to mark-sweep to implement sticky mark bit collection: one, flip the mark bit only on major collections; and two, include a remembered set in the roots for minor collections.

Sticky mark bit algorithm

Mark bit from previous trace “sticky”: avoid flip for minor collections

Consequence: old objects not traced, as they are already marked

Old-to-young edges: the “remembered set”

Write barrier

write_field(object, offset, value):
  remember(object)
  object[offset] = value

The remembered set is maintained by instrumenting each write that the program makes with a little call out to code from the garbage collector. This code is the write barrier, and here we use it to add to the set of objects that might reference new objects. There are many ways to implement this write barrier but that's a topic for another day.
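
As one concrete possibility (a sketch of my own, not necessarily what JSC does), remember() can be made cheap by keeping a per-object “logged” bit, so that each object is appended to the remembered set at most once between minor collections. The bit position and the remembered_set_add helper are assumptions for this example, matching the earlier convention that bit 0 of the tag word is the mark bit.

/* A deduplicating write barrier: the "logged" bit in the tag word keeps
   an object from being added to the remembered set more than once per
   minor GC cycle. */

#include <stddef.h>
#include <stdint.h>

#define LOGGED_BIT ((uintptr_t)2)              /* bit 0 is the mark bit */

extern void remembered_set_add(void *object);  /* hypothetical helper   */

static void remember(void *object)
{
  uintptr_t *tag = object;                     /* tag is the first word */
  if (*tag & LOGGED_BIT)
    return;                                    /* already remembered    */
  *tag |= LOGGED_BIT;
  remembered_set_add(object);
}

static void write_field(void **object, size_t offset, void *value)
{
  remember(object);
  object[offset] = value;                      /* the actual store      */
}

The collector would then clear the logged bits again when it drains the remembered set during a minor collection.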

JavaScriptCore

Parallel GC: Multiple collector threads

Concurrent GC: mark runs while JS program running; “riptide”; interaction with write barriers

Generational GC: in-place, non-moving generational collection via the sticky mark bit algorithm

Alan Demers, “Combining generational and conservative garbage collection: framework and implementations”, POPL ’90

So returning to JavaScriptCore and the general techniques for reducing pause times, I can summarize to note that it does them all. It traces both in parallel and concurrently, and it tries to trace just newly-allocated objects using the sticky mark bit algorithm.

Conclusions

A little-used algorithm

Motivation for JSC: conservative roots

Original motivation: conservative roots; write barrier enforced by OS-level page protections

Revived in “Sticky Immix”

Better than nothing, not quite as good as semi-space nursery

I find that people that are interested in generational GC go straight for the semispace nursery. There are some advantages to that approach: allocation is generally cheaper in a semispace than in a mark space, locality among new objects is better, locality after tenuring is better, and you have better access locality during a nursery collection.

But if for some reason you find yourself unable to enumerate all roots, you can still take advantage of generational collection via the sticky mark-bit algorithm. It's a simple change that improves performance, as long as you are able to insert write barriers on all heap object mutations.

The challenge with a sticky-mark-bit approach to generations is avoiding the O(n) sweep phase. There are a few strategies, but more on that another day perhaps.

And with that, presentation done. Until next time, happy hacking!

Fall of releases

A couple of Rust crate releases.

Now that the new glib-rs (and gtk-rs) are out, it's time for an update of gudev Rust bindings using the newer version of glib-rs.

At the same time I updated midi-control to use a newer version of midir to iron out a cargo audit warning.

Available from crates.io.

October 21, 2022

the death of the meme

Coming Soon ™

#66 Foundation Updates

Update on what happened across the GNOME project in the week from October 14 to October 21.

GNOME Foundation

Thib announces

Some (good) news from the Foundation!

🇧🇪 Foundation staff is busy with GNOME Asia and prep work for LAS. We’ve also asked for a stand at FOSDEM, and hope to see you in Brussels in February

🧑‍💼 Synchronising everyone in the ED Search Committee has been a little difficult over the summer. After a hiatus, we’re making steady progress again. We’re coming after you, next ED!

🧹 The sysadmin team has also been busy getting rid of legacy services to reduce their maintenance load, focus on the essential ones, and give Flathub some love. Our mailing lists won’t accept or distribute new mail, and most of them will be moved to Discourse. The archives will remain online. All of the IRC bots have been decommissioned and are about to be replaced by hookshot, the multi-purpose Matrix bot that speaks GitLab. More on “getting rid of legacy services” soon.

📈 Last, but definitely not least: you might remember the three initiatives we told you about a while ago? Newcomers, Local-First Apps, and Flathub Payments. There’s one making outstanding progress: Flathub Payments. We’re working on raising funds through some grant applications to cover the staff and operating costs, and have lawyers working on the compliance, corporate and governance matters we need in the background to support payments and donations in Flathub.

💻️ On the more technical side of things, there are only a couple of tasks remaining for correctly generating invoices in Stripe before Codethink’s final work phase is complete. We’re going to start building a roadmap to launch the new features over the coming months. You can follow the progress here and here.

🤝 A special thank you to our president Robert McQueen in particular for sinking countless hours in orchestrating the Flathub Payments project (including hilarious paperwork and exhilarating legal stuff) and huge kudos to Codethink for being an amazing partner to work with. They have definitely been going above and beyond to support Flathub. We cannot stress enough how great they have been, thank you Codethink!

Core Apps and Libraries

GLib

The low-level core library that forms the basis for projects such as GTK and GNOME.

Philip Withnall says

Emmanuel Fleury has continued his campaign of tackling the oldest GLib bugs, by adding support for optimised g_str_has_prefix() and g_str_has_suffix() checks when passed static strings — the request for this feature was 18 years old (https://gitlab.gnome.org/GNOME/glib/-/issues/24)

Philip Withnall reports

Thomas Haller has dug into a race condition with EINTR handling and close() in g_spawn_*() and has fixed it and documented his findings (https://gitlab.gnome.org/GNOME/glib/-/merge_requests/2947)

Web

Web browser for the GNOME desktop.

Alexander Mikhaylenko says

The GTK4 port of Epiphany has finally landed

Circle Apps and Libraries

Identity

Compare images and videos.

Ivan Molodetskikh announces

I’ve published Identity 0.4.0! It’s got a new Media Properties dialog showing some information about the current video. Opening files with drag-and-drop or copy-paste on Flatpak now also works thanks to the update to the GNOME 43 platform.

gtk-rs

Safe bindings to the Rust language for fundamental libraries from the GNOME stack.

Bilal Elmoussaoui reports

A new gtk-rs release is out with plenty of improvements and bug fixes. Details can be read at https://gtk-rs.org/blog/2022/10/18/new-release.html

Third Party Projects

sonnyp announces

I published Retro, a toy digital segment clock that can be customized with CSS.

Tube Converter

An easy-to-use video downloader (yt-dlp frontend).

Nick announces

Tube Converter is now at V2022.10.3 and has seen many improvements and new features this week. Here are some of the changes:

  • Added a preference to embed metadata in a download
  • Added the ability to download subtitles for a video
  • Implemented proper stop function for download
  • ‘New Filename’ is now allowed to be empty. If it is empty, the video title will be used
  • Improved video url checking

Tagger

An easy-to-use music tag (metadata) editor.

Nick reports

Tagger is now at V2022.10.4 and has seen many improvements and new user-requested features this week. Here are some of the changes:

  • Added an Advanced Search function to search through contents of files' tags to find properties that are empty or contain a certain value. Type ! in the search box to activate it and learn more
  • Added ‘Discard Unapplied Changes’ action
  • Tagger now remembers user-filled tag properties waiting to be applied
  • Fixed ogg file handling
  • Improved closing and reloading dialogs

Hebbot

Hebbot is the bot behind TWIG that manages all the news.

Felix announces

Thib has made Hebbot (aka TWIG-Bot) a little smarter. Hebbot now scans your messages for keywords, and can automatically match them to the appropriate projects. A small but nice improvement, which again noticeably reduces the effort of administering TWIG.

In concrete terms, this means that you no longer have to define the usual_reporters field in the bot configuration.

Flatseal

A graphical utility to review and modify permissions of Flatpak applications.

Martín Abente Lahaye says

Flatseal 1.8.1 is out! It brings support for the new --socket=gpg-agent permission, Tamil and Hebrew translations, use of different colors for override status icons, updated Flatpak icon, a few important bug fixes and more.

Get it on Flathub!

Eyedropper

A powerful color picker and formatter.

FineFindus says

Version 0.4 of Eyedropper has been released. In addition to the features of the last few weeks, it is now possible to search by color name. To view the full changelog, visit the release page or download the newest version from Flathub.

Kerberos Authentication

An application to acquire and list Kerberos tickets.

Guido says

I’ve ported krb5-auth-dialog to GTK4 and libadwaita, making it usable on mobile phones too. While at it, I fixed PKINIT support with smart cards when using Heimdal Kerberos.

GNOME Shell Extensions

glerro says

Hi, I have created a new GNOME Shell extension that adds a switch to the Wi-Fi menu in the GNOME system menu, showing a QR code of the active connection. This can be useful for quickly connecting devices capable of reading QR codes (e.g. Android smartphones) and applying the settings to the system, without having to type in the Wi-Fi name and password.

Miscellaneous

Sophie says

Apps for GNOME received a bunch of changes in the background. These changes will not only help with the maintenance of the code base but will also allow sharing a joint base with other projects currently in development. If everything went well, this should not have changed anything in the website’s appearance.

That’s all for this week!

See you next week, and be sure to stop by #thisweek:gnome.org with updates on your own projects!

October 18, 2022

Status update 18/10/2022

The most important news this week is that my musical collaborator Vladimir Chicken just released a new song about Manchester’s most famous elephant. Released with a weird B-side about a “Baboon on the Moon”, I am not sure what he was thinking with that one.

I already posted on discourse.gnome.org about GNOME OpenQA testing; now that the tests are up to date, I’m aiming to keep an eye on them for a full release cycle and see how much ongoing maintenance effort they need. Hopefully at next year’s GUADEC we’ll be able to talk about moving this beyond an “alpha” service. We’ll soon have something like GNOME Continuous back in action after “only” 6 years of downtime.

Other exciting things in this area: Abderrahim Kitouni and Jordan Petridis have updated gnome-build-meta to track exact refs in its Git history; there are some details to work out so that it still provides quick CI feedback but this was basically necessary to ensure build reproducibility. And Tristan Van Berkom already blogged about research to use Recc inside BuildStream, with the eventual goal of unlocking fast incremental builds within the reproducibility guarantees that BuildStream already provides.

There is no direct link between these projects but I think we share the common vision that Colin Walters already laid out 10 years ago when describing Continuous: GNOME contributors need to be able to develop and test system-level changes involving GNOME, using a reliable & documented process with modest hardware requirements. Many issues and bug reports go beyond a single component, and in many cases right down to the kernel. As an example, when a background indexing task causes lagging in the desktop shell, folk blame the background indexer process, but the indexer is not in control of its own scheduling and such an issue can’t be fully reproduced if we don’t control exactly which kernel is running. Hopefully when these streams of work come to fruition, these kinds of bugs will finally become “shallow”.

Outside of volunteer efforts, I’ve been working on a new client project that is essentially a complex database migration. I don’t get to do much database work at Codethink; it’s nice to have absolutely no legacy Makefiles to deal with for once, and it’s been a good opportunity to try out Nushell in a bit more depth. My research so far is mostly setting up Python scripts to run database queries and output CSV, then using Nushell to filter and sort the output. When I tried Nushell a few years ago it still lacked some important features – it didn’t even have a way to set variables at that point – but now it’s prepared for anything you can throw at it and I look forward to doing more data processing with it.

I’m not yet ready to switch completely from Fish to Nushell, but … who knows? Maybe it’s coming.

October 17, 2022

Ubuntu bug fix anniversary

I first installed Ubuntu when Ubuntu 6.06 LTS “Dapper Drake” was released. I was brand new to Linux. This was Ubuntu’s first LTS release; the very first release of Ubuntu was only a year and a half before. I was impressed by how usable and useful the system was. It soon became my primary home operating system and I wanted to help make it better.

On October 15, 2009, I was helping test the release candidates ISOs for the Ubuntu 9.10 release. Specifically, I tested Edubuntu. Edubuntu has since been discontinued but at the time it was an official Ubuntu flavor preloaded with lots of education apps. One of those education apps was Moodle, an e-learning platform.

When testing Moodle, I found that a default installation would make Moodle impossible to use locally. I figured out how to fix this issue. This was really exciting: I finally found an Ubuntu bug I knew how to fix. I filed the bug report.

This was very late in the Ubuntu 9.10 release process and Ubuntu was in the Final Freeze state. In Final Freeze, every upload to a package included in the default install needs to be individually approved by a member of the Ubuntu Release Team. Also, I didn’t have upload rights to Ubuntu. Jordan Mantha (LaserJock), an Edubuntu maintainer, sponsored my bug fix upload.

I also forwarded my patch to Debian.

While trying to figure out what wasn’t working with Moodle, I stumbled across a packaging bug. Edubuntu provided a choice of MySQL or PostgreSQL for the system default database. MySQL was the default, but if PostgreSQL were chosen instead, Moodle wouldn’t work. A week later, I figured out how to fix this bug too. Jordan sponsored this upload and Steve Langasek from the Release Team approved it, so it, too, was fixed before 9.10 was released.

Although the first bug was new to 9.10 because of a behavior change in a low-level dependency, this PostgreSQL bug existed in stable Ubuntu releases. Therefore, I prepared Stable Release Updates for Ubuntu 9.04 and Ubuntu 8.04 LTS.

Afterwards

Six months later, I was able to attend my first Ubuntu Developer Summit. I was living in Bahrain (in the Middle East) at the time and a trip to Belgium seemed easier to me than if I were living in the United States where I usually live. This was the Ubuntu Developer Summit where planning for Ubuntu 10.10 took place. I like to believe that I helped with the naming since I added Maverick to the wiki page where people contribute suggestions.

I did not apply for financial sponsorship to attend and I stayed in a budget hotel on the other side of Brussels. The event venue was on the outskirts of Brussels so there wasn’t a direct bus or metro line to get there. I rented a car. I didn’t yet have a smartphone and I had a LOT of trouble navigating to and from the site every day. I learned then that it’s best to stay close to the conference site since a lot of the event is actually in the unstructured time in the evenings. Fortunately, I managed to arrive in time for Mark Shuttleworth’s keynote where the Unity desktop was first announced. This was released in Ubuntu 10.10 in the Ubuntu Netbook Remix and became the default for Ubuntu Desktop in Ubuntu 11.04.

Ubuntu’s switch to Unity provided me with a huge opportunity. In April 2011, GNOME 3.0 was released. I wanted to try it but it wasn’t yet packaged in Ubuntu or Debian. It was suggested that I could help work on packaging the major new version in a PPA. The PPA was convenient because I was able to get permission to upload there more easily than I could get the rights to upload directly to Ubuntu. My contributions there then enabled me to get upload rights to the Ubuntu Desktop packages later that year.

At a later Ubuntu Developer Summit, it was suggested that I start an official Ubuntu flavor for GNOME. So along with Tim Lunn (darkxst), I co-founded Ubuntu GNOME. Years later, Canonical stopped actively developing Unity; instead, Ubuntu GNOME was merged into Ubuntu Desktop.

Along the way, I became an Ubuntu Core Developer and a Debian Developer. And in January 2022, I joined Canonical on the Desktop Team. This all still feels amazing to me. It took me a long time to be comfortable calling myself a developer!

Conclusion

My first Ubuntu bugfix was 13 years ago this week. Because Ubuntu historically uses alphabetical adjective animal release names, 13 years means that we have rolled around to the letter K again! Later today, we begin release candidate ISO testing for Ubuntu 22.10 “Kinetic Kudu”.

I encourage you to help us test the release candidates and report bugs that you find. If you figure out how to fix a bug, we still sponsor bug fixes. If you are an Ubuntu contributor, I highly encourage you to attend an Ubuntu Summit if you can. The first Ubuntu Summit in years will be in 3 weeks in Prague, but the intent is for the Ubuntu Summits to be recurring events again.

October 14, 2022

BuildStream at ApacheCon 2022 New Orleans

About ApacheCon

This was my first real conference since worldwide panic spread a few years ago, and it is hard to overstate how exciting it was to be in New Orleans and meet real people again, especially since this was an opportunity to meet the Apache contributor base for the first time.

The conference took place at the Sheraton hotel on Canal Street, where we had access to a discounted rate for rooms, a large room where everyone could attend the keynotes, a coffee/snack area outside where the booths were located, and six small conference rooms on the 8th floor for the various tracks.

The talks I attended were refreshingly outside of my comfort zone: big data workflow scheduling with DolphinScheduler, a talk about Apache Toree (a Jupyter kernel for Scala / Apache Spark, a framework for data analysis and visualization), and my personal favorite, a talk about SDAP (Science Data Analytics Platform), a platform built to support Earth Science use cases. That talk explored some of the implementation details and use cases of an engine which can be used to search efficiently through Earth-related data (collected from satellites and various sensors) which can be freely obtained from NASA. In case you’re wondering, unfortunately actually leveraging this data requires that you download and house the (immense) data which you intend to analyze, either in the elastic cloud or on-premise clusters.

The foundation prepared a speakers reception on Tuesday evening and a general attendee reception on Wednesday evening, and the Gradle folks also organized a bonus event. This provided a nice balance of socializing with new people and the opportunity to explore the music scene in New Orleans.

The Spotted Cat jazz club on Frenchmen street (highly recommended)

Overall this felt very much like a GUADEC inasmuch as it is a small(ish) tightly knit community, leaving us with much quality time for networking and bonding with new fellow hackers.

BuildStream talk

On Tuesday I gave my BuildStream presentation, which was an introduction to BuildStream, along with a problem statement about working with build systems like Maven which love to download dependencies from the internet at build time. This bore some fruit as I was able to discuss the problem at length with people more familiar with java/mvn later on over dinner.

But… all of that is not the point of this blog post.

The point is that directly after my talk, Sander (a long time contributor and stakeholder) and I scurried off into a corner and finally pounded out a piece of code we had been envisioning for years.

BuildStream / Bazel vision

Good integration with Bazel has always been an important goal for us, especially since we got involved with the remote execution space and as such are using the same remote execution APIs (or REAPI).

While BuildStream generally excels in build correctness and robustness, Bazel is the state of the art in build optimization. Bazel achieves this by maximizing parallelism of discrete compilation units, coupled with caching the results of these fine grained compilation units. Of course this requires that a project adopt/support Bazel as a build system, similar to how a project may support being built with cmake or autotools.

Our perspective has been, and I hope I’m being fair, that:

  • Bazel requires significant configuration and setup, from the perspective of one who just wants to download a project and build it for the first time.
  • Bazel will build against your system toolchain by default, but provides some features for executing build actions inside containers, which is slow due to spawning separate containers for each action, and is still considered an experimental feature.
  • BuildStream should be able to provide a safe harness for running Bazel without too much worries about configuration of build environments as well as provide strict control of what dependencies (toolchain) Bazel can see.
  • A marriage of both tools in this manner should provide a powerful and reliable setup for organizations which are concerned with:
    • Building large mono repos with Bazel
    • Building the base toolchain/runtime and operating system on which their large Bazel built code bases are intended to be built with and run

While running Bazel inside BuildStream has always been possible and does ensure determinism of all build inputs (including Bazel itself), this has never been attractive since Bazel would not have access to remote execution features and more importantly it would not have access to the CAS where all of the discrete compilation units are cached.

And this is the crux of the issue; in order to have a performant solution for running Bazel in BuildStream, we need to have the option for tools in the sandbox to have access to some services, such as the CAS service and remote execution services.

And so this leads me to our proof of concept.

BuildStream / recc proof of concept

At the conference we initially tried this with Bazel, but I was unable to successfully configure Bazel to enable remote caching, so we went for an equivalent test using the Remote Execution Caching Compiler (recc). Recc is essentially like ccache, but uses the CAS service for caching compilation results.

After some initial brainstorming, some hacking, and some advice from Jürg, we were able to come up with this one line patch all in a matter of hours.

As visualized in the following approximate diagram, the patch for now simply unconditionally exposes a unix domain socket at a well known location in the build environment, allowing tooling in the sandbox to have access to the CAS service.

Aside from the simple patch, we generated the buildstream-recc-demo repository to showcase recc in action. The configuration for this in the sandbox is dead simple, as evidenced by the example.bst; building the example simply results in running the following command twice:

recc g++ -I. -c -o hello-time.o hello-time.cc

One can verify that this is working by inspecting the BuildStream logs of the element: indeed, the second compile (or any subsequent compile) results in the expected cache hit.

By itself, using recc in this context to build large source bases like WebKit should provide significant performance benefits, especially if the cache results can be pushed to a central / shared CAS and shared with peers.

Next steps

In order to bring this proof of concept to fruition, we’ll have to consider topics such as:

  • How to achieve the same thing in a remote execution context
    • Possibly this needs to be a feature for a remote worker to implement, and as such at first, BuildStream would probably only support the BuildGrid remote execution service implementation for this feature
  • Consider the logistics of sharing cached data with peers using remote CAS services
    • Currently this demo gives access to the local CAS service
  • Consider possible side effects when multiple different projects are using the same CAS, and whether we need to delegate any trust to the tools we run in our sandboxes:
    • Do tools like recc or bazel consider the entire build environment digest when composing the actions used to index build results / cache hits?
    • If not, could these tools be extended to be informed of a digest which uniquely identifies the build environment?
    • Could BuildStream or BuildBox act as a middle man in this transaction, such that we could automatically augment the actions used to index cached data to include a unique digest of the build environment?

The overarching story here is of course more work than our one line patch, but put in perspective it is not an immense amount of work. We’re hoping that our proof of concept is enough to rekindle the interest required to actually push us over the line here.

In closing

To summarize, I think it was great to meet with a completely different community at ApacheCon, and I would recommend it to anyone involved in FLOSS to diversify the projects and conferences they are involved in, as a learning experience, and also because, in a sense, we are all working towards very similar common goods.

Thanks to Codethink for sponsoring the BuildStream project’s presence at the conference; with the 2.0 release very close on the horizon, it should be an exciting time for the project.

October 13, 2022

GNOME 43: Endless’s Part In Its Creation

GNOME 43 is out, and as always there is lots of good stuff in there. (Me circa 2014 would be delighted to see the continuous improvements in GNOME’s built-in RDP support.) During this cycle, the OS team at Endless OS Foundation spent a big chunk of our time on other initiatives, such as bringing Endless Key to more platforms and supporting the Endless Laptop programme. Even so, we made some notable contributions to this GNOME release. Here are a few of them!

App grid pagination improvements

The Endless OS desktop looks a bit different to GNOME, most notably in that the app grid lives on the wallpaper, not behind it. But once you’re at the app grid, it behaves the same in both desktops. Endless OS computers typically have hundreds of apps installed, so it’s normal to have 2, 3, or more pages of apps.

We’ve learned from Endless OS users and partners that the row of dots at the bottom of the grid did not provide enough of a clue that there are more pages than the first. And when given a hint that more pages are available, indicated by those dots, users rarely discovered that they can switch with the scroll wheel or a swipe: they would instead click on those tiny dots. Tricky even for an accomplished mouse user!

GNOME 40 introduced an effect where moving the mouse to the edges of the screen would cause successive pages of apps to “peek” in. As we’ve carried out user testing on our GNOME 41-based development branch (more on this another time) we found that this was not enough: if you don’t know the other pages are there, there’s no reason to deliberately move your mouse pointer to the empty space at the edges of the screen.

So, we proposed for GNOME something similar to what we designed and shipped in Endless OS 4: always-visible pagination arrows. What we ended up implementing & shipping in GNOME 43 is a bit different to what we’d proposed, after several rounds of iteration with the GNOME design team, and I think it’s even better for it. Implementing this was also an opportunity to fix up many existing bugs in the grid, particularly when dragging and dropping between pages.

GNOME 43 app grid, showing a pagination arrow to the right-hand side

GNOME 43 app grid, showing next page peeking in while dragging-and-dropping an app

GNOME Software

43% of the code-changing commits between GNOME Software 42.0 and the tip of gnome-software main as of 29th September came from Endless – not bad, but still no match for Red Hat’s Milan Crha, who single-handedly wrote 46% of the commits in that range! (Counting commits is a crude measure, and excluding translation updates and merge commits overlooks significant, essential work; even with those caveats, I still think the number is striking.)

Many of our contributions in this cycle were part of the ongoing threading rework that Philip Withnall spoke about at GUADEC 2022, with the goals of improving performance, reducing memory usage, and eliminating hangs due to thread-pool exhaustion. Along the way, this included some improvements to the way that featured and Editor’s Choice apps are retrieved.

Several patches bearing Joaquim Rocha’s name and an Endless email address landed in this cycle, improving Software’s handling of apps queued for installation, despite Joaquim having moved on from Endless 3 years ago. These originally come from Endless’s fork of GNOME Software and date back to 2018, and made their way upstream as part of our quest to converge our fork with upstream. In related news, we recently rebased the Endless OS branch of Software onto the gnome-43 branch; we are down from 200+ patches a few years ago to 19. Nearly there!

At the start of this year, Phaedrus Leeds was contracted by the GNOME Foundation (funded by Endless Network) to reintroduce the ability to install and manage web apps with Software, even when GNOME Web is installed with Flatpak. This work was not quite ready for the GNOME 42 feature freeze, and landed in GNOME 43. I personally did a trivial amount of work to enable this feature in GNOME OS and add a few sites to the curated list, but as I write this post I have realised that these additions were not actually shipped in the GNOME Software 43.0 tarball. I did a bit of research into how we can expand this curated list without creating a tonne of extra work for our community of volunteers, but haven’t had a chance to write this up just yet.

Five "Picks from the Web" In GNOME Software 43

Looking to the future, Georges Stavracas has recently spent some time improving Software’s sysprof integration to help understand where Software is spending its time, and hence improve its perceived responsiveness. One of the first discoveries is that the majority of the delay before a category page becomes responsive is spent downloading app icons; making this asynchronous will make Software feel much snappier. Alas, the current approach for fixing this will change Software’s plugin API, so will have to wait for GNOME 44. I’m sure that with decent profiler integration and enough eyes on the profiler, we’ll be able to find more cases like this.

GTK 4-flavoured Initial Setup

Serial GTK 4 porter Georges Stavracas ported Initial Setup to GTK 4. Since Initial Setup uses libmalcontent-ui to implement its parental controls pages, he also ported the Parental Controls app to GTK 4.

"About You" page in GNOME Initial Setup 43. Full name: Michael Banyan. Username: bovine poet laureate

Parental controls in GNOME 43

This port was a direct update of the existing UI to a new toolkit version, only adopting new widgets like GtkPasswordEntry and AdwPreferencesPage where it was trivial to do so. Designs exist for a refreshed Initial Setup interface – anyone interested in picking this up?

Initial Setup has a remarkably large dependency graph, which made this update trickier than it might otherwise have been. I made a start back in January for GNOME 42 and dealt with some of the easier library changes, but more traps awaited:

  • Initial Setup depends on goa-backend-1.0, the bit of GNOME Online Accounts that actually has a user interface (which uses WebKit). This is, for now, GTK 3 only. The solution Georges used here was the same as he used in GNOME Settings: move it into a separate process.
  • Next up, Initial Setup itself uses WebKit (to show the Mozilla Location Service terms of service and abrt privacy policy). The GTK 4 port of WebKit was not widely available in distros at the time of the port. As a result, Initial Setup’s GitLab CI switched from Fedora to Arch. It also means that Initial Setup has a transitive dependency on libsoup 3…
  • Malcontent uses libflatpak; until recently, libflatpak had a hard dependency on libsoup 2.4 with no libsoup 3 port in evidence. So Initial Setup would transitively link to both libsoup 2.4 and 3, and abort on startup, if parental controls were enabled. Happily, a libcurl backend appeared in libflatpak 1.14, and libostree already had a libcurl backend, so if your distro configures both of those to use libcurl then parental controls can be safely enabled in Initial Setup 43.

It’s interesting to me that libsoup’s API changes have caused several GNOME-adjacent projects not to migrate to the new API, but to de-facto move away from libsoup. Hindsight is 20/20, etc. There is a draft pull request to build ostree against libsoup 3, so perhaps they will return to the soup tureen in due course.

Behind the scenes / friends of GNOME

Not all heroes wear capes, and not all contributions are as visible as others. Our team continues to co-maintain countless GNOME and GNOME-adjacent modules, and fix tricky problems at their source, such as this file-descriptor leak Georges caught in libostree.

We’ve been involved in GNOME design discussions, with Cassidy James Blaede (who joined Endless earlier this year) joining the Design team. We helped reach consensus for the new Quick Settings design, and are continuing to be involved in future design initiatives.

Someone recently asked Cassidy on Twitter whether it is true that GNOME OS is “basically just a modified version of Endless OS”, as they had heard. It’s not! But you could probably consider them second cousins once removed, and they have enough in common that improvements flow both ways. GNOME OS uses eos-updater, our libostree-based daemon that downloads and installs OS updates (and does some other stuff that GNOME OS doesn’t use). A while back, Dan Nicholson taught eos-updater how to not lose changes in /etc in the time between an update being installed, and it being booted. (Which can be pretty bad! Entire users can be lost this way!) But we found that this libostree feature interacted poorly with the way /boot is automounted on systems that use systemd-boot, so the change was disabled on such systems. More recently, Dan fixed libostree to work correctly in this case, so eos-updater can now correctly preserve changes in /etc. GNOME OS uses systemd-boot, so in due course the fixed libostree and eos-updater will appear there and this problem you probably didn’t know you have will be fixed.

And since my last post, Jian-Hong Pan updated TurtleBlocks on Flathub to the GNOME 42 runtime, dealing with another of the long tail of Flathub apps on end-of-lifed runtime versions. Sadly it fails to build on the GNOME 43 runtime due to an apparent setuptools regression, but 42 has another 6 months in it yet.

I could go on, and I’m sure there are things my fickle memory has overlooked, but for now: onwards!

October 10, 2022

Post Collapse Computing Part 2: What if we Fail?

This is a lightly edited version of my GUADEC 2022 talk, given at c-base in Berlin on July 21, 2022. Part 1 briefly summarizes the horrors we’re likely to face as a result of the climate crisis, and why civil resistance is our best bet to still avoid some of the worst-case scenarios. Trigger Warning: Very depressing facts about climate and societal collapse.

While I think it’s critical to use the next few years to try and avert the worst effects of this crisis, I believe we also need to think ahead and consider potential failure scenarios.

What would it mean if we fail to force our governments to enact the necessary drastic climate action, both for society at large but also very concretely for us as free software developers? In other words: What does collapse mean for GNOME?

In researching the subject I discovered that there’s actually a discipline studying questions like this, called “Collapsology”.

Collapsology studies the ways in which our current global industrial civilization is fragile and how it could collapse. It looks at these systemic risks in a transdisciplinary way, including ecology, economics, politics, sociology, etc., because all of these aspects of our society are interconnected in complex ways. I’m far from an expert on this topic, so I’m leaning heavily on the literature here, primarily Pablo Servigne and Raphaël Stevens’ book How Everything Can Collapse (translated from the French original).

So what does climate collapse actually look like? What resources, infrastructure, and organizations are most likely to become inaccessible, degrade, or collapse? In a nutshell: Complex, centralized, interdependent systems.

There are systems like that in every part of our lives of course, from agriculture, to pharma, to energy production, and of course electronics. Because this talk’s focus is specifically the impact on free software, I’ll dig deeper on a few areas that affect computing most directly: Supply chains, the power grid, the internet, and Big Tech.

Supply Chains

As we’ve seen repeatedly over the past few years, the supply chains that produce and transport goods across the globe are incredibly fragile. During the first COVID lockdowns it was toilet paper, then we got the chip shortage affecting everything from Play Stations to cars, and more recently a baby formula shortage in the US, among others. To make matters worse, many industries have moved to just-in-time manufacturing over the past decades, making them even less resilient.

Now add to that more and more extreme natural disasters disrupting production and transport, wars and sanctions disrupting trade, and financial crises triggered or exacerbated by some of the above. It’s not hard to imagine goods that are highly dependent on global supply chains becoming prohibitively expensive or just impossible to get in parts of the world.

Computers are one of the most complex things manufactured today, and therefore especially vulnerable to supply chain disruption. Without a global system of resource extraction, manufacturing, and trade there’s no way we can produce chips anywhere near the current level of sophistication. On top of that chip supply chains are incredibly centralized, with most of global chip production being controlled by a single Taiwanese company, and the machines used for that production controlled by a single Dutch company.

Power Grid

Access to an unlimited amount of power, at any time, for very little money, is something we take for granted, but probably shouldn’t. In addition to disruptions by extreme weather events one important factor here is that in an ever-hotter world, air conditioning starts to put an increasing amount of strain on the power grid. In parts of the global south this is one of the reasons why power outages are a daily occurrence, and having power all the time is far from guaranteed.

In order to do computing we of course need power, not only to run/charge our own devices, but also for the data centers and networking infrastructure running a lot of the things we’re connecting to while using those devices.

Which brings us to our next point…

Internet

Having a reliable internet connection requires a huge amount of interconnected infrastructure, from undersea cables, to data centers, to the local cable infrastructure that goes to your neighborhood, and ultimately your router or a nearby cellular tower.

All of this infrastructure is at risk of being disrupted by sea level rise and extreme weather, taken over by political actors wanting to control the flow of information, abandoned by companies when it becomes unprofitable to operate in a certain area due to frequent extreme weather, and so on.

Big Tech

Finally, at the top of the stack there’s the actual applications and services we use. These, too, have become ever more centralized and fragile at all levels over the past decades.

At the most basic level there’s OS updates and app stores. There are billions of iOS devices out there that are literally unable to get security updates or install new software if they lose access to Apple’s servers. Apple collapsing seems unlikely in the short term, but, for example, what if they stop doing business in your country because of sanctions?

We used to warn about lock-in to proprietary software and formats, but at least Photoshop CS2 continues to run on your computer regardless of what happens to the company. With Figma et al you can not only not access your existing files anymore if the server isn’t accessible, you can’t even create new ones.

In order to get a few nice sharing and collaboration features people are increasingly just running all software in the cloud on someone else’s computer, whether it’s Google Slides for presentations, SketchUp for 3D modeling, Notion for note taking, Figma for design, and even games via game streaming services like Stadia.

From a free software perspective another particularly risky point of corporate centralization is Github, given that a huge number of important projects are hosted there. Even if you’re not actively using it yourself for development, you’re almost certainly depending on other projects hosted on Github. If something were to happen to it… yikes.

Failure Scenarios

So to summarize, this is a rough outline of a potential failure scenario, as applied to computing:

  • No new hardware: It's difficult and expensive to get new devices because few or none are being made, or they're not being sold where you live.
  • Limited power: There’s power some of the time, but only a few hours a day or when there’s enough sun for your solar panels. It’s likely that you’ll want to use it for more important things than powering computers though…
  • Limited connectivity: There’s still a kind of global internet, but not all countries have access to it due to both degraded infrastructure and geopolitical reasons. You’re able to access a slow connection a few times a month, when you’re in another town nearby.
  • No cloud: Apple and Google still exist, but because you don’t have internet access often enough or at sufficient speeds, you can’t install new apps on your iOS/Android devices. The apps you do have on them are largely useless since they assume you always have internet.

This may sound like an unrealistically dystopian scenario, until you realize: Parts of the global south are experiencing this today. Of course a collapse of these systems at the global level would have a lot of other terrible consequences, but I think seeing the global south as a kind of preview of where everyone else is headed is a helpful reference point.

A Smaller World

The future is of course impossible to predict, but in all likelihood we’re headed for a world where everything is a lot more local, one way or the other. Whether by choice (to reduce emissions and be more resilient), or through a full-on collapse, our way of life is going to change drastically over the next decades.

The future we're looking at is likely to be a lot more disconnected in terms of the movement of goods and people, as well as information. This will necessitate producing things locally, with whatever resources are available locally. Given the complexity of most supply chains, this means many things we build today probably won't be produced at all anymore, so there will need to be a lot more repair, and a lot less consumption.

Above all though, this will necessitate much stronger communities at the local level, working together to keep things running and make life liveable in the face of the catastrophes to come.

To be Clear: Fuck Nazis

When discussing apocalyptic scenarios like these I think a lot of people’s first point of reference is the Hollywood version of collapse – People out for themselves, fighting for survival as rugged individuals. There are certain types of people attracted by that who hold other reprehensible views, so when discussing topics like preparing for collapse it’s important to distance oneself from them.

That said, individual prepping is also not an effective strategy, because real life is not a Hollywood movie. In crisis scenarios mutual aid is just as natural a response for people as selfishness, and it's a much better approach to actually surviving longer-term. Resilient communities of people helping each other are our best bet to withstand whatever worst case scenarios might be headed our way.

We’ll Still Need Computers…

If this future comes to pass, how to do computing will be far from our biggest concern. Having enough food, drinkable water, and the other necessities of life is likely to be higher on our priority list. However, there will definitely be important things that we will need computers for.

The thing to keep in mind is that we're not talking about the far future here: The buildings, roads, factories, fields, etc. we'll be working with in this future are basically what we have today. The factories where we're currently building BMWs are not going away overnight, even if no BMWs are being built. Neither are the billions of Intel laptops and mid-range Android phones currently in use, even if they'll be slow and won't get updates anymore.

So what might we need computers for in this hyper-local, resource-constrained future?

Information Management

At the most basic level, a lot of our information is stored primarily on computers today, and using computers is likely to remain the most efficient way to access it. This includes everything from teaching materials for schools, to tutorials for DIY repairs, books, scientific papers, and datasheets for electronics and other machines.

The same goes for any kind of calculation or data processing. Computers are of course good at the calculations needed for construction/engineering (that’s kind of what they were invented for), but even things like spreadsheets, basic scripting, or accounting software are orders of magnitude more efficient than doing the same things without a computer.

Local Networking

We're used to networking always meaning "access to the entire internet", but that's not the only way to do networking – our existing computers are perfectly capable of talking to each other on a local network at the level of a building or town, with no connection to a global internet.

There are lots of examples of potential use cases for local-only networking and communication, e.g. city-level mesh networks, or low-connectivity chat apps like Briar.
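
As a minimal illustration of how little is needed, here is a hedged sketch of sharing files between two machines on the same LAN with no internet involved at all; the IP address and port are placeholders:

# On the machine that has the files (built-in Python web server):
python3 -m http.server 8080 --bind 0.0.0.0

# On any other machine on the same local network:
curl http://192.168.1.50:8080/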

Reuse, Repair, Repurpose

Finally, there's a ton of existing infrastructure and machinery that needs computers in order to run, be repaired, or be repurposed, including farm equipment, medical devices, public transit, and industrial tools.

I'm assuming – but this is conjecture on my part, it's really not my area of expertise – that the machines we're currently using to build cars and planes could be repurposed to make something more useful, something that can still be built with locally available resources in this future.

…Running Free Software?

As we’ve already touched on earlier, the centralized nature of proprietary software means it’s inherently less resilient than free software. If the company building it goes away or doesn’t sell you the software anymore, there’s not much you can do.

Given all the risks discussed earlier, it’s possible that free software will therefore have a larger role in a more localized future, because it can be adapted and repaired at the local level in ways that are impossible with proprietary software.

Assumptions to Reconsider?

However, while free software has structural advantages that make it more resilient than proprietary software, there are problematic aspects of current mainstream technology culture that affect us, too. Examples of assumptions that are pretty deeply ingrained in how most modern software (including free software) is built include:

  • Fast internet is always available, offline/low-connectivity is a rare edge case, mostly relevant for travel
  • New, better hardware is always around the corner and will replace the current hardware within a few years
  • Using all the resources available (CPU, storage, power, bandwidth) is fine

Assumptions like these manifest in many subtle ways in how we work and what we build.

Dependencies and Package Managers

Over the past decade, language-specific package managers such as npm and crates.io have taken off in an unprecedented way, leading to software with larger and more complex dependency graphs than ever before. This is the dominant paradigm for building software today; newer languages all come with their own built-in package manager.

However, just like physical supply chains, more complex dependency graphs are also less resilient. More dependencies, especially with pinned versions and no caching between projects, mean huge downloads and long build times when building software locally, resulting in lots of bandwidth, power, and disk space being used. Fully offline development is basically impossible, because every project you build needs to download its own specific version of every dependency.

It’s possible to imagine some kind of cross-project shared local dependency cache for this, but to my knowledge no language ecosystem is doing this by default at the moment.
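
To make the offline-development point concrete, here is a hedged sketch of the kind of per-project workaround this currently requires, using Cargo's vendoring support as one example; the project layout and paths are placeholders:

# Copy every dependency into ./vendor and capture the matching [source]
# configuration that cargo vendor prints to stdout:
mkdir -p .cargo
cargo vendor > .cargo/config.toml

# From now on the project can be built without any network access:
cargo build --offline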

Web-Based Tooling

Core parts of the software development workflow are increasingly moving to web-based tools, especially around code forges like Github or Gitlab. Issue management, merge requests, CI, releases, etc. all happen on these platforms, which are primarily or exclusively used via very, very slow websites. It's hard to overstate this: Code forges are among the slowest, shittiest websites out there, basically unusable unless you have a fast connection.

This is, of course, not resilient at all and a huge problem given that we rely on these tools for many of our key workflows.

Cloud Storage & Streaming

As already discussed, relying on data centers is problematic on a number of levels, but in practice most people (even in the free software community) have embraced cloud services in some areas, at least at a personal level.

Instead of local photo, music, and movie collections, many of us just use Google Photos, Spotify, and Netflix nowadays, which of course affects which kinds of apps are being built. For example, there are no modern, actively developed apps to manage your photo collection locally anymore, but we do have a nice, modern Spotify client.

Global Community Without the Internet?

Scariest of all, I think, is imagining free software development without the internet. This movement came into existence and grew alongside the global internet in the 80s and 90s, and it’s almost impossible to imagine what it could look like without it.

Maybe the movement as a whole, as well as individual projects, would splinter into smaller, local versions in the regions that are still connected? But would there be a sufficient amount of expertise in each of those regions? Would development at any real scale just stop, with people only doing small repairs to what they have at the local level?

I don’t have any smart answers here, but I believe it’s something we really ought to think about.

This was part two of a four-part series. In part 3 we’ll look at concrete ideas and examples of things we can work towards to make our software more resilient.

How I Started Programming, and How You Can Too

Introduction

I am writing this article on my birthday to give my thanks and appreciation to those who helped me start and continue my journey with programming. I want to return the favor by explaining how I got started, for those who are struggling to get into programming, and to give them some motivation to continue their journey.

I had a lot of trouble getting started with programming. About 6 years ago, I tried reading free books and documentation online all by myself. The complications and assumptions in these resources caused me to lose motivation very quickly. I retried this approach every couple of months, but the results were consistent – it always ended with me giving up without making much progress.

Last year, I tried a completely different approach: taking a course from Harvard University, and then contributing to free and open source projects. This approach was really effective and got me to a point where I joined the Bottles project, one of the most popular applications on Flathub. Later, I became a member of the GNOME Foundation, a leading organization on the Linux desktop.

Online Courses From Known Institutions

The very first step I suggest is to take an online course from a well-known institution, by listening to lectures and completing assignments. I chose CS50’s Introduction to Computer Science, by Harvard University. The course itself, learning materials, assignments and grading are entirely free of cost, but you can optionally pay for a certification.

The goal of this course is to provide resources and explain the fundamentals of programming to those who have no prior experience. It is self-paced: you can get started at any time, and you can carry your assignments over to the following year to continue your work without starting over. CS50 consists of 11 classes, from week 0 to week 10; for simplicity, a week can be thought of as an "episode". Each week has at least one assignment.

Bear in mind that the course is challenging and time consuming. However, CS50 is active on several platforms, so you can ask the CS50 community for help. In my experience, the community was really friendly, answered my questions very clearly, and responded quickly.

I struggled a lot with CS50 because of its challenging assignments, but I noticed that it was the only time I had spent more than a month on programming without giving up at all. In my opinion, this challenge is worth accepting, and I highly recommend starting with CS50, as there's a big community contributing to it.

Opening up the Possibilities

After completing weeks 0 through 8 of CS50, and with the knowledge and experience gained from those weeks and assignments, I suggest contributing to free and open source projects that interest you, especially ones that you regularly use. It took me roughly 4 months to complete weeks 0 through 8, as I struggle with concentration.

You can contact the maintainer of your preferred project and ask if there are issues that need to be addressed, such as bugs or feature proposals. If there is an issue that you'd like to tackle, you can ask the maintainer to assign you to it and to provide guidance if need be. I decided to contribute to Bottles, which later helped me branch out to other projects.

Bottles

I had been interested in Bottles for a long time, as I used it a lot (and still do), so it was my first pick. I noticed that the project was sophisticated and beyond my level of comprehension, but instead of postponing my contributions to Bottles, I attacked it head on. I suggest adopting a similar mentality, unless you firmly believe that the project in question is too sophisticated. This was when I met Mirko Brombin, the founder and maintainer of Bottles.

My first code contributions to Bottles were typo fixes. After some time, I got the idea to create a dialog for vkBasalt as a feature proposal, as I wanted to learn more about vkBasalt and felt it would be really useful for Bottles. Since I had very little knowledge of Bottles's codebase, I asked many questions, probably more than 100. Mirko tutored me through this subproject, which later became my very first major contribution to the project.

After two months of hard work, my massive 1000+ line contribution was finally merged into Bottles and later featured in an announcement! Since then, I have continued to contribute to Bottles, and Mirko has become a good friend of mine.

“Help Me Help You”

I call this method "help me help you": I ask the maintainer of a project to help me make my first contributions, so that I can help them back with any kind of contribution, be it code, quality assurance, documentation, etc. It is really effective, as it got me to understand Bottles, Python and GTK, and I heavily encourage others to try it.

Other Projects

With the knowledge I gained from my experience with Bottles, I started contributing to other GTK projects. I ported Fractal and Workbench to the new About window (AdwAboutWindow). At the time of writing this article, I am helping rework the user interface of Tubefeeder.

Why Did I Fail Initially?

My biggest struggle with programming wasn't even programming itself, but my mentality, and I feel like this is the case for many people who struggle with programming. For a long time, I had the mindset of doing everything myself, without asking for help. I had a fear of asking stupid questions and of people making fun of me, so I avoided both for 6 years.

However, I didn't notice the harm this caused to my mental health and my opportunities. Unfortunately, that ignorance cost me 6 years of my life. If you are afraid of asking stupid questions, then my advice to you is this: the only thing more stupid than asking a stupid question is not asking it at all, so ask your damn question. By avoiding potentially stupid questions, you are actively preventing yourself from asking good ones, which can seriously hinder your learning. I urge people not to repeat my mistake.

If your "tutor" makes fun of you or is condescending to you, then I advise you to politely ask them to stop. After all, maintainers are human too, so this kind of behavior can happen; giving them another chance can be beneficial to the long-term relationship between maintainers and potential contributors. If they keep up this attitude and don't respect your request, then I'd argue they have lost a valuable potential contributor. In that case, I suggest contributing to another project instead. Don't let such an incident stop you; use it as a way to learn to deal with similar situations in the future.

Conclusion

I struggled with programming for a really long time, due to my ignorance and fears. Every decision you make in your life comes with at least one compromise.

At the beginning, I prioritized comfort and, without noticing it, compromised on opportunities. Last year, I did the opposite, which not only increased my opportunities to learn new things, but in turn made me really comfortable with the outcome.

When you make a decision, try to look at what you are compromising on. See if that compromise is worth addressing, and address it if it is. Ask for help when you are certain you need it – and be prepared to be made fun of, so you can continue your journey without it negatively affecting your future.

I want to thank Mirko Brombin, the staff at CS50 and the community for guiding me through my journey in programming. I hope this gives some motivation to anyone who is struggling with programming, and prevents potential learners from repeating my mistakes.


Edit 1: Improve sentences and clarity

Trying out Zola

For nearly two years I have been inactive on my blog, despite having spent time making a fancy website, because I can no longer afford the extra code to maintain and the infrastructure work to keep it running. So, while waiting for CI to pass during the gtk-rs hackfest, I decided to move the posts I had on the old website to a statically generated one.

One of the annoyances I had with static website generators is that they were too slow. Add to that the fact that Jekyll is written in Ruby, and that it was too difficult to get any rubygem installed on my machine.

Enter Zola

Per the project description, Zola is "a fast static site generator in a single binary with everything built-in".

Installation

The installation process is pretty easy using cargo:

cargo install zola

You could also use the pre-built binaries if you want to avoid waiting for everything to compile locally; see https://www.getzola.org/documentation/getting-started/installation/.

Once Zola is installed, you can initialize a new project:

zola init my-new-blog

To run the web server and test your changes:

zola serve

Then you can either write your own template/theme based on your needs, or, if like me you want something that just works, you will probably be well served by https://www.getzola.org/themes/.

Writing process

That depends on the theme you end up picking; in the case of Serene, all I had to do was create a blog directory inside content and add a Markdown file for each post.
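
For reference, creating a new post boils down to something like the following; the file name and front matter values are just examples, and the exact keys your theme expects may differ:

mkdir -p content/blog
cat > content/blog/hello-zola.md << 'EOF'
+++
title = "Hello, Zola"
date = 2022-10-06
+++

The body of the post, written in plain Markdown.
EOF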

A very helpful feature I noticed while porting my articles to Markdown, one that validates all the links in your posts, is

zola check

Publishing

The simplest way of publishing a static website nowadays is to use something like Github/Gitlab Pages. The documentation has you covered, as it includes the YAML recipes for the popular services out there: https://www.getzola.org/documentation/deployment/overview/.

However, when using the Github recipe, it seems there is an error in the documentation, and you will have to make the following change for it to work:

-           TOKEN: ${{ secrets.GITHUB_TOKEN }}
+           TOKEN: $GITHUB_ACTOR:${{ secrets.GITHUB_TOKEN }}

Conclusion

As you might have noticed, I managed to migrate all the content I had on the old website a few hours after installing Zola for the first time. The whole process is pretty smooth if you are familiar with static website generators.

If you are looking for a Jekyll/Hugo replacement, I highly recommend you to give Zola a try.

October 06, 2022

Using cppfront with Meson

Recently Herb Sutter published cppfront, which is an attempt to give C++ a new syntax that fixes many issues that can't be changed in existing C++ because of backwards compatibility. Like with the original cfront compiler, cppfront works by parsing the "new syntax" C++ and transpiling it to "classic" C++, which is then compiled in the usual way. These kinds of source generators are fairly common (it is basically how Protobuf et al work), so let's look at how to add support for this in Meson. We are also going to download and build the cppfront compiler transparently.

Building the compiler

The first thing we need to do is to add Meson build definitions for cppfront. It's basically this one file:

project('cppfront', 'cpp', default_options: ['cpp_std=c++20'])

cppfront = executable('cppfront', 'source/cppfront.cpp',
  override_options: ['optimization=2'])

meson.override_find_program('cppfront', cppfront)
cpp2_dep = declare_dependency(include_directories: 'include')

The compiler itself is in a single source file, so building it is simple. The only thing to note is that we override settings so it is always built with optimizations enabled. This is acceptable for this particular case because the end result is not used for development, only consumption. The more important bits for integration purposes are the last two lines, where we define that from now on, whenever someone does a find_program('cppfront'), Meson does not do a system lookup for the binary but instead returns the just-built executable object. Code generated by cppfront requires a small amount of helper functionality, which is provided as a header-only library. The last line defines a dependency object that carries this information (basically just the include directory).

Building the program

The actual program is just a helloworld. The Meson definition needed to build it is this:

project('cpp2hello', 'cpp',
    default_options: ['cpp_std=c++20'])

cpp2_dep = dependency('cpp2')
cppfront = find_program('cppfront')

g = generator(cppfront,
  output: '@BASENAME@.cpp',
  arguments: ['@INPUT@', '-o', '@OUTPUT@']
  )

sources = g.process('sampleprog.cpp2')

executable('sampleprog', sources,
   dependencies: [cpp2_dep])

That's a bit more code but still fairly straightforward. First we get the cppfront program and the corresponding dependency object. Then we create a generator that translates cpp2 files to cpp files, give it some input and compile the result.

Gluing it all together

Each one of these is its own isolated repo (available here and here respectively). The simple thing would have been to put both of these in the same repository but that is very inconvenient. Instead we want to write the compiler setup once and use it from any other project. Thus we need some way of telling our app repository where to get the compiler. This is achieved with a wrap file:

[wrap-git]
directory=cppfront
url=https://github.com/jpakkane/cppfront
revision=main

[provide]
cpp2 = cpp2_dep
program_names = cppfront

Placing this in the consuming project's subprojects directory is all it takes. When you start the build and try to look up either the dependency or the executable name, Meson will see that they are provided by the referenced repo and will clone, configure and build it automatically:

The Meson build system
Version: 0.63.99
Source dir: /home/jpakkane/src/cpp2meson
Build dir: /home/jpakkane/src/cpp2meson/build
Build type: native build
Project name: cpp2hello
Project version: undefined
C++ compiler for the host machine: ccache c++ (gcc 11.2.0 "c++ (Ubuntu 11.2.0-19ubuntu1) 11.2.0")
C++ linker for the host machine: c++ ld.bfd 2.38
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /usr/bin/pkg-config (0.29.2)
Found CMake: /usr/bin/cmake (3.22.1)
Run-time dependency cpp2 found: NO (tried pkgconfig and cmake)
Looking for a fallback subproject for the dependency cpp2

Executing subproject cppfront 

cppfront| Project name: cppfront
cppfront| Project version: undefined
cppfront| C++ compiler for the host machine: ccache c++ (gcc 11.2.0 "c++ (Ubuntu 11.2.0-19ubuntu1) 11.2.0")
cppfront| C++ linker for the host machine: c++ ld.bfd 2.38
cppfront| Build targets in project: 1
cppfront| Subproject cppfront finished.

Dependency cpp2 from subproject subprojects/cppfront found: YES undefined
Program cppfront found: YES (overridden)
Build targets in project: 2

As you can tell from the logs, Meson first tries to find the dependencies from the system and only after it fails does it try to download them from the net. (This behaviour can be altered.) Now the code can be built and the end result run:

$ build/sampleprog
Cpp2 compilation is working.

The code has only been tested with GCC but in theory it should work with Clang and VS too.

Cloud desktops aren't as good as you'd think

Fast laptops are expensive, cheap laptops are slow. But even a fast laptop is slower than a decent workstation, and if your developers want a local build environment they're probably going to want a decent workstation. They'll want a fast (and expensive) laptop as well, though, because they're not going to carry their workstation home with them and obviously you expect them to be able to work from home. And in two or three years they'll probably want a new laptop and a new workstation, and that's even more money. Not to mention the risks associated with them doing development work on their laptop and then drunkenly leaving it in a bar or having it stolen or the contents being copied off it while they're passing through immigration at an airport. Surely there's a better way?

This is the thinking that leads to "Let's give developers a Chromebook and a VM running in the cloud". And it's an appealing option! You spend far less on the laptop, and the VM is probably cheaper than the workstation - you can shut it down when it's idle, you can upgrade it to have more CPUs and RAM as necessary, and you get to impose all sorts of additional neat security policies because you have full control over the network. You can run a full desktop environment on the VM, stream it to a cheap laptop, and get the fast workstation experience on something that weighs about a kilogram. Your developers get the benefit of a fast machine wherever they are, and everyone's happy.

But having worked at more than one company that's tried this approach, my experience is that very few people end up happy. I'm going to give a few reasons here, but I can't guarantee that they cover everything - and, to be clear, many (possibly most) of the reasons I'm going to describe aren't impossible to fix, they're simply not priorities. I'm also going to restrict this discussion to the case of "We run a full graphical environment on the VM, and stream that to the laptop" - an approach that only offers SSH access is much more manageable, but also significantly more restricted in certain ways. With those details mentioned, let's begin.

The first thing to note is that the overall experience is heavily tied to the protocol you use for the remote display. Chrome Remote Desktop is extremely appealing from a simplicity perspective, but is also lacking some extremely key features (eg, letting you use multiple displays on the local system), so from a developer perspective it's suboptimal. If you read the rest of this post and want to try this anyway, spend some time working with your users to find out what their requirements are and figure out which technology best suits them.

Second, let's talk about GPUs. Trying to run a modern desktop environment without any GPU acceleration is going to be a miserable experience. Sure, throwing enough CPU at the problem will get you past the worst of this, but you're still going to end up with users who need to do 3D visualisation, or who are doing VR development, or who expect WebGL to work without burning up every single one of the CPU cores you so graciously allocated to their VM. Cloud providers will happily give you GPU instances, but that's going to cost more and you're going to need to re-run your numbers to verify that this is still a financial win. "But most of my users don't need that!" you might say, and we'll get to that later on.

Next! Video quality! This seems like a trivial point, but if you're giving your users a VM as their primary interface, then they're going to do things like try to use Youtube inside it because there's a conference presentation that's relevant to their interests. The obvious solution here is "Do your video streaming in a browser on the local system, not on the VM" but from personal experience that's a super awkward pain point! If I click on a link inside the VM it's going to open a browser there, and now I have a browser in the VM and a local browser and which of them contains the tab I'm looking for WHO CAN SAY. So your users are going to watch stuff inside their VM, and re-compressing decompressed video is going to look like shit unless you're throwing a huge amount of bandwidth at the problem. And this is ignoring the additional irritation of your browser being unreadable while you're rapidly scrolling through pages, or terminal output from build processes being a muddy blur of artifacts, or the corner case of "I work for Youtube and I need to be able to examine 4K streams to determine whether changes have resulted in a degraded experience" which is a very real job and one that becomes impossible when you pass their lovingly crafted optimisations through whatever codec your remote desktop protocol has decided to pick based on some random guesses about the local network, and look everyone is going to have a bad time.

The browser experience. As mentioned before, you'll have local browsers and remote browsers. Do they have the same security policy? Who knows! Are all the third party services you depend on going to be ok with the same user being logged in from two different IPs simultaneously because they lost track of which browser they had an open session in? Who knows! Are your users going to become frustrated? Who knows oh wait no I know the answer to this one, it's "yes".

Accessibility! More of your users than you expect rely on various accessibility interfaces, be those mechanisms for increasing contrast, screen magnifiers, text-to-speech, speech-to-text, alternative input mechanisms and so on. And you probably don't know this, but most of these mechanisms involve having accessibility software be able to introspect the UI of applications in order to provide appropriate input or expose available options and the like. So, I'm running a local text-to-speech agent. How does it know what's happening in the remote VM? It doesn't, because it's just getting an a/v stream, so you need to run another accessibility stack inside the remote VM and the two of them are unaware of each other's existence and this works just as badly as you'd think. Alternative input mechanism? Good fucking luck with that, you're at best going to fall back to "Send synthesized keyboard inputs" and that is nowhere near as good as "Set the contents of this text box to this unicode string" and yeah I used to work on accessibility software maybe you can tell. And how is the VM going to send data to a braille output device? Anyway, good luck with the lawsuits over arbitrarily making life harder for a bunch of members of a protected class.

One of the benefits here is supposed to be a security improvement, so let's talk about WebAuthn. I'm a big fan of WebAuthn, given that it's a multi-factor authentication mechanism that actually does a good job of protecting against phishing, but if my users are running stuff inside a VM, how do I use it? If you work at Google there's a solution, but that does mean limiting yourself to Chrome Remote Desktop (there are extremely good reasons why this isn't generally available). Microsoft have apparently just specced a mechanism for doing this over RDP, but otherwise you're left doing stuff like forwarding USB over IP, and that means that your USB WebAuthn no longer works locally. It also doesn't work for any other type of WebAuthn token, such as a bluetooth device, or an Apple TouchID sensor, or any of the Windows Hello support. If you're planning on moving to WebAuthn and also planning on moving to remote VM desktops, you're going to have a bad time.

That's the stuff that comes to mind immediately. And sure, maybe each of these issues is irrelevant to most of your users. But the actual question you need to ask is what percentage of your users will hit one or more of these, because if that's more than an insignificant percentage you'll still be staffing all the teams that dealt with hardware, handling local OS installs, worrying about lost or stolen devices, and the glorious future of just being able to stop worrying about this is going to be gone and the financial benefits you promised would appear are probably not going to work out in the same way.

A lot of this falls back to the usual story of corporate IT - understand the needs of your users and whether what you're proposing actually meets them. Almost everything I've described here is a corner case, but if your company is larger than about 20 people there's a high probability that at least one person is going to fall into at least one of these corner cases. You're going to need to spend a lot of time understanding your user population to have a real understanding of what the actual costs are here, and I haven't seen anyone do that work before trying to launch this and (inevitably) going back to just giving people actual computers.

There are alternatives! Modern IDEs tend to support SSHing out to remote hosts to perform builds there, so as long as you're ok with source code being visible on laptops you can at least shift the "I need a workstation with a bunch of CPU" problem out to the cloud. The laptops are going to need to be more expensive because they're also going to need to run more software locally, but it wouldn't surprise me if this ends up being cheaper than the full-on cloud desktop experience in most cases.
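
Even without IDE support, the crude version of this is just a couple of commands; the host name and paths below are placeholders:

# Sync the working tree to a beefier machine and run the build there:
rsync -a ./ devbox:src/myproject/
ssh devbox 'cd src/myproject && make -j"$(nproc)"'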

Overall, the most important thing to take into account here is that your users almost certainly have more use cases than you expect, and this sort of change is going to have direct impact on the workflow of every single one of your users. Make sure you know how much that's going to be, and take that into consideration when suggesting it'll save you money.


October 05, 2022

2022-10-05 Wednesday

  • Up early, off to the bUm for the COOL Days technical day. Lots of interesting lightning talks back to back, and attendees. Missed some due to writing my own talks during other people's:
    COOL days lightning talk on zstd compression wins
    COOL days lightning talk on image delta wins

October 04, 2022

2022-10-04 Tuesday

  • Up early, worked on slides some more. Off to the Nextcloud Enterprise Day; lovely venue. Pleased to give a supportive COOL / partner presentation - and then disappeared to our (unfortunately parallel) partner event nearby.
  • Good to see Eloy & Naomi doing a great job getting that set up and running, missed Peter though. Some presentations, and got lots of interesting feedback from partners & customers - including how to improve next time. Out for dinner with the remainder, and back to the Radisson with its indoor aquarium over the bar until late to catch up with the Nextcloud staff & customers.

My journey and a beginner's guide to Open Source

Namaste, Everyone!

For quite a while I've been receiving multiple questions regarding how to start contributing to open source, how to get into GSoC, my journey, etc...

After repeating myself again and again, I'm writing this blog to answer some of the most asked questions 😃

Let's start reading.

My Journey -

How did I start?

Well, I don't have a definite date; you can assume the day I started coding to be my start :P I started by making some personal websites. They were pretty basic, but they encouraged me to achieve more.

I soon learned about web development, both backend and frontend. My main language until then was Java, but thanks to frameworks like Django and Flask, I became more accustomed to Python instead. Java is still my main language for anything other than project development (CP folks hail 😛 ) 😄

How did I start with Open Source?

I started with my own projects. I learned there is a platform called GitHub, where I posted the very basic sites I had made (they are gone now, you don't want to see them lol). They were nothing serious, but they gave me some idea of how Git works, even though my account stayed mostly dormant. I then did courses like CS50 and CS50w which required Git, and that made me more comfortable using it.

I then worked on a project which didn't solve any problem but strengthened my skills: my own Penguin-based OS, called Aryan Linux, made using "Linux From Scratch". On compiling it, I understood what open source really is, and it made me so happy.

Aryan Linux

Then I finally made the switch to Penguin (Linux, in common folks' terms). I used to run it in a VM, but then I realized that I could use it without issues as my main OS. It was hard to convince my brain to make the switch, and it took months of consideration to finally dual boot my system, but well, it was one of the best decisions of my life.

I started with Garuda Linux (an Indian distro :D ), and although it was great, the theme started to poke my eyeballs. So I decided to do an experiment; switch back to Windows? Easy peasy 😄 Instead, I took a theme that I liked but whose color scheme I didn't, WhiteSur, combined it with other themes, and added a bit of my own flavor to make a new theme: NewSur (innovative name, right?). And like others, I made a setup and flexed on Reddit :P Turns out there were other crazy peeps who liked it and asked me to post it, so I did, on GitHub 😌

why_colors

This made me even cozier with the community and Git. I then did some other projects, like Logo Menu, Modified AndOTP, Modified LBRY, DraculaSur, etc. All of these were made by me, for myself or someone specific, but through the power of open source other people wanted them too, which strengthened my love for open source.

I then also made one commit to the GNOME Extensions website. Though it was nothing big, it was a step in the right direction.

How GSoC?

I came to know about GSoC from my sister. At the time it was something I believed was impossible for me; on opening the website, all the organizations listed there scared me but also inspired me.

Impossible

But later on, I forgot about it. Once, during a family dinner, the topic of GSoC came up, and my sister told me that the deadline had already been reached and I was late.

I got a bit sad, but since I had assumed it was impossible anyway, I didn't feel too bad. So I thought that even if I couldn't participate, I could at least learn from this year's GSoC and maybe crack it in 2023. The first thing I saw was that there was still time: the proposal period was going to start the next day, so I had just 16 days.

I then searched for the GNOME Foundation, as I love it, opened the ideas list and searched through it, and to my surprise I found two "port to GTK4" projects, one of which used the snake language Python, and well, that gave me a ton of hope.

I instantly conveyed it to my guru sis and began drafting the proposal. I removed all thoughts of it being impossible and just got to work on it. The only guidance I had was from my sister and her junior, who got into GSoC '21 under Chromium. (Most probably, the browser you are reading this on uses Chromium 🙂 )

I then submitted my proposal, and well, after some back and forth with my amazing acharya/mentor Aleb, and some PRs to Pitivi, I got the acceptance mail in May 🥳

How can you start?

Where to start

Where to contribute?

Don't contribute just for the sake of GSoC; contribute out of love for open source, and it will become much easier. Start using open source alternatives to your existing apps. If they lack something, file issues on the repository, and if you can, maybe fix it yourself 😁

get open source

Become active on platforms like Reddit and join some communities (although beware, there are some toxic communities; you just have to ignore them :) most of the community is not like that). There are tons of small niche projects where you can contribute a lot, so don't just run after big shiny projects; start from small projects instead.

Unable to understand code?

If you know the programming language the code is written in, then it becomes easier. If you use the program, pick some unique text from the application and then search for it in the code. Editors like VSCode help in this regard, as you can search the whole repository.

Once you find the string, start expanding your view. You will be able to see how the string was declared and how it was added to the application, and if everything goes right, you will start to understand some of the code using this method. Understanding the whole codebase at once is not easy, so don't try that; don't throw all of it at yourself at once. Make it digestible first, understand only parts of it, and then start to connect the dots.
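
Concretely, that search is a one-liner from the top of the repository; the string below stands for whatever unique text you spotted in the application:

grep -rn "Some unique text from the app" .
# Or, in a git checkout, this is usually faster:
git grep -n "Some unique text from the app"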

divide and conquer

But what to contribute?

Most repositories have "good first issue" labels; these are put on issues that the developers believe could be a good starting point, so start from those. In Pitivi there were some very easy issues, dating back years, that were still unsolved, so try to start from those. Don't think an issue is too small or too old: if it is open and has the label, then the developer wants it to be fixed 🤯

In my case, one of them was to just change one "True" to a "False" 😁

Conclusion

Nothing is impossible if you dedicate yourself to it.

Don't just have dreams, have life goals. Dreams vanish when you wake up and are something you already assume you can't do, but goals are something you believe you can achieve and work towards.

If you still have any queries, feel free to reach out to me; hopefully I can guide you 🤗

End

That's it for this one, hope to see you in the next blog :)

October 03, 2022

Mon 2022/Oct/03

The series on the WPE port by the WebKit team at Igalia grows, with several new articles that go deep into different areas of the engine:

These articles are an interesting read not only if you're working on WebKit, but also if you are curious about how a modern browser engine works and some of the moving parts beneath the surface. So go check them out!

On a related note, the WebKit team is always on the lookout for talent to join us. Experience with WebKit or browsers is not necessarily a must, as we know from experience that anyone with a strong C/C++ background and enough curiosity will be able to ramp up and start contributing soon enough. If these articles spark your curiosity, feel free to reach out to me to find out more or to apply directly!

October 02, 2022

Toolbx — running the same host binary on Arch Linux, Fedora, Ubuntu, etc. containers

This is a deep dive into some of the technical details of Toolbx and is a continuation from the earlier post about bypassing the immutability of OCI containers.

The problem

As we saw earlier, Toolbx uses a special entry point for its containers. It’s the toolbox executable itself.

$ podman inspect --format "{{.Config.Cmd}}" --type container fedora-toolbox-36
toolbox --log-level debug init-container ...

This is achieved by bind mounting the toolbox executable invoked by the user on the hosts to /usr/bin/toolbox inside the containers. While this has some advantages, it opens the door to one big problem. It means that executables from newer or different host operating systems might be running against older or different run-time environments inside the containers. For example, an executable from a Fedora 36 host might be running inside a Fedora 35 Toolbx, or one from an Arch Linux host inside an Ubuntu container.

This is very unusual. We only expect executables from an older version of an OS to keep working on newer versions of the same OS, but never the other way round, and definitely not across different OSes.

When binaries are compiled and linked against newer run-time environments, they may start relying on symbols (ie., non-static global variables, functions, class and struct members, etc.) that are missing in older environments. For example, glibc-2.32 (used in Fedora 33 onwards) added a new version of the pthread_sigmask symbol. If toolbox binaries built and linked against glibc-2.32 are run against older glibc versions, then they will refuse to start.

$ objdump -T /usr/bin/toolbox | grep GLIBC_2.32
0000000000000000      DO *UND*        0000000000000000  GLIBC_2.32  pthread_sigmask

This means that one couldn’t use Fedora 32 Toolbx containers on Fedora 33 hosts, or similarly any containers with glibc older than 2.32 on hosts with newer glibc versions. That’s quite the bummer.

If the executables are not ELF binaries, but carefully written POSIX shell scripts, then this problem goes away. Incidentally, Toolbx used to be implemented in POSIX shell, until it was re-written in Go two years ago, which is how it managed to avoid this problem for a while.

Fortunately, Go binaries are largely statically linked, with the notable exception of the standard C library. The scope of the problem would be much bigger if it involved several other dynamic libraries, like in the case of C or C++ programs.

Potential options

In theory, the easiest solution is to build the toolbox binary against the oldest supported run-time environment so that it doesn’t rely on newer symbols. However, it’s easier said than done.

Usually downstream distributors use build environments that are composed of components that are part of that specific version of the distribution. For example, it will be unusual for an RPM for a certain Fedora version to be deliberately built against a run-time from an older Fedora. Carlos O’Donell had an interesting idea on how to implement this in Fedora by only ever building for the oldest supported branch, adding a noautobuild file to disable the mass rebuild automation, and having newer branches always inherit the builds from the oldest one. However, this won’t work either. Building against the oldest supported Fedora won’t be enough for Fedora’s Toolbx because, by definition, Toolbx is meant to run different kinds of containers on hosts. The oldest supported Fedora hosts might still be too new compared to containers of supported Debian, Red Hat Enterprise Linux, Ubuntu etc. versions.

So, yes, in theory, this is the easiest solution, but, in practice, it requires a non-trivial amount of cross-distribution collaboration, and downstream build system and release engineering effort.

The second option is to have Toolbx containers provide their own toolbox binary that’s compatible with the run-time environment of the container. This would substantially complicate the communication between the toolbox binaries on the hosts and the ones inside the containers, because the binaries on the hosts and containers will no longer be exactly the same. The communication channel between commands like toolbox create and toolbox enter running on the hosts, and toolbox init-container inside the containers can no longer use a private and unstable interface that can be easily modified as necessary. Instead, it would have complicated backwards and forwards compatibility requirements. Other than that, it would complicate bug reports, and every single container on a host may need to be updated separately to fix bugs, with updates needing to be co-ordinated across downstream distributors.

The next option is to either statically link against the standard C library, or disable its use in Go. However, that would prevent us from using glibc’s Name Service Switch to look up usernames and groups, or to resolve host names. The replacement code, written in pure Go, can’t handle enterprise set-ups involving Network Information Service and Lightweight Directory Access Protocol, nor can it talk to host OS services like SSSD, systemd-userdbd or systemd-resolved.
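
For reference, disabling cgo is a one-line change at build time, which is exactly what would rule out NSS; this is a generic Go sketch, not the actual Toolbx build set-up:

# Build without cgo, so the binary has no dependency on the C library:
CGO_ENABLED=0 go build -o toolbox .

# A statically linked result reports no dynamic dependencies:
ldd ./toolbox   # prints "not a dynamic executable"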

It’s true that Toolbx currently doesn’t support enterprise set-ups with NIS and LDAP, but not using NSS will only make it more difficult to add that support in future. Similarly, we don’t resolve any host names at the moment, but given that we are in the business of pulling content over the network, it can easily become necessary in the future. Disabling the use of NSS will leave the toolbox binary as this odd thing that behaves differently from the rest of the OS for some fundamental operations.

An extension of the previous option is to split the toolbox executable into two. One dynamically linked against the standard C library for the hosts, and another that has no dynamic linkage to run inside the containers as their entry point. This can impact backwards compatibility and affect the developer experience of hacking on Toolbx.

Existing Toolbx containers want to bind mount the toolbox executable from the host to /usr/bin/toolbox inside the containers and run toolbox init-container as their entry point. This can’t be changed because of the immutability of OCI containers, and Toolbx simply can’t afford to break existing containers in a way where they can no longer be entered. This means that the toolbox executable needs to become a shim, without any dynamic linkage, that forwards the invocation to the right executable depending on whether it’s running on the hosts or inside the containers.

That brings us to the developer experience of hacking on Toolbx. The first thing to note is that we don't want to go back to using POSIX shell to implement the executable that's meant to run inside the container. Ondřej spent a lot of effort replacing the POSIX shell implementation of Toolbx, and we don't want to undo any part of that. Ideally, we would use the same programming language (ie., Go) to implement both executables so that one doesn't need to learn multiple disparate languages to work on Toolbx. However, even if we do use Go, we would have to be careful not to share code across the two executables, or be aware that they may have subtle differences in behaviour depending on how they might be linked.

Then there’s the developer experience of hacking on Toolbx on Fedora Silverblue and similar OSTree-based OSes, which is what you would do to eat your own dog food. Experiences are always subjective and this one is unique to hacking Toolbx inside a Toolbx. So let’s take a moment to understand the situation.

On OSTree-based OSes, Toolbx containers are used for development, and, generally speaking, it’s better to use container-specific locations invisible to the host as the development prefixes because the generated executables are specific to each container. Executables built on one container may not work on another, and not on the hosts either, because of the run-time problems mentioned above. Plus, it’s good hygiene not to pollute the hosts.

Similar to Flatpak and Podman, Toolbx is a tool that sets up containers. This means that unlike most other executables, toolbox must be on the hosts because, barring the init-container command, it can't work inside the containers. The easiest way to do this is to have a separate terminal emulator with a host shell, and invoke toolbox directly from Meson's build directory in $HOME that's shared between the hosts and the Toolbx containers, instead of installing toolbox to the container-specific development prefixes. Note that this only works because toolbox has always been implemented in programming languages with no to minimal dynamic linking, and only if you ensure that the Toolbx containers for hacking on Toolbx match the hosts. Otherwise, you might run into the run-time problems mentioned above.

The moment there is one executable invoking another, the executables need to be carefully placed on the file system so that one can find the other one. This means that either the executables need to be installed into development prefixes or that the shim should have special logic to work out the location of the other binary when invoked directly from Meson’s build directory.

The former is a problem because the development prefixes will likely default to container-specific locations invisible from the hosts, preventing the built executables from being trivially invoked from the host. One could have a separate development prefix only for Toolbx that’s shared between the containers and the hosts. However, I suspect that a lot of existing and potential Toolbx contributors would find that irksome. They either don’t know or want to set up a prefix manually, but instead use something like jhbuild to do it for them.

The latter requires two different sets of logic depending on whether the shim was invoked directly from Meson’s build directory or from a development prefix. At the very least this would involve locating the second executable from the shim, but could grow into other areas as well. These separate code paths would be crucial enough that they would need to be thoroughly tested. Otherwise, Toolbx hackers and users won’t share the same reality. We could start by running our test suite in both modes, and then meticulously increase coverage, but that would come at the cost of a lengthier test suite.

Failed attempts

Since glibc uses symbol versioning, it’s sometimes possible to use some .symver hackery to avoid linking against newer symbols even when building against a newer glibc. This is what Toolbox used to do to ensure that binaries built against newer glibc versions still ran against older ones. However, this doesn’t defend against changes to the start-up code in glibc, like the one in glibc-2.34 that performed some security hardening.

Current solution

Alexander Larsson and Ray Strode pointed out that all non-ancient Toolbx containers have access to the hosts’ /usr at /run/host/usr. In other words, Toolbx containers have access to the host run-time environments. So, we decided to ensure that toolbox binaries always run against the host run-time environments.

The toolbox binary has an rpath pointing to the hosts' libc.so somewhere under /run/host/usr, and its dynamic linker (ie., PT_INTERP) is changed to the one inside /run/host/usr. Unfortunately, there can only be one PT_INTERP entry inside the binary, so there must be a /run/host on the hosts too for the binary to work on the hosts. Therefore, a /run/host symbolic link is also created on the host pointing to the hosts' /.
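
Mechanically, a binary like that can be produced with ordinary linker flags, or by patching an existing executable; the following is only a sketch for x86_64 and not the actual Toolbx build incantation:

# At build time, via the external linker:
go build -o toolbox -ldflags '-linkmode external -extldflags "-Wl,-rpath,/run/host/usr/lib64 -Wl,--dynamic-linker,/run/host/usr/lib64/ld-linux-x86-64.so.2"' .

# Or after the fact, with patchelf:
patchelf --set-rpath /run/host/usr/lib64 \
         --set-interpreter /run/host/usr/lib64/ld-linux-x86-64.so.2 ./toolbox

# And on the host itself, a /run/host -> / symlink makes the same paths resolve there too:
sudo ln -sfn / /run/host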

The toolbox binary now looks like this, both on the hosts and inside the Toolbx containers:

$ ldd /usr/bin/toolbox
    linux-vdso.so.1 (0x00007ffea01f6000)
    libc.so.6 => /run/host/usr/lib64/libc.so.6 (0x00007f6bf1c00000)
    /run/host/usr/lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f6bf289a000)

It’s been almost a year and thus far this approach has held its own. I am mildly bothered by the presence of the /run/host symbolic link on the hosts, but not enough to lose sleep over it.

Other options

Recently, Robert McQueen brought up the idea of possibly using the Linux kernel’s binfmt_misc mechanism to modify the toolbox binary on the fly. I haven’t explored this in any seriousness, but maybe I will if the current set-up doesn’t work out.

September 30, 2022

The Fedora Project Remains Community Driven

Introduction

Recently, the Fedora Project removed all patented codecs from their Mesa builds, without the rest of the community's input. This decision was heavily criticized by the community, and because of it some even asked the Fedora Project to remove "community driven" from its official description. I'd like to spend some time explaining why, in my opinion, this decision was completely justified, and how the Fedora Project remains community driven.

Law Compliance Cannot Be Voted On

Massive organizations like the Fedora Project must comply with laws to avoid lawsuits as much as possible. Patent trolls are really common and will target big organizations; let's not forget that, in 2019, GNOME was sued by a patent troll. Even worse, patent trolling against open source projects has increased considerably since early this year. So this decision had to be acted on quickly, to avoid potential lawsuits as soon as possible.

Complying with laws is not up to the community to decide. For example, Arch Linux, another community driven distribution, cannot and will not redistribute proprietary software unless it has permission from the software’s authors. This is not something that can be voted on; it must be complied with. That doesn’t mean Arch Linux is not community driven; it only means that it is legally bound, just as the Fedora Project cannot ship these patented codecs.

Even though the Fedora Project hasn’t been sued in past years, that doesn’t mean it will stay free of lawsuits in the future. The increase in patent trolling is a good reason for the Fedora Project to react quickly here. If they ever get sued, is the community going to pay for the lawyers?

Community Driven

As a Fedora Project volunteer who is unaffiliated with Red Hat, I believe that the Fedora Project remains community driven. I am currently part of the Fedora Websites & Apps Team, with the “developer” role on the upcoming website revamp repository. This is mainly a volunteer effort, as the majority of us contributing to it are unpaid developers unaffiliated with Red Hat.

Since we (volunteers) are the ones in control of the decisions, we could, in theory, intentionally make the website look displeasing and appalling. Of course, we care about the Fedora Project, so we want it to look appealing to potential users, contributors and even enterprises that are willing to switch to open source.

Recently, I proposed to unify the Silverblue, Kinoite, CoreOS and other pages’ layouts into one that looks uniform and consistent when navigating, e.g. the same navigation bar, footer, color palette, etc. Some developers are considering joining the effort, while others disagree. Of course, this is merely a proposal, but if everyone is on board, then we volunteers will be the ones leading the initiative.

This is one example from personal experience, but many initiatives were (and will be) proposed by independent contributors, who can also lead the effort. Nonlegal proposals are still voted on democratically, and surveys are still taken seriously. Currently, the Fedora Project is in the process of migrating from Pagure to GitLab, and from IRC to Matrix, because the community voted for it. I voiced my opinion and was one of the people who proposed both of those changes in the surveys.

Conclusion

I completely agree with the Fedora Project’s decision to disable patented codecs in Mesa. These changes cannot and should not be put to the community, as this is a legal matter about potential lawsuits. Anything that is nonlegal remains democratically voted on by the community, as long as you comply with US laws (unfortunately) and the Fedora Code of Conduct.


Edit 1: Use Arch Linux as an example instead of Gentoo Linux, as it is a binary based distribution

Edit 2: Mention exception for Arch Linux (Credit to array and u/Ursa_Solaris)

September 29, 2022

Progress Update For GNOME 43

GNOME 43 is out the door now, and I want to use this post to share what I’ve done since my post about my plans.

Adaptive Nautilus

The main window of Nautilus is now adaptive, working at mobile sizes. This change required multiple steps:

  • I moved the sidebar from the old GtkPaned widget to the new AdwFlap widget (a minimal sketch follows this list).
  • I added a button to reveal the sidebar.
  • I refactored the toolbar widgetry to seamlessly support multiple toolbars (allowing me to add a bottom toolbar without code duplication).
  • I ported most of the message dialogs to the new AdwMessageDialog widget.
  • I made the empty views use AdwStatusPage.
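
For a rough idea of what the first of these steps looks like, here is a minimal, hypothetical AdwFlap setup; the function and widget names are illustrative, not the actual Nautilus code:

    #include <adwaita.h>

    /* Wrap a sidebar and the main content in an AdwFlap so the sidebar
     * folds away automatically at narrow (mobile) widths. */
    static GtkWidget *
    wrap_in_flap (GtkWidget *sidebar, GtkWidget *content)
    {
      AdwFlap *flap = ADW_FLAP (adw_flap_new ());

      adw_flap_set_flap (flap, sidebar);
      adw_flap_set_content (flap, content);
      adw_flap_set_fold_policy (flap, ADW_FLAP_FOLD_POLICY_AUTO);

      return GTK_WIDGET (flap);
    }

A “reveal sidebar” button then only has to toggle the flap’s reveal-flap property with adw_flap_set_reveal_flap().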

There are a few issues left before I can call Nautilus fully adaptive, though. The biggest issue is that the Other Locations view does not scale down to mobile sizes. The Other Locations view is currently being redesigned, so that should be resolved in the near future. Next, the new Properties dialog does not get small enough vertically for landscape mode. Finally, a few message dialogs don’t use AdwMessageDialog and will require special care to port.

Screenshot of Nautilus with a narrow width

In addition to the adaptive widgetry, I also landed some general cleanups to the codebase after the GTK4 port.

Loupe

Since my post in April, Loupe has received many changes. Allan Day provided a new set of mockups for me to work from, and I’ve implemented the new look and a sidebar for the properties. There are some open questions about how the properties should be shown on mobile sizes, so for now Loupe doesn’t fit on phones with the properties view open.

Screenshot of Loupe with the properties sidebar open


I’ve also reworked the navigation and page loading. Back in April, Loupe only loaded one image at a time, and pressing the arrow keys would load the next image. This could lead to freezes when loading large images. Now Loupe uses AdwCarousel and buffers multiple images on both sides of the current image, and loads the buffered images on a different thread.

Loupe also now has code for translations in place, so that once it’s hooked up to GNOME’s translation infrastructure contributors will be able to translate the UI.

Libadwaita

Some exciting new widgets landed in libadwaita this cycle: AdwAboutWindow, AdwMessageDialog, AdwEntryRow, and AdwPasswordEntryRow. I made an effort to have these new widgets adopted in core applications where possible.

I ported the following apps to use AdwAboutWindow:

  • Text Editor
  • Weather
  • Disk Usage Analyzer
  • Font Viewer
  • Characters
  • Nautilus
  • Calendar
  • Clocks
  • Calculator
  • Logs
  • Maps
  • Extensions
  • Music

Now every single core app that uses GTK4 uses AdwAboutWindow.
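
In most of these apps the port boils down to a single call to libadwaita’s adw_show_about_window() helper; here is a hedged sketch with made-up metadata rather than any particular app’s code:

    #include <adwaita.h>

    /* Present an AdwAboutWindow; all of the values below are illustrative. */
    static void
    show_about (GtkWindow *parent)
    {
      adw_show_about_window (parent,
                             "application-name", "Example App",
                             "application-icon", "org.example.App",
                             "developer-name", "The Example Team",
                             "version", "43.0",
                             "website", "https://example.org",
                             "license-type", GTK_LICENSE_GPL_3_0,
                             NULL);
    }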

Screenshot of Text Editor’s about window

I ported Nautilus and Maps to AdwMessageDialog where possible, and adjusted Contacts and Calendar to use AdwEntryRow. Contacts needed some extra properties on AdwEntryRow, so I implemented those.
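
To give a feel for what such a port involves, here is a hedged AdwMessageDialog sketch; the response ids, strings, and callback are invented for illustration and are not the actual Nautilus or Maps code:

    #include <adwaita.h>

    static void
    on_response (AdwMessageDialog *dialog, const char *response, gpointer data)
    {
      if (g_strcmp0 (response, "delete") == 0)
        g_print ("deletion confirmed\n");
    }

    /* Replace a GtkMessageDialog-style confirmation with AdwMessageDialog. */
    static void
    confirm_delete (GtkWindow *parent)
    {
      AdwMessageDialog *dialog;

      dialog = ADW_MESSAGE_DIALOG (
          adw_message_dialog_new (parent, "Delete File?",
                                  "The file will be permanently deleted."));

      adw_message_dialog_add_responses (dialog,
                                        "cancel", "_Cancel",
                                        "delete", "_Delete",
                                        NULL);
      adw_message_dialog_set_response_appearance (dialog, "delete",
                                                  ADW_RESPONSE_DESTRUCTIVE);
      adw_message_dialog_set_default_response (dialog, "cancel");
      adw_message_dialog_set_close_response (dialog, "cancel");

      g_signal_connect (dialog, "response", G_CALLBACK (on_response), NULL);
      gtk_window_present (GTK_WINDOW (dialog));
    }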

I also started work on a new widget, AdwSpinRow. Hopefully it will land this cycle.

Calendar

In addition to the changes mentioned in the libadwaita section, I also made Calendar fit at small widths with AdwLeaflet. The app received a large redesign already, and it was only a few small changes away from being mobile-ready. There are still a few issues with fit, but those should hopefully be resolved soon.

Calendar 44 will hopefully use AdwMessageDialog and a new date selector in the event editor – I have open merge requests for both changes.

Misc. Changes

  • Minor fixups for GNOME Music’s empty state
  • Updated core app screenshots for Disk Usage Analyzer, Text Editor, Contacts, Calendar, and Nautilus
  • Ported Sound Recorder to TypeScript

Conclusion

Overall I made a lot of progress, and I hope to make much more this cycle. The GNOME 43 cycle overlapped with a very busy time in my life, but now things have cooled down. With your help, I would love to be able to focus more of my time on implementing things you care about.

I have three places you can support me:

That’s all for now. Thank you for reading to the end, and I look forward to reporting more progress at the end of the GNOME 44 cycle.

September 27, 2022

Fractal security audit

Projects that receive funding from NLnet are required to have their code audited for potential security issues. Ours was performed by Radically Open Security, a Non-Profit Computer Security Consultancy from the Netherlands.
Since Fractal, by design, doesn’t include much security-critical code, the security researcher extended the quick scan somewhat to also cover the matrix-rust-sdk.

I have been in direct contact with the security researcher and they kept me up to date about their findings. This way, I could start fixing identified security issues while the audit was still in progress. Luckily, no major security issue was identified.

The issues found were addressed by us in the following way:

  • 4.1 CLN-013 — Fractal client stores images containing malware on filesystem

This is mainly a problem of the Matrix server not sanitizing images. Images downloaded from the server are stored in the encrypted store. This was initially an issue but has been resolved. Videos, on the other hand, are currently downloaded and stored in the cache unencrypted because of this issue.

  • 4.2 CLN-012 — Fractal’s markdown implementation hides URLs to possible malicious websites

To address this we now show the full URL when the user hovers a link in the room history. This was introduced in this merge request.

  • 4.3 CLN-011 — Fractal allows opening of .html and .htm files

This is a problem with any file downloaded from an untrusted source. The researchers suggested adding a warning dialog asking whether the user is sure they want to open the file. I don’t think a warning is sufficient to prevent users from opening files containing malicious code, especially since users often don’t read these things and just click continue, or end up confused. We also recommend using Fractal inside a Flatpak sandbox, which uses a portal to ask which application should be used to open the file.

Additionally, we decided to remove the open file button from the room history in this merge request, to make sure that users can’t easily open such files by mistake.

  • 4.4 CLN-010 — Matrix server does not sanitize uploaded images

The Matrix server should address this, and we can’t really do anything about it locally.

  • 4.5 CLN-009 — Images are stored on disk unencrypted

Now all data is stored encrypted. See issue for more details.

  • 4.6 CLN-008 — Security impact not sufficiently documented

We documented this in our README in this merge request.

  • 4.7 CLN-007 — Sensitive data can be extracted from database

Now all data is stored encrypted. See issue for more details.

  • 4.8 CLN-006 — Fractal client supports weak TLS cipher suites

This would be nice to have, but unfortunately it is not currently possible. See this issue for more details.

  • 4.9 CLN-005 — Fractal client is able to connect with insecure TLS versions

See 4.8 CLN-006.


You can read the full report of the security audit here.

Enforcing pull request workflow and green CI status of PRs on Flathub repositories

This blog post was originally posted on Flathub Discourse. Re-posting it here for these sweet sweet fake Internet points publicity.

Starting from 2022-10-04, we’re going to tighten the branch protection settings by disabling direct pushes to protected branches (i.e. master, beta, and the ones starting with branch/) and requiring status checks to pass. This means that all changes will need to go through a regular pull request workflow and require the build tests to pass (i.e. that they be green) on Buildbot before being merged.

As part of this change, we’re introducing two new checks as well. Manifests will be linted with flatpak-builder-lint to ensure compliance with the best practices we suggest during the initial review phase. If your app should be exempted from specific linter rules, please open an issue with an explanation why.

Additionally, if a manifest contains a stanza for flatpak-external-data-checker, it will be validated to ensure update detection works correctly.

September 23, 2022

Introducing Compiano

I previously introduced Minuit. Later I was notified that there is also a music education application for KDE named Minuet, so it was natural to yield the name. That's relatively easy to do when you haven't had a release.

I decided to rename my application Compiano, a portmanteau of Computer and Piano.

Since I last talked about it, a lot of time has passed. I ported it to Gtk4, added some libadwaita support to make it more GNOME, reworked some of the UI, and, more importantly, implemented a mechanism to download the optional "soundbanks" for the instruments that use the biggest data sets.

I have drawn an icon, in Inkscape, which exhibits my poor artistic skills. Icon

I am currently nearing an actual public release, at least as a preview, as I expect a situation of "it works on my machine". At least the flatpak should alleviate most of the issues, and I will be submitting it to Flathub.

Here is a screenshot: Main window

Besides the few blockers for the release, there won't be much else going into 0.9. I have a list for the next one, up to maybe 1.0. This includes adding more instruments using LV2. I have an implementation, but it is glitchy, and I don't want to delay the release any further.

I also made a website.

One more thing: the source code is on GNOME gitlab.

September 22, 2022

GNOME Builder 43.0

After about 5 months of keeping myself very busy, GNOME Builder 43 is out!

This is truly the largest release of Builder yet, with nearly every aspect of the application improved. It’s pretty neat to see all this come together after having spent the past couple of years doing a lot more things outside of Builder, like modernizing GTK’s OpenGL renderer, writing the new macOS GDK backend, shipping a new Text Editor for GNOME, and somehow getting married during all that.

Modern and Expressive Theming

The most noticeable change, of course, is the port to GTK 4. Builder now uses WebKit, VTE, libadwaita, libpanel, GtkSourceView, and many other libraries recently updated to support GTK 4.

Like we did for GNOME Text Editor, Builder will restyle the application window based on the syntax highlighting scheme. In practice this feels much less jarring as you use the application for hours.

a screenshot of the editor with code completion

a screenshot of the syntax color selector

The Foundry

Behind the scenes, the “Foundry” behind Builder has been completely revamped to make better use of SDKs and runtimes. This gives precise control over how processes are created and run. Such control is important when doing development inside container technologies.

Users can now define custom “Commands” which are used to run your project and can be mapped to keyboard shortcuts. This allows for the use of Builder in situations where it traditionally fell short. For example, you can open a project without a build system and use commands to emulate a build system.

a screenshot of creating a new run command

Furthermore, those commands can be used to run your application and integrate with tooling such as the GNU debugger, Valgrind, Sysprof, and more. Controlling how the debugger was spawned has been a long requested feature by users.

a screenshot of the gdb debugger integration, stopped on a breakpoint

You can control what signal is sent to stop your application. I suspect that will be useful for tooling that does cleanup on signals like SIGHUP. It took some work but this is even plugged into “run tools” so things like Sysprof can deliver the signal to the right process.

If you’re using custom run commands to build your project you can now toggle-off installation-before-run and likely still get what you want out of the application. This can be useful for very large projects where you’re working on a small section and want to cheat a little bit.

application preferences

Unit Testing

In previous versions of Builder, plugins were responsible for how Unit Tests were run. Now, they also use Run Commands, which allows users to run their unit tests with the debugger or other tooling.

Keyboard Shortcuts

Keyboard shortcuts were always a sore spot in GTK 3. With the move to GTK 4 we redesigned the whole system to give incredible control to users and plugin authors. Like VS Code, Builder has gained support for a “keybindings.json”-style format which allows for embedding GObject Introspection API scripting. The syntax matches the template engine in Builder, which can also call into GObject Introspection.

keyboard shortcuts

Command Bar and Project Search

We’ve unified the Command Bar and Project Search into one feature. Use Ctrl+Enter to display the new Global Search popover.

We expect this feature to be improved and expanded upon in upcoming releases, as some necessary pieces are still to land in a future GTK release.

A screenshot of the search panel

Movable Panels and Sessions

Panels can be dragged around the workspace window and placed according to user desire. The panel position will persist across future openings of the project.

Additionally, Builder will try to save the state of various pages including editors, terminals, web browsers, directory listings, and more. When you re-open your project with Builder, you can expect to get back reasonably close to where you left off.

Closing the primary workspace will now close the project. That means that the state of secondary workspaces (such as those created for an additional monitor) will be automatically saved and restored the next time the project is launched.

A screenshot of panels rearranged in builder

GtkSourceView

Core editing features have been polished considerably as part of my upstream work on maintaining GtkSourceView. Completion feels as smooth as ever. Interactive tooltips are polished and working nicely. Snippets too have been refined and performance improved greatly.

Not all of our semantic auto-indenters have been ported to GtkSourceIndenter, but we expect them (and more) to come back in time.

There is more work to be done here, particularly around hover providers and what can be placed in hover popovers with the expectation that it will not break input/grabs.

Redesigned Preferences

Preferences have been completely redesigned and integrated throughout Builder. Many settings can be tweaked at either the application level as a default, or on a per-project basis. See “Configure Project” in the new “Build Menu” to see some of those settings. Many new settings were added to allow for more expressive control, and others were improved upon.

Use Ctrl+, to open application preferences, and Alt+, to open your project’s preferences and configurations.

A screenshot showing app preferences vs project preferences

Document Navigation

Since the early versions of Builder, users have requested tabs to navigate documents. Now that we’re on GTK 4, supporting that in a maintainable fashion is trivial, so you can choose between tabs or the legacy “drop down” selector. Navigation tabs are enabled by default.

Some of the UI elements that were previously embedded in the document frame can be found in the new workspace statusbar on the bottom right. Additionally, controls for toggling indentation, syntax, and encoding have been added.

Switching between similar files is easy with Ctrl+Shift+O. You’ll be shown a popover with files named similarly to the open document.

The symbol tree is also still available, but moved to the statusbar. Ctrl+Shift+K will bring it up and allow for quick searching.

a screenshot of the similar file popover

A screenshot of the symbol selector

WebKit

A new web browser plugin was added allowing you to create new browser tabs using Ctrl+Shift+B. It is minimal in features but can be useful for quick viewing of information or documentation.

Additionally, the html-preview, markdown-preview, and sphinx-preview plugins have been rewritten to build upon this WebKit integration.

Integrated webkit browser within Builder

Plugin Removals

Some features have been removed from Builder due to the complexity and time necessary for a proper redesign or port. The Glade plugin (which targets GTK 3 only) has been removed for obvious reasons. A new designer will replace it and is expected as part of GNOME 44.

Devhelp has also been removed but may return after it moves to supporting GTK 4. Additionally, other tooling may supersede this plugin in time.

The code beautifier and color-picker were also removed and will likely return in a different form in future releases. However, language servers providing format capabilities can be enabled in preferences to format-on-save.

Project Templates

Project templates have been simplified and improved for GTK 4 along with a new look and feel for creating them. You’ll see the new project template workflow from the application greeter by clicking on “Create New Project”.

project creation assistant

Top Matches

Heavy users of code completion will notice a new completion result which contains a large star (★) next to it. This indicates that the proposal is a very close match for the typed text and has been re-sorted to the top of the completion results. This serves as an alternative to sorting across completion providers, which is problematic due to the lack of a common scoring algorithm across different data sources.

a screenshot of top matches support

Sysprof Integration

Tooling such as Sysprof went through a lot of revamp too. As part of this process I had to port Sysprof to GTK 4, which was no small task in its own right.

Additionally, I created new tooling in the form of sysprof-agent which allows us to have more control when profiling across container boundaries. Tools which need to inject LD_PRELOAD (such as memory profilers) now work when combined with an appropriate SDK.

A screenshot of sysprof integration

Language Servers

Language servers have become a part of nearly everyone’s development toolbox at this point. Builder is no different. We’ve added support for a number of new language servers including jdtls (Java), bash-language-server (Bash), gopls (Golang) and improved many others such as clangd (C/C++), jedi-language-server (Python), ts-language-server (JavaScript/Typescript), vls (Vala), rust-analyzer (Rust), blueprint, and intelephense (PHP).

Many language servers are easier to install and run given the new design for how cross-container processes are spawned.

A screenshot of the rust-analyzer language server providing completion results

Quick Settings

From the Run Menu, many new quick settings are available to tweak how the application runs as well as to configure tooling.

For example, you can now toggle various Valgrind options from the Leak Detector sub-menu. Sysprof integration also follows suit here by allowing you to toggle what instruments will be used when recording system state.

To make it easier for developers to ensure their software is inclusive, we’ve added options to toggle High Contrast themes, LTR vs RTL, and light vs dark styling.

A screenshot of the build menu

Refactory

For language tooling that supports it, you can do things like rename symbols. This has been in there for years, but few knew about it. We’ve elevated the visibility a bit now in the context menu.

Renaming a symbol using clang-rename

Vim Emulation

In GTK 3, we were very much stuck with deep hacks to make something that looked like Vim work, primarily because we wanted to share as much of the movements API as possible with other keybinding systems.

That changed with GtkSourceView 5. Part of my upstream maintainer work on GtkSourceView included writing a new Vim emulator. It’s not perfect, by any means, but it does cover a majority of what I’ve used in more than two decades as a heavy Vim user. It handles registers, marks, and tries to follow some of the same pasteboard semantics as Vim ("+y for the system clipboard, for example).
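
For the curious, enabling it in an application roughly follows the pattern from the GtkSourceVimIMContext documentation; this is a sketch, not Builder’s actual code:

    #include <gtksourceview/gtksource.h>

    /* Route key events through GtkSourceView's Vim state machine. */
    static void
    enable_vim_emulation (GtkSourceView *view)
    {
      GtkIMContext *im_context = gtk_source_vim_im_context_new ();
      GtkEventController *key = gtk_event_controller_key_new ();

      gtk_im_context_set_client_widget (im_context, GTK_WIDGET (view));
      gtk_event_controller_key_set_im_context (GTK_EVENT_CONTROLLER_KEY (key),
                                               im_context);
      gtk_event_controller_set_propagation_phase (key, GTK_PHASE_CAPTURE);
      gtk_widget_add_controller (GTK_WIDGET (view), key);
    }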

I made this available in GNOME Text Editor for GNOME 42 as well. Those who wonder why we didn’t use an external engine to synchronize with can read the code to find out.

Plugins

We have been struggling with our use of PyGObject for some time. It’s a complex and difficult integration project, and I felt like I spent more time debugging issues than I was comfortable with. So this port also included a rewrite of every Python-based plugin in C. We still enable the Python 3 plugin loader from libpeas (for third-party plugins), but in the future we may switch to another plugin language.

Maintainers Corner

So…

A special thanks to all those that sent me merge requests, modernized bits of software I maintain, fixed bugs, or sent words of encouragement.

I’m very proud of where we’ve gotten. However, it’s been an immense amount of work. Builder could be so much more than it is today with your help: triaging bugs, designing and writing features, project and product management, writing documentation, maintaining plugins, improving GNOME OS, and everything in between.

The biggest lesson of this cycle is how a strong design language is transformative. I hope Builder’s transformation serves as an example for other GNOME applications and the ecosystem at large. We can make big leaps in short time if we have the right tooling and vision.

September 21, 2022

Came Full Circle

As mentioned in the previous post I’ve been creating these short pixel art animations for twitter and mastodon to promote the lovely apps that sprung up under the umbrella of the GNOME Circle project.

I was surprised by how long the video actually gets. It’s true that a little something every day really adds up. The music was composed on the Dirtywave M8, and the composite and sequence were assembled in Blender.

GNOME Circle Pixels

Please take the time to enjoy this fullscreen for that good ol’ CRT feel. If you’re a maintainer of or contributor to any of the apps featured, thank you!

Status update 21/09/22

Last week I attended OSSEU 2022 in Dublin, gave a talk about BuildStream 2.0 and the REAPI, and saw some new and old faces. Good times apart from the common cold I picked up on the way — I was glad that the event mandated face-masks for everyone so I could cover my own face without being the “odd one out”. (And so that we were safer from the 3+ COVID-19 cases reported at the event).

Being in the same room as Javier allowed some progress on our slightly “skunkworks” project to bring OpenQA testing to upstream GNOME. There was enough time to fix the big regressions that had halted testing completely since last year, one being an expired API key and the other, removal of virtio VGA support in upstream’s openqa_worker container. We prefer using the upstream container over maintaining our own fork, in the hope that our limited available time can go on maintaining tests instead, but the containers are provided on a “best effort” basis and since our tests are different to openqa.opensuse.org, regressions like this are to be expected.

I am also hoping to move the tests out of gnome-build-meta into a separate openqa-tests repo. We initially put them in gnome-build-meta because ultimately we’d like to be able to do pre-merge testing of gnome-build-meta branches, but since it takes hours to produce an ISO image from a given commit, it is painfully slow to create and update the OpenQA tests themselves. Now that Gitlab supports child pipelines, we can hopefully satisfy both use cases: one pipeline that quickly runs tests against the prebuilt “s3-image” from os.gnome.org, and a second that is triggered for a specific gnome-build-meta build pipeline and validates that.

First though, we need to update all the existing tests for the visual changes that occurred in the meantime, which are mostly due to gnome-initial-setup now using GTK4. That’s still a slow process as there are many existing needles (screenshots), and each time the tests are run, the Web UI allows updating only the first one to fail. That’s something else we’ll need to figure out before this could be called “production ready”, as any non-trivial style change to Adwaita would imply rerunning this whole update process.

All in all, for now openqa.gnome.org remains an interesting experiment. Perhaps by GUADEC next year there may be something more useful to report.

Team Codethink in the OSSEU 2022 lobby

My main fascination this month besides work has been exploring “AI” image generation. It’s amazing how quickly this technology has spread – it seems we had a big appetite for generative digital images.

I am really interested in the discussion about whether such things are “art”, because I think this discussion is soon going to encompass music as well. We know that both OpenAI and Spotify are researching machine-generated music, and it’s particularly convenient for Spotify if they can continue to charge you £10 a month while progressively serving you more music that they generated in-house – and therefore reducing their royalty payments to record labels.

There are two related questions: whether AI-generated content is art, and whether something generated by an AI has the same monetary value as something a human made “by hand”. In my mind the answer is clear, but at the same time not quantifiable. Art is a form of human communication. Whether you use a neural network, a synthesizer, a microphone or a wax cylinder to produce that art is not relevant. Whether you use DALL-E 2 or a paintbrush is not relevant. Whether your art is any good depends on how it makes people feel.

I’ve been using Stable Diffusion to try and illustrate some of the sound worlds from my songs, and my favourite results so far are for Don’t Go Into The Zone:

And finally, a teaser for an upcoming song release…

An elephant with a yellow map background