September 21 2021

Experimental binary Gentoo package hosting (amd64)

Andreas K. Hüttel (dilfridge) September 21, 2021, 16:34

As an experiment, I've started assembling a simple binary package hosting mechanism for Gentoo. Right now this comes with some serious limitations and should not be used for security or mission critical applications (more on this below). The main purpose of this experiment is to find out how well it works and where we need improvements in Portage's binary package handling.

So what do we have, and how can you use it?

  • The server builds an assortment of stable amd64 packages, with the use-flags as present in an unmodified 17.1/desktop/plasma/systemd profile (the only necessary change is USE=bindist).
  • The packages can be used on all amd64 profiles that differ from desktop/plasma/systemd only by use-flag settings. This includes 17.1, 17.1/desktop/*, 17.1/no-multilib, 17.1/systemd, but not anything containing selinux, hardened, developer, musl, or a different profile version such as 17.0.
  • Right now, the package set includes kde-plasma/plasma-meta, kde-apps/kde-apps-meta, app-office/libreoffice, media-gfx/gimp, media-gfx/inkscape, and of course all their dependencies. More will possibly be added.
  • CFLAGS are chosen such that the packages will be usable on all amd64 (i.e., x86-64) machines. 

To use the packages, I recommend the following steps: First, create a file /etc/portage/binrepos.conf with the following content:

[binhost]
priority = 9999
sync-uri = https://gentoo.osuosl.org/experimental/amd64/binpkg/default/linux/17.1/x86-64/

You can pick a different mirror according to your preferences (but also see the remarks below). Then, edit /etc/portage/make.conf, and add the following EMERGE_DEFAULT_OPTS (in addition to flags that you might already have there):

EMERGE_DEFAULT_OPTS="--binpkg-respect-use=y --getbinpkg=y"

And that's it. Your next update should download the package index and use binary packages whenever the versions and use-flag settings match. Everything else is compiled as usual.
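
For illustration, an update or a one-off install could then look roughly like this (the package atom is only an example):

emerge --ask --verbose --update --deep --newuse @world
# or, without touching EMERGE_DEFAULT_OPTS, for a single package:
emerge --ask --getbinpkg=y --binpkg-respect-use=y app-office/libreoffice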

What is still missing, and what are the limitations and caveats?

  • Obviously, the packages are not optimized for your processor.
  • Right now, the server only carries packages for the use-flag settings of an unmodified 17.1/desktop/plasma/systemd profile. If you use other settings, you will end up compiling part of your packages (which is not really a problem; you just lose the benefit of the binary download). It is technically possible to provide binary packages for different use-flag settings at the same URL, and this will eventually be implemented if the experiment succeeds.
  • At the moment, no cryptographic signing of the binary packages is in place. This is the main reason why I'm calling this an experiment. Effectively, you trust our mirror admins and the https protocol. Package signing and verification is in preparation, and it will be enforced before the binary package hosting "moves into production".

That's it. Enjoy! And don't forget to leave feedback in the comments.

August 16 2021

The stablereq workflow for Python packages

Michał Górny (mgorny) August 16, 2021, 13:07

I have been taking care of periodic mass stabilization of Python packages in Gentoo for some time already. Per Guilherme Amadio’s suggestion, I’d like to share the workflow I use for this. I think it could be helpful to others dealing with large sets of heterogeneous packages.

The workflow requires:

  • app-portage/mgorny-dev-scripts, v10
  • dev-util/pkgcheck

Grabbing candidate list from pkgcheck

One of the features of pkgcheck is that it can report ebuilds that haven’t been changed in 30 days and therefore are due for stabilization. This isn’t perfect but in the end, it gets the job done.

I start by opening two terminals side-by-side and entering the clone of ::gentoo on both. On one of them, I run:

stablereq-eshowkw 'dev-python/*'

On the other, I do:

stablereq-find-pkg-bugs 'dev-python/*'
stablereq-make-list 'dev-python/*'

[Screenshot: desktop with the three windows described above]

This gets me three things:

1. An open Bugzilla search for all stabilization candidates.
2. A script to call file-stablereq for all stabilization candidates open in the editor.
3. eshowkw output for all stabilization candidates in the other terminal.

The three scripts pass their arguments through to pkgcheck. Instead of passing package specifications directly, you can use a simple pipeline to grab all packages with a specific maintainer:

git grep -l [email protected] '**/metadata.xml' | cut -d/ -f1-2 | xargs stablereq-eshowkw

Filtering the candidate list

The candidate list given by pkgcheck is pretty rough. Now it’s time to mangle it a bit.

For a start, I go through the eshowkw list to see if the packages have any newer versions that can be stabilized. Roughly speaking, I ignore all packages that have only one stabilization candidate and I check the rest.

Checking usually means looking at git log and/or pkgdiff to see if a newer version would not be a better stabilization candidate. I update the list in the editor accordingly, either changing the desired version or removing some packages altogether (e.g. because they are release candidates or to go straight for a newer version later).

I close the eshowkw results then and do the next round of filtering via Bugzilla search. I look at the Bugzilla search for bugs affecting the stabilization candidates. Once again, I update the list accordingly. Most importantly, this means removing packages that have their stablereq filed already. This is also a good opportunity to resolve obsolete bugs.

I close the search result tabs but leave the browser open (e.g. with an empty tab) for the next step.

Filing the stablereqs

Now I save the list into a file and run it via the shell. This generally involves a lot of file-stablereq calls that open lots of browser tabs with pre-filled stablereqs. I suppose it would be much better to use the Bugzilla API to file the bugs directly, but I’ve never gotten around to implementing that.
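
For the record, this step boils down to something like the following (the file name is arbitrary):

sh /tmp/stablereq-list.sh   # each line calls file-stablereq and opens a pre-filled bug form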

I use bug-assign-user-js to assign the bugs, then submit them. With some skill, you can do it pretty fast. Point the mouse at the ‘A’ box for the package, click, shift-tab, enter, ctrl-tab, repeat.

If everything went correctly, you get a lot of new bugs filed. Now it’s a good time to look into your e-mail client and mark the mails for newly filed bugs read, before NATTkA starts processing them.

Post-processing the bugs

The last step is to go through bug mail resulting from NATTkA operations.

If the sanity check fails, it is necessary to either add dependencies on other bugs already filed, add more packages to the package list, or file additional stablereqs.

For more complex problems, app-portage/nattka 0.2.15 provides a nattka make-package-list -s CATEGORY/PACKAGE-VERSION subcommand that can prepare a package list with dependencies. However, note that it unconditionally takes newest versions available, so you will need to verify the result and replace versions whenever necessary.
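
For example, with a hypothetical package atom:

nattka make-package-list -s dev-python/frobnicate-1.2.3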

Additionally, I generally check whether the ALLARCHES keyword is being added to the bugs. If a bug is missing it, I verify whether the package is suitable, and add <stabilize-allarches/> to its metadata.xml.
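
For reference, a rough sketch of where that tag lives in metadata.xml (the maintainer entry here is only an example):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "https://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
  <maintainer type="project">
    <email>python@gentoo.org</email>
    <name>Python</name>
  </maintainer>
  <!-- stabilization bugs for this package can be handled with ALLARCHES -->
  <stabilize-allarches/>
</pkgmetadata>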

July 25 2021

Getting DTS 5.1+ sound via S/PDIF or HDMI using PulseAudio

Michał Górny (mgorny) July 25, 2021, 17:16

While PCs still usually provide a full set of analog jacks capable of outputting 5.1 audio, other modern hardware (such as TVs) is usually limited to digital audio outputs (and sometimes analog outputs limited to stereo sound). These outputs are either S/PDIF (coaxial or optical) or HDMI. When the PC is connected to a TV, a pretty logical setup is to carry the sound via HDMI to the TV, and from there via S/PDIF or HDMI ARC to a 5.1 amplifier. However, it isn’t always as simple as it sounds.

For a start, S/PDIF is a pretty antiquated interface originally designed to carry stereo PCM audio. The modern versions of the interface have sufficient bandwidth for up to 192 kHz sampling rate and up to 24 bit audio depth. However, in order to support more than two audio channels, the transmitted sound needs to be compressed. S/PDIF hardware usually supports MPEG, AC3 and DTS formats.

HDMI is better there. HDMI 1.2 technically supports up to 8 channels of PCM audio, 2.0 up to 32 channels. However, not all hardware actually supports that. In particular, my TV seems to only support stereo PCM input, and ignores additional channels when passed 5.1 audio. Fortunately, additional audio channels work when compressed input is used. HDMI supports more audio formats, including DTS-HD MA and TrueHD.

In this post, I’d like to briefly explore our options for making a PulseAudio-enabled Linux system output compressed 5.1 over S/PDIF or HDMI (apparently both are treated the same from the ALSA/PulseAudio perspective).

Enabling S/PDIF / HDMI passthrough in mpv

It’s rather unlikely that you’ll be playing uncompressed audio these days. When playing movies, you’ll often find that the audio tracks are encoded using one of the formats supported by S/PDIF or HDMI. Rather than having mpv decode them just to have ALSA compress them again (naturally with a quality loss), why not pass the encoded audio through to the output?

If you’re using HDMI, the first prerequisite is to set PulseAudio’s configuration profile to digital stereo (found on the Configuration tab of pavucontrol). This could be a bit confusing, but it actually enables you to transfer compressed surround sound. Of course, this implies that you’ll no longer be able to output surround PCM sound via HDMI, but if you’re going to enable compressed audio output anyway, it doesn’t matter.
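
If you prefer the command line over pavucontrol, the same profile switch can presumably be done with pactl; the card and profile names below are system-specific examples, so check the output of pactl list cards first:

# list cards and their available profiles
pactl list cards
# switch the HDMI card to a stereo digital profile
pactl set-card-profile alsa_card.pci-0000_01_00.1 output:hdmi-stereo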

Then, you need to enable support for additional output formats. If you’re using pavucontrol, the relevant checkboxes can be found on the Output Devices tab, hidden under Advanced. Tick all the formats that your connected device supports (usually all).

Finally, you have to enable S/PDIF passthrough (the same option is used for HDMI) in mpv, via ~/.config/mpv/mpv.conf:

audio-spdif=ac3,dts,eac3
audio-channels=5.1

The full list of formats can be found in the mpv(1) manpage.

If everything works fine, you’re going to see something like the following in mpv output:

AO: [alsa] 48000Hz stereo 2ch spdif-ac3

(ignore the stereo part, it is shown like this when passing compressed surround sound through)

Note that audio passthrough requires exclusive access to the sound card, i.e. you won’t be able to use it simultaneously with sound from other apps.

Enabling transparent AC3/DTS compression of audio output

While passthrough is often good enough for watching movies, it is not a universal solution. If, say, you’d like to play a game with surround sound, you need the regular audio output to support it. Fortunately, there is a relatively easy way to use ALSA plugins to enable transparent compression and make your S/PDIF / HDMI output 5.1-friendly.

For a start, you need to install an appropriate ALSA plugin. If you’d like to use AC3 audio, the plugin is found in media-plugins/alsa-plugins[ffmpeg]. For DTS audio, the package is media-sound/dcaenc[alsa].
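
On Gentoo, that boils down to something like the following (the package.use file name is arbitrary):

# /etc/portage/package.use/audio
media-plugins/alsa-plugins ffmpeg
media-sound/dcaenc alsa

emerge --ask media-plugins/alsa-plugins media-sound/dcaenc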

The next step is adding the appropriate configuration to /etc/asound.conf. The snippet for AC3 is:

pcm.a52 {
  @args [CARD]
  @args.CARD {
    type string
  }
  type rate
  slave {
    pcm {
      type a52
      bitrate 448
      channels 6
      card $CARD
    }
    rate 48000
  }
}

The version modified for DTS is:

pcm.dca {
  @args [CARD]
  @args.CARD {
    type string
  }
  type rate
  slave {
    pcm {
      type dca
      channels 6
      card $CARD
    }
    rate 48000
  }
}

Honestly, it’s some black magic how it works but somehow PulseAudio just picks it up and starts accepting 5.1 sound, and the TV happily plays it.
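
To sanity-check the new PCM directly at the ALSA level, something along these lines should work; the CARD argument depends on your hardware, and pasuspender keeps PulseAudio from holding the device in the meantime:

# play a 6-channel test tone through the AC3-encoding PCM defined above
pasuspender -- speaker-test -D a52:CARD=0 -c 6 -t wav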

Finally, the Ubuntu Community wiki suggests explicitly setting sampling rate in PA to avoid compatibility issues. In /etc/pulse/daemon.conf:

default-sample-rate = 48000

References

  • The Well-Tempered Computer: S/PDIF
  • Wikipedia: HDMI
  • Kodi Wiki: PulseAudio
  • Reddit: HD Audio HDMI passthrough setup
  • Ubuntu Community Help Wiki: DigitalAC-3Pulseaudio

July 20 2021

Additional stage downloads for amd64, ppc, x86, arm available

Gentoo News (GentooNews) July 20, 2021, 5:00

Following some technical reorganization and the introduction of new hardware, the Gentoo Release Engineering team is happy to offer a much-expanded set of stage files for download. Highlights are in particular the inclusion of musl-based stages and of POWER9-optimized ppc64 downloads, as well as additional systemd-based variants for many architectures.

For amd64, Hardened/SELinux stages are now available directly from the download page, as are stages based on the lightweight C standard library musl. Note that musl requires using the musl overlay, as described on the page of the Hardened musl project.

For ppc, little-endian stages optimized for the POWER9 CPU series have been added, as have been big- and little-endian Hardened musl downloads.

Additionally, for all of amd64, ppc64, x86, and arm, stages are now available in both an OpenRC and a systemd init system / service manager variant wherever that makes sense.

This all has become possible via the introduction of new build hosts. The amd64, x86 (natively), arm (via QEMU), and riscv (via QEMU) archives are built on an AMD Ryzen™ 7 3700X 8-core machine with 64GByte of RAM, located in Hetzner’s Helsinki datacentre. The ppc, ppc64, and ppc64le / power9le builds are handled by two 16-core POWER9 machines with 32GByte of RAM, provided by OSUOSL POWER Development Hosting.

Further, at the moment an arm64 (aka aarch64) machine with an 80-core Ampere Altra CPU and 256GByte of RAM, provided by Equinix through the Works On Arm program, is being prepared for improved native arm64 and arm support, so expect updates there soon!

June 16 2021

The ultimate guide to EAPI 8

Michał Górny (mgorny) June 16, 2021, 22:23

Three years ago, I had the pleasure of announcing EAPI 7 as a major step forward in our ebuild language. It introduced preliminary support for cross-compilation, it finally provided good replacements for the last Portagisms in ebuilds and it included many small changes that made ebuilds simpler.

Only a year and a half later, I started working on the initial EAPI 8 feature set. Similarly to EAPI 6, EAPI 8 was supposed to focus on small changes and improvements. The two killer features mentioned below were already proposed at the time. I prepared a few patches to the specification, as well as the initial implementation of the respective features for Portage. Unfortunately, the work then stalled.

Finally, as a result of a surplus of free time last month, I was able to resume the work. Together with Ulrich Müller, we quickly prepared the EAPI 8 feature set, got it pre-approved, prepared the specification and implemented all the features in Portage and pkgcore. Last Sunday, the Council approved EAPI 8 and it’s now ready for ~arch use.

What’s there in EAPI 8? Well, for a start we have install-time dependencies (IDEPEND) that fill a gap in our cross-compilation design. Then, selective fetch/mirror restrictions make it easier to combine proprietary and free distfiles in a single package. PROPERTIES and RESTRICT are now accumulated across eclasses, reducing confusion for eclass writers. There’s dosym -r to create relative symlinks conveniently from dynamic paths. Plus a bunch of other improvements, updates and cleanups.
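
As an illustration only (not taken from the article), an EAPI 8 ebuild fragment using two of these features could look roughly like this; the package and path names are placeholders:

EAPI=8

# IDEPEND: needed on the host doing the install (e.g. for pkg_postinst),
# but not a build- or run-time dependency of the package itself
IDEPEND="app-eselect/eselect-example"

src_install() {
	default
	# dosym -r computes a relative symlink from two absolute paths
	dosym -r /usr/lib/example/tool /usr/bin/tool
}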

Read the full article

June 03 2021

Retiring the multilib project

Michał Górny (mgorny) June 03, 2021, 20:41

I created the Multilib project back in November 2013 (though the effort itself started roughly a year earlier) with the goal of maintaining the multilib eclasses and porting Gentoo packages to them. Back in the day, we were even requested to co-maintain a few packages whose maintainers were opposed to multilib ports. In June 2015, the last of the emul-linux-x86 packages were removed and our work concluded.

The project continued to exist for the purpose of maintaining the eclasses and providing advice. Today, I can say that the project has served its purpose and it is time to retire it. Most of the team members have already left, and the multilib knowledge that we used to advise on is now common developer knowledge. I am planning to take care of the project-maintained eclasses personally, and to move the relevant documentation to the general wiki space.

At the same time, I would like to take this opportunity to tell the history of our little multilib project.

Gentoo before gx86-multilib

In the old days, the multilib as seen by the majority of Gentoo users consisted of two components: multilib toolchain packages and emul-linux-x86 packages.

The toolchain multilib support exists pretty much in its original form to this day. It consists of a multilib USE flag and an ABI list stored in the selected profile. The rough idea is that bootstrapping a toolchain with a superset of its current ABIs is non-trivial, so the users generally choose a particular multilib or non-multilib variant when installing Gentoo, and do not change it afterwards. The multilib project didn’t really touch this part.

The emul-linux-x86 packages were specifically focused on non-toolchain packages. Back in the day, they consisted of a few sets of precompiled 32-bit libraries for amd64. If you needed to run a proprietary 32-bit app or compile wine, the respective packages had to depend on a few of these sets, e.g.:

amd64? (
    app-emulation/emul-linux-x86-xlibs
    app-emulation/emul-linux-x86-soundlibs
)

The sets generally included the current stable versions of packages and were rebuilt every few months.

Simultaneously, an alternative to this solution was developed (and is being developed to this day): multilib-portage, a Portage fork that was designed specifically to build all packages for multiple ABIs. Unlike the other solutions, multilib-portage minimized development effort and worked on top of regular Gentoo packages. However, it never reached production readiness.

The gx86-multilib design

The gx86-multilib effort was intended to provide a multilib solution entirely within the scope of the Gentoo repository (still named gentoo-x86 at the time, hence the name), i.e. without having to modify EAPI or package managers. It was supposed to be something between emul-linux-x86 and multilib-portage, building all non-native libraries from source but requiring explicit support from packages.

It only seemed natural to utilize USE_EXPAND the same way as PYTHON_TARGETS did for Python. At the same time, splitting ABIs per architecture made it possible to use USE_EXPAND_HIDDEN to hide irrelevant flags from users. So e.g. amd64 multilib users see only ABI_X86, PPC64 users see ABI_PPC and so on.

The default ABI for a given platform is always forced on. This made it possible to keep things working for non-multilib packages without adding any multilib awareness to them, and at the same time cleanly handle profiles that do not do multilib at all. Multilib packages use ${MULTILIB_USEDEP} to enforce an ABI match on their multilib dependencies; non-multilib packages just use plain deps and can expect the native ABI to always be enabled.
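
For example (the library name is a placeholder), a multilib-aware consumer expresses such a dependency as:

RDEPEND="dev-libs/libfoo[${MULTILIB_USEDEP}]"
DEPEND="${RDEPEND}"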

Eclasses were a natural place to implement all this logic. In the end, they formed a hierarchical structure. The pre-existing multilib.eclass already provided a few low-level functions needed to set up multilib builds. On top of it, multilib-build.eclass was created that provided low-level functions specific to the gx86-multilib — handling USE flags, running the builds and some specific helper functions. On top of it, high-level sub-phase-based multilib-minimal.eclass was created that made writing generic ebuilds easy. Then, on top of that the specific autotools-multilib.eclass and cmake-multilib.eclass existed.
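
As a sketch of what this looks like from an ebuild author’s point of view (the sub-phase functions are the ones provided by multilib-minimal.eclass; the build system and package are hypothetical):

EAPI=8
inherit multilib-minimal

multilib_src_configure() {
	# called once per enabled ABI, configuring out of source
	ECONF_SOURCE="${S}" econf
}

multilib_src_install() {
	default
}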

Historically, the order was a little different. autotools-multilib.eclass came first. Then, the common logic was split into multilib-build.eclass and cmake-multilib.eclass came to be. Finally, multilib-minimal.eclass was introduced and a few months later the other eclasses started reusing it.

The reception and porting efforts

The eclasses had a mixed reception. They followed my philosophy of getting things done, today. This disagreed with purists who believed we should look for a better solution. Many of the developers believed that multilib-portage was the way forward (after all, it did not require changing ebuilds), though they did not seem to be that much concerned about having a clear plan of action. When I pointed out that things needed to be formally specified, the answer was roughly to dump whatever was in multilib-portage at the time into the spec. As you can guess, no spec was ever written.

Nevertheless, porting ebuilds to the new framework proceeded over time. In some cases, we had to deal with varying levels of opposition. In the most extreme cases, we had to work out a compromise and become co-maintainers of these packages in order to provide direct support for any port-related problems. However, as time went by, more people joined the cause, and today it is pretty natural for maintainers to add multilib support themselves. In fact, I believe that things went a bit out of control, as multilib is being added to packages where it is not really needed.

In its early years, gx86-multilib had to coexist with the older emul-linux-x86 packages. Since both groups of packages installed the same files, collisions were inevitable. Every few converted packages, we had to revbump the respective emul-linux-x86 sets dropping the colliding libraries. Later on, we had to start replacing old dependencies on emul-linux-x86 packages (now metapackages) with the actual lists of needed libraries. Naturally, this meant that someone actually had to figure out what the dependencies were — often for fetch-restricted packages that we simply didn’t have distfiles for.

In the end, everything went fine. All relevant packages were ported, emul-linux-x86 sets were retired. The team stayed around for a few years, updating the eclasses as need be. Many new packages gained multilib support even though it wasn’t strictly needed for anything. Multilib-foo became common knowledge.

The future

Our multilib effort is still alive and kicking. At the very least, it serves as the foundation for 32-bit Wine. While the Multilib project itself has been disbanded, its legacy lives on and it is not likely to become obsolete anytime soon. From a quick grep, there are around 600 multilib-enabled packages in ::gentoo at the moment and it is quite likely that there will be more.

The multilib-portage project is still being developed but it does not seem likely to be able to escape its niche. The eclass approach is easier, more portable and more efficient. You don’t have to modify the package manager, you don’t have to build everything multiple times; ideally, you only build library parts for all ABIs.

Support for multilib on non-x86 platforms is an open question. After all, the whole multilib effort was primarily focused on providing compatibility with old 32-bit executables on x86. While some platforms technically can provide multilib, it is not clear how much of that is actually useful to the users, and how much is a cargo cult. Support for additional targets has historically proven troublesome by causing exponential explosion of USE flags.

Some people were proposing switching to Debian-style multiarch layout (e.g. /usr/lib/x86_64-linux-gnu instead of /usr/lib64). However, I have never seen a strong reason to do that. After all, traditional libdirs are well-defined in the ABI specifications while multiarch is a custom Debian invention. In the end, it would be about moving things around and then patching packages into supporting non-standard locations. It would go against one of the primary Gentoo principles of providing a vanilla development environment. And that only shortly after we’ve finally gotten rid of the custom /usr/lib32 in favor of backwards-compatible /usr/lib.

So, while the Multilib project has been retired now, multilib itself is anything but dead. We still use it, we still need it and we will probably still work on it in the future.

May 26 2021

Gentoo Freenode channels have been hijacked

Gentoo News (GentooNews) May 26, 2021, 5:00

Today (2021-05-26) a large number of Gentoo channels have been hijacked by Freenode staff, including channels that were not yet migrated to Libera.chat. We cannot perceive this otherwise than as an open act of hostility and we have effectively left Freenode.

Please note that at this point the only official Gentoo IRC channels, as well as developer accounts, can be found on Libera Chat.

2021-06-15 update

As a part of an unannounced switch to a different IRC daemon, the Freenode staff has removed all channel and nickname registrations. Since many Gentoo developers have left Freenode permanently and are not interested in registering their nicknames again, this opens up further possibilities of malicious impersonation.

May 23 2021

Gentoo IRC presence moving to Libera Chat

Gentoo News (GentooNews) May 23, 2021, 5:00

The Gentoo Council held an emergency single agenda item meeting today. At this meeting, we have decided to move the official IRC presence of Gentoo to the Libera Chat IRC network. We intend to have this move complete by June 13, 2021 at the latest. A full log of the meeting will be available for download soon.

At the moment it is unclear whether we will retain any presence on Freenode at all; we urge all users of the #gentoo channel namespace to move to Libera Chat immediately. IRC channel names will (mostly) remain identical. You will be able to recognize Gentoo developers on Libera Chat by their IRC cloak in the usual form gentoo/developer/*. All other technical aspects will feel rather familiar to all of us as well. Detailed instructions for setting up various IRC clients can be found on the help pages of the IRC network.

May 21 2021

From build-dir to venv — testing Python packages in Gentoo

Michał Górny (mgorny) May 21, 2021, 19:40

A lot of Python packages assume that their tests will be run after installing the package. This is quite a reasonable assumption if you take that the tests are primarily run in dedicated testing environments such as CI deployments or test runners such as tox. However, this does not necessarily fit the Gentoo packaging model where packages are installed system-wide, and the tests are run between compile and install phases.

In a great many cases, things work out of the box (because the modules are found relative to the current directory), or require only minimal PYTHONPATH adjustments. In others, we found it necessary to put a varying amount of effort into creating a local installation of the package that is suitable for testing.

In this post, I would like to shortly explore the various solutions to the problem we’ve used over the years, from simple uses of build directory to the newest ideas based on virtual environments.

Testing against the build directory

As I have indicated above, a great many packages work just fine with the correct PYTHONPATH setting. However, not all packages provide ready-to-use source trees, and even if they do, there’s the matter of either having to manually specify the path to them or having more or less reliable automation guess it. Fortunately, there’s a simple solution.

The traditional distutils/setuptools build process consists of two phases: the build phase and the install phase. The build phase is primarily about copying the files from their respective source directories to a unified package tree in a build directory, while the install phase is generally about installing the files found in the build directory. Besides just reintegrating sources, the build phase may also involve other important tasks: compiling the extensions written in C or converting sources from Python 2 to Python 3 (which is becoming rare). Given that the build command is run in src_compile, this makes the build directory a good candidate for use in tests.

This is precisely what distutils-r1.eclass does out of the box. It ensures that the build commands write to a predictable location, and it adds that location to PYTHONPATH. This ensures that the just-built package is found by Python when trying to import its modules. That is, unless the package residing in the current directory takes precedence. In either case, it means that most of the time things just work, and sometimes we just have to resort to simple hacks such as changing the current directory.
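
For context, the test side of a typical distutils-r1 ebuild is a one-liner; a rough sketch (the PYTHON_COMPAT values are only an example):

EAPI=7
PYTHON_COMPAT=( python3_{8..10} )
inherit distutils-r1

# adds the test dependencies and a default python_test() that runs
# epytest with the build directory on PYTHONPATH
distutils_enable_tests pytest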

distutils_install_for_testing (home layout)

While the build directory method worked for many packages, it had its limitations. To list a few I can think of:

  • Script wrappers for entry points were not created (and even regular scripts were not added to PATH due to a historical mistake), so tests that relied on being able to call installed executables did not work.
  • Package metadata (.egg-info) was not included, so pkg_resources (and now the more modern importlib.metadata) modules may have had trouble finding the package.
  • Namespace packages were not handled properly.

The last point was the deal breaker here. Remember that we’re talking of the times when Python 2.7 was still widely supported. If we were testing a zope.foo package that happened to depend on zope.bar, then we were in trouble. The top-level zope package that we’ve just added to PYTHONPATH had only the foo submodule but bar had to be gotten from system site-packages!

Back in the day, I did not know much about the internals of these things. I was looking for an easy working solution, and I have found one. I have discovered that using setup.py install --home=... (vs setup.py install --root=... that we used to install into D) happened to install a layout that made namespaces just work! This was just great!

This is how the original implementation of distutils_install_for_testing came about. The rough idea was to put this --home install layout on PYTHONPATH and reap all the benefits of having the package installed before running tests.
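
In ebuild terms, the helper is simply called at the start of the test phase, roughly like this:

python_test() {
	distutils_install_for_testing
	epytest
}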

Root layout

The original dift layout was good while it worked. But then it stopped. I don’t know the exact version of setuptools or the exact change, but the magic just stopped working. The good news is that this happened just a few months ago, and we were already deep into removing Python 2.7, so we did not have to worry about namespaces that much (namespaces are much easier in Python 3, as they work via empty directories without special magic).

The simplest solution I could think of was to stop relying on the home layout, and instead use the same root layout as used for our regular installs. This did not include as much magic but solved the important problems nevertheless. Entry point wrappers were installed, namespaces worked of their own accord most of the time.

I’ve added a new --via-root parameter to change the dift mode, and --via-home to force the old behavior. By the end of January, I had flipped the default, and we have been happily using the new layout since then. Except that it didn’t really solve all the problems.

Virtualenv layout

The biggest limitation of both dift layouts is that they rely on PYTHONPATH. However, not everything in the Python world respects path overrides. To list just two examples: the test suite of werkzeug relies on overwriting PYTHONPATH for spawned processes, and tox fails to find its own installed package.

I have tried various hacks to resolve this, to no avail. The solution that somewhat worked was to require the package to be actually installed before running the tests but that was really inconvenient. Interestingly enough, virtualenvs rely on some internal Python magic to actually override module search path without relying on PYTHONPATH.

The most recent dift --via-venv variant that I’ve just submitted for mailing list review uses exactly this. That is, it uses the built-in Python 3 venv module (not to be confused with the third-party virtualenv).

Now, normally a virtualenv creates an isolated environment where all dependencies have to be installed explicitly. However, there is a --system-site-packages option that avoids this. The packages installed inside the virtualenv (i.e. the tested package) will take precedence but other packages will be imported from the system site-packages directory. That’s just what we need!
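
Outside of the eclass, the same trick can be reproduced by hand, which may help to see why it works (a rough sketch, not what dift literally runs; the directory name is arbitrary):

python3 -m venv --system-site-packages /tmp/testenv   # a venv that still sees system site-packages
/tmp/testenv/bin/pip install --no-deps .              # install only the package under test into it
/tmp/testenv/bin/python -m pytest                     # the in-venv copy shadows any system copy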

I have so far tested this new method on two problematic packages (werkzeug and tox). It might be just the thing that resolves all the problems that were previously resolved via the home layout. Or it might not. I do not know yet whether we’ll be switching default again. Time will tell.

May 20 2021

Freenode IRC and Gentoo

Gentoo News (GentooNews) May 20, 2021, 5:00

According to the information published recently, there have been major changes in the way the Freenode IRC network is administered. This has resulted in a number of staff members raising concerns about the new administration and/or resigning. A large number of open source projects have already announced the transition to other IRC networks, or are actively discussing it.

It is not yet clear whether and how these changes will affect Gentoo. We are observing as the situation develops. It is possible that we will decide to move the official Gentoo channels to another network in the best interest of our users. At the same time, we realize that such a move will be an inconvenience to them.

At the same time, it has come to our attention that certain individuals have been using the situation to impersonate Gentoo developers on other IRC networks. The official Gentoo developers can be identified on Freenode by their gentoo/developer cloak. If we move to another network, we will announce it and claim a respective cloak.

Please check this page for future updates.

More information on the Freenode situation can be found at:

  • Christian (Fuchs)’s Freenode resignation
  • @freenodestaff tweet
  • Open Letter On freenode’s independence
  • Andrew Lee, We grew up with IRC. Let’s take it further.

2021-05-22 update

The Gentoo Council will be meeting tomorrow (Sunday, 2021-05-23) at 19:00 UTC to discuss the problem and the possible solutions.

The Gentoo Group Contacts team has been taking steps in order to ensure readiness for the most likely options.

May 18 2021

Google Summer of Code 2021 students welcome

Gentoo News (GentooNews) May 18, 2021, 5:00

We are glad to welcome Leo and Mark to the Google Summer of Code 2021.

Mark will work on improving Catalyst, our release building tool. Leo will work on improving our Java packaging support, with a special focus on big-data and scientific software.

May 06 2021

10 Years’ Perspective on Python in Gentoo

Michał Górny (mgorny) May 06, 2021, 7:55

I’ve been a Gentoo developer for over 10 years now. I’ve been doing a lot of different things throughout that period. However, Python was pretty much always somewhere within my area of interest. I don’t really recall how it all started. Maybe it had something to do with Portage being written in Python. Maybe it was the natural next step after programming in Perl.

I feel like the upcoming switch to Python 3.9 is the last step in the prolonged effort of catching up with Python. Over the last years, we’ve been working real hard to move Python support forward, to bump neglected packages, to enable testing where tests are available, to test packages on new targets and unmask new targets as soon as possible. We have improved the processes a lot. Back when we were switching to Python 3.4, it took almost a year from the first false start attempt to the actual change. We started using Python 3.5 by default after upstream dropped bugfix support for it. In a month from now, we are going to start using Python 3.9 even before 3.10 final is released.

I think this is a great opportunity to look back and see what has changed in the Gentoo Python ecosystem over the last 10 years.

Python package ebuilds 10 years ago

Do you know what a Python package ebuild looked like 10 years ago? Let’s take gentoopm-0.1 as an example (reformatted to fit the narrow layout better):


# Copyright 1999-2011 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/app-portage/gentoopm/gentoopm-0.1.ebuild,v 1.1 2011/07/15 19:05:26 mgorny Exp $

EAPI=3

PYTHON_DEPEND='*:2.6'
SUPPORT_PYTHON_ABIS=1
RESTRICT_PYTHON_ABIS='2.4 2.5'
DISTUTILS_SRC_TEST=setup.py

inherit base distutils

DESCRIPTION="A common interface to Gentoo package managers"
HOMEPAGE="github.com/gentoopm/"
SRC_URI="cloud.github.com/downloads/mgorny/${PN}/${P}.tar.bz2"

LICENSE="BSD-2"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE="doc"

RDEPEND="
  || (
    >=sys-apps/portage-2.1.8.3
    sys-apps/pkgcore
    >=sys-apps/paludis-0.64.2[python-bindings]
  )"
DEPEND="dev-python/epydoc"
PDEPEND="app-admin/eselect-package-manager"

src_prepare() {
  base_src_prepare
  distutils_src_prepare
}

src_compile() {
  distutils_src_compile

  if use doc; then
    "$(PYTHON -2)" setup.py doc || die
  fi
}

src_install() {
  distutils_src_install

  if use doc; then
    dohtml -r doc/* || die
  fi
}

This ebuild is actually using the newer API of python.eclass that is enabled via SUPPORT_PYTHON_ABIS. It provides support for installing for multiple implementations (like the modern python-r1 eclass). PYTHON_DEPEND is used to control the dependency string added to the ebuild. The magical syntax here means that the ebuild supports both Python 2 and Python 3, from Python 2.6 upwards. RESTRICT_PYTHON_ABIS opts out of support for Python versions prior to 2.6. Note the redundancy — PYTHON_DEPEND controls the dependency, specified as a range of Python 2 and/or Python 3 versions, while RESTRICT_PYTHON_ABIS controls the versions used at build time and needs to explicitly exclude all unsupported branches.

Back then, there were no PYTHON_TARGETS to control what was built. Instead, the eclass defaulted to using whatever was selected via eselect python, with the option to override it by setting USE_PYTHON in make.conf. Therefore, there were no cross-package USE dependencies, and you had to run python-updater to verify whether all packages were built for the current interpreter, and rebuild those that were not.
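
The old workflow therefore looked roughly like this (the values are examples only; USE_PYTHON took the ABI strings used by python.eclass):

eselect python set python2.7                           # pick the active interpreter system-wide
echo 'USE_PYTHON="2.7 3.2"' >> /etc/portage/make.conf  # optionally override the ABIs to build for
python-updater                                         # rebuild packages built for another interpreter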

Still, support for multiple ABIs, as the eclass called different branches/implementations of Python, was a major step forward. It was added around the time that the first releases of Python 3 were published, and our users have been relying on it to support a combination of Python 2 and Python 3 for years. Today, we’re primarily using it to aid developers in testing their packages and to provide a safer upgrade experience.

The python.eclass stalemate

Unfortunately, things at the time were not all that great. Late 2010 marks a conflict between the primary Python developer and the rest of the community, primarily due to the major breakage being caused by the changes in Python support. By mid-2011, it was pretty clear that there was no chance to resolve the conflict. The in-Gentoo version of python.eclass had been failing to get EAPI 4 support for 6 months already, while an incompatible version continued being developed in the (old) Python overlay. As Dirkjan Ochtman related in his mail:

I guess by now pretty much everyone knows that the python eclass is rather complex, and that this poses some problems. This has also been an important cause for the disagreements between Arfrever and some of the other developers. Since it appears that Arfrever won’t be committing much code to gentoo-x86 in the near future, I’m trying to figure out where we should go with the python.eclass. […]

Dirkjan Ochtman, 2011-06-27, [gentoo-dev] The Python problem

Eventually, some of the changes from the Python overlay were backported to the eclass and EAPI 4 support was added. Nevertheless, at this point it was pretty clear that we needed a new way forward. Unfortunately, the discussions were leading nowhere. With the primary eclass maintainer retired, nobody really comprehended most of the eclass, nor was anyone able to afford the time to figure it out. At the same time, the involved parties wanted to preserve backwards compatibility while moving forward.

The tie breaker: python-distutils-ng

Some of you might find it surprising that PYTHON_TARGETS are not really a python-r1 invention. Back in March 2012, when the Python team was still unable to find a way forward with python.eclass, Krzysztof Pawlik (nelchael) committed a new python-distutils-ng.eclass. It never grew popular, and it was replaced by the python-r1 suite before it ever started being a meaningful replacement for python.eclass. Still, it provided an important impulse that made what came after possible.

Here’s a newer gentoopm ebuild using the new eclass (again reformatted):


# Copyright 1999-2012 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/app-portage/gentoopm/gentoopm-0.2.5-r1.ebuild,v 1.1 2012/05/26 10:11:21 mgorny Exp $

EAPI=4
PYTHON_COMPAT='python2_6 python2_7 python3_1 python3_2'

inherit base python-distutils-ng

DESCRIPTION="A common interface to Gentoo package managers"
HOMEPAGE="github.com/mgorny/gentoopm/"
SRC_URI="mirror://github/mgorny/${PN}/${P}.tar.bz2"

LICENSE="BSD-2"
SLOT="0"
KEYWORDS="~amd64 ~mips ~x86 ~x86-fbsd"
IUSE="doc"

RDEPEND="
  || (
    >=sys-apps/portage-2.1.10.3
    sys-apps/pkgcore
    >=sys-apps/paludis-0.64.2[python-bindings]
  )"
DEPEND="doc? ( dev-python/epydoc )"
PDEPEND="app-admin/eselect-package-manager"

python_prepare_all() {
  base_src_prepare
}

src_compile() {
  python-distutils-ng_src_compile
  if use doc; then
    "${PYTHON}" setup.py doc || die
  fi
}

python_install_all() {
  if use doc; then
    dohtml -r doc/*
  fi
}

Just looking at the code, you may see that python-r1 has inherited a lot from this eclass. python-distutils-ng in turn followed some of the good practices introduced before in the ruby-ng eclass. It introduced PYTHON_TARGETS to provide explicit, visible control over the implementations used for the build — though notably it did not include a way for packages to depend on matching flags (i.e. the equivalent of ${PYTHON_USEDEP}). It also used the sub-phase approach that makes the distutils-r1 and ruby-ng eclasses much more convenient than the traditional python.eclass approach, which roughly resembled using python_foreach_impl all the time.

What’s really important is that python-distutils-ng carved a way forward. It’s been a great inspiration and a proof of concept. It has shown that we do not have to preserve compatibility with python.eclass forever, or have to learn its inner workings before starting to solve problems. I can’t say how Python would look today if it had not happened, but I can say with certainty that python-r1 would not have happened so soon if it weren’t for it.

python-r1

In October 2012, the first version of python-r1 was committed. It combined some of the very good ideas of python-distutils-ng with some of my own.  The goal was not to provide an immediate replacement for python.eclass. Instead, the plan was to start simple and add new features as they turned out to be necessary. Not everything went perfectly but I dare say that the design has stood the test of time. While I feel like the eclasses ended up being more complex than I wished they would be, they still work fine with no replacement in sight and they serve as inspiration to other eclasses.

For completeness, here’s a 2017 gentoopm live ebuild that uses a pretty complete distutils-r1 feature set:


# Copyright 1999-2017 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2

EAPI=6
PYTHON_COMPAT=( python{2_7,3_4,3_5,3_6} pypy )

EGIT_REPO_URI="github.com/mgorny/gentoopm.git"
inherit distutils-r1 git-r3

DESCRIPTION="A common interface to Gentoo package managers"
HOMEPAGE="github.com/mgorny/gentoopm/"
SRC_URI=""

LICENSE="BSD-2"
SLOT="0"
KEYWORDS=""
IUSE="doc"

RDEPEND="
  || (
    >=sys-apps/pkgcore-0.9.4[${PYTHON_USEDEP}]
    >=sys-apps/portage-2.1.10.3[${PYTHON_USEDEP}]
    >=sys-apps/paludis-3.0.0_pre20170219[python,${PYTHON_USEDEP}]
  )"
DEPEND="
  doc? (
    dev-python/epydoc[$(python_gen_usedep python2_7)]
  )"
PDEPEND="app-eselect/eselect-package-manager"

REQUIRED_USE="
  doc? ( $(python_gen_useflags python2_7) )"

src_configure() {
  use doc && DISTUTILS_ALL_SUBPHASE_IMPLS=( python2.7 )
  distutils-r1_src_configure
}

python_compile_all() {
  use doc && esetup.py doc
}

python_test() {
  esetup.py test
}

python_install_all() {
  use doc && local HTML_DOCS=( doc/. )
  distutils-r1_python_install_all
}

The new eclasses have employed the same PYTHON_TARGETS flags for the general implementation choice but also added PYTHON_SINGLE_TARGET to make choosing the implementation more convenient (and predictable at the same time) when the package did not permit choosing more than one. The distutils-r1 eclass reused the great idea of sub-phases to make partial alterations to the phases easier.

The unique ideas included:

  • the split into more eclasses by functionality (vs switching the mode via variables as done in python.eclass)
  • exposing dependencies and REQUIRED_USE constraints via variables to be used in the ebuild instead of writing elaborate mechanisms for adding dependencies (see the sketch after this list)
  • avoiding command substitution in global scope to keep metadata regeneration fast
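
To illustrate the second point, a python-r1 consumer wires the eclass-provided variables into its metadata more or less like this (a fragment only; dev-python/requests is just an example dependency):

REQUIRED_USE="${PYTHON_REQUIRED_USE}"
RDEPEND="
  ${PYTHON_DEPS}
  dev-python/requests[${PYTHON_USEDEP}]"
DEPEND="${RDEPEND}"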

The eclasses evolved a lot over the years. Some of the original ideas turned out pretty bad, e.g. trying to run sub-phases in parallel (which broke a lot of stuff for minor performance gain) or the horribly complex original interaction between PYTHON_TARGETS and PYTHON_SINGLE_TARGET. Ebuilds often missed dependencies and REQUIRED_USE constraints, until we finally made the pkgcheck-based CI report that.

The migration to new eclasses took many years. Initially, python-r1 ebuilds were even depending on python.eclass ebuilds. The old eclass was not removed until March 2017, i.e. 4.5 years after introducing the new eclasses.

Testing on new Python targets

We went for the opt-in approach with PYTHON_COMPAT. This means that for every new Python target added, we start with no packages supporting it and have to iterate over all of the packages adding the support. It’s a lot of work and it has repeatedly caused users pain due to packages not being ported in time for the big switch to the next version. Some people have complained about that and suggested that we should go for opt-out instead. However, if you think about it, opt-in is the only way to go.
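
In practice, adding the support is usually a one-line PYTHON_COMPAT change plus a test run; a hypothetical bump might look like this:

-PYTHON_COMPAT=( python3_{6,7,8} )
+PYTHON_COMPAT=( python3_{6,7,8,9} )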

The big deal is that for any particular package to support a new implementation, all of its Python dependencies need to support it as well. With the opt-in approach, it means that we’re doing the testing dependency-first, and reaching the bigger packages only when we confirm that there’s at least a single version of every dependency that works for it. If we do things right, users don’t even see any regressions.

If we went for the opt-out approach, all packages would suddenly claim to support the new version. Now, this wouldn’t be that bad if we were actually able to do a big CI run for all packages — but we can’t, since a lot of them do not have proper tests, and Python version incompatibility often can’t be detected statically. In the end, we would be relying on someone (possibly a user) reporting that something is broken. Then we’d have to investigate where in the dependency chain the culprit is, and either restrict the new target (and then restrict it in all its reverse dependencies) or immediately fix it.

So in the end, opt-out would be worse for both users and developers. Users would hit package issues first hand, and developers would have to spend significant time on the back-and-forth effort of removing support for new targets, and then adding it again. If we are to add new targets early (which is a worthwhile goal), we have to expect incompatible packages. My experience so far shows that Gentoo developers sometimes end up being the first people to submit patches fixing the incompatibility. This can be a real problem given that many Python packages have slow release cycles, and are blocking their reverse dependencies, and these in turn block their reverse dependencies and so on.

Now, things did not always go smoothly. I have prepared a Python release and Gentoo packaging timeline that puts our packaging work into perspective. As you can see, we were always quite fast in packaging new Python interpreters but it took a significant time to actually switch the default targets to them — in fact, we often switched just before or even after upstream stopped providing bug fixes to the version in question.

Our approach has changed over the years. Early on, we generally kept both the interpreter and the target in ~arch, and stabilized them in order to switch targets. The prolonged stable-mask of the new target has resulted in inconsistent presence of support for the new target in stable, and this in turn involved a lot of last-minute stabilization work. Even then, we ended up switching targets before stable was really ready for that. This was particularly bad for Python 3.4 — the period seen as ‘stable’ on the timeline is actually a period following the first unsuccessful switch of the default. It took us over half a year to try again.

Then (around Python 3.6, if I’m not mistaken) we switched to a different approach. Instead of delaying till the expected switch, we’ve tried to unmask the target on stable systems as soon as possible. This way, we started enforcing dependency graph consistency earlier and were able to avoid big last minute stabilizations needed to unmask the target.

Eventually, thanks to pkgcheck’s superior StableRequestCheck, I’ve started proactively stabilizing new versions of Python packages. This was probably the most important improvement of all. The stabilization effort was streamlined, new versions of packages gained stable keywords sooner, and with them came support for new Python implementations.

The effort put into package testing and stabilizations has finally made it possible to catch up with upstream. We have basically gone through a target switching sprint, moving through 3.7 and 3.8 in half a year each. The upcoming switch to Python 3.9 concludes this effort. For the first time in years, our default will be supported upstream for over 6 months.

That said, it is worth noting that things are not becoming easier for us. Over time, Python packages keep getting new dependencies, and this means that every new Python version will involve more and more porting work. Unfortunately, some high profile packages such as setuptools and pytest keep creating bigger and bigger dependency loops. At this point, it is no longer reasonable to attempt to port all the cyclic dependencies simultaneously. Instead, I tend to temporarily disable tests for the initial ports to reduce the number of dependencies. I’ve included a list of suggested initial steps in the Python Guide to ease future porting efforts.

Packaging the Python interpreter

CPython is not the worst thing to package but it’s not the easiest one either. Admittedly, the ebuilds were pretty good back when I joined. However, we already carried a pretty large and mostly undocumented set of patches, and most of these patches we carry to this day, with no chance of upstreaming them.

Some of the involved patches are build fixes and hacks that are either specific to Gentoo, or bring the flexibility Gentoo cares about (such as making some of the USE flags possible). There are also some fixes for old bugs that upstream has not shown any interest in fixing.

Another part of release management is resolving security bugs. Until recently, we did not track vulnerabilities in CPython very well. Thanks to the work of new Security team members, we have started being informed of vulnerabilities earlier. At the same time, we realized that CPython’s treatment of vulnerabilities is a bit suboptimal.

Admittedly, when very bad things happen, upstream releases fixes quickly. However, non-critical vulnerability fixes are released as part of the normal release cycle, approximately every two months. For some time already, every new release has contained some security fixes and had to be stabilized quickly. At the same time, it contained many other changes with their own breakage potential. Old Python branches were even worse — absurdly, even though these versions received security fixes only, the releases were even rarer.

In the end, I’ve decided that it makes more sense to backport security fixes to our patchset as soon as I become aware of them, and stabilize the patch bumps instead. I do this even if upstream makes a new release at the same time, since patch bumps are safer stable targets. Even then, some of the security fixes actually require changing the behavior. To name a few recent changes:

  • urllib.parse.parse_qsl() historically used to split the URL query string on either & or ;. This somewhat surprising behavior was changed to split only on &, with a parameter to change the separator (but no option to restore the old behavior).
  • urllib.parse.urlparse() historically preserved newlines, CRs and tabs. In the latest Python versions this behavior was changed to follow a newer recommendation of stripping these characters. As a side effect, some URL validators (e.g. in Django) suddenly stopped rejecting URLs with newlines.
  • The ipaddress module recently stopped allowing leading zeros in IPv4 addresses. These were accepted before but some of the external libraries were incidentally interpreting them as octal numbers.

Some of these make you scratch your head.
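
The parse_qsl() change is easy to observe first hand; the output of this one-liner differs between patched and unpatched interpreters:

python3 -c 'from urllib.parse import parse_qsl; print(parse_qsl("a=1;b=2"))'
# old behavior: [('a', '1'), ('b', '2')]
# new behavior: [('a', '1;b=2')]    (';' is no longer treated as a separator)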

Unsurprisingly, this is also a lot of work. At this very moment, we are maintaining six different slots of CPython (2.7, 3.6, 3.7, 3.8, 3.9, 3.10). For every security backport set, I start by identifying the security-related commits on the newest release branch. This is easy if they’re accompanied by a news entry in the Security category — unfortunately, some vulnerability fixes were treated as regular bug fixes in the past. Once I have a list of commits, I cherry-pick them onto our patched branch and make a patchset out of that. This is the easy part.
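
The mechanical part of that looks more or less like the following (the branch name, tag and output directory are purely illustrative, not the actual layout of our patch repository; the commit ids are placeholders):

git log --oneline v3.9.4..upstream/3.9 -- Misc/NEWS.d   # find commits adding news entries, keep the Security ones
git checkout gentoo-patches-3.9                         # hypothetical local branch carrying our patches
git cherry-pick abc1234 def5678                         # apply the identified security fixes
git format-patch v3.9.4 -o ../python-gentoo-patches     # roll the result into a new patchset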

Now I have to iterate over all the older versions. For maintained branches, the first step is to identify whether upstream has already backported their fixes. If they did, I cherry-pick the backport commits from the matching branch. If they did not, I need to verify whether the vulnerability applies to this old version, and cherry-pick from a newer version. Sometimes it requires small changes.

The hardest part is Python 2.7 — it is not supported upstream but still used as a build-time dependency by a few projects. The standard library structure in Python 2 differs a lot from the one in Python 3 but many of Python 3 vulnerabilities stem from code dating back to Python 2. In the end, I have to find the relevant piece of original code in Python 2, see if it is vulnerable and usually rewrite the patch for the old code.

I wish this were the end. However, in Gentoo we are also packaging PyPy, the alternative Python implementation. PyPy follows its own release cycle. PyPy2 is based on the now-unmaintained Python 2, so most of the vulnerability fixes from our Python 2.7 patchset need to be ported to it. PyPy3 tries to follow Python 3’s standard library but is not very fast at it. In the end, I end up cherry-picking the changes from CPython to PyPy for our patchsets, and at the same time sending them upstream so that they can fix the vulnerabilities without having to sync with CPython immediately.

Historically, we have also supported another alternative implementation, Jython. However, for a very long time upstream’s been stuck on Python 2.5 and we’ve eventually dropped support for Jython as it became unmaintainable. Upstream has eventually caught up with Python 2.7 but you can imagine how useful that is today.

There are other interesting projects out there but I don’t think we have the manpower (or even a very good reason) to work on them.

Now, a fun fact: Victor Stinner indicates in his mail Need help to fix known Python security vulnerabilities that the security fixes CPython receives are only a drop in the ocean. Apparently, at the time of writing there were 78 open problems of various severity.

A bus and a phoenix

Any project where a single developer does a sufficiently large portion of work has a tendency towards a bus factor of one. The Python project was no different. Ten years ago we were relying on a single person doing most of the work. When that person retired, things started getting out of hand. Python ended up with complex eclasses that nobody wholly understood and that urgently needed updates. Many packages were getting outdated.

I believe that we did the best that we could at the time. We started the new eclass set because it was much easier to draw a clear line and start over. We ended up removing many of the less important or more problematic packages, from almost the whole net-zope category (around 200 packages) to temporarily removing Django (only to re-add it later on). Getting everything at least initially ported took a lot of time. Many of the ports turned out buggy; a large number of packages were outdated, unmaintained, or missing test suites.

Today I can finally say that the standing of the Python team’s packages is reasonably good. However, in order for it to remain good we need to put in a lot of effort every day. Packages need to be bumped. Bumps often add new dependencies. New Python versions require testing. Test regressions just keep popping up, often when you’re working on something else. Stabilizations need to be regularly tracked, and they usually uncover even more obscure test problems. Right now it’s pretty normal for me to spend an hour of my time every day just to take care of the version bumps.

I have tried my best to avoid returning to a bus factor of one. I have tried to keep the eclasses simple (but I can’t call that a complete success), involve more people in the development, keep things well documented and packages clean. Yet I still end up doing a very large portion of work in the team. I know that there are other developers who can stand in for me but I’m not sure if they will be able to take up all the work, given all their other duties.

We really need young blood. We need people who would be able to dedicate a lot of time specifically to Python, and learn all the intricacies of Python packaging. While I’ve done my best to document everything I can think of in the Python Guide, it’s still very little. The Python ecosystem is a very diverse one, and surprisingly hard to maintain without burning out. Of course, there are many packages that are a pleasure to maintain, and many upstreams that are a pleasure to work with. Unfortunately, there are also many projects that make you really frustrated — with bad-quality code, broken tests, lots of NIH dependencies and awful upstreams that simply hate packagers.

Over my 10 (and a half) years as a developer, I have done a lot of different things. However, if I were to point to one area of Gentoo where I put most of my effort, that area would be Python. I am proud of all that we’ve been able to accomplish, and how great our team is. However, there are still many challenges ahead of us, as well as a lot of tedious work.

I’m a Gentoo developer for over 10 years already. I’ve been doing a lot of different things throughout that period. However, Python was pretty much always somewhere within my area of interest. I don’t really recall how it all started. Maybe it had something to do with Portage being written in Python. Maybe it was the natural next step after programming in Perl.

I feel like the upcoming switch to Python 3.9 is the last step in the prolonged effort of catching up with Python. Over the last years, we’ve been working real hard to move Python support forward, to bump neglected packages, to enable testing where tests are available, to test packages on new targets and unmask new targets as soon as possible. We have improved the processes a lot. Back when we were switching to Python 3.4, it took almost a year from the first false start attempt to the actual change. We started using Python 3.5 by default after upstream dropped bugfix support for it. In a month from now, we are going to start using Python 3.9 even before 3.10 final is released.

I think this is a great opportunity to look back and see what changed in the Gentoo Python ecosystem, in the last 10 years.

Python package ebuilds 10 years ago

Do you know how a Python package ebuild looked like 10 years ago? Let’s take gentoopm-0.1 as an example (reformatted to fit the narrow layout better):


# Copyright 1999-2011 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/app-portage/gentoopm/gentoopm-0.1.ebuild,v 1.1 2011/07/15 19:05:26 mgorny Exp $

EAPI=3

PYTHON_DEPEND='*:2.6'
SUPPORT_PYTHON_ABIS=1
RESTRICT_PYTHON_ABIS='2.4 2.5'
DISTUTILS_SRC_TEST=setup.py

inherit base distutils

DESCRIPTION="A common interface to Gentoo package managers"
HOMEPAGE="https://github.com/gentoopm/"
SRC_URI="http://cloud.github.com/downloads/mgorny/${PN}/${P}.tar.bz2"

LICENSE="BSD-2"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE="doc"

RDEPEND="
  || (
    >=sys-apps/portage-2.1.8.3
    sys-apps/pkgcore
    >=sys-apps/paludis-0.64.2[python-bindings]
  )"
DEPEND="dev-python/epydoc"
PDEPEND="app-admin/eselect-package-manager"

src_prepare() {
  base_src_prepare
  distutils_src_prepare
}

src_compile() {
  distutils_src_compile

  if use doc; then
    "$(PYTHON -2)" setup.py doc || die
  fi
}

src_install() {
  distutils_src_install

  if use doc; then
    dohtml -r doc/* || die
  fi
}

This ebuild is actually using the newer API of python.eclass that is enabled via SUPPORT_PYTHON_ABIS. It provides support for installing for multiple implementations (like the modern python-r1 eclass). PYTHON_DEPEND is used to control the dependency string added to ebuild. The magical syntax here means that the ebuild supports both Python 2 and Python 3, from Python 2.6 upwards. RESTRICT_PYTHON_ABIS opts out support for Python versions prior to 2.6. Note the redundancy — PYTHON_DEPEND controls the dependency, specified as a range of Python 2 and/or Python 3 versions, RESTRICT_PYTHON_ABIS controls versions used at build time and needs to explicitly exclude all unsupported branches.

Back then, there were no PYTHON_TARGETS to control what was built. Instead, the eclass defaulted to using whatever was selected via eselect python, with the option to override it via setting USE_PYTHON in make.conf. Therefore, there were no cross-package USE dependencies and you had to run python-updater to verify whether all packages are built for the current interpreter, and rebuild these that were not.

Still, support for multiple ABIs, as the eclass called different branches/implementations of Python, was a major step forward. It was added around the time that the first releases of Python 3 were published, and our users have been relying on it to support a combination of Python 2 and Python 3 for years. Today, we’re primarily using it to aid developers in testing their packages and to provide a safer upgrade experience.

The python.eclass stalemate

Unfortunately, things at the time were not all that great. Late 2010 marks a conflict between the primary Python developer and the rest of the community, primarily due to the major breakage being caused by the changes in Python support. By mid-2011, it was pretty clear that there is no chance to resolve the conflict. The in-Gentoo version of python.eclass was failing to get EAPI 4 support for 6 months already, while an incompatible version continued being developed in the (old) Python overlay. As Dirkjan Ochtman related in his mail:

I guess by now pretty much everyone knows that the python eclass is rather complex, and that this poses some problems. This has also been an important cause for the disagreements between Arfrever and some of the other developers. Since it appears that Arfrever won’t be committing much code to gentoo-x86 in the near future, I’m trying to figure out where we should go with the python.eclass. […]

Dirkjan Ochtman, 2011-06-27, [gentoo-dev] The Python problem

Eventually, some of the changes from the Python overlay were backported to the eclass and EAPI 4 support was added. Nevertheless, at this point it was pretty clear that we need a new way forward. Unfortunately, the discussions were leading nowhere. With the primary eclass maintainer retired, nobody really comprehended most of the eclass, nor were able to afford the time to figure it out. At the same time, involved parties wanted to preserve backwards compatibility while moving forward.

The tie breaker: python-distutils-ng

Some of you might find it surprising that PYTHON_TARGETS are not really a python-r1 invention. Back in March 2012, when Python team was still unable to find a way forward with python.eclass, Krzysztof Pawlik (nelchael) has committed a new python-distutils-ng.eclass. It has never grown popular, and it has been replaced by the python-r1 suite before it ever started being a meaningful replacement for python.eclass. Still, it served an important impulse that made what came after possible.

Here’s a newer gentoopm ebuild using the new eclass (again reformatted):


# Copyright 1999-2012 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/app-portage/gentoopm/gentoopm-0.2.5-r1.ebuild,v 1.1 2012/05/26 10:11:21 mgorny Exp $

EAPI=4
PYTHON_COMPAT='python2_6 python2_7 python3_1 python3_2'

inherit base python-distutils-ng

DESCRIPTION="A common interface to Gentoo package managers"
HOMEPAGE="https://github.com/mgorny/gentoopm/"
SRC_URI="mirror://github/mgorny/${PN}/${P}.tar.bz2"

LICENSE="BSD-2"
SLOT="0"
KEYWORDS="~amd64 ~mips ~x86 ~x86-fbsd"
IUSE="doc"

RDEPEND="
  || (
    >=sys-apps/portage-2.1.10.3
    sys-apps/pkgcore
    >=sys-apps/paludis-0.64.2[python-bindings]
  )"
DEPEND="doc? ( dev-python/epydoc )"
PDEPEND="app-admin/eselect-package-manager"

python_prepare_all() {
  base_src_prepare
}

src_compile() {
  python-distutils-ng_src_compile
  if use doc; then
    "${PYTHON}" setup.py doc || die
  fi
}

python_install_all() {
  if use doc; then
    dohtml -r doc/*
  fi
}

Just looking at the code, you may see that python-r1 has inherited a lot after this eclass. python-distutils-ng in turn followed some of the good practices introduced before in the ruby-ng eclass. It introduced PYTHON_TARGETS to provide explicit visible control over implementations used for the build — though notably it did not include a way for packages to depend on matching flags (i.e. the equivalent of ${PYTHON_USEDEP}). It also used the sub-phase approach that makes distutils-r1 and ruby-ng eclasses much more convenient than the traditional python.eclass approach that roughly resembled using python_foreach_impl all the time.

What’s really important is that python-distutils-ng carved a way forward. It’s been a great inspiration and a proof of concept. It has shown that we do not have to preserve compatibility with python.eclass forever, or have to learn its inner workings before starting to solve problems. I can’t say how Python would look today if it did not happen but I can say with certainly that python-r1 would not happen so soon if it weren’t for it.

python-r1

In October 2012, the first version of python-r1 was committed. It combined some of the very good ideas of python-distutils-ng with some of my own.  The goal was not to provide an immediate replacement for python.eclass. Instead, the plan was to start simple and add new features as they turned out to be necessary. Not everything went perfectly but I dare say that the design has stood the test of time. While I feel like the eclasses ended up being more complex than I wished they would be, they still work fine with no replacement in sight and they serve as inspiration to other eclasses.

For completeness, here’s a 2017 gentoopm live ebuild that uses a pretty complete distutils-r1 feature set:


# Copyright 1999-2017 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2

EAPI=6
PYTHON_COMPAT=( python{2_7,3_4,3_5,3_6} pypy )

EGIT_REPO_URI="https://github.com/mgorny/gentoopm.git"
inherit distutils-r1 git-r3

DESCRIPTION="A common interface to Gentoo package managers"
HOMEPAGE="https://github.com/mgorny/gentoopm/"
SRC_URI=""

LICENSE="BSD-2"
SLOT="0"
KEYWORDS=""
IUSE="doc"

RDEPEND="
  || (
    >=sys-apps/pkgcore-0.9.4[${PYTHON_USEDEP}]
    >=sys-apps/portage-2.1.10.3[${PYTHON_USEDEP}]
    >=sys-apps/paludis-3.0.0_pre20170219[python,${PYTHON_USEDEP}]
  )"
DEPEND="
  doc? (
    dev-python/epydoc[$(python_gen_usedep python2_7)]
  )"
PDEPEND="app-eselect/eselect-package-manager"

REQUIRED_USE="
  doc? ( $(python_gen_useflags python2_7) )"

src_configure() {
  use doc && DISTUTILS_ALL_SUBPHASE_IMPLS=( python2.7 )
  distutils-r1_src_configure
}

python_compile_all() {
  use doc && esetup.py doc
}

python_test() {
  esetup.py test
}

python_install_all() {
  use doc && local HTML_DOCS=( doc/. )
  distutils-r1_python_install_all
}

The new eclasses have employed the same PYTHON_TARGETS flags for the general implementation choice but also added PYTHON_SINGLE_TARGET to make choosing the implementation more convenient (and predictable at the same time) when the package did not permit choosing more than one. The distutils-r1 eclass reused the great idea of sub-phases to make partial alterations to the phases easier.

The unique ideas included:

  • the split into more eclasses by functionality (vs switching the mode via variables as done in python.eclass)
  • exposing dependencies and REQUIRED_USE constraints via variables to be used in ebuild instead of writing elaborate mechanisms for adding dependencies
  • avoiding command substitution in global scope to keep metadata regeneration fast

The eclasses evolved a lot over the years. Some of the original ideas turned out pretty bad, e.g. trying to run sub-phases in parallel (which broke a lot of stuff for minor performance gain) or the horribly complex original interaction between PYTHON_TARGETS and PYTHON_SINGLE_TARGET. Ebuilds often missed dependencies and REQUIRED_USE constraints, until we finally made the pkgcheck-based CI report that.

The migration to new eclasses took many years. Initially, python-r1 ebuilds were even depending on python.eclass ebuilds. The old eclass was not removed until March 2017, i.e. 4.5 years after introducing the new eclasses.

Testing on new Python targets

We went for the opt-in approach with PYTHON_COMPAT. This means that for every new Python target added, we start with no packages supporting it and have to iterate over all of the packages adding the support. It’s a lot of work and it has repeatedly caused users pain due to packages not being ported in time for the big switch to the next version. Some people have complained about that and suggested that we should go for opt-out instead. However, if you think about it, opt-in is the only way to go.

The big deal is that for any particular package to support a new implementation, all of its Python dependencies need to support it as well. With the opt-in approach, it means that we’re doing the testing dependency-first, and reaching the bigger packages only when we confirm that there’s at least a single version of every dependency that works for it. If we do things right, users don’t even see any regressions.

If we went for the opt-out approach, all packages would suddenly claim to support the new version. Now, this wouldn’t be that bad if we were actually able to do a big CI run for all packages — but we can’t since a lot of them do not have proper tests, and Python version incompatibility often can’t be detected statically. In the end, we would be relying on someone (possibly an user) reporting that something is broken. Then we’d have to investigate where in the dependency chain the culprit is, and either restrict the new target (and then restrict it in all its reverse dependencies) or immediately fix it.

So in the end, opt-out would be worse for both users and developers. Users would hit package issues first hand, and developers would have to spend significant time on the back-and-forth effort of removing support for new targets, and then adding it again. If we are to add new targets early (which is a worthwhile goal), we have to expect incompatible packages. My experience so far shows that Gentoo developers sometimes end up being the first people to submit patches fixing the incompatibility. This can be a real problem given that many Python packages have slow release cycles, and are blocking their reverse dependencies, and these in turn block their reverse dependencies and so on.

Now, things did not always go smoothly. I have prepared a Python release and Gentoo packaging timeline that puts our packaging work into perspective. As you can see, we were always quite fast in packaging new Python interpreters but it took a significant time to actually switch the default targets to them — in fact, we often switched just before or even after upstream stopped providing bug fixes to the version in question.

Our approach has changed over the years. Early on, we generally kept both the interpreter and the target in ~arch, and stabilized them in order to switch targets. The prolonged stable-mask of the new target has resulted in inconsistent presence of support for the new target in stable, and this in turn involved a lot of last-minute stabilization work. Even then, we ended up switching targets before stable was really ready for that. This was particularly bad for Python 3.4 — the period seen as ‘stable’ on the timeline is actually a period following the first unsuccessful switch of the default. It took us over half a year to try again.

Then (around Python 3.6, if I’m not mistaken) we switched to a different approach. Instead of delaying till the expected switch, we’ve tried to unmask the target on stable systems as soon as possible. This way, we started enforcing dependency graph consistency earlier and were able to avoid big last minute stabilizations needed to unmask the target.

Eventually, thanks to pkgcheck’s superior StableRequestCheck, I’ve started proactively stabilizing new versions of Python packages. This was probably the most important improvement of all. The stabilization effort was streamlined, new versions of packages gained stable keywords sooner and along them did the support for new Python implementations.

The effort in package testing and stabilizations have finally made it possible to catch up with upstream. We have basically gone through a target switching sprint, moving through 3.7 and 3.8 in half a year each. The upcoming switch to Python 3.9 concludes this effort. For the first time in years, our default will be supported upstream for over 6 months.

That said, it is worth noting that things are not becoming easier for us. Over time, Python packages keep getting new dependencies, and this means that every new Python version will involve more and more porting work. Unfortunately, some high profile packages such as setuptools and pytest keep creating bigger and bigger dependency loops. At this point, it is no longer reasonable to attempt to port all the cyclic dependencies simultaneously. Instead, I tend to temporarily disable tests for the initial ports to reduce the number of dependencies. I’ve included a list of suggested initial steps in the Python Guide to ease future porting efforts.

Packaging the Python interpreter

CPython is not the worst thing to package but it’s not the easiest one either. Admittedly, the ebuilds were pretty good back when I joined. However, we’ve already carried a pretty large and mostly undocumented set of patches, and most of these patches we carry up to this day, with no chance of upstreaming them.

Some of the involved patches are build fixes and hacks that are either specific to Gentoo, or bring the flexibility Gentoo cares about (such as making some of the USE flags possible). There are also some fixes for old bugs that upstream has not shown any interest in fixing.

Another part of release management is resolving security bugs. Until recently, we did not track vulnerabilities in CPython very well. Thanks to the work of new Security team members, we have started being informed of vulnerabilities earlier. At the same time, we realized that CPython’s treatment of vulnerabilities is a bit suboptimal.

Admittedly, when very bad things happen upstream releases fixes quickly. However, non-critical vulnerability fixes are released as part of the normal release cycle, approximately every two months. For some time already, every new release contained some security fixes and had to be stabilized quickly. At the same time it contained many other changes with their own breakage potential. Old Python branches were even worse — absurdly, even though these versions received security fixes only, the releases were even rarer.

In the end, I’ve decided that it makes more sense to backport security fixes to our patchset as soon as I become aware of them, and stabilize the patch bumps instead. I do this even if upstream makes a new release at the same time, since patch bumps are safer stable targets. Even then, some of the security fixes actually require changing the behavior. To name a few recent changes:

  • urllib.parse.parse_qsl() historically used to split the URL query string on either & or ;. This somewhat surprising behavior was changed to split only on &, with a parameter to change the separator (but no option to restore the old behavior).
  • urllib.parse.urlparse() historically preserved newlines, CRs and tabs. In the latest Python versions this behavior was changed to follow a newer recommendation of stripping these characters. As a side effect, some URL validators (e.g. in Django) suddenly stopped rejecting URLs with newlines.
  • The ipaddress module recently stopped allowing leading zeros in IPv4 addresses. These were accepted before but some of the external libraries were incidentally interpreting them as octal numbers.

Some of these make you scratch your head.

Unsurprisingly, this is also a lot of work. At this very moment, we are maintaining six different slots of CPython (2.7, 3.6, 3.7, 3.8, 3.9, 3.10). For every security backport set, I start by identifying the security-related commits on the newest release branch. This is easy if they’re accompanied by a news entry in Security category — unfortunately, some vulnerability fixes were treated as regular bug fixes in the past. Once I have a list of commits, I cherry-pick them to our patched branch and make a patchset out of that. This is the easy part.

Now I have to iterate over all the older versions. For maintained branches, the first step is to identify if upstream has already backported their fixes. if they did, I cherry-pick the backport commits from the matching branch. If they did not, I need to verify if the vulnerability applies to this old version, and cherry-pick from a newer version. Sometimes it requires small changes.

The hardest part is Python 2.7 — it is not supported upstream but still used as a build-time dependency by a few projects. The standard library structure in Python 2 differs a lot from the one in Python 3 but many of Python 3 vulnerabilities stem from code dating back to Python 2. In the end, I have to find the relevant piece of original code in Python 2, see if it is vulnerable and usually rewrite the patch for the old code.

I wish this were the end. However, in Gentoo we are also packaging PyPy, the alternative Python implementation. PyPy follows its own release cycle. PyPy2 is based on now-unmaintained Python 2, so most of the vulnerability fixes from our Python 2.7 patchset need to be ported to it. PyPy3 tries to follow Python 3’s standard library but is not very fast at it. In the end, I end up cherry-picking the changes from CPython to PyPy for our patchsets, and at the same time sending them upstream’s way so that they can fix the vulnerabilities without having to sync with CPython immediately.

Historically, we have also supported another alternative implementation, Jython. However, upstream was stuck on Python 2.5 for a very long time, and we eventually dropped support for Jython as it became unmaintainable. Upstream has since caught up with Python 2.7 but you can imagine how useful that is today.

There are other interesting projects out there but I don’t think we have the manpower (or even a very good reason) to work on them.

Now, a fun fact: Victor Stinner indicates in his mail “Need help to fix known Python security vulnerabilities” that the security fixes CPython receives are only a drop in the ocean. Apparently, at the time of writing there were 78 open problems of varying severity.

A bus and a phoenix

Any project where a single developer does a sufficiently large portion of the work has a tendency towards a bus factor of one. The Python project was no different. Ten years ago we were relying on a single person doing most of the work. When that person retired, things started getting out of hand. Python ended up with complex eclasses that nobody wholly understood and that urgently needed updates. Many packages were getting outdated.

I believe that we did the best that we could at the time. We started the new eclass set because it was much easier to draw a clear line and start over. We ended up removing many of the less important or more problematic packages, from almost the whole net-zope category (around 200 packages) to temporarily removing Django (only to re-add it later on). Getting everything at least initially ported took a lot of time. Many of the ports turned out buggy, and a large number of packages were outdated, unmaintained, or missing test suites.

Today I can finally say that the standing of the Python team’s packages is reasonably good. However, in order for it to remain good, we need to put in a lot of effort every day. Packages need to be bumped. Bumps often add new dependencies. New Python versions require testing. Test regressions just keep popping up, often when you’re working on something else. Stabilizations need to be regularly tracked, and they usually uncover even more obscure test problems. Right now it’s pretty normal for me to spend an hour of my time every day just taking care of version bumps.

I have tried my best to avoid returning to a bus factor of one. I have tried to keep the eclasses simple (but I can’t call that a complete success), involve more people in the development, keep things well documented and packages clean. Yet I still end up doing a very large portion of work in the team. I know that there are other developers who can stand in for me but I’m not sure if they will be able to take up all the work, given all their other duties.

We really need young blood. We need people who would be able to dedicate a lot of time specifically to Python, and learn all the intricacies of Python packaging. While I’ve done my best to document everything I can think of in the Python Guide, it’s still very little. The Python ecosystem is a very diverse one, and surprisingly hard to maintain without burning out. Of course, there are many packages that are a pleasure to maintain, and many upstreams that are a pleasure to work with. Unfortunately, there are also many projects that make you really frustrated — with bad quality code, broken tests, lots of NIH dependencies and awful upstreams that simply hate packagers.

Over my 10 (and a half) years as a developer, I have done a lot of different things. However, if I were to point one area of Gentoo where I put most of my effort, that area would be Python. I am proud of all that we’ve been able to accomplish, and how great our team is. However, there are still many challenges ahead of us, as well as a lot of tedious work.

March 26 2021

LetsEncrypt SSL certificates for vhosts within the mail stack – via Postfix (SMTP) and Dovecot (IMAP)

Nathan Zachary (nathanzachary) March 26, 2021, 4:36

For a long time, I didn’t mind using self-signed SSL certificates for the mail stack because 1) they still secured the connection to the server, and 2) those certificates weren’t seen or utilised by anyone other than me. However, for my datacentre infrastructure which houses clients’ websites and email, using self-signed or generic certificates (even for the mail stack) wasn’t a very good solution, as mail clients (e.g. Thunderbird or Outlook) notify users of the problematic certs. Clients with dedicated mail servers could use valid certificates (freely from LetsEncrypt) without problem, but those on shared infrastructure posed a different issue—how can mail for different domains all sharing the same IPv4 address use individualised certificates for their SMTP and IMAP connections? This article will explain the method that I used to assign domain-specific certificates for the full mail stack using LetsEncrypt’s certbot for the certs themselves, the Postfix MTA (mail transfer agent [for SMTP]), and the Dovecot IMAP server. This article is tailored to Gentoo Linux, but should be easily applied to nearly any distribution. Two general caveats are:

  • The locations of files may be different for your distribution. Consult your distribution’s documentation for the appropriate file locations
  • I use OpenRC as my init system in Gentoo. If your distribution uses systemd, you will need to replace any reference to code snippets containing /etc/init.d/$SERVICE $ACTION with systemctl $ACTION $SERVICE
    • As an example, I mention restarting Dovecot with /etc/init.d/dovecot restart
    • A systemd user would instead issue systemctl restart dovecot

I will provide a significant amount of ancillary information pertaining to each step of this process. If you are comfortable with the mail stack, some of it may be rudimentary, so feel free to skip ahead. Conversely, if any of the concepts are new or foreign to you, please reference Gentoo’s documentation for setting up a Complete Virtual Mail Server as a prerequisite. For the remainder of the article, I am going to use two hypothetical domains of domain1.com and domain2.com. Any time that they are referenced, you will want to replace them with your actual domains.

BIND (DNS) configurations

In order to keep the web stack and mail stack separate in terms of DNS, I like to have mail.domain1.com & mail.domain2.com subdomains for the MX records, but just have them point to the same A record used for the website. Some may consider this to be unnecessary, but I have found the separation helpful for troubleshooting if any problems should arise. Here are the relevant portions of the zone files for each domain:

# grep -e 'A\|MX ' /etc/bind/pri/domain1.com.external.zone 
domain1.com.		300	IN	A	$IP_ADDRESS
mail.domain1.com.	300	IN	A	$IP_ADDRESS

# grep -e 'A\|MX ' /etc/bind/pri/domain2.com.external.zone 
domain2.com.		300	IN	A	$IP_ADDRESS
mail.domain2.com.	300	IN	A	$IP_ADDRESS

In the above snippets, $IP_ADDRESS should be the actual IPv4 address of the webserver. It should be noted that, in this setup, the web stack and the mail stack reside on the same physical host, so the IP is the same for both stacks.
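
For completeness, the MX records themselves then simply point each domain at its ‘mail’ subdomain; in each zone file they would look something like this (the priority value of 10 is arbitrary):

domain1.com.		300	IN	MX	10	mail.domain1.com.
domain2.com.		300	IN	MX	10	mail.domain2.com.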

Apache (webserver) configurations

As mentioned above, I keep the web stack and mail stack separate in terms of DNS. For the LetsEncrypt certificates (covered in the next section), though, I use the same certificate for both stacks. I do so by generating the cert for both the main domain and the ‘mail’ subdomain. In order for this to work, I make the ‘mail’ subdomain a ServerAlias in the Apache vhost configurations:

# grep -e 'ServerName \|ServerAlias ' www.domain1.com.conf 
	ServerName domain1.com
	ServerAlias www.domain1.com mail.domain1.com

# grep -e 'ServerName \|ServerAlias ' www.domain2.com.conf 
	ServerName domain2.com
	ServerAlias www.domain2.com mail.domain2.com

This allows the verification of the ‘mail’ subdomain to be done via the main URL instead of requiring a separate public-facing site directory for it.

LetsEncrypt SSL certificates

LetsEncrypt is a non-profit certificate authority (CA) that provides X.509 (TLS) certificates free-of-charge. The issued certificates are only valid for 90 days, which encourages automated processes to handle renewals. The recommended method is to use the certbot tool for renewals, and there are many plugins available that provide integration with various webservers. Though I run a combination of Apache and NGINX, I prefer to not have certbot directly interact with them. Rather, I choose to rely on certbot solely for the certificate generation & renewal, and to handle the installation thereof via other means. For this tutorial, I will use the ‘certonly’ option with the webroot plugin:

# /usr/bin/certbot certonly --agree-tos --non-interactive --webroot --webroot-path /var/www/domains/$DOMAIN/$HOST/htdocs/ --domains $DOMAIN,www.$DOMAIN,mail.$DOMAIN

In the code snippet above, replace $DOMAIN with the actual domain and $HOST with the subdomain whose webroot serves the site (here, www). So, for our two hypothetical domains, the commands translate as:

# /usr/bin/certbot certonly --agree-tos --non-interactive --webroot --webroot-path /var/www/domains/domain1.com/www/htdocs/ --domains domain1.com,www.domain1.com,mail.domain1.com

# /usr/bin/certbot certonly --agree-tos --non-interactive --webroot --webroot-path /var/www/domains/domain2.com/www/htdocs/ --domains domain2.com,www.domain2.com,mail.domain2.com

The webroot plugin will create a temporary file under ${webroot-path}/.well-known/acme-challenge/ and then check that file via HTTP in order to validate the server. Make sure that the directory is publicly accessible or else the validation will fail. Once certbot validates the listed domains—in this setup, the ‘www’ and ‘mail’ subdomains are just aliases to the primary domain (see the BIND and Apache configurations sections above)—it will generate the SSL certificates and place them under /etc/letsencrypt/live/domain1.com/ and /etc/letsencrypt/live/domain2.com/, respectively. There are two files for each certificate that we will use:

  • fullchain.pem –> the public certificate
  • privkey.pem –> the private key for the certificate

Though these certificates are often used in conjunction with the web stack, we are going to use them for securing the mail stack as well.
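
One last note on the validation step before moving on: if certbot reports a failed challenge, a quick way to confirm that the challenge directory is actually served publicly is to drop a test file into it and fetch it over plain HTTP (the file name here is arbitrary):

# mkdir -p /var/www/domains/domain1.com/www/htdocs/.well-known/acme-challenge
# echo test > /var/www/domains/domain1.com/www/htdocs/.well-known/acme-challenge/test.txt

$ curl http://domain1.com/.well-known/acme-challenge/test.txt
test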

Dovecot (IMAP) configurations

Now that we have the certificates for each domain, we’ll start by securing the IMAP server (i.e. Dovecot) so that the users’ Mail User Agent (MUA, or more colloquially, “email client” [like Thunderbird or Outlook]) will no longer require a security exception due to a domain mismatch. Adding the domain-specific SSL certificate to Dovecot is a straightforward process that only requires two directives per domain. For domain1.com and domain2.com, add the following lines to /etc/dovecot/conf.d/10-ssl.conf:

local_name mail.domain1.com {
  ssl_cert = </etc/letsencrypt/live/domain1.com/fullchain.pem
  ssl_key = </etc/letsencrypt/live/domain1.com/privkey.pem
}

local_name mail.domain2.com {
  ssl_cert = </etc/letsencrypt/live/domain2.com/fullchain.pem
  ssl_key = </etc/letsencrypt/live/domain2.com/privkey.pem
}

Those code blocks can be copied and pasted for any additional virtual hosts or domains that are needed. As with any configuration change, make sure to restart the application in order to make the changes active:

/etc/init.d/dovecot restart
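
If you want to confirm that Dovecot actually picked up the new blocks, doveconf (shipped with Dovecot) can dump the merged configuration; the local_name sections should appear in its output:

$ doveconf -n | grep -A 2 'local_name'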

Postfix (SMTP) configurations

Configuring Dovecot to use the SSL certificate for securing the IMAP connection from the user’s email client to the server is only one part of the process—namely the connection when the user is retrieving mails from the server. This next part will use the same certificate to secure the SMTP connection (via the Postfix SMTP server) for sending mails.

The first step is to create a file that will be used for mapping each certificate to its respective domain. Postfix can handle this correlation via Server Name Indication (SNI), which is an extension of the TLS protocol that indicates the hostname of the server at the beginning of the handshake process. Though there is no naming requirement for this map file, I chose to create it as /etc/postfix/vmail_ssl. The format of the file is:

$DOMAIN   $PRIVATE_KEY   $PUBLIC_KEY_CHAIN

So for our example of domain1.com and domain2.com, the file would consist of the following entries:

mail.domain1.com /etc/letsencrypt/live/domain1.com/privkey.pem /etc/letsencrypt/live/domain1.com/fullchain.pem

mail.domain2.com /etc/letsencrypt/live/domain2.com/privkey.pem /etc/letsencrypt/live/domain2.com/fullchain.pem

Though this file is plain text, Postfix doesn’t read the mapping in this format. Instead, it expects a Berkeley DB table in which the values are the base64-encoded contents of the key and certificate files. Thankfully, Postfix makes creating such a table very easy via the postmap utility. Once you have created and populated your /etc/postfix/vmail_ssl file with the entries for each domain, issue the following command:

# postmap -F hash:/etc/postfix/vmail_ssl

which will create the Berkeley DB file (named vmail_ssl.db) in the same directory:

# find /etc/postfix/ -type f -iname '*vmail_ssl*'
/etc/postfix/vmail_ssl.db
/etc/postfix/vmail_ssl

# file /etc/postfix/vmail_ssl
/etc/postfix/vmail_ssl: ASCII text

# file /etc/postfix/vmail_ssl.db 
/etc/postfix/vmail_ssl.db: Berkeley DB (Hash, version 10, native byte-order)

Now that we have created the mapping table, we have to configure Postfix to use it for SMTP connections. It’s acceptable to have both the SNI-mapped certificates AND a generic SSL certificate as the default (for when a domain isn’t listed in the mapping table). Postfix can have both directives specified simultaneously. To do so, the following directives need to be added to /etc/postfix/main.cf (the comments explain both sets of directives):

## Default SSL cert for SMTP if SNI is not enabled
smtpd_tls_cert_file = /etc/ssl/mail/server.pem
smtpd_tls_key_file = /etc/ssl/mail/server.key

## Mappings for SMTP SSL certs when SNI is enabled
tls_server_sni_maps = hash:/etc/postfix/vmail_ssl

After making those modifications to Postfix’s main configuration file, restart it:

/etc/init.d/postfix restart

That’s it! Now the full mail stack is secured using domain-specific SSL certificates for both IMAP and SMTP connections. The remaining sections below will explain some maintenance-related procedures such as handling the LetsEncrypt certificate renewals & updating the mappings in Postfix automatically, as well as verifying it’s all working as intended (and some troubleshooting tips in case it’s not). 🙂

Automatic renewal (cron) configurations

As mentioned in the LetsEncrypt section above, the certificates that they issue are only valid for a period of 90 days. One of the reasons for the relatively short validity period is to encourage automation when it comes to renewing them. I choose to handle the renewals automatically via cron:

# tail -n 4 /var/spool/cron/crontabs/root 

## LetsEncrypt certificate renewals on first of each month
## See /etc/letsencrypt/renewal-hooks/post/ for Postfix & Apache hooks 
0  2  1  *  *   /usr/bin/certbot renew --quiet

This cron entry instructs LetsEncrypt’s certbot to check the validity of ALL certificates at 02:00 (server time) on the first of every month (if that format is unfamiliar to you, see Wikipedia’s article on cron). The renew subcommand will automatically generate a new certificate for any found to expire within the next 30 days, and the quiet option will silence any output except for errors, which is appropriate for use with a cron job.
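
If you would rather not wait until the first of the month to find out whether renewal works, certbot can also exercise the whole renewal path against LetsEncrypt’s staging environment without replacing any real certificates:

# /usr/bin/certbot renew --dry-run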

That’s the procedure for renewing the certificate automatically, but what about automatically updating the appropriate stack configurations—in particular, Postfix’s vmail_ssl mappings table (and Apache, but that’s outside the scope of this tutorial)? If the certificate is renewed but not updated in Postfix’s hash table, there will be a certificate mismatch error. As mentioned in the comment on the cron entry, I chose to handle those configuration updates automatically via certbot’s ‘renewal hooks’, which can be found under /etc/letsencrypt/renewal-hooks/. In this case, the configuration updates need to happen after certificate renewal, so they are put under the post/ subdirectory.

I have two scripts that run after a certificate renewal, but only the 01_postfix_smtp_ssl.sh one is applicable for the mail stack:

# ls /etc/letsencrypt/renewal-hooks/post/
01_postfix_smtp_ssl.sh  02_apache.sh

# cat /etc/letsencrypt/renewal-hooks/post/01_postfix_smtp_ssl.sh 
#!/bin/bash
/usr/sbin/postmap -F hash:/etc/postfix/vmail_ssl
/etc/init.d/postfix restart
exit 0

The simple script issues the same postmap command from the ‘Postfix (SMTP) configurations‘ section above, and then restarts Postfix. If everything goes smoothly, it will exit cleanly (‘exit 0’). The script ensures that the new certificate is immediately applied to the Postfix configuration so that there aren’t validation errors after the automated renewal process.
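
One small detail worth checking: certbot only runs hook scripts that are marked executable, so if a hook appears to be silently skipped, that is the likely culprit:

# chmod +x /etc/letsencrypt/renewal-hooks/post/01_postfix_smtp_ssl.sh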

Verification of the certificates

If everything went according to plan, valid SSL certificates should be in place for both mail.domain1.com and mail.domain2.com. Like any good engineer, though, we don’t want to just assume that it’s working as intended. So… we should test it! You could just open an email client of your choice and view the certificates for IMAP and SMTP connections. Personally, though, I prefer using terminal-based utilities as I find them to be more efficient. In this case, we can use the openssl command for connecting to each domain as a test, and the basic syntax is:

For SMTP:

openssl s_client -connect mail.domain1.com:25 -servername mail.domain1.com -starttls smtp

For IMAP:

openssl s_client -connect mail.domain1.com:993 -servername mail.domain1.com

These commands will output a lot of information including the full public certificate, the issuing authority (LetsEncrypt), handshake details, SSL session details, and so on. If you’re interested in all of those details, feel free to issue the commands as they are above (obviously swapping out the actual domains and the ports that you use for SMTP and IMAP). If, however, you simply want to confirm that the certificates are valid, you can pipe the commands to grep in order to limit the output:

For SMTP:

$ openssl s_client -connect mail.domain1.com:25 -servername mail.domain1.com -starttls smtp | grep -e 'subject=CN \|Verify return code:'
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = domain1.com
verify return:1
250 CHUNKING
subject=CN = domain1.com
Verify return code: 0 (ok)

For IMAP:

$ openssl s_client -connect mail.domain1.com:993 -servername mail.domain1.com | grep -e 'subject=CN \|Verify return code:'
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = domain1.com
verify return:1
subject=CN = domain1.com
Verify return code: 0 (ok)

If you see output similar to what’s above, then everything is working as it should be. In particular, you want to make sure that the ‘CN’ references match the domain, and that you see a ‘Verify return code:’ of 0 (ok). Pat yourself on the back and grab your beverage of choice to celebrate a job well done. 🙂

Additional information

If you have already been using a domain for a website or other service, chances are that you have already generated a LetsEncrypt SSL certificate for it. Thankfully LetsEncrypt makes it easy to append a new subdomain to an existing certificate instead of having to generate a completely separate one for the ‘mail’ subdomain used in this guide (e.g. mail.domain1.com).

The first step is to find the certificate that you want to modify (in this case, domain1.com) and see which subdomains are covered under it. This can be accomplished using the certbot certificates command. The output will look something like this:

Certificate Name: domain1.com
   Serial Number: $some_alphanumeric_string
   Key Type: RSA
   Domains: domain1.com www.domain1.com staging.domain1.com
   Expiry Date: 2021-05-02 06:03:19+00:00 (VALID: 60 days)
   Certificate Path: /etc/letsencrypt/live/domain1.com/fullchain.pem
   Private Key Path: /etc/letsencrypt/live/domain1.com/privkey.pem

The important part is the list of subdomains on the Domains: line because you need to reference ALL of them when using the --expand flag that follows. Using the output from above, the command would be constructed as:

# /usr/bin/certbot certonly --agree-tos --non-interactive --webroot --webroot-path /var/www/domains/domain1.com/www/htdocs/ --domains domain1.com,www.domain1.com,staging.domain1.com,mail.domain1.com --expand

If certbot indicates that the new certificate has been generated without any errors, you can check it again using the certbot certificates command from above and validate that now the ‘mail’ subdomain is listed as well:

Certificate Name: domain1.com
   Serial Number: $some_alphanumeric_string
   Key Type: RSA
   Domains: domain1.com www.domain1.com staging.domain1.com mail.domain1.com
   Expiry Date: 2021-05-14 06:05:19+00:00 (VALID: 60 days)
   Certificate Path: /etc/letsencrypt/live/domain1.com/fullchain.pem
   Private Key Path: /etc/letsencrypt/live/domain1.com/privkey.pem

Troubleshooting

I can’t anticipate the full gamut of problems that could potentially arise when going through this guide, but I will try to cover some common pitfalls here. If you run into a problem, feel free to comment and I will try to help you through it.

>>> Postfix error about the table hash:

If Postfix won’t start after the modifications from the sections above, and you see a line like this in the mail logs:

[postfix/smtpd] warning: table hash:/etc/postfix/vmail_ssl.db: key mail.domain1.com: malformed BASE64 value: /etc/letsencrypt/live/domain1

then the problem stems from running postmap without the -F flag. Try it again with that flag (postmap -F hash:/etc/postfix/vmail_ssl), which should create a syntactically correct hash table and allow Postfix to start properly.
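
Another quick check, if SNI does not appear to take effect at all, is to confirm that Postfix accepted the new parameter (tls_server_sni_maps requires Postfix 3.4 or newer):

# postconf tls_server_sni_maps
tls_server_sni_maps = hash:/etc/postfix/vmail_ssl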

March 14 2021

Gentoo AMD64 Handbook "Preparing the disks" section reworked

Andreas K. Hüttel (dilfridge) March 14, 2021, 9:55

Since the text was becoming more and more outdated and also more and more convoluted, I have completely reworked the "Preparing the disks" section of the Gentoo AMD64 handbook.

  • Since fdisk has supported GUID partition tables (GPT) for a long time now, references to parted have been dropped.
  • The text now restricts itself to the combinations 1) UEFI boot and GPT and 2) BIOS / legacy boot and MBR. While mixing and matching is certainly possible, we treat it as out of scope for the manual.
  • Hopefully the terminology regarding the boot partition, UEFI system partition, and BIOS boot partition is clearer now (it was horribly mixed up before).

Please proofread and check for mistakes! I'll drop the "work in progress" label in a few days if nothing comes up.

March 03 2021

Moving commits between independent git histories

Michał Górny (mgorny) March 03, 2021, 17:55

PyPy is an alternative Python implementation. While it does replace a large part of the interpreter, a large part of the standard library is shared with CPython. As a result, PyPy is frequently affected by the same vulnerabilities as CPython, and we have to backport security fixes to it.

Backporting security fixes inside CPython is relatively easy. All main Python branches are in a single repository, so it’s just a matter of cherry-picking the commits. Normally, you can easily move patches between two related git repositories using git-style patches, but this isn’t going to work for two repositories with unrelated histories.

Does this mean manually patching PyPy and rewriting commit messages by hand? Luckily, there’s a relatively simple git am trick that can help you avoid that.

Roughly, the idea is to:

1. Create a git-format-patch of the change to backport.

2. Attempt to apply the change via git am — it will fail and your repository will be left in the middle of an am session.

3. Apply the change via patch.

4. git add the changes.

5. Finally, call git am --continue to finish, wrapping your changes in the original commit metadata.

For example, let’s try backporting the CVE-2021-23336 (parameter cloaking) fix:

First, grab the relevant patch from the CPython repository:

$ git format-patch -1 d0d4d30882fe3ab9b1badbecf5d15d94326fd13e
0001-3.7-bpo-42967-only-use-as-a-query-string-separator-G.patch

Then, inside the local clone of a random PyPy git mirror:

$ git am -3 ~/git/cpython/0001-3.7-bpo-42967-only-use-as-a-query-string-separator-G.patch
Applying: bpo-42967: only use '&' as a query string separator (GH-24297) (GH-24531)
error: sha1 information is lacking or useless (Doc/library/cgi.rst).
error: could not build fake ancestor
Patch failed at 0001 bpo-42967: only use '&' as a query string separator (GH-24297) (GH-24531)
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Now enter the directory that stdlib resides in, and apply the patch manually, skipping any missing files:

$ patch -p2 < ~/git/cpython/0001-3.7-bpo-42967-only-use-as-a-query-string-separator-G.patch
can't find file to patch at input line 39
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|From d0d4d30882fe3ab9b1badbecf5d15d94326fd13e Mon Sep 17 00:00:00 2001
|From: Senthil Kumaran 
|Date: Mon, 15 Feb 2021 10:34:14 -0800
|Subject: [PATCH] [3.7] bpo-42967: only use '&' as a query string separator
| (GH-24297)  (GH-24531)
|MIME-Version: 1.0
|Content-Type: text/plain; charset=UTF-8
|Content-Transfer-Encoding: 8bit
|
|bpo-42967: [security] Address a web cache-poisoning issue reported in
|urllib.parse.parse_qsl().
|
|urllib.parse will only us "&" as query string separator by default
|instead of both ";" and "&" as allowed in earlier versions. An optional
|argument seperator with default value "&" is added to specify the
|separator.
|
|Co-authored-by: Éric Araujo 
|Co-authored-by: Ken Jin 
|Co-authored-by: Adam Goldschmidt 
|(cherry picked from commit fcbe0cb04d35189401c0c880ebfb4311e952d776)
|---
| Doc/library/cgi.rst                           |  9 ++-
| Doc/library/urllib.parse.rst                  | 23 ++++++-
| Doc/whatsnew/3.6.rst                          | 13 ++++
| Doc/whatsnew/3.7.rst                          | 13 ++++
| Lib/cgi.py                                    | 23 ++++---
| Lib/test/test_cgi.py                          | 29 ++++++--
| Lib/test/test_urlparse.py                     | 68 +++++++++++++------
| Lib/urllib/parse.py                           | 19 ++++--
| .../2021-02-14-15-59-16.bpo-42967.YApqDS.rst  |  1 +
| 9 files changed, 152 insertions(+), 46 deletions(-)
| create mode 100644 Misc/NEWS.d/next/Security/2021-02-14-15-59-16.bpo-42967.YApqDS.rst
|
|diff --git a/Doc/library/cgi.rst b/Doc/library/cgi.rst
|index 0b1aead9dd..f0ec7e8cc6 100644
|--- a/Doc/library/cgi.rst
|+++ b/Doc/library/cgi.rst
--------------------------
File to patch: 
Skip this patch? [y] 
Skipping patch.
3 out of 3 hunks ignored
[...]
patching file cgi.py
patching file test/test_cgi.py
patching file test/test_urlparse.py
patching file urllib/parse.py
patching file NEWS.d/next/Security/2021-02-14-15-59-16.bpo-42967.YApqDS.rst

Adjust the changes as appropriate:

$ rm -r NEWS.d/
$ git status
HEAD detached from release-pypy3.7-v7.3.3
You are in the middle of an am session.
  (fix conflicts and then run "git am --continue")
  (use "git am --skip" to skip this patch)
  (use "git am --abort" to restore the original branch)

Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git restore ..." to discard changes in working directory)
	modified:   cgi.py
	modified:   test/test_cgi.py
	modified:   test/test_urlparse.py
	modified:   urllib/parse.py

no changes added to commit (use "git add" and/or "git commit -a")
$ git add cgi.py test/test_cgi.py test/test_urlparse.py urllib/parse.py

And finally let git am commit the changes for you:

$ git am --continue
Applying: bpo-42967: only use '&' as a query string separator (GH-24297) (GH-24531)
