December 07, 2021

hackergotchi for Evgeni Golov

Evgeni Golov

The Mocking will continue, until CI improves

One might think this blog is exclusively about weird language behavior and yelling at computers… Well, welcome to another episode of Jackass!

Today's opponent is Ruby, or maybe minitest, or maybe Mocha. I'm not exactly sure, but it was a rather amusing exercise and I like to share my nightmares ;)

It all started with the classic "you're using old and unmaintained software, please switch to something new".

The first attempt was to switch from the ci_reporter_minitest plugin to the minitest-ci plugin. While the change worked great for Foreman itself, it broke the reporting in Katello - the tests would run but no junit.xml was generated and Jenkins rightfully complained that it got no test results.

While investigating what the hell was wrong, we realized that Katello was already using a minitest reporting plugin: minitest-reporters. Loading two different reporting plugins seemed like a good source for problems, so I tried using the same plugin for Foreman too.

Guess what? After a bit of massaging (mostly to disable the second minitest-reporters initialization in Katello) reporting of test results from Katello started to work like a charm. But now the Foreman tests started to fail. Not fail to report, fail to actually run. WTH‽

The failure was quite interesting too:

test/unit/parameter_filter_test.rb:5:in `block in <class:ParameterFilterTest>':
  Mocha methods cannot be used outside the context of a test (Mocha::NotInitializedError)

Yes, this is a single test file failing; all others were fine.

The failing code doesn't look problematic at first glance:

require 'test_helper'

class ParameterFilterTest < ActiveSupport::TestCase
  let(:klass) do
    mock('Example').tap do |k|
      k.stubs(:name).returns('Example')
    end
  end

  test 'something' do
    something
  end
end

The failing line (5) is mock('Example').tap … and for some reason Mocha thinks it's not initialized here.

This certainly has something to do with how the various reporting plugins inject themselves, but I really didn't want to debug how to run two reporting plugins in parallel (which, as you remember, didn't expose this behavior). So the only real path forward was to debug what's happening here.

Calling the test on its own, with one of the working reporters, was the first step:

$ bundle exec rake test TEST=test/unit/parameter_filter_test.rb TESTOPTS=-v

#<Mocha::Mock:0x0000557bf1f22e30>#test_0001_permits plugin-added attribute = 0.04 s = .
#<Mocha::Mock:0x0000557bf12cf750>#test_0002_permits plugin-added attributes from blocks = 0.49 s = .

Wait, what? #<Mocha::Mock:…>? Shouldn't this read more like ParameterFilterTest::… as it happens for every single other test in our test suite? It definitely should! That's actually great, as it tells us that there is really something wrong with the test and the change of the reporting plugin just makes it worse.

What comes next is sheer luck. Well, that, and years of experience in yelling at computers.

We use let(:klass) to define an object called klass and this object is a Mocha::Mock that we'll use in our tests later. Now klass is a very common term in Ruby when talking about classes and needing to store them — mostly because one can't use class which is a keyword. Is something else in the stack using klass and our let is overriding that, making this whole thing explode?

It was! The moment we replaced klass with klass1 (silly, I know, but there also was a klass2 in that code, so it did fit), things started to work nicely.

I really liked Tomer's comment in the PR: "no idea why, but I am not going to dig into mocha to figure that out."

Turns out, I couldn't let (HAH!) the code rest and really wanted to understand what happened there.

What I didn't want to do is to debug the whole Foreman test stack, because it is massive.

So I started to write a minimal reproducer for the issue.

All starts with a Gemfile, as we need a few dependencies:

gem 'rake'
gem 'mocha'
gem 'minitest', '~> 5.1', '< 5.11'

Then a Rakefile:

require 'rake/testtask'

Rake::TestTask.new(:test) do |t|
  t.libs << 'test'
  t.test_files = FileList["test/**/*_test.rb"]
end

task :default => :test

And a test! I took the liberty of replacing ActiveSupport::TestCase with Minitest::Test, as the test won't be using any Rails features and I wanted to keep my environment minimal.

require 'minitest/autorun'
require 'minitest/spec'
require 'mocha/minitest'

class ParameterFilterTest < Minitest::Test
  extend Minitest::Spec::DSL

  let(:klass) do
    mock('Example').tap do |k|
      k.stubs(:name).returns('Example')
    end
  end

  def test_lol
    assert klass
  end
end

Well, damn, this passed! Is it Rails after all that breaks stuff? Let's add it to the Gemfile!

$ vim Gemfile
$ bundle install
$ bundle exec rake test TESTOPTS=-v

#<Mocha::Mock:0x0000564bbfe17e98>#test_lol = 0.00 s = .

Wait, I didn't change anything and it's already failing?! Fuck! I mean, cool!

But the test isn't minimal yet. What can we reduce? let is just a fancy, lazy def, right? So instead of let(:klass) we should be able to write def klass, achieve a similar outcome, and drop that Minitest::Spec DSL.

require 'minitest/autorun'
require 'mocha/minitest'

class ParameterFilterTest < Minitest::Test
  def klass
    mock
  end

  def test_lol
    assert klass
  end
end

$ bundle exec rake test TESTOPTS=-v

/home/evgeni/Devel/minitest-wtf/test/parameter_filter_test.rb:5:in `klass': Mocha methods cannot be used outside the context of a test (Mocha::NotInitializedError)
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/test_unit/reporter.rb:68:in `format_line'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/test_unit/reporter.rb:15:in `record'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:682:in `block in record'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:681:in `each'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:681:in `record'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:324:in `run_one_method'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:311:in `block (2 levels) in run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:310:in `each'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:310:in `block in run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:350:in `on_signal'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:337:in `with_info_handler'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:309:in `run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:159:in `block in __run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:159:in `map'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:159:in `__run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:136:in `run'
    from /home/evgeni/Devel/minitest-wtf/vendor/bundle/ruby/3.0.0/gems/minitest-5.10.3/lib/minitest.rb:63:in `block in autorun'
rake aborted!

Oh nice, this is even better! Instead of the mangled class name, we now get the very same error the Foreman tests aborted with, plus a nice stack trace! But wait, why is it pointing at railties? We're not loading that! Anyway, let's look at railties-6.1.4.1/lib/rails/test_unit/reporter.rb, line 68:

def format_line(result)
  klass = result.respond_to?(:klass) ? result.klass : result.class
  "%s#%s = %.2f s = %s" % [klass, result.name, result.time, result.result_code]
end

Heh, this is touching result.klass, which we just messed up. Nice!

But quickly back to railties… What if we only add that to the Gemfile, not full blown Rails?

gem 'railties'
gem 'rake'
gem 'mocha'
gem 'minitest', '~> 5.1', '< 5.11'

Yepp, same failure. It also happens with require: false added to the line, so it seems railties somehow injects itself into rake even if nothing is using it?! "Cool"!

By the way, why are we still pinning minitest to < 5.11? Oh right, this was the original reason to look into that whole topic. And, uh, it's pointing at klass there already! 4 years ago!

So let's remove that boundary and, funnily enough, now tests are passing again, even if we use klass!

Minitest 5.11 changed how Minitest::Test is structured, and seems not to rely on klass at that point anymore. And I guess Rails also changed a bit since the original pin was put in place four years ago.

I didn't want to go down another rabbit hole finding out what changed in Rails, but I did try with 5.0 (well, 5.0.7.2 to be precise), and the output with newer (>= 5.11) Minitest was interesting:

$ bundle exec rake test TESTOPTS=-v

Minitest::Result#test_lol = 0.00 s = .

It's leaking Minitest::Result as klass now, instead of Mocha::Mock. So probably something along these lines was broken 4 years ago and triggered this pin.

What do we learn from that?

  • klass is cursed and shouldn't be used in places where inheritance and tooling might decide to use it for some reason
  • inheritance is cursed - why the heck are implementation details of Minitest leaking inside my tests?!
  • tooling is cursed - why is railties injecting stuff when I didn't ask it to?!
  • dependency pinning is cursed - at least if you pin to avoid an issue and then forget about said issue for four years
  • I like cursed things!

07 December, 2021 07:39PM by evgeni

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

Rblpapi 0.3.12: Fixes and Updates

The Rblp team is happy to announce a new version 0.3.12 of Rblpapi which just arrived at CRAN. Rblpapi provides a direct interface between R and the Bloomberg Terminal via the C++ API provided by Bloomberg (but note that a valid Bloomberg license and installation is required).

This is the twelfth release since the package first appeared on CRAN in 2016. Changes are detailed below and include extensions to functionality, actual bug fixes, and changes to the package setup. Special thanks go to Michael Kerber, Yihui Xie and Kai Lin for contributing pull requests!

Changes in Rblpapi version 0.3.12 (2021-12-07)

  • bdh() supports new option returnAs (Michael Kerber and Dirk in #335 fixing #206)

  • Remove extra backtick in vignette (Yihui Xie in #343)

  • Fix a segfault from bulk access with bds (Kai Lin in #347 fixing #253)

  • Support REQUEST_STATUS in bdh (Kai Lin and John in #349 fixing #348)

  • Vignette now uses simplermarkdown (Dirk in #350)

Courtesy of my CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the Rblpapi page. Questions, comments etc should go to the issue tickets system at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

07 December, 2021 01:39PM

hackergotchi for Daniel Lange

Daniel Lange

Gradual improvements at the Linux Foundation

After last year's blunder with trying to hide the Adobe toolchain and using hilarious stock photos, the Linux Foundation did much better in their 2021 annual report published Dec. 6, 2021.

Still they are using the Adobe toolchain (InDesign, Acrobat PDF) and my fellow Debian Developer Geert was quick to point that out as the first comment to the LWN note on the publication:

LWN comment from Geert

I think it is important to call the Linux Foundation (LF) out again and again. Adobe is a Silver member of the LF, and the LF could use that relationship to motivate them to publish their applications for Linux. And if that is not an option, there are Free alternatives like Scribus that could well use the exposure and funds of the LF to help catch up to the market-leading product, Adobe InDesign.

Linux Foundation Annual report 2021, document properties

Personally, as a photographer, I am very happy they used stock images from Unsplash to illustrate the 2021 edition over the cringeworthy Shutterstock footage from last year's report.

And they gave proper credit:

Thank you section for Unsplash from the Linux Foundation 2021 annual report

Now for next year ... find an editor that knows how to spell photographers, please. And consider Scribus. And make Adobe publish their apps for Linux. Thank you.

07 December, 2021 10:11AM by Daniel Lange

Russell Coker

AS400

The IBM i operating system on the AS/400 runs on PPC for “midrange” systems. I did a bit of reading about it after seeing an AS/400 on ebay for $300; if I had a lot more spare time and energy I might have put in a bid for it, had it not looked like it had been left out in the rain. It seems that AS/400 is not dead: there are cloud services available, and here’s one that provides a VM with 2GB of RAM for “only EUR 251 monthly” [1], wow. I’m not qualified to comment on whether that’s good value, but I think it’s worth noting that a Linux VM running on an AMD64 CPU with similar storage and the same RAM can be expected to cost about $10 per month.

There is also a free AS/400 cloud named pub400 [2], this is the type of thing I’d do if I had my own AS/400.

07 December, 2021 03:08AM by etbe

December 06, 2021

hackergotchi for Jonathan Dowland

Jonathan Dowland

Sixth Annual UK System Research Challenges Workshop lightning talk

me looking awkward, thanks Mark Little (https://twitter.com/nmcl/status/1466148768043126791/photo/1)

Last week I attended the UK Systems Research 2021 conference in County Durham, my first conference in nearly two years (since FOSDEM 2020, right on the cusp of the Pandemic). The Systems conference community is very pleasant and welcoming and so when I heard it was going to take place "physically" again this year I was so keen to attend I decided to hedge my bets and submit two talk proposals. I wasn't expecting them both to be accepted…

As well as the regular talks (more on those in another post) there is a tradition for people to give short, impromptu lightning talks after dinner on the second night. I've given two of these before, and I'd been considering whether to offer to give one this time or not, but with two talks to deliver (and finish writing) I wasn't sure. Usually people talk about something interesting that they have been doing besides their research or day-jobs, but the last two years have been somewhat difficult and I didn't really think I had a topic to talk about. Then I wondered if that was a topic in itself…

During the first day of the conference (and especially once I'd got past one of my talks) I started to outline a lightning talk idea and it seemed to come out well enough that I thought I'd give it a go. Unusually, I therefore had something written down, and I was surprised how well it was received, so I thought I'd share it. Here it is:


I was anticipating the lightning talks and being cajoled into talking about something. I've done it twice before. So I've been racking my brains to figure out if I've done anything interesting enough to talk about.

In 2018 I talked about some hack I'd made to the classic computer game Doom from 1993. I've done several hacks to Doom that I could probably talk about except I've become a bit uncomfortable about increasingly being thought of as "that doom guy". I'd been reflecting on why it was that I continued to mess about with that game in the first place and I realised it was a form of expression: I was treating Doom like a canvas.

I've spent most of my career thinking about what I do in the frame of either science or engineering. I suffer from the creative urge and I've often expressed (and sated) that through my work. And that's possible because there's a craft in what we do.

In 2019 I talked about a project I'd embarked on to resurrect my childhood computer, a Commodore Amiga 500, in order to rescue my childhood drawings and digital paintings. (There's the artistic thing again). I'd achieved that and I have ambitions to do some more Amiga stuff but again that's a work in progress and there's nothing much to talk about.

In recent years I've been thinking more and more about art and became interested in the works and writings of people like Grayson Perry, Laurie Anderson and Brian Eno. I first learned about Eno through his music but he's also a visual artist and a music producer. As a producer in the 70s he co-invented a system called "oblique strategies" to try and break out of writer's block: a deck of cards with oblique suggestions written on them. When you're stuck, you pull a card and it might help you to reframe what you are working on and think about it in a completely different way.

I love this idea and I think we should use more things like that in software engineering at least.

So back to casting about for something to talk about. What have I been doing in the last couple of years? Frankly, surviving - I've just about managed to keep doing my day job, and keep working on the PhD, at home with two young kids and home schooling and the rest of it. Which is an achievement but makes for a boring lightning talk. But I'd like to say that for anyone here who might have been worrying similarly: I think surviving is more than enough.

I'll close on the subject of thinking like an artist and not an engineer. I brought some of the Oblique Strategies deck with me and I thought I'd draw a card to perhaps help you out of a creative dilemma if you're in one. And I kid you not, the first card I drew was this one:

Card reading 'You are an Engineer'

06 December, 2021 10:04PM

Matthias Klumpp

New things in AppStream 0.15

On the road to AppStream 1.0, a lot of items from the long todo list have been done so far – only one major feature remains: external release descriptions, which is a tricky one to implement and specify. It needs to either be implemented or be rejected for AppStream 1.0 though, as it would be a major change in how release data is handled in AppStream.

Besides 1.0 preparation work, the recent 0.15 release and the releases before it come with their very own large set of changes that are worth a look and may be interesting for your application to support. But first, a change that affects the implementation and not the XML format:

1. Completely rewritten caching code

Keeping all AppStream data in memory is expensive, especially if the data is huge (as on Debian and Ubuntu with their large repositories generated from desktop-entry files as well) and if processes using AppStream are long-running. The latter is more and more the case, not only does GNOME Software run in the background, KDE uses AppStream in KRunner and Phosh will use it too for reading form factor information. Therefore, AppStream via libappstream provides an on-disk cache that is memory-mapped, so data is only consuming RAM if we are actually doing anything with it.

Previously, AppStream used an LMDB-based cache in the background, with indices for fulltext search and other common search operations. This was a very fast solution, but it also came with limitations: LMDB’s maximum key size of 511 bytes became a problem quite often, adjusting the maximum database size (since it has to be set at opening time) was annoyingly tricky, and building dedicated indices for each search operation was very inflexible. In addition to that, the caching code was changed multiple times in the past to allow system-wide metadata to be cached per-user, as some distributions didn’t (want to) build a system-wide cache and therefore ran into performance issues when XML was parsed repeatedly for generation of a temporary cache. On top of all that, the cache was designed around the concept of “one cache for data from all sources”, which meant that we had to rebuild it entirely if just a small aspect changed, like a MetaInfo file being added to /usr/share/metainfo, which was very inefficient.

To make a long story short, the old caching code was rewritten with the new concepts in mind of caches not necessarily being system-wide and caches existing for more fine-grained groups of files. The new caching code uses Richard Hughes’ excellent libxmlb internally for memory-mapped data storage. Unlike LMDB, libxmlb knows about the XML document model, so queries can be much more powerful and we do not need to build indices manually. The library is also already used by GNOME Software and fwupd for parsing of (refined) AppStream metadata, so it works quite well for that use case. As a result, search queries via libappstream are now a bit slower (this very much depends on the query; roughly 20% on average), but can be much more powerful. The caching code is a lot more robust, which should speed up startup time of applications. And in addition to all of that, the AsPool class has gained a flag to allow it to monitor AppStream source data for changes and refresh the cache fully automatically and transparently in the background.

All software written against the previous version of the libappstream library should continue to work with the new caching code, but to make use of some of the new features, software using it may need adjustments. A lot of methods have been deprecated too now.

2. Experimental compose support

Compiling MetaInfo and other metadata into AppStream collection metadata, extracting icons, language information, refining data and caching media is an involved process. The appstream-generator tool does this very well for data from Linux distribution sources, but the tool is also pretty “heavyweight” with lots of knobs to adjust, an underlying database and a complex algorithm for icon extraction. Embedding it into other tools via anything but its command-line API is also not easy (due to D’s GC initialization, and because it was never written with that feature in mind). Sometimes a simpler tool is all you need, so the libappstream-compose library as well as appstreamcli compose are being developed at the moment. The library contains building blocks for developing a tool like appstream-generator, while the CLI tool allows you to simply extract metadata from any directory tree, which can be used by e.g. Flatpak. For this to work well, a lot of appstream-generator’s D code is translated into plain C, so the implementation stays identical but the language changes.

Ultimately, the generator tool will use libappstream-compose for any general data refinement, and only implement things necessary to extract data from the archive of distributions. New applications (e.g. for new bundling systems and other purposes) can then use the same building blocks to implement new data generators similar to appstream-generator with ease, sharing much of the code that would be identical between implementations anyway.

3. Supporting user input controls

Want to advertise that your application supports touch input? Keyboard input? Has support for graphics tablets? Gamepads? Sure, nothing is easier than that with the new control relation item and supports relation kind (since 0.12.11 / 0.15.0, details):

<supports>
  <control>pointing</control>
  <control>keyboard</control>
  <control>touch</control>
  <control>tablet</control>
</supports>

4. Defining minimum display size requirements

Some applications are unusable below a certain window size, so you do not want to display them in a software center that is running on a device with a small screen, like a phone. In order to encode this information in a flexible way, AppStream now contains a display_length relation item to require or recommend a minimum (or maximum) display size that the described GUI application can work with. For example:

<requires>
  <display_length compare="ge">360</display_length>
</requires>

This will make the application require a display length greater than or equal to 360 logical pixels. A logical pixel (also called a device-independent pixel) is the number of pixels that the application can draw in one direction. Since screens, especially phone screens but also screens on a desktop, can be rotated, the display_length value will be checked against the longest edge of a display by default (by explicitly specifying the shorter edge, this can be changed).

This feature is available since 0.13.0, details. See also Tobias Bernard’s blog entry on this topic.

5. Tags

This is a feature that was originally requested for the LVFS/fwupd, but one of the great things about AppStream is that we can take very project-specific ideas and generalize them so that something useful for many comes out of them. The new tags tag allows people to tag components with an arbitrary namespaced string. This can be useful for project-internal organization of applications, as well as to convey certain additional properties to a software center, e.g. an application could mark itself as “featured” in a specific software center only. Metadata generators may also add their own tags to components to improve organization. AppStream gives no recommendations as to how these tags are to be interpreted except for them being a strictly optional feature, so any meaning is something clients and metadata authors need to negotiate. It therefore is a more specialized use case of the already existing custom tag, and I expect it to be primarily useful within larger organizations that produce a lot of software components that need sorting. For example:

<tags>
  <tag namespace="lvfs">vendor-2021q1</tag>
  <tag namespace="plasma">featured</tag>
</tags>

This feature is available since 0.15.0, details.

6. MetaInfo Creator changes

The MetaInfo Creator (source) tool is a very simple web application that provides you with a form to fill out and will then generate MetaInfo XML to add to your project after you have answered all of its questions. It is an easy way for developers to add the required metadata without having to read the specification or any guides at all.

Recently, I added support for the new control and display_length tags, resolved a few minor issues and also added a button to instantly copy the generated output to clipboard so people can paste it into their project. If you want to create a new MetaInfo file, this tool is the best way to do it!

The creator tool will also not transfer any data out of your web browser; it is strictly a client-side application.

And that is about it for the most notable changes in AppStream land! Of course there is a lot more, additional tags for the LVFS and content rating have been added, lots of bugs have been squashed, the documentation has been refined a lot and the library has gained a lot of new API to make building software centers easier. Still, there is a lot to do and quite a few open feature requests too. Onwards to 1.0!

06 December, 2021 05:40PM by Matthias

hackergotchi for Paul Tagliamonte

Paul Tagliamonte

Proxying Ethernet Frames to PACKRAT (Part 5/5) 🐀

🐀 This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.

In the last post, we left off at being able to send and receive PACKRAT frames to and from devices. Since we can transport IPv4 packets over the network, let’s go ahead and see if we can read/write Ethernet frames from a Linux network interface, and on the backend, read and write PACKRAT frames over the air. This has the benefit of continuing to allow Linux userspace tools to work (like cURL, as we’ll try!), which means we don’t have to do a lot of work to implement higher level protocols or tactics to get a connection established over the link.

Given that this post is less RF and more Linuxy, I’m going to include more code snippets than in prior posts, and those snippets are closer to runnable Go, but still not complete examples. There are also a lot of different ways to do this; I’ve just picked the easiest one for me to implement and debug given my existing tooling – for you, you may find another approach easier to implement!

Again, deviation here is very welcome, and since this segment is the least RF-centric post in the series, the pace and tone are going to feel different. If you feel lost here, that’s OK. This isn’t the most important part of the series, and is mostly here to give a concrete ending to the story arc. Any way you want to finish your own journey is the best way for you to finish it!

Implement Ethernet conversion code

This assumes an importable package with a Frame struct, which we can use to convert a Frame to/from Ethernet. Given that the PACKRAT frame has a field that Ethernet doesn’t (namely, Callsign), that will need to be explicitly passed in when turning an Ethernet frame into a PACKRAT Frame.

...
// ToPackrat will create a packrat frame from an Ethernet frame.
func ToPackrat(callsign [8]byte, frame *ethernet.Frame) (*packrat.Frame, error) {
    var frameType packrat.FrameType
    switch frame.EtherType {
    case ethernet.EtherTypeIPv4:
        frameType = packrat.FrameTypeIPv4
    default:
        return nil, fmt.Errorf("ethernet: unsupported ethernet type %x", frame.EtherType)
    }

    return &packrat.Frame{
        Destination: frame.Destination,
        Source:      frame.Source,
        Type:        frameType,
        Callsign:    callsign,
        Payload:     frame.Payload,
    }, nil
}

// FromPackrat will create an Ethernet frame from a Packrat frame.
func FromPackrat(frame *packrat.Frame) (*ethernet.Frame, error) {
    var etherType ethernet.EtherType
    switch frame.Type {
    case packrat.FrameTypeRaw:
        return nil, fmt.Errorf("ethernet: unsupported packrat type 'raw'")
    case packrat.FrameTypeIPv4:
        etherType = ethernet.EtherTypeIPv4
    default:
        return nil, fmt.Errorf("ethernet: unknown packrat type %x", frame.Type)
    }

    // We lose the Callsign here, which is sad.
    return &ethernet.Frame{
        Destination: frame.Destination,
        Source:      frame.Source,
        EtherType:   etherType,
        Payload:     frame.Payload,
    }, nil
}

Our helpers, ToPackrat and FromPackrat, can now be used to transmogrify PACKRAT into Ethernet, or Ethernet into PACKRAT. Let’s put them into use!
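
For example, a quick round trip through both helpers might look like the sketch below – written in the same elided style as the rest of the snippets here. The MAC address and ipv4PacketBytes are placeholders of my own, not values from the real code; the callsign is the one used later in the post.

eth := &ethernet.Frame{
    Destination: ethernet.Broadcast,
    Source:      net.HardwareAddr{0xfa, 0xde, 0xdc, 0xab, 0x1e, 0x01}, // placeholder MAC
    EtherType:   ethernet.EtherTypeIPv4,
    Payload:     ipv4PacketBytes, // some serialized IPv4 packet (placeholder)
}
// Wrap the Ethernet frame up with a callsign...
pack, err := ToPackrat([8]byte{'K', '3', 'X', 'E', 'C'}, eth)
...
// ...and unwrap it again; only the Callsign is lost on the way back.
eth2, err := FromPackrat(pack)
...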

Implement a TAP interface

On Linux, the networking stack can be exposed to userland using TUN or TAP interfaces. TUN devices allow a userspace program to read and write data at the Layer 3 / IP layer. TAP devices allow a userspace program to read and write data at the Layer 2 Data Link / Ethernet layer. Writing data at Layer 2 is what we want to do, since we’re looking to transform our Layer 2 into Ethernet’s Layer 2 Frames. Our first job here is to create the actual TAP interface, set the MAC address, and set the IP range to our pre-coordinated IP range.

...
import (
    "net"

    "github.com/mdlayher/ethernet"
    "github.com/songgao/water"
    "github.com/vishvananda/netlink"
)
...
config := water.Config{DeviceType: water.TAP}
config.Name = "rat0"
iface, err := water.New(config)
...
netIface, err := netlink.LinkByName("rat0")
...
// Pick a range here that works for you!
//
// For my local network, I'm using some IPs
// that AMPR (ampr.org) was nice enough to
// allocate to me for ham radio use. Thanks,
// AMPR!
//
// Let's just use 10.* here, though.
//
ip, cidr, err := net.ParseCIDR("10.0.0.1/24")
...
cidr.IP = ip
err = netlink.AddrAdd(netIface, &netlink.Addr{
    IPNet: cidr,
    Peer:  cidr,
})
...
// Add all our neighbors to the ARP table
for _, neighbor := range neighbors {
    netlink.NeighAdd(&netlink.Neigh{
        LinkIndex:    netIface.Attrs().Index,
        Type:         netlink.FAMILY_V4,
        State:        netlink.NUD_PERMANENT,
        IP:           neighbor.IP,
        HardwareAddr: neighbor.MAC,
    })
}

// Pick a MAC that is globally unique here, this is
// just used as an example!
addr, err := net.ParseMAC("FA:DE:DC:AB:LE:01")
...
netlink.LinkSetHardwareAddr(netIface, addr)
...
err = netlink.LinkSetUp(netIface)

var frame = &ethernet.Frame{}
var buf = make([]byte, 1500)

for {
    n, err := iface.Read(buf)
    ...
    err = frame.UnmarshalBinary(buf[:n])
    ...
    // process frame here (to come)
}
...

Now that our network stack can resolve an IP to a MAC Address (via ip neigh according to our pre-defined neighbors), and send that IP packet to our daemon, it’s now on us to send IPv4 data over the airwaves. Here, we’re going to take packets coming in from our TAP interface, and marshal the Ethernet frame into a PACKRAT Frame and transmit it. As with the rest of the RF code, we’ll leave that up to the implementer, of course, using what was built during Part 2: Transmitting BPSK symbols and Part 4: Framing data.

...
for {
    // continued from above

    n, err := iface.Read(buf)
    ...
    err = frame.UnmarshalBinary(buf[:n])
    ...
    switch frame.EtherType {
    case 0x0800:
        // ipv4 packet
        pack, err := ToPackrat(
            // Add my callsign to all Frames, for now
            [8]byte{'K', '3', 'X', 'E', 'C'},
            frame,
        )
        ...
        err = transmitPacket(pack)
        ...
    }
}
...

Now that we have transmitting covered, let’s go ahead and handle the receive path here. We’re going to listen on frequency using the code built in Part 3: Receiving BPSK symbols and Part 4: Framing data. The Frames we decode from the airwaves are expected to come back from the call packratReader.Next in the code below, and the exact way that works is up to the implementer.

...
for {
    // pull the next packrat frame from
    // the symbol stream as we did in the
    // last post
    packet, err := packratReader.Next()
    ...
    // check for CRC errors and drop invalid
    // packets
    err = packet.Check()
    ...
    if bytes.Equal(packet.Source, addr) {
        // if we've heard ourself transmitting
        // let's avoid looping back
        continue
    }

    // create an ethernet frame
    frame, err := FromPackrat(packet)
    ...
    buf, err := frame.MarshalBinary()
    ...
    // and inject it into the tap
    err = iface.Write(buf)
    ...
}
...

Phew. Right. Now we should be able to listen for PACKRAT frames on the air and inject them into our TAP interface.

Putting it all Together

After all this work – weeks of work! – we can finally get around to putting some real packets over the air. For me, this was an incredibly satisfying milestone, and tied together months of learning!

I was able to start up a UDP server on a remote machine with an RTL-SDR dongle attached to it, listening on the TAP interface’s host IP with my defined MAC address, and send UDP packets to that server via PACKRAT using my laptop, /dev/udp and an Ettus B210, sending packets into the TAP interface.
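
If you want to recreate that setup, the UDP server on the remote end doesn’t need to be anything fancy – here’s a small, self-contained sketch of one in Go (the 10.0.0.2 address and port 9000 are placeholders for whatever your TAP interface and pre-coordinated IP range use; any UDP listener works just as well):

package main

import (
    "log"
    "net"
)

func main() {
    // Listen on the TAP interface's IP; address and port are placeholders.
    conn, err := net.ListenPacket("udp", "10.0.0.2:9000")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    buf := make([]byte, 1500)
    for {
        n, addr, err := conn.ReadFrom(buf)
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("got %d bytes from %s: %q", n, addr, buf[:n])
    }
}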

Now that UDP was working, I was able to get TCP to work using two PlutoSDRs, which allowed me to run the cURL command I pasted in the first post (both simultaneously listening and transmitting on behalf of my TAP interface).

It’s my hope that someone out there will be inspired to implement their own Layer 1 and Layer 2 as a learning exercise, and gets the same sense of gratification that I did! If you’re reading this, and at a point where you’ve been able to send IP traffic over your own Layer 1 / Layer 2, please get in touch! I’d be thrilled to hear all about it. I’d love to link to any posts or examples you publish here!

06 December, 2021 04:00PM

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

tidyCpp 0.0.6 on CRAN: Package Maintenance

Another small release of the tidyCpp package arrived on CRAN this morning. The package offers a clean C++ layer (as well as one small C++ helper class) on top of the C API for R, which aims to make use of this robust (if awkward) C API a little easier and more consistent. See the vignette for motivating examples.

This release makes a tiny code change, removing a YAML file for the disgraced former continuous integration service we shall not name (yet that we all used to use). And just like digest five days ago, drat four days ago, littler three days ago, RcppAPT two days ago, and RcppSpdlog yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package, which is so much more appropriate for our use of the minimal water.css style.

The NEWS entry follows.

Changes in tidyCpp version 0.0.6 (2021-12-06)

  • Assign nullptr in dtor for Protect class

  • Switch vignette engine to simplermarkdown

Thanks to my CRANberries, there is also a diffstat report for this release.

For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

06 December, 2021 02:16PM

December 05, 2021

RcppSpdlog 0.0.7 on CRAN: Package Maintenance

A new version 0.0.7 of RcppSpdlog is now on CRAN. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich.

This release brings upstream bugfix releases 1.9.1 and 1.9.2 of spdlog. We also removed the YAML file (and badge) for the disgraced former continuous integration service we shall not name (yet that we all used to use). And just like digest four days ago, drat three days ago, littler two days ago, and RcppAPT yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style.

The (minimal) NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.7 (2021-12-05)

  • Upgraded to upstream bug fix releases spdlog 1.9.1 and 1.9.2

  • Travis artifacts and badges have been pruned

  • Vignette now uses simplermarkdown

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppSpdlog page, or the package documentation site.

If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

05 December, 2021 07:34PM

Reproducible Builds

Reproducible Builds in November 2021

Welcome to the November 2021 report from the Reproducible Builds project.

As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to our project, please visit our Contribute page on our website.


On November 6th, Vagrant Cascadian presented at this year’s edition of the SeaGL conference, giving a talk titled Debugging Reproducible Builds One Day at a Time:

I’ll explore how I go about identifying issues to work on, learn more about the specific issues, recreate the problem locally, isolate the potential causes, dissect the problem into identifiable parts, and adapt the packaging and/or source code to fix the issues.

A video recording of the talk is available on archive.org.


Fedora Magazine published a post written by Zbigniew Jędrzejewski-Szmek about how to Use Diffoscope in packager workflows, specifically around ensuring that new versions of a package do not introduce breaking changes:

In the role of a packager, updating packages is a recurring task. For some projects, a packager is involved in upstream maintenance, or well written release notes make it easy to figure out what changed between the releases. This isn’t always the case, for instance with some small project maintained by one or two people somewhere on GitHub, and it can be useful to verify what exactly changed. Diffoscope can help determine the changes between package releases. []


kpcyrd announced the release of rebuilderd version 0.16.3 on our mailing list this month, adding support for builds to generate multiple artifacts at once.


Lastly, we held another IRC meeting on November 30th. As mentioned in previous reports, due to the global events throughout 2020 etc. there will be no in-person summit event this year.


diffoscope

diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 190, 191, 192, 193 and 194 to Debian:

  • New features:

    • Continue loading a .changes file even if the referenced files do not exist, but include a comment in the returned diff. []
    • Log the reason if we cannot load a Debian .changes file. []
  • Bug fixes:

    • Detect XML files as XML files if file(1) claims if they are XML files or if they are named .xml. (#999438)
    • Don’t duplicate file lists at each directory level. (#989192)
    • Don’t raise a traceback when comparing nested directories with non-directories. []
    • Re-enable test_android_manifest. []
    • Don’t reject Debian .changes files if they contain non-printable characters. []
  • Codebase improvements:

    • Avoid aliasing variables if we aren’t going to use them. []
    • Use isinstance over type. []
    • Drop a number of unused imports. []
    • Update a bunch of %-style string interpolations into f-strings or str.format. []
    • When pretty-printing JSON, mark the difference as being reformatted, additionally avoiding including the full path. []
    • Import itertools top-level module directly. []

Chris Lamb also made an update to the command-line client to trydiffoscope, a web-based version of the diffoscope in-depth and content-aware diff utility, specifically only waiting for 2 minutes for try.diffoscope.org to respond in tests. (#998360)

In addition Brandon Maier corrected an issue where parts of large diffs were missing from the output [], Zbigniew Jędrzejewski-Szmek fixed some logic in the assert_diff_startswith method [] and Mattia Rizzolo updated the packaging metadata to denote that we support both Python 3.9 and 3.10 [] as well as a number of warning-related changes[][]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [][].


Distribution work

In Debian, Roland Clobus updated the wiki page documenting Debian reproducible ‘Live’ images to mention some new bug reports and also posted an in-depth status update to our mailing list.

In addition, 90 reviews of Debian packages were added, 18 were updated and 23 were removed this month, adding to our knowledge about identified issues. Chris Lamb identified a new toolchain issue, absolute_path_in_cmake_file_generated_by_meson.


Work has begun on classifying reproducibility issues in packages within the Arch Linux distribution. Similar to the analogous effort within Debian (outlined above), package information is listed in a human-readable packages.yml YAML file and a sibling README.md file shows how to classify packages too.

Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE and Vagrant Cascadian updated a link on our website to link to the GNU Guix reproducibility testing overview [].


Software development

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Elsewhere, in software development, Jonas Witschel updated strip-nondeterminism, our tool to remove specific non-deterministic results from a completed build so that it did not fail on JAR archives containing invalid members with a .jar extension []. This change was later uploaded to Debian by Chris Lamb.

reprotest is the Reproducible Builds project’s end-user tool to build the same source code twice in widely different environments and check whether the binaries produced by the builds have any differences. This month, Mattia Rizzolo overhauled the Debian packaging [][][] and fixed a bug surrounding suffixes in the Debian package version [], whilst Stefano Rivera fixed an issue where the package tests were broken after the removal of diffoscope from the package’s strict dependencies [].


Testing framework

The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:

  • Holger Levsen:

    • Document the progress in setting up snapshot.reproducible-builds.org. []
    • Add the packages required for debian-snapshot. []
    • Make the dstat package available on all Debian based systems. []
    • Mark virt32b-armhf and virt64b-armhf as down. []
  • Jochen Sprickerhof:

    • Add SSH authentication key and enable access to the osuosl168-amd64 node. [][]
  • Mattia Rizzolo:

    • Revert “reproducible Debian: mark virt(32 64)b-armhf as down” - restored. []
  • Roland Clobus (Debian “live” image generation):

    • Rename sid internally to unstable until an issue in the snapshot system is resolved. []
    • Extend testing to include Debian bookworm too. []
    • Automatically create the Jenkins ‘view’ to display jobs related to building the Live images. []
  • Vagrant Cascadian:

    • Add a Debian ‘package set’ group for the packages and tools maintained by the Reproducible Builds maintainers themselves. []



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

05 December, 2021 06:33PM

hackergotchi for Paul Tagliamonte

Paul Tagliamonte

Framing data (Part 4/5) 🐀

🐀 This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.

In the last post, we were able to build a functioning Layer 1 PHY where we can encode symbols to transmit, and receive symbols on the other end. We’re now at the point where we can encode and decode those symbols as bits and frame blocks of data, marking them with a Sender and a Destination for routing to the right host(s). This is a “Layer 2” scheme in the OSI model, which is otherwise known as the Data Link Layer. You’re using one to view this website right now – I’m willing to bet your data is going through an Ethernet layer 2 as well as WiFi or maybe a cellular data protocol like 5G or LTE.

Given that this entire exercise is hard enough without designing a complex Layer 2 scheme, I opted for simplicity in the hopes this would free me from the complexity and research that has gone into this field for the last 50 years. I settled on stealing a few ideas from Ethernet Frames – namely, the use of MAC addresses to identify parties, and the EtherType field to indicate the Payload type. I also stole the idea of using a CRC at the end of the Frame to check for corruption, as well as the specific CRC method (crc32 using 0xedb88320 as the polynomial).

Lastly, I added a callsign field to make life easier on ham radio frequencies if I was ever to seriously attempt to use a variant of this protocol over the air with multiple users. However, given this scheme is not a commonly used scheme, it’s best practice to use a nearby radio to identify your transmissions on the same frequency while testing – or use a Faraday box to test without transmitting over the airwaves. I added the callsign field in an effort to lean into the spirit of the Part 97 regulations, even if I relied on a phone emission to identify the Frames.

As an aside, I asked the ARRL for input here, and their stance to me over email was I’d be OK according to the regs if I were to stick to UHF and put my callsign into the BPSK stream using a widely understood encoding (even with no knowledge of PACKRAT, the callsign is ASCII over BPSK and should be easily demodulatable for followup with me). Even with all this, I opted to use FM phone to transmit my callsign when I was active on the air (specifically, using an SDR and a small bash script to automate transmission while I watched for interference or other band users).

Right, back to the Frame:

sync | dest | source | callsign | type | payload | crc

With all that done, I put that layout into a struct, so that we can marshal and unmarshal bytes to and from our Frame objects, and work with it in software.

type FrameType [2]byte

type Frame struct {
    Destination net.HardwareAddr
    Source      net.HardwareAddr
    Callsign    [8]byte
    Type        FrameType
    Payload     []byte
    CRC         uint32
}
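
The (un)marshalling code itself isn’t shown in this post, but a rough sketch of the packing side could look like the function below. It assumes 6-byte hardware addresses, that the CRC covers everything before it, and that the checksum is written big-endian – the real implementation may well differ on those details. It needs the bytes, encoding/binary and hash/crc32 packages; Go’s crc32.IEEE table is the 0xedb88320 polynomial mentioned above.

// MarshalBinary packs a Frame into wire bytes: dest, source, callsign,
// type, payload, then a trailing CRC32 over everything before it.
func (f *Frame) MarshalBinary() ([]byte, error) {
    buf := new(bytes.Buffer)
    buf.Write(f.Destination) // 6 bytes
    buf.Write(f.Source)      // 6 bytes
    buf.Write(f.Callsign[:]) // 8 bytes
    buf.Write(f.Type[:])     // 2 bytes
    buf.Write(f.Payload)

    crc := crc32.ChecksumIEEE(buf.Bytes())
    if err := binary.Write(buf, binary.BigEndian, crc); err != nil {
        return nil, err
    }
    return buf.Bytes(), nil
}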

Time to pick some consts

I picked a unique and distinctive sync sequence, which the sender will transmit before the Frame, while the receiver listens for that sequence to know when it’s in byte alignment with the symbol stream. My sync sequence is [3]byte{'U', 'f', '~'} which works out to be a very pleasant bit sequence of 01010101 01100110 01111110. It’s important to have soothing preambles for your Frames. We need all the good energy we can get at this point.

var (
    FrameStart          = [3]byte{'U', 'f', '~'}
    FrameMaxPayloadSize = 1500
)

Next, I defined some FrameType values for the type field, which I can use to determine what is done with that data next, something Ethernet was originally missing, but has since grown to depend on (who needs Length anyway? Not me. See below!)

FrameType | Description                                                  | Bytes
Raw       | Bytes in the Payload field are opaque and not to be parsed. | [2]byte{0x00, 0x01}
IPv4      | Bytes in the Payload field are an IPv4 packet.              | [2]byte{0x00, 0x02}

And finally, I decided on a maximum length for the Payload, limiting it to 1500 bytes to align with the MTU of Ethernet – that’s the FrameMaxPayloadSize constant in the first block above. The FrameType values from the table become:

var (
    FrameTypeRaw  = FrameType{0, 1}
    FrameTypeIPv4 = FrameType{0, 2}
)

Given we know how we’re going to marshal and unmarshal binary data to and from Frames, we can now move on to looking through the bit stream for our Frames.

Why is there no Length field?

I was initially a bit surprised that Ethernet Frames didn’t have a Length field in use, but the more I thought about it, the more it seemed like a big ole' failure mode without a good implementation outcome. Either the Length is right (resulting in no action and used bits on every packet) or the Length is not the length of the Payload and the driver needs to determine what to do with the packet – does it try and trim the overlong payload and ignore the rest? What if both the end of the read bytes and the end of the subset of the packet denoted by Length have a valid CRC? Which is used? Will everyone agree? What if Length is longer than the Payload but the CRC is good where we detected a lost carrier?

I decided on simplicity. The end of a Frame is denoted by the loss of the BPSK carrier – when the signal is no longer being transmitted (or more correctly, when the signal is no longer received), we know we’ve hit the end of a packet. Missing a single symbol will result in the Frame being finalized. This can cause some degree of corruption, but it’s also a lot easier than doing tricks like bit stuffing to create an end of symbol stream delimiter.

Finding the Frame start in a Symbol Stream

The first thing we need to do is find our sync bit pattern in the symbols we’re receiving from our BPSK demodulator. There are some smart ways to do this, but given that I’m not much of a smart man, I again decided to go for simple instead. Take the incoming vector of symbols (which are still float values), push them one at a time into a sliding buffer of floats that is the same length as the sync phrase, and compare that buffer against the sync phrase to determine whether we’re in sync with the byte boundary within the symbol stream.

The only trick here is that because we’re using BPSK to modulate and demodulate the data, post phaselock we can be 180 degrees out of alignment (such that a +1 is demodulated as -1, or vice versa). To deal with that, I check against both the sync phrase as well as the inverse of the sync phrase (both [1, -1, 1] as well as [-1, 1, -1]) where if the inverse sync is matched, all symbols to follow will be inverted as well. This effectively turns our symbols back into bits, even if we’re flipped out of phase. Other techniques like NRZI will represent a 0 or 1 by a change in phase state – which is great, but can often cascade into long runs of bit errors, and is generally more complex to implement. That representation isn’t ambiguous, given you look for a phase change, not the absolute phase value, which is incredibly compelling.

Here’s a notional example of how I’ve been thinking about the phrase sliding window – and how I’ve been thinking of the checks. Each row is a new symbol taken from the BPSK receiver, and pushed to the head of the sliding window, moving all symbols back in the vector by one.

var (
    sync            = []float{ ... }
    buf             = make([]float, len(sync))
    incomingSymbols = []float{ ... }
)

for _, el := range incomingSymbols {
    copy(buf, buf[1:])
    buf[len(buf)-1] = el
    if compare(sync, buf) {
        // we're synced!
        break
    }
}
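
The compare helper is left undefined in the pseudocode above. Before stepping through it, here is one way it could look – a sketch of my own, using float64 and a fixed tolerance, that reports both whether we matched and whether the match was against the inverted sync so that later symbols can be flipped:

// compareSync checks the sliding window against the sync phrase and its
// inverse. It returns (matched, inverted); inverted tells the caller to
// flip the sign of every symbol that follows.
func compareSync(sync, buf []float64) (matched, inverted bool) {
    const tolerance = 0.5 // how far a symbol may drift from ±1

    match, invMatch := true, true
    for i := range sync {
        if d := sync[i] - buf[i]; d > tolerance || d < -tolerance {
            match = false
        }
        if d := -sync[i] - buf[i]; d > tolerance || d < -tolerance {
            invMatch = false
        }
    }
    return match || invMatch, invMatch
}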

Given the pseudocode above, let’s step through what the checks would be doing at each step:

Buffer           | Sync […]float{-1,…,-1} | Inverse Sync […]float{1,…,1}
[…]float{0,…,0}  | ❌                     | ❌
[…]float{0,…,1}  | ❌                     | ❌
[more bits in]   | ❌                     | ❌
[…]float{1,…,1}  | ❌                     | ✅

After this notional set of comparisons, we know that at the last step, we are now aligned to the frame and byte boundary – the next symbol / bit will be the MSB of the 0th Frame byte. Additionally, we know we’re also 180 degrees out of phase, so we need to flip the symbol’s sign to get the bit. From this point on we can consume 8 bits at a time, and re-assemble the byte stream. I don’t know what this technique is called – or even if this is used in real grown-up implementations, but it’s been working for my toy implementation.
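
To make that last step concrete, here’s a small sketch (the name and shape are mine, not from the PACKRAT code) of folding 8 symbols at a time back into a byte, MSB first, flipping signs if we locked onto the inverted sync:

// symbolsToByte folds 8 BPSK symbols into one byte, MSB first, flipping
// each symbol's sign if we synced against the inverse of the sync phrase.
func symbolsToByte(symbols []float64, inverted bool) byte {
    var b byte
    for i := 0; i < 8; i++ {
        symbol := symbols[i]
        if inverted {
            symbol = -symbol
        }
        b <<= 1
        if symbol > 0 {
            b |= 1
        }
    }
    return b
}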

Next Steps

Now that we can read/write Frames to and from PACKRAT, the next steps here are going to be implementing code to encode and decode Ethernet traffic into PACKRAT, coming next in Part 5!

05 December, 2021 04:00PM

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

Leaving MySQL

Today was my last day at Oracle, and thus also in the MySQL team.

When a decision comes to switch workplaces, there's always the question of “why”, but that question always has multiple answers, and perhaps the simplest one is that I found another opportunity, and as a whole, it was obvious it was time to move on when that arrived.

But it doesn't really explain why I did go looking for that somewhere else in the first place. The reasons for that are again complex, and it's not possible to reduce to a single thing. But nevertheless, let me point out something that I've been saying both internally and externally for the last five years (although never on a stage—which explains why I've been staying away from stages talking about MySQL): MySQL is a pretty poor database, and you should strongly consider using Postgres instead.1

Coming to MySQL was like stepping into a parallel universe, where there were lots of people genuinely believing that MySQL was a state-of-the-art product. At the same time, I was attending orientation and told how the optimizer worked internally, and I genuinely needed shock pauses to take in how primitive nearly everything was. It felt bizarre, but I guess you soon get used to it. In a sense, it didn't bother me that much; lots of bad code means there's plenty of room for opportunity for improvement, and management was strongly supportive of large refactors. More jarring were the people who insisted everything was OK (it seems most MySQL users and developers don't really use other databases); even obviously crazy things like the executor, where everything was one big lump and everything interacted with everything else2, was hailed as “efficient” (it wasn't).

Don't get me wrong; I am genuinely proud of the work I have been doing, and MySQL 8.0 (with its ever-increasing minor version number) is a much better product than 5.7 was—and it will continue to improve. But there is only so much you can do; the changes others and I have been doing take the MySQL optimizer towards a fairly standard early-2000s design with some nice tweaks, but that's also where it ends. (Someone called it “catching up, one decade at a time”, and I'm not sure if it was meant positively or negatively, but I thought a bit of it as a badge of honor.) In the end, there's just not enough resources that I could see it turn into a competitive product, no matter how internal company communications tried to spin that Oracle is filled with geniuses and WE ARE WINNING IN THE CLOUD. And that's probably fine (and again, not really why I quit); if you're using MySQL and it works for you, sure, go ahead. But perhaps consider taking a look at the other side of that fence at some point, past the “OMG vacuum” memes.

My new role will be in the Google Chrome team. It was probably about time; my T-shirt collection was getting a bit worn.

1 Don't believe for a second that MariaDB is any better. Monty and his merry men left because they were unhappy about the new governance, not because they suddenly woke up one day and realized what a royal mess they had created in the code.

2 For instance, the sorter literally had to care whether its input came from a table scan or a range scan, because there was no modularity. Anything that wasn't either of those two, including joins, required great contortions. Full outer joins were simply impossible to execute in the given design without rewriting the query (MySQL still doesn't support them, but at least now it's not hampered by the old we-can-do-left-deep-plans-only design). And don't even get me started on the “slice” system, which is perhaps the single craziest design I've ever seen in any real-world software.

05 December, 2021 03:41PM

Reproducible Builds (diffoscope)

diffoscope 195 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 195. This version includes the following changes:

[ Chris Lamb ]
* Don't use the runtime platform's native endianness when unpacking .pyc
  files to fix test failures on big-endian machines.

You can find out more by visiting the project homepage.

05 December, 2021 12:00AM

December 04, 2021

hackergotchi for Jonathan Dowland

Jonathan Dowland

Haskell mortgage calculator

A few months ago I was trying to compare two mortgage offers, and ended up writing a small mortgage calculator to help me. Both mortgages were fixed-term for the same time period (5 years). One of the mortgages had a lower rate than the other, but much higher arrangement fees.

A broker recommended the mortgage with the higher rate but lower fee, on an affordability basis for the fixed term: overall, we would spend less money within the fixed term on that deal than on the other. (I thought) this left one bit of information missing: what remaining balance would there be at the end of the term?

The mortgages I want to model are defined in terms of a monthly repayment figure and an annual interest rate for the fixed period. I think interest is usually recalculated on a daily basis, so I convert the annual rate down to a daily rate.

Repayments only happen once a month. Months are not all the same size. Using mod 30 on the 'day' approximates a monthly payment. Over 5 years, there would be 60 months, meaning 60 repayments. (I'm ignoring leap years)

λ> length . filter id . take (5*365) $ [ x `mod` 30 == 0 | x <- [1..]]
60

Here's what I came up with. I was a little concerned the repayment approximation was too far out so I compared the output with a more precise (but boring) spreadsheet and they agreed to within an acceptable tolerance.

The numbers that follow are all made up to illustrate the function and don't reflect my actual mortgage. :)

borrowed = 1000000 -- day 0 amount outstanding

aer   = 0.89        -- annual interest rate (made-up number)
repay = 1000        -- monthly repayment amount
der   = aer / 365   -- approximate daily interest rate

owed n | n == 0          = borrowed
       | n `mod` 30 == 0 = last + interest - repay
       | otherwise       = last + interest
    where
        last     = owed (n - 1)
        interest = last * der
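
With those definitions loaded into GHCi, the number the broker's comparison left out – the remaining balance at the end of the five-year fixed term – is just the outstanding amount on the last day (using the made-up numbers above), e.g.:

λ> owed (5*365)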

04 December, 2021 10:01PM

hackergotchi for Paul Tagliamonte

Paul Tagliamonte

Receiving BPSK symbols (Part 3/5) 🐀

🐀 This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.

In the last post, we worked through how to generate a BPSK signal, and hopefully transmit it using one of our SDRs. Let’s take that and move on to Receiving BPSK and turning that back into symbols!

Demodulating BPSK data is a bit trickier than transmitting BPSK data, mostly due to tedious facts of life such as space, time, and hardware built with compromises (because hardware built without them would be impossible). Unfortunately, it’s now our job to work within our imperfect world to recover perfect data. We need to handle the addition of noise, differences in frequency, clock synchronization and interference in order to recover our information. This makes life a lot harder than when we transmit information, and as a result, a lot more complex.

Coarse Sync

Our starting point for this section will be working from a capture of a number of generated PACKRAT packets as heard by a PlutoSDR (xz compressed interleaved int16, 2,621,440 samples per second)

Every SDR has its own oscillator, which eventually controls a number of different components of an SDR, such as the IF (if it’s a superheterodyne architecture) and the sampling rate. Drift in oscillators leads to drifts in frequency – such that what one SDR may think is 100MHz may be 100.01MHz for another radio. Even if the radios were perfectly in sync, other artifacts such as doppler time dilation due to motion can cause the signal to appear higher or lower in frequency than it was transmitted.

All this is a long way of saying, we need to determine when we see a strong signal that’s close-ish to our tuned frequency, and take steps to roughly correct it to our center frequency (on the order of 100s of Hz to kHz) in order to acquire a phase lock on the signal to attempt to decode information contained within.

The easiest way of detecting the loudest signal of interest is to use an FFT. Getting into how FFTs work is out of scope of this post, so if this is the first time you’re seeing mention of an FFT, it may be a good place to take a quick break to learn a bit about the time domain (which is what the IQ data we’ve been working with so far is), frequency domain, and how the FFT and iFFT operations can convert between them.

Lastly, because FFTs average power over the window, a transmitted wave containing roughly the same number of in-phase and inverted-phase symbols would wind up averaging out to zero. This is not helpful, so I took a tip from Dr. Marc Lichtman’s PySDR project and used complex squaring to drive our BPSK signal into a single detectable carrier by squaring the IQ data. Because the points are on the unit circle at angles tau/2 apart (specifically, tau/(2^1) for BPSK, tau/(2^2) for QPSK), and because squaring doubles the angle while angles are all mod tau, this drives our wave comprised of two opposite phases back into a continuous wave – effectively removing our BPSK modulation, making it much easier to detect in the frequency domain. Thanks to Tom Bereknyei for helping me with that!

...
var iq []complex
var freq []complex
// square each IQ sample to strip the BPSK
// modulation and leave a single carrier.
for i := range iq {
    iq[i] = iq[i] * iq[i]
}
// perform an fft, computing the frequency
// domain vector in `freq` given the iq data
// contained in `iq`.
fft(iq, freq)
// get the array index of the max value in the
// freq array given the magnitude value of the
// complex numbers.
var binIdx = argmax(abs(freq))
...

Now, most FFT operations lay the frequency domain data out a bit differently than you may expect (as a human): the 0th element of the FFT is 0Hz, not the most negative frequency (like in a waterfall). Generally speaking, “zero first” is the most common frequency domain layout (and the safest assumption if there’s no other documentation on FFT layout). “Negative first” is usually used when the FFT is being rendered for human consumption – such as a waterfall plot.
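
If you ever do need the “negative first” ordering for display, the conversion is just a rotation of the array by half its length. A minimal sketch of such a helper in Go (the name fftShift is an invention for illustration, not something PACKRAT provides):

// fftShift reorders a "zero first" FFT result into "negative
// first" order, the layout a waterfall plot would use: the
// second half of the array (the negative frequencies) is moved
// in front of the first half (0Hz and the positive frequencies).
func fftShift(freq []complex128) []complex128 {
    half := len(freq) / 2
    out := make([]complex128, 0, len(freq))
    out = append(out, freq[half:]...)
    out = append(out, freq[:half]...)
    return out
}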

Given that we now know which FFT bin (which is to say, which index into the FFT array) contains the strongest signal, we’ll go ahead and figure out what frequency that bin relates to.

In the time domain, each complex number is the next time instant. In the frequency domain, each bin is a discrete frequency – or more specifically – a frequency range. The bandwidth of the bin is a function of the sampling rate and the number of time domain samples used to do the FFT operation. As you increase the amount of time used to perform the FFT, the more precise the FFT measurement of frequency becomes, but it will still cover the same bandwidth, as defined by the sampling rate.

...
var sampleRate = 2_621_440
// bandwidth is the range of frequencies
// contained inside a single FFT bin,
// measured in Hz.
var bandwidth = sampleRate / len(freq)
...

Now that we know we have a zero-first layout and the bin bandwidth, we can compute what our frequency offset is in Hz.

...
// binIdx is the index into the freq slice
// containing the frequency domain data.
var binIdx = 0
// binFreq is the frequency of the bin
// denoted by binIdx.
var binFreq = 0
if binIdx > len(freq)/2 {
    // This branch covers the case where the bin
    // is past the middle point - which is to say,
    // if this is a negative frequency.
    binFreq = bandwidth * (binIdx - len(freq))
} else {
    // This branch covers the case where the bin
    // is in the first half of the frequency array,
    // which is to say - if this frequency is
    // a positive frequency.
    binFreq = bandwidth * binIdx
}
...

However, since we squared the IQ data, we’re off in frequency by twice the actual frequency – if we are reading 12kHz, the bin is actually 6kHz. We need to adjust for that before continuing with processing.

...
var binFreq = 0
...
// [compute the binFreq as above]
...
// Adjust for the squaring of our IQ data.
binFreq = binFreq / 2
...

Finally, we need to shift the signal by the negative of binFreq, by generating a carrier wave at that offset and rotating every sample by it – so that a wave at the detected frequency will slow down (or stand still!) as it approaches 0Hz relative to the carrier wave.

var tau = pi * 2
// ts tracks where in time we are (basically: phase)
var ts float
// inc is the amount we step forward in time (seconds)
// each sample.
var inc float = (1 / sampleRate)
// amount to shift frequencies, in Hz,
// in this case, shift +12 kHz to 0Hz
var shift = -12_000
for i := range iq {
    ts += inc
    if ts > tau {
        // not actually needed, but keeps ts within
        // 0 to 2*pi (since it is modulus 2*pi anyway)
        ts -= tau
    }
    // Here, we're going to create a carrier wave
    // at the provided frequency (in this case,
    // -12kHz)
    cwIq := complex(cos(tau*shift*ts), sin(tau*shift*ts))
    iq[i] = iq[i] * cwIq
}

Now we’ve got the strong signal we’ve observed (which may or may not be our BPSK modulated signal!) close enough to 0Hz that we ought to be able to Phase Lock the signal in order to begin demodulating the signal.

Filter

After we’re roughly in the neighborhood of a few kHz, we can now take some steps to cut out any high frequency components (both positive high frequencies and negative high frequencies). The normal way to do this would be to do an FFT, apply the filter in the frequency domain, and then do an iFFT to turn it back into time series data. This will work in loads of cases, but I’ve found it to be incredibly tricky to get right when doing PSK. As such, I’ve opted to do this the old fashioned way in the time domain.

I’ve – again – opted to go simple rather than correct, and haven’t used nearly any of the advanced level trickery I’ve come across for fear of using it wrong. As a result, our process here is going to be generating a sinc filter by computing a number of taps, and applying that in the time domain directly on the IQ stream.

// Generate sinc taps

func sinc(x float) float {
    if x == 0 {
        return 1
    }
    var v = pi * x
    return sin(v) / v
}
...
var dst []float
var length = float(len(dst))
if int(length)%2 == 0 {
    length++
}
for j := range dst {
    i := float(j)
    dst[j] = sinc(2 * cutoff * (i - (length-1)/2))
}
...

then we apply it in the time domain

...
// Apply sinc taps to an IQ stream

var iq []complex
// taps as created in `dst` above
var taps []float
var delay = make([]complex, len(taps))
for i := range iq {
    // let's shift the next sample into
    // the delay buffer
    copy(delay[1:], delay)
    delay[0] = iq[i]
    var phasor complex
    for j := range delay {
        // for each sample in the buffer, let's
        // weight them by the tap values, and
        // create a new complex number based on
        // filtering the real and imag values.
        phasor += complex(
            taps[j]*real(delay[j]),
            taps[j]*imag(delay[j]),
        )
    }
    // now that we've run this sample
    // through the filter, we can go ahead
    // and scale it back (since we multiply
    // above) and drop it back into the iq
    // buffer.
    iq[i] = complex(
        real(phasor)/len(taps),
        imag(phasor)/len(taps),
    )
}
...

After running IQ samples through the taps and back out, we’ll have a signal that’s been filtered to the shape of our designed Sinc filter – which will cut out captured high frequency components (both positive and negative).

Astute observers will note that we’re using the real (float) valued taps on both the real and imaginary values independently. I’m sure there’s a way to apply taps using complex numbers, but it was a bit confusing to work through without being positive of the outcome. I may revisit this in the future!
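
For what it’s worth, because the taps are purely real, multiplying each delayed sample by the complex number (tap + 0i) gives exactly the same result – (a+bi)·(t+0i) = ta + tbi – so the inner loop above could equivalently be written as below. This is only a sketch of that equivalence (assuming complex128 samples and float64 taps), not what PACKRAT actually does:

// Same accumulation as the inner loop above, but using complex
// multiplication directly: a purely real tap t behaves like the
// complex number t+0i, so the filtered result is identical.
var phasor complex128
for j := range delay {
    phasor += delay[j] * complex(taps[j], 0)
}
iq[i] = phasor / complex(float64(len(taps)), 0)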

Downsample

Now, post-filter, we’ve got a lot of extra RF bandwidth being represented in our IQ stream at our high sample rate. All the high frequency values are now filtered out, which means we can reduce our sampling rate without losing much information at all. We can either do nothing about it and process at the fairly high sample rate we’re capturing at, or we can drop the sample rate down and help reduce the volume of numbers coming our way.

There are two big ways of doing this: either you take every Nth sample (e.g., take every other sample to halve the sample rate, or take every 10th to decimate the sample stream to a 10th of what it originally was), which is the easiest to implement (and easy on the CPU too), or you average a number of samples to create a new sample.

A nice bonus to averaging samples is that you can trade-off some CPU time for a higher effective number of bits (ENOB) in your IQ stream, which helps reduce noise, among other things. Some hardware does exactly this (called “Oversampling”), and like many things, it has some pros and some cons. I’ve opted to treat our IQ stream like an oversampled IQ stream and average samples to get a marginal bump in ENOB.

Taking a group of 4 samples and averaging them results in a bit of added precision. That means a stream of IQ data at 8 ENOB can be bumped to 9 ENOB of precision after the process of oversampling and averaging. The resulting stream will be at 1/4 of the sample rate, and the process can be repeated: averaging another group of 4 samples adds another bit of precision at 1/4 of that sample rate (again), or 1/16 of the original sample rate. If we take a group of 4 samples once more, we wind up with yet another bit and a sample rate that’s 1/64 of the original sample rate.
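
As a minimal sketch of that decimate-by-averaging idea (groups of 4; the function name and types are an invention for illustration, not PACKRAT’s actual code):

// decimate4 downsamples by averaging: every group of 4 input
// samples becomes one output sample, cutting the sample rate to
// 1/4 and buying a little extra effective precision.
func decimate4(iq []complex128) []complex128 {
    out := make([]complex128, 0, len(iq)/4)
    for i := 0; i+4 <= len(iq); i += 4 {
        var sum complex128
        for _, s := range iq[i : i+4] {
            sum += s
        }
        out = append(out, sum/4)
    }
    return out
}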

Phase Lock

Our starting point for this section is the same capture as above, but post-coarse sync, filtering and downsampling (xz compressed interleaved float32, 163,840 samples per second)

The PLL in PACKRAT was one of the parts I spent the most time stuck on. There’s no shortage of discussions of how hardware PLLs work, or even a few software PLLs, but very little by way of how to apply them and/or troubleshoot them. After getting frustrated trying to follow the well worn path, I decided to cut my own way through the bush using what I had learned about the concept, and hope that it works well enough to continue on.

PLLs, in concept, are fairly simple – you generate a carrier wave at a frequency, compare the real-world SDR IQ sample to where your carrier wave is in phase, and use the difference between the local wave and the observed wave to adjust the frequency and phase of your carrier wave. Eventually, if all goes well, that delta is driven as small as possible, and your carrier wave can be used as a reference clock to determine if the observed signal changes in frequency or phase.

In reality, tuning PLLs is a total pain, and basically no one outlines how to apply them to BPSK signals in a descriptive way. I’ve had to steal an approach I’ve seen in hardware to implement my software PLL, and with any luck it’s close enough that this isn’t a hazard to learners. The concept is to generate the carrier wave (as above) and keep some rolling averages to tune the carrier wave over time. I use two constants, “alpha” and “beta” (which appear to be traditional PLL variable names for this function), which control how quickly the frequency and phase are changed according to observed mismatches. Alpha is set fairly high, which means discrepancies between our carrier and the observed data are quickly applied to the phase; Beta is set lower, taking long-term errors and using them to match frequency.

This is all well and good. Getting to this point isn’t all that obscure, but the trouble comes when processing a BPSK signal. Phase changes kick the PLL out of alignment and it tends to require some time to get back into phase lock, when we really shouldn’t even be losing it in the first place. My attempt is to generate two predicted samples, one for each phase of our BPSK signal. The deltas are compared, and the lower error of the two is used to adjust the PLL, but the carrier wave itself is used to rotate the sample.

var alpha = 0.1
var beta = (alpha * alpha) / 2
var phase = 0.0
var frequency = 0.0
...
for i := range iq {
    predicted = complex(cos(phase), sin(phase))
    sample = iq[i] * conj(predicted)
    // arg() returns the angle (argument) of the complex
    // sample, named so it doesn't clash with the `phase`
    // accumulator above.
    delta = arg(sample)
    predicted2 = complex(cos(phase+pi), sin(phase+pi))
    sample2 = iq[i] * conj(predicted2)
    delta2 = arg(sample2)
    if abs(delta2) < abs(delta) {
        // note that we do not update 'sample'.
        delta = delta2
    }
    phase += alpha * delta
    frequency += beta * delta
    // adjust the iq sample to the PLL rotated
    // sample.
    iq[i] = sample
}
...

If all goes well, this loop has the effect of driving a BPSK signal’s imaginary values to 0, and the real value between +1 and -1.

Average Idle / Carrier Detect

Our starting point for this section is the same capture as above, but post-PLL (xz compressed interleaved float32, 163,840 samples per second)

When we start out, we have IQ samples that have been mostly driven to an imaginary component of 0 and real value range between +1 and -1 for each symbol period. Our goal now is to determine if we’re receiving a signal, and if so, determine if it’s +1 or -1. This is a deceptively hard problem given it spans a lot of other similarly entertaining hard problems. I’ve opted to not solve the hard problems involved and hope that in practice my very haphazard implementation works well enough. This turns out to be both good (not solving a problem is a great way to not spend time on it) and bad (turns out it does materially impact performance). This segment is the one I plan on revisiting, first. Expect more here at some point!

Given that I want to encapsulate three states in the output from this section (our symbols are “no carrier detected” (0), real value 1 (“1”), or real value -1 (“-1”)), spending cycles to determine what the baseline noise is – so we can identify when a signal breaks through that noise – becomes incredibly important.

var idleThreshold float
var thresholdFactor = 10
...
// sigThreshold is used to determine if the symbol
// is -1, +1 or 0. It's 1.3 times the idle signal
// threshold.
var sigThreshold = (idleThreshold * 0.3) + idleThreshold
// iq contains a single symbol's worth of IQ samples.
// clock alignment isn't really considered; so we'll
// get a bad packet if we have a symbol transition
// in the middle of this buffer. No attempt is made
// to correct for this yet.
var iq []complex
// avg is used to average a chunk of samples in the
// symbol buffer.
var avg float
var mid = len(iq) / 2
// midNum is used to determine how many samples to
// average at the middle of the symbol.
var midNum = len(iq) / 50
for j := mid; j < mid+midNum; j++ {
    avg += real(iq[j])
}
avg /= midNum
var symbol float
switch {
case avg > sigThreshold:
    symbol = 1
case avg < -sigThreshold:
    symbol = -1
default:
    symbol = 0
    // update the idleThreshold using the thresholdFactor,
    // averaging the observed noise level over more samples
    // to get a better idea of the average noise.
    idleThreshold = (idleThreshold*(thresholdFactor-1) + abs(avg)) / thresholdFactor
}
// write symbol to output somewhere
...

Next Steps

Now that we have a stream of values that are either +1, -1 or 0, we can frame / unframe the data contained in the stream, and decode Packets contained inside, coming next in Part 4!

04 December, 2021 04:00PM

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

RcppAPT 0.0.8: Package Maintenance

A new version of the RcppAPT package interfacing from R to the C++ library behind the awesome apt, apt-get, apt-cache, … commands and their cache powering Debian, Ubuntu and the like arrived on CRAN earlier today.

RcppAPT allows you to query the (Debian or Ubuntu) package dependency graph at will, with build-dependencies (if you have deb-src entries), reverse dependencies, and all other goodies. See the vignette and examples for illustrations.

This release updates some package metadata, adds a new package testing helper, and, just like digest three days ago, drat two days ago, and littler yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style.

Changes in version 0.0.8 (2021-12-04)

  • New test file version.R ensures NEWS file documents current package version

  • Travis artifacts and badges have been pruned

  • Vignettes now use simplermarkdown

Courtesy of my CRANberries, there is also a diffstat report for this release. A bit more information about the package is available here as well as at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

04 December, 2021 03:33PM

December 03, 2021

hackergotchi for Paul Tagliamonte

Paul Tagliamonte

Transmitting BPSK symbols (Part 2/5) 🐀

🐀 This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.

In the last post, we worked through what IQ is, and different formats that it may be sent or received in. Let’s take that and move on to Transmitting BPSK using IQ data!

When we transmit and receive information through RF using an SDR, data is traditionally encoded into a stream of symbols which are then used by a program to modulate the IQ stream, and sent over the airwaves.

PACKRAT uses BPSK to encode Symbols through RF. BPSK is the act of modulating the phase of a sine wave to carry information. The transmitted wave swaps between two states in order to convey a 0 or a 1. Our symbols modulate the transmitted sine wave’s phase, so that it moves between in-phase with the SDR’s transmitter and 180 degrees (or π radians) out of phase with the SDR’s transmitter.

The difference between a “Bit” and a “Symbol” in PACKRAT is not incredibly meaningful, and I’ll often find myself slipping up when talking about them. I’ve done my best to try and use the right word at the right stage, but it’s not as obvious where the line between bit and symbol is – at least not as obvious as it would be with QPSK or QAM. The biggest difference is that there are three meaningful states for PACKRAT over BPSK - a 1 (for “In phase”), -1 (for “180 degrees out of phase”) and 0 (for “no carrier”). For my implementation, a stream of all zeros will not transmit data over the airwaves, a stream of all 1s will transmit all “1” bits over the airwaves, and a stream of all -1s will transmit all “0” bits over the airwaves.

We’re not going to cover turning a byte (or bit) into a symbol yet – I’m going to write more about that in a later section. So for now, let’s just worry about symbols in, and symbols out.

Transmitting a Sine wave at 0Hz

If we go back to thinking about IQ data as precisely timed measurements of energy over time at some particular frequency, we can consider what a sine wave will look like in IQ. Before we dive into antennas and RF, let’s go to something a bit more visual.

For the first example, you can see an example of a camera whose frame rate (or Sampling Rate!) matches the exact number of rotations per second (or Frequency!) of the propeller, so it appears to stand exactly still. Every time the Camera takes a frame, it’s catching the propeller in the exact same place in space, even though it’s made a complete rotation.

The second example is very similar, it’s a light strobing (in this case, our sampling rate, since the darkness is ignored by our brains) at the same rate (frequency) as water dropping from a faucet – and the video creator is even nice enough to change the sampling frequency to have the droplets move both forward and backward (positive and negative frequency) in comparison to the faucet.

IQ works the same way. If we catch something in perfect frequency alignment with our radio, we’ll wind up with readings that are the same for the entire stream of data. This means we can transmit a sine wave by setting all of the IQ samples in our buffer to 1+0i, which will transmit a pure sine wave at exactly the center frequency of the radio.

var sine []complex
for i := range sine {
    sine[i] = complex(1.0, 0.0)
}

Alternatively, we can transmit a Sine wave (but with the opposite phase) by flipping the real value from 1 to -1. The same Sine wave is transmitted on the same Frequency, except when the wave goes high in the example above, the wave will go low in the example below.

var sine []complex
for i := range sine {
    sine[i] = complex(-1.0, 0.0)
}

In fact, we can make a carrier wave at any phase angle and amplitude by using a bit of trig.

// angle is in radians - here we have
// 1.5 Pi (3/4 Tau) or 270 degrees.
var angle = pi * 1.5
// amplitude controls the transmitted
// strength of the carrier wave.
var amplitude = 1.0
// output buffer as above
var sine []complex
for i := range sine {
    sine[i] = complex(
        amplitude*cos(angle),
        amplitude*sin(angle),
    )
}

The amplitude of the transmitted wave is the absolute value of the IQ sample (sometimes called magnitude), and the phase can be computed as the angle (or argument). The amplitude remains constant (at 1) in both cases. Remember back to the airplane propeller or water droplets – we’re controlling where we’re observing the sine wave. It looks like a consistent value to us, but in reality it’s being transmitted as a pure carrier wave at the provided frequency. Changing the angle of the number we’re transmitting will control where in the sine wave cycle we’re “observing” it at.
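
Going the other way – from an IQ sample back to its amplitude and phase – is just the magnitude and the arctangent. A minimal sketch using Go’s standard math package (the helper name is an invention; math/cmplx’s Abs and Phase do the same job):

// ampPhase returns the amplitude (magnitude) and phase (argument)
// of a single IQ sample.
func ampPhase(s complex128) (amplitude, angle float64) {
    amplitude = math.Hypot(real(s), imag(s))
    angle = math.Atan2(imag(s), real(s))
    return
}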

Generating BPSK modulated IQ data

Modulating our carrier wave with our symbols is fairly straightforward to do – we can multiply the symbol by 1 to get the real value to be used in the IQ stream. Or, more simply - we can just use the symbol directly in the constructed IQ data.

var sampleRate = 2_621_440
var baudRate = 1024
// This represents the number of IQ samples
// required to send a single symbol at the
// provided baud and sample rate. I picked
// two numbers in order to avoid half samples.
// We will transmit each symbol in blocks of
// this size.
var samplesPerSymbol = sampleRate / baudRate
var samples = make([]complex, samplesPerSymbol)
// symbol is one of 1, -1 or 0.
for _, symbol := range symbols {
    for i := range samples {
        samples[i] = complex(symbol, 0)
    }
    // write the samples out to an output file
    // or radio.
    write(samples)
}

If you want to check against a baseline capture, here’s 10 example packets at 204800 samples per second.

Next Steps

Now that we can transmit data, we’ll start working on a receive path, in order to check our work when transmitting the packets, as well as to be able to hear packets we transmit from afar – coming up next in Part 3!

03 December, 2021 04:00PM

hackergotchi for Dirk Eddelbuettel

Dirk Eddelbuettel

littler 0.3.15 on CRAN: Package Updates

max-heap image

The sixteenth release of littler as a CRAN package just landed, following in the now fifteen year history (!!) as a package started by Jeff in 2006, and joined by me a few weeks later.

littler is the first command-line interface for R as it predates Rscript. It allows for piping as well as for shebang scripting via #!, uses command-line arguments more consistently and still starts faster. It also always loaded the methods package, which Rscript only started to do in recent years.

littler lives on Linux and Unix, has its difficulties on macOS due to yet-another-braindeadedness there (who ever thought case-insensitive filesystems as a default were a good idea?) and simply does not exist on Windows (yet – the build system could be extended – see RInside for an existence proof, and volunteers are welcome!). See the FAQ vignette on how to add it to your PATH. A few examples are highlighted at the Github repo, as well as in the examples vignette.

This release brings a more robust and featureful install2.r script (thanks to Gergely Daróczi), corrects some documentation typos (thanks to John Kerl), and now compacts pdf vignettes better when using the build.r helper. It also once more updates the URLs for the two RStudio downloaders, and adds a simplermarkdown wrapper. Next, we removed the YAML file (and badge) for the disgraced former continuous integration service we shall not name (yet that we all used to use). And, following digest two days ago and drat yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style.

The full change description follows.

Changes in littler version 0.3.15 (2021-12-03)

  • Changes in examples

    • The install2 script can select download methods, and cope with errors from parallel download (thanks to Gergely Daroczi)

    • The build.r now uses both as argument to --compact-vignettes

    • The RStudio download helper were once again updated for changed URLs

    • New caller for simplermarkdown::mdweave_to_html

  • Changes in package

    • Several typos were corrected (thanks to John Kerl)

    • Travis artifacts and badges have been pruned

    • Vignettes now use simplermarkdown

My CRANberries service provides a comparison to the previous release. Full details for the littler release are provided as usual at the ChangeLog page, and also on the package docs website. The code is available via the GitHub repo, from tarballs and now of course also from its CRAN page and via install.packages("littler"). Binary packages are available directly in Debian as well as soon via Ubuntu binaries at CRAN thanks to the tireless Michael Rutter.

Comments and suggestions are welcome at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

03 December, 2021 12:37PM

Petter Reinholdtsen

A Brazilian Portuguese translation of the book Made with Creative Commons

A few days ago, a productive translator started working on a new translation of the Made with Creative Commons book for Brazilian Portuguese. The translation takes place on the Weblate web based translation system. Once the translation is complete and proofread, we can publish it on paper as well as in PDF, ePub and HTML format. The translation is already 16% complete, and if more people get involved I am convinced it can very quickly reach 100%. If you are interested in helping out with this or other translations of the Made with Creative Commons book, start translating on Weblate. There are partial translations available in Azerbaijani, Bengali, Brazilian Portuguese, Dutch, French, German, Greek, Polish, Simplified Chinese, Swedish, Thai and Ukrainian.

The git repository for the book contains all source files needed to build the book for yourself. HTML editions to help with proofreading are also available.

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

03 December, 2021 08:30AM

hackergotchi for Evgeni Golov

Evgeni Golov

Dependency confusion in the Ansible Galaxy CLI

I hope you enjoyed my last post about Ansible Galaxy Namespaces. In there I noted that I originally looked for something completely different and the namespace takeover was rather accidental.

Well, originally I was looking at how the different Ansible content hosting services and their client (ansible-galaxy) behave in regard to clashes in naming of the hosted content.

"Ansible content hosting services"?! There are currently three main ways for users to obtain Ansible content:

  • Ansible Galaxy - the original, community oriented, free hosting platform
  • Automation Hub - the place for Red Hat certified and supported content, available only with a Red Hat subscription, hosted by Red Hat
  • Ansible Automation Platform - the on-premise version of Automation Hub, syncs content from there and allows customers to upload own content

Now the question I was curious about was: how would the tooling behave if different sources would offer identically named content?

This was inspired by Alex Birsan: Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies and zofrex: Bundler is Still Vulnerable to Dependency Confusion Attacks (CVE⁠-⁠2020⁠-⁠36327), who showed that the tooling for Python, Node.js and Ruby can be tricked into fetching content from "the wrong source", thus allowing an attacker to inject malicious code into a deployment.

For the rest of this article, it's not important that there are different implementations of the hosting services, only that users can configure and use multiple sources at the same time.

The problem is that, if the user configures their server_list to contain multiple Galaxy-compatible servers, like Ansible Galaxy and Automation Hub, and then asks to install a collection, the Ansible Galaxy CLI will ask every server in the list, until one returns a successful result. The exact order seems to differ between versions, but this doesn't really matter for the issue at hand.

Imagine someone wants to install the redhat.satellite collection from Automation Hub (using ansible-galaxy collection install redhat.satellite). Now if their configuration defines Galaxy as the first, and Automation Hub as the second server, Galaxy is always asked whether it has redhat.satellite and only if the answer is negative, Automation Hub is asked. Today there is no redhat namespace on Galaxy, but there is a redhat user on GitHub, so…

The canonical answer to this issue is to use a requirements.yml file and set the source parameter. This parameter allows you to express "regardless which sources are configured, please fetch this collection from here". That's nice, but I think this not being the default syntax (contrary to what e.g. Bundler does) is a bad approach. Users might overlook the security implications, as the shorter syntax without the source just "magically" works.

However, I think this is not even the main problem here. The documentation says: Once a collection is found, any of its requirements are only searched within the same Galaxy instance as the parent collection. The install process will not search for a collection requirement in a different Galaxy instance. But as it turns out, the source behavior was changed and now only applies to the exact collection it is set for, not for any dependencies this collection might have.

For the sake of the example, imagine two collections: evgeni.test1 and evgeni.test2, where test2 declares a dependency on test1 in its galaxy.yml. Actually, no need to imagine, both collections are available in version 1.0.0 from galaxy.ansible.com and test1 version 2.0.0 is available from galaxy-dev.ansible.com.

Now, given our recent reading of the docs, we craft the following requirements.yml:

collections:
- name: evgeni.test2
  version: '*'
  source: https://galaxy.ansible.com

In a perfect world, following the documentation, this would mean that both collections are fetched from galaxy.ansible.com, right? However, this is not what ansible-galaxy does. It will fetch evgeni.test2 from the specified source, determine it has a dependency on evgeni.test1 and fetch that from the "first" available source from the configuration.

Take for example the following ansible.cfg:

[galaxy]
server_list = test_galaxy, release_galaxy, test_galaxy

[galaxy_server.release_galaxy]
url=https://galaxy.ansible.com/

[galaxy_server.test_galaxy]
url=https://galaxy-dev.ansible.com/

And try to install collections, using the above requirements.yml:

% ansible-galaxy collection install -r requirements.yml -vvv                 
ansible-galaxy 2.9.27
  config file = /home/evgeni/Devel/ansible-wtf/collections/ansible.cfg
  configured module search path = ['/home/evgeni/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.10/site-packages/ansible
  executable location = /usr/bin/ansible-galaxy
  python version = 3.10.0 (default, Oct  4 2021, 00:00:00) [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)]
Using /home/evgeni/Devel/ansible-wtf/collections/ansible.cfg as config file
Reading requirement file at '/home/evgeni/Devel/ansible-wtf/collections/requirements.yml'
Found installed collection theforeman.foreman:3.0.0 at '/home/evgeni/.ansible/collections/ansible_collections/theforeman/foreman'
Process install dependency map
Processing requirement collection 'evgeni.test2'
Collection 'evgeni.test2' obtained from server explicit_requirement_evgeni.test2 https://galaxy.ansible.com/api/
Opened /home/evgeni/.ansible/galaxy_token
Processing requirement collection 'evgeni.test1' - as dependency of evgeni.test2
Collection 'evgeni.test1' obtained from server test_galaxy https://galaxy-dev.ansible.com/api
Starting collection install process
Installing 'evgeni.test2:1.0.0' to '/home/evgeni/.ansible/collections/ansible_collections/evgeni/test2'
Downloading https://galaxy.ansible.com/download/evgeni-test2-1.0.0.tar.gz to /home/evgeni/.ansible/tmp/ansible-local-133/tmp9uqyjgki
Installing 'evgeni.test1:2.0.0' to '/home/evgeni/.ansible/collections/ansible_collections/evgeni/test1'
Downloading https://galaxy-dev.ansible.com/download/evgeni-test1-2.0.0.tar.gz to /home/evgeni/.ansible/tmp/ansible-local-133/tmp9uqyjgki

As you can see, evgeni.test1 is fetched from galaxy-dev.ansible.com, instead of galaxy.ansible.com. Now, if those servers instead would be Galaxy and Automation Hub, and somebody managed to snag the redhat namespace on Galaxy, I would now be getting the wrong stuff… Another problematic setup would be with Galaxy and on-prem Ansible Automation Platform, as you can have any namespace on the latter and these most certainly can clash with namespaces on public Galaxy.

I have reported this behavior to Ansible Security on 2021-08-26, giving a 90 days disclosure deadline, which expired on 2021-11-24.

So far, the response was that this is working as designed, to allow cross-source dependencies (e.g. a private collection referring to one on Galaxy) and there is an issue to update the docs to match the code. If users want to explicitly pin sources, they are supposed to name all dependencies and their sources in requirements.yml. Alternatively they obviously can configure only one source in the configuration and always mirror all dependencies.
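
For completeness, the fully pinned variant of the requirements.yml above would name the dependency and its source explicitly – roughly like this:

collections:
- name: evgeni.test2
  version: '*'
  source: https://galaxy.ansible.com
- name: evgeni.test1
  version: '*'
  source: https://galaxy.ansible.com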

I am not happy with this and I think this is terrible UX, explicitly inviting people to make mistakes.

03 December, 2021 08:00AM by evgeni

December 02, 2021

hackergotchi for Jonathan McDowell

Jonathan McDowell

Building a desktop to improve my work/life balance

ASRock DeskMini X300

It’s been over 20 months since the first COVID lockdown kicked in here in Northern Ireland and I started working from home. Even when the strict lockdown was lifted the advice here has continued to be “If you can work from home you should work from home”. I’ve been into the office here and there (for new starts given you need to hand over a laptop and sort out some login details it’s generally easier to do so in person, and I’ve had a couple of whiteboard sessions that needed the high bandwidth face to face communication), but day to day is all from home.

Early on I commented that work had taken over my study. This has largely continued to be true. I set my work laptop on the stand on a Monday morning and it sits there until Friday evening, when it gets switched for the personal laptop. I have a lovely LG 34UM88 21:9 Ultrawide monitor, and my laptops are small and light so I much prefer to use them docked. Also my general working pattern is to have a lot of external connections up and running (build machine, test devices, log host) which means a suspend/resume cycle disrupts things. So I like to minimise moving things about.

I spent a little bit of time trying to find a dual laptop stand so I could have both machines setup and switch between them easily, but I didn’t find anything that didn’t seem to be geared up for DJs with a mixer + laptop combo taking up quite a bit of desk space rather than stacking laptops vertically. Eventually I realised that the right move was probably a desktop machine.

Now, I haven’t had a desktop machine since before I moved to the US, realising at the time that having everything on my laptop was much more convenient. I decided I didn’t want something too big and noisy. Cheap GPUs seem hard to get hold of these days - I’m not a gamer so all I need is something that can drive a ~ 4K monitor reliably enough. Looking around the AMD Ryzen 7 5700G seemed to be a decent CPU with one of the better integrated GPUs. I spent some time looking for a reasonable Mini-ITX case + motherboard and then I happened upon the ASRock DeskMini X300. This turns out to be perfect; I’ve no need for a PCIe slot or anything more than an m.2 SSD. I paired it with a Noctua NH-L9a-AM4 heatsink + fan (same as I use in the house server), 32GB DDR4 and a 1TB WD SN550 NVMe SSD. Total cost just under £650 inc VAT + delivery (and that’s a story for another post).

A desktop solves the problem of fitting both machines on the desk at once, but there’s still the question of smoothly switching between them. I read Evgeni Golov’s article on a simple KVM switch for €30. My monitor has multiple inputs, so that’s sorted. I did have a cheap USB2 switch (all I need for the keyboard/trackball) but it turned out to be pretty unreliable at getting the host to detect the USB change. I bought a UGREEN USB 3.0 Sharing Switch Box instead and it’s turned out to be pretty reliable. The problem is that the LG 34UM88 turns out to have a poor DDC implementation, so while I can flip the keyboard easily with the UGREEN box I also have to manually select the monitor input. Which is a bit annoying, but not terrible.

The important question is whether this has helped. I built all this at the end of October, so I’ve had a month to play with it. Turns out I should have done it at some point last year. At the end of the day, instead of either sitting “at work” for a bit longer or completely avoiding the study, I’m able to lock the work machine and flick to my personal setup. Even sitting in the same seat, that “disconnect” – and knowing I won’t see work Slack messages or emails come in and feel I should respond – really helps. It also means I have access to my personal setup during the week without incurring a hit at the start of the working day when I have to set things up again. So it’s much easier to just dip in to some personal tech stuff in the evening than it was previously. And since I don’t need to set up the personal config again, I can pick up where I left off. All of which is really nice.

It’s also got me thinking about other minor improvements I should make to my home working environment to try and improve things. One obvious thing now the winter is here again is to improve my lighting; I have a good overhead LED panel but it’s terribly positioned for video calls, being just behind me. So I think I’m looking at some sort of strip light I can have behind the large monitor to give a decent degree of backlight (possibly bouncing off the white wall). Lots of cheap options I’m not convinced about, and I’ve had a few ridiculously priced options from photographer friends; suggestions welcome.

02 December, 2021 08:00PM

hackergotchi for Paul Tagliamonte

Paul Tagliamonte

Processing IQ data formats (Part 1/5) 🐀

🐀 This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.

When working with SDRs, information about the signals your radio is receiving is communicated by streams of IQ data. IQ is short for “In-phase” and “Quadrature”, which means 90 degrees out of phase. Values in the IQ stream are complex numbers, so converting them to a native complex type in your language helps greatly when processing the IQ data for meaning.

I won’t get too deep into what IQ is or why complex numbers (mostly since I don’t think I fully understand it well enough to explain it yet), but here’s some basics in case this is your first interaction with IQ data before going off and reading more.

Before we get started — at any point, if you feel lost in this post, it's OK to take a break to do a bit of learning elsewhere in the internet. I'm still new to this, so I'm sure my overview in one paragraph here won't help clarify things too much. This took me months to sort out on my own. It's not you, really! I particularly enjoyed reading visual-dsp.switchb.org when it came to learning about how IQ represents signals, and Software-Defined Radio for Engineers for a more general reference.

Each value in the stream is taken at a precisely spaced sampling interval (called the sampling rate of the radio). Jitter in that sampling interval, or a drift between the requested and actual sampling rate (usually represented in PPM, or parts per million – how many samples out of one million are missing) can cause errors in frequency. In the case of a PPM error, one radio may think it’s 100.1MHz and the other may think it’s 100.2MHz, and jitter will result in added noise in the resulting stream.

A single IQ sample is both the real and imaginary values, together. The complex number (both parts) is the sample. The number of samples per second is the number of real and imaginary value pairs per second.

Each sample is reading the electrical energy coming off the antenna at that exact time instant. We’re looking to see how that goes up and down over time to determine what frequencies we’re observing around us. If the IQ stream is only real-valued measures (e.g., float values rather than complex values reading voltage from a wire), you can still send and receive signals, but those signals will be mirrored across your 0Hz boundary. That means if you’re tuned to 100MHz, and you have a nearby transmitter at 99.9MHz, you’d see it at 100.1MHz. If you want to get an intuitive understanding of this concept before getting into the heavy math, a good place to start is looking at how Quadrature encoders work. Using complex numbers means we can see “up” in frequency as well as “down” in frequency, and understand that those are different signals.

The reason why we need negative frequencies is that our 0Hz is the center of our SDR’s tuned frequency, not actually at 0Hz in nature. Generally speaking, it’s doing loads in hardware (and firmware!) to mix the raw RF signals with a local oscillator to a frequency that can be sampled at the requested rate (fundamentally the same concept as a superheterodyne receiver), so a frequency of ‘-10MHz’ means that signal is 10 MHz below the center of our SDR’s tuned frequency.

The sampling rate dictates the amount of frequency representable in the data stream. You’ll sometimes see this called the Nyquist frequency. The Nyquist Frequency is one half of the sampling rate. Intuitively, if you think about the amount of bandwidth observable as being 1:1 with the sampling rate of the stream, and the middle of your bandwidth is 0 Hz, you would only have enough space to go up in frequency for half of your bandwidth – or half of your sampling rate. Same for going down in frequency.

Float 32 / Complex 64

IQ samples that are being processed by software are commonly processed as an interleaved pair of 32 bit floating point numbers, or a 64 bit complex number. The first float32 is the real value, and the second is the imaginary value.

I#0
Q#0
I#1
Q#1
I#2
Q#2

The complex number 1+1i is represented as 1.0 1.0 and the complex number -1-1i is represented as -1.0 -1.0. Unless otherwise specified, all the IQ samples and pseudocode to follow assumes interleaved float32 IQ data streams.
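
For symmetry with the hardware formats below, here’s some pseudocode (not from the original set of examples) to build a complex number from a pair of interleaved float32 values – no scaling is needed, since the values are already in the ±1 range:

...
in := []float32{0.7, -0.3}   // made-up example values
out := complex(in[0], in[1]) // a complex64: real, then imaginary
...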

Example interleaved float32 file (10Hz Wave at 1024 Samples per Second)

RTL-SDR

IQ samples from the RTL-SDR are encoded as a stream of interleaved unsigned 8 bit integers (uint8 or u8). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant.

I#0
Q#0
I#1
Q#1
I#2
Q#2

The complex number 1+1i is represented as 0xFF 0xFF and the complex number -1-1i is represented as 0x00 0x00. The complex number 0+0i is not easily representable – since half of 0xFF is 127.5.

Complex Number   Representation
 1+1i            []uint8{0xFF, 0xFF}
-1+1i            []uint8{0x00, 0xFF}
-1-1i            []uint8{0x00, 0x00}
 0+0i            []uint8{0x80, 0x80} or []uint8{0x7F, 0x7F}

And finally, here’s some pseudocode to convert an rtl-sdr style IQ sample to a floating point complex number:

...
in = []uint8{0x7F, 0x7F}
real = (float(in[0]) - 127.5) / 127.5
imag = (float(in[1]) - 127.5) / 127.5
out = complex(real, imag)
...

Example interleaved uint8 file (10Hz Wave at 1024 Samples per Second)

HackRF

IQ samples from the HackRF are encoded as a stream of interleaved signed 8 bit integers (int8 or i8). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant.

I#0
Q#0
I#1
Q#1
I#2
Q#2

Formats that use signed integers do have one quirk due to two’s complement, which is that the absolute value of the smallest representable negative number is one more than the largest positive number. int8 values can range between -128 and 127, which means there’s a bit of ambiguity in how +1, 0 and -1 are represented. You can either create a perfectly symmetric range of values between +1 and -1 (but then 0 is not representable), have more possible values in the negative range, or allow values above (or just below) the maximum of the range.

Within my implementation, my approach has been to scale based on the max integer value of the type, so the lowest possible signed value is actually slightly smaller than -1. Generally, if your code is seeing values that low, the difference in step between -1 and slightly less than -1 isn’t very significant, even with only 8 bits. Just a curiosity to be aware of.

Complex Number   Representation
 1+1i            []int8{127, 127}
-1+1i            []int8{-128, 127}
-1-1i            []int8{-128, -128}
 0+0i            []int8{0, 0}

And finally, here’s some pseudocode to convert a hackrf style IQ sample to a floating point complex number:

...
in = []int8{-5, 112}
real = (float(in[0]))/127
imag = (float(in[1]))/127
out = complex(real, imag)
...

Example interleaved int8 file (10Hz Wave at 1024 Samples per Second)

PlutoSDR

IQ samples from the PlutoSDR are encoded as a stream of interleaved signed 16 bit integers (int16 or i16). The first sample is the real (in-phase or I) value, and the second is the imaginary (quadrature or Q) value. Together each pair of values makes up a complex number at a specific time instant.

Almost no SDRs capture at a 16 bit depth natively, often you’ll see 12 bit integers (as is the case with the PlutoSDR) being sent around as 16 bit integers. This leads to the next possible question, which is are values LSB or MSB aligned? The PlutoSDR sends data LSB aligned (which is to say, the largest real or imaginary value in the stream will not exceed 4095), but expects data being transmitted to be MSB aligned (which is to say the lowest set bit possible is the 5th bit in the number, or values can only be set in increments of 16).

As a result, the quirk observed with the HackRF (that the range of values between 0 and -1 is different than the range of values between 0 and +1) does not impact us so long as we do not use the whole 16 bit range.

Complex Number   Representation
 1+1i            []int16{32767, 32767}
-1+1i            []int16{-32768, 32767}
-1-1i            []int16{-32768, -32768}
 0+0i            []int16{0, 0}

And finally, here’s some pseudocode to convert a PlutoSDR style IQ sample to a floating point complex number, including moving the sample from LSB to MSB aligned:

...
in = []int16{-15072, 496}
// shift left 4 bits (16 bits - 12 bits = 4 bits)
// to move from LSB aligned to MSB aligned.
in[0] = in[0] << 4
in[1] = in[1] << 4
real = (float(in[0])) / 32767
imag = (float(in[1])) / 32767
out = complex(real, imag)
...

Example interleaved i16 file (10Hz Wave at 1024 Samples per Second)

Next Steps

Now that we can read (and write!) IQ data, we can get started first on the transmitter, which we can (in turn) use to test receiving our own BPSK signal, coming next in Part 2!

02 December, 2021 05:00PM

hackergotchi for Steve Kemp

Steve Kemp

It has been some time..

I realize it has been quite some time since I last made a blog-post, so I guess the short version is "I'm still alive", or as Granny Weatherwax would have said:

I ATE'NT DEAD

Of course if I die now this would be an awkward post!

I can't think of anything terribly interesting I've been doing recently, mostly being settled in my new flat and tinkering away with things. The latest "new" code was something for controlling mpd via a web-browser:

This is a simple HTTP server which allows you to minimally control mpd running on localhost:6600. (By minimally I mean literally "stop", "play", "next track", and "previous track").

I have all my music stored on my desktop, I use mpd to play it locally through a pair of speakers plugged into that computer. Sometimes I want music in the sauna, or in the bedroom. So I have a couple of bluetooth speakers which are used to send the output to another room. When I want to skip tracks I just open the mpd-web site on my phone and tap the button. (I did look at android mpd-clients, but at the same time it seemed like installing an application for this was a bit overkill).
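
For the curious: mpd speaks a plain-text protocol on TCP port 6600, so a button press in a proxy like this boils down to roughly the following sketch (an illustration only, not the actual implementation; error handling and response parsing are omitted, and it assumes the standard library's net, bufio and fmt packages):

// sendMPDCommand opens a TCP connection to the local mpd, reads
// its greeting line (something like "OK MPD 0.23.x"), and sends a
// single plain-text command such as "next", "previous", "stop"
// or "play".
func sendMPDCommand(cmd string) error {
    conn, err := net.Dial("tcp", "localhost:6600")
    if err != nil {
        return err
    }
    defer conn.Close()
    bufio.NewReader(conn).ReadString('\n')
    _, err = fmt.Fprintf(conn, "%s\n", cmd)
    return err
}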

I guess I've not been doing so much "computer stuff" outside work for a year or so. I guess lack of time, lack of enthusiasm/motivation.

So looking forward to things? I'll be in the UK for a while over Christmas, barring surprises. That should be nice as I'll get to see family, take our child to visit his grandparents (on his birthday no less) and enjoy playing the "How many Finnish people can I spot in the UK?" game

02 December, 2021 03:00PM

December 01, 2021

hackergotchi for Junichi Uekawa

Junichi Uekawa

December.

December. The world is turbulent and I am still worried where we are going.

01 December, 2021 11:56PM by Junichi Uekawa

Thorsten Alteholz

My Debian Activities in November 2021

FTP master

This month I accepted 564 and rejected 93 packages. The overall number of packages that got accepted was 591.

Debian LTS

This was my eighty-ninth month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian.

This month my all in all workload has been 40h. During that time I did LTS and normal security uploads of:

  • [DLA 2820-1] atftp security update for two CVEs
  • [DLA 2821-1] axis security update for one CVE
  • [DLA 2822-1] netkit-rsh security update for two CVEs
  • [DLA 2825-1] libmodbus security update for two CVEs
  • [#1000408] for libmodbus in Buster
  • [#1000485] for btrbk in Bullseye
  • [#1000486] for btrbk in Buster

I also started to work on pgbouncer to get an update for each release and had to process packages from NEW on security-master.

Further I worked on a script to automatically publish DLAs on the Debian website, that are posted to debian-lts-announce. The script can be found on salsa. It only publishes stuff from people on a whitelist. At the moment it is running on a computer at home. You might run your own copy, or just send me an email to be put on the whitelist as well.

Last but not least I did some days of frontdesk duties.

Debian ELTS

This month was the forty-first ELTS month.

During my allocated time I uploaded:

  • ELA-517-1 for atftp
  • ELA-519-1 for qtbase-opensource-src
  • ELA-520-1 for libsdl1.2
  • ELA-521-1 for libmodbus

Last but not least I did some days of frontdesk duties.

Debian Printing

Unfortunately I did not do as much as I wanted this month. At least I looked at some old bugs and uploaded new upstream versions of …

I hope this will improve in December again. New versions of cups and hplip are on my TODO-list.

Debian Astro

This month I uploaded new versions of …

Other stuff

I improved packaging or fixed bugs of:

01 December, 2021 03:33PM by alteholz

Russ Allbery

Review: A World Without Email

Review: A World Without Email, by Cal Newport

Publisher: Portfolio/Penguin
Copyright: 2021
ISBN: 0-525-53657-4
Format: Kindle
Pages: 264

A World Without Email is the latest book by computer science professor and productivity writer Cal Newport. After a detour to comment on the drawbacks of social media in Digital Minimalism, Newport is back to writing about focus and concentration in the vein of Deep Work. This time, though, the topic is workplace structure and collaborative process rather than personal decisions.

This book is a bit hard for me to review because I spoiled myself for the contents by listening to a lot of Newport's podcast, where he covers the same material. I therefore didn't enjoy it as much as I otherwise would have because the ideas were familiar. I recommend the book over the podcast, though; it's tighter, more coherent, and more comprehensive.

The core contention of this book is that knowledge work (roughly, jobs where one spends significant time working on a computer processing information) has stumbled into a superficially tempting but inefficient and psychologically harmful structure that Newport calls the hyperactive hive mind. This way of organizing work is a local maximum: it feels productive, it's flexible and very easy to deploy, and most minor changes away from it make overall productivity worse. However, the incentive structure is all wrong. It prioritizes quick responses and coordination overhead over deep thinking and difficult accomplishments.

The characteristic property of the hyperactive hive mind is free-flowing, unstructured communication between co-workers. If you need something from someone else, you ask them for it and they send it to you. The "email" in the title is not intended literally; Slack and related instant messaging apps are even more deeply entrenched in the hyperactive hive mind than email is. The key property of this workflow is that most collaborative work is done by contacting other people directly via ad hoc, unstructured messages.

Newport's argument is that this workflow has multiple serious problems, not the least of which is that it makes us miserable. If you have read his previous work, you will correctly expect this to tie into his concept of deep work. Ad hoc, unstructured communication creates a constant barrage of unimportant small tasks and interrupts, most of which require several asynchronous exchanges before your brain can stop tracking the task. This creates constant context-shifting, loss of focus and competence, and background stress from ever-growing email inboxes, unread message notifications, and the semi-frantic feeling that you're forgetting something you need to do.

This is not an original observation, of course. Many authors have suggested individual ways to improve this workflow: rules about how often to check one's email, filtering approaches, task managers, and other personal systems. Newport's argument is that none of these individual approaches can address the problem due to social effects. It's all well and good to say that you should unplug from distractions and ignore requests while you concentrate, but everyone else's workflow assumes that their co-workers are responsive to ad hoc requests. Ignoring this social contract makes the job of everyone still stuck in the hyperactive hive mind harder. They won't appreciate that, and your brain will not be able to relax knowing that you're not meeting your colleagues' expectations.

In Newport's analysis, the necessary solution is a comprehensive redesign of how we do knowledge work, akin to the redesign of factory work that came with the assembly line. It's a collective problem that requires a collective solution. In other industries, organizing work for efficiency and quality is central to the job of management, but in knowledge work (for good historical reasons) employees are mostly left to organize their work on their own. That self-organization has produced a system that doesn't require centralized coordination or decisions and provides a lot of superficial flexibility, but which may be significantly inferior to a system designed for how people think and work.

Even if you find this convincing (and I think Newport makes a good case), there are reasons to be suspicious of corporations trying to make people more productive. The assembly line made manufacturing much more efficient, but it also increased the misery of workers so much that Henry Ford had to offer substantial raises to retain workers. As one of Newport's knowledge workers, I'm not enthused about that happening to my job.

Newport recognizes this and tries to address it by drawing a distinction between the workflow (how information moves between workers) and the work itself (how individual workers solve problems in their area of expertise). He argues that companies need to redesign the former, but should leave the latter to each worker. It's a nice idea, and it will probably work in industries like tech with substantial labor bargaining power. I'm more cynical about other industries.

The second half of the book is Newport's specific principles and recommendations for designing better workflows that don't rely on unstructured email. Some of this will be familiar (and underwhelming) to anyone who works in tech; Newport recommends ticket systems and thinks agile, scrum, and kanban are pointed in the right direction. But there are some other good ideas in here, such as embracing specialization.

Newport argues (with some evidence) that the drastic reduction in secretarial jobs, on the grounds that workers with computers can do the same work themselves, was a mistake. Even with new automation, this approach increased the range of tasks required in every other job. Not only was this a drain on the time of other workers, it caused more context switching, which made everyone less efficient and undermined work quality. He argues for reversing that trend: where the work cannot be automated, hire more support workers and more specialized workers in general, stop expecting everyone to be their own generalist admin, and empower support workers to create better systems rather than using the hyperactive hive mind model to answer requests.

There's more here, ranging from specifics of how to develop a structured process for a type of work to the importance of enabling sustained concentration on a task. It's a less immediately actionable book than Newport's previous writing, but I welcome the partial shift in focus to more systemic issues. Newport continues to be relentlessly apolitical, but here it feels less like he's eliding important analysis and more like he thinks the interests of workers and good employers are both served by the approach he's advocating.

I will warn that Newport leans heavily on evolutionary psychology in his argument that the hyperactive hive mind is bad for us. I think he has some good arguments about the anxiety that comes with not responding to requests from others, but I'm not sure intrusive experiments on spectacularly-unusual remnant hunter-gatherer groups, who are treated like experimental animals, are the best way of making that case. I realize this isn't Newport's research, but I think he could have made his point with more directly relevant experiments.

He also continues his obsession with the superiority of in-person conversation over written communication, and while he has a few good arguments, he has a tendency to turn them into sweeping generalizations that are directly contradicted by, well, my entire life. It would be nice if he were more willing to acknowledge that it's possible to express deep emotional nuance and complex social signaling in writing; it simply requires a level of practice and familiarity (and shared vocabulary) that's often missing from the workplace.

I was muttering a lot near the start of this book, but thankfully those sections are short, and I think the rest of his argument sits on a stronger foundation.

I hope Newport continues moving in the direction of more systemic analysis. If you enjoyed Deep Work, you will probably find A World Without Email interesting. If you're new to Newport, this is not a bad place to start, particularly if you have influence on how communication is organized in your workplace. Those who work in tech will find some bits of this less interesting, but Newport approaches the topic from a different angle than most agile books and covers a broader range of ideas.

Recommended if you like reading this sort of thing.

Rating: 7 out of 10

01 December, 2021 05:07AM

Paul Wise

FLOSS Activities November 2021

Focus

This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration

  • Debian BTS: unarchive/reopen/triage bugs for reintroduced packages
  • Debian wiki: unblock IP addresses, approve accounts

Communication

  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors

The SPTAG, visdom, gensim, purple-discord, plac, fail2ban, uvloop work was sponsored by my employer. All other work was done on a volunteer basis.

01 December, 2021 02:52AM

November 30, 2021

hackergotchi for Steinar H. Gunderson

Steinar H. Gunderson

Commitcoin

How do you get a git commit with an interesting commit ID (or “SHA”)? Of course, interesting is in the eye of the beholder, but let's define it as having many repeated hex nibbles, e.g. “000” in the commit would be somewhat interesting and “8888888888888888888888888” would be very interesting. This is pretty similar to the dreaded cryptocoin mining; we have no simple way of forcing a given SHA-1 hash unless someone manages a complete second-preimage break, so we must brute-force. (And hopefully without boiling the planet in the process; we'd have to settle for a bit shorter runs than in the example above.)

Git commit IDs are SHA-1 checksums of what they contain; the tree object (“what does the commit contain”), the parents, the commit message and some dates. Of those, let's use the author date as the nonce (I chose to keep the committer date truthful, so as to not be accused of forging history too much). We can set up a shell script to commit with --amend, sweeping GIT_AUTHOR_DATE over the course of a day or so and having EDITOR=true in order not to have to close the editor all the time.

It turns out this is pretty slow (unsurprisingly!). So we discover that actually launching the “editor” takes a long time, and --no-edit is much faster. We can also move to a tmpfs in order not to block on fsync and block allocation (eatmydata would also work, but doesn't fix the filesystem overhead). At this point, we're at roughly 50 commits/sec or so. So we can sweep through the entire day of author dates, and if nothing interesting comes up, we can just try again (as we also get a new committer date, we've essentially reset our random generator).

But we can do much better than this. Making a commit in git involves many different steps: load the index, see if we need to add something, then actually make the commit object, and finally update HEAD and whatever branch we might be on. Of those, we only really need to make the commit object and see what hash it ended up with! So we change our script to use git commit-tree instead, and whoa, we're up to 300 commits/sec.
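
For illustration, the core of such a sweep could look roughly like this (my own Python sketch, not the author's shell script; the commit message and the target nibble run are made up):

  import os, subprocess

  tree = subprocess.check_output(['git', 'write-tree']).decode().strip()
  parent = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode().strip()

  base = 1638230400                        # arbitrary starting unix timestamp
  for offset in range(86400):              # sweep one day of author dates
      # raw "<unix timestamp> <tz>" format as the nonce
      env = dict(os.environ, GIT_AUTHOR_DATE='%d +0000' % (base + offset))
      # commit-tree writes a commit object every time, which is why the
      # object store fills up and needs pruning, as noted later in the post
      sha = subprocess.check_output(
          ['git', 'commit-tree', tree, '-p', parent, '-m', 'my interesting commit'],
          env=env).decode().strip()
      if 'dddd' in sha:                    # "interesting" is in the eye of the beholder
          print(sha)
          break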

Now we're bottlenecked at the time it takes to fork and launch the git binary—so we can hack the git sources and move the date sweep into builtin/commit-tree.c. This is radically faster; about 100 times as fast! Now what takes time is compressing and creating the commit object.

But OK, my 5950X has 16 cores, right, so we can just split the range in 16 and have different cores test different ranges? Wrong! Because now, the entire sweep takes less than a second, so we no longer get the different committer date and the cores are testing the same SHA over and over. (In effect, our nonce space is too small.) We cheat a bit and add extra whitespace to the end of the commit message to get a larger parameter space; the core ID determines how many spaces.

At this point, you can make commits so fast that the problem essentially becomes that you run out of space, and need to run git prune every few seconds. So the obvious next step would be to not compress and write out the commits at all… and then, I suppose, optimize the routines to not call any git stuff anymore, and then have GPUs do the testing, and of course, finally we'll have Gitcoin ASICs, and every hope of reaching the 1.5 degree goal is lost…

Did I say Gitcoin? No, unfortunately that name was already taken. So I'll call it Commitcoin. And I'm satisfied with a commit containing dddddddd, even though it's of course possible to do much better—hardness is only approximately 2^26 commits to get a commit as interesting as that.

(Cryptobros, please stay out of my inbox. I'm not interested.)

30 November, 2021 11:00AM

Russell Coker

Your Device Has Been Improved

I’ve just started a Samsung tablet downloading a 770MB update, the description says:

  • Overall stability of your device has been improved
  • The security of your device has been improved

Technically I have no doubt that both those claims are true and accurate. But according to common understanding of the English language I think they are both misleading.

By “stability improved” they mean “fixed some bugs that made it unstable” and no technical person would imagine that after a certain number of such updates the number of bugs will ever reach zero and the tablet will be perfectly reliable. In fact you should consider yourself lucky if they fix more bugs than they add. It’s not THAT uncommon for phones and tablets to be bricked (rendered unusable by software) by an update. In the past I got a Huawei Mate9 as a warranty replacement for a Nexus 6P because an update caused so many Nexus 6P phones to fail that they couldn’t be replaced with an identical phone [1].

By “security improved” they usually mean “fixed some security flaws that were recently discovered to make it almost as secure as it was designed to be”. Note that I deliberately say “almost as secure” because it’s sometimes impossible to fix a security flaw without making significant changes to interfaces which requires more work than desired for an old product and also gives a higher probability of things going wrong. So it’s sometimes better to aim for almost as secure or alternatively just as secure but with some features disabled.

Device manufacturers (and most companies in the Android space make the same claims while having the exact same bugs to deal with, Samsung is no different from the others in this regard) are not making devices more secure or more reliable than when they were initially released. They are aiming to make them almost as secure and reliable as when they were released. They don’t have much incentive to try too hard in this regard: Samsung won’t suffer if I decide my old tablet isn’t reliable enough and buy a new one, which will almost certainly be from Samsung because they make nice tablets.

As a thought experiment, consider if car repairers did the same thing. “Getting us to service your car will improve fuel efficiency.” Great, but how much more efficient will it be than when I purchased it?

As another thought experiment, consider if car companies stopped providing parts for car repair a few years after releasing a new model. This is effectively what phone and tablet manufacturers have been doing all along, software updates for “stability and security” are to devices what changing oil etc is for cars.

30 November, 2021 09:41AM by etbe

November 29, 2021

hackergotchi for Evgeni Golov

Evgeni Golov

Getting access to somebody else's Ansible Galaxy namespace

TL;DR: adding features after the fact is hard, normalizing names is hard, it's patched, carry on.

I promise, the longer version is more interesting and fun to read!

Recently, I was poking around Ansible Galaxy and almost accidentally got access to someone else's namespace. I was actually looking for something completely different, but accidental finds are the best ones!

If you're asking yourself: "what the heck is he talking about?!", let's slow down for a moment:

  • Ansible is a great automation engine built around the concept of modules that do things (mostly written in Python) and playbooks (mostly written in YAML) that tell which things to do
  • Ansible Galaxy is a place where people can share their playbooks and modules for others to reuse
  • Galaxy Namespaces are a way to allow users to distinguish who published what and reduce name clashes to a minimum

That means that if I ever want to share how to automate installing vim, I can publish evgeni.vim on Galaxy and other people can download that and use it. And if my evil twin wants their vim recipe published, it will end up being called evilme.vim. Thus while both recipes are called vim they can coexist, can be downloaded to the same machine, and used independently.

How do you get a namespace? It's automatically created for you when you log in for the first time. After that you can manage it, you can upload content, allow others to upload content and other things. You can also request additional namespaces; this is useful if you want one for an Organization or a similar entity, which doesn't have a login for Galaxy.

Apropos login, Galaxy uses GitHub for authentication, so you don't have to store yet another password, just smash that octocat!

Did anyone actually click on those links above? If you did (you didn't, right?), you might have noticed another section in that document: Namespace Limitations. That says:

Namespace names in Galaxy are limited to lowercase word characters (i.e., a-z, 0-9) and ‘_’, must have a minimum length of 2 characters, and cannot start with an ‘_’. No other characters are allowed, including ‘.’, ‘-‘, and space. The first time you log into Galaxy, the server will create a Namespace for you, if one does not already exist, by converting your username to lowercase, and replacing any ‘-‘ characters with ‘_’.

For my login evgeni this is pretty boring, as the generated namespace is also evgeni. But for the GitHub user Evil-Pwnwil-666 it will become evil_pwnwil_666. This can be a bit confusing.
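
As a tiny illustration (my own sketch) of the conversion described in the quoted documentation:

  def galaxy_namespace(github_username):
      return github_username.lower().replace('-', '_')

  galaxy_namespace('evgeni')            # 'evgeni'
  galaxy_namespace('Evil-Pwnwil-666')   # 'evil_pwnwil_666'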

Another confusing thing is that Galaxy supports two types of content: roles and collections, but namespaces are only for collections! So it is Evil-Pwnwil-666.vim if it's a role, but evil_pwnwil_666.vim if it's a collection.

I think part of this split is because collections were added much later and have a much more well thought design of both the artifact itself and its delivery mechanisms.

This is by the way very important for us! Due to the fact that collections (and namespaces!) were added later, there must be code that ensures that users who were created before also get a namespace.

Galaxy does this (and I would have done it the same way) by hooking into the login process, and after the user is logged in it checks if a Namespace exists and if not it creates one and sets proper permissions.

And this is also exactly where the issue was!

The old code looked like this:

    # Create lowercase namespace if case insensitive search does not find match
    qs = models.Namespace.objects.filter(
        name__iexact=sanitized_username).order_by('name')
    if qs.exists():
        namespace = qs[0]
    else:
        namespace = models.Namespace.objects.create(**ns_defaults)

    namespace.owners.add(user)

See how namespace.owners.add is always called? Even if the namespace already existed? Yepp!

But how can we exploit that? Any user either already has a namespace (and owns it) or doesn't have one that could be owned. And given users are tied to GitHub accounts, there is no way to confuse Galaxy here. Now, remember how I said one could request additional namespaces, for organizations and stuff? Those will have owners, but the namespace name might not correspond to an existing user!

So all we need is to find an existing Galaxy namespace that is not a "default" namespace (aka a specially requested one) and get a GitHub account that (after the funny name conversion) matches the namespace name.

Thankfully Galaxy has an API, so I could dump all existing namespaces and their owners. Next I filtered that list to have only namespaces where the owner list doesn't contain a username that would (after conversion) match the namespace name. I found a few. And for one of them (let's call it the_target), the corresponding GitHub username (the-target) was available! Jackpot!

I've registered a new GitHub account with that name, logged in to Galaxy and had access to the previously found namespace.

This felt like sufficient proof that my attack worked and I mailed my findings to the Ansible Security team. The issue was fixed in d4f84d3400f887a26a9032687a06dd263029bde3 by moving the namespace.owners.add call to the "new namespace" branch.
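
For comparison, here is a rough sketch of what the fixed logic amounts to (based on the description of the fix above, not the actual patched source): ownership is only granted when the namespace was just created.

    # Create lowercase namespace if case insensitive search does not find match
    qs = models.Namespace.objects.filter(
        name__iexact=sanitized_username).order_by('name')
    if qs.exists():
        namespace = qs[0]
    else:
        namespace = models.Namespace.objects.create(**ns_defaults)
        namespace.owners.add(user)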

And this concludes the story of how I accidentally got access to someone else's Galaxy namespace (which was revoked after the report, no worries).

29 November, 2021 08:00AM by evgeni

Dima Kogan

GL_image_display

I just spent an unspeakable number of days typing to produce something that sounds very un-impressive: an FLTK widget that can display an image. The docs and code live here. The big difference from the usual image-drawing widget is that this one uses OpenGL internally, so after the initial image load, the common operations (drawing, redrawing, panning and zooming) are very fast. I have high-resolution images in my projects, and this will make my tools much nicer.

Three separate interfaces are available:

  • C: core library and GLUT application
  • C++: FLTK widget
  • Python: FLTK widget for pyfltk

The FLTK widgets have built-in interactive panning/zooming, and the library can draw line overlays. So nice applications can be built quickly.

I already added some early disabled-by-default support into the mrcal-stereo tool to visualize the rectification and report sensitivities:

widget.png

Nice!

29 November, 2021 07:53AM by Dima Kogan

Russ Allbery

Fall haul

It's been a while since I've posted one of these, and I also may have had a few moments of deciding to support authors by buying their books even if I'm not going to get a chance to read them soon. There's also a bit of work reading in here.

Ryka Aoki — Light from Uncommon Stars (sff)
Frederick R. Chromey — To Measure the Sky (non-fiction)
Neil Gaiman, et al. — Sandman: Overture (graphic novel)
Alix E. Harrow — A Spindle Splintered (sff)
Jordan Ifueko — Raybearer (sff)
Jordan Ifueko — Redemptor (sff)
T. Kingfisher — Paladin's Hope (sff)
TJ Klune — Under the Whispering Door (sff)
Kiese Laymon — How to Slowly Kill Yourself and Others in America (non-fiction)
Yuna Lee — Fox You (romance)
Tim Mak — Misfire (non-fiction)
Naomi Novik — The Last Graduate (sff)
Shelley Parker-Chan — She Who Became the Sun (sff)
Gareth L. Powell — Embers of War (sff)
Justin Richer & Antonio Sanso — OAuth 2 in Action (non-fiction)
Dean Spade — Mutual Aid (non-fiction)
Lana Swartz — New Money (non-fiction)
Adam Tooze — Shutdown (non-fiction)
Bill Watterson — The Essential Calvin and Hobbes (strip collection)
Bill Willingham, et al. — Fables: Storybook Love (graphic novel)
David Wong — Real-World Cryptography (non-fiction)
Neon Yang — The Black Tides of Heaven (sff)
Neon Yang — The Red Threads of Fortune (sff)
Neon Yang — The Descent of Monsters (sff)
Neon Yang — The Ascent to Godhood (sff)
Xiran Jay Zhao — Iron Widow (sff)

29 November, 2021 03:45AM

November 28, 2021

hackergotchi for Wouter Verhelst

Wouter Verhelst

GR procedures and timelines

A vote has been proposed in Debian to change the formal procedure in Debian by which General Resolutions (our name for "votes") are proposed. The original proposal is based on a text by Russ Allbery, which changes a number of rules to be less ambiguous and, frankly, less weird.

One thing Russ' proposal does, however, which I am absolutely not in agreement with, is to add an absolutely hard time limit after three weeks. That is, in the proposed procedure, the discussion time will be two weeks initially (unless the Debian Project Leader chooses to reduce it, which they can do by up to one week), and it will be extended if more options are added to the ballot; but after three weeks, no matter where the discussion stands, the discussion period ends and Russ' proposed procedure forces us to go to a vote, unless all proposers of ballot options agree to withdraw their option.

I believe this is a big mistake. I think any procedure we come up with should allow for the possibility that we may end up with a situation where everyone agrees that extending the discussion time a short time is a good idea, without necessarily resetting the whole discussion time to another two weeks (modulo a decision by the DPL).

At the same time, any procedure we come up with should try to avoid the possibility of process abuse by people who would rather delay a vote ad infinitum than to see it voted upon. A hard time limit certainly does that; but I believe it causes more problems than it solves.

I think instead that it is necessary for any procedure to allow for the discussion time to be extended as long as a strong enough consensus exists that this would be beneficial.

As such, I have proposed an amendment to Russ' proposal (a full version of my proposed constitution can be seen on salsa) that hopefully solves these issues in a novel way: it allows anyone to request an extension to the discussion time, which then needs to be sponsored according to the same rules as a new ballot option. If the time extension is successfully created, those who supported the extension can then also no longer propose any new ones. Additionally, after 4 weeks, the proposed procedure allows anyone to object, so that 4 weeks is probably the practical limit -- although going beyond that remains possible if enough support exists to extend the discussion time (or not enough to end it). The full rules involve slightly more than that (I don't like to put too much formal language in a blog post), but they're not too complicated, I think.

That proposal has received a number of seconds, but after a week it hasn't yet reached the constitutional requirement for the option to be on the ballot.

So, I guess this is a public request for more support to my proposal. If you're a Debian Developer and you agree with me that my proposed procedure is better than the alternative, please step forward and let yourself be heard.

Thanks!

28 November, 2021 07:04PM

hackergotchi for Joachim Breitner

Joachim Breitner

Zero-downtime upgrades of Internet Computer canisters

TL;DR: Zero-downtime upgrades are possible if you stick to the basic actor model.

Background

DFINITY’s Internet Computer provides a kind of serverless compute platform, where the services are WebAssembly programs called “canisters”. These services run without stopping (or at least that’s what it feels like from the service’s perspective; this is called “orthogonal persistence”), and process one message after another. Messages not only come from the outside (“ingress” calls), but are also exchanged between canisters.

On top of these uni-directional messages, the system provides the concept of “inter-canister calls”, which associates a response message with the outgoing message, and guarantees that a response will come. This RPC-like interface allows canister developers to program in the popular async/await model, where these inter-canister calls look almost like normal function calls, and the subsequent code is suspended until the response comes back.

The problem

This is all very well, until you try to upgrade your canister, i.e. install new code to fix a bug or add a feature. Because if you used the await pattern, there may still be suspended computations waiting for the response. If you swap out the program now, the code of that suspended computation will no longer be present, and the response cannot be handled! Worse, because of an infelicity with the current system’s API, when the response comes back, it may actually corrupt your service’s state.

That is why upgrading a canister requires stopping it first, which means waiting for all outstanding calls to come back. During this time, your canister is not available for new calls (so there is downtime), and worse, the length of the downtime is at the whims of the canisters you called – they could withhold the response ad infinitum, rendering your canister unupgradeable.

Clearly, this is not acceptable for any serious application. In this post, I’ll explore some of the ways to mitigate this problem, and how to create canisters that are safely and instantaneously (no downtime) upgradeable.

It’s a spectrum

Some canisters are trivially upgradeable, for others all hope is lost; it depends on what the canister does and how. As an overview, here is the spectrum:

  1. A canister that never performs inter-canister calls can always be upgraded without stopping.
  2. A canister that only does one-way calls, and does them in a particular way (see below), can always be upgraded without stopping.
  3. A canister that performs calls, and where it is acceptable to simply drop outstanding responses, can always be upgraded without stopping, once the System API has been improved and your Canister Development Kit (CDK; Motoko or Rust) has adapted.
  4. A canister that performs calls, but uses explicit continuations to handle responses instead of the await convenience, based on an eventually fixed System API, can be upgraded without stopping, and will even handle responses afterwards.
  5. A canister that uses await to do inter-canister calls cannot be upgraded without stopping.

In this post I will explain 2, which is possible now, in more detail. Variants 3 and 4 only become reality if and when the System API has improved.

One-way calls

A one-way call is a call where you don’t care about the response; neither the replied data, nor possible failure conditions.

Since you don’t care about the response, you can pass an invalid continuation to the system (technical detail: a Wasm table index of -1). Because it is invalid for any (realistic) Wasm module, it will stay invalid even after an upgrade, and the problem of silent corruption mentioned above is avoided. And otherwise it’s fine for this to be invalid: it means the canister “traps” once the response comes back, which is harmless (and possibly even cheaper than a do-nothing computation).

This requires your CDK to support this kind of call. Somewhat incidentally, Motoko (and Candid) actually have the concept of a one-way call in their type system, namely shared functions with return type () instead of async ... (Motoko is actually older than the system, and not every prediction about what the system will provide has proven successful). So, pending this PR to be released, Motoko will implement one-way calls in this way. On Rust, you have to use the System API directly or wait for cdk-rs to provide this ability (patches welcome, happy to advise).

You might wonder: How are calls useful if I don’t get to look at the response? Of course, this is a set-back – calls with responses are useful, and await is convenient. And if you have to integrate with an existing service that only provides normal calls, you are out of luck.

But if you get to design the canister and all called canisters together, it may be possible to use only one-way messages. You’d be programming in the plain actor model now, with all its advantages (simple concurrency, easy to upgrade, general robustness).

Consider for example a token ledger canister, not unlike the ICP ledger canister. For the most part, it doesn’t have to do any outgoing calls (and can thus be trivially upgradeable). But say we need to add notify functionality, where the ledger canister tells other canisters about a transaction. This is a good example for a one-way call: Maybe the ledger canister doesn’t care if that notification was received? The ICP ledger does care (once it comes back successful, this particular notification cannot be sent again), but maybe your ledger can do it differently: let the other canister confirm the receipt via another one-way call, instead of via the reply; or simply charge for each notification and do not worry about repeated notifications.

Maybe you want to add archiving functionality, where the ledger canister streams its data to an archive canister. There, again, instead of using successful responses to confirm receipt, the archive canister can ping the ledger canister with the latest received index directly.

Yes, it changes the programming model a bit, and all involved parties have to play together, but the gain (zero-downtime upgrades) is quite valuable, and removes a fair number of other sources of issues.

And in the future?

The above is possible with today’s Internet Computer. If the System API improves the way I hope it will, you have a possible middle ground: You still don’t get to use await and instead have to write your response handler as separate functions, but this way you can call any canister again, and you get the system’s assistance in mapping responses to calls. With this in place, any canister can be rewritten to a form that supports zero-downtime upgrades, without affecting its interface or what the canister can do.

28 November, 2021 05:11PM by Joachim Breitner ([email protected])

Russ Allbery

Review: Soul Music

Review: Soul Music, by Terry Pratchett

Series: Discworld #16
Publisher: Harper
Copyright: January 1995
Printing: November 2013
ISBN: 0-06-223741-1
Format: Mass market
Pages: 420

Soul Music is the sixteenth Discworld novel and something of a plot sequel to Reaper Man (although more of a sequel to the earlier Mort). I would not start reading the Discworld books here.

Susan is a student in the Quirm College for Young Ladies with an uncanny habit of turning invisible. Well, not invisible exactly; rather, people tend to forget that she's there, even when they're in the middle of talking to her. It's disconcerting for the teachers, but convenient when one is uninterested in Literature and would rather read a book.

She listened with half an ear to what the rest of the class was doing.

It was a poem about daffodils.

Apparently the poet had liked them very much.

Susan was quite stoic about this. It was a free country. People could like daffodils if they wanted to. They just should not, in Susan's very definite opinion, be allowed to take up more than a page to say so.

She got on with her education. In her opinion, school kept on trying to interfere with it.

Around her, the poet's vision was being taken apart with inexpert tools.

Susan's determinedly practical education is interrupted by the Death of Rats, with the help of a talking raven and Binky the horse, and without a lot of help from Susan, who is decidedly uninterested in being the sort of girl who goes on adventures. Adventures have a different opinion, since Susan's grandfather is Death. And Death has wandered off again.

Meanwhile, the bard Imp y Celyn, after an enormous row with his father, has gone to Ankh-Morpork. This is not going well; among other things, the Guild of Musicians and their monopoly and membership dues came as a surprise. But he does meet a dwarf and a troll in the waiting room of the Guild, and then buys an unusual music instrument in the sort of mysterious shop that everyone knows has been in that location forever, but which no one has seen before.

I'm not sure there is such a thing as a bad Discworld novel, but there is such a thing as an average Discworld novel. At least for me, Soul Music is one of those. There are some humorous bits, a few good jokes, one great character, and some nice bits of philosophy, but I found the plot forgettable and occasionally annoying. Susan is great. Imp is... not, which is made worse by the fact the reader is eventually expected to believe Susan cares enough about Imp to drive the plot.

Discworld has always been a mix of parody and Pratchett's own original creation, and I have always liked the original creation substantially more than the parody. Soul Music is a parody of rock music, complete with Cut-Me-Own-Throat Dibbler as an unethical music promoter. The troll Imp meets makes music by beating rocks together, so they decide to call their genre "music with rocks in it." The magical instrument Imp buys has twelve strings and a solid body. Imp y Celyn means "bud of the holly." You know, like Buddy Holly. Get it?

Pratchett's reference density is often on the edge of overwhelming the book, but for some reason the parody references in this one felt unusually forced and obvious to me. I did laugh occasionally, but by the end of the story the rock music plot had worn out its welcome. This is not helped by the ending being a mostly incoherent muddle of another parody (admittedly featuring an excellent motorcycle scene). Unlike Moving Pictures, which is a similar parody of Hollywood, Pratchett didn't seem to have much insightful to say about music. Maybe this will be more your thing if you like constant Blues Brothers references.

Susan, on the other hand, is wonderful, and for me is the reason to read this book. She is a delightfully atypical protagonist, and her interactions with the teachers and other students at the girls' school are thoroughly enjoyable. I would have happily read a whole book about her, and more broadly about Death and his family and new-found curiosity about the world. The Death of Rats was also fun, although more so in combination with the raven to translate. I wish this part of her story had a more coherent ending, but I'm looking forward to seeing her in future books.

Despite my complaints, the parody part of this book wasn't bad. It just wasn't as good as the rest of the book. I wanted a better platform for Susan's introduction than a lot of music and band references. If you really like Pratchett's parodies, your mileage may vary. For me, this book was fun but forgettable.

Followed, in publication order, by Interesting Times. The next Death book is Hogfather.

Rating: 7 out of 10

28 November, 2021 05:35AM

November 27, 2021

Review: A Psalm for the Wild-Built

Review: A Psalm for the Wild-Built, by Becky Chambers

Series: Monk & Robot #1
Publisher: Tordotcom
Copyright: July 2021
ISBN: 1-250-23622-3
Format: Kindle
Pages: 160

At the start of the story, Sibling Dex is a monk in a monastery in Panga's only City. They have spent their entire life there, love the buildings, know the hidden corners of the parks, and find the architecture beautiful. They're also heartily sick of it and desperate for the sound of crickets.

Sometimes, a person reaches a point in their life when it becomes absolutely essential to get the fuck out of the city.

Sibling Dex therefore decides to upend their life and travel the outlying villages doing tea service. And they do. They commission an ox-bike wagon, throw themselves into learning cultivation and herbs, experiment with different teas, and practice. It's a lot to learn, and they don't get it right from the start, but Sibling Dex is the sort of person who puts in the work to do something well. Before long, they have a new life as a traveling tea monk.

It's better than living in the City. But it still isn't enough.

We don't find out much about the moon of Panga in this story. Humans live there and it has a human-friendly biosphere with recognizable species, but it is clearly not Earth. The story does not reveal how humans came to live there. Dex's civilization is quite advanced and appears to be at least partly post-scarcity: people work and have professions, but money is rarely mentioned, poverty doesn't appear to be a problem, and Dex, despite being a monk with no obvious source of income, is able to commission the construction of a wagon home without any difficulty. They follow a religion that has no obvious Earth analogue.

The most fascinating thing about Panga is an event in its history. It previously had an economy based on robot factories, but the robots became sentient. Since this is a Becky Chambers story, the humans' reaction was to ask the robots what they wanted to do and respect their decision. The robots, not very happy about having their whole existence limited to human design, decided to leave, walking off into the wild. Humans respected their agreement, rebuilt their infrastructure without using robots or artificial intelligence, and left the robots alone. Nothing has been heard from them in centuries.

As you might expect, Sibling Dex meets a robot. Its name is Mosscap, and it was selected to check in with humans. Their attempts to understand each other is much of the story. The rest is Dex's attempt to find what still seems to be missing from life, starting with an attempt to reach a ruined monastery out in the wild.

As with Chambers's other books, A Psalm for the Wild-Built contains a lot of earnest and well-meaning people having thoughtful conversations. Unlike her other books, there is almost no plot apart from those conversations of self-discovery and a profile of Sibling Dex as a character. That plus the earnestness of two naturally introspective characters who want to put their thoughts into words gave this story an oddly didactic tone for me. There are moments that felt like the moral of a Saturday morning cartoon show (I am probably dating myself), although the morals are more sophisticated and conditional. Saying I disliked the tone would be going too far, but it didn't flow as well for me as Chambers's other novels.

I liked the handling of religion, and I loved Sibling Dex's efforts to describe or act on an almost impossible to describe sense that their life isn't quite what they want. There are some lovely bits of description, including the abandoned monastery. The role of a tea monk in this imagined society is a neat, if small, bit of world-building: a bit like a counselor and a bit like a priest, but not truly like either because of the different focus on acceptance, listening, and a hot cup of tea. And Dex's interaction with Mosscap over offering and accepting food is a beautiful bit of characterization.

That said, the story as a whole didn't entirely gel for me, partly because of the didactic tone and partly because I didn't find Mosscap or the described culture of the robots as interesting as I was hoping that I would. But I'm still invested enough that I would read the sequel.

A Psalm for the Wild-Built feels like a prelude or character introduction more than a complete story. When we leave the characters, they're just getting started. You know more about the robots (and Sibling Dex) at the end than you did at the beginning, but don't expect much in the way of resolution.

Followed by A Prayer for the Crown-Shy, scheduled for 2022.

Rating: 7 out of 10

27 November, 2021 05:27AM

November 26, 2021

Reproducible Builds (diffoscope)

diffoscope 194 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 194. This version includes the following changes:

[ Chris Lamb ]
* Don't traceback when comparing nested directories with non-directories.
  (Closes: reproducible-builds/diffoscope#288)

You can find out more by visiting the project homepage.

26 November, 2021 12:00AM

November 25, 2021

hackergotchi for Mike Gabriel

Mike Gabriel

Touching Firefox on Linux

More as a reminder to myself, but possibly also helpful to other people who want to use Firefox on a tablet running Debian...

Without the below adjustment, finger gestures in Firefox running on a tablet result in image moving, text highlighting, etc. (operations related to copy+paste). Not the intuitively expected behaviour...

If you use e.g. GNOME on Wayland for your tablet and want to enable touch functionalities in Firefox, then switch the whole browser to native Wayland rendering. This line in ~/.profile seems to help:

export MOZ_ENABLE_WAYLAND=1

If you use a desktop environment running on top of X.Org, then make sure you have added the following line to ~/.profile:

export MOZ_USE_XINPUT2=1

Logout/login again and Firefox should be scrollable with 2-finger movements up and down, zooming in and out also works then.

light+love
Mike (aka sunweaver at debian.org)

25 November, 2021 10:01AM by sunweaver

November 23, 2021

Enrico Zini

Really lossy compression of JPEG

Suppose you have a tool that archives images, or scientific data, and it has a test suite. It would be good to collect sample files for the test suite, but they are often so big one can't really bloat the repository with them.

But does the test suite need everything that is in those files? Not necessarily. For example, if one's testing code that reads EXIF metadata, one doesn't care about what is in the image.

So one can blank out the parts the tests don't care about (like the actual data payload) while keeping the file structure and metadata intact. That technique works extremely well: I can take GRIB files that are several megabytes in size, zero out their data payload, and get nice 1Kb samples for the test suite.

I've started to collect and organise the little hacks I use for this into a tool I called mktestsample:

$ mktestsample -v samples1/*
2021-11-23 20:16:32 INFO common samples1/cosmo_2d+0.grib: size went from 335168b to 120b
2021-11-23 20:16:32 INFO common samples1/grib2_ifs.arkimet: size went from 4993448b to 39393b
2021-11-23 20:16:32 INFO common samples1/polenta.jpg: size went from 3191475b to 94517b
2021-11-23 20:16:32 INFO common samples1/test-ifs.grib: size went from 1986469b to 4860b

Those are massive savings, but I'm not satisfied about those almost 94Kb of JPEG:

$ ls -la samples1/polenta.jpg
-rw-r--r-- 1 enrico enrico 94517 Nov 23 20:16 samples1/polenta.jpg
$ gzip samples1/polenta.jpg
$ ls -la samples1/polenta.jpg.gz
-rw-r--r-- 1 enrico enrico 745 Nov 23 20:16 samples1/polenta.jpg.gz

I believe I did all I could: completely blank out image data, set quality to zero, maximize subsampling, and tweak quantization to throw everything away.
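
For the JPEG case, the idea boils down to something like this minimal Pillow sketch (my own illustration, not the actual mktestsample code): re-save a blank image of the same size at the lowest quality and maximum subsampling, carrying over the original EXIF block if present.

  from PIL import Image

  def blank_jpeg(path):
      img = Image.open(path)
      exif = img.info.get("exif")            # original EXIF block, if any
      blank = Image.new("RGB", img.size)     # all-black pixel data
      kwargs = {"quality": 1, "subsampling": 2, "optimize": True}  # 4:2:0, lowest quality
      if exif:
          kwargs["exif"] = exif              # keep the metadata we care about
      blank.save(path, "JPEG", **kwargs)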

Still, the result is a 94Kb file that can be gzipped down to 745 bytes. Is there something I'm missing?

I suppose JPEG is better at storing an image than at storing the lack of an image. I cannot really complain :)

I can still commit compressed samples of large images to a git repository, taking very little data indeed. That's really nice!

23 November, 2021 06:58PM

November 22, 2021

Jonathan Wiltshire

Mischief managed

I’m finally paying up a certain amount of household technical debt, including investigating some exciting mystery cabling and insulating the space it inhabits. This has meant pulling down large chunks of ceiling (eventually, most or all of it for the insulation) on a cable hunt.

Turns out the best tool for this part of the job is a decent length of 4 by 2, some borrowed muscle, and a certain amount of bravery. Once a couple of holes have been cut the old-fashioned way to be sure there’s nothing crucial above the ceiling (like the other side of the felt roof), the 4×2 really comes into its own:

To use the 4×2, aim for a gap between two joists and imagine you’re holding a caber. Launch it. The ceiling will come off far worse than the lump of wood you’re holding.

We found the mystery cable, but didn’t really solve the mystery it creates and in the process uncovered another bizarre installation. The local lighting circuit is mostly a spur system in that white junction box by the RSJ. The overhead supplies dive under the RSJ through the junction box to the light switch including a full 3-core feed, not the usual loop-in system used in the rest of the house (I am not sure how prevalent loop-in systems are in other countries, they’re sometimes called three-plate systems – but they’re very common in the UK).

It does at least explain why I could never reverse-engineer the setup from the ceiling roses alone, which had only half the cores in the fitting than expected throughout the room (it wasn’t even that the first fitting was looped in and being a supply for the others).

On the other hand, normalising everything to a loop-in system and removing that awful rats nest of TPE should be straightforward. Neutral isn’t required in that switch so that’s one less problem.

I couldn’t resist labelling the switch in its relocated position:

Unfortunately, as valuable as that exercise was, I still have to get to the bottom of the original mystery cable which is at varying points 6mm², 2.5mm² and 1.5mm² with apparently no current limiter or switch separation. Time for a bit more 4 by 2…

22 November, 2021 10:50PM by Jonathan

hackergotchi for Ricardo Mones

Ricardo Mones

Claws Mail 4 in experimental

A full month has passed since Claws Mail 4.0.0 was uploaded to Debian experimental, and, somewhat surprisingly, I've received no bug report about it.

This of course can be either because nobody has been brave enough to install it or because well, it works really nice.

For those who don't know what I'm talking about, just note that this version is the first Debian upload for the GTK+3 version of Claws Mail. There was an initial upstream release, namely 3.99, but it was less polished and also I was very busy, so I decided not to upload it. Since then I've been using git's 'gtk3' branch daily without problems, so, for me, it's as stable as its GTK+2 counterpart. There are still some rough edges, of course.

Note also that, if everything goes well, Claws Mail 4.x will be the version to be shipped with Debian 12 (bookworm).

22 November, 2021 09:49AM by mones

hackergotchi for Paul Tagliamonte

Paul Tagliamonte

Be careful when using vxlan!

I’ve spent a bit of time playing with vxlan - which is very neat, but also incredibly insecure by default.

When using vxlan, be very careful to understand how the host is connected to the internet. The kernel will listen on all interfaces for packets, which means a host accessible to the VMs it's hosting (e.g., via a bridged interface or a private LAN) will accept packets from those VMs and inject them into arbitrary VLANs, even ones the sender is not on.

I reported this, with more technical details, to the kernel mailing list, but got no reply.

The tl;dr is:

  $ ip link add vevx0a type veth peer name vevx0z
  $ ip addr add 169.254.0.2/31 dev vevx0a
  $ ip addr add 169.254.0.3/31 dev vevx0z
  $ ip link add vxlan0 type vxlan id 42 \
    local 169.254.0.2 dev vevx0a dstport 4789
  $ # Note the above 'dev' and 'local' ip are set here
  $ ip addr add 10.10.10.1/24 dev vxlan0

results in vxlan0 listening on all interfaces, not just vevx0z or vevx0a. To prove it to myself, I spun up a docker container (using a completely different network bridge – with no connection to any of the interfaces above), and ran a Go program to send VXLAN UDP packets to my bridge host:

$ docker run -it --rm -v $(pwd):/mnt debian:unstable /mnt/spam 172.17.0.1:4789
$

which results in packets getting injected into my vxlan interface

$ sudo tcpdump -e -i vxlan0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vxlan0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
21:30:15.746754 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746773 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746787 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746801 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746815 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746827 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746870 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746885 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746899 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
21:30:15.746913 de:ad:be:ef:00:01 (oui Unknown) > Broadcast, ethertype IPv4 (0x0800), length 64: truncated-ip - 27706 bytes missing! 33.0.0.0 > localhost: ip-proto-114
10 packets captured
10 packets received by filter
0 packets dropped by kernel

(the program in question is the following:)

  package main

  import (
      "net"
      "os"

      "github.com/mdlayher/ethernet"
      "github.com/mdlayher/vxlan"
  )

  func main() {
      conn, err := net.Dial("udp", os.Args[1])
      if err != nil { panic(err) }
      for i := 0; i < 10; i++ {
          vxf := &vxlan.Frame{
              VNI: vxlan.VNI(42),
              Ethernet: &ethernet.Frame{
                  Source:      net.HardwareAddr{0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x01},
                  Destination: net.HardwareAddr{0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF},
                  EtherType:   ethernet.EtherTypeIPv4,
                  Payload:     []byte("Hello, World!"),
              },
          }
          frb, err := vxf.MarshalBinary()
          if err != nil { panic(err) }
          _, err = conn.Write(frb)
          if err != nil { panic(err) }
      }
  }

When using vxlan, be absolutely sure all hosts that can address any interface on the host are authorized to send arbitrary packets into any VLAN that box can send to, or that very careful and specific controls and firewalling are in place. Note this includes public interfaces (e.g., dual-homed private network / internet boxes), or any type of dual-homing (VPNs, etc).

22 November, 2021 02:39AM

November 21, 2021

Julian Andres Klode

APT Z3 Solver Basics

Z3 is a theorem prover developed at Microsoft Research and available as a dynamically linked C++ library in Debian-based distributions. While the library is a whopping 16 MB and the solver is a tad slow, its permissive licensing and the number of tactics offered give it a huge potential for use in solving dependencies in a wide variety of applications.

Z3 does not need normalized formulas, but offers higher-level abstractions like atmost, atleast, and implies, which we will make use of, together with boolean variables, to translate the dependency problem into a form Z3 understands.

In this post, we’ll see how we can apply Z3 to the dependency resolution in APT. We’ll only discuss the basics here, a future post will explore optimization criteria and recommends.

Translating the universe

APT’s package universe consists of 3 relevant things: packages (the tuple of name and architecture), versions (basically a .deb), and dependencies between versions.

While we could translate our entire universe to Z3 problems, we instead will construct a root set from packages that were manually installed and versions marked for installation, and then build the transitive root set from it by translating all versions reachable from the root set.

For each package P in the transitive root set, we create a boolean literal P. We then translate each version P1, P2, and so on. Translating a version means building a boolean literal for it, e.g. P1, and then translating the dependencies as shown below.

We now need to create two more clauses to satisfy the basic requirements for debs:

  1. If a version is installed, the package is installed; and vice versa. We can encode this requirement for P above as P == atleast({P1,P2}, 1).
  2. There can only be one version installed. We add an additional constraint of the form atmost({P1,P2}, 1).

We also encode the requirements of the operation (a small sketch of the whole encoding follows the list below).

  1. For each package P that is manually installed, add a constraint P.
  2. For each version V that is marked for install, add a constraint V.
  3. For each package P that is marked for removal, add a constraint !P.
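
As an illustration, here is a minimal sketch of this encoding using the Z3 Python bindings (the z3-solver package); APT itself uses the C++ library, and P, P1, P2 are just placeholder names for a package with two versions:

  from z3 import Bool, Solver, Or, AtMost

  P, P1, P2 = Bool('P'), Bool('P1'), Bool('P2')

  s = Solver()
  s.add(P == Or(P1, P2))    # 1. package installed iff at least one version is
  s.add(AtMost(P1, P2, 1))  # 2. at most one version may be installed
  s.add(P)                  # operation requirement: P is manually installed
                            # (a removal would add the negation of P instead)

  print(s.check())          # sat
  print(s.model())          # e.g. P true, one of P1/P2 true, the other false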

Dependencies

Packages in APT have dependencies of two basic forms: Depends and Conflicts, as well as variations like Breaks (identical to Conflicts in solving terms), and Recommends (soft Depends) - we’ll ignore those for now. We’ll discuss Conflicts in the next section.

Let’s take a basic dependency list: A Depends: X|Y, Z. To represent that dependency, we expand each name to a list of versions that can satisfy the dependency, for example X1|X2|Y1, Z1.

Translating this dependency list to our Z3 solver, we create boolean variables X1,X2,Y1,Z1 and define two rules:

  1. A implies atleast({X1,X2,Y1}, 1)
  2. A implies atleast({Z1}, 1)

If there actually was nothing that satisfied the Z requirement, we’d have added a rule not A. It would be possible to simply not tell Z3 about the version at all as an optimization, but that adds more complexity, and the not A constraint should not cause too many problems.

Conflicts

Conflicts cannot have or in them. A dependency B Conflicts: X, Y means that only one of B, X, and Y can be installed. We can directly encode this in Z3 by using the constraint atmost({B,X,Y}, 1). This is an optimized encoding of the constraint: we could have encoded each conflict in the form !B or !X, !B or !Y, and so on. Usually this leads to worse performance as it introduces additional clauses.

Complete example

Let’s assume we start with an empty install and want to install the package a below.

Package: a
Version: 1
Depends: c | b

Package: b
Version: 1

Package: b
Version: 2
Conflicts: x

Package: d
Version: 1

Package: x
Version: 1

The translation in Z3 rules looks like this:

  1. Package rules for a:
    1. a == atleast({a1}, 1) - package is installed iff one version is
    2. atmost({a1}, 1) - only one version may be installed
    3. a – a must be installed
  2. Dependency rules for a
    1. implies(a1, atleast({b2, b1}, 1)) – the translated dependency above. note that c is gone, it’s not reachable.
  3. Package rules for b:
    1. b == atleast({b1,b2}, 1) - package is installed iff one version is
    2. atmost({b1, b2}, 1) - only one version may be installed
  4. Dependencies for b (= 2):
    1. atmost({b2, x1}, 1) - the conflicts between x and b = 2 above
  5. Package rules for x:
    1. x == atleast({x1}, 1) - package is installed iff one version is
    2. atmost({x1}, 1) - only one version may be installed

The package d is not translated, as it is not reachable from the root set {a1}; the transitive root set is {a1,b1,b2,x1}. A sketch of the whole translation follows below.
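
Putting the worked example into the Z3 Python bindings gives something like the following sketch (again, APT uses the C++ API; this is only meant to make the rule list above concrete):

  from z3 import Bool, Solver, Or, AtMost, Implies

  a, a1 = Bool('a'), Bool('a1')
  b, b1, b2 = Bool('b'), Bool('b1'), Bool('b2')
  x, x1 = Bool('x'), Bool('x1')

  s = Solver()
  # package rules for a, plus the "a must be installed" request
  s.add(a == a1)                  # atleast({a1}, 1) is just a1 for a single version
  s.add(AtMost(a1, 1), a)
  # dependency of a1 on b (c is not reachable, hence not translated)
  s.add(Implies(a1, Or(b2, b1)))
  # package rules for b, and the conflict between b (= 2) and x
  s.add(b == Or(b1, b2), AtMost(b1, b2, 1))
  s.add(AtMost(b2, x1, 1))
  # package rules for x
  s.add(x == x1, AtMost(x1, 1))

  print(s.check())                # sat
  print(s.model())                # a solution installs a1 and one of b1/b2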

Next iteration: Optimization

We have now constructed the basic set of rules that allows us to solve our dependency problems (equivalent to SAT); however, it might lead to suboptimal solutions that remove automatically installed packages or install more packages than necessary, to name a few examples.

In our next iteration, we have to look at introducing optimization: for example, minimizing the number of removals or the number of changed packages, or satisfying as many Recommends as possible. We will also look at the upgrade problem (upgrade as many packages as possible) and the autoremove problem (remove as many automatically installed packages as possible).

21 November, 2021 07:49PM

Antoine Beaupré

The last syncmaildir crash

My syncmaildir (SMD) setup failed me one too many times (previously, previously). In an attempt to migrate to an alternative mail synchronization tool, I looked into using my IMAP server again, and found out my mail spool was in a pretty bad shape. I'm comparing mbsync and offlineimap in the next post but this post talks about how I recovered the mail spool so that tools like those could correctly synchronise the mail spool again.

The latest crash

On Monday, SMD just started failing with this error:

nov 15 16:12:19 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:12:22 angela systemd[2305]: smd-pull.service: Succeeded.
nov 15 16:12:22 angela systemd[2305]: Finished pull emails with syncmaildir.
nov 15 16:14:08 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:14:11 angela systemd[2305]: Failed to start pull emails with syncmaildir.
nov 15 16:16:14 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Unable to get any data from the other endpoint.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: This problem may be transient, please retry.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Hint: did you correctly setup the SERVERNAME variable
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: on your client? Did you add an entry for it in your ssh
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: configuration file?
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error
nov 15 16:16:17 angela smd-pull[27188]: register: smd-client@localhost: TAGS: error::context(handshake) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:16:17 angela systemd[2305]: Failed to start pull emails with syncmaildir.

What is frustrating is that there's actually no network error here. Running the command by hand I did see a different message, but now I have lost it in my backlog. It had something to do with a filename being too long, and I gave up debugging after a while. This happened suddenly too, which added to the confusion.

In a fit of rage I started this blog post and experimenting with alternatives, which led me down a lot of rabbit holes.

Reviewing my previous mail crash documentation, it seems most solutions involve talking to an IMAP server, so I figured I would just do that. Wanting to try something new, I gave isync (AKA mbsync) a try. Oh dear, I did not expect how much trouble just talking to my IMAP server would be, which wasn't isync's fault, for what that's worth. It was the primary tool I used to debug things, and served me well in that regard.

Mailbox corruption

The first thing I found out is that certain messages in the IMAP spool were corrupted. mbsync would stop on a FETCH command and Dovecot would give me those errors on the server side.

"wrong W value"

nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Maildir filename has wrong W value, renamed the file from /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S to /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495:2,S
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Deleting corrupted cache record uid=1582: UID 1582: Broken virtual size in mailbox junk: read(/home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S): FETCH BODY[] got too little data: 2540 vs 2578

At least this first error was automatically healed by Dovecot (by renaming the file without the W= flag). The problem is that the FETCH command fails and mbsync exits noisily. So you need to constantly restart mbsync with a silly command like:

while ! mbsync -a; do sleep 1; done

"cached message size larger than expected"

nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=mail stream)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: Deleting corrupted cache record uid=19288: UID 19288: Broken physical size in mailbox Sent: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=)
nov 16 13:53:08 marcos dovecot[3520770]: imap-login: Panic: epoll_ctl(del, 7) failed: Bad file descriptor

This second problem is much harder to fix, because dovecot does not recover automatically. This is Dovecot complaining that the cached size (the S= field, but also present in Dovecot's metadata files) doesn't match the file size.

I wonder if at least some of those messages were corrupted in the OfflineIMAP to syncmaildir migration because part of that procedure is to run the strip_header script to remove content from the emails. That could easily have broken things since the files do not also get renamed.

Workaround

So I read a lot of the Dovecot documentation on the maildir format, and wrote an extensive fix script for those two errors. The script worked and mbsync was able to sync the entire mail spool.
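
The script itself is not reproduced here; as a rough idea of what such a fix involves (this is a simplified, hypothetical sketch in Python, not the actual script, which handles more cases): correct the S= (file size) field in the maildir filename when it disagrees with the size on disk, and drop the stale W= (virtual size) field so Dovecot recomputes it.

  import re
  from pathlib import Path

  # hypothetical path; the real spool lives in ~/Maildir on the server
  maildir = Path.home() / "Maildir"

  for f in maildir.glob(".*/cur/*"):
      m = re.search(r",S=(\d+)", f.name)
      if not m:
          continue
      size = f.stat().st_size
      if int(m.group(1)) != size:
          # fix the stored size and drop the stale W= virtual size so that
          # Dovecot recalculates it on the next access
          newname = re.sub(r",W=\d+", "", f.name)
          newname = re.sub(r",S=\d+", ",S=%d" % size, newname)
          f.rename(f.with_name(newname))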

And no, rebuilding the index files didn't work. I also tried doveadm force-resync -u anarcat, which didn't do anything.

In the end I also had to do this, because the wrong cache values were also stored elsewhere.

service dovecot stop ; find -name 'dovecot*' -delete; service dovecot start

This would have totally broken any existing clients, but thankfully I'm starting from scratch (except maybe webmail, but I'm hoping it will self-heal as well, assuming it only has a cache and not a full replica of the mail spool).

Incoherence between Maildir and IMAP

Unfortunately, the first mbsync was incomplete as it was missing about 15,000 mails:

anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 
384836
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l 
369221

As it turns out, mbsync was not at fault here either: this was yet more mail spool corruption.

It's actually 26 folders (out of 205) with inconsistent sizes, which can be found with:

for folder in * .[^.]* ; do 
  printf "%s\t%d\n" $folder $(find "$folder" -type f -a \! -name '.*' | wc -l );
done

The special \! -name '.*' bit is there to ignore the mbsync metadata, which creates .uidvalidity and .mbsyncstate files in every folder. That is only about 200 files, but since they are spread across all folders they were making it impossible to see where the real problem was.

Here is what the diff looks like:

--- Maildir-list    2021-11-17 20:42:36.504246752 -0500
+++ Maildir-mbsync-list 2021-11-17 20:18:07.731806601 -0500
@@ -6,16 +6,15 @@
[...]
 .Archives  1
 .Archives.2010 3553
-.Archives.2011 3583
-.Archives.2012 12593
+.Archives.2011 3582
+.Archives.2012 620
 .Archives.2013 8576
 .Archives.2014 11057
-.Archives.2015 8173
+.Archives.2015 8165
 .Archives.2016 54
 .band  34
 .bitbuck   1
@@ -38,13 +37,12 @@
 .couchsurfers  2
-cur    11285
+cur    11280
 .current   130
 .cv    2
 .debbug    262
-.debian    37544
-drafts 1
-.Drafts    4
+.debian    37533
+.Drafts    2
 .drone 241
 .drupal    188
 .drupal-devel  303
[...]

Misfiled messages

It's a bit all over the place, but we can already notice some huge differences between mailboxes, for example in the Archives folders. As it turns out, at least 12,000 of those missing mails were actually misfiled: instead of being in the Maildir/.Archives.2012/cur/ folder, they were directly in Maildir/.Archives.2012/. This is something that doesn't matter for SMD (and for notmuch? actually it does matter: notmuch suddenly found 12,000 new mails), but it definitely matters to Dovecot and therefore to mbsync...
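
For the record, a hypothetical sketch in Python of what that move looks like (paths and folder names are assumptions; adjust to the actual folders affected):

  from pathlib import Path

  # hypothetical path: the affected archive folders in the local spool
  maildir = Path.home() / "Maildir"

  for folder in maildir.glob(".Archives.*"):
      if not folder.is_dir():
          continue
      cur = folder / "cur"
      cur.mkdir(exist_ok=True)
      for f in folder.iterdir():
          # only move plain files misfiled at the folder root; leave cur/,
          # new/, tmp/ and the dot-metadata files alone
          if f.is_file() and not f.name.startswith("."):
              f.rename(cur / f.name)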

After moving those files around, we still have 4,000 messages missing:

anarcat@angela:~(main)$ find Maildir-mbsync/  -type f -a \! -name '.*' | wc -l 
381196
anarcat@angela:~(main)$ find Maildir/  -type f -a \! -name '.*' | wc -l 
385053

The problem is that those 4,000 missing mails are harder to track. Take, for example, .Archives.2011, which has a single message missing, out of 3,582. And the files are not identical: the checksums don't match after going through the IMAP transport, so we can't use a tool like hashdeep to compare the trees and find why any single file is missing.

"register" folder

One big chunk of the 4,000, however, is a special folder called register in my spool, which I am syncing separately (see Securing registration email for details on that setup). That actually covers 3,700 of those messages, so I actually have a more modest 300 messages to figure out, after (easily!) configuring mbsync to sync that folder separately:

 @@ -30,9 +33,29 @@ Slave :anarcat-local:
  # Exclude everything under the internal [Gmail] folder, except the interesting folders
  #Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
  # Or include everything
 -Patterns *
 +#Patterns *
 +Patterns * !register  !.register
  # Automatically create missing mailboxes, both locally and on the server
  #Create Both
  Create slave
  # Sync the movement of messages between folders and deletions, add after making sure the sync works
  #Expunge Both
 +
 +IMAPAccount anarcat-register
 +Host imap.anarc.at
 +User register
 +PassCmd "pass imap.anarc.at-register"
 +SSLType IMAPS
 +CertificateFile /etc/ssl/certs/ca-certificates.crt
 +
 +IMAPStore anarcat-register-remote
 +Account anarcat-register
 +
 +MaildirStore anarcat-register-local
 +SubFolders Maildir++
 +Inbox ~/Maildir-mbsync/.register/
 +
 +Channel anarcat-register
 +Master :anarcat-register-remote:
 +Slave :anarcat-register-local:
 +Create slave

"tmp" folders and empty messages

After syncing the "register" messages, I end up with the measly little 160 emails out of sync:

anarcat@angela:~(main)$ find Maildir-mbsync/  -type f -a \! -name '.*' | wc -l 
384900
anarcat@angela:~(main)$ find Maildir/  -type f -a \! -name '.*' | wc -l 
385059

Argh. After more digging, I found 131 mails in the tmp/ directories of the client's mail spool. Mysterious! On the server side, there are even more files, and not the same ones. Possibly those were mails left over from a failed delivery of some sort, a power failure, or a crash? Who knows. It could be another race condition in SMD if it runs while mail is being delivered in tmp/...

The first thing to do with those is to cleanup a bunch of empty files (21 on angela):

find .[^.]*/tmp -type f -empty -delete

As it turns out, they are all duplicates, in the sense that notmuch can easily find a copy of a file with the same message ID in its database. In other words, this hairy command returns nothing:

find .[^.]*/tmp -type f | while read path; do
  msgid=$(grep -m 1  -i ^message-id "$path" | sed 's/Message-ID: //i;s/[<>]//g');
  if notmuch count --exclude=false "id:$msgid" | grep -qx 0; then  # exact match: the count must be 0, not merely contain a 0
    echo "$path <$msgid> not in notmuch" ;
  fi;
done

... which is good. Or, to put it another way, this is safe:

find .[^.]*/tmp -type f -delete

Poof! 314 mails cleaned on the server side. Interestingly, SMD doesn't pick up on those changes at all and still sees files in tmp/ directories on the client side, so we need to operate the same twisted logic there.

notmuch to the rescue again

After cleaning that on the client, we get:

anarcat@angela:~(main)$ find Maildir/  -type f -a \! -name '.*' | wc -l 
384928
anarcat@angela:~(main)$ find Maildir-mbsync/  -type f -a \! -name '.*' | wc -l 
384901

Ha! 27 mails difference. Those are the really sticky, unclear ones. I was hoping a full sync might clear that up, but after deleting the entire directory and starting from scratch, I end up with:

anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 
385034
anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l 
384993

That is: even more messages missing (now 37). Sigh.

Thankfully, this is something notmuch can help with: it can index all files by Message-ID (which I learned is case-insensitive, yay) and tell us which messages don't make it through.

Considering the corruption I found in the mail spool, I wouldn't be the least surprised those messages are just skipped by the IMAP server. Unfortunately, there's nothing on the Dovecot server logs that would explain the discrepancy.

Here again, notmuch comes to the rescue. We can list all message IDs to figure out that discrepancy:

notmuch search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-msgids
notmuch --config=.notmuch-config-mbsync search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-mbsync-msgids

And then we can see how many messages notmuch thinks are missing:

$ wc -l *msgids
372723 Maildir-mbsync-msgids
372752 Maildir-msgids

That's 29 messages. Oddly, it doesn't exactly match the find output:

anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l 
385204
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 
385241

That is 10 more messages. Ugh. But actually, I know what those are: more misfiled messages (in a .folder/draft/ directory, bizarrely), so the totals actually match.

In the notmuch output, there's a lot of stuff like this:

id:notmuch-sha1-fb880d673e24f5dae71b6b4d825d4a0d5d01cde4

Those are messages without a valid Message-ID. Notmuch (presumably) constructs one based on the file's checksum. Because the files differ between the IMAP server and the local mail spool (which is unfortunate, but possibly inevitable), those do not match. There are exactly the same number of those on both sides, so I'll go ahead and assume those are all accounted for.

What remains is:

anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids  | grep '^\-[^-]' | grep -v sha1 | wc -l 
2
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids  | grep '^\+[^+]' | grep -v sha1 | wc -l 
21
anarcat@angela:~(main)$ 

i.e. 21 missing from mbsync and, surprisingly, 2 missing from the original mail spool.

Further inspection also showed they were all messages with some sort of "corruption": no body and only headers. I am not sure that is a legal email format in the first place. Since they were mostly spam or administrative emails ("You have been unsubscribed from mailing list..."), it seems fairly harmless to ignore those.

Conclusion

As we'll see in the next article, SMD has stellar performance. But that comes at a huge cost: it accesses the mail storage directly. This can (and has) created significant problems on the mail server. It's unclear exactly why those things happen, but Dovecot expects a particular storage format in its files, and it seems unwise to bypass that.

In the future, I'll try to remember to avoid that, especially since mechanisms like SMD require special server access (SSH) which, in the long term, I am not sure I want to maintain or expect.

In other words, just talking to an IMAP server opens up a lot more hosting possibilities than setting up a custom synchronisation protocol over SSH. It's also safer and more reliable, as we have seen. Thankfully, I've been able to recover from all the errors I could find, but it could have gone differently, and it would have been possible for SMD to permanently corrupt a significant part of my mail archives.

In the end, however, the final straw was just another weird bug which, ironically, SMD mysteriously recovered from on its own while I was writing this documentation and migrating away from it.

In any case, I recommend SMD users start looking for alternatives. The project has been archived upstream, and the Debian package has been orphaned. I have seen significant mailbox corruption, including entire mail spool destruction, mostly due to incorrect locking code. I have filed a release-critical bug in Debian to make sure it doesn't ship with Debian bookworm.

Alternatives like mbsync provide fast and reliable transport, including over SSH. See the next article for further discussion of the alternatives.

21 November, 2021 04:04PM

mbsync vs OfflineIMAP

After recovering from my latest email crash (previously, previously), I had to figure out which tool I should be using. I had many options but I figured I would start with a popular one (mbsync).

But I also evaluated OfflineIMAP, which was resurrected from the Python 2 apocalypse, and which I had used before, for a long time.

Read on for the details.

Benchmark setup

All programs were tested against a Dovecot 1:2.3.13+dfsg1-2 server, running Debian bullseye.

The client is a Purism 13v4 laptop with a Samsung SSD 970 EVO 1TB NVMe drive.

The server is a custom build with a AMD Ryzen 5 2600 CPU, and a RAID-1 array made of two NVMe drives (Intel SSDPEKNW010T8 and WDC WDS100T2B0C).

The mail spool I am testing against has almost 400k messages and takes 13GB of disk space:

$ notmuch count --exclude=false
372758
$ du -sh --exclude xapian Maildir
13G Maildir

The baseline we are comparing against is SMD (syncmaildir) which performs the sync in about 7-8 seconds locally (3.5 seconds for each push/pull command) and about 10-12 seconds remotely.

Anything close to that or better is good enough. I do not have recent numbers for a SMD full sync baseline, but the setup documentation mentions 20 minutes for a full sync. That was a few years ago, and the spool has obviously grown since then, so that is not a reliable baseline.

A baseline for a full sync might be also set with rsync, which copies files at nearly 40MB/s, or 317Mb/s!

anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
 12,647,814,731 100%   37.85MB/s    0:05:18 (xfr#394981, to-chk=0/395815)    
72.38user 106.10system 5:19.59elapsed 55%CPU (0avgtext+0avgdata 15988maxresident)k
8816inputs+26305112outputs (0major+50953minor)pagefaults 0swaps

That is 5 minutes to transfer the entire spool. Incremental syncs are obviously pretty fast too:

anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/395815)    
1.42user 0.81system 0:03.31elapsed 67%CPU (0avgtext+0avgdata 14100maxresident)k
120inputs+0outputs (3major+12709minor)pagefaults 0swaps

As an extra curiosity, here's the performance with tar, which is pretty similar to rsync, minus incremental transfers, which I cannot be bothered to figure out right now:

anarcat@angela:tmp(main)$ time ssh shell.anarc.at tar --exclude xapian -cf - Maildir/ | pv -s 13G | tar xf - 
56.68user 58.86system 5:17.08elapsed 36%CPU (0avgtext+0avgdata 8764maxresident)k
0inputs+0outputs (0major+7266minor)pagefaults 0swaps
12,1GiO 0:05:17 [39,0MiB/s] [===================================================================> ] 92%

Interesting that rsync manages to almost beat a plain tar on file transfer; I'm actually surprised by how well it performs here, considering there are many little files to transfer.

(But then again, this maybe is exactly where rsync shines: while tar needs to glue all those little files together, rsync can just directly talk to the other side and tell it to do live changes. Something to look at in another article maybe?)

Since both ends are NVMe drives, those should easily saturate a gigabit link. And in fact, a backup of the server mail spool achieves a much faster transfer rate on disk:

anarcat@marcos:~$ tar fc - Maildir | pv -s 13G > Maildir.tar
15,0GiO 0:01:57 [ 131MiB/s] [===================================] 115%

That's 131MiB per second, vastly faster than the gigabit link. The client has similar performance:

anarcat@angela:~(main)$ tar fc - Maildir | pv -s 17G > Maildir.tar
16,2GiO 0:02:22 [ 116MiB/s] [==================================] 95%

So those disks should be able to saturate a gigabit link, and they are not the bottleneck on fast links. Which begs the question of what is blocking performance of a similar transfer over the gigabit link, but that's another question altogether, because no sync program ever reaches the above performance anyways.

Finally, note that when I migrated to SMD, I wrote a small performance comparison that could be interesting here. It shows SMD to be faster than OfflineIMAP, but not by as much as we see here. In fact, it looks like OfflineIMAP has slowed down significantly since then (May 2018), but this could be due to my larger mail spool as well.

mbsync

The isync (AKA mbsync) project is written in C and supports syncing Maildir and IMAP folders, with possibly multiple replicas. I haven't tested this, but I suspect it might be possible to sync between two IMAP servers as well. It supports partial mirrors, message flags, full folder support, and "trash" functionality.

Complex configuration file

I started with this .mbsyncrc configuration file:

SyncState *
Sync New ReNew Flags

IMAPAccount anarcat
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt

IMAPStore anarcat-remote
Account anarcat

MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir-mbsync/

Channel anarcat
# AKA Far, convert when all clients are 1.4+
Master :anarcat-remote:
# AKA Near
Slave :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
Patterns *
# Automatically create missing mailboxes, both locally and on the server
#Create Both
Create slave
# Sync the movement of messages between folders and deletions, add after making sure the sync works
#Expunge Both

Long gone are the days where I would spend a long time reading a manual page to figure out the meaning of every option. If that's your thing, you might like this one. But I'm more of an "EXAMPLES section" kind of person now, and I somehow couldn't find a sample file on the website. I started from the Arch wiki one, but it's actually not great because it's made for Gmail (which is not a usual Dovecot server). So a sample config file in the manpage would be a great addition. Thankfully, the Debian package ships one in /usr/share/doc/isync/examples/mbsyncrc.sample, but I only found that after I wrote my configuration. It was still useful and I recommend people take a look if they want to understand the syntax.

Also, that syntax is a little overly complicated. For example, Far needs colons, like:

Far :anarcat-remote:

Why? That seems just too complicated. I also found that sections are not clearly identified: IMAPAccount and Channel mark section beginnings, for example, which is not at all obvious until you learn about mbsync's internals. There are also weird ordering issues: the SyncState option needs to be before IMAPAccount, presumably because it's global.

Using a more standard format like .INI or TOML could improve that situation.

Stellar performance

A transfer of the entire mail spool takes 56 minutes and 6 seconds, which is impressive.

It's not quite "line rate": the resulting mail spool was 12GB (which is a problem, see below), which turns out to be about 29Mbit/s and therefore not maxing the gigabit link, and an order of magnitude slower than rsync.

The incremental runs are roughly 2 seconds, which is even more impressive, as that's actually faster than rsync:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.015       0.052       1.930       2.029       2.105       
user        0.660       0.040       0.592       0.661       0.722       
sys         0.338       0.033       0.268       0.341       0.387    

Those tests were performed with isync 1.3.0-2.2 on Debian bullseye. Tests with a newer isync release originally failed because of a corrupted message that triggered bug 999804 (see below). Running 1.4.3 under valgrind works around the bug, but adds a 50% performance cost, the full sync running in 1h35m.

Once the upstream patch is applied, performance with 1.4.3 is fairly similar, considering that the new sync included the register folder with 4000 messages:

120.74user 213.19system 59:47.69elapsed 9%CPU (0avgtext+0avgdata 105420maxresident)k
29128inputs+28284376outputs (0major+45711minor)pagefaults 0swaps

That is ~13GB in ~60 minutes, which gives us 28.3Mbps. Incrementals are also pretty similar to 1.3.x, again considering the double-connect cost:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.500       0.087       2.340       2.491       2.629       
user        0.718       0.037       0.679       0.711       0.793       
sys         0.322       0.024       0.284       0.320       0.365

Those tests were all done on a Gigabit link, but what happens on a slower link? My server uplink is slow: 25 Mbps down, 6 Mbps up. There mbsync is worse than the SMD baseline:

===> multitime results
1: mbsync -a
Mean        Std.Dev.    Min         Median      Max
real        31.531      0.724       30.764      31.271      33.100      
user        1.858       0.125       1.721       1.818       2.131       
sys         0.610       0.063       0.506       0.600       0.695       

That's 30 seconds for a sync, which is an order of magnitude slower than SMD.

Great user interface

Compared to OfflineIMAP and (ahem) SMD, the mbsync UI is kind of neat:

anarcat@angela:~(main)$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 1/2  B: 204/205  F: +0/0 *0/0 #0/0  N: +1/200 *0/0 #0/0

(Note that nice switch away from slavery-related terms too.)

The display is minimal, and yet informative. It's not obvious what it means at first glance, but the manpage is useful at least for clarifying that:

This represents the cumulative progress over channels, boxes, and messages affected on the far and near side, respectively. The message counts represent added messages, messages with updated flags, and trashed messages, respectively. No attempt is made to calculate the totals in advance, so they grow over time as more information is gathered. (Emphasis mine).

In other words:

  • C 2/2: channels done/total (2 done out of 2)
  • B 204/205: mailboxes done/total (204 out of 205)
  • F: changes on the far side
  • N: +10/200 *0/0 #0/0: changes on the "near" side:
    • +10/200: 10 out of 200 messages downloaded
    • *0/0: no flag changed
    • #0/0: no message deleted

You get used to it, in a good way. It does not, unfortunately, show up when you run it in systemd, which is a bit annoying as I like to see a summary of mail traffic in the logs.

Interoperability issue

In my notmuch setup, I have bound key S to "mark spam", which basically assigns the tag spam to the message and removes a bunch of others. Then I have a notmuch-purge script which moves that message to the spam folder, for training purposes. It basically does this:

notmuch search --output=files --format=text0 "$search_spam" \
    | xargs -r -0 mv -t "$HOME/Maildir/${PREFIX}junk/cur/"

This method, which worked fine in SMD (and also OfflineIMAP), created this error on sync:

Maildir error: duplicate UID 37578.

And indeed, there are now two messages with that UID in the mailbox:

anarcat@angela:~(main)$ find Maildir/.junk/ -name '*U=37578*'
Maildir/.junk/cur/1637427889.134334_2.angela,U=37578:2,S
Maildir/.junk/cur/1637348602.2492889_221804.angela,U=37578:2,S

This is actually a known limitation or, as mbsync(1) calls it, a "RECOMMENDATION":

When using the more efficient default UID mapping scheme, it is important that the MUA renames files when moving them between Maildir folders. Mutt always does that, while mu4e needs to be configured to do it:

(setq mu4e-change-filenames-when-moving t)

So it seems I would need to fix my script. It's unclear how the paths should be renamed, which is unfortunate, because I would need to change my script to adapt to mbsync, but I can't tell how just from reading the above.

(A manual fix is actually to rename the file to remove the U= field, as sketched below: mbsync will generate a new one and then sync correctly.)
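
For reference, a hypothetical sketch of that manual fix in Python (the folder path is an assumption; mbsync's actual recommendation remains to have the MUA rename files on move):

  import re
  from pathlib import Path

  # hypothetical path: the folder where the duplicate UID showed up
  junk = Path.home() / "Maildir" / ".junk" / "cur"

  for f in junk.iterdir():
      if f.is_file() and ",U=" in f.name:
          # drop the ,U=NNN part; mbsync assigns a fresh UID on the next sync
          f.rename(f.with_name(re.sub(r",U=\d+", "", f.name)))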

Fortunately, someone else already fixed that issue: afew, a notmuch tagging script (much puns, such hurt), has a move mode that can rename files correctly, specifically designed to deal with mbsync. I had already been told about afew, but it's one more reason to standardize my notmuch hooks on that project, it looks like.

Update: I have tried to use afew and found it has significant performance issues. It also has a completely different paradigm to what I am used to: it assumes all incoming mail has a new tag and lays its own tags on top of that (inbox, sent, etc). It can only move files from one folder at a time (see this bug), which breaks my spam training workflow. In general, I sync my tags into folders (e.g. ham, spam, sent) and message flags (e.g. inbox is F, unread is "not S", etc), and afew is not well suited for this (although there are hacks that try to fix this). I have worked hard to make my tagging scripts idempotent, and that's something afew doesn't currently have. Still, it would be better to have that code in Python than bash, so maybe I should consider my options here.

Stability issues

The newer release in Debian bookworm (currently at 1.4.3) has stability issues on full sync. I filed bug 999804 in Debian about this, which led to a thread on the upstream mailing list. I have found at least three distinct crashes that could be double-free bugs "which might be exploitable in the worst case", not a reassuring prospect.

The thing is: mbsync is really fast, but the downside of that is that it's written in C, and with that comes a whole set of security issues. The Debian security tracker has only three CVEs on isync, but the above issues show there could be many more.

Reading the source code certainly did not make me very comfortable with trusting it with untrusted data. I considered sandboxing it with systemd (below) but having systemd run as a --user process makes that difficult. I also considered using an apparmor profile but that is not trivial because we need to allow SSH and only some parts of it...

Thankfully, upstream has been diligent at addressing the issues I have found. They provided a patch within a few days which did fix the sync issues.

Update: upstream actually took the issue very seriously. They not only got CVE-2021-44143 assigned for my bug report, they also audited the code and found several more issues, collectively identified as CVE-2021-3657, which actually also affect 1.3 (i.e. Debian 11/bullseye/stable). Somehow my corpus doesn't trigger that issue, but it was still considered serious enough to warrant a CVE. So on the one hand: excellent response from upstream; but on the other hand: how many more of those could there be in there?

Automation with systemd

The Arch wiki has instructions on how to setup mbsync as a systemd service. It suggests using the --verbose (-V) flag which is a little intense here, as it outputs 1444 lines of messages.

I have used the following .service file:

[Unit]
Description=Mailbox synchronization service
ConditionHost=!marcos
Wants=network-online.target
After=network-online.target
Before=notmuch-new.service

[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync -a
Nice=10
IOSchedulingClass=idle
NoNewPrivileges=true

[Install]
WantedBy=default.target

And the following .timer:

[Unit]
Description=Mailbox synchronization timer
ConditionHost=!marcos

[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service

[Install]
WantedBy=timers.target

Note that we trigger notmuch through systemd, with the Before and also by adding mbsync.service to the notmuch-new.service file:

[Unit]
Description=notmuch new
After=mbsync.service

[Service]
Type=oneshot
Nice=10
ExecStart=/usr/bin/notmuch new

[Install]
WantedBy=mbsync.service

An improvement over polling repeatedly with a .timer would be to wake up only on IMAP notify, but neither imapnotify nor goimapnotify seem to be packaged in Debian. It would also not cover the "sent folder" use case, where we need to wake up on local changes.

Password-less setup

The sample file suggests this should work:

IMAPStore remote
Tunnel "ssh -q host.remote.com /usr/sbin/imapd"

Add BatchMode, restrict to IdentitiesOnly, provide a password-less key just for this, add compression (-C), find the Dovecot imap binary, and you get this:

IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C [email protected] /usr/lib/dovecot/imap"

And it actually seems to work:

$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 0/2  B: 0/1  F: +0/0 *0/0 #0/0  N: +0/0 *0/0 #0/0imap(anarcat): Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied
C: 2/2  B: 205/205  F: +0/0 *0/0 #0/0  N: +1/1 *3/3 #0/0imap(anarcat)<1611280><90uUOuyElmEQlhgAFjQyWQ>: Info: Logged out in=10808 out=15396642 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=1 body_bytes=8087

It's a bit noisy, however. dovecot/imap doesn't have a "usage" to speak of, but even the source code doesn't hint at a way to disable that Error message, so that's unfortunate. That socket is owned by root:dovecot so presumably Dovecot runs the imap process as $user:dovecot, which we can't do here. Oh well?

Interestingly, the SSH setup is not faster than IMAP.

With IMAP:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.367       0.065       2.220       2.376       2.458       
user        0.793       0.047       0.731       0.776       0.871       
sys         0.426       0.040       0.364       0.434       0.476

With SSH:

===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.515       0.088       2.274       2.532       2.594       
user        0.753       0.043       0.645       0.766       0.804       
sys         0.328       0.045       0.212       0.340       0.393

Basically: 200ms slower. Tolerable.

Migrating from SMD

The above was how I migrated to mbsync on my first workstation. The work on the second one was more streamlined, especially since the corruption on mailboxes was fixed:

  1. install isync, with the patch:

    dpkg -i isync_1.4.3-1.1~_amd64.deb
    
  2. copy all files over from previous workstation to avoid a full resync (optional):

    rsync -a --info=progress2 angela:Maildir/ Maildir-mbsync/
    
  3. rename all files to match new hostname (optional):

    find Maildir-mbsync/ -type f -name '*.angela,*' -print0 |  rename -0 's/\.angela,/\.curie,/'
    
  4. trash the notmuch database (optional):

    rm -rf Maildir-mbsync/.notmuch/xapian/
    
  5. disable all smd and notmuch services:

    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
    
  6. do one last sync with smd:

    smd-pull --show-tags ; smd-push --show-tags ; notmuch new ; notmuch-sync-flagged -v
    
  7. backup notmuch on the client and server:

    notmuch dump | pv > notmuch.dump
    
  8. backup the maildir on the client and server:

    cp -al Maildir Maildir-bak
    
  9. create the SSH key:

    ssh-keygen -t ed25519 -f .ssh/id_ed25519_mbsync
    cat .ssh/id_ed25519_mbsync.pub
    
  10. add to .ssh/authorized_keys on the server, like this:

    command="/usr/lib/dovecot/imap",restrict ssh-ed25519 AAAAC...

  11. move old files aside, if present:

    mv Maildir Maildir-smd
    
  12. move new files in place (CRITICAL SECTION BEGINS!):

    mv Maildir-mbsync Maildir
    
  13. run a test sync, only pulling changes:

    mbsync --create-near --remove-none --expunge-none --noop anarcat-register

  14. if that works well, try with all mailboxes:

    mbsync --create-near --remove-none --expunge-none --noop -a

  15. if that works well, try again with a full sync:

    mbsync register
    mbsync -a

  16. reindex and restore the notmuch database, this should take ~25 minutes:

    notmuch new
    pv notmuch.dump | notmuch restore
    
  17. enable the systemd services and retire the smd-* services:

    systemctl --user enable mbsync.timer notmuch-new.service
    systemctl --user start mbsync.timer
    rm ~/.config/systemd/user/smd*
    systemctl daemon-reload

During the migration, notmuch helpfully told me the full list of those lost messages:

[...]
Warning: cannot apply tags to missing message: CAN6gO7_QgCaiDFvpG3AXHi6fW12qaN286+2a7ERQ2CQtzjSEPw@mail.gmail.com
Warning: cannot apply tags to missing message: CAPTU9Wmp0yAmaxO+qo8CegzRQZhCP853TWQ_Ne-YF94MDUZ+Dw@mail.gmail.com
Warning: cannot apply tags to missing message: [email protected]
[...]
Warning: cannot apply tags to missing message: [email protected]
Warning: cannot apply tags to missing message: [email protected]
Warning: cannot apply tags to missing message: notmuch-sha1-000458df6e48d4857187a000d643ac971deeef47
Warning: cannot apply tags to missing message: notmuch-sha1-0079d8e0c3340e6f88c66f4c49fca758ea71d06d
Warning: cannot apply tags to missing message: notmuch-sha1-0194baa4cfb6d39bc9e4d8c049adaccaa777467d
Warning: cannot apply tags to missing message: notmuch-sha1-02aede494fc3f9e9f060cfd7c044d6d724ad287c
Warning: cannot apply tags to missing message: notmuch-sha1-06606c625d3b3445420e737afd9a245ae66e5562
Warning: cannot apply tags to missing message: notmuch-sha1-0747b020f7551415b9bf5059c58e0a637ba53b13
[...]

As detailed in the crash report, all of those were actually innocuous and could be ignored.

Also note that we completely trash the notmuch database because it's actually faster to reindex from scratch than let notmuch slowly figure out that all mails are new and all the old mails are gone. The fresh indexing took:

nov 19 15:08:54 angela notmuch[2521117]: Processed 384679 total files in 23m 41s (270 files/sec.).
nov 19 15:08:54 angela notmuch[2521117]: Added 372610 new messages to the database.

A reindex on top of an existing database, by contrast, went about twice as slow, at about 120 files/sec.

Current config file

Putting it all together, I ended up with the following configuration file:

SyncState *
Sync All

# IMAP side, AKA "Far"
IMAPAccount anarcat-imap
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt

IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C [email protected] /usr/lib/dovecot/imap"

IMAPStore anarcat-remote
Account anarcat-tunnel

# Maildir side, AKA "Near"
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir/

# what binds Maildir and IMAP
Channel anarcat
Far :anarcat-remote:
Near :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
#Patterns *
Patterns * !register  !.register
# Automatically create missing mailboxes, both locally and on the server
Create Both
#Create Near
# Sync the movement of messages between folders and deletions, add after making sure the sync works
Expunge Both
# Propagate mailbox deletion
Remove both

IMAPAccount anarcat-register-imap
Host imap.anarc.at
User register
PassCmd "pass imap.anarc.at-register"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt

IMAPAccount anarcat-register-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C [email protected] /usr/lib/dovecot/imap"

IMAPStore anarcat-register-remote
Account anarcat-register-tunnel

MaildirStore anarcat-register-local
SubFolders Maildir++
Inbox ~/Maildir/.register/

Channel anarcat-register
Far :anarcat-register-remote:
Near :anarcat-register-local:
Create Both
Expunge Both
Remove both

Note that it may be out of sync with my live (and private) configuration file, as I do not publish my "dotfiles" repository publicly for security reasons.

OfflineIMAP

I've used OfflineIMAP for a long time before switching to SMD. I don't exactly remember why or when I started using it, but I do remember it became painfully slow as I started using notmuch, and would sometimes crash mysteriously. It's been a while, so my memory is hazy on that.

It also kind of died in a fire when Python 2 stopped being maintained. The main author moved on to a different project, imapfw, which could serve as a framework to build IMAP clients, but never seemed to implement all of the OfflineIMAP features and certainly not configuration file compatibility. Thankfully, a new team of volunteers ported OfflineIMAP to Python 3 and we can now test that new version to see if it is an improvement over mbsync.

Crash on full sync

The first thing that happened on a full sync is this crash:

Copy message from RemoteAnarcat:junk:
 ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Thread 'Copy message from RemoteAnarcat:junk' terminated with exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/imaputil.py", line 406, in utf7m_decode
    for c in binary.decode():
AttributeError: 'memoryview' object has no attribute 'decode'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/threadutil.py", line 146, in run
    Thread.run(self)
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')


Last 1 debug messages logged for Copy message from RemoteAnarcat:junk prior to exception:
thread: Register new thread 'Copy message from RemoteAnarcat:junk' (account 'Anarcat')
ERROR: Exceptions occurred during the run!
ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')

Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)

Folder junk [acc: Anarcat]:
 Copy message UID 30626 (29008/49310) RemoteAnarcat:junk -> LocalAnarcat:junk
Command exited with non-zero status 100
5252.91user 535.86system 3:21:00elapsed 47%CPU (0avgtext+0avgdata 846304maxresident)k
96344inputs+26563792outputs (1189major+2155815minor)pagefaults 0swaps

That only transferred about 8GB of mail, which gives us a transfer rate of 5.3Mbit/s, more than 5 times slower than mbsync. This bug is possibly limited to the bullseye version of offlineimap3 (the lovely 0.0~git20210225.1e7ef9e+dfsg-4), while the current sid version (the equally gorgeous 0.0~git20211018.e64c254+dfsg-1) seems unaffected.

Tolerable performance

The new release still crashes, except it does so at the very end, which is an improvement, since the mails do get transferred:

 *** Finished account 'Anarcat' in 511:12
ERROR: Exceptions occurred during the run!
ERROR: Exception parsing message with ID (<[email protected]>) from imaplib (response type: bytes).
 AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')

Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(

ERROR: Exception parsing message with ID (<[email protected]>) from imaplib (response type: bytes).
 AttributeError: decoding with 'x-mac-roman' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')

Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(

ERROR: IMAP server 'RemoteAnarcat' does not have a message with UID '32686'

Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 889, in _fetch_from_imap
    raise OfflineImapError(reason, severity)

Command exited with non-zero status 1
8273.52user 983.80system 8:31:12elapsed 30%CPU (0avgtext+0avgdata 841936maxresident)k
56376inputs+43247608outputs (811major+4972914minor)pagefaults 0swaps
"offlineimap  -o " took 8 hours 31 mins 15 secs

This is 8h31m for transferring 12G, which is around 3.1Mbit/s. That is nine times slower than mbsync, almost an order of magnitude!

Now that we have a full sync, we can test incremental synchronization. That is also much slower:

===> multitime results
1: sh -c "offlineimap -o || true"
            Mean        Std.Dev.    Min         Median      Max
real        24.639      0.513       23.946      24.526      25.708      
user        23.912      0.473       23.404      23.795      24.947      
sys         1.743       0.105       1.607       1.729       2.002

That is also an order of magnitude slower than mbsync, and significantly slower than what you'd expect from a sync process. ~30 seconds is long enough to make me impatient and distracted; 3 seconds, less so: I can wait and see the results almost immediately.

Integrity check

That said: this is still on a gigabit link. It's technically possible that OfflineIMAP performs better than mbsync over a slow link, but I haven't tested that theory.

The OfflineIMAP mail spool is missing quite a few messages as well:

anarcat@angela:~(main)$ find Maildir-offlineimap -type f -type f -a \! -name '.*' | wc -l 
381463
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l 
385247

... although that's probably all either new messages or the register folder, so OfflineIMAP might actually be in a better position there. But digging in more, it seems like the actual per-folder diff is fairly similar to mbsync: a few messages missing here and there. Considering OfflineIMAP's instability and poor performance, I have not looked any deeper into those discrepancies.

Other projects to evaluate

Those are all the options I have considered, in alphabetical order:

  • doveadm-sync: requires dovecot on both ends, can tunnel over SSH, may have performance issues in incremental sync, written in C
  • fdm: fetchmail replacement, IMAP/POP3/stdin/Maildir/mbox/NNTP support, SOCKS support (for Tor), complex rules for delivering to specific mailboxes, adding headers, piping to commands, etc.; discarded because of no (real) support for keeping mail on the server, and written in C
  • getmail: fetchmail replacement, IMAP/POP3 support, supports incremental runs, classification rules, Python
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap, but requires running an IMAP server locally, Perl
  • isync/mbsync: TLS client certs and SSH tunnels, fast, incremental, IMAP/POP/Maildir support, multiple mailbox, trash and recursion support, and generally has good words from multiple Debian and notmuch people (Arch tutorial), written in C, review above
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, mentions rsmtp which is a nice name for rsendmail; not evaluated because it seems awfully complex to set up, Haskell
  • nncp: treat the local spool as another mail server, not really compatible with my "multiple clients" setup, Golang
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, review above, Python

Most projects were not evaluated due to lack of time.

Conclusion

I'm now using mbsync to sync my mail. I'm a little disappointed by the synchronisation times over the slow link, but I guess that's par for the course if we use IMAP: we are bound by the network speed much more than with custom protocols. I'm also worried about the C implementation and the crashes I have witnessed, but I am encouraged by the fast upstream response.

Time will tell if I will stick with that setup. I'm certainly curious about the promises of interimap and mail-sync, but I have run out of time on this project.

21 November, 2021 04:04PM

November 20, 2021

hackergotchi for Jonathan Dowland

Jonathan Dowland

hledger footguns

I wrote in budgeting tools that I was taking a look at Plain Text Accounting and in particular, hledger. My jury's still out on the tools, but in the time I've been looking at them I've come across a couple of foot-guns I thought it was worth writing down.

hledger's ledger format is derived from that of its predecessor ledger, and so some of the problems might be inherited.

1. significant white space delimiters

The basic syntax for a transaction looks like this:

2020-03-15 client payment
    assets:checking         $ 2000
    income:consulting       $-2000

There are some significant white space delimiters in play. The most subtle is what separates the account names from the values: it is two or more spaces. With a single space, the value is treated as part of the account name. For some reason I hit this frequently when trying to encode opening balances: the account name used as the source of the initial balances is something not otherwise generally referred to again (something like equity:opening balances) and the transaction amount is inferred where possible, so I ended up with a bunch of accounts named equity:opening balances £100 and similar.
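To make the trap concrete, a made-up example: the second posting below has only a single space before the amount, so hledger reads the entire line as the account name, infers the missing amount, and quietly creates an account called equity:opening balances £100:

2021-01-01 opening balances
    assets:checking              £100
    equity:opening balances £100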

2. flexible decimal delimiter

The value of transactions can be interspersed with commas and periods to make it more readable: e.g. $2000 could be written as $2,000. Different locales have different conventions here: It seems some(/most/all?) of Europe use periods to separate out the units and a comma to delimit the fractional part, whereas the US and the UK do the opposite. There is no built-in association between the currency symbol you are using and the period/comma convention: it's quite possible to accidentally write a number which is interpreted differently to how you intended, and it doesn't matter if you are using $ or £ etc.
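One way to reduce the risk, assuming a reasonably recent hledger (check the manual for the version you run), is to declare each commodity and its expected format up front with commodity directives. At minimum this documents the intended convention inside the journal itself, and newer releases can use such declarations (or a decimal-mark directive) to disambiguate parsing:

commodity $1,000.00
commodity £1,000.00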

3. new syntax has unexpected results in old versions

Finally, my favourite. hledger has a notion of rules that can be used to match transactions when importing from CSV. The format looks like this:

if (match rule)
& (another rule)
account1 some:account:from
account2 some:account:to

By default, multiple rules in sequence like above are OR'd: any of them can match. The & prefix switches the behaviour to AND. But, & is a relatively new addition: it's not supported in 1.18.1, the version in Debian stable, which upstream released in June 2020. In prior versions the & prefix is not a syntax error, or at least, not one that's reported: it's silently ignored; meaning, the line with the & does nothing, and any of the other rules in the set will match. This is easy to miss, and means imports could be incorrectly posted.

20 November, 2021 09:03PM

November 19, 2021

Mike Hommey

Announcing git-cinnabar 0.5.8

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows you to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.7?

  • Updated git to 2.34.0 for the helper.
  • Python 3.5 and newer are now officially supported. Git-cinnabar will try to use the python3 program by default, but will fall back to python2.7 if that’s where the Mercurial libraries are available. It is possible to pick a specific python with the GIT_CINNABAR_PYTHON environment variable.
  • Fixed compatibility with Mercurial 5.8 and newer.
  • The prebuilt binaries are now optimized on arm64 macOS and Windows.
  • git cinnabar download now properly returns an error code when failing to extract the prebuilt binaries.
  • Pushing to a non-empty Mercurial repository without having pulled at least once from it is now prevented.
  • Replaced the nagging about fsck with a smaller check always happening after pulling.
  • Fail earlier on git fetch hg::url <sha1> (it would properly fetch the Mercurial changeset and its ancestors, but git would fail at the end because the sha1 is not a git sha1; use git cinnabar fetch instead)
  • Minor fixes.

19 November, 2021 10:05PM by glandium

hackergotchi for Gunnar Wolf

Gunnar Wolf

For our millionth bug, bookworms eat raspberries alive

I guess you already heard, right? The Debian Bug Tracking System has hit a big milestone! We just passed our one millionth bug report! (and yes, that’s a cause for celebration; bug reporting is probably the best way for the system to grow and improve)

So, to celebrate, I want to announce I have nudged our unofficial Raspberry Pi images build scripts to now also build images for our upcoming Debian release, Debian 12 «Bookworm».

(image above: A bookworm learns about raspberries in various stages of testing. Image sources: Transformers Wiki, CC BY-SA and Sam Saunders at Flickr, CC BY-SA)

So… Get’em while they are fresh! https://raspi.debian.net/! And enjoy the following (non-book)worm-on-a-raspberry picture from Wikimedia Commons:

Oh, FWIW – The site still shows images for Buster. You will notice they are no longer being autobuilt (why spend CPU time on something that's no longer going to change significantly?). The Bookworm images are not yet tested; as soon as I can test them, I will drop the Buster ones.

19 November, 2021 03:37PM

hackergotchi for Evgeni Golov

Evgeni Golov

A String is not a String, and that's Groovy!

Halloween is over, but I still have some nightmares to share with you, so sit down, take some hot chocolate and enjoy :)

When working with Jenkins, there is almost no way to avoid writing Groovy. Well, unless you only do old style jobs with shell scripts, but y'all know what I think about shell scripts…

Anyways, Eric has been rewriting the jobs responsible for building Debian packages for Foreman to pipelines (and thus Groovy).

Our build process for pull requests is rather simple:

  1. Setup sources - get the orig tarball and adjust the changelog to have a unique version for pull requests
  2. Call pbuilder
  3. Upload the built package to a staging archive for testing

For merges, it's identical, minus the changelog adjustment.

And if there are multiple packages changed in one go, it runs each step in parallel for each package.

Now I've been doing mass changes to our plugin packages, to move them to a shared postinst helper instead of having the same code over and over in every package. This required changes to many packages and sometimes I'd end up building multiple at once. That should be fine, right?

Well, yeah, it did build fine, but the upload only happened for the last package. This felt super weird, especially as I was absolutely sure we did test this scenario (multiple packages in one PR) and it worked just fine…

So I went on a ride through the internals of the job, trying to understand why it didn't work.

This requires a tad more information about the way we handle packages for Foreman:

  • the archive is handled by freight
  • it has suites like buster, focal and plugins (that one is a tad special)
  • each suite has components that match Foreman releases, so 2.5, 3.0, 3.1, nightly etc
  • core packages (Foreman etc) are built for all supported distributions (right now: buster and focal)
  • plugin packages are built only once and can be used on every distribution

As generating the package index isn't exactly fast in freight, we tried not to run it too often. The idea was that when we build two packages for the same target (suite/version combination), we upload both at once and run the import only once for both. That means that when we build Foreman for buster and focal, this results in two parallel builds and then two parallel uploads (as they end up in different suites). But if we build Foreman and Foreman Installer, we have four parallel builds, but only two parallel uploads, as we can batch upload Foreman and Installer per suite. Well, or so was the theory.

The Groovy code, that was supposed to do this looked roughly like this:

def packages_to_build = find_changed_packages()
def repos = [:]

packages_to_build.each { pkg ->
    suite = 'buster'
    component = '3.0'
    target = "${suite}-${component}"

    if (!repos.containsKey(target)) {
        repos[target] = []
    }

    repos[target].add(pkg)
}

do_the_build(packages_to_build)
do_the_upload(repos)

That's pretty straightforward, no? We create an empty Map, loop over a list of packages and add them to an entry in the map, which we pre-create as empty if it doesn't exist.

Well, no, the resulting map always ended with only having one element in each target list. And this is also why our original tests always worked: we tested with a PR containing changes to Foreman and a plugin, and plugins go to this special target we have…

So I started playing with the code (https://groovyide.com/playground is really great for that!), trying to understand why the heck it erases previous data.

The first finding was that it just always ended up jumping into the "if map entry not found" branch, even though the map very clearly had the correct entry after the first package was added.

The second one was weird. I was trying to minimize the reproducer code (IMHO always a good idea) and switched target = "${suite}-${component}" to target = "lol". Two entries in the list, only one jump into the "map entry not found" branch. What?! 🧐

So this is clearly related to the fact that we're using String interpolation here. But hey, that's a totally normal thing to do, isn't it?!

Admittedly, at this point, I was lost. I knew what breaks, but not why.

Luckily, I knew exactly who to ask: Jens.

After a brief "well, that's interesting", Jens quickly found the source of our grief: Double-quoted strings are plain java.lang.String if there’s no interpolated expression, but are groovy.lang.GString instances if interpolation is present. And when we do repos[target], the GString target gets converted to a String, but when we use repos.containsKey() it remains a GString. This is because GStrings get converted to Strings if the method wants one, but containsKey takes any Object, while the repos[target] notation for some reason converts it. Maybe this is because using GString as Map keys should be avoided.

We can reproduce this with simpler code:

def map = [:]
def something = "something"
def key = "${something}"
map[key] = 1
println key.getClass()
map.keySet().each {println it.getClass() }
map.keySet().each {println it.equals(key)}
map.keySet().each {println it.equals(key as String)}

Which results in the following output:

class org.codehaus.groovy.runtime.GStringImpl
class java.lang.String
false
true

With that knowledge, the fix was to just use the same repos[target] notation also for checking for existence: Groovy helpfully returns null, which is falsy, when it can't find an entry in a Map.
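As a minimal sketch of that fix (package names and values made up, reduced from the real job), using the subscript notation for both the check and the write means both go through the same key handling:

def repos = [:]
def suite = 'buster'
def component = '3.0'

['foreman', 'foreman-installer'].each { pkg ->
    def target = "${suite}-${component}"  // a GString, not a java.lang.String

    // A missing key yields null, which is falsy, so no containsKey() is needed;
    // read and write both use the subscript operator and agree on the key.
    if (!repos[target]) {
        repos[target] = []
    }
    repos[target].add(pkg)
}

assert repos.size() == 1
assert repos.values().first() == ['foreman', 'foreman-installer']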

So yeah, a String is not always a String, and it'll bite you!

19 November, 2021 02:16PM by evgeni

hackergotchi for Neil Williams

Neil Williams

git worktrees

A few scenarios have been problematic with git and I've now discovered git worktrees which help with each.

  • If you've wanted to compare multiple files in different branches of the same tree - without needing to commit on either side.
  • If you want to work on two (or more) versions of the same file at the same time, again without needing to commit.
  • You have a file or a bunch of files that aren't ready to be committed, even locally.
  • You are working on a development branch and an urgent fix is required on an old git tag.
  • You have a large git repository which is a burden to clone (or has complex submodules).

You could go to the trouble of making a new directory and re-cloning the same tree. However, a local commit in one tree is then not accessible to the other tree.

You could commit everything every time, but with a dirty tree, that involves sorting out the .gitignore rules as well. That could well be pointless with an experimental change.

Git worktrees allow multiple working directories backed by a single git repository. Commits on any branch are visible from other branches, even when the commit was made in a different worktree. This makes things like cherry-picking easy, without needing to push pointless changes or branches.

Branches on a worktree can be rebased as normal, with the benefit that commit hashes from other local changes are available for reference and cherry-picks.

I'm sure git worktrees are not new. However, I've only started using them recently and others have asked about how the worktree operates.

Creating a new tree can be done with a new or existing branch. To make it easier, set the new directory at the same time, usually in ../

New branch (branched from the current branch):

git worktree add -b branch_name ../branch_name

Existing branch - note, slightly different syntax here, specify the commit-ish last (branch name, tag or hash):

git worktree add ../branch_name branch_name
git worktree list
/home/neil/Documents/testing/testrepo        0612677 [master]
/home/neil/Documents/testing/testtree        d38f5a3 [testtree]

Use git worktree remove <name> to drop the entire directory for that tree and the git tracking.
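For the "urgent fix on an old tag" scenario from the list above, the same command covers it (tag and branch names here are made up): create a new branch starting at the tag, in its own directory, and the development checkout stays untouched:

git worktree add -b hotfix-1.2 ../hotfix-1.2 v1.2.0
# fix, commit, build and test in ../hotfix-1.2, then clean up:
git worktree remove ../hotfix-1.2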

I'm using this for work on the Debian Security Tracker. I have two local branches and having two worktrees allows me to have three terminals open, using the same files and the same git repository.

One to run make serve and update the local SQLite database. One to access master to run git pull. One to make local changes without risking collisions on master.

git add data/CVE/list
git commit
# pre commit hook runs here
git log -n 1
# copy the hash
# switch to master terminal
git pull
git cherry-pick <HASH>
git push
# switch to server terminal
git rebase master
# no git pull or fetch, it's all local
make
# switch back to changes terminal
git rebase master

Sadly, one area where this isn't as easy is with importing a new DSC into Salsa with git-buildpackage, as that uses several branches at the same time. It would be possible, but you'll need to have separate upstream and possibly pristine-tar branches and supply the relevant options. Possibly something for git-buildpackage to adopt - it is common to need to make changes to the packaging with a new upstream release & a lot of those changes are currently done outside git.

For the rest of the support, see git worktree (1)

19 November, 2021 01:26PM by Neil Williams

hackergotchi for Bits from Debian

Bits from Debian

New Debian Developers and Maintainers (September and October 2021)

The following contributors got their Debian Developer accounts in the last two months:

  • Bastian Germann (bage)
  • Gürkan Myczko (tar)

The following contributors were added as Debian Maintainers in the last two months:

  • Clay Stan
  • Daniel Milde
  • David da Silva Polverari
  • Sunday Cletus Nkwuda
  • Ma Aiguo
  • Sakirnth Nagarasa

Congratulations!

19 November, 2021 12:00PM by Jean-Pierre Giraud

hackergotchi for Mike Gabriel

Mike Gabriel

Improbability of a million, lintian thinks...

An interesting mindset overcome by reality...

Also, lintian does not differentiate between 100.000 and 1.000.000.

W: ayatana-indicator-display: improbable-bug-number-in-closes 1000143
N: 
N:   The most recent changelog closes a low-numbered bug number. While this is distantly possible, it's more likely a typo or
N:   a placeholder value that mistakenly wasn't filled in.
N: 
N:   Visibility: warning
N:   Show-Always: no
N:   Check: debian/changelog
N: 
N:

¯\_(ツ)_/¯

light+love
Mike

19 November, 2021 07:08AM by sunweaver

Reproducible Builds (diffoscope)

diffoscope 193 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 193. This version includes the following changes:

[ Chris Lamb ]
* Don't duplicate file lists at each directory level.
  (Closes: #989192, reproducible-builds/diffoscope#263)
* When pretty-printing JSON, mark the difference as such, additionally
  avoiding including the full path.
  (Closes: reproducible-builds/diffoscope#205)

* Codebase improvements:
  - Update a bunch of %-style string interpolations into f-strings or
    str.format.
  - Import itertools top-level directly.
  - Drop some unused imports.
  - Use isinstance(...) over type(...) ==
  - Avoid aliasing variables if we aren't going to use them.

[ Brandon Maier ]
* Fix missing diff output on large diffs.

[ Mattia Rizzolo ]
* Ignore a Python warning coming from a dependent library (triggered by
  supporting Python 3.10)
* Document that we support both Python 3.9 and 3.10.

You find out more by visiting the project homepage.

19 November, 2021 12:00AM

November 17, 2021

hackergotchi for Rapha&#235;l Hertzog

Raphaël Hertzog

Freexian’s report about Debian Long Term Support, October 2021

A Debian LTS logo

Every month we review the work funded by Freexian’s Debian LTS offering. Please find the report for October below.

Debian project funding

  • Our project funding work continues with an active bid on the work of packaging gradle in Debian. The next steps are reviewing the bid and formal approval.
  • In October 2,475 EUR was put aside to fund Debian projects.

We’re looking forward to receiving more projects from various Debian teams! Learn more about the rationale behind this initiative in this article.

Debian LTS contributors

In October 12 contributors were paid to work on Debian LTS, their reports are available below.

  • Adrian Bunk did 40.5h in October (out of 28.5h assigned and 18h remaining, thus keeping 6h for November).
  • Anton Gladky did 12h (out of 12h assigned).
  • Ben Hutchings did 14.75h in October (out of 2h assigned and 28h remaining, thus keeping 15.25h for November).
  • Chris Lamb did 18h (out of 18h assigned).
  • Holger Levsen did 1h (out of 12h assigned, but gave back the remaining 11h).
  • Jeremiah Foster worked 20h (out of 20h assigned and 10h remaining, thus keeping 10h for November).
  • Markus Koschany did 28.5h (out of 28.5h assigned).
  • Ola Lundqvist did 5h (out of 5h assigned).
  • Roberto C. Sánchez did 28.5h (out of 28.5h assigned).
  • Sylvain Beucler did 23.5h (out of 28.5h assigned, but gave back the remaining 5h).
  • Thorsten Alteholz did 28.5h (out of 28.5h assigned).
  • Utkarsh Gupta did 28.5h (out of 28.5h assigned).

Evolution of the situation

In October we released 34 DLAs.

Also, we would like to remark once again that we are constantly looking for new contributors. Please contact Jeremiah if you are interested!

The security tracker currently lists 37 packages with a known CVE and the dla-needed.txt file has 22 packages needing an update.

Thanks to our sponsors

Sponsors that joined recently are in bold.

17 November, 2021 04:51PM by Raphaël Hertzog

hackergotchi for Christoph Berg

Christoph Berg

PostgreSQL and Undelete

pg_dirtyread

Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted, or updated. Of course that works only if the table has not been cleaned-up yet by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery.

Here's an example of pg_dirtyread in action:

# create table foo (id int, t text);
CREATE TABLE
# insert into foo values (1, 'Doc1');
INSERT 0 1
# insert into foo values (2, 'Doc2');
INSERT 0 1
# insert into foo values (3, 'Doc3');
INSERT 0 1

# select * from foo;
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3
(3 rows)

# delete from foo where id < 3;
DELETE 2

# select * from foo;
 id │  t
────┼──────
  3 │ Doc3
(1 row)

Oops! The first two documents have disappeared.

Now let's use pg_dirtyread to look at the table:

# create extension pg_dirtyread;
CREATE EXTENSION

# select * from pg_dirtyread('foo') t(id int, t text);
 id │  t
────┼──────
  1 │ Doc1
  2 │ Doc2
  3 │ Doc3

All three documents are still there, but only one of them is visible.

pg_dirtyread can also show PostgreSQL's system columns with the row location and visibility information. For the first two documents, xmax is set, which means the row has been deleted:

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
───────┼──────┼──────┼────┼──────
 (0,1) │ 1577 │ 1580 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)

Undelete

Caveat: I'm not promising any of the ideas quoted below will actually work in practice. There are a few caveats and a good portion of intricate knowledge about the PostgreSQL internals might be required to succeed properly. Consider consulting your favorite PostgreSQL support channel for advice if you need to recover data on any production system. Don't try this at work.

I always had plans to extend pg_dirtyread to include some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:

# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;

This is not a true "undelete", though - it just inserts new rows from the data read from the table.

pg_surgery

Enter pg_surgery, which is a new PostgreSQL extension supplied with PostgreSQL 14. It contains two functions to "perform surgery on a damaged relation". As a side-effect, they can also make deleted tuples reappear.

As I discovered now, one of the functions, heap_force_freeze(), works nicely with pg_dirtyread. It takes a list of ctids (row locations) that it marks "frozen", but at the same time as "not deleted".

Let's apply it to our test table, using the ctids that pg_dirtyread can read:

# create extension pg_surgery;
CREATE EXTENSION

# select heap_force_freeze('foo', array_agg(ctid))
    from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text) where id = 1;
 heap_force_freeze
───────────────────

(1 row)

Et voilà, our deleted document is back:

# select * from foo;
 id │  t
────┼──────
  1 │ Doc1
  3 │ Doc3
(2 rows)

# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  │ xmin │ xmax │ id │  t
───────┼──────┼──────┼────┼──────
 (0,1) │    2 │    0 │  1 │ Doc1
 (0,2) │ 1578 │ 1580 │  2 │ Doc2
 (0,3) │ 1579 │    0 │  3 │ Doc3
(3 rows)

Disclaimer

Most importantly, none of the above methods will work if the data you just deleted has already been purged by VACUUM or autovacuum. These actively zero out reclaimed space. Restore from backup to get your data back.

Since both pg_dirtyread and pg_surgery operate outside the normal PostgreSQL MVCC machinery, it's easy to create corrupt data using them. This includes duplicated rows, duplicated primary key values, indexes being out of sync with tables, broken foreign key constraints, and others. You have been warned.

pg_dirtyread does not work (yet) if the deleted rows contain any toasted values. Possible other approaches include using pageinspect and pg_filedump to retrieve the ctids of deleted rows.

Please make sure you have working backups and don't need any of the above.

17 November, 2021 03:46PM

November 15, 2021

Vincent Bernat

Git as a source of truth for network automation

The first step when automating a network is to build the source of truth. A source of truth is a repository of data that provides the intended state: the list of devices, the IP addresses, the network protocols settings, the time servers, etc. A popular choice is NetBox. Its documentation highlights its usage as a source of truth:

NetBox intends to represent the desired state of a network versus its operational state. As such, automated import of live network state is strongly discouraged. All data created in NetBox should first be vetted by a human to ensure its integrity. NetBox can then be used to populate monitoring and provisioning systems with a high degree of confidence.

When introducing Jerikan, a common feedback we got was: “you should use NetBox for this.” Indeed, Jerikan’s source of truth is a bunch of YAML files versioned with Git.

Why Git?

If we look at how things are done with servers and services, in a datacenter or in the cloud, we are likely to find users of Terraform, a tool turning declarative configuration files into infrastructure. Declarative configuration management tools like Salt, Puppet,1 or Ansible take care of server configuration. NixOS is an alternative: it combines package management and configuration management with a functional language to build virtual machines and containers. When using a Kubernetes cluster, people use Kustomize or Helm, two other declarative configuration management tools. Taken together, these tools implement the infrastructure as code paradigm.

Infrastructure as code is an approach to infrastructure automation based on practices from software development. It emphasizes consistent, repeatable routines for provisioning and changing systems and their configuration. You make changes to code, then use automation to test and apply those changes to your systems.

― Kief Morris, Infrastructure as Code, O’Reilly.

A version control system is a central tool for infrastructure as code. The usual candidate is Git with a source code management system like GitLab or GitHub. You get:

Traceability and visibility
Git keeps a log of all changes: what, who, why, and when. With a bit of discipline, each change is explained and self-contained. It becomes part of the infrastructure documentation. When the support team complains about a degraded experience for some customers over the last two months or so, you quickly discover this may be related to a change to an incoming policy in New York.
Rolling back
If a change is defective, it can be reverted quickly, safely, and without much effort, even if other changes happened in the meantime. The policy change at the origin of the problem spanned over three routers. Reverting this specific change and deploying the configuration let you solve the situation until you find a better fix.
Branching, reviewing, merging
When working on a new feature or refactoring some part of the infrastructure, a team member creates a branch and works on their change without interfering with the work of other members. Once the branch is ready, a pull request is created and the change is ready to be reviewed by the other team members before merging. You discover the issue was related to diverting traffic through an IX where one ISP was connected without enough capacity. You propose and discuss a fix that includes a change of the schema and the templates used to declare policies to be able to handle this case.
Continuous integration
For each change, automated tests are triggered. They can detect problems and give more details on the effect of a change. Branches can be deployed to a test infrastructure where regression tests are executed. The results can be synthesized as a comment in the pull request to help the review. You check your proposed change does not modify the other existing policies.

Why not NetBox?

NetBox does not share these features. It is a database with a REST and a GraphQL API. Traceability is limited: changes are not grouped into a transaction and they are not documented. You cannot fork the database. Usually, there is one staging database to test modifications before applying them to the production database. It does not scale well and reviews are difficult. Applying the same change to the production database can be hazardous. Rolling back a change is non-trivial.

Update (2021-11)

Nautobot, a fork of NetBox, will soon address this point by using Dolt, an SQL database engine allowing you to clone, branch, and merge, like a Git repository. Dolt is compatible with MySQL clients. See “Nautobots, Roll Back!” for a preview of this feature.

Moreover, NetBox is not usually the single source of truth. It contains your hardware inventory, the IP addresses, and some topology information. However, this is not the place you put authorized SSH keys, syslog servers, or the BGP configuration. If you also use Ansible, this information ends up in its inventory. The source of truth is therefore fragmented between several tools with different workflows. Since NetBox 2.7, you can append additional data with configuration contexts. This mitigates this point. The data is arranged hierarchically but the hierarchy cannot be customized.2 Nautobot can manage configuration contexts in a Git repository, while still allowing the use of the API to fetch them. You get some additional perks, thanks to Git, but the remaining data is still in a database with a different lifecycle.

Lastly, the schema used by NetBox may not fit your needs and you cannot tweak it. For example, you may have a rule to compute the IPv6 address from the IPv4 address for dual-stack interfaces. Such a relationship cannot be easily expressed and enforced in NetBox. When changing the IPv4 address, you may forget the IPv6 address. The source of truth should only contain the IPv4 address but you also want the IPv6 address in NetBox because this is your IPAM and you need it to update your DNS entries.
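As an illustration of the kind of rule meant here (a hypothetical sketch, not Jerikan's or NetBox's actual code), deriving an IPv6 address from an IPv4 address takes only a few lines once the source of truth is plain data processed by scripts, so the derived value never has to be stored or kept in sync:

import ipaddress

def derive_ipv6(ipv4: str, prefix: str = "2001:db8:cafe::/64") -> str:
    """Embed the IPv4 address into a fixed IPv6 prefix (illustrative convention)."""
    net = ipaddress.IPv6Network(prefix)
    return str(net.network_address + int(ipaddress.IPv4Address(ipv4)))

print(derive_ipv6("192.0.2.10"))  # 2001:db8:cafe::c000:20a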

Why not Git?

There are some limitations when putting your source of truth in Git:

  1. If you want to expose a web interface to allow an external team to request a change, it is more difficult to do it with Git than with a database. Out-of-the-box, NetBox provides a nice web interface and a permission system. You can also write your own web interface and interact with NetBox through its API.
  2. YAML files are more difficult to query in different ways. For example, looking for a free IP address is complex if they are scattered in multiple places.

In my opinion, in most cases, you are better off putting the source of truth in Git instead of NetBox. You get a lot of perks by doing that and you can still use NetBox as a read-only view, usable by other tools. We do that with an Ansible module. In the remaining cases, Git could still fit the bill. Read-only access control can be done through submodules. Pull requests can restrict write access: a bot can check the changes only modify allowed files before auto-merging. This still requires some Git knowledge, but many teams are now comfortable using Git, thanks to its ubiquity.


  1. Wikimedia manages its infrastructure with Puppet. They publish everything on GitHub. Creative Commons uses Salt. They also publish everything on GitHub. Thanks to them for doing that! I wish I could provide more real-life examples. ↩︎

  2. Being able to customize the hierarchy is key to avoiding repetition in the data. For example, if switches are paired together, some data should be attached to them as a group and not duplicated on each of them. Tags can be used to partially work around this issue but you lose the hierarchical aspect. ↩︎

15 November, 2021 07:53AM by Vincent Bernat

November 14, 2021

Russ Allbery

Review: The Last Graduate

Review: The Last Graduate, by Naomi Novik

Series: The Scholomance #2
Publisher: Del Rey
Copyright: 2021
ISBN: 0-593-12887-7
Format: Kindle
Pages: 388

This is a direct sequel to A Deadly Education, by which I mean it starts in the same minute at which A Deadly Education ends (and let me say how grateful I am for a sequel that doesn't drop days, months, or years between books). You do not want to read this series out of order.

This book is also very difficult to review without spoiling either it or the previous book, so please bear with me if I'm elliptical in my ravings. Because The Last Graduate is so good. So good, not only as a piece of writing, but as a combination of two of my favorite tropes in fiction, one of which I can't talk about because of spoilers. I adored this book in a way that is not entirely rational.

I will attempt a review below anyway, but if you liked the first book, just stop reading here and go read the second one. It's more of everything I loved in the first book except even better, it did some things I was expecting and some things I didn't expect at all, and it's just so ridiculously good. Just be aware that it has another final-line cliffhanger. The third book is coming in (hopefully) 2022.

Novik handles the cliffhanger at the end of the previous book beautifully, which is worth noting because there were so many ways in which it could have gone poorly. One of the best things about this series is Novik's skill at writing El's relationship with her mother, even though her mother has not appeared in the series so far. El argues with her mother's voice in her head, tells stories about her, wonders what her mother would think of her classmates (or in some cases knows exactly what her mother would think of her classmates), and sometimes makes the explicit decision to not be her mother. The relationship has the sort of messy complexity, shared history, and underlying respect that many people experience in life but that I've rarely seen portrayed this well in a fantasy novel.

Novik's presentation of that relationship works because El's voice is so strong. Within fifteen minutes of starting The Last Graduate, I was already muttering "I love this book" to myself, mostly because of how much I enjoy El's sarcastic, self-deprecating internal commentary. Novik strikes a balance between self-awareness, snark, humor, and real character growth that rivals Murderbot in its effectiveness of first-person perspective. It carries the story over a few weak points, such as a romance that didn't do much for me. Even when I didn't care about part of the plot, I cared about El's opinion of the plot and what it said about El's growing understanding of how to navigate the world.

A Deadly Education was scene and character establishment. El insisted on being herself and following her own morals and social rules, and through that found some allies. The Last Graduate gives El enough breathing space to make more nuanced decisions. This is the part of growing up where one realizes the limitations of one's knee-jerk reactions and innate moral judgment. It's also when it becomes hard to trust success that is entirely outside of one's previous experience. El was not a kid who had friends, so she doesn't know what to do with them now that she has them. She's barely able to convince herself that they are friends.

This is one of the two fictional tropes I mentioned, the one that I can talk about (at least briefly) without major spoilers. I have such a soft spot for stubborn, sarcastic, principled characters who refuse to play by the social rules that they think are required to make friends and who then find friends who like them for themselves. The moment when they start realizing this has happened and have no idea how to deal with it or how to be a person who has friends is one I will happily read over and over again. I enjoyed this book from the beginning, but there were two points when it grabbed my heart and I was all in. The first one is a huge spoiler that I can't talk about. The second was this paragraph:

[She] came round to me and put her arm around my waist and said under her breath, "Hey, she can be taught," with a tease in her voice that wobbled a little, and when I looked at her, her eyes were bright and wet, and I put my arm around her shoulders and hugged her.

You'll know it when you get there.

The Last Graduate also gives the characters other than El and Orion more room, which is part of how it handles the chosen one trope. It's been obvious since early in the first book that Orion is a sort of chosen one, and it becomes obvious to the reader that El may be as well. But Novik doesn't let the plot focus only on them; instead, she uses that trope to look at how alliances and collective action happen, and how no one can carry the weight by themselves. As El learns more and gains power, she also becomes less central to the plot resolution and has to learn how to be less self-reliant. This is not a book where one character is trained to save the world. It's a book where she manages to enlist the support of a kick-ass project manager and becomes part of a team.

Middle books of a trilogy are notoriously challenging. Often they're travel books: the first book sets up a problem, the second book moves the characters both physically and emotionally into a position to solve the problem, and the third book is the payoff. Travel books often sag. They can feel obligatory but somewhat boring, like a chore on the way to the third-book climax. The Last Graduate is not a travel book; it is, instead, a pivot book, which is my favorite form of trilogy. It's a book that rewrites the problem the first book set up, both resolving it and expanding the scope beyond what the reader had expected. This is immensely satisfying when done well, and Novik does it extremely well.

This is not a flawless book. There are some pacing hiccups, there is a romance angle that didn't work for me (although it does arrive at some character insights that I thought were spot on), and although I think Novik is doing something interesting with the trope, there is a lot of chosen one power escalation happening here. It's not the sort of book that I can claim is perfectly written. Instead, it's the sort of book that uses some of my favorite plot elements and emotional beats in such an effective way and with such a memorable character that I do not have it in me to care about any of the flaws. Your mileage may therefore vary, but I would be happy to read books like this until the end of time.

As mentioned above, The Last Graduate ends on another cliffhanger. This time I was worried that Novik might have ended the series there, since there's enough of an internal climax that I could imagine some literary fiction (which often seems allergic to endings) would have stopped here. Thankfully, Novik's web site says this is not the case. The next year is going to be a difficult wait.

The third book of this series is going to be incredibly difficult to write, and I hope Novik is up to the challenge she's made for herself. But she handled the transition between the first and second book so well, and this book is so good that I have a lot of hope. If the third book is half as good as I'm hoping, this is going to be one of my favorite fantasy series of all time.

Followed by an as-yet-untitled third book.

Rating: 10 out of 10

14 November, 2021 04:49AM

Ruby Team

Ruby transition and packaging hints #2 - Gemfile.lock created by bundler/setup with Ruby 2.7 preventing successful test with Ruby 3.0

We currently face an issue in all packages requiring bundler/setup and trying to run the tests for Ruby 2.7 and 3.0. The problem is that the first tests will create Gemfile.lock (or gemfile/gemfile-*.lock) using Ruby 2.7 and the next run for Ruby 3 will report e.g.:

Failure/Error: require 'bundler/setup' # Set up gems listed in the Gemfile.

Bundler::GemNotFound:
  Could not find racc-1.4.16 in any of the sources

or

/usr/share/rubygems-integration/all/gems/bundler-2.2.27/lib/bundler/definition.rb:496:in `materialize':
  Could not find rexml-3.2.3.1 in any of the sources (Bundler::GemNotFound)

Both bugs #996207 and #996302 are incarnations of this issue. The fix is as easy as making sure that the .lock files are removed before each run. This can be done in e.g. debian/ruby-tests.rake as very first task:

File.delete("Gemfile.lock") if File.exist?("Gemfile.lock")

In another case the .lock file is created by the tests in gemfiles/. While the first examples could actually be solved by gem2deb removing Gemfile.lock on its own, I’m not quite sure how to handle the last case using packaging tools.
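In the meantime, a possible (untested) workaround along the lines of the snippet above is to let debian/ruby-tests.rake sweep those per-gemfile lock files as well before the suite loads:

require 'fileutils'

# Remove any lock files left over from the previous interpreter's run,
# covering both Gemfile.lock and the per-gemfile locks mentioned above.
(["Gemfile.lock"] + Dir.glob("{gemfile,gemfiles}/*.lock")).each do |lock|
  FileUtils.rm_f(lock)
end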

The interesting part is that it is unlikely we will be confronted with this issue again anytime soon. It seems very specific to the Ruby 3.0 transition.

Update

After talking to Antonio he added some code to gem2deb-test-runner to move Gemfile.lock files out of the way. The tool already did this in an autopkgtest environment. In the upcoming 1.7 release it will do it in general and this will fix some more FTBFSes, e.g. #998497 and #996141 - originally reported against ruby-voight-kampff and ruby-bootsnap.

14 November, 2021 03:25AM by Daniel Leidert ([email protected])

November 13, 2021

Ruby transition and packaging hints #1 - Adjusting Ruby version in commands

This is the first part of a series of short posts about issues that came up during the Ruby 3.0 transition and how to fix them. Hopefully more team members will join in and add their input.

During the Ruby 3.0 transition there are essentially two different Ruby versions with two different binaries available, /usr/bin/ruby2.7 and /usr/bin/ruby3.0, while /usr/bin/ruby points to the current default version, which is Ruby 2.7.

In some cases the tests shipped by the source packages will use shell commands to run scripts or Ruby code. It is imperative that in these cases the Ruby executable is not invoked via /usr/bin/ruby or ruby, because this will point to Ruby 2.7 only and fail if the tests are invoked with Ruby version 3.

The fix is to rely on RbConfig.ruby which will point to the absolute pathname of the ruby command for the current Ruby environment, e.g.

cmd = "#{RbConfig.ruby} ..."

This issue appeared for example in ruby-byebug and ruby-backports.
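A self-contained sketch of the pattern (the child command is just an illustration):

require 'rbconfig'

# Spawn a child Ruby with the same interpreter that is running this code,
# rather than whatever /usr/bin/ruby happens to point to.
cmd = "#{RbConfig.ruby} -e 'puts RUBY_VERSION'"
system(cmd) or raise "child Ruby failed"

Run under ruby2.7 it prints a 2.7 version, and under ruby3.0 a 3.0 version, so each test run exercises the interpreter it was started with.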

13 November, 2021 08:24PM by Daniel Leidert ([email protected])