A Week in Openverse: August 16th-20th

#openverse, #week-in-openverse

Multi-stage Docker builds in the Openverse API

Prologue

If you’ve seen the design mockups for the audio integration, you must have seen the absolutely beautiful waveforms that are a unique visual feature of the audio player. These waveforms function as a seek bar, a progress bar and as a visual representation of the audio.

These waveforms need to be computed from the audio file. This computation can be done on the client either automatically (when using libraries like wavesurfer.js) or manually (using a waveform generation library like waveform-data.js). Since our media is loaded from third-party sites that don’t support CORS, libraries built on top of the client-side Web Audio API cannot read them.

Our alternative was to compute these waveforms on the server. audiowaveform by the BBC is an excellent library written in C++ that can compute the waveform very quickly on the server. As a proof of concept, we wanted to write an API to generate the waveform for any audio object in the API database on demand. But how do we run the audiowaveform library on cue from the Django endpoint?

Approach 1: Install from the PPA

The library is hosted on a PPA for Ubuntu, from where it can be installed directly. So the natural approach was to try and load it from there. It turns out that adding an Ubuntu PPA on Debian is pretty convoluted as it is, and it is made even worse by the fact that we’re doing this in a container with limited binaries and a constraint on size.

However, after jumping through several hoops like manually setting up the signing key and mapping the Ubuntu repository to the Debian version, we get to the install step. And the installation step complains about missing dependencies on several Boost libraries.

(ノಠ益ಠ)ノ彡┻━┻

Approach 2: Using the container

The library is also available as a container image on Docker Hub. This option looked very attractive after the PPA debacle. Deriving from the image to make a separate service was a good option, but for a proof of concept, setting up an entire server seemed like a lot of overhead: it was more scaffolding than the actual project.

The audiowaveform library had to be installed into the Openverse API container itself.

Multistage Docker builds

Multistage Docker builds allow us to create a Docker image from several tiers (stages). A tier can compile and build the application, which can then be copied as an artefact into the subsequent tiers. Since only the final tier goes into the image, with none of the bulk of tooling, the image remains quite lean.

Tier 1

The realies/audiowaveform container has the audiowaveform binary located at /usr/local/bin/audiowaveform. It is dynamically linked against a number of shared libraries. These linkages can be examined by invoking the ldd command with the binary as the argument. Here is the output of the ldd command.

/ # ldd /usr/local/bin/audiowaveform

	/lib/ld-musl-x86_64.so.1 (0x4000000000)
	libsndfile.so.1 => /usr/lib/libsndfile.so.1 (0x4001903000)
	libgd.so.3 => /usr/lib/libgd.so.3 (0x400196d000)
	libmad.so.0 => /usr/lib/libmad.so.0 (0x40019c7000)
	libid3tag.so.0 => /usr/lib/libid3tag.so.0 (0x40019e8000)
	libboost_program_options.so.1.72.0 => /usr/lib/libboost_program_options.so.1.72.0 (0x4001a01000)
	libboost_filesystem.so.1.72.0 => /usr/lib/libboost_filesystem.so.1.72.0 (0x4001a90000)
	libboost_regex.so.1.72.0 => /usr/lib/libboost_regex.so.1.72.0 (0x4001aaf000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x4001bb0000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x4001d49000)
	libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x4000000000)
	libFLAC.so.8 => /usr/lib/libFLAC.so.8 (0x4001d5d000)
	libogg.so.0 => /usr/lib/libogg.so.0 (0x4001d91000)
	libvorbis.so.0 => /usr/lib/libvorbis.so.0 (0x4001d9b000)
	libvorbisenc.so.2 => /usr/lib/libvorbisenc.so.2 (0x4001dc3000)
	libpng16.so.16 => /usr/lib/libpng16.so.16 (0x4001e6d000)
	libz.so.1 => /lib/libz.so.1 (0x4001e9d000)
	libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0x4001eb7000)
	libjpeg.so.8 => /usr/lib/libjpeg.so.8 (0x4001f6c000)
	libwebp.so.7 => /usr/lib/libwebp.so.7 (0x4001fe6000)
	libicui18n.so.67 => /usr/lib/libicui18n.so.67 (0x400203c000)
	libicuuc.so.67 => /usr/lib/libicuuc.so.67 (0x40022c7000)
	libbz2.so.1 => /usr/lib/libbz2.so.1 (0x400246c000)
	libbrotlidec.so.1 => /usr/lib/libbrotlidec.so.1 (0x400247b000)
	libicudata.so.67 => /usr/lib/libicudata.so.67 (0x4002487000)
	libbrotlicommon.so.1 => /usr/lib/libbrotlicommon.so.1 (0x4003f9e000)

For our image, the idea is to copy the audiowaveform binary (and all its dependencies) from this container into our Openverse API container.

FROM realies/audiowaveform:latest AS awf

RUN ldd /usr/local/bin/audiowaveform | tr -s '[:blank:]' '\n' | grep '^/' | \
    xargs -I % sh -c 'mkdir -p $(dirname deps%); cp % deps%;'

These two steps start a build stage from the realies/audiowaveform:latest image (giving it the name awf, which will be handy later) and run ldd to identify all the linked dependencies of the binary and copy them into deps/.

Refer to the output of ldd above. Our RUN command parses this output as follows:

  • break each line at every whitespace by replacing blanks with newlines (tr -s '[:blank:]' '\n')
  • pick the lines that start with a slash (grep '^/')
  • for every line of output, substituted for ‘%’ (xargs -I %), run a subshell (sh -c) that will
    • replicate the folder structure (mkdir -p $(dirname deps%))
    • copy the file from its location to the same path under deps/ (cp % deps%)
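
To see what the first two stages of that pipeline produce, here is a minimal sketch that can be run in any POSIX shell outside Docker. It feeds three lines copied from the ldd output above into the same tr and grep commands. The function names sample_ldd_output and extract_paths are made up for this illustration and are not part of the Openverse Dockerfile.

```shell
# Three lines in the style of the ldd output shown earlier (sample data only).
sample_ldd_output() {
  printf '\t/lib/ld-musl-x86_64.so.1 (0x4000000000)\n'
  printf '\tlibz.so.1 => /lib/libz.so.1 (0x4001e9d000)\n'
  printf '\tlibogg.so.0 => /usr/lib/libogg.so.0 (0x4001d91000)\n'
}

# Same tr and grep stages as the RUN command: put one token on each line,
# then keep only the tokens that are absolute paths.
extract_paths() {
  tr -s '[:blank:]' '\n' | grep '^/'
}

paths=$(sample_ldd_output | extract_paths)
echo "$paths"
```

Running it prints the three absolute paths, one per line. In the full ldd output the musl loader path appears twice, which is harmless because cp simply overwrites the first copy.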

That leaves us with the following folder structure inside deps/.

/deps # tree

.
├── lib
│   ├── ld-musl-x86_64.so.1
│   └── libz.so.1
└── usr
    └── lib
        ├── libFLAC.so.8
        ├── libboost_filesystem.so.1.72.0
        ├── libboost_program_options.so.1.72.0
        ├── libboost_regex.so.1.72.0
        ├── libbrotlicommon.so.1
        ├── libbrotlidec.so.1
        ├── libbz2.so.1
        ├── libfreetype.so.6
        ├── libgcc_s.so.1
        ├── libgd.so.3
        ├── libicudata.so.67
        ├── libicui18n.so.67
        ├── libicuuc.so.67
        ├── libid3tag.so.0
        ├── libjpeg.so.8
        ├── libmad.so.0
        ├── libogg.so.0
        ├── libpng16.so.16
        ├── libsndfile.so.1
        ├── libstdc++.so.6
        ├── libvorbis.so.0
        ├── libvorbisenc.so.2
        └── libwebp.so.7
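
The copy step that produces this layout can also be tried outside Docker. The sketch below is an illustration under a few assumptions: it works in a scratch directory created with mktemp, uses two empty stand-in files instead of real libraries, and replaces the xargs and sh -c combination with a plain while read loop that performs the same mkdir -p and cp. The helper name copy_with_structure is hypothetical.

```shell
# Scratch directory standing in for the container filesystem (hypothetical paths).
work=$(mktemp -d)
mkdir -p "$work/root/usr/lib" "$work/root/lib"
touch "$work/root/usr/lib/libogg.so.0" "$work/root/lib/libz.so.1"

# Copy each listed file to the same path under the destination root,
# creating the parent directories first.
copy_with_structure() {
  # $1: source root, $2: destination root; absolute paths arrive on stdin
  while IFS= read -r path; do
    mkdir -p "$2$(dirname "$path")"
    cp "$1$path" "$2$path"
  done
}

printf '%s\n' /usr/lib/libogg.so.0 /lib/libz.so.1 |
  copy_with_structure "$work/root" "$work/deps"

# List what ended up under deps/, relative to that directory.
copied=$(cd "$work/deps" && find . -type f | sort)
echo "$copied"
```

The listing shows lib/libz.so.1 and usr/lib/libogg.so.0 under deps/, mirroring where they lived under the source root. The scratch directory can be removed afterwards with rm -rf "$work".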

Tier 2

We then need to copy both the audiowaveform binary and its dependencies over to our container.

COPY --from=awf /deps /
COPY --from=awf /usr/local/bin/audiowaveform /usr/local/bin

These two COPY commands will do just that. The --from flag tells Docker that we want to copy the assets not from the host file system but from the build stage with the given name. Copying /deps into the root directory automatically writes the /lib/ and /usr/lib/ files to their correct places in the image.

That’s it! Now we have audiowaveform running inside the Openverse API container, and we can invoke it from the Django application using the Python subprocess module. Maybe that’s a separate post for later.

#docker, #openverse, #openverse-api

A Week in Openverse: August 9th-13th

Welcome to the first of many A Week in Openverse posts. In this weekly series, we’ll highlight work completed through the week by Openverse contributors. This will always include a changelog of closed issues per-repository, and may occasionally contain deeper dives into new functionality and features.

openverse

openverse-frontend

openverse-api

openverse-catalog

#openverse, #week-in-openverse

Introducing the Openverse Development Weekly Chat

Openverse will begin hosting a weekly chat on the #openverse Slack channel to discuss ongoing development and issues. There is a community-editable rolling agenda where anyone can add topics of discussion.

Outside of agenda items, we will discuss open issues, triage + label new issues, and review open pull requests.

This is a great opportunity to learn about ongoing work within the project and start making your own contributions to the Openverse! We hope to see you there.

Welcome to Openverse

This is the Make site for the Openverse project, a search engine for openly licensed media. We currently index over 500 million Creative Commons licensed images and will be adding audio and video support in the near future.

Please be patient as we set up the project, processes, and procedures.
Also, join us in Slack on the #openverse channel for discussions of the project!

You can see public announcements of the project in a few places:

Still here? Visit these quick links to learn more and contribute to the project: