Yamanote: A software development and deployment system

I left Mozilla back in July 2018. There are many reasons for this decision, and I’ll talk about just one here: I decided to Help People Get Jobs. The following text is from a blog post I wrote at work. I have reposted it here, edited for length and content (internal Indeed systems are not referenced).

At Indeed, we now use a software development and deployment system called “Yamanote.” Yamanote takes its name from the Yamanote Line (山手線) in Tokyo, Japan. It is one of Tokyo’s busiest and most important lines, connecting most of Tokyo’s major stations and urban centers. The Yamanote line is a continuous railway loop. Trains which run clockwise are known as sotomawari (外回り, “outer circle”) and those counter-clockwise as uchi-mawari (内回り, “inner circle”). We deploy software on two lines as well: QA and PROD. Just like the Yamanote line, our goal is for our software trains to run reliably, safely, and securely, in a continuous ring of green.

The level of service and quality required to operate the Yamanote train line is what we aspire to and apply to our software development. We continuously improve our tools and processes. It is expected that code merging into any of our deployment branches is tested and ready for public use.

Our business (We Help People Get Jobs) is a very long-term concern (think in decades!) and we expect some of our software to have a service-life that can span many years. We optimize for lasting impact. We operate quickly and carefully.

The Yamanote line is a marvel of technology, with incredible innovations in electrical engineering and automated systems. Despite all of the technology used, the most important factors in the line’s reliability are the people who operate the system:

Every staff member on the line has the authority to stop and start the trains with a wave of their white glove. It’s this level of responsibility and care that makes all the difference. We aspire to bring that “white glove” treatment into our software development discipline, as the millions of people who depend on us also need to get to work every single day.

Thanks to Wikipedia for the Yamanote line history facts. I’m working on releasing the Yamanote tools as Open Source, and I’ll cover that in a future post.

Turning a Corner in the New Year

I’m going to start the year off with a blog post, mostly to procrastinate on replying to the many e-mails that my very productive colleagues have sent my way 🙂

2017 was quite a year, beyond the socio-economic, geo-political, and bizarre. Many of my colleagues and I did what we could: find solace in work. I’ve often found that in uncertain times, making forward progress on difficult technical projects provides just enough incentive to continue for a bit longer. With the successful release of Firefox 57, I’m again optimistic about the future for the technical work. The Firefox Layout Engine team has a lot to be proud of in the 57 version. The winning combination was shipping big-ticket investments, and grinding down on many very difficult bugs. Plan “A” all the way!

Of course, it’s easy to see this in retrospect. I recall many late nights wondering “is this even going to work?” My friend and former colleague Robert O’Callahan recently revealed that he had similar doubts from before my time on the project. I wonder how much of that is inherent in the work. Is the Mozilla mission also the same narrative that fills me and other leaders with the sense that we’re always in rescue mode? Is that a sustainable lifestyle? In any case, it does feel like we’ve turned a corner.

For the first time in a long time, I feel like the wind’s at our backs, and we’re about to go into 2018 with that momentum. It would be hubris on my part to say that we’ve figured it all out. 2018’s uncertainties (e.g., Spectre/Meltdown) promise more late nights ahead. We’ve got lots of things in flight, and more ideas to investigate. We’ll need to make changes to deal with it all, but that’s par for the coming year’s course.

Happy New Year!

Why bother building a Web Layout Engine?

All of Mozilla is currently in San Francisco for the semi-annual All-Hands event. The Web Platform Layout team also meets here to discuss current work and future projects. Our team is responsible for the following browser rendering operations:

  1. Compute Style
  2. Size
  3. Position
  4. Animate
  5. Paint

…in 2D, 3D, and Virtual Reality.

Because we typically operate 7 days a week, 24 hours a day across the globe, it’s a rare opportunity to meet with everyone to share what we’ve been working on, and where we’re going next. We took turns doing Lightning Talks with each team member taking 2 to 5 minutes to share:

  • A brief introduction.
  • What you worked on since the last All-Hands.
  • What you plan to work on next.
  • What you want to learn while you’re here.

This was a “behind the scenes” chat about the guts of the Layout Engine, as illustrated in this image:

I’ve been thinking about the internals of our Web Layout Engines and how it’s a lot like the internal movement of a mechanical wristwatch. I shared the following story about the Swiss watch industry before our meeting started. I think the similarities with the web browser business are interesting.

Swiss watches are often regarded as the best of the best in haute horology. The long history of handcrafted manufacturing and very successful marketing around a watch’s provenance have made for a multi-billion dollar industry.

The most complex and expensive part of a wristwatch is its movement: the machine that rotates the watch hands and other complications. In that sense, a watch movement is not unlike a web browser’s layout engine, which controls the display of a browser’s content.

Within the wristwatch industry, there’s a debate about the differences between in-house movements (made by the watch vendor) and outsourced movements. ETA is a relatively obscure Swiss company that manufactures watch movements for other watch brands. There’s a long list of watch vendors that source their movements from ETA. This makes for what seems to be a great business plan: outsource the most expensive components, design the external case/dial appearance, add effective marketing, and you’re in the high-margin Swiss Watch business.

Why bother building your own watch movements, when you can use a cheaper one off the shelf? Why bother building your own layout engine, when you can use one of several “free” alternatives?

Here’s one very good reason why: In 2003 the Swiss Government launched an investigation into ETA. The company, which had been acquired by The Swatch Group, had announced that it would stop supplying watch movements to other companies. The same Swatch Group that sells low-cost plastic watches at shopping malls around the world also owns several luxury watch brands. What do you do when the giant corporation that builds your watch movements decides to stop supplying? What happens when the giant corporation that builds your layout engine decides to unfairly favor its own products?

The level of skill and effort required to design and build a watch movement is an order of magnitude beyond what’s required for the outer components. There are a number of companies that still manufacture in-house movements: Patek Philippe, Rolex, and Seiko are 3 examples. I personally wear a Seiko that I bought for less than $100. My preference for in-house movements is not about price, luxury, or exclusivity. I believe there’s great value in the independent creativity and craftsmanship inherent in building the whole widget. I also love pushing the state of the art and not being limited by what others decide that we can offer. Food for thought as we work on the layout engine this week.

Lighting Fires under Bugs

It’s been way too long since I posted on this blog. It seems I’ve fallen into the 140-character vortex like so many bloggers. I will endeavor to work on that in 2017.

I’m often asked “What’s up with this bug?” or some variation. This is often the case when a reported software defect is stuck in a status other than “fixed.” I then have to get in there, figure out what’s stuck, and somehow get it unstuck.

I’m almost never the first responder. By the time I’m called in, people smarter than me have already been looking into the problem, and my job is to light a fire under the bug. That is, there may already be enough information known to get a fix, it just needs the right spark.

Sometimes, that job requires that I find an appropriate person to assign the bug to. I try to add some new value (other than pointing a finger) when asked to light these fires under bugs. In some cases, I hack together a feeble fix, then a “real engineer” comes in and fixes it up for production. In rare cases, my code gets checked into the trunk.

I feel like this is one of the most important jobs for the Engineering Manager, but I haven’t read much material on the subject. It’s not a task where I get to display superior technical acumen, or simply throw managerial weight around. It requires patience, thought, and a bit of luck–much like lighting a campfire in the rain.

Setting up for Android and Firefox OS Development

This post is a follow-up to an earlier article I wrote about setting up a FirefoxOS development environment.

I’m going to set up a Sony Z3C as the target device for Mobile OS software development. The Sony Z3C (also known as Aries or aosp_d5803 ) is a nice device for Mobile OS hacking as it’s an AOSP device with good support for building the OS binaries. I’ve set the phone up for both FirefoxOS and Android OS development, to compare and see what’s common across both environments.

Please note that if you got your Sony Z3C from the Mozilla Foxfooding program, then this article isn’t for you. Those phones are already flashed and automatically updated with specific FirefoxOS builds that Mozilla staff selected for your testing. Please don’t replace those builds unless you’re actively developing for these phones and have a device set aside for that purpose.

My development host is a Mac (OSX 10.10) laptop already set up to build the Firefox for Macintosh product. It’s also set up to build the Firefox OS binaries for the Flame device.

Most of the development environment for the Flame is also used for the Aries device. In particular, the case-sensitive disk partition is required for both FirefoxOS and Android OS development. You’ll want this partition to be at least 100GB in size if you want to build both operating systems. Set this up before downloading FirefoxOS or Android source code to avoid ‘include file not found’ errors.
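Here’s a minimal sketch of one way to create that case-sensitive partition as a sparse disk image with the macOS hdiutil tool. The image path, the volume name, and the RUN switch are my own assumptions for illustration; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: create and mount a case-sensitive sparse disk image on macOS.
# IMAGE, VOLNAME and RUN are hypothetical names chosen for this example.
# SIZE follows the "at least 100GB" advice for building both operating systems.
IMAGE="${IMAGE:-$HOME/firefoxos.sparseimage}"
VOLNAME="${VOLNAME:-firefoxos}"
SIZE="${SIZE:-100g}"

# Print each command unless RUN=1 is set, so the sketch is safe to try anywhere.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

create_case_sensitive_volume() {
  # A sparse image only consumes disk space as files are added to it.
  run hdiutil create -type SPARSE -fs "Case-sensitive Journaled HFS+" \
      -size "$SIZE" -volname "$VOLNAME" "$IMAGE" || return 1
  # Attaching the image mounts it under /Volumes/$VOLNAME.
  run hdiutil attach "$IMAGE"
}

create_case_sensitive_volume
```

Run it once with RUN=1 on the Mac to actually create and mount the volume, then download the source code inside it.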

The next step to developing OS code for the Aries is to root the device. This will void your warranty, so tread carefully.

For most Gecko and Gaia developers, you’ll want to start from the base image for the Aries. The easiest way to flash your device with a known-good FirefoxOS build is to run flash.sh in the expanded aries.zip file from the official builds. You can then flash the phone with just Gecko or Gaia from your local source code.

The Aries binaries from a FirefoxOS build:

[Image: aries_firefoxos_images]

The Aries binaries in an Android Lollipop build:

[Image: aries_android_images]

If you want to build Android OS for the Aries, then read these docs from Sony, and these Mac-specific steps for building Android Lollipop. Note that the Android Lollipop SDK requires Xcode 5.1.1 and Java 7 (JRE and JDK). Both are older than the latest versions available, so you’ll need to install the older versions before building the Android OS.

When it comes time to configure your Android OS build via the lunch command, select aosp_d5803-userdebug as your device. Once the build is finished (after about 2 hours on my Mac), use these commands to flash your phone with the Android OS you just built:

fastboot flash boot out/target/product/aries/boot.img
fastboot flash system out/target/product/aries/system.img
fastboot flash userdata out/target/product/aries/userdata.img
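
The three commands above can be wrapped in a small script that checks for the image files before touching the phone. This is only a sketch: the function names and the DRY_RUN switch are my own additions, and by default it just prints the fastboot commands.

```shell
#!/bin/sh
# Sketch: flash the three Aries images, verifying that they all exist first.
# check_images, flash_aries and DRY_RUN are hypothetical names for this example.
PARTITIONS="boot system userdata"

# Succeeds only if every partition image exists in the directory given as $1.
check_images() {
  for part in $PARTITIONS; do
    if [ ! -f "$1/$part.img" ]; then
      echo "missing image: $1/$part.img" >&2
      return 1
    fi
  done
}

# Prints the fastboot commands by default; set DRY_RUN=0 to really flash.
flash_aries() {
  if [ "${DRY_RUN:-1}" != "1" ]; then
    # Refuse to flash a partial build.
    check_images "$1" || return 1
  fi
  for part in $PARTITIONS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "fastboot flash $part $1/$part.img"
    else
      fastboot flash "$part" "$1/$part.img" || return 1
    fi
  done
}

flash_aries "${OUT_DIR:-out/target/product/aries}"
```

Checking all three files up front avoids leaving the phone with a new boot image and an old system image when the build didn’t finish.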

Firefox Platform Rendering – Current Work

I’m often asked “what are you working on?” Here’s a snapshot of some of the things currently on my teams’ front burners:

I’m surely forgetting a few things, but that’s a quick snapshot for now. Do you have suggestions for what Platform Rendering features we should pick up next? Add your comments below…

Gaia Tips and Tricks for Gecko Hackers

I’m often assigned Firefox Rendering bugs in Bugzilla. By the time a bug gets assigned to me, the reporter has usually exhausted other options and assumed (correctly) that I’m ultimately responsible for fixing Firefox rendering bugs. Of course, I have to reassign most of these bugs to more capable individuals.

Some of the hardest bugs to assign are the ones reported by our own Gaia team: the team responsible for building the user experience in Firefox OS. The Gaia engineers take CSS and JavaScript and build powerful mobile apps like the phone dialer and SMS client. When they report bugs, the problem is often buried within lots of CSS and JS code. I wanted to learn how to effectively reduce the time it takes to resolve rendering issues reported by the Gaia team. It takes a long time to go from a Gaia bug like “scrolling in the gallery app is slow” to the underlying Gecko bug, for example “rounding issue creates an invalidation rectangle that is too large.”

To do that, I became a Gaia developer for a few days at our Paris office. I reasoned that if I could learn how they work, then I could help my team boil down issues faster and become more responsive to their needs. We already recognize the value of having expert web application developers on staff, but we could do a better job with a better understanding of how they work. With that in mind, I spent the week without any C++ code to look at, and dived into the world of mobile web app development.

I wrote down the steps I took to set up a FirefoxOS build and test environment in an earlier post. This time, I’ll list a few of the tips and tricks I learned while I was working with the Gaia developers.

The first and most important tip: You will brick the phone when working on the OS. In fact, you’re probably not trying hard enough if you don’t brick it 🙂 Fastboot lets you connect to the phone when it becomes unresponsive and flash the device with a known-good system (like the base image). Learn how to manually force fastboot on your phone.
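
As a rough sketch, here’s how I might check from the desktop whether a bricked phone has made it into fastboot mode. It assumes the usual “serial, then fastboot” output of fastboot devices; the function names are hypothetical, and the hardware key combination for forcing fastboot depends on your phone.

```shell
#!/bin/sh
# Sketch: report whether any attached device is currently in fastboot mode.
# count_fastboot_devices and recover_hint are hypothetical helper names.

# Reads `fastboot devices` output on stdin and counts the lines whose
# second field is "fastboot" (one line per device).
count_fastboot_devices() {
  awk '$2 == "fastboot" { n++ } END { print n + 0 }'
}

recover_hint() {
  if command -v fastboot >/dev/null 2>&1; then
    n="$(fastboot devices 2>/dev/null | count_fastboot_devices)"
  else
    n=0
  fi
  if [ "$n" -gt 0 ]; then
    echo "device is in fastboot mode; ready to flash a known-good image"
  else
    echo "no fastboot device found; force fastboot with the hardware keys, or try: adb reboot bootloader"
  fi
}

recover_hint
```

The adb reboot bootloader suggestion only helps while the phone still answers ADB; once it’s truly unresponsive, the manual key combination is the way in.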

Julien showed me how to maintain a Gaia developer profile on your desktop development environment. This set of commands will configure your B2G build to produce the desktop B2G runtime that’s a bit easier to debug than a device build:

# Change the value of FIREFOX to the full path of the B2G desktop build
export FIREFOX=/Volumes/firefoxos/B2G/build/dist/B2G.app/Contents/MacOS/b2g
export PROFILE_FOLDER=gaia-profile DEBUG=1 DESKTOP=0
make

With a Gaia developer profile, you can switch between B2G desktop and a regular Firefox browser build for testing:

export FIREFOX=/full/path/to/desktop/browser
$FIREFOX -profile gaia-profile --no-remote app://sms.gaiamobile.org

The Gaia profile lets you use URLs like app://sms.gaiamobile.org to run the Gaia apps on the desktop browser. This trick alone was a huge time saver! Try it with other URLs like app://communications.gaiamobile.org
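
Since I kept retyping these URLs, a tiny helper is handy. This is a sketch under the assumption that every app follows the app://NAME.gaiamobile.org pattern shown above; the function names are my own, and it falls back to printing the URL when FIREFOX isn’t set to an executable.

```shell
#!/bin/sh
# Sketch: open a Gaia app by short name using the Gaia developer profile.
# gaia_app_url and launch_gaia_app are hypothetical helper names.

# Builds the app:// URL for a Gaia app name such as "sms".
gaia_app_url() {
  echo "app://$1.gaiamobile.org"
}

launch_gaia_app() {
  url="$(gaia_app_url "$1")"
  if [ -x "${FIREFOX:-}" ]; then
    # Same flags as the command above; the profile folder defaults to gaia-profile.
    "$FIREFOX" -profile "${PROFILE_FOLDER:-gaia-profile}" --no-remote "$url"
  else
    echo "FIREFOX is not set to an executable; would open $url"
  fi
}

launch_gaia_app sms
```

With FIREFOX exported as shown earlier, launch_gaia_app communications opens the communications app the same way.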

For a first Gaia development project, I picked up the implementation of the new card view for Gaia that is based on the asynchronous pan and zoom controller (APZC). Etienne did the initial proof-of-concept and my goal is to rebase/finish/polish it and add some CSS Scroll Snapping features. My initial tests for this feature are very promising. CSS Scroll Snapping is much more responsive than the previous JavaScript-based implementation. I’m still working out some bugs but hope to land my first Gaia pull request soon.

I’ve already been able to apply what I’ve learned to triage bugs like this one. The bug started out described as a problem with how we launch GMail on B2G in Arabic. Based on the testing tricks I learned from the Gaia team, I was able to distill it to a root cause with scrollbar rendering in right-to-left (RTL) languages. I added a simplified test case to the bug that should greatly reduce debugging time, and assigned it to one of our RTL experts. That’s quite a bit better than assigning tough bugs to random developers with the entire OS as the test case!

Thanks to Julien and Etienne for helping me get up to speed. I highly recommend that any Gecko engineer spend a few days as a Gaia hacker. I’m humbled by the ingenuity these developers have for building the entire OS user experience with only the capabilities offered by the Web. We could all learn a lot in the trenches with these hackers!

What can SVG learn from Flash?

Regular readers of my blog know that I also worked on the Macromedia Flash Professional authoring tool and the Adobe Flash Player for many years. I learned a great deal about the design of ubiquitous platforms, and the limitations of single-vendor implementations. At a recent meeting with the W3C SVG working group, I shared some of my thoughts on how Flash was able to reach critical mass across the Web, and how SVG can leverage those lessons for the future.

Basically, it boils down to 3 principles:
1. Flash offered expressive design-fidelity across all user agents.
2. Flash authoring was superior to SVG authoring tools for producing content that adheres to principle #1.
3. Most Flash content is self-contained and atomic in a packaged file format that helped preserve design-fidelity in #1.

I shared some feedback regarding what I hear from Firefox users about SVG. I also shared what I never hear from Firefox users: “We need more SVG features.”

As the working group ponders new SVG specifications for review, the main gripe I hear from users is the lack of interoperability for the current feature set. That is, I don’t get requests for a new DOM or fancy gradient meshes, I get bugs about basic rendering differences across browsers. As a result, I’ve directed our SVG investment towards these paper cuts that make authors distrust SVG for complex designs. I can see why it’s more tempting to focus on new feature specifications, but adoption is hampered by the legacy of interoperability (or lack thereof). I’d like to see the group organize around fixing these bugs across all browsers in a coordinated fashion, e.g., in a hackathon or bug bash at a future multi-browser face-to-face meeting.

I also talked about how SVG could be a very expressive authoring source format for a modern implementation that is more focused on pixel-fidelity. Unfortunately, I didn’t get a lot of support for that idea from other browser vendors, as the desire to compete for the best implementation seemed to outweigh the benefits of dependable runtime characteristics. I’m really surprised that SVG hasn’t stepped in to replace Flash for more use cases, and I’m quite certain that the 3 principles I mentioned above are the reason why. I do hope that authoring tool vendors step in and help drive the state of the art here. It’s one thing for browser vendors to offer competing implementations, but the lack of strong authoring systems makes it hard to define what it means to be correct.

I spoke with a few people about how the packaged SWF format was an advantage for Flash because it was easy to have this content move across the internet in a viral fashion without losing any of the assets. Flash games, for example, are commonly hosted on multiple servers (often unknown to the original publisher) and still retain all the graphics and logic within the SWF file. The W3C application package proposal is something we could implement as a format that lets HTML/SVG content traverse networks intact. It’s not hard for such HTML/SVG applications to be made up of hundreds of individual assets that are easy to lose track of. Having a packaged format with clear semantics and security rules (e.g., iframe in a zip) could be a really good feature for the modern web.

What else are we missing for SVG to gain critical mass? Post a reply below or find me on twitter!

FirefoxOS Dev Quick Start

I’m posting the steps I took to create the FirefoxOS dev environment for the Flame device. We use the Flame as our reference device on the Platform Rendering team. I had to re-do this recently on a new computer and I figure this might help others in the same boat. These steps assume you can already build the desktop version of Firefox on your computer.

  1. Get ADB
  2. Turn on ADB debugging on your device.
  3. Download the latest base image (v18D_nightly.zip at the time of this writing.)
  4. Unzip the base image archive and run the flash.sh script to update your phone to the latest base image. You’ll need to re-enable ADB debugging after this step.
  5. Clone the B2G repository and follow the prerequisite steps for local builds.
    Note: the device we target for the config.sh step is flame-kk (not the older flame device.)
  6. Get a coffee and wait for the long source download.
  7. Run ./build.sh in the B2G source directory to build.
  8. Run ./flash.sh in the B2G source directory to put your new build on to your phone.
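
The build half of these steps can be sketched as a script. I’m assuming the B2G repository lives at the GitHub URL below (adjust B2G_REPO if yours differs), and the step helper and RUN switch are my own additions. Flashing the base image is left out because it depends on the archive you downloaded. By default the script only prints the plan.

```shell
#!/bin/sh
# Sketch: clone, configure, build and flash B2G for the flame-kk device.
# B2G_REPO is an assumed URL; step, build_and_flash and RUN are hypothetical names.
B2G_REPO="${B2G_REPO:-https://github.com/mozilla-b2g/B2G.git}"
DEVICE="flame-kk"   # not the older "flame" target

# Print each command unless RUN=1 is set.
step() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

build_and_flash() {
  step adb devices || return 1               # confirm ADB debugging is on
  step git clone "$B2G_REPO" B2G || return 1
  step cd B2G || return 1
  step ./config.sh "$DEVICE" || return 1     # the long source download
  step ./build.sh || return 1
  step ./flash.sh
}

build_and_flash
```

Set RUN=1 once the printed plan looks right, and remember to re-enable ADB debugging after flashing the base image.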