Matt Brubeck: Planet Matt

Journal: So Gurgeh took his

Posted in Journal on August 17, 2015 12:26 AM (comments)

Iain Banks, in 1988 (!), correctly predicting the problems of suspense writing in the age of smart phones:

Stories set in the Culture in which Things Went Wrong tended to start with humans losing or forgetting or deliberately leaving behind their terminal. It was a conventional opening, the equivalent of straying off the path in the wild woods in one age, or a car breaking down at night on a lonely road in another. A terminal, in the shape of a ring, button, bracelet or pen or whatever, was your link with everybody and everything else in the Culture. With a terminal, you were never more than a question or a shout away from almost anything you wanted to know, or almost any help you could possibly need.

The Player of Games

Bookmarks: Understanding Hydroplane Races for the New Seattleite

Posted in Bookmarks on August 06, 2015 10:59 PM

"Before Microsoft, Amazon, Starbucks, etc., there were basically three major Seattle industries: (1) logging and lumber based industries like paper manufacturing; (2) maritime industries like fishing, shipbuilding, shipping, and the navy; (3) aerospace (i.e., Boeing). Vintage hydroplane racing represented the Seattle trifecta: Wooden boats with airplane engines!"

Journal: Let It Go, as sung by Taylor Swift

Posted in Journal on January 31, 2015 08:29 PM (comments)

(To the tune of Shake it Off. I blame cortex for inspiring me.)

My ice magic is a curse.
I think it's getting worse.
At least, that's what people say, mmm.
That's what people say, mmm.

My power's getting strong.
Father said that it was wrong.
That's why I conceal, mmm.
That's why I don't feel, mmm.

I froze a fountain, ran to the North Mountain.
It's like I've got this pounding, in my mind, saying it's gonna be alright.

'Cause Anna's gonna wed wed wed wed wed
Some guy that she just met met met met met.
Baby, I just wanna let let let let let,
Let it go, let it go.

Ice breakers gonna sled sled sled sled sled,
And reindeer gonna shed shed shed shed shed.
I'm just gonna let let let let let,
Let it go, let it go.

I'm sick of being nice.
I built a castle made of ice.
And that's what they don't know, mmm.
That's what they don't know, mmm.

I'm alone but now I'm free.
No right or wrong for me.
And that's what they don't see.
That's what they don't see.

I froze a fountain, ran to the North Mountain.
It's like I've got this pounding, in my mind, saying it's gonna be alright.

'Cause Anna's still in bed bed bed bed bed.
And Olaf lost his head head head.
Baby I'm just gonna let let let let let,
Let it go, let it go.

Trolls are gonna troll troll troll troll troll,
Snow men are made of snow snow snow snow snow.
I'm just gonna go go go go go,
Let it go, let it go.

Hey, hey, hey. While you've been getting down about the liars,
And the dirty dirty weasels of the world,
You could have been getting down to this. sick. beat.

My little sis brought her new boyfriend.
He's like "oh my god," so I'm just gonna shake.
And to the duke over there with the hella weird hair,
Won't you come on over baby, we can shake, shake, shake.

'Cause trolls are gonna troll troll troll troll troll,
Snow men are made of snow snow snow snow snow.
I'm just gonna go go go go go,
Let it go, let it go.

Ice breakers gonna sled sled sled sled sled,
And reindeer gonna shed shed shed shed shed.
I'm just gonna let let let let let,
Let it go, let it go.

I let it go, I let it go, oh.
I let it go, I let it go, oh.
I let it go, I let it go, oh.
I let it go, I let it go, oh.

Journal: A chip off the old block

Posted in Journal on November 30, 2014 06:18 AM (comments)

When E.’s new friend from school comes over to our house, the two of them like to spend half their time playing, and the other half sitting and reading books. Which is awesome.

Weblog: Let's build a browser engine! Part 7: Painting 101

Posted in Weblog on November 05, 2014 05:55 PM

I’m returning at last to my series on building a simple HTML rendering engine:

In this article, I will add very basic painting code. This code takes the tree of boxes from the layout module and turns them into an array of pixels. This process is also known as “rasterization.”

Browsers usually implement rasterization with the help of graphics APIs and libraries like Skia, Cairo, Direct2D, and so on. These APIs provide functions for painting polygons, lines, curves, gradients, and text. For now, I’m going to write my own rasterizer that can only paint one thing: rectangles.

Eventually I want to implement text rendering. At that point, I may throw away this toy painting code and switch to a “real” 2D graphics library. But for now, rectangles are sufficient to turn the output of my block layout algorithm into pictures.

Catching Up

Since my last post, I’ve made some small changes to the code from previous articles. These includes some minor refactoring, and some updates to keep the code compatible with the latest Rust nightly builds. None of these changes are vital to understanding the code, but if you’re curious, check the commit history.

Building the Display List

Before painting, we will walk through the layout tree and build a display list. This is a list of graphics operations like “draw a circle” or “draw a string of text.” Or in our case, just “draw a rectangle.”

Why put commands into a display list, rather than execute them immediately? The display list is useful for a several reasons. You can search it for items that will be completely covered up by later operations, and remove them to eliminate wasted painting. You can modify and re-use the display list in cases where you know only certain items have changed. And you can use the same display list to generate different types of output: for example, pixels for displaying on a screen, or vector graphics for sending to a printer.

Robinson’s display list is a vector of DisplayCommands. For now there is only one type of DisplayCommand, a solid-color rectangle:

type DisplayList = Vec<DisplayCommand>;

enum DisplayCommand {
    SolidColor(Color, Rect),
    // insert more commands here

To build the display list, we walk through the layout tree and generate a series of commands for each box. First we draw the box’s background, then we draw its borders and content on top of the background.

fn build_display_list(layout_root: &LayoutBox) -> DisplayList {
    let mut list = Vec::new();
    render_layout_box(&mut list, layout_root);
    return list;

fn render_layout_box(list: &mut DisplayList, layout_box: &LayoutBox) {
    render_background(list, layout_box);
    render_borders(list, layout_box);
    // TODO: render text

    for child in &layout_box.children {
        render_layout_box(list, child);

By default, HTML elements are stacked in the order they appear: If two elements overlap, the later one is drawn on top of the earlier one. This is reflected in our display list, which will draw the elements in the same order they appear in the DOM tree. If this code supported the z-index property, then individual elements would be able to override this stacking order, and we’d need to sort the display list accordingly.

The background is easy. It’s just a solid rectangle. If no background color is specified, then the background is transparent and we don’t need to generate a display command.

fn render_background(list: &mut DisplayList, layout_box: &LayoutBox) {
    get_color(layout_box, "background").map(|color|
        list.push(DisplayCommand::SolidColor(color, layout_box.dimensions.border_box())));

// Return the specified color for CSS property `name`, or None if no color was specified.
fn get_color(layout_box: &LayoutBox, name: &str) -> Option<Color> {
    match layout_box.box_type {
        BlockNode(style) | InlineNode(style) => match style.value(name) {
            Some(Value::ColorValue(color)) => Some(color),
            _ => None
        AnonymousBlock => None

The borders are similar, but instead of a single rectangle we draw four—one for each edge of the box.

fn render_borders(list: &mut DisplayList, layout_box: &LayoutBox) {
    let color = match get_color(layout_box, "border-color") {
        Some(color) => color,
        _ => return // bail out if no border-color is specified

    let d = &layout_box.dimensions;
    let border_box = d.border_box();

    // Left border
    list.push(DisplayCommand::SolidColor(color, Rect {
        x: border_box.x,
        y: border_box.y,
        width: d.border.left,
        height: border_box.height,

    // Right border
    list.push(DisplayCommand::SolidColor(color, Rect {
        x: border_box.x + border_box.width - d.border.right,
        y: border_box.y,
        width: d.border.right,
        height: border_box.height,

    // Top border
    list.push(DisplayCommand::SolidColor(color, Rect {
        x: border_box.x,
        y: border_box.y,
        width: border_box.width,

    // Bottom border
    list.push(DisplayCommand::SolidColor(color, Rect {
        x: border_box.x,
        y: border_box.y + border_box.height - d.border.bottom,
        width: border_box.width,
        height: d.border.bottom,

Next the rendering function will draw each of the box’s children, until the entire layout tree has been translated into display commands.


Now that we’ve built the display list, we need to turn it into pixels by executing each DisplayCommand. We’ll store the pixels in a Canvas:

struct Canvas {
    pixels: Vec<Color>,
    width: usize,
    height: usize,

impl Canvas {
    // Create a blank canvas
    fn new(width: usize, height: usize) -> Canvas {
        let white = Color { r: 255, g: 255, b: 255, a: 255 };
        return Canvas {
            pixels: repeat(white).take(width * height).collect(),
            width: width,
            height: height,
    // ...

To paint a rectangle on the canvas, we just loop through its rows and columns, using a helper method to make sure we don’t go outside the bounds of our canvas.

fn paint_item(&mut self, item: &DisplayCommand) {
    match item {
        &DisplayCommand::SolidColor(color, rect) => {
            // Clip the rectangle to the canvas boundaries.
            let x0 = rect.x.clamp(0.0, self.width as f32) as usize;
            let y0 = rect.y.clamp(0.0, self.height as f32) as usize;
            let x1 = (rect.x + rect.width).clamp(0.0, self.width as f32) as usize;
            let y1 = (rect.y + rect.height).clamp(0.0, self.height as f32) as usize;

            for y in (y0 .. y1) {
                for x in (x0 .. x1) {
                    // TODO: alpha compositing with existing pixel
                    self.pixels[x + y * self.width] = color;

Note that this code only works with opaque colors. If we added transparency (by reading the opacity property, or adding support for rgba() values in the CSS parser) then it would need to blend each new pixel with whatever it’s drawn on top of.

Now we can put everything together in the paint function, which builds a display list and then rasterizes it to a canvas:

// Paint a tree of LayoutBoxes to an array of pixels.
fn paint(layout_root: &LayoutBox, bounds: Rect) -> Canvas {
    let display_list = build_display_list(layout_root);
    let mut canvas = Canvas::new(bounds.width as usize, bounds.height as usize);
    for item in display_list {
    return canvas;

Lastly, we can write a few lines of code using the Rust Image library to save the array of pixels as a PNG file.

Pretty Pictures

At last, we’ve reached the end of our rendering pipeline. In under 1000 lines of code, robinson can now parse this HTML file:

<div class="a">
  <div class="b">
    <div class="c">
      <div class="d">
        <div class="e">
          <div class="f">
            <div class="g">

…and this CSS file:

* { display: block; padding: 12px; }
.a { background: #ff0000; }
.b { background: #ffa500; }
.c { background: #ffff00; }
.d { background: #008000; }
.e { background: #0000ff; }
.f { background: #4b0082; }
.g { background: #800080; }

…to produce this:



If you’re playing along at home, here are some things you might want to try:

  1. Write an alternate painting function that takes a display list and produces vector output (for example, an SVG file) instead of a raster image.

  2. Add support for opacity and alpha blending.

  3. Write a function to optimize the display list by culling items that are completely outside of the canvas bounds.

  4. If you’re familiar with OpenGL, write a hardware-accelerated painting function that uses GL shaders to draw the rectangles.

To Be Continued…

Now that we’ve got basic functionality for each stage in our rendering pipeline, it’s time to go back and fill in some of the missing features—in particular, inline layout and text rendering. Future articles may also add additional stages, like networking and scripting.

I’m going to give a short “Let’s build a browser engine!” talk at this month’s Bay Area Rust Meetup. The meetup is at 7pm tomorrow (Thursday, November 6) at Mozilla’s San Francisco office, and it will also feature talks on Servo by my fellow Servo developers. Video of the talks will be streamed live on Air Mozilla, and recordings will be published there later.

Weblog: A little randomness for Hacker News

Posted in Weblog on October 22, 2014 10:00 PM

In systems that rely heavily on “most popular” lists, like Reddit or Hacker News, the rich get richer while the poor stay poor. Since most people only look at the top of the charts, anything that’s not already listed has a much harder time being seen. You need visibility to get ratings, and you need ratings to get visibility.

Aggregators try to address this problem by promoting new items as well as popular ones. But this is hard to do effectively. For example, the “new” page at Hacker News gets only a fraction of the front page’s traffic. Most users want to see the best content, not wade through an unfiltered stream of links. Thus, very little input is available to decide which links get promoted to the front page.

As an experiment, I wrote a userscript that uses the Hacker News API to search for new or low-ranked links and randomly insert just one or two of them into the front page. It’s also available as a bookmarklet for those who can’t or don’t want to install the user script.

Install user script (may require a browser extension)

Randomize HN (drag to bookmark bar, or right-click to bookmark)

This gives readers a chance to see and vote on links that they otherwise wouldn’t, without altering their habits or wading through a ton of unfiltered content. Each user will see just one or two links per visit, but thanks to randomness a much larger number of links will be seen by the overall user population. My belief, though I can’t prove it, is that widespread use of this feature would improve the quality of the selection process.

The script isn’t perfect (search for FIXME in the source code for some known issues), but it works well enough to try out the idea. Unfortunately, the HN API doesn’t give access to all the data I’d like, and sometimes the script won’t find any suitable links to insert. (You can look at your browser’s console to see which which items were randomly inserted.) Ideally, this feature would be built in to Hacker News—and any other service that recommends “popular” items.

Bookmarks: Tips for buying notebook with decent graphics

Posted in Bookmarks on September 29, 2014 11:54 PM

Buying guide and benchmarks, including upcoming chipsets.

Weblog: Let's build a browser engine! Part 6: Block layout

Posted in Weblog on September 18, 2014 04:30 AM

Welcome back to my series on building a toy HTML rendering engine:

This article will continue the layout module that we started in Part 5. This time, we’ll add the ability to lay out block boxes. These are boxes that are stack vertically, such as headings and paragraphs.

To keep things simple, this code implements only normal flow: no floats, no absolute positioning, and no fixed positioning.

Traversing the Layout Tree

The entry point to this code is the layout function, which takes a takes a LayoutBox and calculates its dimensions. We’ll break this function into three cases, and implement only one of them for now:

impl LayoutBox {
    // Lay out a box and its descendants.
    fn layout(&mut self, containing_block: Dimensions) {
        match self.box_type {
            BlockNode(_) => self.layout_block(containing_block),
            InlineNode(_) => {} // TODO
            AnonymousBlock => {} // TODO

    // ...

A block’s layout depends on the dimensions of its containing block. For block boxes in normal flow, this is just the box’s parent. For the root element, it’s the size of the browser window (or “viewport”).

You may remember from the previous article that a block’s width depends on its parent, while its height depends on its children. This means that our code needs to traverse the tree top-down while calculating widths, so it can lay out the children after their parent’s width is known, and traverse bottom-up to calculate heights, so that a parent’s height is calculated after its children’s.

fn layout_block(&mut self, containing_block: Dimensions) {
    // Child width can depend on parent width, so we need to calculate
    // this box's width before laying out its children.

    // Determine where the box is located within its container.

    // Recursively lay out the children of this box.

    // Parent height can depend on child height, so `calculate_height`
    // must be called *after* the children are laid out.

This function performs a single traversal of the layout tree, doing width calculations on the way down and height calculations on the way back up. A real layout engine might perform several tree traversals, some top-down and some bottom-up.

Calculating the Width

The width calculation is the first step in the block layout function, and also the most complicated. I’ll walk through it step by step. To start, we need the values of the CSS width property and all the left and right edge sizes:

fn calculate_block_width(&mut self, containing_block: Dimensions) {
    let style = self.get_style_node();

    // `width` has initial value `auto`.
    let auto = Keyword("auto".to_string());
    let mut width = style.value("width").unwrap_or(auto.clone());

    // margin, border, and padding have initial value 0.
    let zero = Length(0.0, Px);

    let mut margin_left = style.lookup("margin-left", "margin", &zero);
    let mut margin_right = style.lookup("margin-right", "margin", &zero);

    let border_left = style.lookup("border-left-width", "border-width", &zero);
    let border_right = style.lookup("border-right-width", "border-width", &zero);

    let padding_left = style.lookup("padding-left", "padding", &zero);
    let padding_right = style.lookup("padding-right", "padding", &zero);

    // ...

This uses a helper function called lookup, which just tries a series of values in sequence. If the first property isn’t set, it tries the second one. If that’s not set either, it returns the given default value. This provides an incomplete (but simple) implementation of shorthand properties and initial values.

Note: This is similar to the following code in, say, JavaScript or Ruby:

margin_left = style["margin-left"] || style["margin"] || zero;

Since a child can’t change its parent’s width, it needs to make sure its own width fits the parent’s. The CSS spec expresses this as a set of constraints and an algorithm for solving them. The following code implements that algorithm.

First we add up the margin, padding, border, and content widths. The to_px helper method converts lengths to their numerical values. If a property is set to 'auto', it returns 0 so it doesn’t affect the sum.

let total = [&margin_left, &margin_right, &border_left, &border_right,
             &padding_left, &padding_right, &width].iter().map(|v| v.to_px()).sum();

This is the minimum horizontal space needed for the box. If this isn’t equal to the container width, we’ll need to adjust something to make it equal.

If the width or margins are set to 'auto', they can expand or contract to fit the available space. Following the spec, we first check if the box is too big. If so, we set any expandable margins to zero.

// If width is not auto and the total is wider than the container, treat auto margins as 0.
if width != auto && total > containing_block.content.width {
    if margin_left == auto {
        margin_left = Length(0.0, Px);
    if margin_right == auto {
        margin_right = Length(0.0, Px);

If the box is too large for its container, it overflows the container. If it’s too small, it will underflow, leaving extra space. We’ll calculate the underflow—the amount of extra space left in the container. (If this number is negative, it is actually an overflow.)

let underflow = containing_block.content.width - total;

We now follow the spec’s algorithm for eliminating any overflow or underflow by adjusting the expandable dimensions. If there are no 'auto' dimensions, we adjust the right margin. (Yes, this means the margin may be negative in the case of an overflow!)

match (width == auto, margin_left == auto, margin_right == auto) {
    // If the values are overconstrained, calculate margin_right.
    (false, false, false) => {
        margin_right = Length(margin_right.to_px() + underflow, Px);

    // If exactly one size is auto, its used value follows from the equality.
    (false, false, true) => { margin_right = Length(underflow, Px); }
    (false, true, false) => { margin_left  = Length(underflow, Px); }

    // If width is set to auto, any other auto values become 0.
    (true, _, _) => {
        if margin_left == auto { margin_left = Length(0.0, Px); }
        if margin_right == auto { margin_right = Length(0.0, Px); }

        if underflow >= 0.0 {
            // Expand width to fill the underflow.
            width = Length(underflow, Px);
        } else {
            // Width can't be negative. Adjust the right margin instead.
            width = Length(0.0, Px);
            margin_right = Length(margin_right.to_px() + underflow, Px);

    // If margin-left and margin-right are both auto, their used values are equal.
    (false, true, true) => {
        margin_left = Length(underflow / 2.0, Px);
        margin_right = Length(underflow / 2.0, Px);

At this point, the constraints are met and any 'auto' values have been converted to lengths. The results are the the used values for the horizontal box dimensions, which we will store in the layout tree. You can see the final code in


The next step is simpler. This function looks up the remanining margin/padding/border styles, and uses these along with the containing block dimensions to determine this block’s position on the page.

fn calculate_block_position(&mut self, containing_block: Dimensions) {
    let style = self.get_style_node();
    let d = &mut self.dimensions;

    // margin, border, and padding have initial value 0.
    let zero = Length(0.0, Px);

    // If margin-top or margin-bottom is `auto`, the used value is zero. = style.lookup("margin-top", "margin", &zero).to_px();
    d.margin.bottom = style.lookup("margin-bottom", "margin", &zero).to_px(); = style.lookup("border-top-width", "border-width", &zero).to_px();
    d.border.bottom = style.lookup("border-bottom-width", "border-width", &zero).to_px(); = style.lookup("padding-top", "padding", &zero).to_px();
    d.padding.bottom = style.lookup("padding-bottom", "padding", &zero).to_px();

    d.content.x = containing_block.content.x +
                  d.margin.left + d.border.left + d.padding.left;

    // Position the box below all the previous boxes in the container.
    d.content.y = containing_block.content.height + containing_block.content.y +
         + +;

Take a close look at that last statement, which sets the y position. This is what gives block layout its distinctive vertical stacking behavior. For this to work, we’ll need to make sure the parent’s content.height is updated after laying out each child.


Here’s the code that recursively lays out the box’s contents. As it loops through the child boxes, it keeps track of the total content height. This is used by the positioning code (above) to find the vertical position of the next child.

fn layout_block_children(&mut self) {
    let d = &mut self.dimensions;
    for child in &mut self.children {
        // Track the height so each child is laid out below the previous content.
        d.content.height = d.content.height + child.dimensions.margin_box().height;

The total vertical space taken up by each child is the height of its margin box, which we calculate like so:

impl Dimensions {
    // The area covered by the content area plus its padding.
    fn padding_box(self) -> Rect {
    // The area covered by the content area plus padding and borders.
    fn border_box(self) -> Rect {
    // The area covered by the content area plus padding, borders, and margin.
    fn margin_box(self) -> Rect {

impl Rect {
    fn expanded_by(self, edge: EdgeSizes) -> Rect {
        Rect {
            x: self.x - edge.left,
            y: self.y -,
            width: self.width + edge.left + edge.right,
            height: self.height + + edge.bottom,

For simplicity, this does not implement margin collapsing. A real layout engine would allow the bottom margin of one box to overlap the top margin of the next box, rather than placing each margin box completely below the previous one.

The ‘height’ Property

By default, the box’s height is equal to the height of its contents. But if the 'height' property is set to an explicit length, we’ll use that instead:

fn calculate_block_height(&mut self) {
    // If the height is set to an explicit length, use that exact length.
    // Otherwise, just keep the value set by `layout_block_children`.
    if let Some(Length(h, Px)) = self.get_style_node().value("height") {
        self.dimensions.content.height = h;

And that concludes the block layout algorithm. You can now call layout() on a styled HTML document, and it will spit out a bunch of rectangles with widths, heights, margins, etc. Cool, right?


Some extra ideas for the ambitious implementer:

  1. Collapsing vertical margins.

  2. Relative positioning.

  3. Parallelize the layout process, and measure the effect on performance.

If you try the parallelization project, you may want to separate the width calculation and the height calculation into two distinct passes. The top-down traversal for width is easy to parallelize just by spawning a separate task for each child. The height calculation is a little trickier, since you need to go back and adjust the y position of each child after its siblings are laid out.

To Be Continued…

Thank you to everyone who’s followed along this far!

These articles are taking longer and longer to write, as I journey further into unfamiliar areas of layout and rendering. There will be a longer hiatus before the next part as I experiment with font and graphics code, but I’ll resume the series as soon as I can.

Update: Part 7 is now ready.

Weblog: Let's build a browser engine! Part 5: Boxes

Posted in Weblog on September 08, 2014 11:16 PM

This is the latest in a series of articles about writing a simple HTML rendering engine:

This article will begin the layout module, which takes the style tree and translates it into a bunch of rectangles in a two-dimensional space. This is a big module, so I’m going to split it into several articles. Also, some of the code I share in this article may need to change as I write the code for the later parts.

The layout module’s input is the style tree from Part 4, and its output is yet another tree, the layout tree. This takes us one step further in our mini rendering pipeline:

I’ll start by talking about the basic HTML/CSS layout model. If you’ve ever learned to develop web pages you might be familiar with this already—but it may look a bit different from the implementer’s point of view.

The Box Model

Layout is all about boxes. A box is a rectangular section of a web page. It has a width, a height, and a position on the page. This rectangle is called the content area because it’s where the box’s content is drawn. The content may be text, image, video, or other boxes.

A box may also have padding, borders, and margins surrounding its content area. The CSS spec has a diagram showing how all these layers fit together.

Robinson stores a box’s content area and surrounding areas in the following structure. [Rust note: f32 is a 32-bit floating point type.]

// CSS box model. All sizes are in px.

struct Dimensions {
    // Position of the content area relative to the document origin:
    content: Rect,

    // Surrounding edges:
    padding: EdgeSizes,
    border: EdgeSizes,
    margin: EdgeSizes,

struct Rect {
    x: f32,
    y: f32,
    width: f32,
    height: f32,

struct EdgeSizes {
    left: f32,
    right: f32,
    top: f32,
    bottom: f32,

Block and Inline Layout

Note: This section contains diagrams that won't make sense if you are reading them without the associated visual styles. If you are reading this in a feed reader, try opening the original page in a regular browser tab. I also included text descriptions for those of you using screen readers or other assistive technologies.

The CSS display property determines which type of box an element generates. CSS defines several box types, each with its own layout rules. I’m only going to talk about two of them: block and inline.

I’ll use this bit of pseudo-HTML to illustrate the difference:


Block boxes are placed vertically within their container, from top to bottom.

a, b, c, d { display: block; }

Description: The diagram below shows four rectangles in a vertical stack.


Inline boxes are placed horizontally within their container, from left to right. If they reach the right edge of the container, they will wrap around and continue on a new line below.

a, b, c, d { display: inline; }

Description: The diagram below shows boxes `a`, `b`, and `c` in a horizontal line from left to right, and box `d` in the next line.


Each box must contain only block children, or only inline children. When an DOM element contains a mix of block and inline children, the layout engine inserts anonymous boxes to separate the two types. (These boxes are “anonymous” because they aren’t associated with nodes in the DOM tree.)

In this example, the inline boxes b and c are surrounded by an anonymous block box, shown in pink:

a    { display: block; }
b, c { display: inline; }
d    { display: block; }

Description: The diagram below shows three boxes in a vertical stack. The first is labeled `a`; the second contains two boxes in a horizonal row labeled `b` and `c`; the third box in the stack is labeled `d`.


Note that content grows vertically by default. That is, adding children to a container generally makes it taller, not wider. Another way to say this is that, by default, the width of a block or line depends on its container’s width, while the height of a container depends on its children’s heights.

This gets more complicated if you override the default values for properties like width and height, and way more complicated if you want to support features like vertical writing.

The Layout Tree

The layout tree is a collection of boxes. A box has dimensions, and it may contain child boxes.

struct LayoutBox<'a> {
    dimensions: Dimensions,
    box_type: BoxType<'a>,
    children: Vec<LayoutBox<'a>>,

A box can be a block node, an inline node, or an anonymous block box. (This will need to change when I implement text layout, because line wrapping can cause a single inline node to split into multiple boxes. But it will do for now.)

enum BoxType<'a> {
    BlockNode(&'a StyledNode<'a>),
    InlineNode(&'a StyledNode<'a>),

To build the layout tree, we need to look at the display property for each DOM node. I added some code to the style module to get the display value for a node. If there’s no specified value it returns the initial value, 'inline'.

enum Display {

impl StyledNode {
    // Return the specified value of a property if it exists, otherwise `None`.
    fn value(&self, name: &str) -> Option<Value> {
        self.specified_values.get(name).map(|v| v.clone())

    // The value of the `display` property (defaults to inline).
    fn display(&self) -> Display {
        match self.value("display") {
            Some(Keyword(s)) => match &*s {
                "block" => Display::Block,
                "none" => Display::None,
                _ => Display::Inline
            _ => Display::Inline

Now we can walk through the style tree, build a LayoutBox for each node, and then insert boxes for the node’s children. If a node’s display property is set to 'none' then it is not included in the layout tree.

// Build the tree of LayoutBoxes, but don't perform any layout calculations yet.
fn build_layout_tree<'a>(style_node: &'a StyledNode<'a>) -> LayoutBox<'a> {
    // Create the root box.
    let mut root = LayoutBox::new(match style_node.display() {
        Block => BlockNode(style_node),
        Inline => InlineNode(style_node),
        DisplayNone => panic!("Root node has display: none.")

    // Create the descendant boxes.
    for child in &style_node.children {
        match child.display() {
            Block => root.children.push(build_layout_tree(child)),
            Inline => root.get_inline_container().children.push(build_layout_tree(child)),
            DisplayNone => {} // Skip nodes with `display: none;`
    return root;

impl LayoutBox {
    // Constructor function
    fn new(box_type: BoxType) -> LayoutBox {
        LayoutBox {
            box_type: box_type,
            dimensions: Default::default(), // initially set all fields to 0.0
            children: Vec::new(),
    // ...

If a block node contains an inline child, create an anonymous block box to contain it. If there are several inline children in a row, put them all in the same anonymous container.

// Where a new inline child should go.
fn get_inline_container(&mut self) -> &mut LayoutBox {
    match self.box_type {
        InlineNode(_) | AnonymousBlock => self,
        BlockNode(_) => {
            // If we've just generated an anonymous block box, keep using it.
            // Otherwise, create a new one.
            match self.children.last() {
                Some(&LayoutBox { box_type: AnonymousBlock,..}) => {}
                _ => self.children.push(LayoutBox::new(AnonymousBlock))

This is intentionally simplified in a number of ways from the standard CSS box generation algorithm. For example, it doesn’t handle the case where an inline box contains a block-level child. Also, it generates an unnecessary anonymous box if a block-level node has only inline children.

To Be Continued…

Whew, that took longer than I expected. I think I’ll stop here for now, but don’t worry: Part 6 is coming soon, and will cover block-level layout.

Once block layout is finished, we could jump ahead to the next stage of the pipeline: painting! I think I might do that, because then we can finally see the rendering engine’s output as pretty pictures instead of just numbers.

However, the pictures will just be a bunch of colored rectangles, unless we finish the layout module by implementing inline layout and text layout. If I don’t implement those before moving on to painting, I hope to come back to them afterward.

Weblog: Let's build a browser engine! Part 4: Style

Posted in Weblog on August 23, 2014 10:45 PM

Welcome back to my series on building your own toy browser engine. If you’re just tuning in, you can find the previous episodes here:

This article will cover what the CSS standard calls assigning property values, or what I call the style module. This module takes DOM nodes and CSS rules as input, and matches them up to determine the value of each CSS property for any given node.

This part doesn’t contain a lot of code, since I didn’t implement the really complicated parts. However, I think what’s left is still quite interesting, and I’ll also explain how some of the missing pieces can be implemented.

The Style Tree

The output of robinson’s style module is something I call the style tree. Each node in this tree includes a pointer to a DOM node, plus its CSS property values:

// Map from CSS property names to values.
type PropertyMap = HashMap<String, Value>;

// A node with associated style data.
struct StyledNode<'a> {
    node: &'a Node, // pointer to a DOM node
    specified_values: PropertyMap,
    children: Vec<StyledNode<'a>>,

What’s with all the 'a stuff? Those are lifetimes, part of how Rust guarantees that pointers are memory-safe without requiring garbage collection. If you’re not working in Rust you can ignore them; they aren’t critical to the code’s meaning.

We could add new fields to the dom::Node struct instead of creating a new tree, but I wanted to keep style code out of the earlier “lessons.” This also gives me an opportunity to talk about the parallel trees that inhabit most rendering engines.

A browser engine module often takes one tree as input, and produces a different but related tree as output. For example, Gecko’s layout code takes a DOM tree and produces a frame tree, which is then used to build a view tree. Blink and WebKit transform the DOM tree into a render tree. Later stages in all these engines produce still more trees, including layer trees and widget trees.

The pipeline for our toy browser engine will look something like this, after we complete a few more stages:

In my implementation, each node in the DOM tree has exactly one node in the style tree. But in a more complicated pipeline stage, several input nodes could collapse into a single output node. Or an input node might expand into several output nodes, or be skipped completely. For example, the style tree could exclude elements whose display property is set to 'none'. (Instead I’ll remove these in the layout stage, because my code turned out a bit simpler that way.)

Selector Matching

The first step in building the style tree is selector matching. This will be very easy, since my CSS parser supports only simple selectors. You can tell whether a simple selector matches an element just by looking at the element itself. Matching compound selectors would require traversing the DOM tree to look at the element’s siblings, parents, etc.

fn matches(elem: &ElementData, selector: &Selector) -> bool {
    match *selector {
        Simple(ref simple_selector) => matches_simple_selector(elem, simple_selector)

To help, we’ll add some convenient ID and class accessors to our DOM element type. The class attribute can contain multiple class names separated by spaces, which we return in a hash table.

impl ElementData {
    pub fn id(&self) -> Option<&String> {

    pub fn classes(&self) -> HashSet<&str> {
        match self.attributes.get("class") {
            Some(classlist) => classlist.split(' ').collect(),
            None => HashSet::new()

To test whether a simple selector matches an element, just look at each selector component, and return false if the element doesn’t have a matching class, ID, or tag name.

fn matches_simple_selector(elem: &ElementData, selector: &SimpleSelector) -> bool {
    // Check type selector
    if selector.tag_name.iter().any(|name| elem.tag_name != *name) {
        return false;

    // Check ID selector
    if|id| != Some(id)) {
        return false;

    // Check class selectors
    let elem_classes = elem.classes();
    if selector.class.iter().any(|class| !elem_classes.contains(&**class)) {
        return false;

    // We didn't find any non-matching selector components.
    return true;

Rust note: This function uses the any method, which returns true if an iterator contains an element that passes the provided test. This is the same as the any function in Python (or Haskell), or the some method in JavaScript.

Building the Style Tree

Next we need to traverse the DOM tree. For each element in the tree, we will search the stylesheet for matching rules.

When comparing two rules that match the same element, we need to use the highest-specificity selector from each match. Because our CSS parser stores the selectors from most- to least-specific, we can stop as soon as we find a matching one, and return its specificity along with a pointer to the rule.

type MatchedRule<'a> = (Specificity, &'a Rule);

// If `rule` matches `elem`, return a `MatchedRule`. Otherwise return `None`.
fn match_rule<'a>(elem: &ElementData, rule: &'a Rule) -> Option<MatchedRule<'a>> {
    // Find the first (highest-specificity) matching selector.
        .find(|selector| matches(elem, *selector))
        .map(|selector| (selector.specificity(), rule))

To find all the rules that match an element we call filter_map, which does a linear scan through the style sheet, checking every rule and throwing out ones that don’t match. A real browser engine would speed this up by storing the rules in multiple hash tables based on tag name, id, class, etc.

// Find all CSS rules that match the given element.
fn matching_rules<'a>(elem: &ElementData, stylesheet: &'a Stylesheet) -> Vec<MatchedRule<'a>> {
    stylesheet.rules.iter().filter_map(|rule| match_rule(elem, rule)).collect()

Once we have the matching rules, we can find the specified values for the element. We insert each rule’s property values into a HashMap. We sort the matches by specificity, so the more-specific rules are processed after the less-specific ones, and can overwrite their values in the HashMap.

// Apply styles to a single element, returning the specified values.
fn specified_values(elem: &ElementData, stylesheet: &Stylesheet) -> PropertyMap {
    let mut values = HashMap::new();
    let mut rules = matching_rules(elem, stylesheet);

    // Go through the rules from lowest to highest specificity.
    rules.sort_by(|&(a, _), &(b, _)| a.cmp(&b));
    for (_, rule) in rules {
        for declaration in &rule.declarations {
            values.insert(, declaration.value.clone());
    return values;

Now we have everything we need to walk through the DOM tree and build the style tree. Note that selector matching works only on elements, so the specified values for a text node are just an empty map.

// Apply a stylesheet to an entire DOM tree, returning a StyledNode tree.
pub fn style_tree<'a>(root: &'a Node, stylesheet: &'a Stylesheet) -> StyledNode<'a> {
    StyledNode {
        node: root,
        specified_values: match root.node_type {
            Element(ref elem) => specified_values(elem, stylesheet),
            Text(_) => HashMap::new()
        children: root.children.iter().map(|child| style_tree(child, stylesheet)).collect(),

That’s all of robinson’s code for building the style tree. Next I’ll talk about some glaring omissions.

The Cascade

Style sheets provided by the author of a web page are called author style sheets. In addition to these, browsers also provide default styles via user agent style sheets. And they may allow users to add custom styles through user style sheets (like Gecko’s userContent.css).

The cascade defines which of these three “origins” takes precedence over another. There are six levels to the cascade: one for each origin’s “normal” declarations, plus one for each origin’s !important declarations.

Robinson’s style code does not implement the cascade; it takes only a single style sheet. The lack of a default style sheet means that HTML elements will not have any of the default styles you might expect. For example, the <head> element’s contents will not be hidden unless you explicitly add this rule to your style sheet:

head { display: none; }

Implementing the cascade should by fairly easy: Just track the origin of each rule, and sort declarations by origin and importance in addition to specificity. A simplified, two-level cascade should be enough to support the most common cases: normal user agent styles and normal author styles.

Computed Values

In addition to the “specified values” mentioned above, CSS defines initial, computed, used, and actual values.

Initial values are defaults for properties that aren’t specified in the cascade. Computed values are based on specified values, but may have some property-specific normalization rules applied.

Implementing these correctly requires separate code for each property, based on its definition in the CSS specs. This work is necessary for a real-world browser engine, but I’m hoping to avoid it in this toy project. In later stages, code that uses these values will (sort of) simulate initial values by using a default when the specified value is missing.

Used values and actual values are calculated during and after layout, which I’ll cover in future articles.


If text nodes can’t match selectors, how do they get colors and fonts and other styles? The answer is inheritance.

When a property is inherited, any node without a cascaded value will receive its parent’s value for that property. Some properties, like 'color', are inherited by default; others only if the cascade specifies the special value 'inherit'.

My code does not support inheritance. To implement it, you could pass the parent’s style data into the specified_values function, and use a hard-coded lookup table to decide which properties should be inherited.

Style Attributes

Any HTML element can include a style attribute containing a list of CSS declarations. There are no selectors, because these declarations automatically apply only to the element itself.

<span style="color: red; background: yellow;">

If you want to support the style attribute, make the specified_values function check for the attribute. If the attribute is present, pass it to parse_declarations from the CSS parser. Apply the resulting declarations after the normal author declarations, since the attribute is more specific than any CSS selector.


In addition to writing your own selector matching and value assignment code, for further exercise you can implement one or more of the missing pieces discussed above, in your own project or a fork of robinson:

  1. Cascading
  2. Initial and/or computed values
  3. Inheritance
  4. The style attribute

Also, if you extended the CSS parser from Part 3 to include compound selectors, you can now implement matching for those compound selectors.

To Be Continued…

Part 5 will introduce the layout module. I haven’t finished the code for this yet, so there will be another delay before I can start writing the article. I plan to split layout into at least two articles (one for block layout and one for inline layout, probably).

In the meantime, I’d love to see anything you’ve created based on these articles or exercises. If your code is online somewhere, feel free to add a link to the comments below! So far I have seen Martin Tomasi’s Java implementation and Pohl Longsine’s Swift version.

Weblog: Let's build a browser engine! Part 3: CSS

Posted in Weblog on August 13, 2014 07:30 PM

This is the third in a series of articles on building a toy browser rendering engine. Want to build your own? Start at the beginning to learn more:

This article introduces code for reading Cascading Style Sheets (CSS). As usual, I won’t try to cover everything in the spec. Instead, I tried to implement just enough to illustrate some concepts and produce input for later stages in the rendering pipeline.

Anatomy of a Stylesheet

Here’s an example of CSS source code:

h1, h2, h3 { margin: auto; color: #cc0000; }
div.note { margin-bottom: 20px; padding: 10px; }
#answer { display: none; }

Next I’ll walk through the css module from my toy browser engine, robinson. The code is written in Rust, though the concepts should translate pretty easily into other programming languages. Reading the previous articles first might help you understand some the code below.

A CSS stylesheet is a series of rules. (In the example stylesheet above, each line contains one rule.)

struct Stylesheet {
    rules: Vec<Rule>,

A rule includes one or more selectors separated by commas, followed by a series of declarations enclosed in braces.

struct Rule {
    selectors: Vec<Selector>,
    declarations: Vec<Declaration>,

A selector can be a simple selector, or it can be a chain of selectors joined by combinators. Robinson supports only simple selectors for now.

Note: Confusingly, the newer Selectors Level 3 standard uses the same terms to mean slightly different things. In this article I’ll mostly refer to CSS2.1. Although outdated, it’s a useful starting point because it’s smaller and more self-contained (compared to CSS3, which is split into myriad specs that depend on each other and CSS2.1).

In robinson, a simple selector can include a tag name, an ID prefixed by '#', any number of class names prefixed by '.', or some combination of the above. If the tag name is empty or '*' then it is a “universal selector” that can match any tag.

There are many other types of selector (especially in CSS3), but this will do for now.

enum Selector {

struct SimpleSelector {
    tag_name: Option<String>,
    id: Option<String>,
    class: Vec<String>,

A declaration is just a name/value pair, separated by a colon and ending with a semicolon. For example, "margin: auto;" is a declaration.

struct Declaration {
    name: String,
    value: Value,

My toy engine supports only a handful of CSS’s many value types.

enum Value {
    Length(f32, Unit),
    // insert more values here

enum Unit {
    // insert more units here

struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,

Rust note: u8 is an 8-bit unsigned integer, and f32 is a 32-bit float.

All other CSS syntax is unsupported, including @-rules, comments, and any selectors/values/units not mentioned above.


CSS has a regular grammar, making it easier to parse correctly than its quirky cousin HTML. When a standards-compliant CSS parser encounters a parse error, it discards the unrecognized part of the stylesheet but still processes the remaining portions. This is useful because it allows stylesheets to include new syntax but still produce well-defined output in older browsers.

Robinson uses a very simplistic (and totally not standards-compliant) parser, built the same way as the HTML parser from Part 2. Rather than go through the whole thing line-by-line again, I’ll just paste in a few snippets. For example, here is the code for parsing a single selector:

// Parse one simple selector, e.g.: `type#id.class1.class2.class3`
fn parse_simple_selector(&mut self) -> SimpleSelector {
    let mut selector = SimpleSelector { tag_name: None, id: None, class: Vec::new() };
    while !self.eof() {
        match self.next_char() {
            '#' => {
       = Some(self.parse_identifier());
            '.' => {
            '*' => {
                // universal selector
            c if valid_identifier_char(c) => {
                selector.tag_name = Some(self.parse_identifier());
            _ => break
    return selector;

Note the lack of error checking. Some malformed input like ### or *foo* will parse successfully and produce weird results. A real CSS parser would discard these invalid selectors.


Specificity is one of the ways a rendering engine decides which style overrides the other in a conflict. If a stylesheet contains two rules that match an element, the rule with the matching selector of higher specificity can override values from the one with lower specificity.

The specificity of a selector is based on its components. An ID selector is more specific than a class selector, which is more specific than a tag selector. Within each of these “levels,” more selectors beats fewer.

pub type Specificity = (usize, usize, usize);

impl Selector {
    pub fn specificity(&self) -> Specificity {
        let Selector::Simple(ref simple) = *self;
        let a =;
        let b = simple.class.len();
        let c = simple.tag_name.iter().count();
        (a, b, c)

(If we supported chained selectors, we could calculate the specificity of a chain just by adding up the specificities of its parts.)

The selectors for each rule are stored in a sorted vector, most-specific first. This will be important in matching, which I’ll cover in the next article.

// Parse a rule set: `<selectors> { <declarations> }`.
fn parse_rule(&mut self) -> Rule {
    Rule {
        selectors: self.parse_selectors(),
        declarations: self.parse_declarations()

// Parse a comma-separated list of selectors.
fn parse_selectors(&mut self) -> Vec<Selector> {
    let mut selectors = Vec::new();
    loop {
        match self.next_char() {
            ',' => { self.consume_char(); self.consume_whitespace(); }
            '{' => break, // start of declarations
            c   => panic!("Unexpected character {} in selector list", c)
    // Return selectors with highest specificity first, for use in matching.
    selectors.sort_by(|a,b| b.specificity().cmp(&a.specificity()));
    return selectors;

The rest of the CSS parser is fairly straightforward. You can read the whole thing on GitHub. And if you didn’t already do it for Part 2, this would be a great time to try out a parser generator. My hand-rolled parser gets the job done for simple example files, but it has a lot of hacky bits and will fail badly if you violate its assumptions. Someday I might replace it with one built on rust-peg or similar.


As before, you should decide which of these exercises you want to do, and skip the rest:

  1. Implement your own simplified CSS parser and specificity calculation.

  2. Extend robinson’s CSS parser to support more values, or one or more selector combinators.

  3. Extend the CSS parser to discard any declaration that contains a parse error, and follow the error handling rules to resume parsing after the end of the declaration.

  4. Make the HTML parser pass the contents of any <style> nodes to the CSS parser, and return a Document object that includes a list of Stylesheets in addition to the DOM tree.


Just like in Part 2, you can skip parsing by hard-coding CSS data structures directly into your program, or by writing them in an alternate format like JSON that you already have a parser for.

To Be Continued…

The next article will introduce the style module. This is where everything starts to come together, with selector matching to apply CSS styles to DOM nodes.

The pace of this series might slow down soon, since I’ll be busy later this month and I haven’t even written the code for some of the upcoming articles. I’ll keep them coming as fast as I can!

Weblog: Let's build a browser engine! Part 2: HTML

Posted in Weblog on August 11, 2014 03:00 PM

This is the second in a series of articles on building a toy browser rendering engine:

This article is about parsing HTML source code to produce a tree of DOM nodes. Parsing is a fascinating topic, but I don’t have the time or expertise to give it the introduction it deserves. You can get a detailed introduction to parsing from any good course or book on compilers. Or get a hands-on start by going through the documentation for a parser generator that works with your chosen programming language.

HTML has its own unique parsing algorithm. Unlike parsers for most programming languages and file formats, the HTML parsing algorithm does not reject invalid input. Instead it includes specific error-handling instructions, so web browsers can agree on how to display every web page, even ones that don’t conform to the syntax rules. Web browsers have to do this to be usable: Since non-conforming HTML has been supported since the early days of the web, it is now used in a huge portion of existing web pages.

A Simple HTML Dialect

I didn’t even try to implement the standard HTML parsing algorithm. Instead I wrote a basic parser for a tiny subset of HTML syntax. My parser can handle simple pages like this:

        <div id="main" class="test">
            <p>Hello <em>world</em>!</p>

The following syntax is allowed:

  • Balanced tags: <p>...</p>
  • Attributes with quoted values: id="main"
  • Text nodes: <em>world</em>

Everything else is unsupported, including:

  • Comments
  • Doctype declarations
  • Escaped characters (like &amp;) and CDATA sections
  • Self-closing tags: <br/> or <br> with no closing tag
  • Error handling (e.g. unbalanced or improperly nested tags)
  • Namespaces and other XHTML syntax: <html:body>
  • Character encoding detection

At each stage of this project I’m writing more or less the minimum code needed to support the later stages. But if you want to learn more about parsing theory and tools, you can be much more ambitious in your own project!

Example Code

Next, let’s walk through my toy HTML parser, keeping in mind that this is just one way to do it (and probably not the best way). Its structure is based loosely on the tokenizer module from Servo’s cssparser library. It has no real error handling; in most cases, it just aborts when faced with unexpected syntax. The code is in Rust, but I hope it’s fairly readable to anyone who’s used similar-looking languages like Java, C++, or C#. It makes use of the DOM data structures from part 1.

The parser stores its input string and a current position within the string. The position is the index of the next character we haven’t processed yet.

struct Parser {
    pos: usize, // "usize" is an unsigned integer, similar to "size_t" in C
    input: String,

We can use this to implement some simple methods for peeking at the next characters in the input:

impl Parser {
    // Read the current character without consuming it.
    fn next_char(&self) -> char {

    // Do the next characters start with the given string?
    fn starts_with(&self, s: &str) -> bool {
        self.input[self.pos ..].starts_with(s)

    // Return true if all input is consumed.
    fn eof(&self) -> bool {
        self.pos >= self.input.len()

    // ...

Rust strings are stored as UTF-8 byte arrays. To go to the next character, we can’t just advance by one byte. Instead we use char_indices which correctly handles multi-byte characters. (If our string used fixed-width characters, we could just increment pos by 1.)

// Return the current character, and advance self.pos to the next character.
fn consume_char(&mut self) -> char {
    let mut iter = self.input[self.pos..].char_indices();
    let (_, cur_char) =;
    let (next_pos, _) =, ' '));
    self.pos += next_pos;
    return cur_char;

Often we will want to consume a string of consecutive characters. The consume_while method consumes characters that meet a given condition, and returns them as a string. This method’s argument is a function that takes a char and returns a bool.

// Consume characters until `test` returns false.
fn consume_while<F>(&mut self, test: F) -> String
        where F: Fn(char) -> bool {
    let mut result = String::new();
    while !self.eof() && test(self.next_char()) {
    return result;

We can use this to ignore a sequence of space characters, or to consume a string of alphanumeric characters:

// Consume and discard zero or more whitespace characters.
fn consume_whitespace(&mut self) {

// Parse a tag or attribute name.
fn parse_tag_name(&mut self) -> String {
    self.consume_while(|c| match c {
        'a'...'z' | 'A'...'Z' | '0'...'9' => true,
        _ => false

Now we’re ready to start parsing HTML. To parse a single node, we look at its first character to see if it is an element or a text node. In our simplified version of HTML, a text node can contain any character except <.

// Parse a single node.
fn parse_node(&mut self) -> dom::Node {
    match self.next_char() {
        '<' => self.parse_element(),
        _   => self.parse_text()

// Parse a text node.
fn parse_text(&mut self) -> dom::Node {
    dom::text(self.consume_while(|c| c != '<'))

An element is more complicated. It includes opening and closing tags, and between them any number of child nodes:

// Parse a single element, including its open tag, contents, and closing tag.
fn parse_element(&mut self) -> dom::Node {
    // Opening tag.
    assert!(self.consume_char() == '<');
    let tag_name = self.parse_tag_name();
    let attrs = self.parse_attributes();
    assert!(self.consume_char() == '>');

    // Contents.
    let children = self.parse_nodes();

    // Closing tag.
    assert!(self.consume_char() == '<');
    assert!(self.consume_char() == '/');
    assert!(self.parse_tag_name() == tag_name);
    assert!(self.consume_char() == '>');

    return dom::elem(tag_name, attrs, children);

Parsing attributes is pretty easy in our simplified syntax. Until we reach the end of the opening tag (>) we repeatedly look for a name followed by = and then a string enclosed in quotes.

// Parse a single name="value" pair.
fn parse_attr(&mut self) -> (String, String) {
    let name = self.parse_tag_name();
    assert!(self.consume_char() == '=');
    let value = self.parse_attr_value();
    return (name, value);

// Parse a quoted value.
fn parse_attr_value(&mut self) -> String {
    let open_quote = self.consume_char();
    assert!(open_quote == '"' || open_quote == '\'');
    let value = self.consume_while(|c| c != open_quote);
    assert!(self.consume_char() == open_quote);
    return value;

// Parse a list of name="value" pairs, separated by whitespace.
fn parse_attributes(&mut self) -> dom::AttrMap {
    let mut attributes = HashMap::new();
    loop {
        if self.next_char() == '>' {
        let (name, value) = self.parse_attr();
        attributes.insert(name, value);
    return attributes;

To parse the child nodes, we recursively call parse_node in a loop until we reach the closing tag. This function returns a Vec, which is Rust’s name for a growable array.

// Parse a sequence of sibling nodes.
fn parse_nodes(&mut self) -> Vec<dom::Node> {
    let mut nodes = Vec::new();
    loop {
        if self.eof() || self.starts_with("</") {
    return nodes;

Finally, we can put this all together to parse an entire HTML document into a DOM tree. This function will create a root node for the document if it doesn’t include one explicitly; this is similar to what a real HTML parser does.

// Parse an HTML document and return the root element.
pub fn parse(source: String) -> dom::Node {
    let mut nodes = Parser { pos: 0, input: source }.parse_nodes();

    // If the document contains a root element, just return it. Otherwise, create one.
    if nodes.len() == 1 {
    } else {
        dom::elem("html".to_string(), HashMap::new(), nodes)

That’s it! The entire code for the robinson HTML parser. The whole thing weighs in at just over 100 lines of code (not counting blank lines and comments). If you use a good library or parser generator, you can probably build a similar toy parser in even less space.


Here are a few alternate ways to try this out yourself. As before, you can choose one or more of them and ignore the others.

  1. Build a parser (either “by hand” or with a library or parser generator) that takes a subset of HTML as input and produces a tree of DOM nodes.

  2. Modify robinson’s HTML parser to add some missing features, like comments. Or replace it with a better parser, perhaps built with a library or generator.

  3. Create an invalid HTML file that causes your parser (or mine) to fail. Modify the parser to recover from the error and produce a DOM tree for your test file.


If you want to skip parsing completely, you can build a DOM tree programmatically instead, by adding some code like this to your program (in pseudo-code; adjust it to match the DOM code you wrote in Part 1):

// <html><body>Hello, world!</body></html>
let root = element("html");
let body = element("body");
body.children.push(text("Hello, world!"));

Or you can find an existing HTML parser and incorporate it into your program.

The next article in this series will cover CSS data structures and parsing.