Let’s build a browser engine!

by Matt Brubeck


limpet.net/mbrubeck

You want to understand how a web browser works

because…

  • you’re curious.
  • you want to fix a bug in your browser.
  • you want to write better web applications.
  • you want to propose new web standards or features.

You didn’t learn about layout engines in school.

(Or if you did, please tell me about it!)

If a course on browser internals existed, what would it look like?

You will never understand any browser as well as the one you wrote yourself.

Let’s build a toy browser engine!

What’s missing?

  • networking
  • images
  • scripting

HTML Parsing


<body>
    <h1>Title</h1>
    <div id="main" class="test">
        <p>Hello <em>world</em>!</p>
    </div>
</body>
          

HTML Subset

Supported:

  • Balanced, nested tags: <p></p>
  • Attributes with quoted values: id="main"
  • Text nodes: <em>world</em>

HTML Subset

Not supported

  • Comments
  • Doctype declarations
  • Error handling (e.g. unbalanced or improperly nested tags)
  • Escaped characters (like &amp;)
  • CDATA sections
  • Self-closing tags: <br/> or <br> with no closing tag
  • Namespaces and other XHTML syntax: <html:body>
  • Character encoding detection

(a.k.a. “everything else”)

HTML Parser


struct Parser {
    pos: usize,
    input: String,
}

impl Parser {
    fn next_char(&self) -> char {
        self.input.char_at(self.pos)
    }

    fn consume_char(&mut self) -> char {
        let range = self.input.char_range_at(self.pos);
        self.pos = range.next;
        return range.ch;
    }
}
        

HTML Parser


fn consume_while(&mut self, test: |char| -> bool) -> String
    let mut result = String::new();
    while !self.eof() && test(self.next_char()) {
        result.push(self.consume_char());
    }
    return result;
}

fn parse_tag_name(&mut self) -> String {
    self.consume_while(|c| match c {
        'a'...'z' | 'A'...'Z' | '0'...'9' => true,
        _ => false
    })
}
        

Pipeline

Layout

My width depends on my parent. My height depends on my children.

…so calculate width top-down, then calculate height bottom-up.

Layout (Servo)

  1. Bubble widths (bottom-up)

  2. Assign widths (top-down)

  3. Assign height (bottom-up or in-order)

    • Includes text layout and floats
  4. Construct display list (top-down)

    • Assigns absolute positions

Layout (Robinson)


fn layout_block(&mut self, containing_block: Dimensions) {
    // Child width can depend on parent width, so we need to calculate
    // this box's width before laying out its children.
    self.calculate_block_width(containing_block);
    // Determine where the box is located within its container.
    self.calculate_block_position(containing_block);

    // Recursively lay out the children of this box.
    self.layout_block_children();

    // Parent height can depend on child height, so `calculate_height`
    // must be called after the children are laid out.
    self.calculate_block_height();
}
        

Content width: Block-level, non-replaced elements in normal flow

If all of the above have a computed value other than 'auto', the values are said to be "over-constrained" and one of the used values will have to be different from its computed value. If the 'direction' property of the containing block has the value 'ltr', the specified value of 'margin-right' is ignored and the value is calculated so as to make the equality true. If the value of 'direction' is 'rtl', this happens to 'margin-left' instead.

If there is exactly one value specified as 'auto', its used value follows from the equality.

If 'width' is set to 'auto', any other 'auto' values become '0' and 'width' follows from the resulting equality.

If both 'margin-left' and 'margin-right' are 'auto', their used values are equal. This horizontally centers the element with respect to the edges of the containing block.

http://www.w3.org/TR/CSS2/visudet.html#blockwidth


match (width == auto, margin_left == auto, margin_right == auto) {
    // If the values are overconstrained, calculate margin_right.
    (false, false, false) => {
        margin_right = Length(margin_right.to_px() + underflow, Px);
    }
    // If exactly one size is auto, its used value follows from the equality.
    (false, false, true) => { margin_right = Length(underflow, Px); }
    (false, true, false) => { margin_left  = Length(underflow, Px); }

    // If width is set to auto, any other auto values become 0.
    (true, _, _) => {
        if margin_left == auto { margin_left = Length(0.0, Px); }
        if margin_right == auto { margin_right = Length(0.0, Px); }
        if underflow >= 0.0 {
            // Expand width to fill the underflow.
            width = Length(underflow, Px);
        } else {
            // Width can't be negative. Adjust the right margin instead.
            width = Length(0.0, Px);
            margin_right = Length(margin_right.to_px() + underflow, Px);
        }
    }
    // If margin-left and margin-right are both auto, their used values are equal.
    (false, true, true) => {
        margin_left = Length(underflow / 2.0, Px);
        margin_right = Length(underflow / 2.0, Px);
    }
}
        
  • github.com/mbrubeck/robinson
  • ~900 lines of Rust code
  • parses a subset of HTML and CSS
  • simple selector matching
  • block layout (non-replaced, regular flow)
  • rasterization

<html>
  <head>
    <title>Test</title>
  </head>
  <div class="outer">
    <p class="inner"></p>
    <p class="inner" id="two"></p>
  </div>
</html>
          

            head { display: none; }
            .outer {
              background: #00ccff;
              border-color: #666666;
              border-width: 2px;
              margin: 50px;
              padding: 50px;
            }
            .inner {
              border-color: #cc0000;
              border-width: 4px;
              height: 100px;
              margin-bottom: 20px;
              width: 500px;
            }
            .inner#two { background: #ffff00; }