Matt Brubeck

13 Aug 2014

Let's build a browser engine!

Part 3: CSS

This is the third in a series of articles on building a toy browser rendering engine. Want to build your own? Start at the beginning to learn more:

This article introduces code for reading Cascading Style Sheets (CSS). As usual, I won’t try to cover everything in the spec. Instead, I tried to implement just enough to illustrate some concepts and produce input for later stages in the rendering pipeline.

Anatomy of a Stylesheet

Here’s an example of CSS source code:

h1, h2, h3 { margin: auto; color: #cc0000; }
div.note { margin-bottom: 20px; padding: 10px; }
#answer { display: none; }

Next I’ll walk through the css module from my toy browser engine, robinson. The code is written in Rust, though the concepts should translate pretty easily into other programming languages. Reading the previous articles first might help you understand some the code below.

A CSS stylesheet is a series of rules. (In the example stylesheet above, each line contains one rule.)

struct Stylesheet {
    rules: Vec<Rule>,
}

A rule includes one or more selectors separated by commas, followed by a series of declarations enclosed in braces.

struct Rule {
    selectors: Vec<Selector>,
    declarations: Vec<Declaration>,
}

A selector can be a simple selector, or it can be a chain of selectors joined by combinators. Robinson supports only simple selectors for now.

Note: Confusingly, the newer Selectors Level 3 standard uses the same terms to mean slightly different things. In this article I’ll mostly refer to CSS2.1. Although outdated, it’s a useful starting point because it’s smaller and more self-contained (compared to CSS3, which is split into myriad specs that depend on each other and CSS2.1).

In robinson, a simple selector can include a tag name, an ID prefixed by '#', any number of class names prefixed by '.', or some combination of the above. If the tag name is empty or '*' then it is a “universal selector” that can match any tag.

There are many other types of selector (especially in CSS3), but this will do for now.

enum Selector {
    Simple(SimpleSelector),
}

struct SimpleSelector {
    tag_name: Option<String>,
    id: Option<String>,
    class: Vec<String>,
}

A declaration is just a name/value pair, separated by a colon and ending with a semicolon. For example, "margin: auto;" is a declaration.

struct Declaration {
    name: String,
    value: Value,
}

My toy engine supports only a handful of CSS’s many value types.

enum Value {
    Keyword(String),
    Length(f32, Unit),
    ColorValue(Color),
    // insert more values here
}

enum Unit {
    Px,
    // insert more units here
}

struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

Rust note: u8 is an 8-bit unsigned integer, and f32 is a 32-bit float.

All other CSS syntax is unsupported, including @-rules, comments, and any selectors/values/units not mentioned above.

Parsing

CSS has a straightforward grammar, making it easier to parse correctly than its quirky cousin HTML. When a standards-compliant CSS parser encounters a parse error, it discards the unrecognized part of the stylesheet but still processes the remaining portions. This is useful because it allows stylesheets to include new syntax but still produce well-defined output in older browsers.

Robinson uses a very simplistic (and totally not standards-compliant) parser, built the same way as the HTML parser from Part 2. Rather than go through the whole thing line-by-line again, I’ll just paste in a few snippets. For example, here is the code for parsing a single selector:

// Parse one simple selector, e.g.: `type#id.class1.class2.class3`
fn parse_simple_selector(&mut self) -> SimpleSelector {
    let mut selector = SimpleSelector { tag_name: None, id: None, class: Vec::new() };
    while !self.eof() {
        match self.next_char() {
            '#' => {
                self.consume_char();
                selector.id = Some(self.parse_identifier());
            }
            '.' => {
                self.consume_char();
                selector.class.push(self.parse_identifier());
            }
            '*' => {
                // universal selector
                self.consume_char();
            }
            c if valid_identifier_char(c) => {
                selector.tag_name = Some(self.parse_identifier());
            }
            _ => break
        }
    }
    return selector;
}

fn valid_identifier_char(c: char) -> bool {
    // TODO: Include U+00A0 and higher.
    matches!(c, 'a'..='z' | 'A'..='Z' | '0'..='9' | '-' | '_')
}

Note the lack of error checking. Some malformed input like ### or *foo* will parse successfully and produce weird results. A real CSS parser would discard these invalid selectors.

Specificity

Specificity is one of the ways a rendering engine decides which style overrides the other in a conflict. If a stylesheet contains two rules that match an element, the rule with the matching selector of higher specificity can override values from the one with lower specificity.

The specificity of a selector is based on its components. An ID selector is more specific than a class selector, which is more specific than a tag selector. Within each of these “levels,” more selectors beats fewer.

pub type Specificity = (usize, usize, usize);

impl Selector {
    pub fn specificity(&self) -> Specificity {
        // http://www.w3.org/TR/selectors/#specificity
        let Selector::Simple(ref simple) = *self;
        let a = simple.id.iter().count();
        let b = simple.class.len();
        let c = simple.tag_name.iter().count();
        return (a, b, c);
    }
}

(If we supported chained selectors, we could calculate the specificity of a chain just by adding up the specificities of its parts.)

The selectors for each rule are stored in a sorted vector, most-specific first. This will be important in matching, which I’ll cover in the next article.

// Parse a rule set: `<selectors> { <declarations> }`.
fn parse_rule(&mut self) -> Rule {
    Rule {
        selectors: self.parse_selectors(),
        declarations: self.parse_declarations()
    }
}

// Parse a comma-separated list of selectors.
fn parse_selectors(&mut self) -> Vec<Selector> {
    let mut selectors = Vec::new();
    loop {
        selectors.push(Selector::Simple(self.parse_simple_selector()));
        self.consume_whitespace();
        match self.next_char() {
            ',' => { self.consume_char(); self.consume_whitespace(); }
            '{' => break, // start of declarations
            c   => panic!("Unexpected character {} in selector list", c)
        }
    }
    // Return selectors with highest specificity first, for use in matching.
    selectors.sort_by(|a,b| b.specificity().cmp(&a.specificity()));
    return selectors;
}

The rest of the CSS parser is fairly straightforward. You can read the whole thing on GitHub. And if you didn’t already do it for Part 2, this would be a great time to try out a parser generator. My hand-rolled parser gets the job done for simple example files, but it has a lot of hacky bits and will fail badly if you violate its assumptions. Someday I might replace it with one built on rust-peg or similar.

Exercises

As before, you should decide which of these exercises you want to do, and skip the rest:

  1. Implement your own simplified CSS parser and specificity calculation.

  2. Extend robinson’s CSS parser to support more values, or one or more selector combinators.

  3. Extend the CSS parser to discard any declaration that contains a parse error, and follow the error handling rules to resume parsing after the end of the declaration.

  4. Make the HTML parser pass the contents of any <style> nodes to the CSS parser, and return a Document object that includes a list of Stylesheets in addition to the DOM tree.

Shortcuts

Just like in Part 2, you can skip parsing by hard-coding CSS data structures directly into your program, or by writing them in an alternate format like JSON that you already have a parser for.

To Be Continued…

The next article will introduce the style module. This is where everything starts to come together, with selector matching to apply CSS styles to DOM nodes.

The pace of this series might slow down soon, since I’ll be busy later this month and I haven’t even written the code for some of the upcoming articles. I’ll keep them coming as fast as I can!