13 Aug 2014
This is the third in a series of articles on building a toy browser rendering engine. Want to build your own? Start at the beginning to learn more:
This article introduces code for reading Cascading Style Sheets (CSS). As usual, I won't try to cover everything in the spec. Instead, I tried to implement just enough to illustrate some concepts and produce input for later stages in the rendering pipeline.
Here's an example of CSS source code:
Next I'll walk through the css module from my toy browser engine, robinson. The code is written in Rust, though the concepts should translate pretty easily into other programming languages. Reading the previous articles first might help you understand some of the code below.
A CSS stylesheet is a series of rules. (In the example stylesheet above, each line contains one rule.)
A rule includes one or more selectors separated by commas, followed by a series of declarations enclosed in braces.
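As a rough illustration of that structure, here is a minimal sketch of how a stylesheet and its rules could be modeled. The `Selector` and `Declaration` types here are simplified stand-ins (an assumption made so the sketch compiles on its own), and `example_stylesheet` is a hypothetical helper, not part of robinson.

```rust
// Stand-in types (assumptions) so this sketch compiles on its own.
#[derive(Debug, PartialEq)]
struct Selector(String);

#[derive(Debug, PartialEq)]
struct Declaration {
    name: String,
    value: String,
}

// A rule: one or more selectors plus the declarations inside the braces.
#[derive(Debug, PartialEq)]
struct Rule {
    selectors: Vec<Selector>,
    declarations: Vec<Declaration>,
}

// A stylesheet is just a series of rules.
#[derive(Debug, PartialEq)]
struct Stylesheet {
    rules: Vec<Rule>,
}

// Hand-built equivalent of the rule: h1, h2 { margin: auto; }
fn example_stylesheet() -> Stylesheet {
    Stylesheet {
        rules: vec![Rule {
            selectors: vec![Selector("h1".to_string()), Selector("h2".to_string())],
            declarations: vec![Declaration {
                name: "margin".to_string(),
                value: "auto".to_string(),
            }],
        }],
    }
}
```

Keeping selectors and declarations in plain vectors mirrors the grammar directly: the comma-separated list becomes one vector, and the brace-enclosed list becomes the other.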
A selector can be a simple selector, or it can be a chain of selectors joined by combinators. Robinson supports only simple selectors for now.
Note: Confusingly, the newer Selectors Level 3 standard uses the same terms to mean slightly different things. In this article I'll mostly refer to CSS2.1. Although outdated, it's a useful starting point because it's smaller and more self-contained (compared to CSS3, which is split into myriad specs that depend on each other and CSS2.1).
In robinson, a simple selector can include a tag name, an ID prefixed by '#', any number of class names prefixed by '.', or some combination of the above. If the tag name is empty or '*' then it is a "universal selector" that can match any tag.
There are many other types of selector (especially in CSS3), but this will do for now.
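One way to represent the simple selectors described above is sketched here. The field names and the `matches_any_tag` and `div_note` helpers are assumptions chosen for illustration. An absent tag name stands for the universal selector.

```rust
// Only simple selectors exist for now; combinators would add more variants.
#[derive(Debug, PartialEq)]
enum Selector {
    Simple(SimpleSelector),
}

#[derive(Debug, PartialEq, Default)]
struct SimpleSelector {
    tag_name: Option<String>, // None means "*" or no tag given
    id: Option<String>,       // at most one ID
    class: Vec<String>,       // any number of class names
}

impl SimpleSelector {
    // A selector with no tag name places no restriction on the element's tag.
    fn matches_any_tag(&self) -> bool {
        self.tag_name.is_none()
    }
}

// Hand-built equivalent of the selector: div.note
fn div_note() -> Selector {
    Selector::Simple(SimpleSelector {
        tag_name: Some("div".to_string()),
        id: None,
        class: vec!["note".to_string()],
    })
}
```

Using `Option` for the tag name and ID, and a vector for classes, makes the "zero or one" versus "any number" distinction explicit in the types.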
A declaration is just a name/value pair, separated by a colon and ending with a semicolon. For example, "margin: auto;" is a declaration.
My toy engine supports only a handful of CSS's many value types.
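Here is a minimal sketch of declarations and values, assuming the supported handful is keywords, pixel lengths, and RGBA colors. The variant and field names are assumptions made for illustration.

```rust
// A declaration pairs a property name with a parsed value.
#[derive(Debug, Clone, PartialEq)]
struct Declaration {
    name: String,
    value: Value,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Keyword(String),   // e.g. auto, none
    Length(f32, Unit), // e.g. 10px
    ColorValue(Color), // e.g. #cc0000
}

// Only one unit in this sketch; others would be added as variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Unit {
    Px,
}

// One byte per channel: red, green, blue, alpha.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

// Hand-built equivalent of the declaration: margin: auto;
fn margin_auto() -> Declaration {
    Declaration {
        name: "margin".to_string(),
        value: Value::Keyword("auto".to_string()),
    }
}
```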
Note: In Rust, u8 is an 8-bit unsigned integer, and f32 is a 32-bit float.
All other CSS syntax is unsupported, including @-rules, comments, and any selectors/values/units not mentioned above.
CSS has a straightforward grammar, making it easier to parse correctly than its quirky cousin HTML. When a standards-compliant CSS parser encounters a parse error, it discards the unrecognized part of the stylesheet but still processes the remaining portions. This is useful because it allows stylesheets to include new syntax but still produce well-defined output in older browsers.
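To illustrate the "discard the bad part, keep the rest" idea, here is a deliberately crude sketch that is not robinson's parser. It assumes declarations can be split on semicolons, which real CSS cannot rely on, since strings and nested blocks may contain them. `parse_declarations_lenient` is a hypothetical name.

```rust
// Parse the inside of a declaration block, keeping only well-formed
// "name: value" pairs and silently dropping everything else.
fn parse_declarations_lenient(block: &str) -> Vec<(String, String)> {
    block
        .split(';')
        .filter_map(|chunk| {
            // No colon at all: not a declaration, so skip it.
            let (name, value) = chunk.split_once(':')?;
            let (name, value) = (name.trim(), value.trim());
            // Property names here may contain only ASCII letters, digits, and '-'.
            let name_ok = !name.is_empty()
                && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
            if name_ok && !value.is_empty() {
                Some((name.to_string(), value.to_string()))
            } else {
                None
            }
        })
        .collect()
}
```

Under these assumptions, a block containing one valid declaration, two malformed ones, and another valid one yields just the two valid pairs, so a typo in one declaration does not take down its neighbors.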
Robinson uses a very simplistic (and totally not standards-compliant) parser, built the same way as the HTML parser from Part 2. Rather than go through the whole thing line-by-line again, I'll just paste in a few snippets. For example, here is the code for parsing a single selector:
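A minimal, self-contained sketch of such a function follows. The `Parser` struct, its helper methods, and `valid_identifier_char` are assumptions modeled on the character-at-a-time style of the HTML parser from Part 2; treat the details as illustrative rather than as robinson's exact code.

```rust
#[derive(Debug, PartialEq, Default)]
struct SimpleSelector {
    tag_name: Option<String>,
    id: Option<String>,
    class: Vec<String>,
}

struct Parser {
    pos: usize, // byte offset of the next unread character
    input: String,
}

impl Parser {
    fn eof(&self) -> bool {
        self.pos >= self.input.len()
    }

    // Peek at the next character without consuming it (caller checks eof).
    fn next_char(&self) -> char {
        self.input[self.pos..].chars().next().unwrap()
    }

    // Return the next character and advance past it.
    fn consume_char(&mut self) -> char {
        let c = self.next_char();
        self.pos += c.len_utf8();
        c
    }

    // Consume a (possibly empty) run of identifier characters.
    fn parse_identifier(&mut self) -> String {
        let mut result = String::new();
        while !self.eof() && valid_identifier_char(self.next_char()) {
            result.push(self.consume_char());
        }
        result
    }

    // Parse one simple selector, e.g. `type#id.class1.class2`.
    fn parse_simple_selector(&mut self) -> SimpleSelector {
        let mut selector = SimpleSelector::default();
        while !self.eof() {
            match self.next_char() {
                '#' => {
                    self.consume_char();
                    selector.id = Some(self.parse_identifier());
                }
                '.' => {
                    self.consume_char();
                    selector.class.push(self.parse_identifier());
                }
                '*' => {
                    // Universal selector: nothing to record.
                    self.consume_char();
                }
                c if valid_identifier_char(c) => {
                    selector.tag_name = Some(self.parse_identifier());
                }
                _ => break, // anything else ends the selector
            }
        }
        selector
    }
}

fn valid_identifier_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-' || c == '_'
}
```

With this sketch, the input `div#main.note.big` yields tag `div`, ID `main`, and classes `note` and `big`. An input such as `###` is accepted too, and leaves the selector with an empty-string ID.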
Note the lack of error checking. Some malformed input will parse successfully and produce weird results. A real CSS parser would discard these invalid selectors.
Specificity is one of the ways a rendering engine decides which style overrides the other in a conflict. If a stylesheet contains two rules that match an element, the rule with the matching selector of higher specificity can override values from the one with lower specificity.
The specificity of a selector is based on its components. An ID selector is more specific than a class selector, which is more specific than a tag selector. Within each of these "levels," more selectors beats fewer.
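The ordering just described maps neatly onto a tuple of counts, since Rust compares tuples lexicographically. The sketch below assumes a `SimpleSelector` with optional tag name and ID plus a list of classes; `sel` is a hypothetical helper used only to build examples.

```rust
#[derive(Debug, Default)]
struct SimpleSelector {
    tag_name: Option<String>,
    id: Option<String>,
    class: Vec<String>,
}

// (number of IDs, number of classes, number of tag names).
type Specificity = (usize, usize, usize);

impl SimpleSelector {
    fn specificity(&self) -> Specificity {
        let a = self.id.iter().count(); // 0 or 1
        let b = self.class.len();
        let c = self.tag_name.iter().count(); // 0 or 1
        (a, b, c)
    }
}

// Hypothetical helper for building example selectors.
fn sel(tag: Option<&str>, id: Option<&str>, classes: &[&str]) -> SimpleSelector {
    SimpleSelector {
        tag_name: tag.map(String::from),
        id: id.map(String::from),
        class: classes.iter().map(|c| c.to_string()).collect(),
    }
}
```

Because the ID count is compared first, a single ID outranks any number of classes, and the class count is only consulted when the ID counts are equal.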
(If we supported chained selectors, we could calculate the specificity of a chain just by adding up the specificities of its parts.)
The selectors for each rule are stored in a sorted vector, most-specific first. This will be important in matching, which I'll cover in the next article.
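Sorting most-specific first can be done by comparing specificities in reverse. This is a sketch under the same assumptions as before: specificity is a tuple of (ID count, class count, tag count), and `sort_most_specific_first` is a hypothetical helper name.

```rust
#[derive(Debug, PartialEq)]
struct SimpleSelector {
    tag_name: Option<String>,
    id: Option<String>,
    class: Vec<String>,
}

type Specificity = (usize, usize, usize);

impl SimpleSelector {
    fn specificity(&self) -> Specificity {
        (
            self.id.iter().count(),
            self.class.len(),
            self.tag_name.iter().count(),
        )
    }
}

// Sort in place, highest specificity first. Swapping `a` and `b` in the
// comparison reverses the usual ascending order.
fn sort_most_specific_first(selectors: &mut [SimpleSelector]) {
    selectors.sort_by(|a, b| b.specificity().cmp(&a.specificity()));
}
```

With the list sorted this way, the first selector in a rule that matches an element is also the most specific one that matches.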
The rest of the CSS parser is fairly straightforward. You can read the whole thing on GitHub. And if you didn't already do it for Part 2, this would be a great time to try out a parser generator. My hand-rolled parser gets the job done for simple example files, but it has a lot of hacky bits and will fail badly if you violate its assumptions. Someday I might replace it with one built on rust-peg or similar.
As before, you should decide which of these exercises you want to do, and skip the rest:
Implement your own simplified CSS parser and specificity calculation.
Extend robinson's CSS parser to support more values, or one or more selector combinators.
Extend the CSS parser to discard any declaration that contains a parse error, and follow the error handling rules to resume parsing after the end of the declaration.
Make the HTML parser pass the contents of any <style> nodes to the CSS parser, and return a Document object that includes a list of Stylesheets in addition to the DOM tree.
Just like in Part 2, you can skip parsing by hard-coding CSS data structures directly into your program, or by writing them in an alternate format like JSON that you already have a parser for.
The next article will introduce the style module. This is where everything starts to come together, with selector matching to apply CSS styles to DOM nodes.
The pace of this series might slow down soon, since I'll be busy later this month and I haven't even written the code for some of the upcoming articles. I'll keep them coming as fast as I can!