In my previous article, I talked about Node.js and why it’s fast. Today, I want to talk about V8.
To understand where that speed comes from, you need to know some basics of how V8 is implemented. It’s a huge topic, so this post covers only the key parts of V8. If you want more details, such as hidden classes, SSA and inline caches (IC), they will be in my next article.
In computer science, an abstract syntax tree (AST), or just a syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is “abstract” in not representing every detail appearing in the real syntax.
What can we do with an AST? Before V8 can optimise your code, the source text first has to be transformed into a structured representation that a compiler can work with. That’s essentially what the AST is: it contains all the information about your code that V8 needs in order to analyse it.
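To make that less abstract, here is a minimal sketch of a tree for the expression `a + b`. The node shapes follow the ESTree convention used by many JavaScript tools, which is my assumption for illustration; V8's internal AST has its own node classes. The `countNodes` helper is hypothetical.

```javascript
// Hand-written, ESTree-style tree for the expression `a + b`.
// Illustrative only: V8's internal AST uses its own node classes.
const ast = {
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Identifier', name: 'a' },
  right: { type: 'Identifier', name: 'b' },
};

// Hypothetical helper: walk the tree and count nodes by type.
function countNodes(node, counts = {}) {
  counts[node.type] = (counts[node.type] || 0) + 1;
  for (const key of ['left', 'right']) {
    if (node[key]) countNodes(node[key], counts);
  }
  return counts;
}

const nodeCounts = countNodes(ast);
console.log(nodeCounts);
```

Notice that whitespace and other surface details of the source text do not appear anywhere in the tree. That is the "abstract" part.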
First, all of your code is parsed into an AST and passed through the full-codegen compiler.
It takes the AST, walks over its nodes and emits calls to a macro-assembler directly. The result of this operation is generic native code.
That’s it. Nothing special. No optimisations are performed here: complicated cases in your code are handled by emitting calls to runtime procedures, local variables are kept in memory (on the stack or in heap-allocated contexts) rather than in registers, and so on.
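The idea of walking a tree and emitting code node by node can be sketched in a few lines. This is a toy under my own assumptions: the instruction names target an imaginary stack machine, whereas the real full-codegen compiler emits native code through a macro-assembler.

```javascript
// Toy code generator: walks an expression tree and emits instructions
// for an imaginary stack machine. All instruction names are made up.
function generate(node, out = []) {
  switch (node.type) {
    case 'Literal':
      out.push(`PUSH_CONST ${node.value}`);
      break;
    case 'Identifier':
      out.push(`LOAD_VAR ${node.name}`);
      break;
    case 'BinaryExpression':
      generate(node.left, out);
      generate(node.right, out);
      // Nothing is known about operand types here, so the work is
      // deferred to a generic runtime routine.
      out.push(`CALL_RUNTIME binary_op '${node.operator}'`);
      break;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
  return out;
}

// Tree for `a + 1`.
const tree = {
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Identifier', name: 'a' },
  right: { type: 'Literal', value: 1 },
};

const instructions = generate(tree);
console.log(instructions.join('\n'));
```

The generator never asks what type `a` has. It produces code that works for any input, which is exactly why that code is generic and slow.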
The most interesting part is when V8 sees that your function is hot and it’s time to optimise it. That’s when the Crankshaft compiler comes into play.
Besides generic native code, the full-codegen compiler also emits code that collects type-feedback information about your functions. When a function becomes hot (that is, it is called often), Crankshaft can use the AST together with that feedback to compile optimised code for it. Afterwards, the optimised code replaces the un-optimised code; for a function that is already running, this is done using on-stack replacement (OSR).
However, the optimised code doesn’t cover every case. If the type assumptions stop holding, for instance when a function starts seeing floating-point numbers where it has only seen integers, the optimised code is de-optimised and replaced with the old un-optimised code. We don’t want that, do we?
For example, you have a function that adds two numbers:
const add = (a, b) => a + b;

// Let's say we have a lot of calls like this
add(5, 2);
// ...
add(10, 20);
When you call this function many times with integers only, the type feedback records that the a and b arguments are always integers. Using that information and the AST of this function, Crankshaft can optimise it. But everything breaks if you make a call like this:
add(2.5, 1); // float number as the first argument
Based on the previous type-feedback information, Crankshaft assumes that only integers go through this function, but now we’re passing it a floating-point number. The optimised code has no path for that case, so V8 de-optimises the function.
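The whole cycle of collecting feedback, optimising and de-optimising can be imitated with a toy model in plain JavaScript. Everything here is my own invention for illustration (`makeAdaptive`, `HOT_THRESHOLD`, the `state` object); V8's real heuristics and data structures are far more involved.

```javascript
// Toy model of type feedback, optimisation and de-optimisation.
// HOT_THRESHOLD and all names below are invented for illustration.
const HOT_THRESHOLD = 3;

function makeAdaptive(genericFn) {
  const state = { calls: 0, onlyIntegers: true, optimised: false, deopts: 0 };

  const adaptive = (a, b) => {
    if (state.optimised) {
      // Guard: the fast path is only valid for the types seen so far.
      if (Number.isInteger(a) && Number.isInteger(b)) {
        return a + b; // stand-in for specialised integer-only code
      }
      // Assumption violated: discard the fast path ("de-optimise").
      state.optimised = false;
      state.onlyIntegers = false;
      state.deopts += 1;
      return genericFn(a, b);
    }

    // Generic path: run the slow code and record type feedback.
    state.calls += 1;
    if (!Number.isInteger(a) || !Number.isInteger(b)) state.onlyIntegers = false;
    if (state.calls >= HOT_THRESHOLD && state.onlyIntegers) state.optimised = true;
    return genericFn(a, b);
  };

  adaptive.state = state;
  return adaptive;
}

const add = makeAdaptive((a, b) => a + b);
add(5, 2);
add(10, 20);
add(1, 1); // third integer-only call: the function is now "hot"
const hot = add.state.optimised;
const result = add(2.5, 1); // the float argument trips the guard
console.log(hot, result, add.state);
```

If you want to watch the real thing, Node.js passes V8 flags through, so something like `node --trace-opt --trace-deopt script.js` prints optimisation and de-optimisation events. The exact log format depends on the V8 version.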
You might ask, how does all this magic work in Crankshaft? Well, a few parts work together inside the Crankshaft compiler:
- Type feedback (already discussed above);
- Hydrogen compiler;
- Lithium compiler.
Hydrogen takes the AST with type-feedback information as its input. Based on that information, it generates a high-level intermediate representation (HIR), which consists of a control-flow graph (CFG) in static single assignment (SSA) form. In SSA form, every variable is assigned exactly once.
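SSA form is easiest to understand on a small example. The sketch below renames variables in straight-line code so that each one is assigned only once. It is a toy under my own assumptions: the instruction format is invented, and because there are no branches it needs none of the phi functions a real SSA construction uses to merge values at control-flow joins.

```javascript
// Toy SSA renaming for straight-line code (no branches, so no phi
// functions are needed). The instruction format is invented here.
function toSSA(instructions) {
  const version = new Map(); // variable name -> latest version number
  const current = (name) =>
    (version.has(name) ? `${name}${version.get(name)}` : name);

  return instructions.map(({ target, op, args }) => {
    // Read operands first, using the versions visible before this write.
    const renamedArgs = args.map((arg) =>
      (typeof arg === 'string' ? current(arg) : arg));
    // Every assignment creates a fresh version of its target.
    version.set(target, (version.get(target) || 0) + 1);
    return { target: current(target), op, args: renamedArgs };
  });
}

// x = a + b; x = x * 2; y = x + 1
const program = [
  { target: 'x', op: '+', args: ['a', 'b'] },
  { target: 'x', op: '*', args: ['x', 2] },
  { target: 'y', op: '+', args: ['x', 1] },
];

const ssa = toSSA(program);
for (const { target, op, args } of ssa) {
  console.log(target + ' = ' + args.join(' ' + op + ' '));
}
// x1 = a + b
// x2 = x1 * 2
// y1 = x2 + 1
```

Once every value has exactly one definition, a compiler can tell where any value came from without tracking reassignments, which makes many optimisations simpler to implement.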
During the generation of HIR, several optimisations are applied, such as constant folding and method inlining (I will talk about V8 optimisation tricks in the next article).
The result is an optimised control-flow graph (CFG) that serves as input to the next compiler, Lithium, which generates the actual code.
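Constant folding, one of the optimisations mentioned above, is simple enough to show in miniature. Hydrogen works on HIR instructions; for brevity this sketch folds a small ESTree-style expression tree instead, which is my simplification, and it only handles `+` and `*` on numeric literals.

```javascript
// Toy constant folding over an expression tree. Only `+` and `*` on
// numeric literals are folded; everything else is left untouched.
function fold(node) {
  if (node.type !== 'BinaryExpression') return node;

  const left = fold(node.left);
  const right = fold(node.right);
  const bothNumbers =
    left.type === 'Literal' && typeof left.value === 'number' &&
    right.type === 'Literal' && typeof right.value === 'number';

  if (bothNumbers && node.operator === '+') {
    return { type: 'Literal', value: left.value + right.value };
  }
  if (bothNumbers && node.operator === '*') {
    return { type: 'Literal', value: left.value * right.value };
  }
  return { ...node, left, right };
}

// Tree for `x + (2 * 3)`.
const expr = {
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Identifier', name: 'x' },
  right: {
    type: 'BinaryExpression',
    operator: '*',
    left: { type: 'Literal', value: 2 },
    right: { type: 'Literal', value: 3 },
  },
};

const folded = fold(expr);
console.log(JSON.stringify(folded));
```

The multiplication is computed once at compile time, so the folded tree represents `x + 6` and no multiplication is left to run.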
The Lithium compiler takes the optimised HIR and translates it to a low-level intermediate representation (LIR). LIR is conceptually close to machine code, with instructions chosen for the target architecture, although much of its structure is still shared across platforms.
During LIR generation, Crankshaft can still apply some low-level optimisations. Once the LIR has been generated, Crankshaft emits a sequence of native instructions for each Lithium instruction.
Afterwards, the resulting native instructions are executed.
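The lowering step can be pictured as a pass that turns each high-level instruction into something that looks like assembly. This is only a sketch under my own assumptions: the mnemonics and register names are invented, and every value naively gets a fresh virtual register, whereas a real backend has to fit values into a limited set of machine registers.

```javascript
// Toy lowering: turn SSA-style three-address instructions into
// pseudo-assembly. Mnemonics and register names are invented, and
// every value simply gets its own fresh virtual register.
const MNEMONIC = { '+': 'add', '*': 'mul' };

function lower(instructions) {
  const registers = new Map(); // value name -> virtual register
  let next = 0;
  const out = [];

  const regFor = (name) => {
    if (!registers.has(name)) registers.set(name, `r${next++}`);
    return registers.get(name);
  };

  // Operands are either named values (kept in registers) or immediates.
  const operand = (arg) => (typeof arg === 'string' ? regFor(arg) : `#${arg}`);

  for (const { target, op, args } of instructions) {
    if (!MNEMONIC[op]) throw new Error(`Unsupported operator: ${op}`);
    const [lhs, rhs] = args.map(operand);
    out.push(`${MNEMONIC[op]} ${regFor(target)}, ${lhs}, ${rhs}`);
  }
  return out;
}

// x1 = a + b; y1 = x1 * 2
const hir = [
  { target: 'x1', op: '+', args: ['a', 'b'] },
  { target: 'y1', op: '*', args: ['x1', 2] },
];

const lir = lower(hir);
console.log(lir.join('\n'));
// add r2, r0, r1
// mul r3, r2, #2
```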
In the following articles, I will talk about V8 optimisation tricks, ways to profile bottlenecks in your code, how to look for de-optimisations, and how to investigate the control-flow graph (CFG).
Eugene Obrezkov aka ghaiklor, Developer Advocate at Onix-Systems, Kirovohrad, Ukraine.