Hi everyone! My name is Eugene Obrezkov, and today I want to talk about one of the “scariest” platforms — Node.js. I will answer one of the most complicated questions about Node.js — “How does Node.js work?”
I will present this article as if Node.js didn’t exist at all. This way, it should be easier for you to understand what’s going on under the hood.
The code in this post is taken from the existing Node.js sources, so after reading this article, you should be more comfortable with Node.js.
What do we need this for?
The first question that may come to your mind — “What do we need this for?”
Here, I’d like to quote Vyacheslav Egorov:
Just Do It!
Let’s go back to 2009, when Node.js got its start.
How would we do this? The first thing that comes to my mind is…
Not really! Here are the questions that need to be answered:
Do we need all the DOM stuff that the browser gives us? — No! It’s overhead.
Do we need a browser at all? No!
Virtual Machine (VM)
A VM provides a high-level abstraction, that of a high-level programming language (compared to the low-level ISA abstraction of the system).
A VM is designed to execute a single computer program by providing an abstracted and platform-independent program execution environment.
I suggest that we choose Google’s V8. Why? Because it’s faster than other JavaScript VMs, and I think you’ll agree that execution speed is important for the back end.
Let’s look at V8 and how it can help us build Node.js.
Via V8 Templates!
You can create a set of templates and then use them. Accordingly, you can have as many templates as you want.
V8 has two types of templates: Function Templates and Object Templates.
Finally, we can create a simple Node.js by combining all the techniques described above 🙂
What if we don’t need fs right now? What if we don’t need crypto features at all? What about not getting modules from the global scope, but requiring them on demand? What about not writing C++ code in one big file with all the C++ callbacks in there? So question mark means…
Let’s start with C++ Module Loader first.
C++ Module Loader
There will be a lot of C++ code here, so try not to lose your mind 🙂
Let’s start with the basics of all module loaders. Each module loader must have a variable that contains all modules (or information on how to get them). Let’s declare a C++ structure to store information about C++ modules and name it node_module.
We can store information about existing modules in this structure. As a result, we have a simple dictionary of all available C++ modules.
I will not explain all the fields from the structure above, but I want you to pay attention to a few of them. In nm_filename we can store the filename of our module, so we know where to load it from. In nm_register_func and nm_context_register_func, we can store the functions we need to call when the module is required. These functions will instantiate the Template instance. And nm_modname can store the module name (not the filename).
Next, we need to implement helper methods that work with this structure. We can write a simple method that can save information to our node_module structure and then use this method in our module definitions. Let’s call it node_module_register.
As you can see, all we are doing here is saving new information about a module into our node_module structure.
Now we can simplify the registration process using a macro. Let’s declare a macro you can use in your C++ module. This macro is just a wrapper for the node_module_register method.
The first macro is a wrapper for the node_module_register method. The other one is just a wrapper for the previous macro with some predefined arguments. As a result, we have a macro that accepts two arguments: modname and regfunc. When it’s called, we save the new module information in our node_module structure. What do modname and regfunc mean? Well… modname is just our module name, like fs, for instance. regfunc is the module method we talked about earlier. This method should be responsible for initializing the V8 Template and assigning it to an ObjectTemplate.
As you can see, we can declare each C++ module with a macro that accepts the module name (modname) and an initialization function (regfunc) that will be called when the module is required. All we need to do now is create C++ methods that can read that information from the node_module structure and call the regfunc method.
Let’s write a simple method that will search for a module in the node_module structure by its name. We’ll call it get_builtin_module.
This will return the declared module if the name matches the nm_modname from the node_module structure.
At this stage, you can call process.binding('fs'), for instance, and get the native bindings for it.
Here is an example of a built-in module with omitted logic for simplicity.
Hopefully, you are still following along.
We need to require both types. That means we need to know how to grab NativeModule from Node.js and Module from your working directory.
Let’s start with NativeModule first.
Now, to load a module, you call the NativeModule.require() method with the name of the module you want to load. It first checks whether the module already exists in the cache. If so, it is returned from the cache; otherwise, the module is compiled, cached and returned as an exports object.
Let’s inspect cache and compile methods now.
All cache does is store the NativeModule instance in the static _cache object of NativeModule.
Next is the Module loader implementation.
The Module loader implementation is the same as NativeModule’s. The difference is that sources are not taken from the node_natives.h header file, but from files we can read with the fs native module. So we do all the same steps (wrap, cache and compile), only with sources read from a file.
Great, now we know how to require native modules or modules from your working directory.
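Here is a small sketch of what cache and compile can look like, assuming module sources are available as strings. The sources table and the math module are made up for the example; vm.runInThisContext is the public Node.js API used to turn the wrapped source into a function.

```javascript
const vm = require('vm');

// Hypothetical source table standing in for the bundled native sources.
const sources = {
  math: 'exports.double = function (n) { return n * 2; };',
};

function NativeModule(id) {
  this.id = id;
  this.filename = id + '.js';
  this.exports = {};
}

NativeModule._cache = Object.create(null);

// Wrap the source in a function so that the module gets its own scope and
// receives exports, require, module, __filename and __dirname as arguments.
NativeModule.wrapper = [
  '(function (exports, require, module, __filename, __dirname) { ',
  '\n});',
];
NativeModule.wrap = function (script) {
  return NativeModule.wrapper[0] + script + NativeModule.wrapper[1];
};

// cache() only stores the instance in the static _cache object.
NativeModule.prototype.cache = function () {
  NativeModule._cache[this.id] = this;
};

// compile() evaluates the wrapped source, which yields a function, and
// calls that function with this module's exports object.
NativeModule.prototype.compile = function () {
  const wrapped = NativeModule.wrap(sources[this.id]);
  const fn = vm.runInThisContext(wrapped, { filename: this.filename });
  // No nested require in this sketch, so undefined is passed for it.
  fn(this.exports, undefined, this, this.filename);
};

const mathModule = new NativeModule('math');
mathModule.cache();
mathModule.compile();
console.log(mathModule.exports.double(21)); // 42
```

Because the source is wrapped in a function, variables declared inside a module stay local to it instead of leaking into the global scope.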
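A minimal sketch of such a file-based loader could look like this. The loadModule function, its cache, and the temporary answer.js file are hypothetical; the real Module loader also resolves paths, extensions and node_modules folders, which is left out here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const vm = require('vm');

// Hypothetical minimal loader: the same wrap, cache and compile steps as
// the native loader, but the source is read from disk with fs.
const moduleCache = Object.create(null);

function wrap(script) {
  return '(function (exports, require, module, __filename, __dirname) { ' +
    script + '\n});';
}

function loadModule(filename) {
  const resolved = path.resolve(filename);
  if (moduleCache[resolved]) return moduleCache[resolved].exports;

  const mod = { id: resolved, exports: {} };
  moduleCache[resolved] = mod;

  const source = fs.readFileSync(resolved, 'utf8');
  const fn = vm.runInThisContext(wrap(source), { filename: resolved });
  // Simplification: nested requires go through loadModule unchanged, so
  // relative paths are resolved against the current working directory.
  fn(mod.exports, loadModule, mod, resolved, path.dirname(resolved));

  // The module may have replaced module.exports, so read it again.
  return mod.exports;
}

// Usage: write a tiny module to a temporary directory and load it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'loader-sketch-'));
const file = path.join(dir, 'answer.js');
fs.writeFileSync(file, 'module.exports = { answer: 42 };');

const loaded = loadModule(file);
console.log(loaded.answer);               // 42
console.log(loadModule(file) === loaded); // true

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```

The second loadModule call returns the cached exports, so it works even after the file has been removed.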
Node.js Runtime Library?
We can start by proxying all our native modules to the global scope and setting up other global variables. It’s just a lot of code that does something like global.Buffer = NativeModule.require('buffer') or global.process = process.
There’s not much I can add here; it’s simple. If you want more details, though, you can look at the src/node.js file in the node repository. This file is executed when Node.js starts and uses all the techniques described in this article.
But all the above can’t do any asynchronous stuff yet. All the operations like fs.readFile() are synchronous at this point.
What do we need for asynchronous operations? An event loop…
An event loop is a message dispatcher that waits for and dispatches events or messages in a program. It works by making a request to some internal or external event provider (which blocks the request until an event has arrived), and then it calls the relevant event handler (dispatches the event). The event loop may be used with a reactor if the event provider follows the file interface which can be selected or polled. The event loop almost always operates asynchronously with the message originator.
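To make the definition more concrete, here is a toy event loop. It is only an illustration of the "wait for events, then dispatch them to handlers" idea; the ToyEventLoop class and the event names are invented for this example.

```javascript
// Toy event loop, for illustration only.
class ToyEventLoop {
  constructor() {
    this.queue = [];          // pending events, oldest first
    this.handlers = new Map(); // event type -> handler function
  }

  // Register the handler to call when an event of this type arrives.
  on(type, handler) {
    this.handlers.set(type, handler);
  }

  // The "event provider" side: something reports that an event happened.
  emit(type, payload) {
    this.queue.push({ type, payload });
  }

  // Take events one by one and dispatch each to its handler until no
  // events are left. Handlers may emit further events while running.
  run() {
    while (this.queue.length > 0) {
      const event = this.queue.shift();
      const handler = this.handlers.get(event.type);
      if (handler) handler(event.payload, this);
    }
  }
}

const log = [];
const loop = new ToyEventLoop();
loop.on('read-done', (data, l) => {
  log.push('read: ' + data);
  l.emit('write-done', data.length);
});
loop.on('write-done', (bytes) => log.push('wrote: ' + bytes));

loop.emit('read-done', 'hello');
loop.run();
console.log(log); // [ 'read: hello', 'wrote: 5' ]
```

A production event loop blocks while it waits for the operating system to report new events; this toy version simply stops once its queue is empty.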
Node.js can accept an event loop as an argument when it creates its Environment, the object that ties a V8 context to the rest of the runtime. But before handing an event loop to the Environment, we need to implement it first…
So, we can include libuv sources into Node.js and create the Environment with the libuv default event loop in there. Here is an implementation.
The CreateEnvironment method accepts a libuv event loop as its loop argument. It calls Environment::New, which lives in the node namespace rather than in V8 itself, passes the libuv event loop to it, and then configures the resulting Environment. That’s how Node.js became asynchronous.
I’d like to talk about libuv more and tell you how it works, but that’s another story for another time 🙂
Thanks to everyone who has read this post to the end. I hope you enjoyed it and learned something new. If you find any issues, leave a comment and I’ll reply as soon as possible.
Eugene Obrezkov, Technical Leader at Onix-Systems, Kirovohrad, Ukraine.