Wednesday, September 05, 2012

A Meaningful Client Side Alternative To node require()

TL;DR

Here you have the solution to all your problems ... no? So take a break, and read through :)

Current Status

Using a generic "module loader" for JavaScript files is becoming a common technique, and full of wins, not only on the node.js server side. There are different client side alternatives, not entirely based over the module concept, and apparently only one concrete proposal, called AMD, able to work in both server and client following a build/parsing procedure.
However, as somebody pointed out already ...

AMD is Not the Answer

This post does not even tell everything bad about AMD problems, and I am getting there, but it's surely a good starting point.
Tobie article is another resource that inspired somehow what I am going to propose here, so before discriminating this or that technique, I hope you'll find time to understand the whole problem ... found it? Let's go on then :)

The Beauty Of node.js require Function

I don't think I should even spend many words here ...

// module: compress

// in the global scope, it's just a module
// so no pollution to the global/window object
// no need to put everything in yet another closure
var fs = require("fs");

// just as example
this.compress = function (file, fn) {
fs
.createReadStream(file)
.pipe(
// check this out!
require("zlib").createGzip()
)
.pipe(
fs.createWriteStream(file + ".gz")
).on("close", function () {
var content = fs.readFileSync(file + ".gz");
fs.unlinkSync(file + ".gz");
fn(content);
})
;
};

So basically, with a synchronous require(module) call we can decide whenever we want where that module should be included in our logic. This means we don't need to think in advance to all needed dependencies for our current module, as well as this doesn't require to split in many nested functions our asynchronous logic.
Look at above example ... in node world the require("fs") is performed basically in every single module ... the FileSystem is that important, hell yeah!
We dont' care much about requiring it at the very beginning of above stupid module 'cause surely that operation will cost nothing! Each module is cached indeed so there's actually zero impact if we use a require hundreds of time inline ... really, it's that fast, but the very first one might cost a bit.
This is why require("zlib") is loaded only once, and only when it's needed ... so that memory, disk, cpu, and everything else, can live in peace before the very first call of that exported method while after that, that call will cost again nothing.

Is a mandatory nested function call through AMD logic as fast as above example? I tell you, NO, but this is not the real problem, is it?

AMD Does Not Truly Scale

... as simple as that. If you need a build process behind something that could fail in any case, I would say you are doing it wrong. I am talking about current era, where internet connection could be there or not ... I am talking about mobile and I give you a very basic example.
I have created my freaking cool website or webapp that loads asynchronously everything and I am on the road with my supa-dupa smart phone.
When "suddenly, le wild spot without internet coverage appears"!
Now I press that lovely button in the lovely application that loads asynchronously the lovely next part of my action and guess what happens ... are you there? Nothing!
That instant without my mobile network will screw up my flow, will break my action, will let me wait "forever" and in most common cases, it will block me to use properly the app.
Of course, if that action required server side interaction, that action won't work in any case ... but how come I don't even see a notification? How come pieces of my application are not showing up telling me that there is actually a problem, rather than letting me wait without any notification, 'couse probably even notification logic was created as AMD module?

And what if connection is simply bad? The whole application or website is responding decently but that part, oh gosh how much I was planning to use that app part, is taking ages to load! How would you feel in front of such program?

AMD Resolves Lazy/Circular Load Synchronously

That's correct ... gotcha! Have you ever read what you should do in order to resolve modules that load asynchronously other modules inside themselves?

//Inside b.js:
define(["require", "a"],
function(require, a) {
//"a" in this case will be null if a also asked for b,
//a circular dependency.
return function(title) {
// me: Excuse me ... WUT?
return require("a").doSomething();
}
}
);

So here the thing, if "by accident" you have a circular dependency you should use require() as node.js does ... oh well ...

On Circular Dependencies / Cycles

This topic is one of the biggest paradox about programming ... we try to decouple everything, specially when it comes to write modules, so that not a single part of the app should be aware of the surrounding environment ... isn't it? Then someone said that circular dependencies are bad ... but how come?
Here an example, a truly stupid one:
Hi, I am a human, the human module, and I am completely sufficient, but I need to go in a place that would take too much by my own ... so I need the module car.

Hi, I am a car, the module car, and I am fabulous by my own, but I need the module human to be able to go somewhere.

A partner, better than a car, could also explain the fact we actually think in circular references all the time ... am I wrong?
The AMD take in this case is that "we should change the require logic when this happens and we should be aware in advance in order to solve this" ... yeah, nice, so now in AMD we have two ways to require and return what we export ... and once again, in my humble opinion, this does not scale ... at all!

Double Effort, Double Code!

With AMD we don't only need to change our AMD code/style/logic when things are not known in advance, as showed before ... we also need to write code for modules that is not compatible with node.js, resulting in such piece of redundant code that I believe nobody truly want to write more than once in a programmer life.
Take an excellent library as lodash is, and check what it has to do in order to be compatible with AMD too ...

// expose Lo-Dash
// some AMD build optimizers, like r.js, check for specific condition patterns like the following:
if (typeof define == 'function' && typeof define.amd == 'object' && define.amd) {
// Expose Lo-Dash to the global object even when an AMD loader is present in
// case Lo-Dash was injected by a third-party script and not intended to be
// loaded as a module. The global assignment can be reverted in the Lo-Dash
// module via its `noConflict()` method.
window._ = lodash;

// define as an anonymous module so, through path mapping, it can be
// referenced as the "underscore" module
define(function() {
return lodash;
});
}
// check for `exports` after `define` in case a build optimizer adds an `exports` object
else if (freeExports) {
// in Node.js or RingoJS v0.8.0+
if (typeof module == 'object' && module && module.exports == freeExports) {
(module.exports = lodash)._ = lodash;
}
// in Narwhal or RingoJS v0.7.0-
else {
freeExports._ = lodash;
}
}
else {
// in a browser or Rhino
window._ = lodash;
}

Now ... ask yourself, is this what you want to write for every single module you gonna create that might work in both server and client side?

Combined AMD Modules

Here another possibility, super cool ... we can combine all modules together into a single file: YEAH! But if this what you do for your project, don't you wonder what is the point then to use all those "asynchronous ready" callbacks if these will be executed in a synchronous way in production? Was that different syntax truly needed? And what about JS engines parsing time? Is processing the whole project at once in a single file a good thing for both Desktop and Mobile?
Why are you developing asynchronously with all those nested callbacks if you provide a synchronous build? Is the code size affected? Does all this make any sense?

AMD, The Good Part... Maybe

OK, there must be some part of this logic that conquered many developers out there .. and I must admit AMD "solved with nonchalance" the fact JavaScript, in the client side, has always had problems with the global scope pollution.
The fact AMD forces us to write the module inside a function that receives already arguments with other needed modules, is a win ... but wait a second, wasn't that boring before that nobody until now wrote a single bloody closure to avoid global scope pollution?
I think AMD is a side effect, with all possible noble and good purpose, of a general misunderstanding of how is JS code sharing across libraries.
Let's remember we never even thought about modules until we started clashing with all possible each other polluting namespaces, global variables, Object.prototype, and any sort of crap, thinking we are the only script ever running in a web page ... isn't it?
So kudos for AMD, at least there is a function, but where the hack is the "use strict" directive suggested for every single bloody AMD module in any example you can find in the documentation? where is the global pollution problem solver, if developers are not educated or warned about the problem itself?

node.js require ain't gold neither

When network, roundtrips and latency come into the game, the node.js require() solution does not fit, scale, work, neither.
If you understand how Loading from node_modules Folders logic works, and you have an extra look into All Together diagram, you will realize that all those checks, performed through an HTTP connection, won't ever make sense on the client side.
Are we stuck then? Or there's some tiny little brick in the middle that is not used, common, public, or discussed yet?

A node require() Like, For Client Side

Eventually, here where I was heading since the beginning: my require_client proposal ... gosh it took long!
Rewind ... How about having the best from all techniques described "here and there" in order to:
  • avoid big files parsing, everything is parsed once and on demand
  • provide an easy to use builder for production ready single file code
  • use one single syntax/model/style to define modules for both node or client side
  • solve cycles exactly as node does
  • forget 10 lines of checks to export a bloody module
  • organize namespaces in folders
  • obtain excellent performance
  • make a single file distributable
... and finally, compare results against all other techniques?
Here, the AMD loader, versus the inline and DOM script injection loader, versus the dev version of my proposal, and finally the production/compiled version of my proposal ... how about that? You can use any console, profiler, dev tool/thing you want, to compare results, it's about a 150Kb total amount of scripts with most of them loaded ASAP, and one loaded on "load" event.
You can measure jquery, underscore, backbonejs, and the ironically added as last script head.js script there within their loading/parsing/ready time.

Reading Results

If you think nowadays the DOMContentLoaded event is all you need to startup faster your web page/app, you are probably wrong. DOMContentLoaded event means almost nothing for real UX, because a user that has a DOM ready but can't use, because modules and logic are not loaded yet, or see, because CSS and images have not been resolved yet, the page/app, is simply a "user waiting" rather than interacting and nothing else.
Accordingly, if you consider the code flow and the time everything is ready to be used, the compiled require() method is the best option you have: it's freaking fast!

"use strict" included

The best part I could think about, is the "use strict"; directive by default automatically prepended to any module that is going to be parsed.
This is a huge advantage for client side code because while we are able to create as many var as we want in the module scope, the engine parser and compiler will instantly raise an error the line and column we forgot a variable declaration. All other safer things are in place and working but .. you know, maybe you don't want this?
That's why the require_client compiler makes the strict configuration property easy to spot, configure, and change ... as long as you know why you are doing that, everything is fine.

How Does It Work

The compiler includes a 360 bytes once minzipped function that when is not optimized simply works through XHR.
This function could be the very only global function you need, since every module is evaluated sandboxed and with all node.js module behaviors.
You can export a function itself, you can export a module, you can require inside a module, you can do 90% of what you could do in a node.js environment.
You don't need to take care of global variables definition, those won't affect other modules.
What you should do, is to remember that this is the client so the path you choose in the development version, is the root, as any node_modules folder would be.
If you clone the repository, you can test via copy and paste the resulting build/require.js in whatever browser console traps such require_client ~/folder_you_put_staff/require_client/js or require_client ~/folder_you_put_staff/require_client/cycles.
require("a") in the first case and require("main") in the second.
In order to obtain a similar portable function you should create a folder with all scripts and point to that folder via require_client so that a script with all inclusions will be created.

A Basic Example

So here what's the require_client script is able to produce.
Let's imagine this is our project folder structure:

project/
css/
js/
require.dev.js
index.html

The index.html file can simply have a single script in its header that includes require.dev.js and the bootstrap module through require("main");, as example.
So, let's imagine we have module a, b, and main inside the js folder, OK?
require_client project/js project/require.js
This call will produce the require.js file such as:

/*! (C) Andrea Giammarchi */
var require=function(c,d,e){function l(n,m){return m?n:g[n]}function b(o){var m=a[o]={},n={id:o,parent:c,filename:l(o,h),web:h};n[k]=m;d("global","module",k,(e.strict?"'use strict';":"")+l(o)).call(m,c,n,m);j.call(m=n[k],i)||(m[i]=h);return m}function f(m){return j.call(a,m)?a[m]:a[m]=b(m)}var k="exports",i="loaded",h=!0,a={},j=a.hasOwnProperty,g={
"a": "console.log(\"a starting\");exports.done=false;var b=require(\"b\");console.log(\"in a, b.done = \"+b.done);exports.done=true;console.log(\"a done\");",
"main": "console.log(\"main starting\");var a=require(\"a\");var b=require(\"b\");console.log(\"in main, a.done=\"+a.done+\", b.done=\"+b.done);",
"b": "console.log(\"b starting\");exports.done=false;var a=require(\"a\");console.log(\"in b, a.done = \"+a.done);exports.done=true;console.log(\"b done\");"
};f.config=e;f.main=c;return f}(this,Function,{
strict:true
});
Now the index.html could simply include reuiqre.js rather than the dev version ;)
Above program is the same showed in node.js API documentation about cycles. If you copy and paste this code in any console and you write after require("main"); you'll see the expected result.
As summary, require_client is able to minify and place inline all your scripts, creating modules names based on files and folders hierarchy.
All modules will be evaluated with a global object, already available, as well as module, exports, and the latter used as the module context.
The simple object based cache system will ensure these modules are evaluated once and never again.

What's YAGNI

Few things, that could change, are not in on purpose. As example, the module.parent is always the global object, since in fact, it's in the global scope, through Function compilation, that the module will be parsed the very first time. Not sure we need a complicated mechanism to chain calls and also this mechanism is error prone.
If you have 2 scripts requiring the same module, first come, first serve. The second one should not affect runtime the already parsed module changing suddenly the module.parent property ... you know what I mean?
The path resolution is a no-go, rather than trying to fix all possible messes with paths and OS, put your files in a single JS folder and consider that one your virtual node_modules one for clients.
If you have folder links inside the JS folder it's OK, but if you have recursive links you are screwed. Please keep it simple while I think how to avoid such problem within the require_client logic, thanks.

What Is Not There Yet

If your project is more than a megabyte minzipped, you might want to be able to split in different chunks the code so that the second, last, injected, require, won't disturb the first one. This is not in place yet since this free time project was born for a small/medium web app I am working on that will be out pretty soon ... an app that surely does not need such amount of code as many other web app should NOT ... but you know, I've been talking about scaling so much that a note about the fact this solution won't scale so nicely with massive projects is a must say.

UpdateJust landed a version that does not cause conflicts with require itself. The first defined require will be the one used everywhere so it's now possible to name a project and include it in the main page so ... parallel projects are now available ;)

If you would like to reuse node modules that work in the client side too, you needto copy them inside the path folder.
The configuration object is the one you can find at the end of the require.js file ... there are two defaults there, but you can always change them via require.config.path = "different"; as well as you can drop the require.config.strict = false; directive so that modules will be evaluated without "use strict"; directive.
Anything else? Dunno ... you might come up with some hint, question, suggestion. And thanks for reading :), I know it was a long one!

Last Thoughts

If AMD and RequireJS comes with a compiler able to make everything already available somehow, think how much pointless become the optimization once you can have already available all dependencies without needing to write JS code in a different way, regardless it's for node.js or the client web normal code.
There are NOT really many excuse to keep polluting the global scope with variables, we have so may alternatives today that keep doing it would result as evil as any worst technique you can embrace in JS world.

9 comments:

Daniel Steigerwald said...

Google Closure done it well.

Andrea Giammarchi said...

Daniel I don't understand your comment ... if it's about require() VS Google Closure it's pointless 'cause Google Closure does not support CommonJS require() module syntax, neither is able to lazy evaluate compressed code.

If it's about suggeting the usage of google closure compiler, rather than UglifyJS or YUICompressor, then it might be another alternative but imho not that needed.

If it was about something else, please explain, thanks:)

Gianluca Guarini said...

Hi Andrea,
I have tested your script (I really like your idea) and it works perfectly but right now it is able to load only javascript modules and it doesn't allow us to require javascript templates (or anything else). I think that the power of require.js is in its plugins that can extend its core covering all developers needs. By the way it is a great proposal and I hope to get soon an extended version of your script.

Bravo!

Andrea Giammarchi said...

Gianluca ... whatever node JS based plugin you are talking about is automatically compatible with this implementation of require ... or maybe I did not understand what kind of "templates" you are talking about? Please tell me more, thanks :)

Gianluca Guarini said...

Ciao Andrea,
I think you didn't understand what I meant, please let me clarify my topic:
when we implement an AMD structure we want be able also to load and compile our HTML using javascript templates (check this out requirejs-tpl and organizing backbone using modules).

This means that we need something like this:

require("my_javascript_library");
require("my_javascript_model");
require("my_javascript_template.html");

Require.js solves this problem with an elegant solution:

define([
"my_javascript_library",
"my_javascript_model",
"text!my_javascript_template.html"
], function(lib,model,tpl) {
// ok now I can compile my model having a nice js template
});

I think it is not that hard to extend your script having also this feature.
Hoping that my previous post is more clear right now.

Andrea Giammarchi said...

so the template AMD double win is that these are not lazy loaded and these are precompiled ... so basically AMD is a 100% non-sense ... awesome.

Anyway, it's a matter of defining modules able to export a pre-compiled version of the template.

The require_client in this case should be run *after* the generic "create_templates" put file.html.js inside the same folder? ... not sure how node.js solves this ... but I'll try to provide an example or a solution that scales, cheers.

Andrea Giammarchi said...

Gianluca ... after long meditation on it ... I have decided that require() has nothing to do with templates and these two topics should be split.

require() is about modules ... templating is another story. Another task to put in the build process, not something directly able to affect code modules logic/mechanism ... imho

Gianluca Guarini said...

Hi Andrea,
I have tested requrejs, your require proposal and the default script include on a complex real project. Before telling you the result I should tell you that I was an AMD fan but then luckly I changed my opinion.

1 - using requirejs or your proposal it is almost impossible for someone else to start working on a project because it is hard to have an overview about all the js modules used ( A developer should open each js file trying to understand where and when a module is included and why )

2 - nodejs is using the require method just because it has no DOM environment in which include other modules so this is the smartest solution to solve this problem; so why then if we have already the DOM to include our modules we should do it in a different way?

3 - requirejs and your require script are both no standard and the performance improvement is irrelevant

4 - It could be useful to use requirejs or require_client to package our files in production but so why we should reinvent the wheel if there are smartest solutions like grunt or Codekit doing it as well without many ceremonies?

5 - The AMD technology allows us to load asynchronously javascript templates and in most cases they are less then 30rd line of code so why we can not embed them directly into the page? Do we really need all these http requests?

My win right now is to use the default script tag to emebed my modules and then to minify everything in production with Codekit, I think it is not too hard to keep the dev environment clean for a good web developer having this kind of initial asset.

kind regards
Gianluca

p.s. se dovessi passare a Zurigo mi farebbe piacere bere una birretta per approfondire queste tematiche e ovviamente sarrebbe un onore conoscerti vista la tua grande competenza cheers

Andrea Giammarchi said...

1 - same with anything else you are using ... you can use the require version that is not packed and produce the build so no differences with any other solution you are adopting right now ... you don't build during development

2 - I am using smartest solution too then ... cause I have all files in my inline script but I load them synchronously, runtime, and only once, caching them, when needed. Same concept indeed with the build version for production which means, better performance. Your current approach is to load all files in memory at once, even if you don't need them. Node does not do it, neither this proposal.

3 - performance irrelevant ... prove it on mobile or consider that Google and many others have adopted this technique of lazy evaluating because of performance. I guess we didn't just wake up thinking we improved them ... right?

4 - no wheel reinvented here because there's no tool able to produce this proposal. None of them, but I could create some plugin for grunt I guess ... good hint

5 - AMD allows you to load asynch then you say you don't want to load async and bring templates together, pre compiled, 'cause in your case these are few lines ... well, this does not scale plus require() modules has *nothing* to do with templates. I have no idea why AMD decided to include templates ... it seems so wrong for me I don't think I even care? Sorry but mixing templating and modules ... uhm, MVC anybody?

Keep env clean is not a tool matter, is a developer matter.
With this proposal nothing is less clean except you need one script, the require one, and eventually to require the page/module you want ... that's it.

In any case ... you really don't have to use this technique, if you are happy with AMD :)

I like node.js module system, the API is locked as ultra stable, and nobody complained there.

On web, nobody thought about this solution, everything included, still evaluated as modules.

It won't be the first, neither the last one, I wrote something that in a couple of years will be widely used so, if right now it's just me ... it's OK :D