Update: please read the section on unsafe transformations.
This package implements a general-purpose JavaScript parser/compressor/beautifier toolkit. It is developed on NodeJS, but it should work on any JavaScript platform supporting the CommonJS module system (and if your platform of choice doesn’t support CommonJS, you can easily implement it, or discard the `exports.*` lines from UglifyJS sources).
The tokenizer/parser generates an abstract syntax tree from JS code. You can then traverse the AST to learn more about the code, or do various manipulations on it. This part is implemented in parse-js.js and it’s a port to JavaScript of the excellent parse-js Common Lisp library from Marijn Haverbeke.
(See cl-uglify-js if you’re looking for the Common Lisp version of UglifyJS.)
The second part of this package, implemented in process.js, inspects and manipulates the AST generated by the parser to provide the following:
- ability to re-generate JavaScript code from the AST. Optionally indented—you can use this if you want to “beautify” a program that has been compressed, so that you can inspect the source. But you can also run our code generator to print out an AST without any whitespace, so you achieve compression as well.
- shorten variable names (usually to single characters). Our mangler will analyze the code and generate proper variable names, depending on scope and usage, and is smart enough to deal with globals defined elsewhere, or with `eval()` calls or `with{}` statements. In short, if `eval()` or `with{}` are used in some scope, then all variables in that scope and any variables in the parent scopes will remain unmangled, and any references to such variables remain unmangled as well.
- various small optimizations that may lead to faster code but certainly lead to smaller code. Where possible, we do the following:
  - foo["bar"] ==> foo.bar
  - remove block brackets `{}`
  - join consecutive var declarations: var a = 10; var b = 20; ==> var a=10,b=20;
  - resolve simple constant expressions: 1 + 2 * 3 ==> 7. We only do the replacement if the result occupies fewer bytes; for example 1/3 would translate to 0.333333333333, so in this case we don’t replace it.
  - consecutive statements in blocks are merged into a sequence; in many cases, this leaves blocks with a single statement, so then we can remove the block brackets.
  - various optimizations for IF statements:
    - if (foo) bar(); else baz(); ==> foo?bar():baz();
    - if (!foo) bar(); else baz(); ==> foo?baz():bar();
    - if (foo) bar(); ==> foo&&bar();
    - if (!foo) bar(); ==> foo||bar();
    - if (foo) return bar(); else return baz(); ==> return foo?bar():baz();
    - if (foo) return bar(); else something(); ==> {if(foo)return bar();something()}
  - remove some unreachable code and warn about it (code that follows a `return`, `throw`, `break` or `continue` statement, except function/variable declarations).
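The IF rewrites above depend on `?:`, `&&` and `||` evaluating the same operands as the statement they replace. As a minimal sketch, the "before" and "after" forms of two of them are written out by hand here and compared. This does not call UglifyJS, and the function names (`ifElseOriginal`, `trace` and so on) are made up for the illustration.

```javascript
// Hand-written "before" and "after" forms of two rewrites from the list.
// This does not run UglifyJS; it only checks that both forms behave alike.
var calls = [];
function bar() { calls.push("bar"); return "bar"; }
function baz() { calls.push("baz"); return "baz"; }

// if (foo) bar(); else baz();  ==>  foo?bar():baz();
function ifElseOriginal(foo) { if (foo) bar(); else baz(); }
function ifElseRewritten(foo) { foo ? bar() : baz(); }

// if (!foo) bar();  ==>  foo||bar();
function ifNotOriginal(foo) { if (!foo) bar(); }
function ifNotRewritten(foo) { foo || bar(); }

// Run fn with one argument and report which of bar/baz it called.
function trace(fn, arg) {
    calls = [];
    fn(arg);
    return calls.join(",");
}
```

Calling `trace` on both forms with truthy and falsy arguments should report the same calls for each pair.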
UglifyJS tries its best to achieve great compression while leaving the semantics of the code intact. In general, if your code logic is broken by UglifyJS then it’s a bug in UglifyJS and you should report it and I should fix it. :-)
However, I opted to include the following potentially unsafe transformations as default behavior. Discussion is welcome: if you have ideas of how to handle this better, or any objections to these optimizations, please let me know.
The following transformations occur:
    new Array(1, 2, 3, 4)  => [1,2,3,4]
    Array(a, b, c)         => [a,b,c]
    new Array(5)           => Array(5)
    new Array(a)           => Array(a)
These are all safe if the Array name isn’t redefined. JavaScript does allow one to globally redefine Array (and pretty much everything, in fact) but I personally don’t see why anyone would do that.
UglifyJS does handle the case where Array is redefined locally, or even globally but with a `function` or `var` declaration. Therefore, in the following cases UglifyJS doesn’t touch calls or instantiations of Array:
    // case 1. globally declared variable
    var Array;
    new Array(1, 2, 3);
    Array(a, b);

    // or (can be declared later)
    new Array(1, 2, 3);
    var Array;

    // or (can be a function)
    new Array(1, 2, 3);
    function Array() { ... }

    // case 2. declared in a function
    (function(){
        a = new Array(1, 2, 3);
        b = Array(5, 6);
        var Array;
    })();

    // or
    (function(Array){
        return Array(5, 6, 7);
    })();

    // or
    (function(){
        return new Array(1, 2, 3, 4);
        function Array() { ... }
    })();

    // etc.
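To see why the rewrites are only safe while `Array` refers to the built-in, here is a small sketch in plain JavaScript. It is not UglifyJS output; `builtin` and `shadowed` are hypothetical names, and the local `Array` function is deliberately silly.

```javascript
// With the built-in Array, the original and rewritten forms agree.
function builtin() {
    return {
        original: new Array(1, 2, 3, 4),
        rewritten: [1, 2, 3, 4],
        lengthWithNew: new Array(5).length,
        lengthWithoutNew: Array(5).length
    };
}

// With a local function named Array (as in case 2 above) they do not
// agree, which is why such calls have to be left alone.
function shadowed() {
    return {
        original: Array(5, 6),  // calls the local function declared below
        rewritten: [5, 6]       // what a blind rewrite would produce
    };
    function Array(a, b) { return a + b; }
}
```

In `shadowed`, the function declaration is hoisted, so `Array(5, 6)` never reaches the built-in constructor.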
There is a helper script now, `bin/uglifyjs`, that uses the library to compress a script using the maximum compression settings. Synopsis:

    uglifyjs [ options... ] [ filename ]

`filename` should be the last argument and should name the file from which to read the JavaScript code. If you don’t specify it, it will read code from STDIN.
Supported options:
- `-b` or `--beautify` – output indented code; when passed, additional options control the beautifier:
  - `-i N` or `--indent N` – indentation level (number of spaces)
  - `-q` or `--quote-keys` – quote keys in literal objects (by default, only keys that cannot be identifier names will be quoted).
- `-nm` or `--no-mangle` – don’t mangle variable names
- `-ns` or `--no-squeeze` – don’t call `ast_squeeze()` (which does various optimizations that result in smaller, less readable code).
- `-mt` or `--mangle-toplevel` – mangle names in the toplevel scope too (by default we don’t do this).
- `--no-seqs` – when `ast_squeeze()` is called (thus, unless you pass `--no-squeeze`) it will reduce consecutive statements in blocks into a sequence. For example, `a = 10; b = 20; foo();` will be written as `a=10,b=20,foo();`. On various occasions, this allows us to discard the block brackets (since the block becomes a single statement). This is ON by default because it seems safe and saves a few hundred bytes on some libs that I tested it on, but pass `--no-seqs` to disable it.
- `--no-dead-code` – by default, UglifyJS will remove code that is obviously unreachable (code that follows a `return`, `throw`, `break` or `continue` statement and is not a function/variable declaration). Pass this option to disable this optimization.
- `-nc` or `--no-copyright` – by default, `uglifyjs` will keep the initial comment tokens in the generated code (assumed to be copyright information etc.). If you pass this it will discard them.
- `-o filename` or `--output filename` – put the result in `filename`. If this isn’t given, the result goes to standard output (or see next one).
- `--overwrite` – if the code is read from a file (not from STDIN) and you pass `--overwrite` then the output will be written to the same file.
- `--ast` – pass this if you want to get the Abstract Syntax Tree instead of JavaScript as output. Useful for debugging or learning more about the internals.
- `-v` or `--verbose` – output some notes on STDERR (for now just how long each operation takes).
- `--extra` – enable additional optimizations that have not yet been extensively tested. These might, or might not, break your code. If you find a bug using this option, please report a test case.
- `--unsafe` – enable other additional optimizations that are known to be unsafe in some contrived situations, but could still be generally useful. For now only this:
  - foo.toString() ==> foo+""
- `--max-line-len` (default 32K characters) – add a newline after around 32K characters. I’ve seen both FF and Chrome croak when all the code was on a single line of around 670K. Pass `--max-line-len 0` to disable this safety feature.
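As an illustration of why `foo.toString() ==> foo+""` sits behind `--unsafe`, here are two situations in plain JavaScript where the two expressions differ. The variable and function names are made up for this sketch, and nothing here calls UglifyJS.

```javascript
// 1. An object whose valueOf() and toString() disagree.
//    `odd + ""` converts via valueOf() first, so the results differ.
var odd = {
    valueOf: function () { return 42; },
    toString: function () { return "forty-two"; }
};
var viaMethod = odd.toString();  // uses toString()
var viaConcat = odd + "";        // uses valueOf(), then stringifies

// 2. null and undefined have no toString() method, so the original
//    throws while the rewritten form quietly produces a string.
function original(foo) { return foo.toString(); }
function rewritten(foo) { return foo + ""; }

var originalThrows = false;
try { original(null); } catch (e) { originalThrows = true; }
var rewrittenResult = rewritten(null);
```

For ordinary strings and numbers the two forms give the same result, which is why the rewrite is still useful in most code.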
Symlink the lib directory as `~/.node_libraries/uglifyjs`, so that the require calls in the following sample will work:
    var jsp = require("uglifyjs/parse-js");
    var pro = require("uglifyjs/process");

    var orig_code = "... JS code here";
    var ast = jsp.parse(orig_code); // parse code and get the initial AST
    ast = pro.ast_mangle(ast); // get a new AST with mangled names
    ast = pro.ast_squeeze(ast); // get an AST with compression optimizations
    var final_code = pro.gen_code(ast); // compressed code here
The above performs the full compression that is possible right now. As you can see, there is a sequence of steps which you can apply. For example, if you want compressed output but for some reason you don’t want to mangle variable names, you would simply skip the line that calls `pro.ast_mangle(ast)`.
Some of these functions take optional arguments. Here’s a description:
- `jsp.parse(code, strict_semicolons)` – parses JS code and returns an AST. `strict_semicolons` is optional and defaults to `false`. If you pass `true` then the parser will throw an error when it expects a semicolon and doesn’t find one. For most JS code you don’t want that, but it’s useful if you want to strictly sanitize your code.
- `pro.ast_mangle(ast, do_toplevel)` – generates a new AST containing mangled (compressed) variable and function names. By default it doesn’t touch the names defined in the toplevel scope, but if you pass `true` as second argument it will compress them as well.
- `pro.ast_squeeze(ast, options)` – employs further optimizations designed to reduce the size of the code that `gen_code` would generate from the AST. Returns a new AST. `options` can be a hash; the supported options are:
  - `make_seqs` (default true) which will cause consecutive statements in a block to be merged using the “sequence” (comma) operator
  - `dead_code` (default true) which will remove unreachable code.
- `pro.gen_code(ast, beautify)` – generates JS code from the AST. By default it’s minified, but if you pass `true` for the second argument it will be nicely formatted and indented. Additionally, you can control the behavior by passing a hash for `beautify`, where the following options are supported (below you can see the default values):
  - `indent_start: 0` – initial indentation in spaces
  - `indent_level: 4` – indentation level, in spaces (pass an even number)
  - `quote_keys: false` – if you pass `true` it will quote all keys in literal objects
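Putting the optional arguments together, a small wrapper could look like the sketch below. The function name `compress` and its `opts` fields (`mangle`, `toplevel`, `squeeze`, `beautify` and so on) are hypothetical; only the `jsp.*` and `pro.*` calls and their arguments come from the description above. The two modules are passed in as parameters so the sketch makes no assumption about where they are installed.

```javascript
// Hypothetical helper: runs the pipeline described above, with each
// step optional. `jsp` and `pro` are the parse-js and process modules.
function compress(jsp, pro, code, opts) {
    opts = opts || {};
    var ast = jsp.parse(code, opts.strict_semicolons || false);
    if (opts.mangle !== false) {
        // a true second argument also mangles toplevel names
        ast = pro.ast_mangle(ast, opts.toplevel || false);
    }
    if (opts.squeeze !== false) {
        ast = pro.ast_squeeze(ast, {
            make_seqs: opts.make_seqs !== false,
            dead_code: opts.dead_code !== false
        });
    }
    // beautify may be true/false or a hash such as { indent_level: 2 }
    return pro.gen_code(ast, opts.beautify || false);
}
```

With `jsp` and `pro` loaded as in the sample above, `compress(jsp, pro, orig_code, { mangle: false })` would skip the mangling step, and passing `{ beautify: { indent_level: 2 } }` would ask the code generator for indented output.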
The beautifier can be used as a general purpose indentation tool. It’s useful when you want to make a minified file readable. One limitation, though, is that it discards all comments, so you don’t really want to use it to reformat your code, unless you don’t have, or don’t care about, comments.
In fact it’s not the beautifier that discards comments: they are dumped at the parsing stage, when we build the initial AST. Comments don’t really make sense in the AST, and while we could add nodes for them, it would be inconvenient because we’d have to add special rules to ignore them at all the processing stages.
(XXX: this is somewhat outdated. On the jQuery source code we beat Closure by 168 bytes (560 after gzip) and by many seconds.)
There are a few popular JS minifiers nowadays – the two most well known being the GoogleClosure (GCL) compiler and the YUI compressor. For some reason they are both written in Java. I didn’t really hope to beat any of them, but finally I did – UglifyJS compresses better than the YUI compressor, and safer than GoogleClosure.
I tested it on two big libraries. DynarchLIB is my own, and it’s big enough to contain probably all the JavaScript tricks known to mankind. jQuery is definitely the most popular JavaScript library (to some people, it’s a synonym for JavaScript itself).
I cannot swear that there are no bugs in the generated code, but it appears to work fine.
Compression results:
| Library | Orig. size (bytes) | UglifyJS | YUI | GCL |
|---|---|---|---|---|
| DynarchLIB | 636896 | 241441 | 246452 (+5011) | 240439 (-1002) (buggy) |
| jQuery | 163855 | 72006 | 79702 (+7696) | 71858 (-148) |
UglifyJS is the fastest to run. On my laptop UglifyJS takes 1.35s for DynarchLIB, while YUI takes 2.7s and GCL takes 6.5s.
GoogleClosure does a lot of smart ass optimizations. I had to strive really hard to get close to it. It should be possible to even beat it, but then again, GCL has a gazillion lines of code and runs terribly slowly, so I’m not sure it’s worth spending the effort to save a few bytes. Also, GCL doesn’t cope with `eval()` or `with{}`: it just dumps a warning and proceeds to mangle names anyway; my DynarchLIB compiled with it is buggy because of this.
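The reason mangling names in the presence of `eval()` breaks code can be seen in a few lines. In this sketch the "mangled" version is written by hand to imitate a tool that ignores `eval()`; it is not the output of GCL or UglifyJS, and the function names are hypothetical.

```javascript
// Original: eval() looks up `counter` by name at run time.
function original() {
    var counter = 41;
    return eval("counter + 1");
}

// Hand-mangled the way a tool ignoring eval() might do it: the variable
// is renamed, but the string passed to eval() still says `counter`.
function mangledIgnoringEval() {
    var a = 41;
    return eval("counter + 1");
}

// Helper: run fn and return either its result or the error name.
function outcome(fn) {
    try { return fn(); } catch (e) { return e.name; }
}
```

Assuming no global named `counter` exists, `outcome(original)` yields the computed value while `outcome(mangledIgnoringEval)` reports a `ReferenceError`. Leaving every variable in such scopes unmangled avoids this.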
UglifyJS consists of ~1100 lines of code for the tokenizer/parser, and ~1100 lines for the compressor and code generator. That should make it very maintainable and easily extensible, so I would say it has a good place in this field and it’s bound to become the de-facto standard JS minifier. And I shall rule the world. :-) Use it, and spread the word!
Unfortunately, for the time being there is no automated test suite. But I ran the compressor manually on non-trivial code, and then I tested that the generated code works as expected. A few hundred times.
DynarchLIB was started in times when there was no good JS minifier. Therefore I was quite religious about trying to write short code manually, and as such DL contains a lot of syntactic hacks[1] such as “foo == bar ? a = 10 : b = 20”, though the more readable version would clearly be to use “if/else”.
Since the parser/compressor runs fine on DL and jQuery, I’m quite confident that it’s solid enough for production use. If you can identify any bugs, I’d love to hear about them (use the Google Group or email me directly).
[1] I even reported a few bugs and suggested some fixes in the original parse-js library, and Marijn pushed fixes literally in minutes.
- Project at GitHub: http://github.com/mishoo/UglifyJS
- Google Group: http://groups.google.com/group/uglifyjs
- Common Lisp JS parser: http://marijn.haverbeke.nl/parse-js/
- JS-to-Lisp compiler: http://github.com/marijnh/js
- Common Lisp JS uglifier: http://github.com/mishoo/cl-uglify-js
UglifyJS is released under the BSD license:
    Copyright 2010 (c) Mihai Bazon <[email protected]>
    Based on parse-js (http://marijn.haverbeke.nl/parse-js/).

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

        * Redistributions of source code must retain the above
          copyright notice, this list of conditions and the following
          disclaimer.

        * Redistributions in binary form must reproduce the above
          copyright notice, this list of conditions and the following
          disclaimer in the documentation and/or other materials
          provided with the distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER “AS IS” AND ANY
    EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE
    LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
    OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
    PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
    PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
    TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
    THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
    SUCH DAMAGE.