Grammar
JavaScript has one of the most challenging grammar to parse, this tutorial details all the sweat and tears I had while learning it.
LL(1) Grammar
According to Wikipedia,
an LL grammar is a context-free grammar that can be parsed by an LL parser, which parses the input from Left to right
The first L means the scanning the source from Left to right, and the second L means the construction of a Leftmost derivation tree.
Context-free and the (1) in LL(1) means a tree can be constructed by just peeking at the next token and nothing else.
LL Grammars are of particular interest in academia because we are lazy human beings and we want to write programs that generate parsers automatically so we don't need to write parsers by hand.
Unfortunately, most industrial programming languages do not have a nice LL(1) grammar, and this applies to JavaScript too.
INFO
Mozilla started the jsparagus project a few years ago and wrote a LALR parser generator in Python. They haven't updated it much in the past two years and they sent a strong message at the end of js-quirks.md
What have we learned today?
- Do not write a JS parser.
- JavaScript has some syntactic horrors in it. But hey, you don't make the world's most widely used programming language by avoiding all mistakes. You do it by shipping a serviceable tool, in the right circumstances, for the right users.
The only practical way to parse JavaScript is to write a recursive descent parser by hand because of the nature of its grammar, so let's learn all the quirks in the grammar before we shoot ourselves in the foot.
The list below starts simple and will become difficult to grasp, so please take grab a coffee and take your time.
Identifiers
There are three types of identifiers defined in #sec-identifiers,
IdentifierReference[Yield, Await] :
BindingIdentifier[Yield, Await] :
LabelIdentifier[Yield, Await] :VariableDeclaration[In, Yield, Await] : BindingIdentifier[?Yield, ?Await] Initializer[?In, ?Yield, ?Await] opt
Initializer[In, Yield, Await] : = AssignmentExpression[?In, ?Yield, ?Await]
PrimaryExpression[Yield, Await] :
IdentifierReference[?Yield, ?Await]
``````rust
pub struct BindingIdentifier {
pub node: Node,
pub name: Atom,
}
pub struct IdentifierReference {
pub node: Node,
pub name: Atom,
}
``````rust
// https://github.com/swc-project/swc/blob/f9c4eff94a133fa497778328fa0734aa22d5697c/crates/swc_ecma_parser/src/parser/class_and_fn.rs#L85
fn parse_class_inner(
&mut self,
_start: BytePos,
class_start: BytePos,
decorators: Vec<Decorator>,
is_ident_required: bool,
) -> PResult<(Option<Ident>, Class)> {
self.strict_mode().parse_with(|p| {
expect!(p, "class");EscapeSequence :: LegacyOctalEscapeSequence NonOctalDecimalEscapeSequence
It is a Syntax Error if the source text matched by this production is strict mode code.
https://github.com/tc39/test262/blob/747bed2e8aaafe8fdf2c65e8a10dd7ae64f66c47/test/language/literals/string/legacy-octal-escape-sequence-prologue-strict.js#L16-L19
``````javaScript
function foo(
value = (function() {
return "\01";
}()),
) {
"use strict";
return value;
}FunctionDeclaration : FunctionExpression :
It is a Syntax Error if FunctionBodyContainsUseStrict of FunctionBody is true and IsSimpleParameterList of FormalParameters is false.
function foo(a, b) {
"use strict";
if (b === void 0) b = "\01";
}
``````javascript
> fn = function () {};
> fn.name
< "fn"
> (fn) = function () {};
> fn.name
< ''Statement[Yield, Await, Return] : ... lots of statements
Declaration[Yield, Await] : ... declarations
if (x) {
function foo() {}
} else function bar() {}
``````javascript
<Foo
bar={() => {
baz: "quaz";
}}
/>
// ^^^^^^^^^^^ `LabelledStatement`
``````javascript
let a;
let = foo;
let instanceof x;
let + 1;
while (true) let;
a = let[0];RelationalExpression[In, Yield, Await] : [+In] RelationalExpression[+In, ?Yield, ?Await] in ShiftExpression[?Yield, ?Await] [+In] PrivateIdentifier in ShiftExpression[?Yield, ?Await]
Note 2: The [In] grammar parameter is needed to avoid confusing the in operator in a relational expression with the in operator in a for statement.
https://github.com/acornjs/acorn/blob/11735729c4ebe590e406f952059813f250a4cbd1/acorn/src/scope.js#L30-L35
``````javascript
function foo() {
if (true) {
var bar;
function bar() {} // redeclaration error
}
}
``````javascript
function foo() {
var bar;
function bar() {}
}
``````rust
// https://github.com/rome/tools/blob/5a059c0413baf1d54436ac0c149a829f0dfd1f4d/crates/rome_js_parser/src/state.rs#L404-L425
pub(crate) struct ParsingContextFlags: u8 {
/// Whether the parser is in a generator function like `function* a() {}`
/// Matches the `Yield` parameter in the ECMA spec
const IN_GENERATOR = 1 << 0;
/// Whether the parser is inside a function
const IN_FUNCTION = 1 << 1;
/// Whatever the parser is inside a constructor
const IN_CONSTRUCTOR = 1 << 2;
/// Is async allowed in this context. Either because it's an async function or top level await is supported.
/// Equivalent to the `Async` generator in the ECMA spec
const IN_ASYNC = 1 << 3;
/// Whether the parser is parsing a top-level statement (not inside a class, function, parameter) or not
const TOP_LEVEL = 1 << 4;
/// Whether the parser is in an iteration or switch statement and
/// `break` is allowed.
const BREAK_ALLOWED = 1 << 5;
/// Whether the parser is in an iteration statement and `continue` is allowed.
const CONTINUE_ALLOWED = 1 << 6;extend interface AssignmentExpression { left: Pattern; }
interface VariableDeclarator <: Node {
type: "VariableDeclarator";
id: Pattern;
init: Expression | null;
}interface Identifier <: Expression, Pattern { type: "Identifier"; name: string; }
interface ObjectPattern <: Pattern { type: "ObjectPattern"; properties: [ AssignmentProperty ]; }
interface ArrayPattern <: Pattern { type: "ArrayPattern"; elements: [ Pattern | null ]; }
// AssignmentExpression:
{ foo } = bar;
^^^ IdentifierReference
[ foo ] = bar;
^^^ IdentifierReference
// VariableDeclarator
var { foo } = bar;
^^^ BindingIdentifier
var [ foo ] = bar;
^^^ BindingIdentifier
``````rust
enum Pattern {
Identifier, // Is this a `BindingIdentifier` or a `IdentifierReference`?
ArrayPattern,
ObjectPattern,
}13.15 Assignment Operators
AssignmentExpression[In, Yield, Await] : LeftHandSideExpression[?Yield, ?Await] = AssignmentExpression[?In, ?Yield, ?Await]
13.15.5 Destructuring Assignment
In certain circumstances when processing an instance of the production AssignmentExpression : LeftHandSideExpression = AssignmentExpression the interpretation of LeftHandSideExpression is refined using the following grammar:
AssignmentPattern[Yield, Await] : ObjectAssignmentPattern[?Yield, ?Await] ArrayAssignmentPattern[?Yield, ?Await]
14.3.2 Variable Statement
VariableDeclaration[In, Yield, Await] :
BindingIdentifier[?Yield, ?Await] Initializer[?In, ?Yield, ?Await]opt
BindingPattern[?Yield, ?Await] Initializer[?In, ?Yield, ?Await]
``````rust
enum BindingPattern {
BindingIdentifier,
ObjectBindingPattern,
ArrayBindingPattern,
}
enum AssignmentPattern {
IdentifierReference,
ObjectAssignmentPattern,
ArrayAssignmentPattern,
}
``````javascript
a / b;
a / / regex /;
a /= / regex /;
/ regex / / b;
/=/ / /=/;InputElementDiv :: WhiteSpace LineTerminator Comment CommonToken DivPunctuator <---------- the / and /= token RightBracePunctuator
InputElementRegExp :: WhiteSpace LineTerminator Comment CommonToken RightBracePunctuator RegularExpressionLiteral <-------- the RegExp token
a / / regex /
^ ------------ PrimaryExpression:: IdentifierReference
^ ---------- MultiplicativeExpression: MultiplicativeExpression MultiplicativeOperator ExponentiationExpression
^^^^^^^^ - PrimaryExpression: RegularExpressionLiteralRegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags
PrimaryExpression[Yield, Await] :
CoverParenthesizedExpressionAndArrowParameterList[?Yield, ?Await]
When processing an instance of the production
PrimaryExpression[Yield, Await] : CoverParenthesizedExpressionAndArrowParameterList[?Yield, ?Await]
the interpretation of CoverParenthesizedExpressionAndArrowParameterList is refined using the following grammar:
ParenthesizedExpression[Yield, Await] :
( Expression[+In, ?Yield, ?Await] )ArrowFunction[In, Yield, Await] : ArrowParameters[?Yield, ?Await] [no LineTerminator here] => ConciseBody[?In]
ArrowParameters[Yield, Await] : BindingIdentifier[?Yield, ?Await] CoverParenthesizedExpressionAndArrowParameterList[?Yield, ?Await]
let foo = (a, b, c); // SequenceExpression
let bar = (a, b, c) => {}; // ArrowExpression
^^^^^^^^^ CoverParenthesizedExpressionAndArrowParameterListCallExpression : CoverCallExpressionAndAsyncArrowHead
When processing an instance of the production CallExpression : CoverCallExpressionAndAsyncArrowHead the interpretation of CoverCallExpressionAndAsyncArrowHead is refined using the following grammar:
CallMemberExpression[Yield, Await] : MemberExpression[?Yield, ?Await] Arguments[?Yield, ?Await]
AsyncArrowFunction[In, Yield, Await] :
CoverCallExpressionAndAsyncArrowHead[?Yield, ?Await] [no LineTerminator here] => AsyncConciseBody[?In]
CoverCallExpressionAndAsyncArrowHead[Yield, Await] :
MemberExpression[?Yield, ?Await] Arguments[?Yield, ?Await]
When processing an instance of the production
AsyncArrowFunction : CoverCallExpressionAndAsyncArrowHead => AsyncConciseBody
the interpretation of CoverCallExpressionAndAsyncArrowHead is refined using the following grammar:
AsyncArrowHead :
async [no LineTerminator here] ArrowFormalParameters[~Yield, +Await]
``````javascript
async (a, b, c); // CallExpression
async (a, b, c) => {} // AsyncArrowFunction
^^^^^^^^^^^^^^^ CoverCallExpressionAndAsyncArrowHead13.2.5 Object Initializer
ObjectLiteral[Yield, Await] : ...
PropertyDefinition[Yield, Await] : CoverInitializedName[?Yield, ?Await]
Note 3: In certain contexts, ObjectLiteral is used as a cover grammar for a more restricted secondary grammar. The CoverInitializedName production is necessary to fully cover these secondary grammars. However, use of this production results in an early Syntax Error in normal contexts where an actual ObjectLiteral is expected.
13.2.5.1 Static Semantics: Early Errors
In addition to describing an actual object initializer the ObjectLiteral productions are also used as a cover grammar for ObjectAssignmentPattern and may be recognized as part of a CoverParenthesizedExpressionAndArrowParameterList. When ObjectLiteral appears in a context where ObjectAssignmentPattern is required the following Early Error rules are not applied. In addition, they are not applied when initially parsing a CoverParenthesizedExpressionAndArrowParameterList or CoverCallExpressionAndAsyncArrowHead.
PropertyDefinition : CoverInitializedName I* t is a Syntax Error if any source text is matched by this production.
13.15.1 Static Semantics: Early Errors
AssignmentExpression : LeftHandSideExpression = AssignmentExpression
If LeftHandSideExpression is an ObjectLiteral or an ArrayLiteral, the following Early Error rules are applied:
* LeftHandSideExpression must cover an AssignmentPattern.
``````javascript
({ prop = value } = {}); // ObjectAssignmentPattern
({ prop: value }); // ObjectLiteral with SyntaxError
``````javascript
let { x = 1 } = ({ x = 1 } = { x: 1 });
```