Options
All
  • Public
  • Public/Protected
  • All
Menu

Chevrotain

Build Status Coverage Status NPM

Chevrotain

Chevrotain is a high performance fault Tolerant Javascript parsing DSL for building recursive decent parsers.

Chevrotain is NOT a parser generator. it solves the same kind of problems as a parser generator, just without the code generation phase.

Features

  • Lexer engine based on RexExps.

    • Supports Token location tracking.
    • Supports Token skipping (whitespace/comments/...)
    • Allows prioritising shorter matches (Keywords vs Identifiers).
    • No code generation The Lexer does not require any code generation phase.
  • Parsing DSL for creating the parsing rules.

    • No code generation - the DSL is just javascript not a new external language, what is written is what will be run, this speeds up development, makes debugging trivial and provides great flexibility for inserting custom actions into the grammar.
    • Strong Error Recovery capabilities based on Antlr3's algorithms.
    • Automatic lookahead calculation for LL(1) grammars.
    • In addition custom lookahead logic can be provided explicitly.
    • Backtracking support.
  • High performance see: performance comparison

  • Grammar Introspection, the grammar's structure is known and exposed this can be used to implement features such as automatically generated syntax diagrams or Syntactic error recovery.

  • Well tested with ~100% code coverage

Installation

  • npm: npm install chevrotain
  • Bower bower install chevrotain
  • or download directly from github releases:
    • the 'chevrotain-binaries-...' files contain the compiled javascript code.

Usage example JSON Parser:

  • The following example uses several features of ES6 (fat arrow/classes). These are not mandatory for using Chevrotain, they just make the example clearer. The example is also provided in ES5 syntax

step 1: define your Tokens:


    var Token = require("chevrotain").Token

    class Keyword extends Token { static PATTERN = NA }
    class True extends Keyword { static PATTERN = /true/ }
    class False extends Keyword { static PATTERN = /false/ }
    class Null extends Keyword { static PATTERN = /null/ }
    class LCurly extends Token { static PATTERN = /{/ }
    class RCurly extends Token { static PATTERN = /}/ }
    class LSquare extends Token { static PATTERN = /\[/ }
    class RSquare extends Token { static PATTERN = /]/ }
    class Comma extends Token { static PATTERN = /,/ }
    class Colon extends Token { static PATTERN = /:/ }
    class StringLiteral extends Token { static PATTERN = /"(:?[^\\"]+|\\(:?[bfnrtv"\\/]|u[0-9a-fA-F]{4}))*"/}
    class NumberLiteral extends Token { static PATTERN = /-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?/ }
    class WhiteSpace extends Token {
        static PATTERN = /\s+/
        static GROUP = SKIPPED
    }

step 2: create a lexer from the Token definitions:


    var Lexer = require("chevrotain").Lexer

    var JsonLexer = new chevrotain.Lexer([WhiteSpace, NumberLiteral, StringLiteral,
        RCurly, LCurly, LSquare, RSquare, Comma, Colon, True, False, Null])

step 3: define the parsing rules:


    var Parser = require("chevrotain").Parser

    class JsonParser extends Parser {

       constructor(input) {
           Parser.performSelfAnalysis(this)
       }

       public object = this.RULE("object", () => {
           this.CONSUME(LCurly)
           this.OPTION(() => {
               this.SUBRULE(this.objectItem)
               this.MANY(() => {
                   this.CONSUME(Comma)
                   this.SUBRULE2(this.objectItem) 
               })
           })
           this.CONSUME(RCurly)
       })

       public objectItem = this.RULE("objectItem", () => {
           this.CONSUME(StringLiteral)
           this.CONSUME(Colon)
           this.SUBRULE(this.value)
       })

       public array = this.RULE("array", () => {
           this.CONSUME(LSquare)
           this.OPTION(() => {
               this.SUBRULE(this.value)
               this.MANY(() => {
                   this.CONSUME(Comma)
                   this.SUBRULE2(this.value)
               })
           })
           this.CONSUME(RSquare)
       })

       public value = this.RULE("value", () => {
           this.OR([
               {ALT: () => {this.CONSUME(StringLiteral)}},
               {ALT: () => {this.CONSUME(NumberLiteral)}},
               {ALT: () => {this.SUBRULE(this.object)}},
               {ALT: () => {this.SUBRULE(this.array)}},
               {ALT: () => {this.CONSUME(True)}},
               {ALT: () => {this.CONSUME(False)}},
               {ALT: () => {this.CONSUME(Null)}}
           ], "a value")
       })
    }

step 4: add custom actions to the grammar defined in step 3

  • this shows the modification for just two grammar rules.

    public object = this.RULE("object", () => {
       var items = []

       this.CONSUME(LCurly)
       this.OPTION(() => {
           items.push(this.SUBRULE(this.objectItem))  // .push to collect the objectItems
           this.MANY(() => {
               this.CONSUME(Comma)
               items.push(this.SUBRULE2(this.objectItem))  // .push to collect the objectItems 
           })
       })
       this.CONSUME(RCurly)

       // merge all the objectItems
       var obj = {}
       items.forEach((item) => {
          obj[item.itemName] = item.itemValue      
       })
       return obj
    })

    public objectItem = this.RULE("objectItem", () => {
       var nameToken = this.CONSUME(StringLiteral)
       this.CONSUME(Colon)
       var value = this.SUBRULE(this.value) // assumes SUBRULE(this.value) returns the JS value (null/number/string/...)

       var itemNameString = nameToken.image // nameToken.image to get the literalString the lexer consumed
       var itemName = itemNameString.substr(1, itemNameString.length - 2) // chop off the string quotes
       return {itemName:itemName, itemValue:value}  
    })
     ...

step 5: wrap it all together


    function lexAndParse(text) {
        var lexResult = JsonLexer.tokenize(text)
        var parser = new JsonParser(lexResult.tokens)
        return parser.object()
    }

Getting Started

The best way to start is by looking at some runable (and debugable) examples:

Documentation

Dependencies

Only a single dependency to lodash.

Compatibility

The Generated artifact(chevrotain.js) should run on any modern Javascript ES5.1 runtime.

  • The CI build runs the tests under Node.js.
  • Additionally local testing is done on latest versions of Chrome/Firefox/IE.
  • The dependency to lodash is imported via UMD, in order to make chevrotain.js portable to multiple environments.

Generated using TypeDoc