A Look into Abstract Syntax Trees (ASTs)
An Abstract Syntax Tree (AST) is a representation of code, in any language, as a tree whose nodes are built from the code's syntax tokens. In compilation or interpretation, it is an intermediate step, used among other things to identify syntax errors. But the JavaScript ecosystem has taken it a step further! This representation is now used by incredible tools like Babel, ESLint, jscodeshift and many more. These tools have made a great impact on people who use JavaScript in their day-to-day work, ranging from better developer experience and productivity for individuals to better performance and reach of products for companies. This note gives a small insight into the what and how of ASTs, and how to leverage them to create the next big tool in the programming world!
Expectation: familiarity with the tree data structure and JavaScript syntax.
What is an Abstract Syntax Tree (AST)?
Code in most programming languages can be represented as a tree, where each "word" of the language is a node and the relationships between words are edges. This representation is known as a Syntax Tree.
The "word" here is any character or group of characters that has a special meaning in the programming language. For example, let a = "apple"; has 5 words in it (let, a, =, "apple" and ;).
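As a rough illustration of how a compiler finds these words, here is a toy tokenizer. It is a sketch under simplifying assumptions, not how any real engine works: it only recognises the keywords let, const and var, identifiers, double-quoted strings without escapes, and the punctuators = and ;. The token type names are made up for this example.

```javascript
// Toy tokenizer (hypothetical): splits source text into syntax tokens ("words").
// Whitespace separates tokens but never becomes a token itself.
const KEYWORDS = new Set(["let", "const", "var"]);

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++;
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Read a whole word, then decide whether it is a keyword or a name.
      let j = i + 1;
      while (j < source.length && /[A-Za-z0-9_$]/.test(source[j])) j++;
      const word = source.slice(i, j);
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
      i = j;
    } else if (ch === '"') {
      // A string token runs to the next double quote (no escape handling here).
      const end = source.indexOf('"', i + 1);
      if (end === -1) throw new SyntaxError(`Unterminated string at offset ${i}`);
      tokens.push({ type: "String", value: source.slice(i, end + 1) });
      i = end + 1;
    } else if (ch === "=" || ch === ";") {
      tokens.push({ type: "Punctuator", value: ch });
      i++;
    } else {
      throw new SyntaxError(`Unexpected character ${JSON.stringify(ch)} at offset ${i}`);
    }
  }
  return tokens;
}

console.log(tokenize('let a = "apple";').map((token) => token.value));
// → [ 'let', 'a', '=', '"apple"', ';' ]
```

The five values it returns are exactly the five words listed above.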
In the world of compilers, these words are known as Syntax Tokens. However, a Syntax Tree can be pretty difficult for a human to read. This is where the Abstract Syntax Tree comes into the picture!
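In an Abstract Syntax Tree, apart from nodes for meaningful Syntax Tokens (such as the name a or the value "apple"), there are also (meta) "words", which wrap an individual word or a collection of words. Tokens that only matter for the syntax, such as = and ;, usually do not get nodes of their own.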
Each node in an AST has multiple properties that describe exactly how this instance of the node is used, such as type, children or value, depending on the specification of the language and its AST.
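To make this concrete, below is roughly the AST that an ESTree-style parser (acorn or espree, for instance) produces for let a = "apple";, written out by hand with position fields left out. Notice that let ends up as the kind property of a VariableDeclaration node, while = and ; have no nodes at all. The printTree helper is a hypothetical illustration that walks the object and prints each node's type, plus its name, value or kind.

```javascript
// Hand-written ESTree-shaped AST for: let a = "apple";
// (Real parsers also attach position info such as start/end or loc.)
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let", // the `let` token becomes a property, not a child node
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "a" },
          init: { type: "Literal", value: "apple", raw: '"apple"' },
        },
      ],
    },
  ],
};

// Produce one indented line per node. Any property holding a node
// (an object with a string `type`), or an array of nodes, counts as a child.
function printTree(node, depth = 0, lines = []) {
  let label = node.type;
  if (node.type === "VariableDeclaration") label += ` (kind: ${node.kind})`;
  if (node.type === "Identifier") label += ` (name: ${node.name})`;
  if (node.type === "Literal") label += ` (value: ${JSON.stringify(node.value)})`;
  lines.push("  ".repeat(depth) + label);
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (child && typeof child === "object" && typeof child.type === "string") {
        printTree(child, depth + 1, lines);
      }
    }
  }
  return lines;
}

console.log(printTree(ast).join("\n"));
```

Running it prints five indented lines, from Program down to the Identifier and Literal leaves under the VariableDeclarator.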
The specification (or something similar) of a language is a document that describes in detail the rules by which the language is written. For many languages, it also describes the nodes of the AST and how they relate to each other. For example, JavaScript tools commonly follow the community-maintained ESTree specification, which lists each AST node type and its fields.
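A real specification is far too large to reproduce here, but as a rough sketch of the idea, the snippet below writes a few node rules down as data (reusing ESTree's node and field names for the nodes in let a = "apple";) and checks a tree against them. The SPEC format and the validate function are made up for this illustration.

```javascript
// Hypothetical excerpt of node rules. Field names follow ESTree;
// the rule format ("node", "node?", "node[]", ...) is invented here.
const SPEC = {
  Program: { body: "node[]" },
  VariableDeclaration: { kind: ["var", "let", "const"], declarations: "node[]" },
  VariableDeclarator: { id: "node", init: "node?" }, // init is null for `let a;`
  Identifier: { name: "string" },
  Literal: { value: "any" },
};

// Walk the tree and collect problems; an empty list means it conforms to SPEC.
function validate(node, path = "root", errors = []) {
  const known =
    node && typeof node === "object" && Object.prototype.hasOwnProperty.call(SPEC, node.type);
  if (!known) {
    errors.push(`${path}: not a known node`);
    return errors;
  }
  for (const [field, rule] of Object.entries(SPEC[node.type])) {
    const value = node[field];
    const at = `${path}.${field}`;
    if (Array.isArray(rule)) {
      if (!rule.includes(value)) errors.push(`${at}: expected one of ${rule.join(", ")}`);
    } else if (rule === "string") {
      if (typeof value !== "string") errors.push(`${at}: expected a string`);
    } else if (rule === "node" || (rule === "node?" && value != null)) {
      validate(value, at, errors);
    } else if (rule === "node[]") {
      if (!Array.isArray(value)) errors.push(`${at}: expected an array of nodes`);
      else value.forEach((child, i) => validate(child, `${at}[${i}]`, errors));
    }
    // "any" fields, and "node?" fields that are null, need no further checks.
  }
  return errors;
}

const good = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "a" },
          init: { type: "Literal", value: "apple" },
        },
      ],
    },
  ],
};
const bad = JSON.parse(JSON.stringify(good));
bad.body[0].kind = "mutable"; // not a declaration kind the rules allow

console.log(validate(good)); // → []
console.log(validate(bad));
```

The second call reports that root.body[0].kind is not one of var, let or const.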
Why are ASTs used?
When code is compiled from one language to another, creating a syntax tree is an intermediate step for most compilers and interpreters. Syntax trees serve many purposes, like validating the syntax of the code, identifying potential errors in it, and many more.
For the sake of time, I am only going to focus on using ASTs to transform code. Since the AST representation of code is just a tree whose node types are predefined in a specification, a person can modify this tree, following the rules of the language, to produce source code different from the original.
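Here is a minimal sketch of that idea, assuming a hand-written ESTree-shaped AST for two declarations. The hypothetical renameIdentifier function modifies the tree in place, and a tiny generate function (covering only these node types) turns it back into source.

```javascript
// Hand-written ESTree-shaped AST for: let a = "apple"; let b = a;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: { type: "Literal", value: "apple" } },
      ],
    },
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "b" }, init: { type: "Identifier", name: "a" } },
      ],
    },
  ],
};

// Transformation: rename every Identifier called `from` to `to`, in place.
function renameIdentifier(node, from, to) {
  if (node.type === "Identifier" && node.name === from) node.name = to;
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (child && typeof child.type === "string") renameIdentifier(child, from, to);
    }
  }
  return node;
}

// Code generation for just the node types used above.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return node.init ? `${generate(node.id)} = ${generate(node.init)}` : generate(node.id);
    case "Identifier":
      return node.name;
    case "Literal":
      return JSON.stringify(node.value);
    default:
      throw new Error(`No generator for ${node.type}`);
  }
}

console.log(generate(ast));
renameIdentifier(ast, "a", "fruit");
console.log(generate(ast));
```

The first log prints the original two lines; after the rename it prints let fruit = "apple"; and let b = fruit;. Renaming blindly like this is only safe because the toy program has a single scope and no existing variable named fruit, which is why real codemod tools track scopes.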
While we are at it, one can also write a script that converts the AST of one language/specification into another, producing the same program in a different language 🤯. Many developer tools, like linters, formatters and codemods, are at their core AST validators and transformers.
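As a toy illustration of converting between languages, the sketch below maps a small ESTree-shaped JavaScript AST onto objects whose _type names mirror Python's ast module (Module, Assign, Name, Constant), then prints Python source from them. Both functions are hypothetical and only cover variable declarations whose initialiser is a literal or a name; mapping a missing initialiser to None is an assumption of this sketch.

```javascript
// Convert a tiny subset of ESTree into Python-`ast`-shaped objects.
function toPythonAst(node) {
  switch (node.type) {
    case "Program":
      // flatMap because one JS declaration may yield several Python assignments
      return { _type: "Module", body: node.body.flatMap(toPythonAst) };
    case "VariableDeclaration":
      // Python has no let/const/var: each declarator becomes a plain assignment.
      return node.declarations.map((d) => ({
        _type: "Assign",
        targets: [{ _type: "Name", id: d.id.name, ctx: "Store" }],
        value: d.init ? toPythonAst(d.init) : { _type: "Constant", value: null },
      }));
    case "Literal":
      return { _type: "Constant", value: node.value };
    case "Identifier":
      return { _type: "Name", id: node.name, ctx: "Load" };
    default:
      throw new Error(`No Python mapping sketched for ${node.type}`);
  }
}

// Print Python source from the Python-shaped tree.
function printPython(node) {
  switch (node._type) {
    case "Module":
      return node.body.map(printPython).join("\n");
    case "Assign":
      return `${node.targets.map(printPython).join(" = ")} = ${printPython(node.value)}`;
    case "Name":
      return node.id;
    case "Constant":
      if (node.value === null) return "None";
      if (typeof node.value === "boolean") return node.value ? "True" : "False";
      return JSON.stringify(node.value); // double-quoted strings and numbers are valid Python
    default:
      throw new Error(`Cannot print ${node._type}`);
  }
}

// ESTree-shaped AST for: let a = "apple", n = 3;
const jsAst = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: { type: "Literal", value: "apple" } },
        { type: "VariableDeclarator", id: { type: "Identifier", name: "n" }, init: { type: "Literal", value: 3 } },
      ],
    },
  ],
};

console.log(printPython(toPythonAst(jsAst)));
```

It prints a = "apple" and n = 3 on separate lines: the declaration keyword simply disappears because Python has no equivalent.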
AST Explorer
Try AST Explorer to visualize how a given piece of source code is represented as an AST.
Quick Tip! If you want to dive deep into the AST of a snippet, just click on it in the source code, and its AST will be highlighted in the AST view on the right side. Also, feel free to read up on the specification of a node type for more in-depth knowledge.
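Roughly speaking, this kind of highlighting is possible because parsers such as acorn and Babel record start and end character offsets on every node. The sketch below assumes a hand-written AST carrying those offsets and is not AST Explorer's actual code; the nodeAt function is hypothetical and finds the innermost node under a clicked position.

```javascript
// ESTree-shaped AST for `let a = "apple";` with start/end offsets (end exclusive),
// in the style of the position fields acorn and Babel attach to nodes.
const source = 'let a = "apple";';
const ast = {
  type: "Program", start: 0, end: 16,
  body: [
    {
      type: "VariableDeclaration", start: 0, end: 16, kind: "let",
      declarations: [
        {
          type: "VariableDeclarator", start: 4, end: 15,
          id: { type: "Identifier", start: 4, end: 5, name: "a" },
          init: { type: "Literal", start: 8, end: 15, value: "apple", raw: '"apple"' },
        },
      ],
    },
  ],
};

// Return the innermost node whose [start, end) range contains `offset`,
// or null when the offset lies outside the tree.
function nodeAt(node, offset) {
  if (offset < node.start || offset >= node.end) return null;
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (child && typeof child.type === "string") {
        const hit = nodeAt(child, offset);
        if (hit) return hit; // a child contains the offset, so the match is deeper
      }
    }
  }
  return node; // no child contains the offset, so this node is the innermost match
}

for (const offset of [0, 4, 6, 10, 15]) {
  console.log(JSON.stringify(source[offset]), "->", nodeAt(ast, offset).type);
}
```

Clicking = selects the VariableDeclarator, since = has no node of its own, and clicking ; selects the whole VariableDeclaration.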
How to leverage ASTs for the next big thing?
If you are as pumped up as I am, you will want to know how to bring ASTs into your next developer tooling adventure. There are 2 pieces of software you need to get going:
- Source code to AST parser: gives you the AST of the original source code you want to transform (or just read).
- AST to source code writer: converts the AST back into new source code that reflects the changes made to the AST.
For example, for JavaScript source code, you can use Babel or SWC for both.
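To see how the two pieces fit together, below is a toy parser and writer for statements like let a = "apple";. It is a sketch, not a substitute for Babel or SWC: the grammar (one declarator per statement, an initialiser that is a double-quoted string, an integer or a name) and all function names are assumptions made for this example.

```javascript
// Tokenizer for the toy grammar. The sticky (/y) regex matches, in order:
// whitespace, words, double-quoted strings, integers, or one of `=` and `;`.
function tokenize(source) {
  const re = /\s+|([A-Za-z_$][\w$]*)|("[^"]*")|(\d+)|([=;])/y;
  const tokens = [];
  while (re.lastIndex < source.length) {
    const start = re.lastIndex;
    const m = re.exec(source);
    if (!m) throw new SyntaxError(`Unexpected character at offset ${start}`);
    if (m[1]) tokens.push({ kind: "word", value: m[1] });
    else if (m[2]) tokens.push({ kind: "string", value: m[2] });
    else if (m[3]) tokens.push({ kind: "number", value: m[3] });
    else if (m[4]) tokens.push({ kind: "punct", value: m[4] });
    // whitespace matches push nothing
  }
  return tokens;
}

// Piece 1: source code -> ESTree-shaped AST.
function parse(source) {
  const tokens = tokenize(source);
  let pos = 0;
  const next = () => tokens[pos++];
  const expect = (value) => {
    const tok = next();
    if (!tok || tok.value !== value) throw new SyntaxError(`Expected "${value}"`);
  };
  const body = [];
  while (pos < tokens.length) {
    const kind = next().value;
    if (!["let", "const", "var"].includes(kind)) throw new SyntaxError(`Unexpected "${kind}"`);
    const id = next();
    if (!id || id.kind !== "word") throw new SyntaxError("Expected an identifier");
    expect("=");
    const init = parseExpression(next());
    expect(";");
    body.push({
      type: "VariableDeclaration",
      kind,
      declarations: [{ type: "VariableDeclarator", id: { type: "Identifier", name: id.value }, init }],
    });
  }
  return { type: "Program", sourceType: "script", body };
}

function parseExpression(tok) {
  if (!tok) throw new SyntaxError("Expected an expression");
  if (tok.kind === "string") return { type: "Literal", value: tok.value.slice(1, -1), raw: tok.value };
  if (tok.kind === "number") return { type: "Literal", value: Number(tok.value), raw: tok.value };
  if (tok.kind === "word") return { type: "Identifier", name: tok.value };
  throw new SyntaxError(`Unexpected "${tok.value}"`);
}

// Piece 2: AST -> source code.
function generate(node) {
  switch (node.type) {
    case "Program": return node.body.map(generate).join("\n");
    case "VariableDeclaration": return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator": return `${generate(node.id)} = ${generate(node.init)}`;
    case "Identifier": return node.name;
    case "Literal": return node.raw; // keep the original spelling of the literal
    default: throw new Error(`Cannot generate ${node.type}`);
  }
}

const ast = parse('let a = "apple";\nconst n = 3;');
console.log(JSON.stringify(ast.body[0].declarations[0].init));
console.log(generate(ast));
```

Because each Literal keeps its raw text, generating code from the parsed tree reproduces the input exactly for anything this grammar accepts, which is the round trip a transformation tool builds on.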
Once you have those two, you need to write code that transforms or traverses the AST and, boom, your own custom tool is ready ✨.
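A common way to structure that code is a visitor: an object whose keys are node types and whose values are callbacks. The traverse function below is a minimal, hypothetical take on that pattern (Babel's real traversal hands each callback a richer path object rather than the bare node and its parent). As a tiny example tool, it lists declared variables that are never read, ignoring scopes entirely.

```javascript
// Minimal visitor-based traversal: call visitor[node.type](node, parent)
// for every node, depth first, parents before children.
function traverse(node, visitor, parent = null) {
  if (typeof visitor[node.type] === "function") visitor[node.type](node, parent);
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (child && typeof child.type === "string") traverse(child, visitor, node);
    }
  }
}

// Hand-written ESTree-shaped AST for: let a = "apple"; let b = a;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: { type: "Literal", value: "apple" } },
      ],
    },
    {
      type: "VariableDeclaration",
      kind: "let",
      declarations: [
        { type: "VariableDeclarator", id: { type: "Identifier", name: "b" }, init: { type: "Identifier", name: "a" } },
      ],
    },
  ],
};

const declared = [];
const used = [];
traverse(ast, {
  VariableDeclarator(node) {
    declared.push(node.id.name);
  },
  Identifier(node, parent) {
    // An Identifier that is not the declarator's own `id` is a read of the variable.
    if (!(parent.type === "VariableDeclarator" && parent.id === node)) used.push(node.name);
  },
});

const unused = declared.filter((name) => !used.includes(name));
console.log({ declared, used, unused });
```

Here a is read by the second declaration, so only b is reported as unused.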
I have been hacking with ASTs myself for a while now; below is one of those projects for inspiration:
graphql-tag-swc-plugin: an SWC plugin that parses gql tag literals at build time, thus avoiding runtime execution of the gql tag and improving the runtime performance of the code.
import gql from "graphql-tag";

const AstronautQuery = gql`
  query AstronautQuery($name: String!) {
    searchAstronaut(query: $name) {
      ...Astronaut
    }
  }
  ${ASTRONAUT_FRAGMENT}
`;

// converts to

const AstronautQuery = {
  // ....
};
P.S. Under the hood, it converts a GraphQL-compatible AST into an SWC-compatible AST for a JS object 😎
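The core of that conversion can be sketched in a few lines: take a plain JavaScript value (such as an already-parsed GraphQL document, which is just nested objects) and build the AST of an equivalent object literal, so it can be printed straight into the output code. This is not the plugin's code: it uses ESTree node names rather than SWC's own AST types, valueToAst and generate are hypothetical, and documentAst is a trimmed, hand-written stand-in for a graphql-js style document.

```javascript
// Build the ESTree AST of an object/array/primitive literal equal to `value`.
function valueToAst(value) {
  if (Array.isArray(value)) {
    return { type: "ArrayExpression", elements: value.map(valueToAst) };
  }
  if (value !== null && typeof value === "object") {
    return {
      type: "ObjectExpression",
      properties: Object.entries(value).map(([key, v]) => ({
        type: "Property",
        key: { type: "Literal", value: key },
        value: valueToAst(v),
        kind: "init",
        computed: false,
        method: false,
        shorthand: false,
      })),
    };
  }
  const isFiniteNumber = typeof value === "number" && Number.isFinite(value);
  if (value === null || typeof value === "string" || typeof value === "boolean" || isFiniteNumber) {
    return { type: "Literal", value };
  }
  throw new TypeError(`Cannot embed ${String(value)} as a literal`);
}

// Print the literal back out as source (a stand-in for a real code generator).
function generate(node) {
  switch (node.type) {
    case "ArrayExpression":
      return `[${node.elements.map(generate).join(", ")}]`;
    case "ObjectExpression": {
      const props = node.properties.map((p) => `${generate(p.key)}: ${generate(p.value)}`);
      return props.length ? `{ ${props.join(", ")} }` : "{}";
    }
    case "Literal":
      return JSON.stringify(node.value);
  }
  throw new Error(`Cannot generate ${node.type}`);
}

// Trimmed, hand-written stand-in for a parsed GraphQL document.
const documentAst = {
  kind: "Document",
  definitions: [
    { kind: "OperationDefinition", operation: "query", name: { kind: "Name", value: "AstronautQuery" } },
  ],
};

console.log(`const AstronautQuery = ${generate(valueToAst(documentAst))};`);
```

The printed object literal describes the same data as documentAst, so the browser never has to parse the query string at runtime.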
Next up
If you want to explore or learn more, you can start by writing some simple Babel plugins, ESLint rules or codemods. Feel free to reach out to me to share what you built or if you need some help :)
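For a sense of what an ESLint rule looks like, here is a sketch: ESLint calls the rule's create function with a context object, create returns a visitor keyed by ESTree node types, and problems are reported through context.report. The rule itself (a stripped-down take on the idea behind the built-in no-var rule, without autofix) and its message text are made up for this example.

```javascript
// Shape of an ESLint rule: `meta` describes it, `create` returns a visitor.
const noVarRule = {
  meta: {
    type: "suggestion",
    docs: { description: "Disallow `var` declarations" },
    schema: [],
  },
  create(context) {
    return {
      // Called for every VariableDeclaration node in the linted file.
      VariableDeclaration(node) {
        if (node.kind === "var") {
          context.report({ node, message: "Use `let` or `const` instead of `var`." });
        }
      },
    };
  },
};

// Outside ESLint, exercise the rule with a fake context and hand-made nodes.
const reports = [];
const visitor = noVarRule.create({ report: (problem) => reports.push(problem) });
for (const kind of ["var", "let", "const"]) {
  visitor.VariableDeclaration({ type: "VariableDeclaration", kind, declarations: [] });
}
console.log(reports.map((r) => r.message));
```

Inside a real project the rule would be registered through an ESLint plugin; the fake context here only exists to call it directly.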
Happy hacking!