Syntax Analysis (Parsing)
What Is Syntax Analysis?
Once the lexer has broken source code into tokens, syntax analysis (or parsing) takes over. It arranges these tokens into a parse tree (or syntax tree) based on the grammar of the programming language.
At this stage, the parser is asking one question:
“Do these tokens form a valid statement in our language?”
Grammar and Parsers
The syntax of a programming language is defined using a context-free grammar (CFG):
- A set of rules (productions) that describe how statements are structured
- For example:
  stmt → if ( expr ) stmt
  expr → ID > NUM
Parsers are built to check input against these rules.
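To make the idea of checking input against rules concrete, here is a minimal sketch of a hand-written parser for the two productions above. It is an illustration under stated assumptions, not a definitive implementation: the token kinds (IF, LPAREN, and so on), the Parser class, and the parse helper are hypothetical names, the tokens are assumed to come from a lexer as (kind, text) pairs, and a second stmt rule (ID = NUM ;) is added only so that the recursive stmt rule has somewhere to stop.

```python
# Grammar used by this sketch:
#   stmt -> IF LPAREN expr RPAREN stmt
#   stmt -> ID ASSIGN NUM SEMI        (assumption: added so the recursion can end)
#   expr -> ID GT NUM


class ParseError(Exception):
    """Raised when the tokens do not match the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs from the lexer
        self.pos = 0

    def peek(self):
        # Kind of the next token, or "EOF" once the input is used up.
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][0]
        return "EOF"

    def expect(self, kind):
        # Consume one token of the given kind or report a syntax error.
        if self.peek() != kind:
            raise ParseError(f"expected {kind}, found {self.peek()} at token {self.pos}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def stmt(self):
        if self.peek() == "IF":
            self.expect("IF")
            self.expect("LPAREN")
            condition = self.expr()
            self.expect("RPAREN")
            body = self.stmt()
            return ("if", condition, body)
        name = self.expect("ID")
        self.expect("ASSIGN")
        value = self.expect("NUM")
        self.expect("SEMI")
        return ("assign", name, value)

    def expr(self):
        left = self.expect("ID")
        self.expect("GT")
        right = self.expect("NUM")
        return (">", left, right)


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.stmt()
    if parser.peek() != "EOF":
        raise ParseError(f"unexpected {parser.peek()} after end of statement")
    return tree


# Tokens for: if (x > 0) y = 1;
tokens = [
    ("IF", "if"), ("LPAREN", "("), ("ID", "x"), ("GT", ">"), ("NUM", "0"),
    ("RPAREN", ")"), ("ID", "y"), ("ASSIGN", "="), ("NUM", "1"), ("SEMI", ";"),
]
tree = parse(tokens)
print(tree)  # ('if', ('>', 'x', '0'), ('assign', 'y', '1'))
```

Each grammar rule becomes one method, and the nested tuples returned by those methods form a small tree whose shape mirrors the productions.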
Types of Parsers
Two major categories:
- Top-down Parsers (predictive, recursive descent)
  - Try to build the parse tree from the root down
  - Simpler to understand and implement
- Bottom-up Parsers (shift-reduce, LR parsers)
  - Build from the leaves up
  - Handle a wider range of grammars
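The bottom-up direction can be illustrated with a deliberately simplified shift-reduce loop. This is only a sketch: it reduces greedily whenever the top of the stack matches the right-hand side of a production, whereas real LR parsers consult generated parse tables to decide between shifting and reducing. The token kinds, the PRODUCTIONS table, the shift_reduce function, and the extra rule stmt → ID = NUM ; are all assumptions made for the example.

```python
# Productions as (head, body) pairs. The assignment rule is an assumption
# added so that the recursive "if" rule has a base case.
PRODUCTIONS = [
    ("expr", ["ID", "GT", "NUM"]),
    ("stmt", ["ID", "ASSIGN", "NUM", "SEMI"]),
    ("stmt", ["IF", "LPAREN", "expr", "RPAREN", "stmt"]),
]


def shift_reduce(token_kinds):
    """Return (accepted, trace) for a list of token kinds."""
    stack = []
    trace = []
    for kind in token_kinds:
        # Shift: push the next token onto the stack.
        stack.append(kind)
        trace.append(f"shift {kind}")
        # Reduce: while the top of the stack matches a production body,
        # replace that body with the production head.
        reduced = True
        while reduced:
            reduced = False
            for head, body in PRODUCTIONS:
                if stack[-len(body):] == body:
                    del stack[-len(body):]
                    stack.append(head)
                    trace.append(f"reduce {head}")
                    reduced = True
                    break
    # The input is accepted if everything collapsed into one stmt.
    return stack == ["stmt"], trace


# Token kinds for: if (x > 0) y = 1;
kinds = ["IF", "LPAREN", "ID", "GT", "NUM", "RPAREN", "ID", "ASSIGN", "NUM", "SEMI"]
accepted, trace = shift_reduce(kinds)
for step in trace:
    print(step)
print("accepted:", accepted)
```

The trace shows the leaves (ID, GT, NUM) being grouped into expr first, and the root stmt appearing only at the very end, which is the opposite order from a top-down parser.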
Syntax Errors
Parsers detect syntax errors such as a missing parenthesis or operators in the wrong order.
Example:
if (x > 0 // missing closing )
The parser reports this error before compilation moves on to the next phase.
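As a rough sketch of how such an error might be caught, the function below checks a token list against the fixed sequence if ( ID > NUM ) and stops at the first mismatch. The names check_if_header and ParseError, the token kinds, and the (kind, text, column) token layout are hypothetical choices for this example; a real parser would report errors from inside its grammar routines instead of matching one fixed sequence.

```python
class ParseError(Exception):
    """Raised when the tokens do not form a valid 'if' header."""


def check_if_header(tokens):
    """Check tokens against: IF LPAREN ID GT NUM RPAREN.

    Each token is a (kind, text, column) triple. Only the header is checked;
    anything after the closing parenthesis is left to the rest of the parser.
    """
    expected = ["IF", "LPAREN", "ID", "GT", "NUM", "RPAREN"]
    for index, kind in enumerate(expected):
        if index >= len(tokens):
            raise ParseError(f"expected {kind} but reached end of input")
        found_kind, text, column = tokens[index]
        if found_kind != kind:
            raise ParseError(f"column {column}: expected {kind}, found {text!r}")
    return True


# Tokens for the broken input: if (x > 0
tokens = [
    ("IF", "if", 1), ("LPAREN", "(", 4), ("ID", "x", 5),
    ("GT", ">", 7), ("NUM", "0", 9),
]
try:
    check_if_header(tokens)
    message = "ok"
except ParseError as err:
    message = str(err)
print(message)  # expected RPAREN but reached end of input
```

Carrying the column number in each token is what lets the error message point at the exact place where the input stopped matching the grammar.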
Conclusion
- Parsing arranges tokens into a parse tree according to a context-free grammar
- The parse tree is often simplified into an abstract syntax tree (AST) for later phases
- It checks whether the code is syntactically valid
- Parser generators such as Yacc and Bison build parsers from a grammar and are used in real compilers