Polyglot Parser
C, Tree-sitter
PoC for Project 1: Implement MultiLang Parser
This project implements a multi-language parser for three languages (Python, JavaScript, Ruby) as part of the Proof of Concept for MetaCall's Project 1. The aim is to take in source files, parse them with the Tree-sitter API, build an AST and a reduced IR, and adapt the generated IR to a unified standard format. It also explores dependency graphs (minimal in this PoC), which are needed for the Function Mesh and IntelliSense.
DEMO VIDEO: https://www.youtube.com/watch?v=RP18MPPS8g0
How to Use
[NOTE] This PoC has only been tested on macOS and may produce errors on other setups.
# build (Ninja)
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
# for help
./build/polygot_parser -h
# parse a single file
./build/polygot_parser -f examples/example.js
# parse a directory recursively
./build/polygot_parser -d examples/
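Directory mode (`-d`) has to find every supported file under a root directory. Below is a minimal sketch of such a recursive walk. It assumes POSIX `nftw()` and treats a file as supported purely by its extension (`.py`, `.js`, `.rb`). `walk_directory` and `is_supported` are hypothetical names, not functions from `src/parser.c`.

```c
#define _XOPEN_SOURCE 700  /* expose nftw() and FTW_PHYS */
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Number of supported files seen during the current walk. */
static int g_supported_count = 0;

/* Returns 1 if the file name (not the directory part) ends in .py, .js or .rb. */
static int is_supported(const char *path) {
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    const char *dot = strrchr(base, '.');
    if (dot == NULL) return 0;
    return strcmp(dot, ".py") == 0 || strcmp(dot, ".js") == 0 || strcmp(dot, ".rb") == 0;
}

/* nftw() callback, called once per entry; only regular files are considered. */
static int visit(const char *path, const struct stat *sb, int typeflag, struct FTW *ftwbuf) {
    (void)sb;
    (void)ftwbuf;
    if (typeflag == FTW_F && is_supported(path)) {
        printf("parse: %s\n", path);  /* a real tool would dispatch to its parser here */
        g_supported_count++;
    }
    return 0;  /* returning 0 tells nftw() to keep walking */
}

/* Recursively walks root; returns the number of supported files, or -1 on error. */
int walk_directory(const char *root) {
    g_supported_count = 0;
    /* 16 = max open directory descriptors; FTW_PHYS = do not follow symlinks */
    if (nftw(root, visit, 16, FTW_PHYS) != 0) return -1;
    return g_supported_count;
}
```

The file-scope counter is needed because `nftw()` callbacks receive no user-data pointer. It also makes this sketch non-reentrant.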
CLI Reference
polygot_parser -f <file>                  parse one file
polygot_parser -f <file1> <file2> ...     parse more than one file
polygot_parser -d <directory>             parse all supported files in a directory (recursively)
Other options:
  -o <output.json>    write JSON to a file (default: stdout)
  -h, --help          show this help
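The `-f` flag takes a variable number of paths, which the standard `getopt()` does not handle directly. Below is a minimal hand-rolled parser for the options above, offered as a sketch. `struct cli_options`, `parse_cli`, and the 64-file cap are all hypothetical and are not taken from `src/main.c`.

```c
#include <stdio.h>
#include <string.h>

#define MAX_INPUT_FILES 64  /* arbitrary cap for this sketch */

/* Hypothetical container for the documented CLI options. */
struct cli_options {
    const char *files[MAX_INPUT_FILES];  /* paths given after -f */
    int n_files;
    const char *dir;                     /* -d <directory>, or NULL */
    const char *output;                  /* -o <output.json>; NULL means stdout */
    int help;                            /* set by -h / --help */
};

/* Parses argv into opts. Returns 0 on success, -1 on a usage error. */
int parse_cli(int argc, char **argv, struct cli_options *opts) {
    memset(opts, 0, sizeof *opts);
    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (strcmp(arg, "-h") == 0 || strcmp(arg, "--help") == 0) {
            opts->help = 1;
        } else if (strcmp(arg, "-f") == 0) {
            int before = opts->n_files;
            /* consume paths until the next argument that looks like a flag */
            while (i + 1 < argc && argv[i + 1][0] != '-') {
                if (opts->n_files == MAX_INPUT_FILES) return -1;
                opts->files[opts->n_files++] = argv[++i];
            }
            if (opts->n_files == before) return -1;  /* -f with no paths */
        } else if (strcmp(arg, "-d") == 0 || strcmp(arg, "-o") == 0) {
            if (i + 1 >= argc) return -1;  /* flag missing its value */
            if (arg[1] == 'd') opts->dir = argv[++i];
            else opts->output = argv[++i];
        } else {
            fprintf(stderr, "unknown option: %s\n", arg);
            return -1;
        }
    }
    /* there must be something to do: show help, parse files, or parse a directory */
    return (opts->help || opts->n_files > 0 || opts->dir != NULL) ? 0 : -1;
}
```

Stopping at the next argument that starts with `-` keeps `-f a.js b.py -o out.json` unambiguous. The trade-off is that a file whose name starts with `-` cannot be passed after `-f`.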
The output is in the unified JSON format, which can be consumed by the VS Code extension (IntelliSense) and by the Function Mesh.
File Tree Structure
├─ CMakeLists.txt
├─ README.md
├─ src/
│ ├─ main.c # main cli logic is defined here
│ ├─ parser.c # the parser logic is handled here
│ └─ parser.h
├─ adapters/ # extracts language-specific queries
│ ├─ adapters.h
│ ├─ adapters.c
│ ├─ python_adapter.c
│ ├─ js_adapter.c
│ └─ ruby_adapter.c
├─ ir/
│ ├─ ir.h
│ └─ ir.c # creates the Intermediate Representation (IR)
├─ graph/
│ ├─ graph.h
│ └─ graph.c # creates a minimal dependency graph
├─ exporter/
│ ├─ mc_export.h
│ └─ mc_export.c # exports to a JSON output that can be consumed later
├─ tests/
└─ examples/ # some example files
- CMakeLists.txt - Build configuration and dependencies
- src/main.c - Main CLI entry point
- src/parser.c - parsing flow and file traversal
- adapters - Tree-sitter extraction for each language (py, js, rb)
- ir - normalized IR for symbols and exports
- graph - dependency graph builder with typed nodes and edges
- exporter - JSON export of IR and graph
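The graph module revolves around small typed records. This sketch shows one way to represent the edges. The field and enum names mirror the JSON schema (`from_kind`, `to_kind`, `type`, `lang`), but the C identifiers themselves are hypothetical and are not the contents of `graph/graph.h`.

```c
#include <stdlib.h>
#include <string.h>

/* Node and edge kinds mirroring the schema's from_kind/to_kind and type fields. */
enum node_kind { NODE_FILE, NODE_SYMBOL, NODE_MODULE };
enum edge_type { EDGE_IMPORT, EDGE_REQUIRE, EDGE_DEFINE, EDGE_EXPORT, EDGE_MEMBER_OF };

/* One typed edge in the dependency graph. */
struct graph_edge {
    const char *from;
    enum node_kind from_kind;
    const char *to;
    enum node_kind to_kind;
    enum edge_type type;
    const char *lang;  /* "py", "js" or "rb" */
};

/* Growable edge list; zero-initialize before first use. */
struct graph {
    struct graph_edge *edges;
    size_t len, cap;
};

/* Appends an edge, doubling capacity as needed. Returns 0 on success, -1 on OOM. */
int graph_add_edge(struct graph *g, struct graph_edge e) {
    if (g->len == g->cap) {
        size_t new_cap = g->cap ? g->cap * 2 : 8;
        struct graph_edge *p = realloc(g->edges, new_cap * sizeof *p);
        if (p == NULL) return -1;
        g->edges = p;
        g->cap = new_cap;
    }
    g->edges[g->len++] = e;
    return 0;
}

/* Counts edges of one type leaving a node, e.g. all require edges of a file. */
size_t graph_out_degree(const struct graph *g, const char *from, enum edge_type type) {
    size_t n = 0;
    for (size_t i = 0; i < g->len; i++)
        if (g->edges[i].type == type && strcmp(g->edges[i].from, from) == 0) n++;
    return n;
}
```

A flat edge list is enough for a minimal graph and maps one-to-one onto the `edges` array in the JSON. Faster neighbour lookups would call for an adjacency index on top of it.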
Architecture
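At the center of the pipeline, each syntax tree is reduced to a much smaller IR that keeps only the symbols of interest. The sketch below illustrates that reduction on a hand-built tree. The `ast_node` type is a hypothetical, simplified stand-in for Tree-sitter's `TSNode`, since a real adapter would go through Tree-sitter's C API and queries instead. The node type names are borrowed loosely from the tree-sitter-python grammar for readability.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified syntax-tree node (a stand-in for Tree-sitter's TSNode). */
struct ast_node {
    const char *type;            /* grammar node type, e.g. "function_definition" */
    const char *text;            /* source text for leaves such as identifiers */
    struct ast_node **children;
    size_t n_children;
};

/* Reduced IR entry: only what downstream consumers need. */
struct ir_function {
    const char *name;
    size_t n_params;
};

/* Returns the first direct child of n with the given type, or NULL. */
static const struct ast_node *child_of_type(const struct ast_node *n, const char *type) {
    for (size_t i = 0; i < n->n_children; i++)
        if (strcmp(n->children[i]->type, type) == 0) return n->children[i];
    return NULL;
}

/* Depth-first walk that records function definitions and ignores every other node. */
static void collect(const struct ast_node *n, struct ir_function *out, size_t cap, size_t *len) {
    if (strcmp(n->type, "function_definition") == 0 && *len < cap) {
        const struct ast_node *name = child_of_type(n, "identifier");
        const struct ast_node *params = child_of_type(n, "parameters");
        out[*len].name = name ? name->text : "<anonymous>";
        /* in this simplified tree, every child of "parameters" is one parameter */
        out[*len].n_params = params ? params->n_children : 0;
        (*len)++;
    }
    for (size_t i = 0; i < n->n_children; i++)
        collect(n->children[i], out, cap, len);
}

/* Reduces a whole tree to at most cap IR entries; returns how many were written. */
size_t reduce_functions(const struct ast_node *root, struct ir_function *out, size_t cap) {
    size_t len = 0;
    collect(root, out, cap, &len);
    return len;
}
```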
JSON Schema
{
"languages": {
"<lang>": {
"functions": [
{ "name": "sum", "args": ["a","b"], "exported": true }
],
"classes": [
{ "name": "Calculator", "args": [], "exported": true }
],
"objects": [
{ "name": "CONFIG", "exported": true }
]
}
},
"graph": {
"edges": [
{
"from": "examples/example.js",
"from_kind": "file", // file | symbol | module
"to": "examples/example.py",
"to_kind": "module", // file | symbol | module
"type": "require", // import | require | define | export | member_of
"lang": "js"
}
]
}
}
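This sketch shows how one entry of the `functions` array could be serialized. It assumes a plain `FILE *` writer and no JSON library. `struct export_symbol` and `write_function_json` are hypothetical and are not the API of `exporter/mc_export.c`. The `//` comments in the schema above are annotations only, since JSON itself has no comment syntax.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol record carrying the fields used in the schema. */
struct export_symbol {
    const char *name;
    const char **args;
    size_t n_args;
    int exported;
};

/* Writes s as a JSON string literal, escaping quotes, backslashes and control chars. */
static void write_json_string(FILE *out, const char *s) {
    fputc('"', out);
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == '"' || c == '\\') fprintf(out, "\\%c", c);
        else if (c < 0x20) fprintf(out, "\\u%04x", (unsigned)c);
        else fputc(c, out);
    }
    fputc('"', out);
}

/* Emits one function entry: { "name": ..., "args": [...], "exported": ... } */
void write_function_json(FILE *out, const struct export_symbol *fn) {
    fputs("{ \"name\": ", out);
    write_json_string(out, fn->name);
    fputs(", \"args\": [", out);
    for (size_t i = 0; i < fn->n_args; i++) {
        if (i > 0) fputc(',', out);
        write_json_string(out, fn->args[i]);
    }
    fprintf(out, "], \"exported\": %s }", fn->exported ? "true" : "false");
}
```

Escaping matters because names and argument strings come straight from source files. For `sum(a, b)` the writer emits `{ "name": "sum", "args": ["a","b"], "exported": true }`, which matches the schema entry.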
Example Output
HELP Command
Output for example.js
Integration with the MetaCall VS Code Extension
As you can see, example.py defines several functions that take arguments.
I called these functions from the JS file with arguments that don't match their definitions, and the error squiggles show up! They come from the VS Code extension :)
The extension gets this information from the generated parser output.
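A consumer of the JSON can catch such mismatched calls by comparing each call against the exported signature. The sketch below shows that kind of check. It assumes that only argument counts are compared, with no types, defaults, or variadics. `struct fn_signature` and `check_call` are hypothetical, and this is not the extension's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical view of one "functions" entry from the exported JSON. */
struct fn_signature {
    const char *name;
    size_t n_args;
};

/* Looks up name among the exported signatures; returns NULL if not found. */
static const struct fn_signature *find_signature(const struct fn_signature *sigs, size_t n,
                                                 const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(sigs[i].name, name) == 0) return &sigs[i];
    return NULL;
}

/* Writes a diagnostic into msg when a call's argument count does not match.
   Returns 1 if a diagnostic was produced, 0 otherwise (including unknown names). */
int check_call(const struct fn_signature *sigs, size_t n, const char *callee,
               size_t call_args, char *msg, size_t msg_len) {
    const struct fn_signature *sig = find_signature(sigs, n, callee);
    if (sig == NULL || sig->n_args == call_args) return 0;
    snprintf(msg, msg_len, "%s expects %zu argument(s), got %zu", callee, sig->n_args, call_args);
    return 1;
}
```

Because only counts are compared, a Python function with default arguments would be flagged incorrectly. Handling that case would require the IR to record which parameters have defaults.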