Polyglot Parser

C, Tree-sitter

PoC for Project 1: Implement MultiLang Parser

This project implements a multi-language parser for three languages (Python, JavaScript, Ruby) as the Proof of Concept for MetaCall's Project 1. It takes in source files, parses them with the Tree-sitter API, builds an AST and a reduced IR, and adapts the generated IR to a unified standard format. It also explores dependency graphs (minimal for this PoC), which are necessary for Function Mesh and Intellisense.

DEMO VIDEO: https://www.youtube.com/watch?v=RP18MPPS8g0

How to Use

[NOTE] This PoC is currently tested on macOS and might produce errors on a different setup.

# build (Ninja)
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# for help
./build/polygot_parser -h

# parse a single file
./build/polygot_parser -f examples/example.js

# parse a directory recursively
./build/polygot_parser -d examples/

CLI Reference

  polygot_parser -f <file>           parse one file
  polygot_parser -f <file1> <file2>  parse more than one file
  polygot_parser -d <directory>      parse all supported files in a directory

Other options:
  -o <output.json>        write JSON to file (default: stdout)
  -h, --help              show this help
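
The reference above can be read as a small argument grammar. The sketch below is one hypothetical way to parse it in C. `cli_mode`, `cli_options`, and `parse_cli` are illustrative names, not the PoC's actual code, and treating `-f` and `-d` as mutually exclusive is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types: names are illustrative, not taken from the PoC. */
typedef enum { MODE_NONE, MODE_FILES, MODE_DIR, MODE_HELP } cli_mode;

typedef struct {
    cli_mode mode;
    char **inputs;       /* points into argv: files for -f, one directory for -d */
    int input_count;
    const char *output;  /* NULL means "write JSON to stdout" */
} cli_options;

/* Parse argv according to the CLI reference.
 * Returns 0 on success, -1 on a usage error. */
int parse_cli(int argc, char **argv, cli_options *opts)
{
    assert(argv != NULL && opts != NULL);
    opts->mode = MODE_NONE;
    opts->inputs = NULL;
    opts->input_count = 0;
    opts->output = NULL;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-h") == 0 || strcmp(argv[i], "--help") == 0) {
            opts->mode = MODE_HELP;
            return 0;
        } else if (strcmp(argv[i], "-f") == 0) {
            if (opts->mode != MODE_NONE) return -1;  /* assumption: -f and -d exclude each other */
            opts->mode = MODE_FILES;
            opts->inputs = &argv[i + 1];
            /* -f takes every following argument up to the next option */
            while (i + 1 < argc && argv[i + 1][0] != '-') {
                opts->input_count++;
                i++;
            }
            if (opts->input_count == 0) return -1;
        } else if (strcmp(argv[i], "-d") == 0) {
            if (opts->mode != MODE_NONE || i + 1 >= argc) return -1;
            opts->mode = MODE_DIR;
            opts->inputs = &argv[++i];
            opts->input_count = 1;
        } else if (strcmp(argv[i], "-o") == 0) {
            if (i + 1 >= argc) return -1;
            opts->output = argv[++i];
        } else {
            return -1;  /* unknown option */
        }
    }
    return opts->mode == MODE_NONE ? -1 : 0;
}
```

A `main` built on this would call `parse_cli(argc, argv, &opts)`, print the help text on `MODE_HELP` or on a `-1` return, and otherwise hand `opts.inputs` to the parsing flow.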

The output is in the unified JSON format, which can be consumed by the VS Code extension (Intellisense) and the Function Mesh.
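
To make a single entry of that output concrete, here is a minimal sketch of serializing one symbol. The field set (`name`, `args`, `exported`) follows the JSON Schema section of this README. The `ir_symbol` struct and `symbol_to_json` function are hypothetical and are not the PoC's `ir.h` or `mc_export.h` API. No string escaping is done, so names are assumed to contain no quotes or backslashes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical IR record; the PoC's real layout may differ. */
typedef enum { SYM_FUNCTION, SYM_CLASS, SYM_OBJECT } symbol_kind;

typedef struct {
    symbol_kind kind;
    const char *name;
    const char *const *args;  /* parameter names, may be NULL when arg_count is 0 */
    size_t arg_count;
    bool exported;
} ir_symbol;

/* Append s at offset *len; returns 0 on success, -1 if it would not fit. */
static int append_str(char *buf, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n + 1 > cap) return -1;
    memcpy(buf + *len, s, n + 1);  /* copies the terminating NUL as well */
    *len += n;
    return 0;
}

/* Write sym as compact JSON into buf.
 * Returns 0 on success, -1 if buf is too small (buf contents are then unspecified). */
int symbol_to_json(const ir_symbol *sym, char *buf, size_t cap)
{
    assert(sym != NULL && buf != NULL);
    size_t len = 0;
    int rc = 0;

    rc |= append_str(buf, cap, &len, "{\"name\":\"");
    rc |= append_str(buf, cap, &len, sym->name);
    rc |= append_str(buf, cap, &len, "\"");
    if (sym->kind != SYM_OBJECT) {  /* objects carry no "args" in the schema */
        rc |= append_str(buf, cap, &len, ",\"args\":[");
        for (size_t i = 0; i < sym->arg_count; i++) {
            rc |= append_str(buf, cap, &len, i ? ",\"" : "\"");
            rc |= append_str(buf, cap, &len, sym->args[i]);
            rc |= append_str(buf, cap, &len, "\"");
        }
        rc |= append_str(buf, cap, &len, "]");
    }
    rc |= append_str(buf, cap, &len,
                     sym->exported ? ",\"exported\":true}" : ",\"exported\":false}");
    return rc ? -1 : 0;
}
```

A fixed caller-supplied buffer keeps the sketch free of allocation; a real exporter would more likely stream to a `FILE *` and escape strings properly.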

File Tree Structure

├─ CMakeLists.txt
├─ README.md
├─ src/
│  ├─ main.c             # main CLI logic is defined here
│  ├─ parser.c           # the parser logic is handled here
│  └─ parser.h
├─ adapters/             # extracts language-specific queries
│  ├─ adapters.h
│  ├─ adapters.c
│  ├─ python_adapter.c
│  ├─ js_adapter.c
│  └─ ruby_adapter.c
├─ ir/
│  ├─ ir.h
│  └─ ir.c               # creates the Intermediate Representation (IR)
├─ graph/
│  ├─ graph.h
│  └─ graph.c            # creates a minimal dependency graph
├─ exporter/
│  ├─ mc_export.h
│  └─ mc_export.c        # exports to a JSON output which can be consumed later
├─ tests/
└─ examples/             # some example files

  • CMakeLists.txt - build configuration and dependencies
  • src/main.c - main CLI entry point
  • src/parser.c - parsing flow and file traversal
  • adapters - Tree-sitter extraction for each language (py, js, rb)
  • ir - normalized IR for symbols and exports
  • graph - dependency graph builder with typed edges and nodes
  • exporter - JSON export of IR and graph
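
During file traversal, each file has to be routed to one of the three adapters. A minimal sketch of that dispatch step, assuming the language is chosen purely by file extension (`.py`, `.js`, `.rb`), is shown below. `lang_id`, `detect_language`, and `lang_name` are hypothetical names, and the short tags returned by `lang_name` are an assumption based on the `"lang": "js"` value in the JSON output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical language tags; the PoC's real identifiers may differ. */
typedef enum { LANG_UNKNOWN = 0, LANG_PYTHON, LANG_JAVASCRIPT, LANG_RUBY } lang_id;

/* Map a path to a language using only its file extension. */
lang_id detect_language(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;  /* ignore dots in directory names */
    const char *dot = strrchr(base, '.');
    if (dot == NULL || dot == base) return LANG_UNKNOWN;  /* no extension, or a dotfile */
    if (strcmp(dot, ".py") == 0) return LANG_PYTHON;
    if (strcmp(dot, ".js") == 0) return LANG_JAVASCRIPT;
    if (strcmp(dot, ".rb") == 0) return LANG_RUBY;
    return LANG_UNKNOWN;
}

/* Short tag for a language (assumed to match the "lang" field of the output). */
const char *lang_name(lang_id lang)
{
    switch (lang) {
    case LANG_PYTHON:     return "py";
    case LANG_JAVASCRIPT: return "js";
    case LANG_RUBY:       return "rb";
    case LANG_UNKNOWN:    break;
    }
    return "unknown";
}
```

With this shape, a directory walk can skip any file for which `detect_language` returns `LANG_UNKNOWN` and pass the rest to the matching adapter.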

Architecture

image

JSON Schema

{
  "languages": {
    "<lang>": {
      "functions": [
        { "name": "sum", "args": ["a","b"], "exported": true }
      ],
      "classes": [
        { "name": "Calculator", "args": [], "exported": true }
      ],
      "objects": [
        { "name": "CONFIG", "exported": true }
      ]
    }
  },
  "graph": {
    "edges": [
      {
        "from": "examples/example.js",
        "from_kind": "file",        // file | symbol | module
        "to": "examples/example.py",
        "to_kind": "module",       // file | symbol | module
        "type": "require",         // import | require | define | export | member_of
        "lang": "js"
      }
    ]
  }
}
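
The `graph.edges` entries above map naturally onto a small C struct with two enums. The sketch below is an illustration only: `graph_edge`, `node_kind`, `edge_type`, and `edge_to_json` are hypothetical names, not the PoC's `graph.h` API. The enum values mirror the kinds and types listed in the schema comments, and no string escaping is performed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Node kinds and edge types as listed in the schema comments. */
typedef enum { NODE_FILE, NODE_SYMBOL, NODE_MODULE } node_kind;
typedef enum { EDGE_IMPORT, EDGE_REQUIRE, EDGE_DEFINE, EDGE_EXPORT, EDGE_MEMBER_OF } edge_type;

/* Hypothetical edge record; the PoC's real layout may differ. */
typedef struct {
    const char *from;
    node_kind from_kind;
    const char *to;
    node_kind to_kind;
    edge_type type;
    const char *lang;
} graph_edge;

const char *node_kind_name(node_kind k)
{
    switch (k) {
    case NODE_FILE:   return "file";
    case NODE_SYMBOL: return "symbol";
    case NODE_MODULE: return "module";
    }
    return "unknown";
}

const char *edge_type_name(edge_type t)
{
    switch (t) {
    case EDGE_IMPORT:    return "import";
    case EDGE_REQUIRE:   return "require";
    case EDGE_DEFINE:    return "define";
    case EDGE_EXPORT:    return "export";
    case EDGE_MEMBER_OF: return "member_of";
    }
    return "unknown";
}

/* Write the edge as compact JSON into buf.
 * Returns 0 on success, -1 if the output was truncated. */
int edge_to_json(const graph_edge *e, char *buf, size_t cap)
{
    assert(e != NULL && buf != NULL);
    int n = snprintf(buf, cap,
                     "{\"from\":\"%s\",\"from_kind\":\"%s\","
                     "\"to\":\"%s\",\"to_kind\":\"%s\","
                     "\"type\":\"%s\",\"lang\":\"%s\"}",
                     e->from, node_kind_name(e->from_kind),
                     e->to, node_kind_name(e->to_kind),
                     edge_type_name(e->type), e->lang);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Keeping kinds and types as enums internally and converting to strings only at export time means a misspelled edge type cannot reach the JSON output.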

Example Output

HELP Command

image

Output for example.js

image

Integration with MetaCall VS Code Extension?

As you can see, example.py defines several functions that take arguments:

image

I called these functions in the JS file with different arguments, and as you can see, error squiggles appear. They come from the VS Code extension :)

image

This is due to the generated parser output:

image