The Seen compiler is a 100% self-hosted compiler written in Seen. It follows a 5-stage pipeline and compiles through LLVM for native code generation.
Source (.seen)
→ Lexer (tokenize with language-specific keywords)
→ Parser (recursive descent → AST)
→ Type Checker (inference, validation, smart casts)
→ IR Generator (AST → LLVM IR, three-pass)
→ LLVM Backend (opt → ThinLTO → lld link)
→ Native Binary
Location: compiler_seen/src/lexer/
The SeenLexer class tokenizes source code into a stream of tokens.
KeywordManager loads TOML files from languages/ for multi-language support{expr} inside strings)byteAt() for performanceEntry point: tokenize(source: String, language: String) -> Array<Token>
Location: compiler_seen/src/parser/
Converts the token stream into an AST (Abstract Syntax Tree).
ProgramNode, FunctionNode, ClassNode, StatementNode, ParserExpressionNodeEntry point: parse(tokens: Array<Token>) -> ProgramNode
| Node | Description |
|---|---|
ProgramNode |
Root of the AST, contains all top-level declarations |
FunctionNode |
Function declaration with parameters, body, return type |
ClassNode |
Class with fields, methods, inheritance |
StatementNode |
Variable declarations, assignments, control flow |
ParserExpressionNode |
Literals, binary ops, method calls, etc. |
ItemNode |
Function parameters and items |
ParamNode |
Parameter declarations |
ImportSymbolNode |
Import declarations |
Location: compiler_seen/src/typechecker/
Performs type inference and validation on the AST.
TypeChecker class manages type inferenceEnvironment manages scoped symbol tables (push/pop scope)Entry point: check(program: ProgramNode) -> TypeInferenceResult
Location: compiler_seen/src/bootstrap/
Orchestrates the first three stages.
run_frontend() chains lexer → parser → type checkerFrontendResult with diagnosticsFrontendDiagnostic with source locationsLocation: compiler_seen/src/ir/ and compiler_seen/src/codegen/
Converts AST to LLVM IR using a three-pass approach:
The IR generator is split across 13 modules:
| Module | Purpose |
|---|---|
llvm_ir_gen.seen |
Main IR generation (14,300 lines) |
ir_declarations.seen |
Runtime function declarations |
ir_decl_features.seen |
Feature-specific declarations |
ir_decl_parser.seen |
Parser-related declarations |
ir_decl_runtime.seen |
Core runtime declarations |
ir_method_gen.seen |
Method generation |
ir_optimization.seen |
CSE cache, call count, utilities |
ir_string_collect.seen |
String constant collection |
ir_type_dispatch.seen |
Type dispatch tables |
ir_type_info.seen |
Type information |
ir_type_mapping.seen |
Seen types → LLVM types |
ir_type_ops.seen |
Type operations |
ir_type_tables.seen |
Type lookup tables |
Location: compiler_seen/src/codegen/llvm_backend.seen
Full LLVM compilation pipeline:
.ll (LLVM IR)
→ opt -O3 (optimization with vectorization, loop unrolling, inlining)
→ opt --thinlto-bc (ThinLTO bitcode)
→ clang -O3 -flto=thin -fuse-ld=lld (link)
→ Native Binary
Location: compiler_seen/src/codegen/c_gen.seen
Emits C99 code as a portable fallback. Compiled with GCC.
Each module's IR is generated in a forked child process:
.ll fileThis provides ~1.74x speedup on cold builds.
Content-addressed IR cache with per-module granularity:
Cache key format (v3):
hash("v3:" + declarationsHash + ":" + modulePath + ":" + moduleSourceHash)
declarationsHash -- hash of cross-module interfaces (struct layouts, function signatures)moduleSourceHash -- hash of the module's source codeEditing a function body in one module only recompiles that module. Cross-module interface changes (new functions, changed signatures) invalidate all caches.
Cache locations:
.seen_cache/ -- source-level cache/tmp/seen_ir_cache -- IR content-addressed cache/tmp/seen_thinlto_cache -- ThinLTO linker cache| File | Size | Purpose |
|---|---|---|
compiler_seen/src/main.seen |
-- | Production compiler entry point |
compiler_seen/src/main_compiler.seen |
-- | Stage1 compiler driver |
compiler_seen/src/bootstrap/frontend.seen |
-- | Frontend orchestration |
compiler_seen/src/codegen/llvm_backend.seen |
16KB | LLVM code generation |
compiler_seen/src/codegen/generator.seen |
41KB | Backend framework |
compiler_seen/src/ir/ir_generator.seen |
-- | Main LLVM IR generation |
compiler_seen/src/codegen/llvm_ir_gen.seen |
14K+ lines | Core IR generator |
compiler_seen/src/types/struct_layouts.seen |
-- | Canonical struct field layouts |
| Seen Type | LLVM IR Type |
|---|---|
Int |
i64 |
Float |
double |
Bool |
i1 |
String |
%SeenString (struct: {i64, ptr} = len + data) |
Char |
i64 (Unicode code point) |
Array<T> |
%SeenArray* (struct: {i64, i64, i64, ptr}) |
Class |
ptr (heap-allocated, handle-based) |
| Simple enum | i64 |
| Data enum | ptr (heap-allocated) |
compiler_seen/src/./scripts/safe_rebuild.sh to verify bootstrapSee Bootstrap System for details.
