The Seen compiler is self-hosted and compiles through LLVM for native code generation. The shipped release binary uses `compiler_seen/src/main_compiler.seen` as its command entrypoint.
```text
Source (.seen)
-> Lexer
-> Parser
-> Type checker
-> Multi-module LLVM IR generation
-> opt/llc or target compiler tools
-> Native binary or target artifact
```
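As a rough illustration of the last two stages, the Python sketch below builds `opt` and `llc` command lines for one module's textual IR, plus a final link line. The helper names, the `-O2` level, and the use of `cc` as the linker driver are assumptions for illustration; the shipped compiler's actual tool invocations may differ.
```python
from pathlib import Path


def backend_commands(ir_file: str, opt_level: str = "-O2") -> list[list[str]]:
    """Return opt and llc command lines for one module's textual IR (hypothetical)."""
    ir = Path(ir_file)
    bitcode = ir.with_suffix(".bc")
    obj = ir.with_suffix(".o")
    return [
        # Optimize the module and write the result as LLVM bitcode.
        ["opt", opt_level, str(ir), "-o", str(bitcode)],
        # Lower the optimized bitcode to a native object file.
        ["llc", "-filetype=obj", str(bitcode), "-o", str(obj)],
    ]


def link_command(objects: list[str], output: str) -> list[str]:
    """Return a link line that hands every object to the C compiler driver."""
    return ["cc", *objects, "-o", output]
```
To execute the plan, pass each list to `subprocess.run(cmd, check=True)` in order, then run the link line once all objects exist.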
Location: `compiler_seen/src/lexer/`
The lexer loads multilingual keyword tables from `languages/<lang>/` and recognizes `/// ... ///` block comments.
Location: `compiler_seen/src/parser/`
The main parser is `real_parser.seen`. Supported syntax includes `effect(Token)`, `@using`, `@operator`, nullable/nullish forms, `when`, closures, sealed classes, traits/interfaces, module namespace aliases, facade component functions, named arguments, trailing/named slot blocks, UI state / computed / `uiEffect` constructs, and hot-reload-facing shared-module patterns.
Location: `compiler_seen/src/typechecker/`
Location: `compiler_seen/src/bootstrap/`
The bootstrap frontend wraps lexing, parsing, and type checking into the compatibility entrypoints used by Stage 1, Stage 2, the LSP, and package declaration scanning.
New bootstrap helper modules must be reachable both from `main_compiler.seen` imports and from the embedded compiler-module list. Otherwise, older bootstrap compilers can treat calls to the new helpers as external declarations with the wrong ABI during self-hosted rebuilds.
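A minimal sketch of a consistency check for this rule, assuming the helper module names, the names imported from `main_compiler.seen`, and the embedded compiler-module list have already been collected into Python sets. The collection step and the example module names are hypothetical.
```python
def unreachable_helpers(
    helpers: set[str], main_imports: set[str], embedded: set[str]
) -> dict[str, list[str]]:
    """Map each helper module to the registration points it is missing from."""
    problems: dict[str, list[str]] = {}
    for module in sorted(helpers):
        missing = []
        if module not in main_imports:
            missing.append("main_compiler.seen imports")
        if module not in embedded:
            missing.append("embedded compiler-module list")
        # Only report helpers that fail at least one of the two requirements.
        if missing:
            problems[module] = missing
    return problems
```
An empty result means every helper is registered in both places; anything else names the exact list that still needs an entry.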
Location: `compiler_seen/src/codegen/`
The LLVM generator is now split into focused driver and helper modules rather than a single monolithic implementation. `llvm_ir_gen.seen` is the public facade; state-based helpers handle declarations, modules, functions, calls, binary expressions, method calls, statements, literals, member/index access, control flow, runtime declarations, and target-specific state.
Generation is organized around:
Package artifacts participate in code generation through interface indexes and object manifests: dependency declarations are scanned, provided modules are skipped for codegen, and prebuilt objects are linked into the final binary.
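The following sketch shows the split between modules that need code generation and modules supplied by prebuilt objects. It assumes a manifest shaped as a plain mapping from module name to object path; the real object-manifest format is not described here, so the names are hypothetical.
```python
from dataclasses import dataclass, field


@dataclass
class BuildPlan:
    codegen_modules: list[str] = field(default_factory=list)  # lowered to IR locally
    link_objects: list[str] = field(default_factory=list)     # prebuilt, linked as-is


def plan_build(modules: list[str], manifest: dict[str, str]) -> BuildPlan:
    """Skip codegen for modules a package artifact provides; link their objects."""
    plan = BuildPlan()
    for module in modules:
        if module in manifest:
            # Provided by a package artifact: reuse its object instead of generating IR.
            plan.link_objects.append(manifest[module])
        else:
            plan.codegen_modules.append(module)
    return plan
```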
The refactor intentionally leaves llvm_ir_gen.seen boring. It owns the compatibility API, bridges legacy facade fields into shared state, and delegates real lowering work to smaller modules. A quick rule of thumb:
| Module family | What belongs there |
|---|---|
| `ir_decl_*` | declaration scanning, runtime declarations, type registration |
| `ir_module_*` | module entry/tail emission, string constants, object-unit flow |
| `ir_function_*` | function identity, attributes, entry/exit state, body setup |
| `ir_call_*` and `ir_method_*` | call planning, receiver handling, argument lowering |
| `ir_stmt_*` and `ir_*_driver` | statement/expression orchestration |
| `ir_*_emit` and `ir_*_plan` | leaf emission and small planning decisions |
Comments in these files should explain the boundary or invariant, not restate the line of code below them. Good comments answer questions such as "why is this state copied here?", "why does this pass run before that one?", or "what must be true when this helper returns?".
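To make the facade boundary concrete, here is a small Python sketch, not Seen code, of a facade that keeps a legacy field, copies it into shared state, and delegates emission to a helper. The class, field, and function names are hypothetical; the comments try to follow the guidance above by stating invariants rather than restating code.
```python
from dataclasses import dataclass, field


@dataclass
class GenState:
    """Shared state that the focused helper modules read and write."""
    target_triple: str = ""
    lines: list[str] = field(default_factory=list)


def emit_function(state: GenState, name: str) -> None:
    # Leaf helper: touches only shared state, never the facade.
    state.lines.append(f"define void @{name}() {{\n  ret void\n}}")


class LlvmIrGen:
    """Facade that keeps the old public API but does no lowering itself."""

    def __init__(self) -> None:
        # Legacy callers still assign this field directly on the facade.
        self.target_triple = "x86_64-unknown-linux-gnu"

    def generate(self, functions: list[str]) -> str:
        # Copy legacy fields into shared state before any helper runs, so
        # helpers never need to reach back into the facade.
        state = GenState(target_triple=self.target_triple)
        state.lines.append(f'target triple = "{state.target_triple}"')
        for name in functions:
            emit_function(state, name)
        # On return the module text is complete and the facade holds no
        # lowering state of its own.
        return "\n".join(state.lines)
```
The design choice being illustrated is one-way data flow: the facade bridges old fields once, and helpers depend only on the shared state they are given.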
The shipped compiler supports the LLVM backend. It can emit native binaries and target artifacts for the platforms listed in CLI Reference. Important target controls include `--target`, `--target-cpu`, `--simd`, `--sanitize`, `--pgo-generate`, `--pgo-use`, `--pic`, and `--object-manifest`.
The compiler uses source-level and IR-level caches:
- `.seen_cache/`
- `/tmp/seen_ir_cache`
- `/tmp/seen_thinlto_cache`
- `target/seen-build/runtime-objects/`
- `target/seen-build/release-lto/`

Cache-v4 keys use stable module identities rather than temporary bootstrap overlay paths. Source/object reuse is scoped by the compiler binary hash, compiler ABI signature, project declaration hash, module body hash, LLVM tool versions, target/profile settings, LTO/PIC/sanitizer/PGO flags, and runtime payload signatures. A body-only edit should invalidate only the changed module's object key and leave otherwise valid neighboring cache entries in place, while compiler codegen or layout changes reject stale objects automatically.
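The key scoping described above can be sketched as hashing a stable module identity and body hash together with the build-wide inputs. This is an illustration under assumptions: the field names, JSON serialization, and SHA-256 are choices made for the sketch, not the compiler's documented key format.
```python
import hashlib
import json


def object_cache_key(module_id: str, module_body_hash: str, shared: dict[str, str]) -> str:
    """Derive a per-module object cache key (hypothetical layout).

    `shared` carries inputs common to every module: compiler binary hash,
    ABI signature, declaration hash, tool versions, target and flag settings.
    """
    payload = {
        "version": "cache-v4",
        "module": module_id,  # stable identity, never a temporary overlay path
        "body": module_body_hash,
        **shared,
    }
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```
In this model a body-only edit changes one module's `body` field, so only that key misses (the declaration hash stays the same), whereas a new compiler hash changes every key at once.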
Normal multi-module compiler builds use bounded worker pools for IR generation and optimizer work. Guarded scripts derive `SEEN_JOBS` and `SEEN_OPT_JOBS` from memory caps and CPU count; the compiler also accepts `--jobs <n>` and `--opt-jobs <n>`. Low-memory and bootstrap verification paths can still force serial execution with `--no-fork`. Guarded scripts also export `SEEN_MEMORY_LIMIT_BYTES` so that compiler phases with heavy runtime allocation fail with Seen diagnostics instead of depending on host OOM behavior.
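One plausible way to derive a job count from a memory cap and CPU count is shown below. The per-worker memory budget is an assumed input, and the function name is hypothetical; the guarded scripts' actual formula is not documented in this section.
```python
import os
from typing import Optional


def derive_jobs(memory_limit_bytes: int, per_job_bytes: int, cpu_count: Optional[int] = None) -> int:
    """Pick a worker count bounded by both CPU count and a memory budget."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    by_memory = memory_limit_bytes // per_job_bytes
    # Always allow at least one worker so a tight cap degrades to serial work.
    return max(1, min(cpus, by_memory))
```
A wrapper could export the result, for example `os.environ["SEEN_JOBS"] = str(derive_jobs(...))`, or pass it through `--jobs <n>`.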
Release builds keep the full merged-IR LTO path by default for performance. Memory-constrained callers can pass `--no-merged-release-lto` to stay on the bounded per-module ThinLTO path. Warm release builds can reuse a signature-keyed merged-LTO object while preserving the default merged-LTO mode.
`seen compile --emit-module-ir-dir <dir> --stop-after-ir` writes raw per-module LLVM IR into a caller-owned directory and exits before object emission and linking. Packaging and cross-build scripts use this mode instead of scraping global `/tmp` module artifacts.
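A sketch of a packaging step that uses this mode instead of scraping `/tmp`. It assumes the entry source file is passed as a positional argument after `compile` and that per-module IR files use the `.ll` extension; both are assumptions to verify against the CLI Reference, and the helper names are hypothetical.
```python
from pathlib import Path


def emit_ir_command(source: str, ir_dir: str) -> list[str]:
    """Build a `seen compile` call that stops after writing per-module IR."""
    # Assumption: the entry source file is a positional argument after `compile`.
    return ["seen", "compile", source, "--emit-module-ir-dir", ir_dir, "--stop-after-ir"]


def collect_module_ir(ir_dir: str) -> list[Path]:
    """Return the per-module IR files in a caller-owned directory, sorted by name."""
    # Assumption: textual IR files use the `.ll` extension.
    return sorted(Path(ir_dir).glob("*.ll"))
```
A script would run the command with `subprocess.run(cmd, check=True)` against a directory it created itself, such as one from `tempfile.mkdtemp()`, then read only the files in that directory.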
`SEEN_TRACE_BUILD=<path>` writes JSONL build events from rebuild scripts and from compiler phases such as module discovery, declaration scan, cache hashing, IR/object emission, runtime object reuse, release merge, release-LTO mode, and link. `SEEN_BUILD_TRACE=<path>` remains a compatibility alias. Compiler trace events use millisecond timestamps and escaped JSON fields.
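Because the trace is JSONL, each line can be parsed independently. The sketch below writes and reads such records; the field names (`ts_ms`, `phase`, `event`) and the start/end pairing are assumptions about the schema, not its documented layout.
```python
import json
import time
from typing import Iterable, TextIO


def trace_event(fh: TextIO, phase: str, event: str, **fields: str) -> None:
    """Append one JSONL record with a millisecond timestamp."""
    record = {"ts_ms": int(time.time() * 1000), "phase": phase, "event": event, **fields}
    # json.dumps escapes quotes, backslashes, and control characters.
    fh.write(json.dumps(record) + "\n")


def phase_durations(lines: Iterable[str]) -> dict[str, int]:
    """Pair start/end records per phase and return elapsed milliseconds."""
    starts: dict[str, int] = {}
    durations: dict[str, int] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        rec = json.loads(line)
        if rec["event"] == "start":
            starts[rec["phase"]] = rec["ts_ms"]
        elif rec["event"] == "end" and rec["phase"] in starts:
            durations[rec["phase"]] = rec["ts_ms"] - starts.pop(rec["phase"])
    return durations
```
If the real events follow a different shape, only the key lookups in `phase_durations` need to change; the line-at-a-time parsing stays the same.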
| Area | Purpose |
|---|---|
| `compiler_seen/src/main_compiler.seen` | Shipped compiler CLI and bootstrap driver |
| `compiler_seen/src/main.seen` | Higher-level CLI wrapper source, not the current release entrypoint |
| `compiler_seen/src/bootstrap/` | Frontend orchestration and diagnostic compatibility |
| `compiler_seen/src/lexer/` | Tokenization and multilingual keyword loading |
| `compiler_seen/src/parser/` | AST construction |
| `compiler_seen/src/typechecker/` | Type, effect, and deterministic-mode checks |
| `compiler_seen/src/codegen/` | LLVM IR generation, runtime declarations, backend helpers |
| `seen_std/src/` | Standard library modules |
| `seen_runtime/` | C runtime primitives linked by Seen programs |
| Seen Type | LLVM IR Shape |
|---|---|
| `Int` | `i64` |
| `Float` | `double` |
| `Bool` | `i1` |
| `String` | `%SeenString` (`{ i64, ptr }`) |
| `Char` | `i64` |
| `Array<T>` | runtime array handle/pointer |
| Class/value handles | pointer or handle depending on lowering path |
| Simple enum | integer tag |
| Data enum | allocated payload/tag representation |
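The scalar and string rows of the type table can be read as a lookup used when rendering signatures. This Python sketch assumes by-value parameters and a hypothetical `llvm_declare` helper, which may not match the real calling convention; it omits the handle and enum rows because their shape depends on the lowering path.
```python
# Scalar and string rows of the Seen-to-LLVM table; handle and enum rows are omitted.
SEEN_TO_LLVM = {
    "Int": "i64",
    "Float": "double",
    "Bool": "i1",
    "Char": "i64",
    "String": "%SeenString",
}

STRING_TYPE_DEF = "%SeenString = type { i64, ptr }"


def llvm_declare(name: str, ret: str, params: list[str]) -> str:
    """Render an LLVM `declare` line for a function with a Seen-typed signature."""
    args = ", ".join(SEEN_TO_LLVM[p] for p in params)
    return f"declare {SEEN_TO_LLVM[ret]} @{name}({args})"
```
For example, a function taking a `String` and returning an `Int` renders as `declare i64 @len(%SeenString)`, which only makes sense in a module that also contains `STRING_TYPE_DEF`.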
Run `scripts/safe_rebuild.sh` only with explicit memory limits derived from current system memory. See Bootstrap System for the staged rebuild workflow.
