NCA: NC Abstract Low-level Language - Insidious Fiddler's Blog

NCA is a typed, target-independent low-level language that sits between NC source and machine code. It replaces raw asm() as the primary codegen target inside ncc and can optionally be authored by hand for performance-critical paths.

1. Overview

NCA serves three roles:

Compiler IR — ncc lowers NC source through MIR into NCA, then lowers NCA to machine code per target.
Handwritten low-level code — developers write .nca files for hot loops, crypto, memory routines, and runtime internals. One source file produces correct code for every supported target.
Debugging format — ncc -emit-nca dumps the textual NCA representation at any stage for inspection.

1.1 Design Principles

No physical registers in source. Authors write named values and stack slots; the backend owns register allocation, prologue/epilogue emission, and instruction selection.
No fallthrough. Every block ends in an explicit terminator (jmp, br, switch, ret, tailcall, trap, or unreachable). This makes CFG validation trivial and simplifies SSA construction.
Scalar values only at runtime. NCA values are scalars or addresses. Aggregates exist only in memory (stack slots, data sections). There are no struct-typed SSA values.
Target predicates, not ifdefs. Top-level when clauses gate declarations by target properties. The compiler selects the most specific matching definition.
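Deterministic semantics. No undef and no poison values. Integer arithmetic wraps. Uninitialized stack memory is zero-filled in debug builds and explicitly unspecified (but not poisoned) in release builds. Undefined behavior is never implicit; it is limited to the cases this document names explicitly: reaching unreachable (section 7.10), misaligned access through the aligned loads and stores (8.1), data races (8.2), and provenance violations (8.4).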
Portability by default, escape hatches by exception. The portable core covers the vast majority of handwritten low-level code. Raw asm() remains available for the irreducible platform-specific remainder.

1.2 Non-Goals

NCA does not attempt to cover:

Process entry stubs (synthesized per target by the backend).
Signal/interrupt trampolines.
Exact context-switch register save/restore.
Raw syscalls by numeric convention (wrapped by the platform runtime).
Architecture-specific SIMD/vector intrinsics (backend auto-vectorizes where possible; explicit vectors are a future extension).

2. Source Model

2.1 File Extension and Compilation

NCA source files use the .nca extension. They live alongside normal .nc files and participate in the same NC source layout. The file path, not an in-file declaration, determines where the declarations belong.

For example, std/os.nca maps to the NC path std/os. If std/os.nc also exists, both files are compiled together as part of the same source set.

2.2 Relationship to NC Modules

NCA files participate in NC’s existing module/file organization:

A .nca file contributes declarations to the same path-derived namespace as the surrounding NC source.
pub symbols follow NC’s normal cross-file visibility rules.
extern declarations reference symbols from other linked code or external libraries.

There is no separate unit declaration and no NCA-specific module layer on top of NC’s existing source tree.

2.3 Symbol Visibility

Modifier	Within compilation	Use case
(none)	File-local only	Internal helpers, private data.
`pub`	Visible by NC’s normal symbol rules	Cross-file functions and data.
`extern`	Declaration only	C library imports, runtime imports, linked code.

Dynamic export policy is intentionally not a per-function NCA attribute in v1. If NC later needs explicit shared-library exports, that should be defined at the higher-level build/package layer rather than as a separate NCA-only export keyword.

2.4 Relationship to NC’s `asm()` Builtin

NC’s existing asm() builtin remains as an unstructured, target-specific escape hatch. It emits raw textual assembly for a single target and bypasses all NCA type checking, register allocation, and portability guarantees.

NCA replaces asm() for the vast majority of low-level use cases. The intended migration path is:

Code that was using asm() for performance (memory routines, crypto, hot loops) should move to .nca files.
Code that genuinely requires target-specific instructions not in NCA’s portable core (e.g., specific cache control, platform MSRs, interrupt manipulation) stays in asm() or moves to when-gated NCA functions containing a small asm() fragment.

A future phase may introduce inline NCA blocks within .nc files (see Open Questions), but .nca files are the primary interface.

2.5 Symbol Mangling

Internal symbols use the scheme:

N$<module_path>$<identifier>$<signature_hash>

Here <module_path> is derived from the relative source path (for example, std/os.nca -> std/os), and <signature_hash> is a stable hash of the canonical parameter and return types. Symbols declared with the c calling convention use their bare identifier with no mangling (see section 5.1).
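The shape of the scheme can be sketched in Python. This is only an illustration: the text does not say which hash function or which canonical signature encoding ncc uses, so the SHA-256 truncation and the "params->returns" string below are assumptions, as are the helper names module_path, signature_hash, and mangle.

```python
import hashlib
import posixpath


def module_path(source_path: str) -> str:
    """Strip the file extension: 'std/os.nca' -> 'std/os'."""
    root, _ext = posixpath.splitext(source_path)
    return root


def signature_hash(params, returns) -> str:
    # ASSUMPTION: the real hash and canonical form are not given in the text.
    # A truncated SHA-256 over a simple textual encoding stands in for them.
    canonical = ",".join(params) + "->" + ",".join(returns)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def mangle(source_path, identifier, params, returns, callconv) -> str:
    """Build N$<module_path>$<identifier>$<signature_hash>, or the bare name for c."""
    if callconv == "c":
        return identifier  # c symbols are not mangled
    return f"N${module_path(source_path)}${identifier}${signature_hash(params, returns)}"
```

Under these assumptions, a function open declared in std/os.nca and the same function declared in std/os.nc mangle identically, because only the path stem contributes to the module path.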

3. Type System

NCA has a small, fixed set of types. All runtime values are scalar.

3.1 Integer Types

Type	Width	Signedness
`i8`	8	signed
`u8`	8	unsigned
`i16`	16	signed
`u16`	16	unsigned
`i32`	32	signed
`u32`	32	unsigned
`i64`	64	signed
`u64`	64	unsigned
`iptr`	64*	signed
`uptr`	64*	unsigned

*iptr and uptr match the target pointer width. On all currently supported targets this is 64 bits.

3.2 Floating-Point Types

Type	Width	Semantics
`f32`	32	IEEE 754 binary32
`f64`	64	IEEE 754 binary64

3.3 Other Types

Type	Width	Description
`bool`	8	0 or 1. Matches NC’s bool representation.
`addr`	64*	Opaque raw memory address.

addr is not interchangeable with integer types. Conversion requires explicit uptr.to.addr / addr.to.uptr operations.

3.4 Relationship to NC Types

NCA’s type set is intentionally wider than NC’s surface type set. NC exposes a small high-level type system (int, uint, float, bool, byte, char, str, arrays, tuples, optionals, errors, maps, and user-defined structs/enums), while NCA adds fixed-width integers (i8 through u64), f32, addr, iptr, and uptr because low-level code, memory layout, and C interop require them.

The compiler maps NC’s primitive numeric and boolean types to NCA scalars during MIR -> NCA lowering:

NC type	NCA lowering
`int`	`i64`
`uint`	`u64`
`float`	`f64`
`bool`	`bool`
`byte`	`u8`

Higher-level NC types such as char, str, arrays, tuples, optionals, maps, and errors lower structurally rather than as simple aliases; section 10 defines the interop rules.

NCA does not define aliases for NC type names. Handwritten .nca code always uses the NCA spelling (i64, not int).

3.5 No Aggregate Values

Structs, arrays, strings, and other compound types do not exist as NCA values. They live in memory and are accessed through typed loads and stores at explicit offsets. The compiler provides size_of, align_of, and offset_of as compile-time layout queries (see section 6).

3.6 Data Layout Algorithm

NC uses a single, target-independent layout algorithm for all struct and array types. This ensures that size_of, align_of, and offset_of produce identical results on every supported target.

Struct layout rules:

Fields are laid out in declaration order.
Each field is aligned to its natural alignment (the alignment of its type).
Padding bytes are inserted before a field if the current offset is not a multiple of the field’s alignment.
The struct’s overall alignment is the maximum alignment of any field.
The struct’s total size is rounded up to a multiple of its overall alignment (trailing padding).

Natural alignment of scalar types:

Type	Alignment
`i8`, `u8`, `bool`	1
`i16`, `u16`	2
`i32`, `u32`, `f32`	4
`i64`, `u64`, `f64`, `addr`, `iptr`, `uptr`	8

Array layout: Elements are tightly packed at the element’s natural alignment. size_of(T[N]) = size_of(T) * N, rounded up to align_of(T).

This layout matches the platform C layout on all current targets (the System V AMD64 ABI on amd64, AAPCS64 on arm64), which means NC structs passed by pointer to c functions have compatible memory layout without conversion. If a future target requires different C layout rules, the compiler will insert conversion code at c boundaries rather than changing the canonical NC layout.
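The struct and array rules above can be sketched as a few lines of Python. The function names (align_up, layout_struct, size_of_array) and the example field list are hypothetical; the sketch handles scalar fields only and takes sizes and alignments from the tables in sections 3.1 to 3.3 and 3.6.

```python
# name -> (size in bytes, natural alignment in bytes), from the tables above
SCALARS = {
    "i8": (1, 1), "u8": (1, 1), "bool": (1, 1),
    "i16": (2, 2), "u16": (2, 2),
    "i32": (4, 4), "u32": (4, 4), "f32": (4, 4),
    "i64": (8, 8), "u64": (8, 8), "f64": (8, 8),
    "addr": (8, 8), "iptr": (8, 8), "uptr": (8, 8),
}


def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def layout_struct(fields):
    """fields: list of (field_name, scalar_type) in declaration order.

    Returns (size_of, align_of, {field_name: offset_of}).
    """
    offset, max_align, offsets = 0, 1, {}
    for name, ty in fields:
        size, align = SCALARS[ty]
        offset = align_up(offset, align)   # padding before the field
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)  # struct alignment = max field alignment
    return align_up(offset, max_align), max_align, offsets  # trailing padding


def size_of_array(ty: str, count: int) -> int:
    """Tightly packed elements; scalar sizes are already multiples of their alignment."""
    size, _align = SCALARS[ty]
    return size * count
```

For a hypothetical struct with fields (tag: u8, len: u32, ptr: addr, flag: bool), this yields offsets 0, 4, 8, and 16, an alignment of 8, and a size of 24 after trailing padding.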

4. Syntax

NCA uses a line-oriented syntax with one instruction per line. Comments use //. The grammar is designed to be unambiguous without a separate lexer mode.

4.1 File Structure

nc 1

<declarations...>

The nc <version> header is mandatory and must be the first non-blank, non-comment line. It declares which NC language version the .nca file targets. The parser rejects files with an unsupported version.

A file contains zero or more top-level declarations: fn, extern fn, extern data, data, and when blocks.

4.1.1 Literal Syntax

Integer literals support decimal, hexadecimal, octal, and binary notation. Underscores may appear between digits for readability and are ignored by the parser.

42                   // decimal
0xFF_00_AA           // hexadecimal
0o777                // octal
0b1010_0011          // binary
-1                   // negative (unary minus is part of the literal for constants)
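As a quick way to check what these literals denote, the forms above coincide with what Python's built-in int() accepts when given base 0: the 0x, 0o, and 0b prefixes, underscores between digits, and a leading minus sign. The helper name parse_int_literal is hypothetical, and Python may accept spellings (such as an uppercase 0X prefix) that the NCA parser does not.

```python
def parse_int_literal(text: str) -> int:
    """Interpret an NCA-style integer literal using Python's base-0 parsing."""
    return int(text, 0)


examples = ["42", "0xFF_00_AA", "0o777", "0b1010_0011", "-1"]
values = [parse_int_literal(lit) for lit in examples]
# values == [42, 16711850, 511, 163, -1]
```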

Float literals use standard decimal notation with an optional exponent:

3.14
1.0e-6
0.0

String literals use double quotes with standard C-style escapes (\n, \t, \\, \0, \xHH):

b"hello\nworld"      // raw bytes, u8[11]
c"hello\0embedded"   // null-terminated, includes the explicit \0 plus trailing \0

4.1.2 Declaration Order

Within a .nca file, declaration order does not matter. A function may call another function defined later in the same file. A data declaration may reference a symbol defined below it. The compiler processes all top-level declarations in a file before validating function bodies.

Within a function body, standard SSA dominance rules apply: the definition of a value must dominate every use of that value. This is a structural property of the block-structured SSA form, not a source-ordering rule.

Stack slot declarations must appear before any blocks in the function, but their order relative to each other does not matter.

4.1.3 Line Continuation

NCA is line-oriented: one instruction per line. However, certain constructs span multiple lines.

Bracket continuation. Inside [] (data initializers, switch arms) and () (function parameters, block parameters, call arguments), newlines are treated as whitespace. This allows multi-line data initializers and long parameter lists without an explicit continuation character.

pub data crc32_table : u32[256] rodata align(64) = [
    0x00000000, 0x77073096, 0xEE0E612C, 0x1DB71064,
    0xE3630B12, 0x94643B84, 0x0D6D6A3E, 0x7A6A5AA8,
    // ... remaining entries
]

No backslash continuation. There is no \ line continuation. If a single instruction is too long, refactor it into multiple instructions using intermediate values. This keeps parsing unambiguous and the validator simple.

4.2 Function Definitions

[pub] fn <name>(<params>) [-> <return_types>], <callconv>[, <attributes>...] {
    [stack <slot_name> : <type>[<count>][, align(<n>)]]...

    <label>[(<block_params>)]:
        <instruction>
        ...
        <terminator>

    ...
}

Functions are SSA-form with block parameters (no phi nodes). Every function explicitly spells its calling convention. Every block ends with exactly one terminator. The entry block is the first block in the function body.

Example: portable memcpy

pub fn memcpy(dst: addr, src: addr, n: uptr) -> addr, nc {
    entry:
        %zero = const.uptr 0
        jmp loop(%zero)

    loop(%i: uptr):
        %done = cmp.ge.uptr %i, n
        br %done, exit, body(%i)

    body(%i: uptr):
        %src_p = addr.add src, %i
        %x = load.u8 %src_p
        %dst_p = addr.add dst, %i
        store.u8 %dst_p, %x
        %next = add.uptr %i, 1
        jmp loop(%next)

    exit:
        ret dst
}

4.3 Extern Declarations

extern fn puts(s: addr) -> i32, c
extern fn malloc(size: uptr) -> addr, c
extern data errno : i32

Extern functions and data are resolved at link time. The target profile determines which shared library provides each symbol (see section 12).

4.4 Data Declarations

[pub] data <name> : <type>[<count>?] [<section_class>] [align(<n>)] [= <initializer>]

Section classes: rodata, data, bss, tls.

pub data crc32_table : u32[256] rodata align(64) = [
    0x00000000, 0x77073096, 0xEE0E612C, ...
]

data tls_seed : u64 tls = 0

data zero_page : u8[4096] bss align(4096)

String literals produce byte arrays:

b"Hello" -> u8[5] (raw bytes, no terminator)
c"Hello" -> u8[6] (null-terminated)

Address-valued initializers support relocations:

data vtable : addr[3] rodata = [
    addr.of some_fn,
    addr.of other_fn,
    addr.of third_fn
]

Partial initialization. NCA does not support partial initialization. A data declaration must initialize all elements or none:

Fully initialized: data table : u32[4] rodata = [1, 2, 3, 4]
Zero-initialized: data table : u32[4] bss

If you need a table where most entries are zero but a few are nonzero, define it as fully initialized with all values spelled out, or initialize it at runtime by writing to a BSS-allocated buffer.

The element count in the type must match the initializer length exactly. The compiler rejects mismatches:

data table : u32[4] rodata = [1, 2, 3]       // error: expected 4 elements, got 3
data table : u32[4] rodata = [1, 2, 3, 4, 5] // error: expected 4 elements, got 5

Inferred count. If the count is omitted, it is inferred from the initializer:

data table : u32[] rodata = [1, 2, 3, 4]       // count inferred as 4
data hello : u8[] rodata = c"Hello, world!"    // count inferred from string + null

4.5 Target Predicates

Top-level when blocks conditionally include declarations based on target properties.

when arch.amd64 {
    pub fn fast_crc32(data: addr, len: uptr) -> u32, nc {
        // amd64-optimized implementation
        ...
    }
}

when arch.arm64 {
    pub fn fast_crc32(data: addr, len: uptr) -> u32, nc {
        // arm64-optimized implementation
        ...
    }
}

// Generic fallback (no `when` clause)
pub fn fast_crc32(data: addr, len: uptr) -> u32, nc {
    // portable scalar implementation
    ...
}

The compiler selects the most specific matching definition. A definition without a when clause matches all targets and serves as the fallback. Section 4.6 defines the exact resolution algorithm, including ambiguity handling.

Available predicate atoms:

Predicate atom	Meaning
`arch.amd64`	Target architecture is amd64.
`arch.arm64`	Target architecture is arm64.
`os.linux`	Target OS is Linux.
`os.darwin`	Target OS is Darwin.
`endian.little`	Target endianness is little-endian.
`endian.big`	Target endianness is big-endian.
`feature.aes`	Target exposes AES support.
`feature.crc32`	Target exposes CRC32 support.
`feature.popcnt`	Target exposes POPCNT support.
`ptr_bits.64`	Target pointer width is 64 bits.

Predicates compose with and:

when arch.amd64 and feature.aes {
    ...
}

4.6 When-Predicate Resolution

When multiple definitions of the same symbol exist, the compiler must select exactly one definition for the active target.

Definitions

A candidate set is all definitions of a given symbol (same name, same signature) within one module path’s merged source set: the .nca file plus any sibling .nc file with the same relative stem. A candidate is either bare (no when clause) or guarded (has a when clause).

The specificity of a candidate is the number of predicate atoms in its when clause. A bare candidate has specificity 0.

A candidate matches a target if every predicate atom in its when clause is true for that target. A bare candidate matches all targets.

Algorithm

For a given symbol and target:

Collect all candidates that match the target.
If the set is empty, emit a compile error: "no definition of <symbol> matches target <profile>".
Find the maximum specificity among matching candidates.
If exactly one candidate has that maximum specificity, select it.
If multiple candidates share the maximum specificity, emit a compile error: "ambiguous definitions of <symbol> for target <profile>", listing the conflicting candidates with their source locations and when clauses.

Example

Given these definitions and target linux-amd64 with feature.aes:

// (A) specificity 0 -- matches everything
pub fn encrypt(src: addr, dst: addr, len: uptr), nc { ... }

// (B) specificity 1 -- matches any amd64
when arch.amd64 {
    pub fn encrypt(src: addr, dst: addr, len: uptr), nc { ... }
}

// (C) specificity 2 -- matches amd64 with AES
when arch.amd64 and feature.aes {
    pub fn encrypt(src: addr, dst: addr, len: uptr), nc { ... }
}

All three match. Maximum specificity is 2 (candidate C). C is selected.

On target linux-arm64, only A matches. A is selected.

On target linux-amd64 without feature.aes, A and B match. Maximum specificity is 1 (candidate B). B is selected.

Ambiguity Example

when arch.amd64 {
    pub fn hash(data: addr, len: uptr) -> u64, nc { ... }
}

when os.linux {
    pub fn hash(data: addr, len: uptr) -> u64, nc { ... }
}

On linux-amd64, both match with specificity 1, so the compiler reports an ambiguity error. There are three possible fixes: combine the two into one definition (when arch.amd64 and os.linux), remove one of them, or add a bare fallback and keep only one guarded variant at specificity 1.
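The resolution algorithm and both examples above can be sketched in Python. This is a model of the rules as written, not ncc's implementation: the names resolve and ResolutionError, the candidate labels, and the representation of a target as a set of true predicate atoms are all assumptions made for illustration.

```python
class ResolutionError(Exception):
    """Raised for the 'no definition' and 'ambiguous definitions' cases."""


def resolve(symbol, candidates, target):
    """Pick one definition of symbol for the given target.

    candidates: list of (label, frozenset of predicate atoms); bare = empty set.
    target: set of predicate atoms that are true for the build target.
    """
    # A candidate matches if every atom in its when clause holds for the target.
    matching = [(label, atoms) for label, atoms in candidates if atoms <= target]
    if not matching:
        raise ResolutionError(f"no definition of {symbol} matches target")
    # Specificity is the number of atoms; the unique maximum wins.
    best = max(len(atoms) for _label, atoms in matching)
    winners = [label for label, atoms in matching if len(atoms) == best]
    if len(winners) > 1:
        raise ResolutionError(f"ambiguous definitions of {symbol}: {winners}")
    return winners[0]


ENCRYPT = [
    ("A", frozenset()),
    ("B", frozenset({"arch.amd64"})),
    ("C", frozenset({"arch.amd64", "feature.aes"})),
]
HASH = [
    ("amd64", frozenset({"arch.amd64"})),
    ("linux", frozenset({"os.linux"})),
]
```

With these tables, resolve("encrypt", ENCRYPT, {"arch.amd64", "os.linux", "feature.aes"}) selects C, the same call without feature.aes selects B, and an arm64 target selects A. Resolving HASH on a target with both arch.amd64 and os.linux raises the ambiguity error.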

Signature Matching

Two candidates are considered definitions of the same symbol if and only if:

They have the same identifier name.
They have the same parameter types in the same order.
They have the same return types in the same order.
They have the same calling convention.

If two candidates share a name but differ in any of the above, they are distinct overloads and follow NC’s normal overloading rules. Handwritten .nca should not declare multiple overloads of the same name; in practice this case mainly arises from NC source or compiler-generated NCA. when resolution only operates within one overload set at a time.

Cross-File Interaction

Predicate resolution is per module path. Two .nca files at different source paths do not participate in the same candidate set because their mangled symbols differ by module path.

The intent is that all when variants of a symbol live in the same .nca file, or in the .nc / .nca pair for the same relative stem. The linker never sees when; it only sees the single winning definition emitted for that module path.

If two separate translation units still emit the same final symbol name (for example via c ABI names, or through some future explicit symbol override mechanism), that is a duplicate-definition link error, not predicate resolution.

Extern Declarations Inside `when` Blocks

extern declarations may appear inside when blocks. This is useful when a C library function exists on one platform but not another:

when os.linux {
    extern fn epoll_create1(flags: i32) -> i32, c
}

when os.darwin {
    extern fn kqueue() -> i32, c
}

A when-gated extern declaration is only visible to code inside the same when block or to code whose own when predicate implies the extern’s predicate. Calling a when-gated extern from an ungated function is a compile error:

pub fn make_poller() -> i32, nc {
    entry:
        %fd = call epoll_create1(0)  // error: epoll_create1 only exists when os.linux
        ret %fd
}

The fix is to gate the calling function too, or provide platform-specific implementations behind when blocks with a common fallback.
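Because predicates compose only with and, the "implies" check in the visibility rule can be modeled as a subset test. The sketch below assumes atoms are independent of one another (it does not know, for example, whether one atom logically entails another), and the function name extern_visible is hypothetical.

```python
def extern_visible(caller_atoms, extern_atoms) -> bool:
    """True if the caller's when predicate implies the extern's predicate.

    Both arguments are collections of predicate atoms; an ungated
    declaration is represented by an empty collection.
    """
    return set(extern_atoms) <= set(caller_atoms)


# An ungated make_poller cannot see epoll_create1, which is gated on os.linux.
ungated_ok = extern_visible([], ["os.linux"])
# A caller gated on os.linux and arch.amd64 can.
gated_ok = extern_visible(["os.linux", "arch.amd64"], ["os.linux"])
```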

Interaction with Bare NC Definitions

If crypto.nc defines a function encrypt and crypto.nca also defines encrypt, these are two definitions of the same symbol in the same module path. The compiler merges them into the same candidate set and applies the normal resolution rules:

The .nc definition never carries a when clause (when is NCA-only), so it always acts as a bare candidate with specificity 0.
If the .nca definition has a when clause with specificity >= 1, the .nca version wins on matching targets and the .nc version serves as the fallback.
If both are bare (specificity 0), the compiler reports an ambiguity error.

This allows a pattern where .nc provides a readable pure-NC fallback and .nca provides an optimized version for specific targets, all under the same public API.

Missing Definitions

If no candidate matches the current target (which also means no bare fallback exists), the compiler emits an error that names the target:

error: no definition of `fast_crc32` matches target `darwin-arm64`
  note: candidates exist for: arch.amd64, arch.amd64 and feature.crc32
  note: add a bare fallback definition, or add a `when arch.arm64` variant

A symbol with when-gated definitions but no bare fallback is valid only if at least one candidate matches the selected build target.

The compiler checks coverage only for the active target being built. It does not try to prove that every possible future target would be covered.

5. Function Qualifiers

Function signatures always spell their calling convention explicitly after the parameter list and return type.

Qualifier	Meaning
`nc`	NC stable calling convention.
`c`	Platform C calling convention.
`frameptr`	Always emit a frame pointer (default in debug builds).

There is no default calling convention in NCA v1.

The compiler infers whether a function is leaf and whether it can return normally from the actual CFG, so there are no leaf or noreturn attributes.

NCA also does not define a standalone export attribute. Any eventual shared-library export mechanism should come from NC’s higher-level packaging/build rules.

5.1 Calling Conventions

nc (NC stable ABI): The documented, stable ABI for handwritten NCA and normal NC-generated calls. Parameter and return value placement is defined per target but guaranteed stable across compiler versions.

c (Platform C ABI): The target’s native C calling convention (System V AMD64 ABI on Linux/Darwin amd64, AAPCS64 on arm64). Used for FFI with C libraries. Functions with c use their bare identifier (no mangling).

6. Compile-Time Layout Queries

NCA provides three compile-time operators for accessing NC type layout information. These resolve to integer constants during compilation.

Operator	Returns
`size_of(T)`	Size of type `T` in bytes.
`align_of(T)`	Alignment of type `T` in bytes.
`offset_of(T.field)`	Byte offset of `field` within type `T`.

These reference NC type metadata emitted by the frontend. They remove the need for generated headers (compare Go’s go_asm.h).

pub fn reader_pos(r: addr) -> uptr, nc {
    entry:
        %off = const.uptr offset_of(Reader.pos)
        %p = addr.add r, %off
        %val = load.uptr %p
        ret %val
}

6.1 Constant Expressions

Certain positions in NCA require compile-time-known values: data initializers, const.<type> operands, stack slot sizes, and alignment attributes. NCA supports a restricted set of compile-time arithmetic in these positions.

Allowed in constant expressions:

Integer literals.
size_of(T), align_of(T), offset_of(T.field).
Binary operators +, -, *, /, % on integer constants.
Parenthesized subexpressions.

Not allowed in constant expressions:

Values defined by %name = ... instructions.
Function calls.
addr.of in arithmetic.
Floating-point arithmetic.

Examples:

// Valid: offset arithmetic for nested struct access
%p = addr.add %base, offset_of(Outer.inner) + offset_of(Inner.field)

// Valid: stack slot sized to struct
stack buf : u8[size_of(MyStruct)], align(align_of(MyStruct))

// Invalid: cannot do arithmetic on addr.of
data bad : u64 rodata = addr.of some_sym + 8  // error: addr.of is not an integer constant

Link-time constants. addr.of <symbol> is a link-time constant, not a compile-time integer constant. It may appear in data initializers, but not in compile-time arithmetic expressions. If you need “address of symbol + byte offset,” compute it at runtime:

%base = addr.of some_struct
%p = addr.add %base, 8
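The allowed and disallowed forms can be modeled with a small evaluator. The sketch below leans on Python's ast module because the constant-expression syntax happens to parse as Python; it is not how ncc works. The layout tables are made-up stand-ins for frontend metadata, the helper names are hypothetical, and division truncating toward zero is an assumption, since the text does not state the rounding direction of / and %.

```python
import ast

# Hypothetical layout metadata standing in for what the NC frontend emits.
SIZES = {"MyStruct": 24}
ALIGNS = {"MyStruct": 8}
OFFSETS = {("Outer", "inner"): 8, ("Inner", "field"): 4}


def eval_const(text: str) -> int:
    """Evaluate a constant expression, or raise ValueError if it is not one."""
    try:
        tree = ast.parse(text, mode="eval")
        return _eval(tree.body)
    except SyntaxError as exc:  # e.g. '%x + 1' or 'addr.of sym + 8'
        raise ValueError(f"not a constant expression: {text!r}") from exc
    except KeyError as exc:
        raise ValueError(f"unknown type or field: {exc}") from exc


def _trunc_div(a: int, b: int) -> int:
    # ASSUMPTION: '/' truncates toward zero; the text does not specify this.
    if b == 0:
        raise ValueError("division by zero in constant expression")
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def _eval(node) -> int:
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value  # integer literal (floats and bools are rejected)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)  # negative literal
    if isinstance(node, ast.BinOp):
        a, b = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Add):
            return a + b
        if isinstance(node.op, ast.Sub):
            return a - b
        if isinstance(node.op, ast.Mult):
            return a * b
        if isinstance(node.op, ast.Div):
            return _trunc_div(a, b)
        if isinstance(node.op, ast.Mod):
            return a - _trunc_div(a, b) * b
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and len(node.args) == 1 and not node.keywords):
        arg = node.args[0]
        if node.func.id == "size_of" and isinstance(arg, ast.Name):
            return SIZES[arg.id]
        if node.func.id == "align_of" and isinstance(arg, ast.Name):
            return ALIGNS[arg.id]
        if (node.func.id == "offset_of" and isinstance(arg, ast.Attribute)
                and isinstance(arg.value, ast.Name)):
            return OFFSETS[(arg.value.id, arg.attr)]
    raise ValueError("not allowed in a constant expression")
```

With the made-up tables, eval_const("offset_of(Outer.inner) + offset_of(Inner.field)") evaluates to 12, while "addr.of some_sym + 8", a floating-point operand, an SSA value such as %x, or an arbitrary function call is rejected with ValueError.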

7. Instruction Set

All typed instructions spell their full type suffix explicitly. The validator rejects abbreviated spellings such as cmp.ge.u or conversions that rely on implicit source or destination types.

7.1 Constants

%x = const.<type> <value>        // integer or float literal
%p = addr.of <symbol>            // address of a function or data symbol
%n = addr.null                   // null address

7.2 Integer Arithmetic

All integer arithmetic wraps modulo 2^N for the operand width.

%r = add.<type> %a, %b
%r = sub.<type> %a, %b
%r = mul.<type> %a, %b
%r = udiv.<type> %a, %b          // unsigned division (traps on zero divisor)
%r = sdiv.<type> %a, %b          // signed division (traps on zero divisor or MIN/-1)
%r = urem.<type> %a, %b          // unsigned remainder
%r = srem.<type> %a, %b          // signed remainder
%r = neg.<type> %a               // two's complement negation

Extended arithmetic

%r, %carry = uaddc.<type> %a, %b, %c_in   // add with carry
%r, %borrow = usubb.<type> %a, %b, %b_in  // subtract with borrow
%hi = umulh.<type> %a, %b                  // unsigned multiply high half
%hi = smulh.<type> %a, %b                  // signed multiply high half

Checked arithmetic

%r, %ov = add.ov.<type> %a, %b
%r, %ov = sub.ov.<type> %a, %b
%r, %ov = mul.ov.<type> %a, %b

For checked arithmetic, signedness comes from <type>. For example, add.ov.i64 uses signed overflow rules, while add.ov.u64 uses unsigned carry/overflow rules.
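The wrapping, trapping, and checked behaviors in this section can be modeled with Python's unbounded integers. The function names are hypothetical, values are represented as ordinary Python ints in the range of the given type, and sdiv rounding toward zero is an assumption (the text specifies only when it traps).

```python
def wrap(value: int, bits: int, signed: bool) -> int:
    """Reduce value modulo 2**bits and reinterpret it in the given signedness."""
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits
    return value


def add(a: int, b: int, bits: int, signed: bool) -> int:
    """add.<type>: wrapping addition."""
    return wrap(a + b, bits, signed)


def add_ov(a: int, b: int, bits: int, signed: bool):
    """add.ov.<type>: wrapped result plus an overflow flag for that signedness."""
    exact = a + b
    result = wrap(exact, bits, signed)
    return result, result != exact


def sdiv(a: int, b: int, bits: int) -> int:
    """sdiv.<type>: traps on a zero divisor or MIN / -1."""
    minimum = -(1 << (bits - 1))
    if b == 0 or (a == minimum and b == -1):
        raise ArithmeticError("trap")
    # ASSUMPTION: the quotient truncates toward zero.
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q
```

For example, add(255, 1, 8, signed=False) wraps to 0, add_ov(127, 1, 8, signed=True) returns (-128, True), and sdiv(-128, -1, 8) traps.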

7.3 Bitwise Operations

%r = and.<type> %a, %b
%r = or.<type> %a, %b
%r = xor.<type> %a, %b
%r = not.<type> %a
%r = shl.<type> %a, %b           // shift left (count is masked with width-1, i.e. taken modulo the bit width)
%r = lshr.<type> %a, %b          // logical shift right
%r = ashr.<type> %a, %b          // arithmetic shift right
%r = bswap.<type> %a             // byte swap
%r = clz.<type> %a               // count leading zeros
%r = ctz.<type> %a               // count trailing zeros
%r = popcnt.<type> %a            // population count
%r = rotl.<type> %a, %b          // rotate left
%r = rotr.<type> %a, %b          // rotate right
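A few of these operations are easy to get subtly wrong, so here is a reference model in Python for unsigned values of a given bit width. The function names are hypothetical. Two points are assumptions: the shift count is interpreted as count AND (width-1), and clz/ctz of zero return the bit width, which the text does not specify.

```python
def shl(a: int, count: int, bits: int) -> int:
    """Shift left with the count masked to the bit width."""
    return (a << (count & (bits - 1))) & ((1 << bits) - 1)


def rotl(a: int, count: int, bits: int) -> int:
    """Rotate left by count positions (count taken modulo bits)."""
    count %= bits
    mask = (1 << bits) - 1
    return ((a << count) | (a >> (bits - count))) & mask


def clz(a: int, bits: int) -> int:
    """Count leading zeros; ASSUMPTION: returns bits when a == 0."""
    return bits - a.bit_length()


def ctz(a: int, bits: int) -> int:
    """Count trailing zeros; ASSUMPTION: returns bits when a == 0."""
    return bits if a == 0 else (a & -a).bit_length() - 1


def popcnt(a: int) -> int:
    """Number of set bits."""
    return bin(a).count("1")


def bswap(a: int, bits: int) -> int:
    """Reverse the byte order of a bits-wide value."""
    return int.from_bytes(a.to_bytes(bits // 8, "little"), "big")
```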

7.4 Comparison and Selection

Comparisons produce bool.

%r = cmp.eq.<type> %a, %b        // equal
%r = cmp.ne.<type> %a, %b        // not equal
%r = cmp.lt.<type> %a, %b
%r = cmp.le.<type> %a, %b
%r = cmp.gt.<type> %a, %b
%r = cmp.ge.<type> %a, %b

For integer and pointer-sized integer types, signedness comes from <type>. For example, cmp.lt.i64 is signed, while cmp.lt.u64 is unsigned. For addr, only cmp.eq.addr and cmp.ne.addr are valid.

Floating-point comparisons:

%r = cmp.oeq.<ftype> %a, %b      // ordered equal
%r = cmp.une.<ftype> %a, %b      // unordered or not equal
%r = cmp.olt.<ftype> %a, %b      // ordered less than
%r = cmp.ole.<ftype> %a, %b
%r = cmp.ogt.<ftype> %a, %b
%r = cmp.oge.<ftype> %a, %b
%r = cmp.ord.<ftype> %a, %b      // ordered: neither operand is NaN
%r = cmp.uno.<ftype> %a, %b      // either operand is NaN
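The ordered and unordered predicates differ only in how they treat NaN. The Python model below reads cmp.une as "unordered or not equal", following the naming convention used by LLVM's fcmp predicates; that reading is an assumption about NCA, and the function names are hypothetical.

```python
import math


def cmp_ord(a: float, b: float) -> bool:
    """True if neither operand is NaN."""
    return not (math.isnan(a) or math.isnan(b))


def cmp_uno(a: float, b: float) -> bool:
    """True if either operand is NaN."""
    return not cmp_ord(a, b)


def cmp_oeq(a: float, b: float) -> bool:
    """Ordered equal: false whenever a NaN is involved."""
    return cmp_ord(a, b) and a == b


def cmp_une(a: float, b: float) -> bool:
    """Unordered or not equal: true whenever a NaN is involved."""
    return cmp_uno(a, b) or a != b


def cmp_olt(a: float, b: float) -> bool:
    """Ordered less than: false whenever a NaN is involved."""
    return cmp_ord(a, b) and a < b
```

Under this reading, cmp.oeq and cmp.une are exact complements, which makes them the natural pair for lowering == and != on floats.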

Selection:

%r = select.<type> %cond, %a, %b  // if cond then a else b

7.5 Floating-Point Arithmetic

All floating-point operations follow IEEE 754 with round-to-nearest-even.

%r = fadd.<ftype> %a, %b
%r = fsub.<ftype> %a, %b
%r = fmul.<ftype> %a, %b
%r = fdiv.<ftype> %a, %b
%r = frem.<ftype> %a, %b         // IEEE remainder
%r = fneg.<ftype> %a
%r = fabs.<ftype> %a
%r = sqrt.<ftype> %a
%r = copysign.<ftype> %a, %b
%r = fmin.<ftype> %a, %b         // IEEE 754-2008 minimum
%r = fmax.<ftype> %a, %b         // IEEE 754-2008 maximum

7.6 Conversions

All conversions use a uniform <src>.to.<dst> spelling. The validator checks that the operand type really is <src> and rejects unsupported source/destination pairs.

%r = <src>.to.<dst> %a

Common cases:

%r = u8.to.u64 %a        // zero-extend
%r = i8.to.i64 %a        // sign-extend
%r = u64.to.u8 %a        // truncate
%r = f64.to.i64 %a       // float -> signed int (truncate toward zero)
%r = f64.to.u64 %a       // float -> unsigned int
%r = i64.to.f64 %a       // signed int -> float
%r = u64.to.f64 %a       // unsigned int -> float
%r = f32.to.f64 %a       // widen float
%r = f64.to.f32 %a       // narrow float
%r = addr.to.uptr %a     // address -> integer
%r = uptr.to.addr %a     // integer -> address
%r = bool.to.u32 %a      // bool -> integer (0 or 1)
%r = u32.to.bool %a      // integer -> bool (nonzero = true)
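The integer conversions listed above reduce to three bit-level operations, sketched here in Python with hypothetical function names. The float-to-integer case shows only the in-range behavior; what NCA does for NaN or out-of-range inputs is not stated in the text, and Python's int() raises an exception for NaN and infinities.

```python
def zext(value: int, src_bits: int) -> int:
    """Zero-extend (e.g. u8.to.u64): keep the low src_bits as an unsigned value."""
    return value & ((1 << src_bits) - 1)


def sext(value: int, src_bits: int) -> int:
    """Sign-extend (e.g. i8.to.i64): reinterpret the low src_bits as signed."""
    value &= (1 << src_bits) - 1
    return value - (1 << src_bits) if value >> (src_bits - 1) else value


def trunc(value: int, dst_bits: int) -> int:
    """Truncate (e.g. u64.to.u8): drop everything above dst_bits."""
    return value & ((1 << dst_bits) - 1)


def f64_to_i64(x: float) -> int:
    """f64.to.i64 for in-range inputs: truncate toward zero."""
    return int(x)


def int_to_bool(value: int) -> int:
    """u32.to.bool: any nonzero input becomes 1."""
    return 1 if value != 0 else 0
```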

7.7 Memory Operations

%r = load.<type> %addr                     // load from address
store.<type> %addr, %value                 // store to address

%r = load.<type> %addr, order(<ordering>)  // atomic load
store.<type> %addr, %value, order(<ordering>)  // atomic store

Endian-explicit loads and stores for binary protocols:

%r = load.le.<type> %addr         // load little-endian
%r = load.be.<type> %addr         // load big-endian
store.le.<type> %addr, %value
store.be.<type> %addr, %value
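The effect of the endian-explicit forms can be modeled in Python with int.from_bytes and int.to_bytes over a bytearray that stands in for memory. The function names and the bytearray representation are illustrative only.

```python
def load_le(mem: bytearray, addr: int, size: int) -> int:
    """Model of load.le.<type>: least significant byte at the lowest address."""
    return int.from_bytes(mem[addr:addr + size], "little")


def load_be(mem: bytearray, addr: int, size: int) -> int:
    """Model of load.be.<type>: most significant byte at the lowest address."""
    return int.from_bytes(mem[addr:addr + size], "big")


def store_le(mem: bytearray, addr: int, size: int, value: int) -> None:
    """Model of store.le.<type>."""
    mem[addr:addr + size] = value.to_bytes(size, "little")


def store_be(mem: bytearray, addr: int, size: int, value: int) -> None:
    """Model of store.be.<type>."""
    mem[addr:addr + size] = value.to_bytes(size, "big")


mem = bytearray([0x12, 0x34, 0x56, 0x78])
as_le = load_le(mem, 0, 4)  # 0x78563412
as_be = load_be(mem, 0, 4)  # 0x12345678
```

The result is the same on every host, which is the point of these instructions: a network header field read with load.be.u32 yields the same value on little-endian and big-endian targets.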

Bulk memory:

memcpy %dst, %src, %len           // non-overlapping copy
memmove %dst, %src, %len          // overlapping-safe copy
memset %dst, %val: u8, %len       // fill memory

7.8 Address Arithmetic

%r = addr.add %base, %offset               // base + offset (offset is iptr or uptr)
%r = addr.sub %a, %b                       // address difference

addr.add accepts an offset of type iptr (signed) or uptr (unsigned). Signed offsets are useful for negative-direction pointer arithmetic; unsigned offsets are natural for index-based access. The result is always addr.

addr.sub takes two addr operands and produces an iptr result: the signed byte distance between the two addresses. Because the mnemonic carries no type suffix, the result type is fixed by the instruction itself; the validator assigns iptr automatically.

7.9 Stack Slots

Stack slots are declared at the top of a function, before any blocks. They allocate addressable memory in the function’s stack frame.

stack buf : u8[4096], align(16)
stack tmp : u64[1]

The address of a stack slot is obtained with:

%p = addr.of.stack buf

7.10 Control Flow

jmp <label>[(<args>)]                      // unconditional jump
br %cond: bool, <true_label>[(<args>)], <false_label>[(<args>)]  // conditional branch
switch %val: <int_type>, default <label> [
    <const> -> <label>,
    ...
]
ret [<values>]                             // return (zero or more values)
trap                                       // abnormal termination
unreachable                                // assert unreachable (UB if reached)

7.11 Calls

%r = call <fn_name>(<args>)                // direct call
%r = call.indirect %addr: addr (<args>) -> <return_types>, <callconv>
                                            // indirect call (signature required)
tailcall <fn_name>(<args>)                 // tail call (must be a terminator)

Tail call constraints: tailcall is valid only when the callee’s return type(s) and calling convention match the caller’s. In particular, a c callee requires a c caller. If the target ABI or stack layout makes a true tail call infeasible (e.g., the callee requires more stack argument space than the caller allocated), the compiler may lower tailcall as a regular call followed by ret; this fallback emits a diagnostic under -Wtailcall.

switch semantics: The integer value is compared against each constant arm. If a match is found, control transfers to that arm’s label. If no match is found, control goes to the default label. All arm constants must be the same type as the switched value and must be unique. The backend may lower switch to a jump table, binary search, or linear scan depending on density and target heuristics.

7.12 Atomic Operations

%old = atomic.rmw.add.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.sub.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.and.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.or.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.xor.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.xchg.<type> %addr, %val, order(<ordering>)

%old, %ok = cmpxchg.<type> %addr, %expected, %desired, order(<succ>, <fail>)

fence order(<ordering>)

Orderings: relaxed, acquire, release, acq_rel, seq_cst.

8. Memory Model

8.1 Alignment

All loads and stores require natural alignment by default. A load.u32 requires 4-byte alignment, a load.u64 requires 8-byte alignment, and so on. Misaligned access is undefined behavior.

For situations requiring unaligned access (binary protocol parsing, packed serialization), explicit unaligned variants are provided:

%r = load.unaligned.<type> %addr
store.unaligned.<type> %addr, %value

These may be slower than the aligned forms on some targets, but they are always correct regardless of address alignment.

Stack slots are aligned to at least the natural alignment of their element type, or to the alignment specified by the align() attribute, whichever is greater.
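The alignment rules above reduce to two small checks. The following Python sketch assumes that natural alignment equals the type's size and that pointer-sized types are 8 bytes (true for the 64-bit targets listed in section 12.1); the table and function names are made up for illustration.

```python
# Assumed natural alignments in bytes (size == alignment, 64-bit pointers).
NATURAL_ALIGN = {
    "bool": 1, "u8": 1, "i8": 1,
    "u16": 2, "i16": 2,
    "u32": 4, "i32": 4, "f32": 4,
    "u64": 8, "i64": 8, "f64": 8,
    "uptr": 8, "iptr": 8, "addr": 8,
}


def is_naturally_aligned(type_name, address):
    """True if a plain load/store of type_name at address is permitted."""
    return address % NATURAL_ALIGN[type_name] == 0


def stack_slot_alignment(elem_type, align_attr=None):
    """Greater of the element's natural alignment and the align() attribute."""
    natural = NATURAL_ALIGN[elem_type]
    return natural if align_attr is None else max(natural, align_attr)
```

An address for which is_naturally_aligned returns False is exactly the case where the load.unaligned / store.unaligned variants are required.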

8.2 Non-Atomic Memory Ordering

Non-atomic loads and stores have no ordering guarantees with respect to other threads. The compiler and hardware are free to reorder, merge, or eliminate non-atomic memory operations as long as single-threaded semantics are preserved.

To enforce ordering between non-atomic operations and atomic operations, use fence. To enforce ordering between two atomic operations, use the appropriate memory ordering on the operations themselves.

Concurrent non-atomic access to the same memory location where at least one access is a write is a data race and constitutes undefined behavior, exactly as in C11/C++11.

8.3 Atomic Memory Ordering

Atomic operations use the C11/C++11 memory model. The five orderings are:

relaxed — No ordering constraints beyond atomicity.
acquire — Subsequent reads/writes cannot be reordered before this operation.
release — Prior reads/writes cannot be reordered after this operation.
acq_rel — Combines acquire and release.
seq_cst — Total order across all seq_cst operations.

Atomic operations are restricted to types u8, u16, u32, u64, i8, i16, i32, i64, uptr, iptr, and addr. The backend guarantees lock-free atomics for all supported widths on all current targets (amd64 and arm64 both provide lock-free atomics up to 64 bits).

8.4 Address Provenance

addr values carry implicit provenance: an address derived from a stack slot is valid only within that function’s lifetime, and an address derived from a data symbol is valid for the program’s lifetime. The compiler does not track provenance formally in NCA v1, but violating provenance (e.g., using a stack address after the function returns) is undefined behavior.

Fabricating addresses from integers via uptr.to.addr produces addresses with no provenance. Such addresses may only be used to access memory that the program has independently established is valid (e.g., memory-mapped I/O regions, addresses returned by mmap via the runtime).

8.5 Bool Invariant

A bool value must always contain exactly 0 or 1. Creating a bool with any other bit pattern (e.g., by loading a u8 and treating it as bool without an explicit integer-to-bool conversion) is undefined behavior. An explicit <int>.to.bool conversion normalizes any nonzero value to 1.

9. ABI Definition

9.1 `nc` Stable ABI

The nc convention is the stable handwritten-NCA ABI. It is defined per target and guaranteed not to change within a major NC version.

Argument passing: Scalar arguments are passed in a fixed sequence of abstract argument slots. The backend maps these to physical registers or stack positions per target. The mapping is:

Target	Integer/Pointer slots	Float slots
linux/amd64	rdi, rsi, rdx, rcx, r8, r9	xmm0-xmm7
darwin/amd64	rdi, rsi, rdx, rcx, r8, r9	xmm0-xmm7
linux/arm64	x0-x7	v0-v7 (d-regs)
darwin/arm64	x0-x7	v0-v7 (d-regs)

Arguments exceeding available slots spill to the stack in declaration order, aligned to 8 bytes.

Return values: Up to two scalar return values use the first two integer or float return registers. Beyond two, the caller passes a hidden pointer to a return area.

Callee-saved registers: Defined per target. The backend handles save/restore automatically.
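To make the slot mapping concrete, here is a small Python model of argument assignment and return-value placement under the nc convention. It is a sketch under stated assumptions: each spilled scalar is assumed to occupy one 8-byte stack slot, stack offsets are shown relative to an unspecified base, and the function names are hypothetical.

```python
INT_REGS = {
    "linux/amd64": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    "darwin/amd64": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    "linux/arm64": [f"x{i}" for i in range(8)],
    "darwin/arm64": [f"x{i}" for i in range(8)],
}
FLOAT_REGS = {
    "linux/amd64": [f"xmm{i}" for i in range(8)],
    "darwin/amd64": [f"xmm{i}" for i in range(8)],
    "linux/arm64": [f"v{i}" for i in range(8)],
    "darwin/arm64": [f"v{i}" for i in range(8)],
}


def assign_args(target, arg_classes):
    """Map 'int' / 'float' arguments (declaration order) to registers or stack."""
    pools = {"int": list(INT_REGS[target]), "float": list(FLOAT_REGS[target])}
    next_stack_offset = 0
    locations = []
    for cls in arg_classes:
        if pools[cls]:
            locations.append(pools[cls].pop(0))
        else:
            locations.append(f"stack+{next_stack_offset}")
            next_stack_offset += 8  # assumed: one 8-byte slot per spilled scalar
    return locations


def return_location(num_scalars):
    """Up to two scalars return in registers; more use a hidden pointer."""
    return "registers" if num_scalars <= 2 else "hidden pointer to return area"
```

Integer and float slots are consumed independently, so assign_args("linux/arm64", ["int", "float", "int"]) places the arguments in x0, v0, and x1.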

9.2 `c` Platform C ABI

Functions declared c follow the platform’s native C calling convention. On amd64 Linux/Darwin this is the System V AMD64 ABI. On arm64 it is AAPCS64 (with Apple’s variant on Darwin).

Restrictions in NCA v1:

Variadic functions (...) are not supported. Use C shims or runtime wrappers.
Struct-by-value passing is not supported. Pass aggregates by pointer.

10. NC/NCA Interop

NCA files contribute declarations to the same module namespace as .nc files at the same source path. This section defines how the two languages see each other’s symbols and how types map across the boundary.

10.1 Calling NCA from NC

A pub function defined in a .nca file is visible to .nc files in the same module. However, because NCA operates on lowered types (addr, i64, etc.) rather than NC surface types (str, int, etc.), the NC-facing API usually goes through a thin NC wrapper that handles type lowering explicitly.

Example:

std/hash.nca:

nc 1

// Operates on raw bytes. Caller is responsible for providing
// a valid pointer and byte length.
pub fn fnv1a_bytes(ptr: addr, byte_len: uptr) -> u64, nc {
    entry:
        ...
}

std/hash.nc:

import "std/runtime" as rt

pub fn hash_string(s: str) -> uint {
    // rt.str_data and rt.str_byte_len are runtime helpers
    // that extract the raw byte pointer and byte length
    // from an NC str value.
    return fnv1a_bytes(rt.str_data(s), rt.str_byte_len(s))
}

The compiler resolves fnv1a_bytes by looking it up in the merged declaration set for std/hash. No special import syntax or extern declaration is needed on the NC side.

There is no implicit coercion between NC types and NCA types. An NC str is not silently convertible to addr. The NC wrapper must explicitly decompose high-level values into the scalar components that the NCA function expects. This is intentional: it keeps NCA’s type boundary explicit and avoids hidden magic in the calling convention.

For NCA functions that operate on simple scalar types (i64, u64, f64, bool, u8), NC code can call them directly when the NC type maps 1:1:

// add64 is defined in .nca as: pub fn add64(a: u64, b: u64) -> u64, nc
uint result = add64(10u, 20u)  // uint maps directly to u64

10.2 Calling NC from NCA

NCA code can call functions defined in .nc files. Because NCA has no import statement, NC functions are declared as extern in the .nca file with their lowered NCA signature.

std/os.nca:

nc 1

// Declared extern -- defined in std/strings.nc
extern fn nc_str_len(s: addr) -> u64, nc

pub fn example(s: addr) -> u64, nc {
    entry:
        %len = call nc_str_len(s)
        ret %len
}

The extern tells the compiler this symbol is resolved at link time from another linked object. The mangled name must match; when both files are in the same build, the compiler can handle that transparently.

For functions in the same module path, the compiler can verify that the extern declaration’s signature matches the actual .nc definition and emit an error on mismatch. For functions in different modules, this becomes a link-time check.

10.3 Type Mapping

NC types cross the interop boundary through compiler-defined lowering rules. Primitive numeric and boolean types lower directly; higher-level NC types lower structurally or opaquely. Handwritten .nca code must use the NCA-side representation explicitly.

NC type	NCA representation	Notes
`int`	`i64`
`uint`	`u64`
`float`	`f64`
`bool`	`bool`	8-bit, 0 or 1.
`byte`	`u8`
`char`	implementation-defined	NC defines `char` as a Unicode extended grapheme cluster. Handwritten NCA should treat it as opaque unless a future ABI section defines a canonical lowered form.
`str`	`addr`	`str` is `char[]`. The safe stable representation for handwritten NCA is a pointer to the runtime string/array object, not a byte-string `(ptr,len)` pair.
`T[]` (dynamic array)	`addr`	Pointer to the dynamic array object. The exact internal field layout is a compiler/runtime detail.
`T[N]` (fixed-size array)	`addr`	Passed by address. Length is known at compile time.
`struct`	`addr`	Passed by pointer. Fields accessed via `offset_of`.
`enum`	`u64` / `i64` or `addr`	Small discriminant-only enums lower to scalars; payload-bearing forms are passed by pointer.
`(T, U)` (tuple)	Multiple scalars or `addr`	Small tuples may decompose into multi-value returns; larger tuples pass by pointer.
`T?` (optional)	Discriminant + payload or `addr`	Small optionals may lower to a `bool` plus payload; larger forms pass by pointer.
`error`	`addr`	Pointer to runtime error object; `none` lowers to `addr.null`.
`(T, error)`	Scalar(s) + `addr`	The value part lowers normally; the error part is an address.
`T!` (throwing return)	`T` + trap path	Throwing behavior lowers to control flow and runtime checks, not a distinct NCA signature shape.
`fn` value	`addr`, `addr`	Callable code pointer plus environment/context pointer for indirect calls.
`map`	`addr`	Pointer to runtime map object. Opaque at the NCA level.

Arithmetic semantics differ between NC and NCA. NC integer arithmetic is checked by default: if an int or uint operation overflows, the program panics. The MIR -> NCA lowering pass implements this by emitting add.ov.i64 / sub.ov.i64 / mul.ov.i64 instructions followed by overflow-checking branches (see section 13.3). Handwritten NCA uses plain add.i64 / sub.i64 / mul.i64, which wrap silently modulo 2^64 with no checks. This is the most important semantic difference between NC code and handwritten NCA code operating on the “same” integer types. If overflow checking is needed in handwritten NCA, the author must use the checked variants (add.ov.*, sub.ov.*, mul.ov.*) and branch on the overflow flag explicitly.
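The difference between the two families can be modeled directly. The Python sketch below mirrors add.i64 and add.ov.i64 on signed 64-bit values; it assumes the checked variant still returns the wrapped result alongside the overflow flag, which the text above does not state explicitly.

```python
MASK64 = (1 << 64) - 1


def to_i64(bits):
    """Reinterpret the low 64 bits of an integer as a signed i64."""
    bits &= MASK64
    return bits - (1 << 64) if bits >> 63 else bits


def add_i64(a, b):
    """Plain add.i64: wraps silently modulo 2**64."""
    return to_i64(a + b)


def add_ov_i64(a, b):
    """add.ov.i64: (result, overflow flag); wrapped result on overflow is assumed."""
    exact = a + b
    wrapped = to_i64(exact)
    return wrapped, wrapped != exact
```

With I64_MAX = 2**63 - 1, add_i64(I64_MAX, 1) silently yields -2**63, whereas add_ov_i64(I64_MAX, 1) returns the same value with the flag set, which compiler-generated code turns into a panic branch.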

10.4 Strings and Dynamic Arrays

str in NC is char[], and char is a Unicode extended grapheme cluster rather than a byte or fixed-width scalar. Likewise, T[] is NC’s dynamic array type, not a separate slice concept.

For handwritten .nca, the safe rule is:

Treat str as an opaque addr pointing to the runtime string object.
Treat T[] as an opaque addr pointing to the runtime array object.
Do not assume a (ptr, len) calling convention for strings or arrays unless a future ABI section standardizes one.

This matters because the actual in-memory representation of char, str, and T[] is a compiler/runtime implementation detail. Handwritten NCA that needs string or array operations should prefer calling back into NC/runtime helper functions rather than decoding the representation directly.

10.5 Structs Across the Boundary

NC structs are always passed to and from NCA by pointer (addr). NCA code accesses fields using offset_of and typed loads/stores:

// NC:
// pub struct Point { x: float, y: float }

pub fn point_magnitude(p: addr) -> f64, nc {
    entry:
        %x_off = const.uptr offset_of(Point.x)
        %y_off = const.uptr offset_of(Point.y)
        %x_p = addr.add p, %x_off
        %y_p = addr.add p, %y_off
        %x = load.f64 %x_p
        %y = load.f64 %y_p
        %xx = fmul.f64 %x, %x
        %yy = fmul.f64 %y, %y
        %sum = fadd.f64 %xx, %yy
        %mag = sqrt.f64 %sum
        ret %mag
}

The NC compiler knows that point_magnitude takes a Point by pointer and generates the appropriate calling code at the call site. The .nca author is responsible for using the correct offset_of queries and field types.
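For cross-checking such a routine, the same access pattern can be reproduced in Python over a byte buffer. The offsets below (x at 0, y at 8) are a hypothetical layout chosen for this example; in real NCA they come from offset_of, and little-endian storage is assumed.

```python
import math
import struct

# Hypothetical layout for Point { x: float, y: float }. In NCA these numbers
# are supplied by offset_of(Point.x) / offset_of(Point.y), never hard-coded.
POINT_OFFSETS = {"x": 0, "y": 8}


def load_f64(memory, address):
    """Models load.f64: read 8 little-endian bytes as an IEEE-754 double."""
    return struct.unpack_from("<d", memory, address)[0]


def point_magnitude(memory, p):
    """Mirrors the NCA routine: two field loads, then sqrt(x*x + y*y)."""
    x = load_f64(memory, p + POINT_OFFSETS["x"])
    y = load_f64(memory, p + POINT_OFFSETS["y"])
    return math.sqrt(x * x + y * y)
```

Storing the point (3.0, 4.0) at address 16 of a scratch buffer and calling point_magnitude(buffer, 16) gives 5.0.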

10.6 What Cannot Cross the Boundary

Some NC features have no direct NCA representation and should not appear in handwritten .nca function signatures:

Direct char / str decoding. NC strings and characters are higher-level runtime values. Handwritten NCA should not hard-code assumptions about grapheme storage or array layout.
Throwing returns as explicit ABI objects. T! lowers to normal result values plus control flow and runtime checks; there is no separate NCA “throw object” type.
Opaque runtime containers. map, large optionals, payload-bearing enums, and similar high-level runtime types should be treated as opaque addresses unless a dedicated ABI rule exists for them.
Function values without context. If a function value is passed through NCA, both the code pointer and its context/environment must be preserved for call.indirect.

10.7 Mutability

NC distinguishes mutable (mut) and immutable bindings at the language level. The NC frontend enforces immutability constraints during type checking.

NCA has no concept of immutability. Any addr can be the target of a store instruction, and any stack slot can be written to at any time. The mut distinction does not survive lowering to NCA.

This means handwritten .nca code can mutate data that NC considers immutable. This is by design: NCA is an unsafe low-level layer, and restricting stores based on NC-level mutability would add complexity to the NCA validator without meaningful safety benefit, since NCA already permits arbitrary pointer arithmetic and raw memory access.

Authors of handwritten .nca code should respect the mutability contracts of the NC APIs they interact with, even though the NCA compiler does not enforce them.

10.8 Visibility Symmetry

The visibility model is symmetric:

pub in .nca makes a symbol visible to .nc files in the same module path and to other modules that import this module.
pub in .nc makes a symbol available to .nca files (via extern declarations) and to other modules.
Unprefixed (file-local) symbols in .nca are invisible to .nc, and vice versa.

There is no way for .nca code to access private NC functions, and no way for NC code to access file-local NCA functions. This preserves encapsulation in both directions.

10.9 Name Collision Rules

Within a single module path (for example, std/hash), symbol names across the merged .nc / .nca source set must be consistent, subject to when-predicate resolution (section 4.6).

Same name, same signature:

Two bare definitions of the same name with the same signature are an error.
A bare .nc definition and a when-guarded .nca definition of the same name with the same signature is valid; the .nc definition serves as the fallback.
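These two rules, together with the "most specific matching definition" principle, can be sketched as a small resolver. In this Python model the specificity metric (number of predicate atoms) and the property name arch.arm64 are assumptions; section 4.6 remains the authority on actual resolution.

```python
def resolve(definitions, target_props):
    """Pick one definition of a name for a target.

    definitions: list of (origin, predicate), where predicate is a frozenset
    of required target properties and is empty for a bare definition.
    """
    bare = [d for d in definitions if not d[1]]
    if len(bare) > 1:
        raise ValueError("two bare definitions of the same name and signature")
    matching = [d for d in definitions if d[1] <= set(target_props)]
    if not matching:
        raise LookupError("no definition matches this target")
    # Assumed specificity metric: more predicate atoms means more specific.
    return max(matching, key=lambda d: len(d[1]))
```

Given a bare .nc definition and an .nca definition guarded by arch.amd64 and feature.popcnt, the resolver picks the .nca version on a matching amd64 target and falls back to the .nc version everywhere else.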

Same name, different signatures (overloading):

NC supports function overloading: multiple functions can share a name if their parameter types differ. The compiler disambiguates overloads during type checking, and each overload gets a distinct mangled symbol because <signature_hash> differs.

Handwritten .nca files may not declare multiple overloads of the same name. NCA has no type inference or overload resolution; every call is resolved by exact name. If you need to provide NCA implementations for multiple NC overloads, give each NCA function a distinct name and have the NC overloads dispatch to them:

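// NC side (.nc): each overload forwards to a distinctly named NCA function
pub fn to_str(v: int) -> str { return int_to_str_impl(v) }
pub fn to_str(v: float) -> str { return float_to_str_impl(v) }

// NCA side (.nca): one uniquely named definition per overload
pub fn int_to_str_impl(v: i64) -> addr, nc { ... }
pub fn float_to_str_impl(v: f64) -> addr, nc { ... }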

Compiler-generated NCA from overloaded NC functions is unaffected by this restriction, since the compiler produces distinct mangled names for each overload automatically.

11. Program Entry and Root-Level Code

NCA does not define module init functions, init blocks, or a magic source-level main function. In NC, program execution starts from root-level code, and the frontend lowers that root-level code into ordinary compiler-generated NCA in source order.

The backend still synthesizes a platform-specific process entry stub (for example, _start on Linux or an LC_MAIN entry on Darwin), but the exact compiler/runtime handoff symbols are intentionally outside the scope of this proposal. Those details are internal runtime contracts, not part of handwritten NCA source compatibility.

Portable handwritten .nca code may still call documented runtime helpers when they exist, but this proposal standardizes the NCA surface language, not the full runtime symbol table.

12. Target Profiles

A target profile encodes all platform-specific parameters needed to lower NCA to an executable. Profiles are built into ncc and selected at compile time.

12.1 Supported Targets (v1)

Profile	Arch	OS	Object Format	Pointer Width
`linux-amd64`	amd64	linux	ELF64	64
`linux-arm64`	arm64	linux	ELF64	64
`darwin-amd64`	amd64	darwin	Mach-O 64	64
`darwin-arm64`	arm64	darwin	Mach-O 64	64

12.2 Profile Contents

Each profile defines:

Pointer size and endianness.
Register classes and counts (integer, float, vector).
nc and c register mappings.
Callee-saved register set.
Stack alignment requirements (16 bytes on all current targets).
Legal operation set and required legalization transforms.
Instruction selection patterns.
Object format details: section names, relocation types, symbol table format.
Dynamic loader path (e.g., /lib64/ld-linux-x86-64.so.2 on linux-amd64).
Default C runtime library (libc.so.6 on Linux, libSystem.B.dylib on Darwin).
Entry stub template.
Unwind information format (DWARF CFI on Linux, compact unwind on Darwin).

12.3 Link Libraries

NCA extern declarations specify the symbol name and signature but not which library provides the symbol. Library resolution is a build-level concern.

ncc links against a minimal default set of libraries per target profile:

Profile	Default link libraries
`linux-amd64`	`libc.so.6`
`linux-arm64`	`libc.so.6`
`darwin-amd64`	`libSystem.B.dylib`
`darwin-arm64`	`libSystem.B.dylib`

Additional libraries are specified via build configuration (for example, ncc build -l png -l z). The compiler does not validate at compile time whether an extern symbol exists in any linked library; unresolved symbols are reported at link time.

For extern symbols that are part of the C standard library (malloc, free, exit, memcpy, strlen, etc.), the default link libraries are sufficient on all current targets.

13. Compiler Pipeline

NC Source (.nc)
    |
    v
[Frontend: parse, typecheck, desugar]
    |
    v
NC HIR (high-level IR)
    |
    v
[Lowering: monomorphize, inline, optimize]
    |
    v
NC MIR (mid-level IR)
    |
    v
[NCA Emit: lower MIR to NCA, resolve layouts, flatten aggregates]
    |
    v
NCA IR (typed, virtual-register, block-structured)  <-- handwritten .nca enters here
    |
    v
[NCA Validation: type check, CFG verify, terminator check]
    |
    v
[Target-Independent Passes: constant fold, DCE, simple CSE]
    |
    v
[Legalization: widen/narrow illegal types, expand unsupported ops]
    |
    v
[Instruction Selection: pattern-match NCA ops to target instructions]
    |
    v
[Register Allocation: virtual -> physical, spill/reload]
    |
    v
[Prologue/Epilogue: frame setup, callee-save, stack adjustment]
    |
    v
Machine IR (physical registers, concrete instructions)
    |
    v
[Binary Emission: encode instructions, resolve relocations]
    |
    v
Object File (.o equivalent, in-memory)
    |
    v
[Internal Linker: merge objects, resolve symbols, build executable]
    |
    v
Executable (ELF or Mach-O)

13.1 NCA Validation Pass

Before any lowering, the validator checks:

Every block ends with exactly one terminator.
Block parameter counts match at every jump/branch site.
All values are defined before use (SSA dominance).
Types match at every operation.
Stack slots are declared before any blocks.
when predicates reference only valid target properties.
No physical register names appear anywhere in the IR.
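Two of these checks (terminators and block parameter counts) are easy to prototype. The Python sketch below runs them over a toy CFG encoding invented for this example; it is not ncc's internal representation, and the terminator set is limited to the opcodes named in this document.

```python
TERMINATORS = {"jmp", "br", "ret", "trap", "unreachable", "tailcall"}


def validate(blocks):
    """Return a list of error strings for a toy function description.

    blocks maps label -> {"params": int, "ops": [opcode, ...],
                          "targets": [(label, argument_count), ...]}
    """
    errors = []
    for label, block in blocks.items():
        ops = block["ops"]
        if not ops or ops[-1] not in TERMINATORS:
            errors.append(f"{label}: block does not end in a terminator")
        if any(op in TERMINATORS for op in ops[:-1]):
            errors.append(f"{label}: terminator before end of block")
        for target, argc in block.get("targets", []):
            if target not in blocks:
                errors.append(f"{label}: unknown target {target}")
            elif argc != blocks[target]["params"]:
                expected = blocks[target]["params"]
                errors.append(f"{label}: {target} expects {expected} args, got {argc}")
    return errors
```

An empty result means the function passed both checks; each string otherwise names the offending block, which maps naturally onto the diagnostic format in section 15.5.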

13.2 Optimization Passes

Target-independent passes on NCA IR:

Constant folding and propagation.
Dead value elimination.
Simple common subexpression elimination.
Block merging (remove trivial jumps).
Unreachable block elimination.

These are deliberately conservative. Heavy optimization happens at the MIR level for compiler-generated code. For handwritten NCA, the author is assumed to know what they want.
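As an illustration of how small these passes can be, unreachable block elimination amounts to a reachability walk from the entry block. The Python sketch below works on a plain successor map; the encoding and function names are assumptions for this example.

```python
def reachable_blocks(successors, entry="entry"):
    """Labels reachable from the entry block via a worklist traversal."""
    seen, worklist = {entry}, [entry]
    while worklist:
        for nxt in successors.get(worklist.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                worklist.append(nxt)
    return seen


def eliminate_unreachable(successors, entry="entry"):
    """Drop every block that no path from the entry block can reach."""
    live = reachable_blocks(successors, entry)
    return {label: succs for label, succs in successors.items() if label in live}
```

Because every NCA block ends in an explicit terminator, the successor map can be read straight off the terminators without any fallthrough analysis.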

13.3 Safety Check Lowering Patterns

NC’s memory safety guarantees are enforced by the MIR -> NCA lowering pass, which inserts explicit checks as normal control flow. By the time code reaches NCA, all safety checks are visible as ordinary branches and calls — there are no implicit traps.

Bounds check pattern:

// NC source: arr[i]
// NCA lowering:
    %len = load.uptr %arr_len_p
    %oob = cmp.ge.uptr %i, %len
    br %oob, panic_oob, access_ok(%i)

panic_oob:
    // pass source location constants to __nc_panic_bounds
    %file = addr.of __nc_srcfile_3
    %line = const.u32 42
    call __nc_panic_bounds(%i, %len, %file, %line)
    unreachable

access_ok(%idx: uptr):
    %off = mul.uptr %idx, 8
    %elem = addr.add %arr_data, %off
    %val = load.i64 %elem
    ...

Nil check pattern:

// NC source: obj.field (where obj is nullable)
    %is_nil = cmp.eq.addr %obj, addr.null
    br %is_nil, panic_nil, deref_ok

panic_nil:
    call __nc_panic_nil(%file, %line)
    unreachable

deref_ok:
    %val = load.i64 %obj
    ...

Integer overflow check (for checked NC arithmetic):

    %result, %ov = add.ov.i64 %a, %b
    br %ov, panic_overflow, continue(%result)

panic_overflow:
    call __nc_panic_overflow(%file, %line)
    unreachable

continue(%r: i64):
    ...

This explicit lowering means handwritten .nca code is unchecked by default. Authors opting to write NCA directly take responsibility for memory safety, just as with C. The compiler does not insert bounds checks, nil checks, or overflow checks into handwritten NCA.
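Authors who do want a bounds check in handwritten code can copy the pattern shown above. The Python model below illustrates why a single unsigned comparison suffices: if a negative signed index were reinterpreted as uptr, it would become a very large value and fail the same test. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def cmp_ge_uptr(a, b):
    """Unsigned >= on 64-bit values, as in cmp.ge.uptr."""
    return (a & MASK64) >= (b & MASK64)


def checked_index(values, i):
    """Follows the bounds check pattern: one unsigned compare, then the access."""
    if cmp_ge_uptr(i, len(values)):
        raise IndexError(f"index {i & MASK64} out of bounds for length {len(values)}")
    return values[i]
```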

Note on runtime symbols. The symbols used in the examples above (__nc_panic_bounds, __nc_panic_nil, __nc_panic_overflow, __nc_srcfile_*) are illustrative names showing the pattern of lowering, not a stable API. The actual runtime symbol names, signatures, and calling conventions are internal contracts between the compiler and the NC runtime. They may change between compiler versions without notice.

Handwritten .nca code should not call these symbols directly. If a handwritten NCA function needs to signal a panic, it should call a documented, stable runtime helper when one is defined, or use the trap instruction for an immediate abort without a stack trace.

13.4 `ncc` Command-Line Interface

The compiler exposes NCA-related functionality through the following flags:

Flag	Description
`ncc build`	Compile and link all `.nc` and `.nca` files into an executable.
`ncc build --target <profile>`	Cross-compile for a specific target profile.
`-emit-nca`	Dump the NCA IR to stdout or a `.nca` file after MIR lowering.
`-emit-nca=optimized`	Dump NCA after target-independent optimization passes.
`-emit-asm`	Dump target textual assembly after instruction selection.
`-O0` / `-O1` / `-O2`	Optimization level. `-O0` disables optimizations, zero-fills stack, emits frame pointers everywhere.
`-g`	Emit DWARF debug information.
`-Wtailcall`	Warn when a `tailcall` is rejected by the backend.
`--dump-target <profile>`	Print the target profile’s register layout, ABI, and feature set.

13.5 Testing and Verification Strategy

NCA’s position in the compiler pipeline makes it a natural test boundary. The recommended testing approach:

Round-trip parsing. Every .nca file emitted by -emit-nca must parse back into an identical AST. This catches serialization bugs and ensures the textual format is canonical. Run as: ncc -emit-nca foo.nc | ncc -parse-nca -emit-nca, then diff the result against the original dump.

Validation fuzzing. The NCA validator should reject all malformed inputs without crashing. Fuzz the parser and validator with AFL/libFuzzer on randomized .nca inputs. This is especially important because handwritten .nca files are untrusted input to the compiler.

Semantic test suite. A library of small .nca programs with known outputs, compiled and executed on every supported target. Each test exercises a specific instruction or combination: arithmetic wrapping, comparison semantics by type, atomic ordering, block parameter passing, tail calls, etc.

Cross-target equivalence. For every test in the semantic suite, verify that the output is identical across all four target profiles. This is the core portability guarantee.

Instruction selection coverage. Track which NCA operations have been exercised by the test suite per target. Untested op/target combinations are flagged in CI.

ABI conformance. For c functions, generate NCA wrappers that call C test harnesses and verify that arguments and return values are passed correctly. This catches ABI mismatches between ncc and the platform C compiler.

14. Binary Emission

ncc emits final executables directly without invoking an external assembler or linker. The output is a position-independent executable (PIE) by default on all targets.

14.1 ELF (Linux)

The emitted ELF binary contains:

Section	Contents
`.text`	Executable code.
`.rodata`	Read-only data (constants, string literals).
`.data`	Initialized mutable data.
`.bss`	Zero-initialized mutable data.
`.nc.line`	NC source location line table.
`.symtab`	Symbol table (debug builds).
`.strtab`	String table for symbols.
`.dynamic`	Dynamic linking metadata.
`.rela.dyn`	RELA relocations for the dynamic linker.
`.got`	Global offset table for imported symbols.

Program headers:

PT_LOAD segments for RX (code+rodata), RW (data+bss+got), and the dynamic section.
PT_DYNAMIC pointing to the .dynamic section.
PT_GNU_STACK with NX (non-executable stack).
PT_INTERP pointing to the dynamic loader path from the target profile.

Relocations use RELA format on both amd64 and arm64. Internal references use PC-relative addressing. Imported symbols use GOT-indirect loads with R_X86_64_GLOB_DAT / R_AARCH64_GLOB_DAT relocations resolved eagerly at load time.

14.2 Mach-O (Darwin)

The emitted Mach-O binary contains:

Segment/Section	Contents
`__TEXT,__text`	Executable code.
`__TEXT,__const`	Read-only data.
`__TEXT,__nc_line`	NC source location table.
`__DATA,__data`	Initialized mutable data.
`__DATA,__bss`	Zero-initialized data.
`__DATA,__got`	Global offset table.
`__DATA,__la_symbol_ptr`	Lazy symbol pointers.

Load commands include LC_SEGMENT_64 for each segment, LC_MAIN for the entry point offset, LC_LOAD_DYLIB for imported libraries (libSystem.B.dylib at minimum), LC_DYLD_INFO_ONLY for binding opcodes, and LC_UUID for build identification.

Darwin requires code signing for execution on arm64. ncc emits an ad-hoc LC_CODE_SIGNATURE with a valid CodeDirectory hash. No Apple Developer identity is needed for local execution.

15. Diagnostics and Debugging

15.1 Source Locations

NCA instructions can carry optional source location metadata:

%x = load.u8 %p                  !loc(3, 12)

Here !loc(line, col) refers to a position in the current .nca file. For compiler-generated NCA from .nc source, the metadata refers to the original NC source location.

15.2 Line Table

The .nc.line section (ELF) or __TEXT,__nc_line section (Mach-O) contains a compact PC-to-source mapping. Format:

[file_index: u16] [line: u32] [col: u16] [pc_delta: u32]

This is sufficient for stack traces and basic debugging without full DWARF complexity. The -g flag causes ncc to emit DWARF .debug_info / .debug_line for use with standard debuggers.
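A reader for this record format fits in a few lines. The Python sketch below assumes little-endian, tightly packed 12-byte records and assumes that pc_delta is the distance from the previous entry's PC, starting at 0; none of these details are stated above, so treat them as placeholders.

```python
import struct

# file_index: u16, line: u32, col: u16, pc_delta: u32 (assumed packed, little-endian)
ENTRY = struct.Struct("<HIHI")


def encode_line_table(entries):
    """Serialize (file_index, line, col, pc_delta) tuples into table bytes."""
    return b"".join(ENTRY.pack(*e) for e in entries)


def lookup(table, pc):
    """Return (file_index, line, col) of the last entry whose PC is <= pc."""
    current_pc, best = 0, None
    for file_index, line, col, pc_delta in ENTRY.iter_unpack(table):
        current_pc += pc_delta  # assumed: delta from the previous entry's PC
        if current_pc > pc:
            break
        best = (file_index, line, col)
    return best
```

A stack trace printer would call lookup once per return address collected during the frame-pointer walk.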

15.3 Frame Pointers

In debug builds, all functions emit a frame pointer (RBP on amd64, X29 on arm64). In release builds, the compiler may omit the frame pointer for functions that it proves do not need one, unless the function carries the frameptr attribute. The runtime’s stack unwinder uses frame pointers when available and falls back to the .nc.line table otherwise.

15.4 Panic and Trap

When NC code panics (bounds check failure, nil dereference, explicit panic()), the lowered code calls the runtime’s __nc_panic function. This function:

Captures the current PC.
Walks the stack using frame pointers.
Maps PCs to source locations via the line table.
Prints a stack trace to stderr.
Calls __nc_rt_exit(1).

The trap instruction in NCA lowers to the platform’s trap/abort mechanism (ud2 on amd64, brk #1 on arm64).

15.5 Diagnostic Format

NCA compilation errors use the format:

<file>:<line>:<col>: error: <message>
    <source line>
    <caret indicator>

Example:

runtime/memops.nca:14:9: error: type mismatch: expected u64, got addr
        %r = add.u64 %p, %offset
             ^~~~~~~

Warnings use the same format with warning: instead of error:. Diagnostics are printed to stderr and can be machine-parsed by editors and CI tools.

Multiple errors are reported per source file where possible. The parser uses block-level recovery: on encountering a syntax error within a function body, it skips to the next block label or function boundary and continues parsing.
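Because the header line has a fixed shape, editors and CI tools can extract it with a single regular expression. The Python sketch below is one possible parser; the pattern and function name are not part of ncc, and file paths containing ": " followed by digits would need a stricter pattern.

```python
import re

DIAGNOSTIC_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>error|warning): (?P<message>.*)$"
)


def parse_diagnostic(header_line):
    """Parse '<file>:<line>:<col>: error|warning: <message>' into a dict.

    Returns None for lines that are not diagnostic headers, such as the
    echoed source line or the caret indicator.
    """
    match = DIAGNOSTIC_RE.match(header_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["col"] = int(fields["col"])
    return fields
```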

16. Examples

Short examples omit the nc <version> header for brevity. Complete .nca files must include it (see sections 4.1 and 16.6 for full-file examples).

16.1 Simple Arithmetic

pub fn add64(a: u64, b: u64) -> u64, nc {
    entry:
        %r = add.u64 a, b
        ret %r
}

16.2 C Interop

extern fn puts(s: addr) -> i32, c

pub data hello : u8[14] rodata = c"Hello, world!"

pub fn greet(), nc {
    entry:
        %p = addr.of hello
        %_ = call puts(%p)
        ret
}

16.3 Target-Specialized Function

when arch.amd64 and feature.popcnt {
    pub fn popcount(x: u64) -> u64, nc {
        entry:
            %r = popcnt.u64 x
            ret %r
    }
}

pub fn popcount(x: u64) -> u64, nc {
    entry:
        // Portable bit-twiddling fallback
        %m1  = const.u64 0x5555555555555555
        %m2  = const.u64 0x3333333333333333
        %m4  = const.u64 0x0F0F0F0F0F0F0F0F
        %h01 = const.u64 0x0101010101010101

        %a = lshr.u64 x, 1
        %b = and.u64 %a, %m1
        %c = sub.u64 x, %b

        %d = and.u64 %c, %m2
        %e = lshr.u64 %c, 2
        %f = and.u64 %e, %m2
        %g = add.u64 %d, %f

        %h = lshr.u64 %g, 4
        %i = add.u64 %g, %h
        %j = and.u64 %i, %m4

        %k = mul.u64 %j, %h01
        %r = lshr.u64 %k, 56
        ret %r
}
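The portable fallback can be checked against a reference count outside the compiler. The Python port below follows the NCA instruction sequence step by step, masking to 64 bits wherever the NCA operation would wrap; the function name is chosen for this example.

```python
MASK64 = (1 << 64) - 1


def popcount_portable(x):
    """Bit-twiddling population count, mirroring the NCA fallback."""
    m1 = 0x5555555555555555
    m2 = 0x3333333333333333
    m4 = 0x0F0F0F0F0F0F0F0F
    h01 = 0x0101010101010101
    x &= MASK64
    c = (x - ((x >> 1) & m1)) & MASK64          # 2-bit partial sums
    g = ((c & m2) + ((c >> 2) & m2)) & MASK64   # 4-bit partial sums
    j = ((g + (g >> 4)) & MASK64) & m4          # 8-bit partial sums
    k = (j * h01) & MASK64                      # mul.u64 wraps modulo 2**64
    return k >> 56                              # total lands in the top byte
```

Comparing popcount_portable(x) with bin(x).count("1") over a set of inputs is a cheap way to build the cross-target equivalence tests described in section 13.5.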

16.4 Atomic Counter

pub fn atomic_inc(counter: addr) -> u64, nc {
    entry:
        %one = const.u64 1
        %old = atomic.rmw.add.u64 counter, %one, order(seq_cst)
        ret %old
}

16.5 Stack Slot Usage

pub fn sum_array(arr: addr, len: uptr) -> i64, nc {
    stack acc : i64[1]

    entry:
        %acc_p = addr.of.stack acc
        %zero = const.i64 0
        store.i64 %acc_p, %zero
        %izero = const.uptr 0
        jmp loop(%izero)

    loop(%i: uptr):
        %done = cmp.ge.uptr %i, len
        br %done, exit, body(%i)

    body(%i: uptr):
        %elem_off = mul.uptr %i, 8
        %elem_p = addr.add arr, %elem_off
        %val = load.i64 %elem_p
        %acc_p2 = addr.of.stack acc
        %cur = load.i64 %acc_p2
        %new = add.i64 %cur, %val
        store.i64 %acc_p2, %new
        %next = add.uptr %i, 1
        jmp loop(%next)

    exit:
        %acc_p3 = addr.of.stack acc
        %result = load.i64 %acc_p3
        ret %result
}

16.6 FNV-1a Hash (Realistic Hot Path)

A complete, portable FNV-1a 64-bit hash function suitable for hash table use:

nc 1

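// FNV-1a constants, kept as read-only data for reference.
// fnv1a_64 below materializes the same values with const.u64.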
data fnv_offset : u64 rodata = 0xCBF29CE484222325
data fnv_prime  : u64 rodata = 0x00000100000001B3

pub fn fnv1a_64(data: addr, len: uptr) -> u64, nc {
    entry:
        %hash0 = const.u64 0xCBF29CE484222325
        %prime = const.u64 0x00000100000001B3
        %zero  = const.uptr 0
        jmp loop(%zero, %hash0)

    loop(%i: uptr, %hash: u64):
        %done = cmp.ge.uptr %i, len
        br %done, done(%hash), body(%i, %hash)

    body(%i: uptr, %hash: u64):
        %p = addr.add data, %i
        %b = load.u8 %p
        %b64 = u8.to.u64 %b
        %xored = xor.u64 %hash, %b64
        %hashed = mul.u64 %xored, %prime
        %next = add.uptr %i, 1
        jmp loop(%next, %hashed)

    done(%result: u64):
        ret %result
}
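A reference implementation in a high-level language is useful for generating expected values for the semantic test suite. The Python version below follows the same loop (xor the byte, then multiply by the prime modulo 2^64); it is a testing aid, not part of NCA.

```python
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x00000100000001B3


def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit hash over a byte string."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF  # mul.u64 wraps modulo 2**64
    return h
```

The empty input hashes to the offset basis, and the single byte "a" hashes to 0xAF63DC4C8601EC8C, which matches the published FNV-1a test vectors.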

16.7 Spinlock (Atomics in Practice)

A simple test-and-test-and-set spinlock showing atomic operations and control flow:

nc 1

// Lock state: 0 = unlocked, 1 = locked
// lock_addr points to a u32 in shared memory.

pub fn spin_lock(lock_addr: addr), nc {
    entry:
        jmp try_acquire

    try_acquire:
        %expected = const.u32 0
        %desired  = const.u32 1
        %old, %ok = cmpxchg.u32 lock_addr, %expected, %desired, order(acquire, relaxed)
        br %ok, acquired, spin

    spin:
        // Test before retrying CAS (reduces cache line bouncing)
        %current = load.u32 lock_addr, order(relaxed)
        %still_locked = cmp.ne.u32 %current, %expected
        br %still_locked, spin, try_acquire

    acquired:
        ret
}

pub fn spin_unlock(lock_addr: addr), nc {
    entry:
        %zero = const.u32 0
        store.u32 lock_addr, %zero, order(release)
        ret
}

17. Appendix A: Grammar Summary

file         = version_decl NL decl*
version_decl = "nc" INT_LIT
decl         = fn_def | extern_decl | data_def | when_block
comment      = "//" (any character except NL)* NL

when_block   = "when" predicate "{" NL decl* "}"
predicate    = pred_atom ("and" pred_atom)*
pred_atom    = IDENT "." IDENT
             | IDENT "." INT_LIT

fn_def       = ["pub"] "fn" IDENT "(" params ")" ["->" type_list] "," callconv ["," attr_list]
               "{" NL stack_decl* block+ "}"

extern_decl  = "extern" "fn" IDENT "(" params ")" ["->" type_list] "," callconv
             | "extern" "data" IDENT ":" type

data_def     = ["pub"] "data" IDENT ":" type ["[" [INT_LIT] "]"]
               [section_class] [align_attr] ["=" initializer]

params      = (param ("," param)*)?
param       = param_ident ":" type
param_ident = IDENT | decl_keyword
decl_keyword = "data" | "fn" | "pub" | "extern" | "stack" | "when"
type_list   = type ("," type)*
type         = "i8" | "u8" | "i16" | "u16" | "i32" | "u32" | "i64" | "u64"
             | "f32" | "f64" | "iptr" | "uptr" | "bool" | "addr"

callconv     = "nc" | "c"
attr_list    = attr ("," attr)*
attr         = "frameptr"

section_class = "rodata" | "data" | "bss" | "tls"
align_attr    = "align" "(" INT_LIT ")"

stack_decl  = "stack" IDENT ":" type "[" INT_LIT "]" ["," align_attr]

block       = LABEL ["(" block_params ")"] ":" NL instruction* terminator NL
block_params = block_param ("," block_param)*
block_param = "%" IDENT ":" type

instruction  = [value_def] op NL
             | conversion NL
             | bulk_mem_op NL
             | store_op NL
value_def    = "%" IDENT "="
             | "%" IDENT "," "%" IDENT "="

op           = arith_op | bitwise_op | cmp_op | select_op
             | float_op | load_op | addr_op
             | const_op | call_op | atomic_op | fence_op

const_op     = "const." type (INT_LIT | FLOAT_LIT)
             | "addr.of" IDENT
             | "addr.of.stack" IDENT
             | "addr.null"

arith_op     = arith_name "." type value "," value
             | neg_op
             | "uaddc." type value "," value "," value
             | "usubb." type value "," value "," value
             | "umulh." type value "," value
             | "smulh." type value "," value
             | "add.ov." type value "," value
             | "sub.ov." type value "," value
             | "mul.ov." type value "," value
arith_name   = "add" | "sub" | "mul" | "udiv" | "sdiv" | "urem" | "srem"
neg_op       = "neg." type value

bitwise_op   = bitwise_name "." type value "," value
             | unary_bit "." type value
bitwise_name = "and" | "or" | "xor" | "shl" | "lshr" | "ashr" | "rotl" | "rotr"
unary_bit    = "not" | "bswap" | "clz" | "ctz" | "popcnt"

cmp_op       = "cmp." cmp_kind "." type value "," value
cmp_kind     = "eq" | "ne" | "lt" | "le" | "gt" | "ge"
             | "oeq" | "une" | "olt" | "ole" | "ogt" | "oge" | "ord" | "uno"

select_op    = "select." type value "," value "," value

float_op     = float_binary "." ftype value "," value
             | float_unary "." ftype value
float_binary = "fadd" | "fsub" | "fmul" | "fdiv" | "frem" | "copysign" | "fmin" | "fmax"
float_unary  = "fneg" | "fabs" | "sqrt"

conversion   = "%" IDENT "=" type ".to." type value

load_op      = "load." type value
             | "load." type value "," "order(" ordering ")"
             | "load.le." type value
             | "load.be." type value
             | "load.unaligned." type value

store_op     = "store." type value "," value
             | "store." type value "," value "," "order(" ordering ")"
             | "store.le." type value "," value
             | "store.be." type value "," value
             | "store.unaligned." type value "," value

bulk_mem_op  = "memcpy" value "," value "," value
             | "memmove" value "," value "," value
             | "memset" value "," value "," value

addr_op      = "addr.add" value "," value
             | "addr.sub" value "," value

call_op      = "call" IDENT "(" args ")"
             | "call.indirect" value "(" args ")" "->" type_list "," callconv

atomic_op    = "atomic.rmw." atomic_rmw_kind "." type value "," value "," "order(" ordering ")"
             | "cmpxchg." type value "," value "," value "," "order(" ordering "," ordering ")"
atomic_rmw_kind = "add" | "sub" | "and" | "or" | "xor" | "xchg"

fence_op     = "fence" "order(" ordering ")"

ordering     = "relaxed" | "acquire" | "release" | "acq_rel" | "seq_cst"
ftype        = "f32" | "f64"

terminator  = "jmp" LABEL ["(" args ")"]
            | "br" value "," LABEL ["(" args ")"] "," LABEL ["(" args ")"]
            | "switch" value "," "default" LABEL "[" switch_arm* "]"
            | "ret" [value ("," value)*]
            | "tailcall" IDENT "(" args ")"
            | "trap"
            | "unreachable"

switch_arm  = INT_LIT "->" LABEL ","
args        = (value ("," value)*)?
value       = "%" IDENT | IDENT | INT_LIT | FLOAT_LIT
initializer = simple_init | "[" NL? init_elem ("," NL? init_elem)* ","? NL? "]"
init_elem   = value | "addr.of" IDENT
simple_init = INT_LIT | FLOAT_LIT | STRING_LIT

INT_LIT     = ["-"] DIGITS
            | "0x" HEX_DIGITS
            | "0o" OCT_DIGITS
            | "0b" BIN_DIGITS
            (* underscores allowed between digits for readability *)

FLOAT_LIT   = DIGITS "." DIGITS [("e" | "E") ["+" | "-"] DIGITS]

STRING_LIT  = '"' (CHAR | ESCAPE)* '"'
ESCAPE      = "\\" ("n" | "t" | "r" | "\\" | "0" | '"' | "x" HEX HEX)
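
As an illustration of the INT_LIT rule, here is a minimal Python sketch of a literal parser. It reads the grammar literally: the optional minus sign applies only to the decimal form, prefixes are lowercase, and the underscore note is interpreted as "a single underscore between two digits". That last reading is an assumption, and parse_int_lit is a made-up name, not an ncc function.

```python
import re

# One pattern per alternative of INT_LIT; underscores may separate digit groups.
_INT_PATTERNS = [
    (re.compile(r"0x([0-9A-Fa-f]+(?:_[0-9A-Fa-f]+)*)"), 16),
    (re.compile(r"0o([0-7]+(?:_[0-7]+)*)"), 8),
    (re.compile(r"0b([01]+(?:_[01]+)*)"), 2),
    (re.compile(r"(-?[0-9]+(?:_[0-9]+)*)"), 10),
]


def parse_int_lit(text: str) -> int:
    """Parse one INT_LIT token; raise ValueError if it does not match the grammar."""
    for pattern, base in _INT_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return int(match.group(1).replace("_", ""), base)
    raise ValueError(f"not an INT_LIT: {text!r}")


print(parse_int_lit("0x0000_0100_0000_01B3") == 0x100000001B3)  # True
print(parse_int_lit("-42"))                                      # -42
```

Range checking against the instruction's type (for example rejecting 256 in a const.u8) is a separate step that this sketch leaves out.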

Keyword scoping. The keywords data, fn, pub, extern, stack, and when are reserved only in declaration position: at the start of a top-level declaration, at the start of a declaration inside a when block, or (for stack) at the start of a stack slot declaration in a function body. In parameter position, value-name position (%data), and label position, they are treated as ordinary identifiers. All other keywords (type names, opcodes, terminators) are globally reserved.

This grammar is intended to be complete for all instructions defined in section 7. If a future version adds new instructions, the grammar must be updated in tandem. The instruction set (section 7) is the normative reference; this grammar is a derived summary for parser implementors.

Comments may appear on their own line or at the end of an instruction line. The parser discards them before processing.

18. Appendix B: Reserved Keywords

The following words are reserved and may not be used where the grammar requires an identifier. Most are reserved globally. Top-level declaration keywords (data, fn, pub, extern, stack, when) are reserved only in declaration position and may still be used as parameter names, value names, and labels.

Structural: nc, fn, pub, extern, data, stack, when, and

Types: i8, u8, i16, u16, i32, u32, i64, u64, f32, f64, iptr, uptr, bool, addr

Qualifiers: c, frameptr

Section classes: rodata, data, bss, tls

Terminators: jmp, br, switch, ret, tailcall, trap, unreachable

Operations: const, add, sub, mul, udiv, sdiv, urem, srem, neg, uaddc, usubb, umulh, smulh, and, or, xor, not, shl, lshr, ashr, bswap, clz, ctz, popcnt, rotl, rotr, cmp, select, fadd, fsub, fmul, fdiv, frem, fneg, fabs, sqrt, copysign, fmin, fmax, to, load, store, memcpy, memmove, memset, call, atomic, cmpxchg, fence

Reserved for future use: vec, v128, v256, inline, volatile, restrict, yield, await, async, import, module, type, struct, enum, union, match, for, while, if, else, ref, mut
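
The two-tier reservation rule can be sketched as a small lookup. This is an illustrative Python model, not the ncc implementation: the keyword sets are abbreviated, the position names ("param", "value", "label", "symbol") are invented here, and treating every other position as reserved for declaration keywords is a conservative assumption.

```python
# Declaration keywords: reserved only in declaration position.
DECL_KEYWORDS = {"data", "fn", "pub", "extern", "stack", "when"}

# Abbreviated sample of globally reserved words; the lists above are authoritative.
GLOBAL_KEYWORDS = {
    "nc", "and",
    "i8", "u8", "i16", "u16", "i32", "u32", "i64", "u64",
    "f32", "f64", "iptr", "uptr", "bool", "addr",
    "jmp", "br", "switch", "ret", "tailcall", "trap", "unreachable",
    "add", "load", "store", "call",
}

# Positions in which declaration keywords act as ordinary identifiers.
IDENTIFIER_POSITIONS = {"param", "value", "label"}


def usable_as_identifier(word: str, position: str) -> bool:
    """Return True if `word` may name something at the given position."""
    if word in GLOBAL_KEYWORDS:
        return False
    if word in DECL_KEYWORDS:
        return position in IDENTIFIER_POSITIONS
    return True


print(usable_as_identifier("data", "param"))  # True, as in fnv1a_64(data: addr, ...)
print(usable_as_identifier("u64", "param"))   # False, type names are global
```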

19. Appendix C: Design Rationale

19.1 Why Not LLVM IR?

LLVM IR is the most mature option for a low-level target, but it has significant drawbacks for NC’s use case:

Massive dependency. LLVM is ~30M lines of C++. Shipping it as part of ncc would dominate the compiler’s size and build time. NC aims to be a lean, fast-compiling toolchain.
Unstable textual format. LLVM IR is not a stable interface. It changes between LLVM releases, and the textual format in particular carries no backward-compatibility guarantee: a .ll file written for one release may fail to parse with the next.
Wrong abstraction level. LLVM IR exposes GEP, PHI nodes, and a type system that does not match NC’s. The impedance mismatch between NC’s design and LLVM’s assumptions (e.g., around poison values, undef, and UB) creates subtle correctness risks.
No control over codegen. Using LLVM means accepting its register allocator, instruction selector, and optimization pipeline. NC benefits from controlling the full pipeline for predictable performance and compilation speed.

19.2 Why Not QBE or Cranelift?

QBE is attractively simple (~12K lines of C), but it targets only amd64, arm64, and riscv64, performs limited optimization, and has no atomics and no Windows support. Its type system (word/long/single/double) is too coarse for NCA’s needs, and it has no concept of calling convention control or target predicates.

Cranelift (used by Wasmtime and rustc_codegen_cranelift) is more capable, but it is a Rust library with its own IR (CLIF). Depending on it would introduce a Rust build dependency into ncc and tie NC’s codegen quality to Cranelift’s development priorities.

Both are excellent projects, but NC’s goal of a self-contained, C++-based toolchain with full control over codegen makes a custom backend the right choice. NCA is deliberately simpler than LLVM IR and can be implemented incrementally.

19.3 Why Virtual Registers Instead of Go-Style Named Physical Registers?

Go’s assembler uses pseudo-registers (FP, SP, SB, PC) that the toolchain maps onto each target. This is a pragmatic choice for an existing ecosystem, but general-purpose registers are still named directly, so the source exposes the shape of the architecture: amd64 code says MOVQ AX, BX, arm64 code says MOVD R0, R1, and each target needs its own .s file.

NCA removes physical registers entirely because:

True single-source portability. An NCA function never mentions a register that exists on only one architecture. The same source compiles unchanged for amd64 and arm64, with no conditional register names.
Backend freedom. The register allocator can use the full register file without being constrained by the author’s register choices. This matters for arm64 (31 GPRs) vs amd64 (16 GPRs) — handwritten register assignments optimized for one target are suboptimal on the other.
Simpler validation. SSA form with typed virtual values can be validated with a single pass. Physical register liveness requires dataflow analysis.

The tradeoff is that NCA authors cannot force a specific register assignment for performance tuning. In practice, the cases where this matters (specific ABI entry points, interrupt handlers) are exactly the cases that should use raw asm().

19.4 Why SSA with Block Parameters Instead of Phi Nodes?

Block parameters (also called block arguments) and phi nodes are semantically equivalent — they both represent value merging at control flow join points. Block parameters are chosen because:

Explicit data flow. Values flow through block parameters at jump/branch sites, making it immediately visible what data each block receives.
No ordering ambiguity. Phi nodes in LLVM IR must appear at the top of a block and reference predecessor blocks, creating subtle ordering dependencies. Block parameters are positional and unambiguous.
Easier hand-authoring. For human-written NCA, block parameters read naturally as function-like arguments to a label, while phi nodes are notoriously confusing to write and read by hand.
Simpler parser. The parser does not need a special phi-node construct; block parameters are part of the label syntax.

This is the same design used by MLIR, Cranelift (CLIF), and Swift’s SIL.
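
The equivalence claim can be made concrete with a short Python sketch that rewrites block-parameter form into phi form. The block signatures and jump sites are copied from the FNV-1a example earlier in this document; the data layout and the name to_phi_form are invented for illustration.

```python
# Block signatures from the FNV-1a example.
BLOCK_PARAMS = {
    "entry": [],
    "loop": ["%i", "%hash"],
    "body": ["%i", "%hash"],
    "done": ["%result"],
}

# Every jmp/br site: (source block, target block, arguments passed).
EDGES = [
    ("entry", "loop", ["%zero", "%hash0"]),
    ("loop", "done", ["%hash"]),
    ("loop", "body", ["%i", "%hash"]),
    ("body", "loop", ["%next", "%hashed"]),
]


def to_phi_form(block_params, edges):
    """Turn each block parameter into a phi: a list of (predecessor, value) pairs."""
    phis = {block: {p: [] for p in params} for block, params in block_params.items()}
    for src, dst, args in edges:
        params = block_params[dst]
        if len(args) != len(params):
            raise ValueError(f"{src} -> {dst}: expected {len(params)} args, got {len(args)}")
        for param, arg in zip(params, args):
            phis[dst][param].append((src, arg))
    return phis


for block, params in to_phi_form(BLOCK_PARAMS, EDGES).items():
    for param, incoming in params.items():
        sources = ", ".join(f"[{src}: {val}]" for src, val in incoming)
        print(f"{block}: {param} = phi {sources}")
# first line printed: loop: %i = phi [entry: %zero], [body: %next]
```

The arity check in the loop is the block-parameter counterpart of verifying that a phi node has exactly one incoming value per predecessor.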

20. Open Questions

Should nc register mappings match c on all current targets? This simplifies the initial implementation at the cost of future flexibility. Go shipped with a single simple ABI (ABI0) for roughly a decade before adding the register-based ABIInternal.
Is there demand for inline NCA within .nc files? The file-based model is cleaner, but small escape hatches within NC functions could reduce friction. Deferred to a future phase.
Should NCA support variadic c calls in v1? Currently excluded. If the runtime needs printf-like C calls, a small C shim is the workaround. Revisit if this becomes a pain point.
TLS support timeline. Thread-local storage (tls section class) is declared in the grammar but requires per-target TLS model support in the backend. Priority depends on NC’s concurrency story.
Should ncc support emitting relocatable .o files in addition to full executables? This would enable linking with C object files and static libraries. Currently only full executable output is specified.
Should NCA define a stable binary serialization format? The textual format is the canonical representation, but a compact binary encoding would speed up incremental compilation for large projects. This could be as simple as a 1:1 binary encoding of the AST with a magic number and version header.
What is the error recovery strategy for the NCA parser? For compiler-generated NCA, errors indicate compiler bugs and should abort. For handwritten .nca, the parser should attempt to recover and report multiple diagnostics per file. The question is how much effort to invest in recovery quality for v1.
Should when predicates support or and not in addition to and? Currently only and is supported, which covers the common cases (arch + feature). Disjunction (or) and negation (not) add expressiveness but complicate predicate specificity ranking.
Should NCA support volatile loads and stores? Memory-mapped I/O requires loads and stores that are never optimized away or reordered. The current spec does not distinguish volatile from non-volatile access. Adding load.volatile.<type> and store.volatile.<type> would be straightforward.
If NC eventually needs explicit shared-library exports, should that live only in the build/package layer or also be mirrored in .nca source? This proposal currently keeps export policy out of the core NCA syntax.

21. Appendix D: Undefined Behavior Summary

NCA aims to minimize undefined behavior, but some is unavoidable in a low-level language. This is the complete list of operations that produce undefined behavior in NCA:

Category	Operation
Division	`udiv`, `sdiv`, `urem`, `srem` with a zero divisor.
Division	`sdiv` of `INT_MIN / -1` (signed overflow).
Alignment	`load` or `store` (non-`unaligned` variants) at a misaligned address.
Bool invariant	A `bool` value containing any bit pattern other than 0 or 1.
Provenance	Dereferencing an `addr` with no valid provenance (freed memory, fabricated address pointing to unmapped pages).
Stack lifetime	Using an `addr.of.stack` result after the owning function has returned.
Data races	Concurrent non-atomic access to the same location where at least one is a write.
Unreachable	Executing the `unreachable` instruction.
Uninitialized memory	Reading from a stack slot that has not been written to (release builds only; debug builds zero-fill). Per section 1.1, the value read is unspecified but not poisoned.
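
The two division rows translate directly into a precondition check. The Python sketch below is a hypothetical helper (division_is_defined is not an NCA or ncc name) of the kind a test harness or frontend could use before emitting a division. It checks only the preconditions and deliberately does not compute a quotient.

```python
DIVISION_OPS = ("udiv", "sdiv", "urem", "srem")


def int_min(bits: int) -> int:
    """Smallest signed value of the given width, e.g. -2**63 for 64 bits."""
    return -(1 << (bits - 1))


def division_is_defined(op: str, lhs: int, rhs: int, bits: int = 64) -> bool:
    """True if the operands satisfy the division preconditions in the table above."""
    if op not in DIVISION_OPS:
        raise ValueError(f"not a division op: {op}")
    if rhs == 0:
        return False  # zero divisor: undefined for all four ops
    if op == "sdiv" and lhs == int_min(bits) and rhs == -1:
        return False  # INT_MIN / -1 overflows the signed range
    return True


print(division_is_defined("udiv", 10, 0))            # False
print(division_is_defined("sdiv", int_min(64), -1))  # False
print(division_is_defined("sdiv", int_min(64), 1))   # True
```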

Explicitly NOT undefined behavior:

Integer arithmetic overflow (wraps modulo 2^N).
Shift counts greater than or equal to the bit width (the count is masked with width - 1, i.e. taken modulo the bit width).
Floating-point division by zero (produces infinity per IEEE 754).
Floating-point overflow (produces infinity per IEEE 754).
NaN propagation (follows IEEE 754 rules).
Null address in addr (valid value; dereferencing it traps via the OS page fault handler, which the runtime converts to a panic).
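
The first two entries can be modeled in Python, whose integers are unbounded and therefore need explicit masking. This sketch assumes power-of-two widths and reads the shift rule as a bitwise AND of the count with width - 1; the helper names are made up for illustration.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce a Python int modulo 2^bits, as NCA integer results are."""
    return value & ((1 << bits) - 1)


def add_wrap(a: int, b: int, bits: int = 64) -> int:
    return wrap(a + b, bits)


def mul_wrap(a: int, b: int, bits: int = 64) -> int:
    return wrap(a * b, bits)


def shl(a: int, count: int, bits: int = 64) -> int:
    """Left shift with the count masked to the low log2(bits) bits."""
    return wrap(a << (count & (bits - 1)), bits)


def lshr(a: int, count: int, bits: int = 64) -> int:
    """Logical right shift with the same count masking."""
    return wrap(a, bits) >> (count & (bits - 1))


print(add_wrap(0xFFFFFFFFFFFFFFFF, 1))  # 0: overflow wraps instead of being UB
print(shl(1, 64))                       # 1: count 64 is masked to 0
print(shl(1, 65))                       # 2: count 65 is masked to 1
```

Under this reading, a shift by exactly the bit width leaves the value unchanged, which is worth knowing when porting code that expects such a shift to produce zero.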

This list is intentionally smaller than C’s or LLVM IR’s UB surface. The goal is that a correct NCA program can be reasoned about locally: if the inputs to an instruction are well-defined and the preconditions in the table above are met, the output is deterministic.

22. Appendix E: Comparison with Alternatives

Property	NCA	Go ASM	LLVM IR	QBE IL	WASM
Physical registers in source	No	Yes (pseudo-mapped)	No	No	No
Target-independent source	Yes	Partially (per-arch files)	Yes	Yes	Yes
SSA form	Block parameters	No (imperative)	Phi nodes	Phi nodes	No (stack machine, structured control flow)
Aggregate types as values	No (memory only)	N/A	Yes	No	No
Calling convention control	`nc`, `c`	ABI0 / ABIInternal	Many (`ccc`, `fastcc`, etc.)	N/A	Single convention
Target predicates	`when` blocks	Build tags + file naming	Target triples	N/A	Feature detection
Atomics	C11 model	Via runtime	C11 model	None	Sequentially consistent only (threads proposal)
Direct binary emission	Yes (planned)	Yes (via Go toolchain)	Via backends	Via system assembler	Via engine
Standalone language	Yes (`.nca` files)	Yes (`.s` files)	Yes (`.ll` files)	Yes (`.ssa` files)	Yes (`.wat` files)
Implementation complexity	Medium (~30-50K LOC est.)	Part of Go toolchain	Very high (~30M LOC)	Low (~12K LOC)	Varies by engine
undef/poison values	No	N/A	Yes	No	No