WIP AI Draft nclang.org

NCA: NC Abstract Low-level Language

NCA is a typed, target-independent low-level language that sits between NC source and machine code.

Opened March 23, 2026. Initial AI-authored draft with light or no manual pass yet. Updated March 23, 2026.

NCA is a typed, target-independent low-level language that sits between NC source and machine code. It replaces raw asm() as the primary codegen target inside ncc and can optionally be authored by hand for performance-critical paths.

1. Overview

NCA serves three roles:

  1. Compiler IR: ncc lowers NC source through MIR into NCA, then lowers NCA to machine code per target.
  2. Handwritten low-level code: developers write .nca files for hot loops, crypto, memory routines, and runtime internals. One source file produces correct code for every supported target.
  3. Debugging format: ncc -emit-nca dumps the textual NCA representation at any stage for inspection.

1.1 Design Principles

1.2 Non-Goals

NCA does not attempt to cover:

2. Source Model

2.1 File Extension and Compilation

NCA source files use the .nca extension. They live alongside normal .nc files and participate in the same NC source layout. The file path, not an in-file declaration, determines where the declarations belong.

For example, std/os.nca maps to the NC path std/os. If std/os.nc also exists, both files are compiled together as part of the same source set.

2.2 Relationship to NC Modules

NCA files participate in NC’s existing module/file organization:

There is no separate unit declaration and no NCA-specific module layer on top of NC’s existing source tree.

2.3 Symbol Visibility

| Modifier | Within compilation | Use case |
| --- | --- | --- |
| (none) | File-local only | Internal helpers, private data. |
| pub | Visible by NC’s normal symbol rules | Cross-file functions and data. |
| extern | Declaration only | C library imports, runtime imports, linked code. |

Dynamic export policy is intentionally not a per-function NCA attribute in v1. If NC later needs explicit shared-library exports, that should be defined at the higher-level build/package layer rather than as a separate NCA-only export keyword.

2.4 Relationship to NC’s asm() Builtin

NC’s existing asm() builtin remains as an unstructured, target-specific escape hatch. It emits raw textual assembly for a single target and bypasses all NCA type checking, register allocation, and portability guarantees.

NCA replaces asm() for the vast majority of low-level use cases. The intended migration path is:

A future phase may introduce inline NCA blocks within .nc files (see Open Questions), but .nca files are the primary interface.

2.5 Symbol Mangling

Internal symbols use the scheme:

N$<module_path>$<identifier>$<signature_hash>

Here <module_path> is derived from the relative source path (for example, std/os.nca -> std/os), and <signature_hash> is a stable hash of the canonical parameter and return types. Symbols declared with the c calling convention use their bare identifier with no mangling.
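The mangling scheme above can be modeled in a few lines of Python. This is only a sketch: the draft does not say which hash function or canonical type spelling is used, so the SHA-256 truncation and the `params->returns` string below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def module_path(source_path: str) -> str:
    """Derive the module path from a relative source path (std/os.nca -> std/os)."""
    return str(PurePosixPath(source_path).with_suffix(""))


def signature_hash(params: list[str], returns: list[str]) -> str:
    # ASSUMPTION: the draft only requires "a stable hash of the canonical
    # parameter and return types". SHA-256 truncated to 16 hex digits and the
    # "a,b->c" canonical spelling are illustrative choices, not the real scheme.
    canonical = ",".join(params) + "->" + ",".join(returns)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def mangle(source_path: str, identifier: str, params: list[str],
           returns: list[str], callconv: str = "nc") -> str:
    """Build N$<module_path>$<identifier>$<signature_hash>, or the bare name for c."""
    if callconv == "c":
        return identifier  # c symbols are not mangled
    return "N${}${}${}".format(
        module_path(source_path), identifier, signature_hash(params, returns)
    )
```

Because the module path drops the extension, a definition in std/os.nc and one in std/os.nca with the same signature mangle to the same name under this model, which is consistent with both files contributing to one source set.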

3. Type System

NCA has a small, fixed set of types. All runtime values are scalar.

3.1 Integer Types

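| Type | Width (bits) | Signedness |
| --- | --- | --- |
| i8 | 8 | signed |
| u8 | 8 | unsigned |
| i16 | 16 | signed |
| u16 | 16 | unsigned |
| i32 | 32 | signed |
| u32 | 32 | unsigned |
| i64 | 64 | signed |
| u64 | 64 | unsigned |
| iptr | 64* | signed |
| uptr | 64* | unsigned |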

*iptr and uptr match the target pointer width. On all currently supported targets this is 64 bits.

3.2 Floating-Point Types

| Type | Width (bits) | Semantics |
| --- | --- | --- |
| f32 | 32 | IEEE 754 binary32 |
| f64 | 64 | IEEE 754 binary64 |

3.3 Other Types

| Type | Width (bits) | Description |
| --- | --- | --- |
| bool | 8 | 0 or 1. Matches NC’s bool representation. |
| addr | 64* | Opaque raw memory address. |

addr is not interchangeable with integer types. Conversion requires explicit uptr.to.addr / addr.to.uptr operations.

3.4 Relationship to NC Types

NCA’s type set is intentionally wider than NC’s surface type set. NC exposes a small high-level type system (int, uint, float, bool, byte, char, str, arrays, tuples, optionals, errors, maps, and user-defined structs/enums), while NCA adds fixed-width integers (i8 through u64), f32, addr, iptr, and uptr because low-level code, memory layout, and C interop require them.

The compiler maps NC’s primitive numeric and boolean types to NCA scalars during MIR -> NCA lowering:

| NC type | NCA lowering |
| --- | --- |
| int | i64 |
| uint | u64 |
| float | f64 |
| bool | bool |
| byte | u8 |

Higher-level NC types such as char, str, arrays, tuples, optionals, maps, and errors lower structurally rather than as simple aliases; section 10 defines the interop rules.

NCA does not define aliases for NC type names. Handwritten .nca code always uses the NCA spelling (i64, not int).

3.5 No Aggregate Values

Structs, arrays, strings, and other compound types do not exist as NCA values. They live in memory and are accessed through typed loads and stores at explicit offsets. The compiler provides size_of, align_of, and offset_of as compile-time layout queries (see section 6).

3.6 Data Layout Algorithm

NC uses a single, target-independent layout algorithm for all struct and array types. This ensures that size_of, align_of, and offset_of produce identical results on every supported target.

Struct layout rules:

  1. Fields are laid out in declaration order.
  2. Each field is aligned to its natural alignment (the alignment of its type).
  3. Padding bytes are inserted before a field if the current offset is not a multiple of the field’s alignment.
  4. The struct’s overall alignment is the maximum alignment of any field.
  5. The struct’s total size is rounded up to a multiple of its overall alignment (trailing padding).

Natural alignment of scalar types:

| Type | Alignment (bytes) |
| --- | --- |
| i8, u8, bool | 1 |
| i16, u16 | 2 |
| i32, u32, f32 | 4 |
| i64, u64, f64, addr, iptr, uptr | 8 |

Array layout: Elements are tightly packed at the element’s natural alignment. size_of(T[N]) = size_of(T) * N, rounded up to align_of(T).

This layout matches the platform C layout on all current targets (the System V AMD64 ABI on amd64, AAPCS64 on arm64), which means NC structs passed by pointer to functions using the c calling convention have a compatible memory layout without conversion. If a future target requires different C layout rules, the compiler will insert conversion code at c boundaries rather than changing the canonical NC layout.
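The struct and array rules in this section can be checked with a small Python model. It is a sketch that covers scalar fields only; the function names and the example field names are hypothetical, and scalar sizes are taken as the bit widths from section 3 divided by eight.

```python
# name -> (size in bytes, alignment in bytes), from the tables in sections 3.1-3.3 and 3.6
SCALARS = {
    "i8": (1, 1), "u8": (1, 1), "bool": (1, 1),
    "i16": (2, 2), "u16": (2, 2),
    "i32": (4, 4), "u32": (4, 4), "f32": (4, 4),
    "i64": (8, 8), "u64": (8, 8), "f64": (8, 8),
    "addr": (8, 8), "iptr": (8, 8), "uptr": (8, 8),
}


def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def layout_struct(fields: list[tuple[str, str]]) -> tuple[int, int, dict[str, int]]:
    """Return (size_of, align_of, {field: offset_of}) for scalar fields in declaration order."""
    offset = 0
    struct_align = 1
    offsets = {}
    for name, type_name in fields:
        size, align = SCALARS[type_name]
        offset = round_up(offset, align)        # rule 3: padding before the field
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)  # rule 4: maximum field alignment
    return round_up(offset, struct_align), struct_align, offsets  # rule 5: trailing padding


def size_of_array(elem_size: int, elem_align: int, count: int) -> int:
    """size_of(T[N]) = size_of(T) * N, rounded up to align_of(T)."""
    return round_up(elem_size * count, elem_align)
```

For a hypothetical struct with fields tag: u8, len: u32, ptr: addr, flag: bool, this model places the fields at offsets 0, 4, 8, and 16 and reports a size of 24 bytes with 8-byte alignment.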

4. Syntax

NCA uses a line-oriented syntax with one instruction per line. Comments use //. The grammar is designed to be unambiguous without a separate lexer mode.

4.1 File Structure

nc 1
<declarations...>

The nc <version> header is mandatory and must be the first non-blank, non-comment line. It declares which NC language version the .nca file targets. The parser rejects files with an unsupported version.
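As an illustration of the header rule, the following Python sketch scans for the first non-blank, non-comment line and validates it. The set of supported versions and the assumption that a version is a plain decimal integer are hypothetical; the draft only shows `nc 1`.

```python
SUPPORTED_VERSIONS = {1}  # ASSUMPTION: only version 1 exists in this sketch


def parse_header(source: str) -> int:
    """Return the declared NC version, or raise ValueError if the header is missing or bad."""
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.split("//", 1)[0].strip()  # drop trailing // comments
        if not line:
            continue  # blank and comment-only lines may precede the header
        parts = line.split()
        if len(parts) != 2 or parts[0] != "nc" or not parts[1].isdigit():
            raise ValueError(f"line {lineno}: expected 'nc <version>' header")
        version = int(parts[1])
        if version not in SUPPORTED_VERSIONS:
            raise ValueError(f"line {lineno}: unsupported NC version {version}")
        return version
    raise ValueError("missing 'nc <version>' header")
```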

A file contains zero or more top-level declarations: fn, extern fn, extern data, data, and when blocks.

4.1.1 Literal Syntax

Integer literals support decimal, hexadecimal, octal, and binary notation. Underscores may appear between digits for readability and are ignored by the parser.

42 // decimal
0xFF_00_AA // hexadecimal
0o777 // octal
0b1010_0011 // binary
-1 // negative (unary minus is part of the literal for constants)
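A minimal Python sketch of how these integer literal forms could be parsed is shown below. It assumes lowercase prefixes only and treats a leading, trailing, or doubled underscore as an error, since the text says underscores appear between digits; the draft does not state how such edge cases are diagnosed.

```python
PREFIXES = {"0x": 16, "0o": 8, "0b": 2}
DIGITS = "0123456789abcdef"


def parse_int_literal(text: str) -> int:
    """Parse a decimal, hex, octal, or binary literal with optional '-' and '_' separators."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    base = 10
    if body[:2] in PREFIXES:  # ASSUMPTION: prefixes are lowercase only
        base = PREFIXES[body[:2]]
        body = body[2:]
    groups = body.split("_")
    if any(group == "" for group in groups):
        # Covers "", "_1", "1_", and "1__0": underscores must sit between digits.
        raise ValueError(f"malformed integer literal: {text!r}")
    valid = set(DIGITS[:base])
    if any(ch.lower() not in valid for group in groups for ch in group):
        raise ValueError(f"invalid digit in integer literal: {text!r}")
    value = int("".join(groups), base)
    return -value if negative else value
```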

Float literals use standard decimal notation with an optional exponent:

3.14
1.0e-6
0.0

String literals use double quotes with standard C-style escapes (\n, \t, \\, \0, \xHH):

b"hello\nworld" // raw bytes, u8[11]
c"hello\0embedded" // null-terminated, includes the explicit \0 plus trailing \0

4.1.2 Declaration Order

Within a .nca file, declaration order does not matter. A function may call another function defined later in the same file. A data declaration may reference a symbol defined below it. The compiler processes all top-level declarations in a file before validating function bodies.

Within a function body, standard SSA dominance rules apply: a value must be defined before it is used, and must dominate every use. This is a structural property of the block-structured SSA form, not a source-ordering rule.

Stack slot declarations must appear before any blocks in the function, but their order relative to each other does not matter.

4.1.3 Line Continuation

NCA is line-oriented: one instruction per line. However, certain constructs span multiple lines.

Bracket continuation. Inside [] (data initializers, switch arms) and () (function parameters, block parameters, call arguments), newlines are treated as whitespace. This allows multi-line data initializers and long parameter lists without an explicit continuation character.

pub data crc32_table : u32[256] rodata align(64) = [
0x00000000, 0x77073096, 0xEE0E612C, 0x1DB71064,
0xE3630B12, 0x94643B84, 0x0D6D6A3E, 0x7A6A5AA8,
// ... remaining entries
]
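Bracket continuation can be modeled as a pre-pass that joins physical lines while a `[` or `(` is still open. The Python sketch below is a simplification under stated assumptions: it strips `//` comments naively and does not account for brackets or `//` inside string literals, which a real lexer would have to handle.

```python
def logical_lines(source: str) -> list[str]:
    """Join physical lines into logical lines, treating newlines inside [] and () as spaces."""
    lines = []
    pending = []
    depth = 0
    for raw in source.splitlines():
        code = raw.split("//", 1)[0].strip()  # naive comment stripping (see caveat above)
        if not code:
            continue
        pending.append(code)
        for ch in code:
            if ch in "[(":
                depth += 1
            elif ch in "])":
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced closing bracket")
        if depth == 0:  # all brackets closed: the logical line is complete
            lines.append(" ".join(pending))
            pending = []
    if depth != 0:
        raise ValueError("unclosed bracket at end of file")
    return lines
```

Curly braces are deliberately not counted here, because function bodies remain one instruction per line.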

No backslash continuation. There is no \ line continuation. If a single instruction is too long, refactor it into multiple instructions using intermediate values. This keeps parsing unambiguous and the validator simple.

4.2 Function Definitions

[pub] fn <name>(<params>) [-> <return_types>], <callconv>[, <attributes>...] {
[stack <slot_name> : <type>[<count>], align(<n>)]...
<label>[(<block_params>)]:
<instruction>
...
<terminator>
...
}

Functions are SSA-form with block parameters (no phi nodes). Every function explicitly spells its calling convention. Every block ends with exactly one terminator. The entry block is the first block in the function body.

Example: portable memcpy

pub fn memcpy(dst: addr, src: addr, n: uptr) -> addr, nc {
entry:
%zero = const.uptr 0
jmp loop(%zero)
loop(%i: uptr):
%done = cmp.ge.uptr %i, n
br %done, exit, body(%i)
body(%j: uptr):
%src_p = addr.add src, %j
%x = load.u8 %src_p
%dst_p = addr.add dst, %j
store.u8 %dst_p, %x
%next = add.uptr %j, 1
jmp loop(%next)
exit:
ret dst
}

4.3 Extern Declarations

extern fn puts(s: addr) -> i32, c
extern fn malloc(size: uptr) -> addr, c
extern data errno : i32

Extern functions and data are resolved at link time. The target profile determines which shared library provides each symbol (see section 12).

4.4 Data Declarations

[pub] data <name> : <type>[<count>?] [<section_class>] [align(<n>)] = <initializer>

Section classes: rodata, data, bss, tls.

pub data crc32_table : u32[256] rodata align(64) = [
0x00000000, 0x77073096, 0xEE0E612C, ...
]
data tls_seed : u64 tls = 0
data zero_page : u8[4096] bss align(4096)

String literals produce byte arrays:

Address-valued initializers support relocations:

data vtable : addr[3] rodata = [
addr.of some_fn,
addr.of other_fn,
addr.of third_fn
]

Partial initialization. NCA does not support partial initialization. A data declaration must initialize all elements or none:

If you need a table where most entries are zero but a few are nonzero, define it as fully initialized with all values spelled out, or initialize it at runtime by writing to a BSS-allocated buffer.

The element count in the type must match the initializer length exactly. The compiler rejects mismatches:

data table : u32[4] rodata = [1, 2, 3] // error: expected 4 elements, got 3
data table : u32[4] rodata = [1, 2, 3, 4, 5] // error: expected 4 elements, got 5

Inferred count. If the count is omitted, it is inferred from the initializer:

data table : u32[] rodata = [1, 2, 3, 4] // count inferred as 4
data hello : u8[] rodata = c"Hello, world!" // count inferred from string + null

4.5 Target Predicates

Top-level when blocks conditionally include declarations based on target properties.

when arch.amd64 {
pub fn fast_crc32(data: addr, len: uptr) -> u32, nc {
// amd64-optimized implementation
...
}
}
when arch.arm64 {
pub fn fast_crc32(data: addr, len: uptr) -> u32, nc {
// arm64-optimized implementation
...
}
}
// Generic fallback (no `when` clause)
pub fn fast_crc32(data: addr, len: uptr) -> u32, nc {
// portable scalar implementation
...
}

The compiler selects the most specific matching definition. A definition without a when clause matches all targets and serves as the fallback. Section 4.6 defines the exact resolution algorithm, including ambiguity handling.

Available predicate atoms:

| Predicate atom | Meaning |
| --- | --- |
| arch.amd64 | Target architecture is amd64. |
| arch.arm64 | Target architecture is arm64. |
| os.linux | Target OS is Linux. |
| os.darwin | Target OS is Darwin. |
| endian.little | Target endianness is little-endian. |
| endian.big | Target endianness is big-endian. |
| feature.aes | Target exposes AES support. |
| feature.crc32 | Target exposes CRC32 support. |
| feature.popcnt | Target exposes POPCNT support. |
| ptr_bits.64 | Target pointer width is 64 bits. |

Predicates compose with and:

when arch.amd64 and feature.aes {
...
}

4.6 When-Predicate Resolution

When multiple definitions of the same symbol exist, the compiler must select exactly one definition for the active target.

Definitions

A candidate set is all definitions of a given symbol (same name, same signature) within one module path’s merged source set: the .nca file plus any sibling .nc file with the same relative stem. A candidate is either bare (no when clause) or guarded (has a when clause).

The specificity of a candidate is the number of predicate atoms in its when clause. A bare candidate has specificity 0.

A candidate matches a target if every predicate atom in its when clause is true for that target. A bare candidate matches all targets.

Algorithm

For a given symbol and target:

  1. Collect all candidates that match the target.
  2. If the set is empty, emit a compile error: "no definition of <symbol> matches target <profile>".
  3. Find the maximum specificity among matching candidates.
  4. If exactly one candidate has that maximum specificity, select it.
  5. If multiple candidates share the maximum specificity, emit a compile error: "ambiguous definitions of <symbol> for target <profile>", listing the conflicting candidates with their source locations and when clauses.

Example

Given these definitions and target linux-amd64 with feature.aes:

// (A) specificity 0 -- matches everything
pub fn encrypt(src: addr, dst: addr, len: uptr), nc { ... }
// (B) specificity 1 -- matches any amd64
when arch.amd64 {
pub fn encrypt(src: addr, dst: addr, len: uptr), nc { ... }
}
// (C) specificity 2 -- matches amd64 with AES
when arch.amd64 and feature.aes {
pub fn encrypt(src: addr, dst: addr, len: uptr), nc { ... }
}

All three match. Maximum specificity is 2 (candidate C). C is selected.

On target linux-arm64, only A matches. A is selected.

On target linux-amd64 without feature.aes, A and B match. Maximum specificity is 1 (candidate B). B is selected.

Ambiguity Example

when arch.amd64 {
pub fn hash(data: addr, len: uptr) -> u64, nc { ... }
}
when os.linux {
pub fn hash(data: addr, len: uptr) -> u64, nc { ... }
}

On linux-amd64, both match with specificity 1. This is an ambiguity error. The fix is to combine them (when arch.amd64 and os.linux), remove one, or add a bare fallback and keep only one guarded variant at specificity 1.
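The resolution algorithm and both examples in this section can be reproduced with a short Python model. It is a sketch, not the compiler's implementation: the `Candidate` class, the `resolve` function, and the representation of a target as a set of true predicate atoms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    atoms: frozenset = frozenset()  # empty set = bare candidate (specificity 0)


class ResolutionError(Exception):
    pass


def resolve(symbol: str, candidates: list, target_atoms: frozenset, profile: str) -> Candidate:
    """Select the single most specific candidate whose atoms are all true for the target."""
    # Step 1: a candidate matches if every atom in its when clause holds.
    matching = [c for c in candidates if c.atoms <= target_atoms]
    # Step 2: no match at all is an error.
    if not matching:
        raise ResolutionError(f"no definition of {symbol} matches target {profile}")
    # Steps 3-5: highest specificity wins; ties are ambiguous.
    best = max(len(c.atoms) for c in matching)
    winners = [c for c in matching if len(c.atoms) == best]
    if len(winners) > 1:
        raise ResolutionError(f"ambiguous definitions of {symbol} for target {profile}")
    return winners[0]
```

With candidates A (bare), B (arch.amd64), and C (arch.amd64 and feature.aes), this model selects C on linux-amd64 with AES, B on linux-amd64 without AES, and A on linux-arm64. The two `hash` definitions guarded by arch.amd64 and os.linux raise the ambiguity error on linux-amd64.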

Signature Matching

Two candidates are considered definitions of the same symbol if and only if:

  1. They have the same identifier name.
  2. They have the same parameter types in the same order.
  3. They have the same return types in the same order.
  4. They have the same calling convention.

If two candidates share a name but differ in any of the above, they are distinct overloads and follow NC’s normal overloading rules. Handwritten .nca should not declare multiple overloads of the same name; in practice this case mainly arises from NC source or compiler-generated NCA. when resolution only operates within one overload set at a time.

Cross-File Interaction

Predicate resolution is per module path. Two .nca files at different source paths do not participate in the same candidate set because their mangled symbols differ by module path.

The intent is that all when variants of a symbol live in the same .nca file, or in the .nc / .nca pair for the same relative stem. The linker never sees when; it only sees the single winning definition emitted for that module path.

If two separate translation units still emit the same final symbol name (for example via c ABI names, or through some future explicit symbol override mechanism), that is a duplicate-definition link error, not predicate resolution.

Extern Declarations Inside when Blocks

extern declarations may appear inside when blocks. This is useful when a C library function exists on one platform but not another:

when os.linux {
extern fn epoll_create1(flags: i32) -> i32, c
}
when os.darwin {
extern fn kqueue() -> i32, c
}

A when-gated extern declaration is only visible to code inside the same when block or to code whose own when predicate implies the extern’s predicate. Calling a when-gated extern from an ungated function is a compile error:

pub fn make_poller() -> i32, nc {
entry:
%fd = call epoll_create1(0) // error: epoll_create1 only exists when os.linux
ret %fd
}

The fix is to gate the calling function too, or provide platform-specific implementations behind when blocks with a common fallback.
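The visibility rule for when-gated externs can be sketched in Python as a subset test. This assumes predicates are pure conjunctions of atoms, so a caller's predicate implies the extern's predicate when it contains every atom the extern requires; any semantic implication between different atoms (for example, whether one atom entails another) is not modeled. The function names are hypothetical.

```python
def implies(caller_atoms: frozenset, extern_atoms: frozenset) -> bool:
    """A conjunction implies another if it includes all of the other's atoms."""
    return extern_atoms <= caller_atoms


def check_call(caller: str, caller_atoms: frozenset,
               callee: str, extern_atoms: frozenset) -> None:
    """Raise if an extern gated by extern_atoms is called from a less constrained function."""
    if not implies(caller_atoms, extern_atoms):
        required = " and ".join(sorted(extern_atoms))
        raise ValueError(f"{caller}: {callee} only exists when {required}")
```

An ungated caller has an empty atom set, so it can only call externs that are themselves ungated.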

Interaction with Bare NC Definitions

If crypto.nc defines a function encrypt and crypto.nca also defines encrypt, these are two definitions of the same symbol in the same module path. The compiler merges them into the same candidate set and applies the normal resolution rules:

This allows a pattern where .nc provides a readable pure-NC fallback and .nca provides an optimized version for specific targets, all under the same public API.

Missing Definitions

If no candidate matches the current target and no bare fallback exists, the compiler emits a target-specific error:

error: no definition of `fast_crc32` matches target `darwin-arm64`
note: candidates exist for: arch.amd64, arch.amd64 and feature.crc32
note: add a bare fallback definition, or add a `when arch.arm64` variant

A symbol with when-gated definitions but no bare fallback is valid only if at least one candidate matches the selected build target.

The compiler checks coverage only for the active target being built. It does not try to prove that every possible future target would be covered.

5. Function Qualifiers

Function signatures always spell their calling convention explicitly after the parameter list and return type.

| Qualifier | Meaning |
| --- | --- |
| nc | NC stable calling convention. |
| c | Platform C calling convention. |
| frameptr | Always emit a frame pointer (default in debug builds). |

There is no default calling convention in NCA v1.

The compiler infers whether a function is leaf and whether it can return normally from the actual CFG, so there are no leaf or noreturn attributes.

NCA also does not define a standalone export attribute. Any eventual shared-library export mechanism should come from NC’s higher-level packaging/build rules.

5.1 Calling Conventions

nc (NC stable ABI): The documented, stable ABI for handwritten NCA and normal NC-generated calls. Parameter and return value placement is defined per target but guaranteed stable across compiler versions.

c (Platform C ABI): The target’s native C calling convention (System V AMD64 ABI on Linux/Darwin amd64, AAPCS64 on arm64). Used for FFI with C libraries. Functions with c use their bare identifier (no mangling).

6. Compile-Time Layout Queries

NCA provides three compile-time operators for accessing NC type layout information. These resolve to integer constants during compilation.

| Operator | Returns |
| --- | --- |
| size_of(T) | Size of type T in bytes. |
| align_of(T) | Alignment of type T in bytes. |
| offset_of(T.field) | Byte offset of field within type T. |

These reference NC type metadata emitted by the frontend. They replace the need for generated headers (compare Go’s go_asm.h).

pub fn reader_pos(r: addr) -> uptr, nc {
entry:
%off = const.uptr offset_of(Reader.pos)
%p = addr.add r, %off
%val = load.uptr %p
ret %val
}

6.1 Constant Expressions

Certain positions in NCA require compile-time-known values: data initializers, const.<type> operands, stack slot sizes, and alignment attributes. NCA supports a restricted set of compile-time arithmetic in these positions.

Allowed in constant expressions:

Not allowed in constant expressions:

Examples:

// Valid: offset arithmetic for nested struct access
%p = addr.add %base, offset_of(Outer.inner) + offset_of(Inner.field)
// Valid: stack slot sized to struct
stack buf : u8[size_of(MyStruct)], align(align_of(MyStruct))
// Invalid: cannot do arithmetic on addr.of
data bad : u64 rodata = addr.of some_sym + 8 // error: addr.of is not an integer constant

Link-time constants. addr.of <symbol> is a link-time constant, not a compile-time integer constant. It may appear in data initializers, but not in compile-time arithmetic expressions. If you need “address of symbol + byte offset,” compute it at runtime:

%base = addr.of some_struct
%p = addr.add %base, 8

7. Instruction Set

All typed instructions spell their full type suffix explicitly. The validator rejects abbreviated spellings such as cmp.ge.u or conversions that rely on implicit source or destination types.

7.1 Constants

%x = const.<type> <value> // integer or float literal
%p = addr.of <symbol> // address of a function or data symbol
%n = addr.null // null address

7.2 Integer Arithmetic

All integer arithmetic wraps modulo 2^N for the operand width.

%r = add.<type> %a, %b
%r = sub.<type> %a, %b
%r = mul.<type> %a, %b
%r = udiv.<type> %a, %b // unsigned division (traps on zero divisor)
%r = sdiv.<type> %a, %b // signed division (traps on zero divisor or MIN/-1)
%r = urem.<type> %a, %b // unsigned remainder
%r = srem.<type> %a, %b // signed remainder
%r = neg.<type> %a // two's complement negation

Extended arithmetic

%r, %carry = uaddc.<type> %a, %b, %c_in // add with carry
%r, %borrow = usubb.<type> %a, %b, %b_in // subtract with borrow
%hi = umulh.<type> %a, %b // unsigned multiply high half
%hi = smulh.<type> %a, %b // signed multiply high half

Checked arithmetic

%r, %ov = add.ov.<type> %a, %b
%r, %ov = sub.ov.<type> %a, %b
%r, %ov = mul.ov.<type> %a, %b

For checked arithmetic, signedness comes from <type>. For example, add.ov.i64 uses signed overflow rules, while add.ov.u64 uses unsigned carry/overflow rules.
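The wrapping, carry, and overflow behavior described in this section can be modeled with Python's arbitrary-precision integers. This is a reference sketch only; the function names are hypothetical, and the carry and overflow flags are returned as Python values rather than as a specific NCA type, which the draft does not state.

```python
def mask(bits: int) -> int:
    return (1 << bits) - 1


def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a two's complement integer."""
    value &= mask(bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def add_wrap(a: int, b: int, bits: int) -> int:
    """add.<type>: result modulo 2^bits, returned as an unsigned bit pattern."""
    return (a + b) & mask(bits)


def add_ov_unsigned(a: int, b: int, bits: int) -> tuple[int, bool]:
    """add.ov.uN: overflow means the true sum does not fit in N unsigned bits."""
    total = a + b
    return total & mask(bits), total > mask(bits)


def add_ov_signed(a: int, b: int, bits: int) -> tuple[int, bool]:
    """add.ov.iN: a and b are signed values; overflow means the sum leaves the signed range."""
    total = a + b
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return to_signed(total, bits), not (lo <= total <= hi)


def uaddc(a: int, b: int, carry_in: int, bits: int) -> tuple[int, int]:
    """uaddc.<type>: add with carry in, returning (result, carry out)."""
    total = a + b + carry_in
    return total & mask(bits), total >> bits


def umulh(a: int, b: int, bits: int) -> int:
    """umulh.<type>: high half of the full 2N-bit unsigned product."""
    return (a * b) >> bits
```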

7.3 Bitwise Operations

%r = and.<type> %a, %b
%r = or.<type> %a, %b
%r = xor.<type> %a, %b
%r = not.<type> %a
%r = shl.<type> %a, %b // shift left (count is masked with width-1, i.e. taken modulo the bit width)
%r = lshr.<type> %a, %b // logical shift right
%r = ashr.<type> %a, %b // arithmetic shift right
%r = bswap.<type> %a // byte swap
%r = clz.<type> %a // count leading zeros
%r = ctz.<type> %a // count trailing zeros
%r = popcnt.<type> %a // population count
%r = rotl.<type> %a, %b // rotate left
%r = rotr.<type> %a, %b // rotate right
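A Python reference model for several of these operations follows. It is a sketch under assumptions the draft leaves open: the shift count is interpreted as `count & (width - 1)`, the same masking is assumed for lshr and ashr, and rotate counts are assumed to be taken modulo the width. All values are unsigned bit patterns, and the function names are hypothetical.

```python
def mask(bits: int) -> int:
    return (1 << bits) - 1


def shl(a: int, count: int, bits: int) -> int:
    # ASSUMPTION: "masked to width-1" is read as count & (bits - 1).
    return (a << (count & (bits - 1))) & mask(bits)


def lshr(a: int, count: int, bits: int) -> int:
    # ASSUMPTION: same count masking as shl.
    return (a & mask(bits)) >> (count & (bits - 1))


def ashr(a: int, count: int, bits: int) -> int:
    """Arithmetic shift right: the sign bit is replicated into vacated positions."""
    a &= mask(bits)
    signed = a - (1 << bits) if a >> (bits - 1) else a
    return (signed >> (count & (bits - 1))) & mask(bits)


def rotl(a: int, count: int, bits: int) -> int:
    count %= bits  # ASSUMPTION: rotate counts wrap modulo the width
    a &= mask(bits)
    return ((a << count) | (a >> (bits - count))) & mask(bits)


def rotr(a: int, count: int, bits: int) -> int:
    return rotl(a, bits - (count % bits), bits)


def bswap(a: int, bits: int) -> int:
    """Reverse the byte order of a value that is a whole number of bytes wide."""
    return int.from_bytes((a & mask(bits)).to_bytes(bits // 8, "little"), "big")


def popcnt(a: int, bits: int) -> int:
    return bin(a & mask(bits)).count("1")
```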

7.4 Comparison and Selection

Comparisons produce bool.

%r = cmp.eq.<type> %a, %b // equal
%r = cmp.ne.<type> %a, %b // not equal
%r = cmp.lt.<type> %a, %b
%r = cmp.le.<type> %a, %b
%r = cmp.gt.<type> %a, %b
%r = cmp.ge.<type> %a, %b

For integer and pointer-sized integer types, signedness comes from <type>. For example, cmp.lt.i64 is signed, while cmp.lt.u64 is unsigned. For addr, only cmp.eq.addr and cmp.ne.addr are valid.

Floating-point comparisons:

%r = cmp.oeq.<ftype> %a, %b // ordered and equal
%r = cmp.une.<ftype> %a, %b // unordered or not equal
%r = cmp.olt.<ftype> %a, %b // ordered and less than
%r = cmp.ole.<ftype> %a, %b // ordered and less than or equal
%r = cmp.ogt.<ftype> %a, %b // ordered and greater than
%r = cmp.oge.<ftype> %a, %b // ordered and greater than or equal
%r = cmp.ord.<ftype> %a, %b // neither operand is NaN
%r = cmp.uno.<ftype> %a, %b // at least one operand is NaN
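The ordered and unordered predicates differ only in how they treat NaN. The Python sketch below models them, reading `une` as "unordered or not equal" in the way LLVM-style predicate names are usually read; that reading is an assumption about the draft's intent, and the function names are hypothetical.

```python
import math


def cmp_uno(a: float, b: float) -> bool:
    """True if at least one operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def cmp_ord(a: float, b: float) -> bool:
    """True if neither operand is NaN."""
    return not cmp_uno(a, b)


def cmp_oeq(a: float, b: float) -> bool:
    return cmp_ord(a, b) and a == b


def cmp_une(a: float, b: float) -> bool:
    # ASSUMPTION: "une" means unordered OR not equal, so it is the negation of oeq.
    return cmp_uno(a, b) or a != b


def cmp_olt(a: float, b: float) -> bool:
    return cmp_ord(a, b) and a < b


def cmp_oge(a: float, b: float) -> bool:
    return cmp_ord(a, b) and a >= b
```

Under this model, every ordered predicate is false when either operand is NaN, while `cmp_une` and `cmp_uno` are true.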

Selection:

%r = select.<type> %cond, %a, %b // if cond then a else b

7.5 Floating-Point Arithmetic

All floating-point operations follow IEEE 754 with round-to-nearest-even.

%r = fadd.<ftype> %a, %b
%r = fsub.<ftype> %a, %b
%r = fmul.<ftype> %a, %b
%r = fdiv.<ftype> %a, %b
%r = frem.<ftype> %a, %b // IEEE remainder
%r = fneg.<ftype> %a
%r = fabs.<ftype> %a
%r = sqrt.<ftype> %a
%r = copysign.<ftype> %a, %b
%r = fmin.<ftype> %a, %b // IEEE 754-2008 minimum
%r = fmax.<ftype> %a, %b // IEEE 754-2008 maximum

7.6 Conversions

All conversions use a uniform <src>.to.<dst> spelling. The validator checks that the operand type really is <src> and rejects unsupported source/destination pairs.

%r = <src>.to.<dst> %a

Common cases:

%r = u8.to.u64 %a // zero-extend
%r = i8.to.i64 %a // sign-extend
%r = u64.to.u8 %a // truncate
%r = f64.to.i64 %a // float -> signed int (truncate toward zero)
%r = f64.to.u64 %a // float -> unsigned int
%r = i64.to.f64 %a // signed int -> float
%r = u64.to.f64 %a // unsigned int -> float
%r = f32.to.f64 %a // widen float
%r = f64.to.f32 %a // narrow float
%r = addr.to.uptr %a // address -> integer
%r = uptr.to.addr %a // integer -> address
%r = bool.to.u32 %a // bool -> integer (0 or 1)
%r = u32.to.bool %a // integer -> bool (nonzero = true)
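The integer and bool conversions listed above can be modeled in Python as follows. This is a sketch with hypothetical function names; the float-to-integer model covers only in-range finite inputs, because the draft does not say what happens for NaN or out-of-range values.

```python
import math


def mask(bits: int) -> int:
    return (1 << bits) - 1


def zero_extend(pattern: int, src_bits: int) -> int:
    """uN.to.uM with M > N: the upper bits of the result are zero."""
    return pattern & mask(src_bits)


def sign_extend(pattern: int, src_bits: int) -> int:
    """iN.to.iM with M > N: returns the signed value of the N-bit pattern."""
    pattern &= mask(src_bits)
    return pattern - (1 << src_bits) if pattern >> (src_bits - 1) else pattern


def truncate(value: int, dst_bits: int) -> int:
    """uN.to.uM with M < N: keep only the low M bits."""
    return value & mask(dst_bits)


def int_to_bool(value: int) -> int:
    """<int>.to.bool: any nonzero value normalizes to 1."""
    return 1 if value != 0 else 0


def f64_to_i64(x: float) -> int:
    # Truncates toward zero. NaN and out-of-range inputs are NOT modeled:
    # the draft does not define them, and math.trunc raises for NaN/infinity.
    return math.trunc(x)
```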

7.7 Memory Operations

%r = load.<type> %addr // load from address
store.<type> %addr, %value // store to address
%r = load.<type> %addr, order(<ordering>) // atomic load
store.<type> %addr, %value, order(<ordering>) // atomic store

Endian-explicit loads and stores for binary protocols:

%r = load.le.<type> %addr // load little-endian
%r = load.be.<type> %addr // load big-endian
store.le.<type> %addr, %value
store.be.<type> %addr, %value
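The endian-explicit forms fix the byte order regardless of the target. A Python sketch over a flat byte buffer illustrates the difference; the buffer-plus-offset memory model and the function names are hypothetical stand-ins for real addresses.

```python
def load_le(memory: bytes, addr: int, bits: int) -> int:
    """load.le.<type>: least significant byte is at the lowest address."""
    size = bits // 8
    if addr < 0 or addr + size > len(memory):
        raise IndexError("load out of bounds")
    return int.from_bytes(memory[addr:addr + size], "little")


def load_be(memory: bytes, addr: int, bits: int) -> int:
    """load.be.<type>: most significant byte is at the lowest address."""
    size = bits // 8
    if addr < 0 or addr + size > len(memory):
        raise IndexError("load out of bounds")
    return int.from_bytes(memory[addr:addr + size], "big")


def store_le(memory: bytearray, addr: int, value: int, bits: int) -> None:
    size = bits // 8
    if addr < 0 or addr + size > len(memory):
        raise IndexError("store out of bounds")
    memory[addr:addr + size] = (value & ((1 << bits) - 1)).to_bytes(size, "little")


def store_be(memory: bytearray, addr: int, value: int, bits: int) -> None:
    size = bits // 8
    if addr < 0 or addr + size > len(memory):
        raise IndexError("store out of bounds")
    memory[addr:addr + size] = (value & ((1 << bits) - 1)).to_bytes(size, "big")
```

Reading the bytes 12 34 56 78 as a 32-bit value yields 0x78563412 with the little-endian load and 0x12345678 with the big-endian load.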

Bulk memory:

memcpy %dst, %src, %len // non-overlapping copy
memmove %dst, %src, %len // overlapping-safe copy
memset %dst, %val: u8, %len // fill memory

7.8 Address Arithmetic

%r = addr.add %base, %offset // base + offset (offset is iptr or uptr)
%r = addr.sub %a, %b // address difference

addr.add accepts an offset of type iptr (signed) or uptr (unsigned). Signed offsets are useful for negative-direction pointer arithmetic; unsigned offsets are natural for index-based access. The result is always addr.

addr.sub takes two addr operands and produces an iptr result: the signed byte distance between the two addresses. The result type is not spelled in the mnemonic; the validator infers iptr automatically.

7.9 Stack Slots

Stack slots are declared at the top of a function, before any blocks. They allocate addressable memory in the function’s stack frame.

stack buf : u8[4096], align(16)
stack tmp : u64[1]

The address of a stack slot is obtained with:

%p = addr.of.stack buf

7.10 Control Flow

jmp <label>[(<args>)] // unconditional jump
br %cond: bool, <true_label>[(<args>)], <false_label>[(<args>)] // conditional branch
switch %val: <int_type>, default <label> [
<const> -> <label>,
...
]
ret [<values>] // return (zero or more values)
trap // abnormal termination
unreachable // assert unreachable (UB if reached)

7.11 Calls

%r = call <fn_name>(<args>) // direct call
%r = call.indirect %addr: addr (<args>) -> <return_types>, <callconv>
// indirect call (signature required)
tailcall <fn_name>(<args>) // tail call (must be a terminator)

Tail call constraints: tailcall is only valid when the callee’s return type(s) and calling convention match the caller’s. The callee must not be c unless the caller is also c. If the target ABI or stack layout makes a true tail call infeasible (e.g., the callee requires more stack argument space than the caller allocated), the compiler may lower tailcall as a regular call followed by ret instead. A tailcall downgraded in this way emits a diagnostic under -Wtailcall.

switch semantics: The integer value is compared against each constant arm. If a match is found, control transfers to that arm’s label. If no match is found, control goes to the default label. All arm constants must be the same type as the switched value and must be unique. The backend may lower switch to a jump table, binary search, or linear scan depending on density and target heuristics.

7.12 Atomic Operations

%old = atomic.rmw.add.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.sub.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.and.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.or.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.xor.<type> %addr, %val, order(<ordering>)
%old = atomic.rmw.xchg.<type> %addr, %val, order(<ordering>)
%old, %ok = cmpxchg.<type> %addr, %expected, %desired, order(<succ>, <fail>)
fence order(<ordering>)

Orderings: relaxed, acquire, release, acq_rel, seq_cst.

8. Memory Model

8.1 Alignment

All loads and stores require natural alignment by default. A load.u32 requires 4-byte alignment, a load.u64 requires 8-byte alignment, and so on. Misaligned access is undefined behavior.

For situations requiring unaligned access (binary protocol parsing, packed serialization), explicit unaligned variants are provided:

%r = load.unaligned.<type> %addr
store.unaligned.<type> %addr, %value

These may be slower than aligned accesses on some targets, but they are always correct regardless of address alignment.

Stack slots are aligned to at least the natural alignment of their element type, or to the alignment specified by the align() attribute, whichever is greater.

8.2 Non-Atomic Memory Ordering

Non-atomic loads and stores have no ordering guarantees with respect to other threads. The compiler and hardware are free to reorder, merge, or eliminate non-atomic memory operations as long as single-threaded semantics are preserved.

To enforce ordering between non-atomic operations and atomic operations, use fence. To enforce ordering between two atomic operations, use the appropriate memory ordering on the operations themselves.

Concurrent non-atomic access to the same memory location where at least one access is a write is a data race and constitutes undefined behavior, exactly as in C11/C++11.

8.3 Atomic Memory Ordering

Atomic operations use the C11/C++11 memory model. The five orderings are relaxed, acquire, release, acq_rel, and seq_cst, as listed in section 7.12.

Atomic operations are restricted to types u8, u16, u32, u64, i8, i16, i32, i64, uptr, iptr, and addr. The backend guarantees lock-free atomics for all supported widths on all current targets (amd64 and arm64 both provide lock-free atomics up to 64 bits).

8.4 Address Provenance

addr values carry implicit provenance: an address derived from a stack slot is valid only within that function’s lifetime, and an address derived from a data symbol is valid for the program’s lifetime. The compiler does not track provenance formally in NCA v1, but violating provenance (e.g., using a stack address after the function returns) is undefined behavior.

Fabricating addresses from integers via uptr.to.addr produces addresses with no provenance. Such addresses may only be used to access memory that the program has independently established is valid (e.g., memory-mapped I/O regions, addresses returned by mmap via the runtime).

8.5 Bool Invariant

A bool value must always contain exactly 0 or 1. Creating a bool with any other bit pattern (e.g., by loading a u8 and treating it as bool without an explicit integer-to-bool conversion) is undefined behavior. An explicit <int>.to.bool conversion normalizes any nonzero value to 1.

9. ABI Definition

9.1 nc Stable ABI

The nc convention is the stable handwritten-NCA ABI. It is defined per target and guaranteed not to change within a major NC version.

Argument passing: Scalar arguments are passed in a fixed sequence of abstract argument slots. The backend maps these to physical registers or stack positions per target. The mapping is:

| Target | Integer/Pointer slots | Float slots |
| --- | --- | --- |
| linux/amd64 | rdi, rsi, rdx, rcx, r8, r9 | xmm0-xmm7 |
| darwin/amd64 | rdi, rsi, rdx, rcx, r8, r9 | xmm0-xmm7 |
| linux/arm64 | x0-x7 | v0-v7 (d-regs) |
| darwin/arm64 | x0-x7 | v0-v7 (d-regs) |

Arguments exceeding available slots spill to the stack in declaration order, aligned to 8 bytes.

Return values: Up to two scalar return values use the first two integer or float return registers. Beyond two, the caller passes a hidden pointer to a return area.
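To make the slot mapping concrete, the Python sketch below assigns scalar arguments to registers using the table above. Several details are assumptions not stated in the draft: integer and float slots are counted independently, each spilled argument takes one 8-byte stack slot, and stack positions are shown as offsets from an unspecified base. The function name is hypothetical.

```python
AMD64_INT = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
AMD64_FLOAT = [f"xmm{i}" for i in range(8)]
ARM64_INT = [f"x{i}" for i in range(8)]
ARM64_FLOAT = [f"v{i}" for i in range(8)]

SLOTS = {
    "linux/amd64": (AMD64_INT, AMD64_FLOAT),
    "darwin/amd64": (AMD64_INT, AMD64_FLOAT),
    "linux/arm64": (ARM64_INT, ARM64_FLOAT),
    "darwin/arm64": (ARM64_INT, ARM64_FLOAT),
}
FLOAT_TYPES = {"f32", "f64"}


def assign_args(target: str, arg_types: list[str]) -> list[str]:
    """Map each scalar argument type to a register name or a 'stack+<offset>' position."""
    int_slots, float_slots = SLOTS[target]
    ints = iter(int_slots)
    floats = iter(float_slots)
    stack_offset = 0
    placement = []
    for type_name in arg_types:
        # ASSUMPTION: integer/pointer and float slots are consumed independently.
        pool = floats if type_name in FLOAT_TYPES else ints
        register = next(pool, None)
        if register is not None:
            placement.append(register)
        else:
            # Spill in declaration order, 8-byte aligned (one slot per argument assumed).
            placement.append(f"stack+{stack_offset}")
            stack_offset += 8
    return placement
```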

Callee-saved registers: Defined per target. The backend handles save/restore automatically.

9.2 c Platform C ABI

Functions declared c follow the platform’s native C calling convention. On amd64 Linux/Darwin this is the System V AMD64 ABI. On arm64 it is AAPCS64 (with Apple’s variant on Darwin).

Restrictions in NCA v1:

10. NC/NCA Interop

NCA files contribute declarations to the same module namespace as .nc files at the same source path. This section defines how the two languages see each other’s symbols and how types map across the boundary.

10.1 Calling NCA from NC

A pub function defined in a .nca file is visible to .nc files in the same module. However, because NCA operates on lowered types (addr, i64, etc.) rather than NC surface types (str, int, etc.), the NC-facing API usually goes through a thin NC wrapper that handles type lowering explicitly.

Example:

std/hash.nca:

nc 1
// Operates on raw bytes. Caller is responsible for providing
// a valid pointer and byte length.
pub fn fnv1a_bytes(ptr: addr, byte_len: uptr) -> u64, nc {
entry:
...
}

std/hash.nc:

NC
import "std/runtime" as rt
pub fn hash_string(s: str) -> uint {
// rt.str_data and rt.str_byte_len are runtime helpers
// that extract the raw byte pointer and byte length
// from an NC str value.
return fnv1a_bytes(rt.str_data(s), rt.str_byte_len(s))
}

The compiler resolves fnv1a_bytes by looking it up in the merged declaration set for std/hash. No special import syntax or extern declaration is needed on the NC side.

There is no implicit coercion between NC types and NCA types. An NC str is not silently convertible to addr. The NC wrapper must explicitly decompose high-level values into the scalar components that the NCA function expects. This is intentional: it keeps NCA’s type boundary explicit and avoids hidden magic in the calling convention.

For NCA functions that operate on simple scalar types (i64, u64, f64, bool, u8), NC code can call them directly when the NC type maps 1:1:

NC
// add64 is defined in .nca as: pub fn add64(a: u64, b: u64) -> u64, nc
uint result = add64(10u, 20u) // uint maps directly to u64

10.2 Calling NC from NCA

NCA code can call functions defined in .nc files. Because NCA has no import statement, NC functions are declared as extern in the .nca file with their lowered NCA signature.

std/os.nca:

nc 1
// Declared extern -- defined in std/strings.nc
extern fn nc_str_len(s: addr) -> u64, nc
pub fn example(s: addr) -> u64, nc {
entry:
%len = call nc_str_len(s)
ret %len
}

The extern tells the compiler this symbol is resolved at link time from another linked object. The mangled name must match; when both files are in the same build, the compiler can handle that transparently.

For functions in the same module path, the compiler can verify that the extern declaration’s signature matches the actual .nc definition and emit an error on mismatch. For functions in different modules, this becomes a link-time check.

10.3 Type Mapping

NC types cross the interop boundary through compiler-defined lowering rules. Primitive numeric and boolean types lower directly; higher-level NC types lower structurally or opaquely. Handwritten .nca code must use the NCA-side representation explicitly.

| NC type | NCA representation | Notes |
| --- | --- | --- |
| int | i64 | |
| uint | u64 | |
| float | f64 | |
| bool | bool | 8-bit, 0 or 1. |
| byte | u8 | |
| char | implementation-defined | NC defines char as a Unicode extended grapheme cluster. Handwritten NCA should treat it as opaque unless a future ABI section defines a canonical lowered form. |
| str | addr | str is char[]. The safe stable representation for handwritten NCA is a pointer to the runtime string/array object, not a byte-string (ptr,len) pair. |
| T[] (dynamic array) | addr | Pointer to the dynamic array object. The exact internal field layout is a compiler/runtime detail. |
| T[N] (fixed-size array) | addr | Passed by address. Length is known at compile time. |
| struct | addr | Passed by pointer. Fields accessed via offset_of. |
| enum | u64 / i64 or addr | Small discriminant-only enums lower to scalars; payload-bearing forms are passed by pointer. |
| (T, U) (tuple) | Multiple scalars or addr | Small tuples may decompose into multi-value returns; larger tuples pass by pointer. |
| T? (optional) | Discriminant + payload or addr | Small optionals may lower to a bool plus payload; larger forms pass by pointer. |
| error | addr | Pointer to runtime error object; none lowers to addr.null. |
| (T, error) | Scalar(s) + addr | The value part lowers normally; the error part is an address. |
| T! (throwing return) | T + trap path | Throwing behavior lowers to control flow and runtime checks, not a distinct NCA signature shape. |
| fn value | addr, addr | Callable code pointer plus environment/context pointer for indirect calls. |
| map | addr | Pointer to runtime map object. Opaque at the NCA level. |

Arithmetic semantics differ between NC and NCA. NC integer arithmetic is checked by default: if an int or uint operation overflows, the program panics. The MIR -> NCA lowering pass implements this by emitting add.ov.i64 / sub.ov.i64 / mul.ov.i64 instructions followed by overflow-checking branches (see section 13.3). Handwritten NCA uses plain add.i64 / sub.i64 / mul.i64, which wrap silently modulo 2^64 with no checks. This is the most important semantic difference between NC code and handwritten NCA code operating on the “same” integer types. If overflow checking is needed in handwritten NCA, the author must use the checked variants (add.ov.*, sub.ov.*, mul.ov.*) and branch on the overflow flag explicitly.
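
The difference between the two instruction families can be modelled in Python, whose integers are unbounded. This is a sketch only: the function names mirror the NCA opcodes but are hypothetical helpers, and it assumes the value result of add.ov.i64 is the wrapped sum, which this section does not state explicitly.

```python
MASK64 = (1 << 64) - 1
I64_MAX = (1 << 63) - 1
I64_MIN = -(1 << 63)


def to_signed64(bits: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    return bits - (1 << 64) if bits & (1 << 63) else bits


def add_i64(a: int, b: int) -> int:
    """Model of add.i64: wraps silently modulo 2^64."""
    return to_signed64((a + b) & MASK64)


def add_ov_i64(a: int, b: int):
    """Model of add.ov.i64: (result, overflow flag).

    Assumption: the result is the wrapped value even when the flag is set.
    """
    exact = a + b
    wrapped = to_signed64(exact & MASK64)
    return wrapped, wrapped != exact


# Plain add wraps around without any signal.
assert add_i64(I64_MAX, 1) == I64_MIN
# The checked variant reports the same wrap through the flag.
assert add_ov_i64(I64_MAX, 1) == (I64_MIN, True)
assert add_ov_i64(40, 2) == (42, False)
```

Compiler-generated NCA branches on the flag to a panic block; handwritten NCA that ignores the flag, or uses the plain opcode, gets the wrapped value.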

10.4 Strings and Dynamic Arrays

str in NC is char[], and char is a Unicode extended grapheme cluster rather than a byte or fixed-width scalar. Likewise, T[] is NC’s dynamic array type, not a separate slice concept.

For handwritten .nca, the safe rule is:

This matters because the actual in-memory representation of char, str, and T[] is a compiler/runtime implementation detail. Handwritten NCA that needs string or array operations should prefer calling back into NC/runtime helper functions rather than decoding the representation directly.

10.5 Structs Across the Boundary

NC structs are always passed to and from NCA by pointer (addr). NCA code accesses fields using offset_of and typed loads/stores:

// NC:
// pub struct Point { x: float, y: float }
pub fn point_magnitude(p: addr) -> f64, nc {
entry:
%x_off = const.uptr offset_of(Point.x)
%y_off = const.uptr offset_of(Point.y)
%x_p = addr.add p, %x_off
%y_p = addr.add p, %y_off
%x = load.f64 %x_p
%y = load.f64 %y_p
%xx = fmul.f64 %x, %x
%yy = fmul.f64 %y, %y
%sum = fadd.f64 %xx, %yy
%mag = sqrt.f64 %sum
ret %mag
}

The NC compiler knows that point_magnitude takes a Point by pointer and generates the appropriate calling code at the call site. The .nca author is responsible for using the correct offset_of queries and field types.
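
The same access pattern can be simulated in Python with the standard struct module. The sketch assumes a little-endian target and a Point layout of two consecutive f64 fields at offsets 0 and 8; real NCA code would obtain those offsets from offset_of rather than hard-coding them. `OFFSET_OF` and `load_f64` are hypothetical stand-ins.

```python
import math
import struct

# Assumed layout of Point { x: float, y: float }; not specified by this section.
OFFSET_OF = {"Point.x": 0, "Point.y": 8}


def load_f64(memory, address: int) -> float:
    """Model of load.f64: read 8 little-endian bytes at the given address."""
    return struct.unpack_from("<d", memory, address)[0]


def point_magnitude(memory, p: int) -> float:
    """Mirror of the NCA listing: addr.add, load.f64, fmul, fadd, sqrt."""
    x = load_f64(memory, p + OFFSET_OF["Point.x"])
    y = load_f64(memory, p + OFFSET_OF["Point.y"])
    return math.sqrt(x * x + y * y)


memory = bytearray(32)
struct.pack_into("<dd", memory, 16, 3.0, 4.0)  # a Point stored at address 16
assert point_magnitude(memory, 16) == 5.0
```

If the field types or offsets used by the handwritten code disagree with the real layout, the loads silently read the wrong bytes, which is why the author carries that responsibility.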

10.6 What Cannot Cross the Boundary

Some NC features have no direct NCA representation and should not appear in handwritten .nca function signatures:

10.7 Mutability

NC distinguishes mutable (mut) and immutable bindings at the language level. The NC frontend enforces immutability constraints during type checking.

NCA has no concept of immutability. Any addr can be the target of a store instruction, and any stack slot can be written to at any time. The mut distinction does not survive lowering to NCA.

This means handwritten .nca code can mutate data that NC considers immutable. This is by design: NCA is an unsafe low-level layer, and restricting stores based on NC-level mutability would add complexity to the NCA validator without meaningful safety benefit, since NCA already permits arbitrary pointer arithmetic and raw memory access.

Authors of handwritten .nca code should respect the mutability contracts of the NC APIs they interact with, even though the NCA compiler does not enforce them.

10.8 Visibility Symmetry

The visibility model is symmetric:

There is no way for .nca code to access private NC functions, and no way for NC code to access file-local NCA functions. This preserves encapsulation in both directions.

10.9 Name Collision Rules

Within a single module path (for example, std/hash), symbol names across the merged .nc / .nca source set must be consistent, subject to when-predicate resolution (section 4.6).

Same name, same signature:

Same name, different signatures (overloading):

NC supports function overloading: multiple functions can share a name if their parameter types differ. The compiler disambiguates overloads during type checking, and each overload gets a distinct mangled symbol because <signature_hash> differs.

Handwritten .nca files may not declare multiple overloads of the same name. NCA has no type inference or overload resolution; every call is resolved by exact name. If you need to provide NCA implementations for multiple NC overloads, give each NCA function a distinct name and have the NC overloads dispatch to them:

// std/convert.nc
pub fn to_str(int v) -> str { return int_to_str_impl(v) }
pub fn to_str(float v) -> str { return float_to_str_impl(v) }
// std/convert.nca
pub fn int_to_str_impl(v: i64) -> addr, nc { ... }
pub fn float_to_str_impl(v: f64) -> addr, nc { ... }

Compiler-generated NCA from overloaded NC functions is unaffected by this restriction, since the compiler produces distinct mangled names for each overload automatically.

11. Program Entry and Root-Level Code

NCA does not define module init functions, init blocks, or a magic source-level main function. In NC, program execution starts from root-level code, and the frontend lowers that root-level code into ordinary compiler-generated NCA in source order.

The backend still synthesizes a platform-specific process entry stub (for example, _start on Linux or an LC_MAIN entry on Darwin), but the exact compiler/runtime handoff symbols are intentionally outside the scope of this proposal. Those details are internal runtime contracts, not part of handwritten NCA source compatibility.

Portable handwritten .nca code may still call documented runtime helpers when they exist, but this proposal standardizes the NCA surface language, not the full runtime symbol table.

12. Target Profiles

A target profile encodes all platform-specific parameters needed to lower NCA to an executable. Profiles are built into ncc and selected at compile time.

12.1 Supported Targets (v1)

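| Profile | Arch | OS | Object Format | Pointer Width |
| --- | --- | --- | --- | --- |
| linux-amd64 | amd64 | linux | ELF64 | 64 |
| linux-arm64 | arm64 | linux | ELF64 | 64 |
| darwin-amd64 | amd64 | darwin | Mach-O 64 | 64 |
| darwin-arm64 | arm64 | darwin | Mach-O 64 | 64 |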

12.2 Profile Contents

Each profile defines:

NCA extern declarations specify the symbol name and signature but not which library provides the symbol. Library resolution is a build-level concern.

ncc links against a minimal default set of libraries per target profile:

| Profile | Default link libraries |
| --- | --- |
| linux-amd64 | libc.so.6 |
| linux-arm64 | libc.so.6 |
| darwin-amd64 | libSystem.B.dylib |
| darwin-arm64 | libSystem.B.dylib |

Additional libraries are specified via build configuration (for example, ncc build -l png -l z). The compiler does not validate at compile time whether an extern symbol exists in any linked library; unresolved symbols are reported at link time.

For extern symbols that are part of the C standard library (malloc, free, exit, memcpy, strlen, etc.), the default link libraries are sufficient on all current targets.

13. Compiler Pipeline

NC Source (.nc)
|
v
[Frontend: parse, typecheck, desugar]
|
v
NC HIR (high-level IR)
|
v
[Lowering: monomorphize, inline, optimize]
|
v
NC MIR (mid-level IR)
|
v
[NCA Emit: lower MIR to NCA, resolve layouts, flatten aggregates]
|
v
NCA IR (typed, virtual-register, block-structured) <-- handwritten .nca enters here
|
v
[NCA Validation: type check, CFG verify, terminator check]
|
v
[Target-Independent Passes: constant fold, DCE, simple CSE]
|
v
[Legalization: widen/narrow illegal types, expand unsupported ops]
|
v
[Instruction Selection: pattern-match NCA ops to target instructions]
|
v
[Register Allocation: virtual -> physical, spill/reload]
|
v
[Prologue/Epilogue: frame setup, callee-save, stack adjustment]
|
v
Machine IR (physical registers, concrete instructions)
|
v
[Binary Emission: encode instructions, resolve relocations]
|
v
Object File (.o equivalent, in-memory)
|
v
[Internal Linker: merge objects, resolve symbols, build executable]
|
v
Executable (ELF or Mach-O)

13.1 NCA Validation Pass

Before any lowering, the validator checks:

13.2 Optimization Passes

Target-independent passes on NCA IR:

These are deliberately conservative. Heavy optimization happens at the MIR level for compiler-generated code. For handwritten NCA, the author is assumed to know what they want.

13.3 Safety Check Lowering Patterns

NC’s memory safety guarantees are enforced by the MIR -> NCA lowering pass, which inserts explicit checks as normal control flow. By the time code reaches NCA, all safety checks are visible as ordinary branches and calls — there are no implicit traps.

Bounds check pattern:

// NC source: arr[i]
// NCA lowering:
%len = load.uptr %arr_len_p
%oob = cmp.ge.uptr %i, %len
br %oob, panic_oob, access_ok(%i)
panic_oob:
// pass source location constants to __nc_panic_bounds
%file = addr.of __nc_srcfile_3
%line = const.u32 42
call __nc_panic_bounds(%i, %len, %file, %line)
unreachable
access_ok(%idx: uptr):
%off = mul.uptr %idx, 8
%elem = addr.add %arr_data, %off
%val = load.i64 %elem
...

Nil check pattern:

// NC source: obj.field (where obj is nullable)
%is_nil = cmp.eq.addr %obj, addr.null
br %is_nil, panic_nil, deref_ok
panic_nil:
call __nc_panic_nil(%file, %line)
unreachable
deref_ok:
%val = load.i64 %obj
...

Integer overflow check (for checked NC arithmetic):

%result, %ov = add.ov.i64 %a, %b
br %ov, panic_overflow, continue(%result)
panic_overflow:
call __nc_panic_overflow(%file, %line)
unreachable
continue(%r: i64):
...

This explicit lowering means handwritten .nca code is unchecked by default. Authors opting to write NCA directly take responsibility for memory safety, just as with C. The compiler does not insert bounds checks, nil checks, or overflow checks into handwritten NCA.
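
One detail of the bounds check pattern above is worth spelling out: because the comparison is unsigned (cmp.ge.uptr), a single compare rejects both too-large and negative indices, since a negative value reinterpreted as uptr becomes very large. The Python sketch below models this on a 64-bit target; `checked_index` and `BoundsPanic` are hypothetical names, not runtime symbols.

```python
MASK64 = (1 << 64) - 1


class BoundsPanic(Exception):
    """Stand-in for the runtime's bounds-check panic path."""


def checked_index(data, index: int):
    """Model of the lowered arr[i]: one unsigned compare, then the access."""
    length = len(data)
    idx_bits = index & MASK64  # reinterpret the index as a 64-bit uptr
    if idx_bits >= length:  # cmp.ge.uptr %i, %len
        raise BoundsPanic(f"index {index} out of range for length {length}")
    return data[idx_bits]


values = [10, 20, 30]
assert checked_index(values, 1) == 20
try:
    checked_index(values, -1)  # wraps to 2^64 - 1, so the compare catches it
    raise AssertionError("expected a bounds panic")
except BoundsPanic:
    pass
```

Handwritten NCA that indexes memory gets none of this unless the author writes the compare and branch explicitly.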

Note on runtime symbols. The symbols used in the examples above (__nc_panic_bounds, __nc_panic_nil, __nc_panic_overflow, __nc_srcfile_*) are illustrative names showing the pattern of lowering, not a stable API. The actual runtime symbol names, signatures, and calling conventions are internal contracts between the compiler and the NC runtime. They may change between compiler versions without notice.

Handwritten .nca code should not call these symbols directly. If a handwritten NCA function needs to signal a panic, it should call a documented, stable runtime helper when one is defined, or use the trap instruction for an immediate abort without a stack trace.

13.4 ncc Command-Line Interface

The compiler exposes NCA-related functionality through the following flags:

| Flag | Description |
| --- | --- |
| ncc build | Compile and link all .nc and .nca files into an executable. |
| ncc build --target <profile> | Cross-compile for a specific target profile. |
| -emit-nca | Dump the NCA IR to stdout or a .nca file after MIR lowering. |
| -emit-nca=optimized | Dump NCA after target-independent optimization passes. |
| -emit-asm | Dump target textual assembly after instruction selection. |
| -O0 / -O1 / -O2 | Optimization level. -O0 disables optimizations, zero-fills stack, emits frame pointers everywhere. |
| -g | Emit DWARF debug information. |
| -Wtailcall | Warn when a tailcall is rejected by the backend. |
| --dump-target <profile> | Print the target profile’s register layout, ABI, and feature set. |

13.5 Testing and Verification Strategy

NCA’s position in the compiler pipeline makes it a natural test boundary. The recommended testing approach:

Round-trip parsing. Every .nca file emitted by -emit-nca must parse back into an identical AST. This catches serialization bugs and ensures the textual format is canonical. Run as: ncc -emit-nca foo.nc | ncc -parse-nca -emit-nca and diff.

Validation fuzzing. The NCA validator should reject all malformed inputs without crashing. Fuzz the parser and validator with AFL/libFuzzer on randomized .nca inputs. This is especially important because handwritten .nca files are untrusted input to the compiler.

Semantic test suite. A library of small .nca programs with known outputs, compiled and executed on every supported target. Each test exercises a specific instruction or combination: arithmetic wrapping, comparison semantics by type, atomic ordering, block parameter passing, tail calls, etc.

Cross-target equivalence. For every test in the semantic suite, verify that the output is identical across all four target profiles. This is the core portability guarantee.

Instruction selection coverage. Track which NCA operations have been exercised by the test suite per target. Untested op/target combinations are flagged in CI.

ABI conformance. For c functions, generate NCA wrappers that call C test harnesses and verify that arguments and return values are passed correctly. This catches ABI mismatches between ncc and the platform C compiler.

14. Binary Emission

ncc emits final executables directly without invoking an external assembler or linker. The output is a position-independent executable (PIE) by default on all targets.

14.1 ELF (Linux)

The emitted ELF binary contains:

| Section | Contents |
| --- | --- |
| .text | Executable code. |
| .rodata | Read-only data (constants, string literals). |
| .data | Initialized mutable data. |
| .bss | Zero-initialized mutable data. |
| .nc.line | NC source location line table. |
| .symtab | Symbol table (debug builds). |
| .strtab | String table for symbols. |
| .dynamic | Dynamic linking metadata. |
| .rela.dyn | RELA relocations for the dynamic linker. |
| .got | Global offset table for imported symbols. |

Program headers:

Relocations use RELA format on both amd64 and arm64. Internal references use PC-relative addressing. Imported symbols use GOT-indirect loads with R_X86_64_GLOB_DAT / R_AARCH64_GLOB_DAT relocations resolved eagerly at load time.

14.2 Mach-O (Darwin)

The emitted Mach-O binary contains:

| Segment/Section | Contents |
| --- | --- |
| __TEXT,__text | Executable code. |
| __TEXT,__const | Read-only data. |
| __TEXT,__nc_line | NC source location table. |
| __DATA,__data | Initialized mutable data. |
| __DATA,__bss | Zero-initialized data. |
| __DATA,__got | Global offset table. |
| __DATA,__la_symbol_ptr | Lazy symbol pointers. |

Load commands include LC_SEGMENT_64 for each segment, LC_MAIN for the entry point offset, LC_LOAD_DYLIB for imported libraries (libSystem.B.dylib at minimum), LC_DYLD_INFO_ONLY for binding opcodes, and LC_UUID for build identification.

Darwin requires code signing for execution on arm64. ncc emits an ad-hoc LC_CODE_SIGNATURE with a valid CodeDirectory hash. No Apple Developer identity is needed for local execution.

15. Diagnostics and Debugging

15.1 Source Locations

NCA instructions can carry optional source location metadata:

%x = load.u8 %p !loc(3, 12)

Where !loc(line, col) references the current .nca file. For compiler-generated NCA from .nc source, the metadata references the original NC source location.

15.2 Line Table

The .nc.line section (ELF) or __TEXT,__nc_line section (Mach-O) contains a compact PC-to-source mapping. Format:

[file_index: u16] [line: u32] [col: u16] [pc_delta: u32]

This is sufficient for stack traces and basic debugging without full DWARF complexity. A -g flag causes ncc to emit DWARF .debug_info / .debug_line for use with standard debuggers.
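
To make the entry format concrete, here is a hedged Python sketch that encodes, decodes, and queries such a table. It assumes details the format line above does not fix: little-endian byte order, no padding between fields (12 bytes per entry), and pc_delta measured from the previous entry's start PC, with the first entry measured from 0. `LineEntry`, `encode_table`, `decode_table`, and `lookup` are hypothetical names.

```python
import struct
from typing import List, NamedTuple, Optional

# file_index: u16, line: u32, col: u16, pc_delta: u32 (assumed packed, little-endian)
ENTRY = struct.Struct("<HIHI")


class LineEntry(NamedTuple):
    file_index: int
    line: int
    col: int
    pc_delta: int


def encode_table(entries: List[LineEntry]) -> bytes:
    return b"".join(ENTRY.pack(*e) for e in entries)


def decode_table(blob: bytes) -> List[LineEntry]:
    return [LineEntry(*fields) for fields in ENTRY.iter_unpack(blob)]


def lookup(entries: List[LineEntry], pc: int) -> Optional[LineEntry]:
    """Return the last entry whose start PC is <= pc, or None if there is none."""
    start_pc = 0
    best = None
    for entry in entries:
        start_pc += entry.pc_delta  # assumed: delta from the previous entry
        if start_pc > pc:
            break
        best = entry
    return best


table = [LineEntry(0, 10, 1, 0), LineEntry(0, 11, 5, 4), LineEntry(1, 3, 9, 12)]
blob = encode_table(table)
assert len(blob) == 3 * ENTRY.size
assert lookup(decode_table(blob), 5).line == 11
```

A delta encoding like this keeps entries small, at the cost of a linear scan (or a separate index) when mapping a PC back to a source location.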

15.3 Frame Pointers

In debug builds, all functions emit a frame pointer (RBP on amd64, X29 on arm64). In release builds, frame pointers may be omitted for functions the compiler proves do not need them unless frameptr is specified. The runtime’s stack unwinder uses frame pointers when available and falls back to the .nc.line table otherwise.

15.4 Panic and Trap

When NC code panics (bounds check failure, nil dereference, explicit panic()), the MIR lowers to a call to the runtime’s __nc_panic function. This function:

  1. Captures the current PC.
  2. Walks the stack using frame pointers.
  3. Maps PCs to source locations via the line table.
  4. Prints a stack trace to stderr.
  5. Calls __nc_rt_exit(1).
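
Step 2 can be illustrated with a small simulation. The sketch below assumes the conventional frame record used with RBP on amd64 and X29 on arm64: the word at the frame pointer holds the caller's frame pointer and the next word holds the return address, with a saved frame pointer of 0 marking the outermost frame. Memory is modelled as a dict from address to 64-bit word; `walk_stack` is a hypothetical name.

```python
def walk_stack(memory, frame_pointer, max_frames=64):
    """Collect return addresses by following the saved-frame-pointer chain.

    Assumed frame record layout: [fp] = caller's fp, [fp + 8] = return address.
    """
    return_addresses = []
    while frame_pointer != 0 and len(return_addresses) < max_frames:
        return_addresses.append(memory[frame_pointer + 8])
        frame_pointer = memory[frame_pointer]
    return return_addresses


# Three nested frames; the outermost one stores 0 as its saved frame pointer.
memory = {
    0x7000: 0x7040, 0x7008: 0x401234,
    0x7040: 0x7080, 0x7048: 0x401100,
    0x7080: 0x0000, 0x7088: 0x401000,
}
assert walk_stack(memory, 0x7000) == [0x401234, 0x401100, 0x401000]
```

Each collected return address would then be mapped to a source location through the line table (step 3). The `max_frames` cap guards against a corrupted chain that loops.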

The trap instruction in NCA lowers to the platform’s trap/abort mechanism (ud2 on amd64, brk #1 on arm64).

15.5 Diagnostic Format

NCA compilation errors use the format:

<file>:<line>:<col>: error: <message>
<source line>
<caret indicator>

Example:

runtime/memops.nca:14:9: error: type mismatch: expected u64, got addr
%r = add.u64 %p, %offset
^~~~~~~

Warnings use the same format with warning: instead of error:. Diagnostics are printed to stderr and can be machine-parsed by editors and CI tools.
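
As an illustration of machine parsing, the following Python sketch extracts the fields of a diagnostic header line with a regular expression. It relies only on the format shown above; the `Diagnostic` type and `parse_diagnostic` function are hypothetical and not part of ncc.

```python
import re
from typing import NamedTuple, Optional

DIAG_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>error|warning): (?P<message>.*)$"
)


class Diagnostic(NamedTuple):
    file: str
    line: int
    col: int
    severity: str
    message: str


def parse_diagnostic(text: str) -> Optional[Diagnostic]:
    """Parse one header line; source and caret lines return None."""
    m = DIAG_RE.match(text)
    if m is None:
        return None
    return Diagnostic(
        m["file"], int(m["line"]), int(m["col"]), m["severity"], m["message"]
    )


header = "runtime/memops.nca:14:9: error: type mismatch: expected u64, got addr"
diag = parse_diagnostic(header)
assert diag is not None and (diag.line, diag.col) == (14, 9)
assert parse_diagnostic("%r = add.u64 %p, %offset") is None
```

A tool would feed stderr through `parse_diagnostic` line by line and attach the following source and caret lines to the most recent match.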

Multiple errors are reported per source file where possible. The parser uses block-level recovery: on encountering a syntax error within a function body, it skips to the next block label or function boundary and continues parsing.

16. Examples

Short examples omit the nc <version> header for brevity. Complete .nca files must include it (see sections 4.1 and 16.6 for full-file examples).

16.1 Simple Arithmetic

pub fn add64(a: u64, b: u64) -> u64, nc {
entry:
%r = add.u64 a, b
ret %r
}

16.2 C Interop

extern fn puts(s: addr) -> i32, c
pub data hello : u8[14] rodata = c"Hello, world!"
pub fn greet(), nc {
entry:
%p = addr.of hello
%_ = call puts(%p)
ret
}

16.3 Target-Specialized Function

when arch.amd64 and feature.popcnt {
pub fn popcount(x: u64) -> u64, nc {
entry:
%r = popcnt.u64 x
ret %r
}
}
pub fn popcount(x: u64) -> u64, nc {
entry:
// Portable bit-twiddling fallback
%m1 = const.u64 0x5555555555555555
%m2 = const.u64 0x3333333333333333
%m4 = const.u64 0x0F0F0F0F0F0F0F0F
%h01 = const.u64 0x0101010101010101
%a = lshr.u64 x, 1
%b = and.u64 %a, %m1
%c = sub.u64 x, %b
%d = and.u64 %c, %m2
%e = lshr.u64 %c, 2
%f = and.u64 %e, %m2
%g = add.u64 %d, %f
%h = lshr.u64 %g, 4
%i = add.u64 %g, %h
%j = and.u64 %i, %m4
%k = mul.u64 %j, %h01
%r = lshr.u64 %k, 56
ret %r
}
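
The portable fallback is the classic parallel bit-count. A Python transcription, shown here only as a cross-check of the constants and shift amounts, can be compared against a straightforward count of set bits. The explicit masks stand in for 64-bit wrapping; `popcount_fallback` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1
M1 = 0x5555555555555555
M2 = 0x3333333333333333
M4 = 0x0F0F0F0F0F0F0F0F
H01 = 0x0101010101010101


def popcount_fallback(x: int) -> int:
    """Step-for-step transcription of the NCA fallback body."""
    c = (x - ((x >> 1) & M1)) & MASK64  # 2-bit partial counts
    g = (c & M2) + ((c >> 2) & M2)      # 4-bit partial counts
    j = (g + (g >> 4)) & M4             # 8-bit partial counts
    k = (j * H01) & MASK64              # sum of all bytes lands in the top byte
    return k >> 56


for sample in (0, 1, 0xFF, 0x8000000000000000, 0xDEADBEEFCAFEF00D, MASK64):
    assert popcount_fallback(sample) == bin(sample).count("1")
```

This kind of reference model is also a convenient oracle for the cross-target equivalence tests described in section 13.5.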

16.4 Atomic Counter

pub fn atomic_inc(counter: addr) -> u64, nc {
entry:
%one = const.u64 1
%old = atomic.rmw.add.u64 counter, %one, order(seq_cst)
ret %old
}

16.5 Stack Slot Usage

pub fn sum_array(arr: addr, len: uptr) -> i64, nc {
stack acc : i64[1]
entry:
%acc_p = addr.of.stack acc
%zero = const.i64 0
store.i64 %acc_p, %zero
%izero = const.uptr 0
jmp loop(%izero)
loop(%i: uptr):
%done = cmp.ge.uptr %i, len
br %done, exit, body(%i)
body(%i: uptr):
%elem_off = mul.uptr %i, 8
%elem_p = addr.add arr, %elem_off
%val = load.i64 %elem_p
%acc_p2 = addr.of.stack acc
%cur = load.i64 %acc_p2
%new = add.i64 %cur, %val
store.i64 %acc_p2, %new
%next = add.uptr %i, 1
jmp loop(%next)
exit:
%acc_p3 = addr.of.stack acc
%result = load.i64 %acc_p3
ret %result
}

16.6 FNV-1a Hash (Realistic Hot Path)

A complete, portable FNV-1a 64-bit hash function suitable for hash table use:

nc 1
// FNV-1a constants
data fnv_offset : u64 rodata = 0xCBF29CE484222325
data fnv_prime : u64 rodata = 0x00000100000001B3
pub fn fnv1a_64(data: addr, len: uptr) -> u64, nc {
entry:
%hash0 = const.u64 0xCBF29CE484222325
%prime = const.u64 0x00000100000001B3
%zero = const.uptr 0
jmp loop(%zero, %hash0)
loop(%i: uptr, %hash: u64):
%done = cmp.ge.uptr %i, len
br %done, done(%hash), body(%i, %hash)
body(%i: uptr, %hash: u64):
%p = addr.add data, %i
%b = load.u8 %p
%b64 = u8.to.u64 %b
%xored = xor.u64 %hash, %b64
%hashed = mul.u64 %xored, %prime
%next = add.uptr %i, 1
jmp loop(%next, %hashed)
done(%result: u64):
ret %result
}
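
For cross-checking an NCA implementation, a reference version in Python is short. This sketch follows the same xor-then-multiply loop with the same two constants; the mask models the wrapping of mul.u64. The name `fnv1a_64_ref` is hypothetical.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x00000100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64_ref(data: bytes) -> int:
    """FNV-1a, 64-bit: xor each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64  # wraps modulo 2^64, like mul.u64
    return h


# The empty input returns the offset basis unchanged.
assert fnv1a_64_ref(b"") == 0xCBF29CE484222325
# Published FNV-1a 64-bit test vector for the single byte "a".
assert fnv1a_64_ref(b"a") == 0xAF63DC4C8601EC8C
```

Note that the order matters: FNV-1 multiplies before the xor, whereas FNV-1a, as in the listing above, xors first.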

16.7 Spinlock (Atomics in Practice)

A simple test-and-test-and-set spinlock showing atomic operations and control flow:

nc 1
// Lock state: 0 = unlocked, 1 = locked
// lock_addr points to a u32 in shared memory.
pub fn spin_lock(lock_addr: addr), nc {
entry:
jmp try_acquire
try_acquire:
%expected = const.u32 0
%desired = const.u32 1
%old, %ok = cmpxchg.u32 lock_addr, %expected, %desired, order(acquire, relaxed)
br %ok, acquired, spin
spin:
// Test before retrying CAS (reduces cache line bouncing)
%current = load.u32 lock_addr, order(relaxed)
%still_locked = cmp.ne.u32 %current, %expected
br %still_locked, spin, try_acquire
acquired:
ret
}
pub fn spin_unlock(lock_addr: addr), nc {
entry:
%zero = const.u32 0
store.u32 lock_addr, %zero, order(release)
ret
}

17. Appendix A: Grammar Summary

EBNF
file = version_decl NL decl*
version_decl = "nc" INT_LIT
decl = fn_def | extern_decl | data_def | when_block
comment = "//" (any character except NL)* NL
when_block = "when" predicate "{" NL decl* "}"
predicate = pred_atom ("and" pred_atom)*
pred_atom = IDENT "." IDENT
| IDENT "." INT_LIT
fn_def = ["pub"] "fn" IDENT "(" params ")" ["->" type_list] "," callconv ["," attr_list]
"{" NL stack_decl* block+ "}"
extern_decl = "extern" "fn" IDENT "(" params ")" ["->" type_list] "," callconv
| "extern" "data" IDENT ":" type
data_def = ["pub"] "data" IDENT ":" type "[" [INT_LIT] "]"
[section_class] [align_attr] ["=" initializer]
params = (param ("," param)*)?
param = param_ident ":" type
param_ident = IDENT | decl_keyword
decl_keyword = "data" | "fn" | "pub" | "extern" | "stack" | "when"
type_list = type ("," type)*
type = "i8" | "u8" | "i16" | "u16" | "i32" | "u32" | "i64" | "u64"
| "f32" | "f64" | "iptr" | "uptr" | "bool" | "addr"
callconv = "nc" | "c"
attr_list = attr ("," attr)*
attr = "frameptr"
section_class = "rodata" | "data" | "bss" | "tls"
align_attr = "align" "(" INT_LIT ")"
stack_decl = "stack" IDENT ":" type "[" INT_LIT "]" ["," align_attr]
block = LABEL ["(" block_params ")"] ":" NL instruction* terminator NL
block_params = block_param ("," block_param)*
block_param = "%" IDENT ":" type
instruction = [value_def] op NL
| conversion NL
| bulk_mem_op NL
| store_op NL
value_def = "%" IDENT "="
| "%" IDENT "," "%" IDENT "="
op = arith_op | bitwise_op | cmp_op | select_op
| float_op | load_op | addr_op
| const_op | call_op | atomic_op | fence_op
const_op = "const." type (INT_LIT | FLOAT_LIT)
| "addr.of" IDENT
| "addr.of.stack" IDENT
| "addr.null"
arith_op = arith_name "." type value "," value
| neg_op
| "uaddc." type value "," value "," value
| "usubb." type value "," value "," value
| "umulh." type value "," value
| "smulh." type value "," value
| "add.ov." type value "," value
| "sub.ov." type value "," value
| "mul.ov." type value "," value
arith_name = "add" | "sub" | "mul" | "udiv" | "sdiv" | "urem" | "srem"
neg_op = "neg." type value
bitwise_op = bitwise_name "." type value "," value
| unary_bit "." type value
bitwise_name = "and" | "or" | "xor" | "shl" | "lshr" | "ashr" | "rotl" | "rotr"
unary_bit = "not" | "bswap" | "clz" | "ctz" | "popcnt"
cmp_op = "cmp." cmp_kind "." type value "," value
cmp_kind = "eq" | "ne" | "lt" | "le" | "gt" | "ge"
| "oeq" | "une" | "olt" | "ole" | "ogt" | "oge" | "ord" | "uno"
select_op = "select." type value "," value "," value
float_op = float_binary "." ftype value "," value
| float_unary "." ftype value
float_binary = "fadd" | "fsub" | "fmul" | "fdiv" | "frem" | "copysign" | "fmin" | "fmax"
float_unary = "fneg" | "fabs" | "sqrt"
conversion = "%" IDENT "=" type ".to." type value
load_op = "load." type value
| "load." type value "," "order(" ordering ")"
| "load.le." type value
| "load.be." type value
| "load.unaligned." type value
store_op = "store." type value "," value
| "store." type value "," value "," "order(" ordering ")"
| "store.le." type value "," value
| "store.be." type value "," value
| "store.unaligned." type value "," value
bulk_mem_op = "memcpy" value "," value "," value
| "memmove" value "," value "," value
| "memset" value "," value "," value
addr_op = "addr.add" value "," value
| "addr.sub" value "," value
call_op = "call" IDENT "(" args ")"
| "call.indirect" value "(" args ")" "->" type_list "," callconv
atomic_op = "atomic.rmw." atomic_rmw_kind "." type value "," value "," "order(" ordering ")"
| "cmpxchg." type value "," value "," value "," "order(" ordering "," ordering ")"
atomic_rmw_kind = "add" | "sub" | "and" | "or" | "xor" | "xchg"
fence_op = "fence" "order(" ordering ")"
ordering = "relaxed" | "acquire" | "release" | "acq_rel" | "seq_cst"
ftype = "f32" | "f64"
terminator = "jmp" LABEL ["(" args ")"]
| "br" value "," LABEL ["(" args ")"] "," LABEL ["(" args ")"]
| "switch" value "," "default" LABEL "[" switch_arm* "]"
| "ret" [value ("," value)*]
| "tailcall" IDENT "(" args ")"
| "trap"
| "unreachable"
switch_arm = INT_LIT "->" LABEL ","
args = (value ("," value)*)?
value = "%" IDENT | IDENT | INT_LIT | FLOAT_LIT
initializer = simple_init | "[" NL? init_elem ("," NL? init_elem)* ","? NL? "]"
init_elem = value | "addr.of" IDENT
simple_init = INT_LIT | FLOAT_LIT | STRING_LIT
INT_LIT = ["-"] DIGITS
| "0x" HEX_DIGITS
| "0o" OCT_DIGITS
| "0b" BIN_DIGITS
(* underscores allowed between digits for readability *)
FLOAT_LIT = DIGITS "." DIGITS [("e" | "E") ["+" | "-"] DIGITS]
STRING_LIT = '"' (CHAR | ESCAPE)* '"'
ESCAPE = "\\" ("n" | "t" | "r" | "\\" | "0" | '"' | "x" HEX HEX)

Keyword scoping. The keywords data, fn, pub, extern, stack, and when are reserved only when they appear at the start of a top-level declaration or inside a when block. In parameter position, value-name position (%data), and label position, they are treated as ordinary identifiers. All other keywords (type names, opcodes, terminators) are globally reserved.

This grammar is intended to be complete for all instructions defined in section 7. If a future version adds new instructions, the grammar must be updated in tandem. The instruction set (section 7) is the normative reference; this grammar is a derived summary for parser implementors.

Comments may appear on their own line or at the end of an instruction line. The parser discards them before processing.
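
As a small worked example of the lexical rules, the sketch below parses INT_LIT in Python. It reads the grammar literally: the optional minus sign applies only to the decimal form, and underscores may appear only between digits. Acceptance of both upper- and lower-case hex digits is an assumption, and `parse_int_lit` is a hypothetical helper.

```python
import re

_PREFIXES = {"0x": 16, "0o": 8, "0b": 2}
_DIGIT_CLASSES = {16: "0-9A-Fa-f", 10: "0-9", 8: "0-7", 2: "01"}


def parse_int_lit(text: str) -> int:
    """Parse an INT_LIT token, raising ValueError if it is malformed."""
    sign = 1
    base = _PREFIXES.get(text[:2], 10)
    if base == 10:
        if text.startswith("-"):  # sign is only valid on decimal literals
            sign, text = -1, text[1:]
        digits = text
    else:
        digits = text[2:]
    cls = _DIGIT_CLASSES[base]
    # One digit, then any number of digits each optionally preceded by "_".
    if not re.fullmatch(f"[{cls}](?:_?[{cls}])*", digits):
        raise ValueError(f"malformed integer literal: {text!r}")
    return sign * int(digits.replace("_", ""), base)


assert parse_int_lit("0xCBF2_9CE4") == 0xCBF29CE4
assert parse_int_lit("-7") == -7
assert parse_int_lit("0b1010") == 10
```

Range checking against the instruction's type (for example, rejecting 300 in const.u8) would be a separate validation step after lexing.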

18. Appendix B: Reserved Keywords

The following identifiers are reserved and may not be used where the grammar requires an identifier. Most are reserved globally. Top-level declaration keywords (data, fn, pub, extern, stack, when) are reserved only in declaration position and may still be used as parameter names, value names, and labels.

Structural: nc, fn, pub, extern, data, stack, when, and

Types: i8, u8, i16, u16, i32, u32, i64, u64, f32, f64, iptr, uptr, bool, addr

Qualifiers: c, frameptr

Section classes: rodata, bss, tls

Terminators: jmp, br, switch, ret, tailcall, trap, unreachable

Operations: const, add, sub, mul, udiv, sdiv, urem, srem, neg, and, or, xor, not, shl, lshr, ashr, bswap, clz, ctz, popcnt, rotl, rotr, cmp, select, fadd, fsub, fmul, fdiv, frem, fneg, fabs, sqrt, copysign, fmin, fmax, to, load, store, memcpy, memmove, memset, call, fence

Reserved for future use: vec, v128, v256, inline, volatile, restrict, yield, await, async, import, module, type, struct, enum, union, match, for, while, if, else, ref, mut

19. Appendix C: Design Rationale

19.1 Why Not LLVM IR?

LLVM IR is the most mature option for a low-level target, but it has significant drawbacks for NC’s use case:

19.2 Why Not QBE or Cranelift?

QBE is attractively simple (~12K lines of C), but it targets only amd64, arm64, and riscv64, with limited optimization, no atomics, and no Windows support. Its type system (word/long/single/double) is too coarse for NCA’s needs, and it has no concept of calling convention control or target predicates.

Cranelift (used by Wasmtime and rustc_codegen_cranelift) is more capable, but it is a Rust library with its own IR (CLIF). Depending on it would introduce a Rust build dependency into ncc and tie NC’s codegen quality to Cranelift’s development priorities.

Both are excellent projects, but NC’s goal of a self-contained, C++-based toolchain with full control over codegen makes a custom backend the right choice. NCA is deliberately simpler than LLVM IR and can be implemented incrementally.

19.3 Why Virtual Registers Instead of Go-Style Named Physical Registers?

Go’s assembler uses pseudo-registers (FP, SP, SB, PC) alongside each architecture’s hardware register names. This is a pragmatic choice for an existing ecosystem, but it still exposes architecture shape: an instruction such as MOVQ AX, BX names amd64 registers directly, so the same source cannot be assembled for arm64.

NCA removes physical registers entirely because:

The tradeoff is that NCA authors cannot force a specific register assignment for performance tuning. In practice, the cases where this matters (specific ABI entry points, interrupt handlers) are exactly the cases that should use raw asm().

19.4 Why SSA with Block Parameters Instead of Phi Nodes?

Block parameters (also called block arguments) and phi nodes are semantically equivalent — they both represent value merging at control flow join points. Block parameters are chosen because:

This is the same design used by MLIR, Cranelift (CLIF), and Swift’s SIL.
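
The equivalence is mechanical, which a short Python sketch can show: given blocks with parameters and the arguments passed on each incoming edge, the corresponding phi nodes fall out directly. The dictionary-based CFG encoding and the name `block_params_to_phis` are hypothetical; the example CFG is the loop from section 16.6.

```python
def block_params_to_phis(blocks):
    """Derive phi nodes from block parameters.

    blocks maps label -> {"params": [names], "jumps": [(target, [args])]}.
    Returns label -> {param: [(predecessor, incoming value), ...]}.
    """
    phis = {
        label: {param: [] for param in blk["params"]}
        for label, blk in blocks.items()
        if blk["params"]
    }
    for pred, blk in blocks.items():
        for target, args in blk["jumps"]:
            params = blocks[target]["params"]
            if len(args) != len(params):
                raise ValueError(f"{pred} -> {target}: argument count mismatch")
            for param, arg in zip(params, args):
                phis[target][param].append((pred, arg))
    return phis


cfg = {
    "entry": {"params": [], "jumps": [("loop", ["%zero", "%hash0"])]},
    "loop": {"params": ["%i", "%hash"],
             "jumps": [("done", ["%hash"]), ("body", ["%i", "%hash"])]},
    "body": {"params": ["%i", "%hash"], "jumps": [("loop", ["%next", "%hashed"])]},
    "done": {"params": ["%result"], "jumps": []},
}
phis = block_params_to_phis(cfg)
assert phis["loop"]["%i"] == [("entry", "%zero"), ("body", "%next")]
```

Each list of (predecessor, value) pairs is exactly the operand list a phi node would carry at the top of that block.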

20. Open Questions

  1. Should nc register mappings match c on all current targets? This simplifies the initial implementation at the cost of future flexibility. Go offers a precedent for starting simple: its original stack-based ABI0 served for roughly a decade before the register-based ABIInternal was added.

  2. Is there demand for inline NCA within .nc files? The file-based model is cleaner, but small escape hatches within NC functions could reduce friction. Deferred to a future phase.

  3. Should NCA support variadic c calls in v1? Currently excluded. If the runtime needs printf-like C calls, a small C shim is the workaround. Revisit if this becomes a pain point.

  4. TLS support timeline. Thread-local storage (tls section class) is declared in the grammar but requires per-target TLS model support in the backend. Priority depends on NC’s concurrency story.

  5. Should ncc support emitting relocatable .o files in addition to full executables? This would enable linking with C object files and static libraries. Currently only full executable output is specified.

  6. Should NCA define a stable binary serialization format? The textual format is the canonical representation, but a compact binary encoding would speed up incremental compilation for large projects. This could be as simple as a 1:1 binary encoding of the AST with a magic number and version header.

  7. What is the error recovery strategy for the NCA parser? For compiler-generated NCA, errors indicate compiler bugs and should abort. For handwritten .nca, the parser should attempt to recover and report multiple diagnostics per file. The question is how much effort to invest in recovery quality for v1.

  8. Should when predicates support or and not in addition to and? Currently only and is supported, which covers the common cases (arch + feature). Disjunction (or) and negation (not) add expressiveness but complicate predicate specificity ranking.

  9. Should NCA support volatile loads and stores? Memory-mapped I/O requires loads and stores that are never optimized away or reordered. The current spec does not distinguish volatile from non-volatile access. Adding load.volatile.<type> and store.volatile.<type> would be straightforward.

  10. If NC eventually needs explicit shared-library exports, should that live only in the build/package layer or also be mirrored in .nca source? This proposal currently keeps export policy out of the core NCA syntax.
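To make question 8 concrete, here is a minimal sketch of why and-only predicates are easy to rank. It assumes that specificity is simply the number of conjoined terms, which may not be the rule this proposal actually uses; the term spellings (`arch=x86_64`, `feature=avx2`) and the function names are also hypothetical.

```python
def matches(predicate, target_facts):
    # An and-only predicate holds when every term is true of the target.
    return predicate <= target_facts


def select_variant(variants, target_facts):
    """Pick the matching variant with the most terms (assumed ranking rule).

    variants: list of (frozenset_of_terms, variant_name).
    Returns None if nothing matches; raises if the best match is not unique.
    """
    matching = [(len(pred), name) for pred, name in variants
                if matches(pred, target_facts)]
    if not matching:
        return None
    best = max(score for score, _ in matching)
    winners = [name for score, name in matching if score == best]
    if len(winners) > 1:
        raise ValueError(f"ambiguous variants: {winners}")
    return winners[0]


variants = [
    (frozenset(), "generic"),
    (frozenset({"arch=x86_64"}), "x86_64_baseline"),
    (frozenset({"arch=x86_64", "feature=avx2"}), "x86_64_avx2"),
]

avx2_target = frozenset({"arch=x86_64", "feature=avx2", "feature=sse4.2"})
plain_target = frozenset({"arch=x86_64"})
arm_target = frozenset({"arch=aarch64"})
```

Under this assumed rule, a conjunction is just a set of terms, so matching is a subset test and specificity is a set size. Once `or` or `not` is allowed, a predicate is no longer a single term set, and the term count stops being an obvious measure of specificity, which is the complication the question refers to.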

21. Appendix D: Undefined Behavior Summary

NCA aims to minimize undefined behavior, but some is unavoidable in a low-level language. This is the complete list of operations that produce undefined behavior in NCA:

| Category | Operation |
| --- | --- |
| Division | udiv, sdiv, urem, srem with a zero divisor. |
| Division | sdiv of INT_MIN / -1 (signed overflow). |
| Alignment | load or store (non-unaligned variants) at a misaligned address. |
| Bool invariant | A bool value containing any bit pattern other than 0 or 1. |
| Provenance | Dereferencing an addr with no valid provenance (freed memory, fabricated address pointing to unmapped pages). |
| Stack lifetime | Using an addr.of.stack result after the owning function has returned. |
| Data races | Concurrent non-atomic access to the same location where at least one is a write. |
| Unreachable | Executing the unreachable instruction. |
| Uninitialized memory | Reading from a stack slot that has not been written to (release builds only; debug builds zero-fill). |

Explicitly NOT undefined behavior:

This list is intentionally smaller than C’s or LLVM IR’s UB surface. The goal is that a correct NCA program can be reasoned about locally: if the inputs to an instruction are well-defined and the preconditions in the table above are met, the output is deterministic.
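As an illustration of this local reasoning, the two division rows of the table can be modeled as explicit preconditions. The sketch below is a reference model in Python, not NCA syntax; it assumes a 32-bit signed operand width and a quotient that truncates toward zero (as in C), neither of which is stated in this section. The names `sdiv_i32` and `UndefinedBehavior` are hypothetical.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


class UndefinedBehavior(Exception):
    """Raised where the modeled instruction would have undefined behavior."""


def sdiv_i32(a, b):
    """Model a signed 32-bit division with the table's UB preconditions."""
    for operand in (a, b):
        if not INT32_MIN <= operand <= INT32_MAX:
            raise ValueError("operand is not representable in 32 bits")
    if b == 0:
        raise UndefinedBehavior("sdiv with a zero divisor")
    if a == INT32_MIN and b == -1:
        raise UndefinedBehavior("sdiv of INT_MIN / -1 overflows")
    # Assumption: the quotient truncates toward zero. Python's // floors,
    # so divide the magnitudes and reapply the sign.
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient
```

Once both preconditions hold, the result depends only on the two operands, which is the deterministic behavior the paragraph above describes.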

22. Appendix E: Comparison with Alternatives

| Property | NCA | Go ASM | LLVM IR | QBE IL | WASM |
| --- | --- | --- | --- | --- | --- |
| Physical registers in source | No | Yes (pseudo-mapped) | No | No | No |
| Target-independent source | Yes | Partially (per-arch files) | Yes | Yes | Yes |
| SSA form | Block parameters | No (imperative) | Phi nodes | Phi nodes | Structured control flow |
| Aggregate types as values | No (memory only) | N/A | Yes | No | No |
| Calling convention control | nc, c | ABI0 / ABIInternal | Many (ccc, fastcc, etc.) | N/A | Single convention |
| Target predicates | when blocks | Build tags + file naming | Target triples | N/A | Feature detection |
| Atomics | C11 model | Via runtime | C11 model | None | C11 model |
| Direct binary emission | Yes (planned) | Yes (via Go toolchain) | Via backends | Via system assembler | Via engine |
| Standalone language | Yes (.nca files) | Yes (.s files) | Yes (.ll files) | Yes (.ssa files) | Yes (.wat files) |
| Implementation complexity | Medium (~30-50K LOC est.) | Part of Go toolchain | Very high (~30M LOC) | Low (~12K LOC) | Varies by engine |
| undef/poison values | No | N/A | Yes | No | No |