NKIPy Tracing Architecture#

Overview#

NKIPy provides a Python-based interface for writing ML kernels that compile to AWS Neuron hardware. The tracing system translates tensor operations into HLO (High Level Operations) suitable for compilation to Neuron Executable File Format (NEFF).

Users write normal-looking Python/NumPy code, but during tracing, operations generate HLO instead of computing values.

How Tracing Works#

NKIPy uses a shallow layer design where operations dispatch directly to HLO construction. The tracing flow is straightforward:

User Function (Python/NumPy)
    ↓
Tracing Context Activated
    ↓
Arguments Replaced with Tensor Wrappers
    ↓
Operations Intercepted & Recorded as HLO
    ↓
HLO Module Generated
    ↓ (compilation)
Neuron Compiler (neuronx-cc)
    ↓
NEFF Binary

Tensor Wrappers#

When tracing begins, input arrays are replaced with special tensor wrapper objects that:

Intercept NumPy operations via __array_function__ and __array_ufunc__ protocols
Dispatch operations directly to HLO builders
Track metadata (shape, dtype, source location)
Generate HLO instead of computing values

This allows users to write familiar NumPy code while NKIPy captures the computation graph.

Operation Dispatch#

Operations are dispatched through a simple registry pattern:

NumPy function called on tensor wrapper (e.g., np.add(a, b))
Wrapper’s __array_function__ intercepts the call
Registry looks up the corresponding HLO builder
HLO operation is emitted to the module

NKIPy also provides additional operations beyond NumPy (e.g., ops.matmul, ops.conv2d) that follow the same dispatch pattern.

Supported Operations#

Category	Example Operations
Elementwise Unary	abs, exp, log, sin, cos
Elementwise Binary	add, multiply, subtract
Comparison	equal, less, greater
Reduction	sum, max, min, mean
Shape	transpose, reshape, broadcast
Indexing	take, gather, scatter
Array Creation	zeros, ones, full
Linear Algebra	matmul, conv2d, conv3d
Distributed	all_reduce, all_gather

See the Operations Reference for a complete list.

Special Features#

IO Aliasing#

NKIPy supports in-place modification semantics for mutable parameters:

def kernel(x: tensor[mutable, float32, (N,)]):
    x[:] = x + 1  # In-place modification
    return x      # Returns aliased input

The compiler respects aliasing information for memory optimization.

Source Location Tracking#

Python source locations (filename, line number) are embedded in HLO operations for debugging. This information is preserved through compilation and used for error reporting.

IO Naming#

Parameters and outputs can be named for stable NEFF interfaces:

@trace
def kernel(input_tensor):
    ...
    return output_tensor

Names are preserved through compilation and used in NEFF for input/output identification.

NKIPy Tracing Architecture

Contents