NKIPy Tracing Architecture#
Overview#
NKIPy provides a Python-based interface for writing ML kernels that compile to AWS Neuron hardware. The tracing system translates tensor operations into HLO (High Level Operations) suitable for compilation to Neuron Executable File Format (NEFF).
Users write normal-looking Python/NumPy code, but during tracing, operations generate HLO instead of computing values.
How Tracing Works#
NKIPy uses a shallow layer design where operations dispatch directly to HLO construction. The tracing flow is straightforward:
User Function (Python/NumPy)
↓
Tracing Context Activated
↓
Arguments Replaced with Tensor Wrappers
↓
Operations Intercepted & Recorded as HLO
↓
HLO Module Generated
↓ (compilation)
Neuron Compiler (neuronx-cc)
↓
NEFF Binary
Tensor Wrappers#
When tracing begins, input arrays are replaced with special tensor wrapper objects that:
Intercept NumPy operations via
__array_function__and__array_ufunc__protocolsDispatch operations directly to HLO builders
Track metadata (shape, dtype, source location)
Generate HLO instead of computing values
This allows users to write familiar NumPy code while NKIPy captures the computation graph.
Operation Dispatch#
Operations are dispatched through a simple registry pattern:
NumPy function called on tensor wrapper (e.g.,
np.add(a, b))Wrapper’s
__array_function__intercepts the callRegistry looks up the corresponding HLO builder
HLO operation is emitted to the module
NKIPy also provides additional operations beyond NumPy (e.g., ops.matmul, ops.conv2d) that follow the same dispatch pattern.
Supported Operations#
Category |
Example Operations |
|---|---|
Elementwise Unary |
abs, exp, log, sin, cos |
Elementwise Binary |
add, multiply, subtract |
Comparison |
equal, less, greater |
Reduction |
sum, max, min, mean |
Shape |
transpose, reshape, broadcast |
Indexing |
take, gather, scatter |
Array Creation |
zeros, ones, full |
Linear Algebra |
matmul, conv2d, conv3d |
Distributed |
all_reduce, all_gather |
See the Operations Reference for a complete list.
Special Features#
IO Aliasing#
NKIPy supports in-place modification semantics for mutable parameters:
def kernel(x: tensor[mutable, float32, (N,)]):
x[:] = x + 1 # In-place modification
return x # Returns aliased input
The compiler respects aliasing information for memory optimization.
Source Location Tracking#
Python source locations (filename, line number) are embedded in HLO operations for debugging. This information is preserved through compilation and used for error reporting.
IO Naming#
Parameters and outputs can be named for stable NEFF interfaces:
@trace
def kernel(input_tensor):
...
return output_tensor
Names are preserved through compilation and used in NEFF for input/output identification.