Overview

1. Overview of NPU Hardware

The NPU processor is designed specifically for IoT artificial intelligence: it accelerates neural-network computation and addresses the inefficiency of traditional chips at neural-network workloads. GX8002 is a low-power AI chip from Hangzhou Guoxin with the advantages of small size, low power consumption, and low cost, and it integrates a high-performance, low-power NPU processor. The NPU comprises sub-modules for matrix multiplication, convolution, general computation, data copying, and decompression.

2. Toolchain and API

The Guoxin Neural Network Processing Unit Compiler (gxnpuc) is a model conversion tool for heterogeneous computing architectures. It runs in a Linux environment and converts network models from open-source frameworks into offline model files compatible with Guoxin AI processors.
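
In practice, a conversion is driven by a configuration file passed to the gxnpuc command line. The following Python sketch shows one way to script that invocation from a build pipeline; the command-line form `gxnpuc config.yaml` and the configuration file name are assumptions for illustration, so consult the gxnpuc documentation for the authoritative usage.

```python
# Minimal sketch of driving a gxnpuc conversion from Python.
# Assumption: gxnpuc is installed on the Linux host and accepts a
# configuration file path as its argument (see the gxnpuc docs for
# the authoritative CLI usage and configuration schema).
import subprocess

def compile_model(config_path: str) -> None:
    """Invoke gxnpuc on a model conversion configuration file."""
    result = subprocess.run(
        ["gxnpuc", config_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    print(result.stdout)

if __name__ == "__main__":
    compile_model("config.yaml")  # hypothetical configuration file name
```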

Developers can deploy offline models on the GX8002 computing platform for inference using the NPU-related APIs provided by the LPV framework, enabling the development of applications such as speech recognition and object detection.

3. Steps to Use

  1. Generate network model files in the specified format on a PC using a supported open-source framework.

    The currently supported deep learning frameworks are TensorFlow and PyTorch. For the specific process of converting to the designated model format, refer to the model compilation examples: TensorFlow Example and PyTorch Example. A minimal export sketch follows this list.

  2. Write a model conversion configuration file, then use the gxnpuc model conversion tool to compile the open-source framework network model into an offline model file supported by the NPU.

  3. Invoke NPU-related APIs to complete operations such as resource loading, data transfer, model inference, and resource release, enabling the development of relevant applications.
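
As a minimal illustration of step 1 for the PyTorch flow, the sketch below defines a small network and exports it on the PC. The ONNX export format, file name, and input shape here are illustrative assumptions; the linked PyTorch Example describes the exact artifact format that gxnpuc expects.

```python
# Sketch of step 1 for the PyTorch flow: define a small network and
# export it on the PC. The ONNX format, file name, and input shape are
# illustrative assumptions; see the PyTorch Example for the exact
# artifact format that gxnpuc expects.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)  # kernel <= 15 per the op list
        self.relu = nn.ReLU()
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(8 * 14 * 14, 10)  # 16x16 input -> 14x14 after 3x3 conv

    def forward(self, x):
        return self.fc(self.flatten(self.relu(self.conv(x))))

model = TinyNet().eval()
dummy = torch.randn(1, 3, 16, 16)  # hypothetical input shape
torch.onnx.export(model, dummy, "tiny_net.onnx")  # artifact for the conversion step
```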

4. List of TensorFlow operators supported by the GX8002 NPU

| OP name | Restrictions |
| --- | --- |
| Abs | |
| Add | |
| AddV2 | |
| AvgPool | Data format must be NHWC; pooling window H and W must each be in the range 1-15 and cannot both be 1; pooling window H must equal stride_h and pooling window W must equal stride_w |
| BatchMatMulV2 | The product of H and W of the second parameter (weight), rounded up to a multiple of 32, must be less than 65536 |
| BatchToSpaceND | Only supported when converted into a convolution with dilation > 1 |
| BiasAdd | |
| Concat | Dimension information must be determined at compile time |
| ConcatV2 | Dimension information must be determined at compile time |
| Const | |
| Conv2D | The value of the second parameter (weight) must be determined at compile time; data format must be NHWC; when the input channel is 1, only VALID padding is supported and the kernel must satisfy H*W <= 49, H <= 11, W <= 11, stride <= 4; when the input channel is not 1, the kernel must satisfy H <= 15, W <= 15, stride <= 15 |
| DepthwiseConv2dNative | The value of the second parameter (weight) must be determined at compile time; data format must be NHWC; only VALID padding is supported; kernel H*W <= 49, H <= 11, W <= 11, stride <= 4 |
| Div | |
| Exp | |
| ExpandDims | The value of the second parameter must be determined at compile time |
| FusedBatchNorm | |
| FusedBatchNormV2 | |
| FusedBatchNormV3 | |
| Identity | |
| Log | |
| MatMul | The product of H and W of the second parameter (weight), rounded up to a multiple of 32, must be less than 65536 |
| MaxPool | Data format must be NHWC; pooling window H and W must each be in the range 1-15 and cannot both be 1; pooling window H must equal stride_h and pooling window W must equal stride_w |
| Mean | Dimension information must be determined at compile time |
| Mul | |
| Neg | |
| Pack | |
| Pad | |
| Placeholder | |
| Pow | The value of the second parameter (exponent) must be determined at compile time; the first parameter (data) must be greater than 0 |
| RealDiv | |
| Reciprocal | |
| Relu | |
| Relu6 | |
| Reshape | The value of the second parameter must be determined at compile time |
| Rsqrt | |
| Selu | |
| Shape | |
| Sigmoid | |
| Slice | Dimension information must be determined at compile time |
| SpaceToBatchND | Only supported when converted into a convolution with dilation > 1 |
| Split | |
| Sqrt | |
| Square | |
| SquaredDifference | The two input tensors must have the same shape |
| Squeeze | |
| StridedSlice | Dimension information must be determined at compile time |
| Sub | |
| Sum | Dimension information must be determined at compile time |
| Tanh | |
| Transpose | The value of the second parameter must be determined at compile time; only a two-dimensional transpose (or an operation equivalent to one) is supported |
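
To make these restrictions concrete, the following sketch builds a minimal TF1-style graph (Placeholder is a TF1 op) that stays within the table above: NHWC data format, VALID padding, compile-time constant weights, and a kernel within the stated limits. The shapes and filter sizes are illustrative assumptions.

```python
# Minimal sketch of a graph built only from ops in the table above,
# using TF1-style graph construction. Shapes and filter sizes are
# illustrative assumptions.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    # NHWC input, as required by the Conv2D restriction above
    x = tf.placeholder(tf.float32, [1, 16, 16, 8], name="input")
    # Const weights so the second Conv2D parameter is fixed at compile
    # time; a 3x3 kernel with stride 1 satisfies H<=15, W<=15, stride<=15
    w = tf.constant(0.1, shape=[3, 3, 8, 16])
    y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID",
                     data_format="NHWC")
    y = tf.nn.relu(y, name="output")
    # The graph would then be frozen and fed to gxnpuc; see the
    # TensorFlow Example for the exact procedure.
```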

5. List of PyTorch operators supported by the GX8002 NPU

| op type | Supported torch APIs | Limitations |
| --- | --- | --- |
| Conv2d | torch.nn.Conv2d, torch.nn.functional.conv2d | kernel_h and kernel_w must be <= 15; stride_h and stride_w must be <= 15; dilation_h and dilation_w must be <= 15 |
| DepthwiseConv2d | torch.nn.Conv2d, torch.nn.functional.conv2d | kernel_h and kernel_w must be <= 11; kernel_h * kernel_w must be <= 49; stride_h and stride_w must be <= 4; dilation_h and dilation_w must be 1; padding is not supported |
| Conv1d | torch.nn.Conv1d, torch.nn.functional.conv1d | stride must be <= 15; dilation must be <= 15; kernel must be <= 15 |
| DepthwiseConv1d | torch.nn.Conv1d, torch.nn.functional.conv1d | stride must be <= 4; dilation must be 1; kernel must be <= 11; padding is not supported |
| MaxPool2d | torch.nn.MaxPool2d, torch.nn.functional.max_pool2d | kernel_h and kernel_w must be <= 15 and cannot both be 1; kernel_h must equal stride_h and kernel_w must equal stride_w; input height must be divisible by kernel_h and input width by kernel_w; dilation_h and dilation_w must be 1 |
| AvgPool2d | torch.nn.AvgPool2d, torch.nn.functional.avg_pool2d | kernel_h and kernel_w must be <= 15 and cannot both be 1; kernel_h must equal stride_h and kernel_w must equal stride_w; input height must be divisible by kernel_h and input width by kernel_w; dilation_h and dilation_w must be 1 |
| Relu | torch.nn.ReLU, torch.nn.functional.relu | |
| Relu6 | torch.nn.ReLU6, torch.nn.functional.relu6 | |
| PRelu | torch.nn.PReLU, torch.nn.functional.prelu | |
| Selu | torch.nn.SELU, torch.nn.functional.selu | |
| HardTanh | torch.nn.Hardtanh, torch.nn.functional.hardtanh | min_val must be 0 |
| Sigmoid | torch.nn.Sigmoid, torch.nn.functional.sigmoid | |
| Tanh | torch.nn.Tanh, torch.nn.functional.tanh | |
| Flatten | torch.nn.Flatten, torch.flatten | Only supports flattening the input tensor into a one-dimensional tensor |
| Linear | torch.nn.Linear, torch.nn.functional.linear | |
| Permute | torch.permute, Tensor.permute, torch.transpose, Tensor.transpose | |
| BatchNorm2d | torch.nn.BatchNorm2d, torch.nn.functional.batch_norm | |
| BatchNorm1d | torch.nn.BatchNorm1d, torch.nn.functional.batch_norm | |
| Pad | torch.nn.ZeroPad2d, torch.nn.ConstantPad2d, torch.nn.ConstantPad1d | Input tensor must have <= 4 dimensions; pad value must be 0 |
| Reshape | torch.reshape, Tensor.reshape | |
| Concat | torch.concatenate, torch.concat, torch.cat | |
| Squeeze | torch.squeeze | |
| UnSqueeze | torch.unsqueeze | |
| Add | torch.add, + operator | Output tensor must have <= 5 dimensions |
| Mul | torch.mul, * operator, torch.multiply | Output tensor must have <= 5 dimensions |
| Sub | torch.sub, - operator, torch.subtract | Output tensor must have <= 5 dimensions |
| Div | torch.div, / operator, torch.divide | Output tensor must have <= 5 dimensions |
| Slice | Tensor[x0:y0, ..., xn:yn] | |
| ReduceSum | torch.sum | |
| ReduceMean | torch.mean | |
| Exp | torch.exp | |
| Log | torch.log | |
| Sqrt | torch.sqrt | |
| Square | torch.square | |
| Reciprocal | torch.reciprocal | |
| Neg | torch.neg, torch.negative | |
| Rsqrt | torch.rsqrt | |
| Abs | torch.abs, torch.absolute | |
| Pow | torch.pow | |
| UpSample | torch.nn.functional.upsample, torch.nn.functional.upsample_nearest | Only the scale_factor parameter is supported; scale_h and scale_w must be equal; input tensor must be 4-dimensional |
| Split | torch.split | |
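
The following sketch shows a module composed only of operators from the table above and sized to respect their limits (pooling kernel equal to stride, input dimensions divisible by the kernel, Hardtanh with min_val of 0); the layer sizes and input shape are illustrative assumptions.

```python
# Sketch of a module composed only of ops from the table above; layer
# sizes and input shape are illustrative assumptions.
import torch
import torch.nn as nn

class NpuFriendlyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3)        # kernel_h, kernel_w <= 15
        self.act = nn.Hardtanh(min_val=0.0, max_val=6.0)  # min_val must be 0
        # kernel equals stride, and the 30x30 feature map is divisible
        # by 2, as the MaxPool2d rows above require
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.act(self.conv(x)))

net = NpuFriendlyNet().eval()
out = net(torch.randn(1, 1, 32, 32))  # hypothetical input shape
print(out.shape)  # torch.Size([1, 4, 15, 15])
```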

6. List of operators supported by the NPU for GX8010/GX8009/GX8008

| OP name | Restrictions |
| --- | --- |
| Abs | |
| Add | |
| AddN | |
| All | The value of the second parameter must be determined at compile time |
| Any | The value of the second parameter must be determined at compile time |
| Assert | |
| AvgPool | Computation is more efficient when the data format is NCHW; stride <= 63 |
| BatchMatMul | The value of the second parameter (weight) must be determined at compile time |
| BatchToSpaceND | The values of the second and third parameters must be determined at compile time |
| BiasAdd | |
| Cast | Only supports compile-time evaluation |
| Concat | Dimension information must be determined at compile time |
| ConcatV2 | Dimension information must be determined at compile time |
| Const | |
| Conv2D | The value of the second parameter (weight) must be determined at compile time; computation is more efficient when the data format is NCHW; kernel H <= 11, W <= 11 (more efficient when H and W are equal and odd); stride <= 63 |
| Conv2DBackpropInput | The values of the first and second parameters must be determined at compile time; only the NCHW data format is supported; kernel H <= 11, W <= 11 (more efficient when H and W are equal and odd); stride <= 63 |
| DepthwiseConv2dNative | The value of the second parameter (weight) must be determined at compile time; only the NCHW data format is supported; kernel H <= 11, W <= 11 (more efficient when H and W are equal and odd); stride <= 63 |
| Div | |
| Enter | |
| Equal | |
| Exit | |
| Exp | |
| ExpandDims | The value of the second parameter must be determined at compile time |
| Fill | Only supports compile-time evaluation |
| FloorDiv | |
| FloorMod | |
| Gather | The value of the second parameter must be determined at compile time |
| GatherV2 | The values of the second and third parameters must be determined at compile time |
| GreaterEqual | |
| Identity | |
| Less | |
| LessEqual | |
| ListDiff | Only supports compile-time evaluation |
| Log | |
| LogSoftmax | Only supports compile-time evaluation |
| LogicalAnd | |
| LogicalNot | |
| LogicalOr | |
| LogicalXor | |
| LoopCond | |
| MatMul | The value of the second parameter (weight) must be determined at compile time |
| Max | The value of the second parameter must be determined at compile time |
| MaxPool | Computation is more efficient when the data format is NCHW; stride <= 63 |
| Maximum | |
| Mean | The value of the second parameter must be determined at compile time |
| Merge | |
| Min | The value of the second parameter must be determined at compile time |
| Minimum | |
| Mul | |
| Neg | |
| NextIteration | |
| Pack | |