Overview

1. Overview of NPU Hardware

The NPU processor is designed specifically for IoT artificial intelligence: it accelerates neural-network computation and addresses the inefficiency of traditional chips at neural-network workloads. GX8002 is a low-power AI chip from Hangzhou Guoxin with the advantages of small size, low power consumption, and low cost, and it integrates a high-performance, low-power NPU processor. The NPU comprises sub-modules for matrix multiplication, convolution, general computation, data copying, and decompression.

2. Toolchain and API

The Guoxin Neural Network Processing Unit Compiler (gxnpuc) is a model conversion tool for heterogeneous computing architectures. It runs in a Linux environment and converts network models from open-source frameworks into offline model files compatible with Guoxin AI processors.
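
In practice, a conversion is driven by a configuration file passed to the gxnpuc command line. The following Python sketch shows one way to script that invocation from a build pipeline; the command-line form `gxnpuc config.yaml` and the configuration file name are assumptions for illustration, so consult the gxnpuc documentation for the authoritative usage.

```python
# Minimal sketch of driving a gxnpuc conversion from Python.
# Assumption: gxnpuc is installed on the Linux host and accepts a
# configuration file path as its argument (see the gxnpuc docs for
# the authoritative CLI usage and configuration schema).
import subprocess

def compile_model(config_path: str) -> None:
    """Invoke gxnpuc on a model conversion configuration file."""
    result = subprocess.run(
        ["gxnpuc", config_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    print(result.stdout)

if __name__ == "__main__":
    compile_model("config.yaml")  # hypothetical configuration file name
```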

Developers can deploy offline models on the GX8002 computing platform for inference using the NPU-related APIs provided by the LPV framework, enabling the development of applications such as speech recognition and object detection.

3. Steps to Use

  1. Generate network model files in the specified format on a PC using a supported open-source framework.

    The currently supported deep learning frameworks are TensorFlow and PyTorch. For the specific process of converting to the designated model format, refer to the model compilation examples: TensorFlow Example and PyTorch Example. A minimal export sketch follows this list.

  2. Write a model conversion configuration file, then use the gxnpuc model conversion tool to compile the open-source framework network model into an offline model file supported by the NPU.

  3. Invoke NPU-related APIs to complete operations such as resource loading, data transfer, model inference, and resource release, enabling the development of relevant applications.
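
As a minimal illustration of step 1 for the PyTorch flow, the sketch below defines a small network and exports it on the PC. The ONNX export format, file name, and input shape here are illustrative assumptions; the linked PyTorch Example describes the exact artifact format that gxnpuc expects.

```python
# Sketch of step 1 for the PyTorch flow: define a small network and
# export it on the PC. The ONNX format, file name, and input shape are
# illustrative assumptions; see the PyTorch Example for the exact
# artifact format that gxnpuc expects.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)  # kernel <= 15 per the op list
        self.relu = nn.ReLU()
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(8 * 14 * 14, 10)  # 16x16 input -> 14x14 after 3x3 conv

    def forward(self, x):
        return self.fc(self.flatten(self.relu(self.conv(x))))

model = TinyNet().eval()
dummy = torch.randn(1, 3, 16, 16)  # hypothetical input shape
torch.onnx.export(model, dummy, "tiny_net.onnx")  # artifact for the conversion step
```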

4. List of TensorFlow operators supported by the GX8002 NPU

| OP name | Restrictions |
| --- | --- |
| Abs | |
| Add | |
| AddV2 | |
| AvgPool | Data format must be NHWC; pooling window H and W must each be in the range 1-15 and cannot both be 1; pooling window H must equal stride_h and pooling window W must equal stride_w |
| BatchMatMulV2 | The product of H and W of the second parameter (weight), rounded up to a multiple of 32, must be less than 65536 |
| BatchToSpaceND | Only supported when converted into a convolution with dilation > 1 |
| BiasAdd | |
| Concat | Dimension information must be determined at compile time |
| ConcatV2 | Dimension information must be determined at compile time |
| Const | |
| Conv2D | The value of the second parameter (weight) must be determined at compile time; data format must be NHWC; when the input channel is 1, only VALID padding is supported and the kernel must satisfy H*W <= 49, H <= 11, W <= 11, stride <= 4; when the input channel is not 1, the kernel must satisfy H <= 15, W <= 15, stride <= 15 |
| DepthwiseConv2dNative | The value of the second parameter (weight) must be determined at compile time; data format must be NHWC; only VALID padding is supported; kernel H*W <= 49, H <= 11, W <= 11, stride <= 4 |
| Div | |
| Exp | |
| ExpandDims | The value of the second parameter must be determined at compile time |
| FusedBatchNorm | |
| FusedBatchNormV2 | |
| FusedBatchNormV3 | |
| Identity | |
| Log | |
| MatMul | The product of H and W of the second parameter (weight), rounded up to a multiple of 32, must be less than 65536 |
| MaxPool | Data format must be NHWC; pooling window H and W must each be in the range 1-15 and cannot both be 1; pooling window H must equal stride_h and pooling window W must equal stride_w |
| Mean | Dimension information must be determined at compile time |
| Mul | |
| Neg | |
| Pack | |
| Pad | |
| Placeholder | |
| Pow | The value of the second parameter (exponent) must be determined at compile time; the first parameter (data) must be greater than 0 |
| RealDiv | |
| Reciprocal | |
| Relu | |
| Relu6 | |
| Reshape | The value of the second parameter must be determined at compile time |
| Rsqrt | |
| Selu | |
| Shape | |
| Sigmoid | |
| Slice | Dimension information must be determined at compile time |
| SpaceToBatchND | Only supported when converted into a convolution with dilation > 1 |
| Split | |
| Sqrt | |
| Square | |
| SquaredDifference | The two input tensors must have the same shape |
| Squeeze | |
| StridedSlice | Dimension information must be determined at compile time |
| Sub | |
| Sum | Dimension information must be determined at compile time |
| Tanh | |
| Transpose | The value of the second parameter must be determined at compile time; only a two-dimensional transpose (or an operation equivalent to one) is supported |
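
To make these restrictions concrete, the following sketch builds a minimal TF1-style graph (Placeholder is a TF1 op) that stays within the table above: NHWC data format, VALID padding, compile-time constant weights, and a kernel within the stated limits. The shapes and filter sizes are illustrative assumptions.

```python
# Minimal sketch of a graph built only from ops in the table above,
# using TF1-style graph construction. Shapes and filter sizes are
# illustrative assumptions.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    # NHWC input, as required by the Conv2D restriction above
    x = tf.placeholder(tf.float32, [1, 16, 16, 8], name="input")
    # Const weights so the second Conv2D parameter is fixed at compile
    # time; a 3x3 kernel with stride 1 satisfies H<=15, W<=15, stride<=15
    w = tf.constant(0.1, shape=[3, 3, 8, 16])
    y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID",
                     data_format="NHWC")
    y = tf.nn.relu(y, name="output")
    # The graph would then be frozen and fed to gxnpuc; see the
    # TensorFlow Example for the exact procedure.
```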

5. List of PyTorch operators supported by the GX8002 NPU

| op type | Supported torch APIs | Limitations |
| --- | --- | --- |
| Conv2d | torch.nn.Conv2d, torch.nn.functional.conv2d | kernel_h and kernel_w must be <= 15; stride_h and stride_w must be <= 15; dilation_h and dilation_w must be <= 15 |
| DepthwiseConv2d | torch.nn.Conv2d, torch.nn.functional.conv2d | kernel_h and kernel_w must be <= 11; kernel_h * kernel_w must be <= 49; stride_h and stride_w must be <= 4; dilation_h and dilation_w must be 1; padding is not supported |
| Conv1d | torch.nn.Conv1d, torch.nn.functional.conv1d | stride must be <= 15; dilation must be <= 15; kernel must be <= 15 |
| DepthwiseConv1d | torch.nn.Conv1d, torch.nn.functional.conv1d | stride must be <= 4; dilation must be 1; kernel must be <= 11; padding is not supported |
| MaxPool2d | torch.nn.MaxPool2d, torch.nn.functional.max_pool2d | kernel_h and kernel_w must be <= 15 and cannot both be 1; kernel_h must equal stride_h and kernel_w must equal stride_w; input height must be divisible by kernel_h and input width by kernel_w; dilation_h and dilation_w must be 1 |
| AvgPool2d | torch.nn.AvgPool2d, torch.nn.functional.avg_pool2d | kernel_h and kernel_w must be <= 15 and cannot both be 1; kernel_h must equal stride_h and kernel_w must equal stride_w; input height must be divisible by kernel_h and input width by kernel_w; dilation_h and dilation_w must be 1 |
| Relu | torch.nn.ReLU, torch.nn.functional.relu | |
| Relu6 | torch.nn.ReLU6, torch.nn.functional.relu6 | |
| PRelu | torch.nn.PReLU, torch.nn.functional.prelu | |
| Selu | torch.nn.SELU, torch.nn.functional.selu | |
| HardTanh | torch.nn.Hardtanh, torch.nn.functional.hardtanh | min_val must be 0 |
| Sigmoid | torch.nn.Sigmoid, torch.nn.functional.sigmoid | |
| Tanh | torch.nn.Tanh, torch.nn.functional.tanh | |
| Flatten | torch.nn.Flatten, torch.flatten | Only supports flattening the input tensor into a one-dimensional tensor |
| Linear | torch.nn.Linear, torch.nn.functional.linear | |
| Permute | torch.permute, Tensor.permute, torch.transpose, Tensor.transpose | |
| BatchNorm2d | torch.nn.BatchNorm2d, torch.nn.functional.batch_norm | |
| BatchNorm1d | torch.nn.BatchNorm1d, torch.nn.functional.batch_norm | |
| Pad | torch.nn.ZeroPad2d, torch.nn.ConstantPad2d, torch.nn.ConstantPad1d | Input tensor must have <= 4 dimensions; pad value must be 0 |
| Reshape | torch.reshape, Tensor.reshape | |
| Concat | torch.concatenate, torch.concat, torch.cat | |
| Squeeze | torch.squeeze | |
| UnSqueeze | torch.unsqueeze | |
| Add | torch.add, + operator | Output tensor must have <= 5 dimensions |
| Mul | torch.mul, * operator, torch.multiply | Output tensor must have <= 5 dimensions |
| Sub | torch.sub, - operator, torch.subtract | Output tensor must have <= 5 dimensions |
| Div | torch.div, / operator, torch.divide | Output tensor must have <= 5 dimensions |
| Slice | Tensor[x0:y0, ..., xn:yn] | |
| ReduceSum | torch.sum | |
| ReduceMean | torch.mean | |
| Exp | torch.exp | |
| Log | torch.log | |
| Sqrt | torch.sqrt | |
| Square | torch.square | |
| Reciprocal | torch.reciprocal | |
| Neg | torch.neg, torch.negative | |
| Rsqrt | torch.rsqrt | |
| Abs | torch.abs, torch.absolute | |
| Pow | torch.pow | |
| UpSample | torch.nn.functional.upsample, torch.nn.functional.upsample_nearest | Only the scale_factor parameter is supported; scale_h and scale_w must be equal; input tensor must be 4-dimensional |
| Split | torch.split | |
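
The following sketch shows a module composed only of operators from the table above and sized to respect their limits (pooling kernel equal to stride, input dimensions divisible by the kernel, Hardtanh with min_val of 0); the layer sizes and input shape are illustrative assumptions.

```python
# Sketch of a module composed only of ops from the table above; layer
# sizes and input shape are illustrative assumptions.
import torch
import torch.nn as nn

class NpuFriendlyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3)        # kernel_h, kernel_w <= 15
        self.act = nn.Hardtanh(min_val=0.0, max_val=6.0)  # min_val must be 0
        # kernel equals stride, and the 30x30 feature map is divisible
        # by 2, as the MaxPool2d rows above require
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.act(self.conv(x)))

net = NpuFriendlyNet().eval()
out = net(torch.randn(1, 1, 32, 32))  # hypothetical input shape
print(out.shape)  # torch.Size([1, 4, 15, 15])
```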

6. List of operators supported by the NPU for GX8010/GX8009/GX8008

| OP name | Restrictions |
| --- | --- |
| Abs | |
| Add | |
| AddN | |
| All | The value of the second parameter must be determined at compile time |
| Any | The value of the second parameter must be determined at compile time |
| Assert | |
| AvgPool | Computation is more efficient when the data format is NCHW; stride <= 63 |
| BatchMatMul | The value of the second parameter (weight) must be determined at compile time |
| BatchToSpaceND | The values of the second and third parameters must be determined at compile time |
| BiasAdd | |
| Cast | Only supports compile-time evaluation |
| Concat | Dimension information must be determined at compile time |
| ConcatV2 | Dimension information must be determined at compile time |
| Const | |
| Conv2D | The value of the second parameter (weight) must be determined at compile time; computation is more efficient when the data format is NCHW; kernel H <= 11, W <= 11 (more efficient when H and W are equal and odd); stride <= 63 |
| Conv2DBackpropInput | The values of the first and second parameters must be determined at compile time; only the NCHW data format is supported; kernel H <= 11, W <= 11 (more efficient when H and W are equal and odd); stride <= 63 |
| DepthwiseConv2dNative | The value of the second parameter (weight) must be determined at compile time; only the NCHW data format is supported; kernel H <= 11, W <= 11 (more efficient when H and W are equal and odd); stride <= 63 |
| Div | |
| Enter | |
| Equal | |
| Exit | |
| Exp | |
| ExpandDims | The value of the second parameter must be determined at compile time |
| Fill | Only supports compile-time evaluation |
| FloorDiv | |
| FloorMod | |
| Gather | The value of the second parameter must be determined at compile time |
| GatherV2 | The values of the second and third parameters must be determined at compile time |
| GreaterEqual | |
| Identity | |
| Less | |
| LessEqual | |
| ListDiff | Only supports compile-time evaluation |
| Log | |
| LogSoftmax | Only supports compile-time evaluation |
| LogicalAnd | |
| LogicalNot | |
| LogicalOr | |
| LogicalXor | |
| LoopCond | |
| MatMul | The value of the second parameter (weight) must be determined at compile time |
| Max | The value of the second parameter must be determined at compile time |
| MaxPool | Computation is more efficient when the data format is NCHW; stride <= 63 |
| Maximum | |
| Mean | The value of the second parameter must be determined at compile time |
| Merge | |
| Min | The value of the second parameter must be determined at compile time |
| Minimum | |
| Mul | |
| Neg | |
| NextIteration | |
| Pack | |