Overview
1. Overview of NPU Hardware
The NPU is a processor designed specifically for IoT artificial intelligence: it accelerates neural-network computation, addressing the inefficiency of traditional chips at neural-network operations. GX8002 is a low-power AI chip launched by Hangzhou Guoxin, with advantages such as small size, low power consumption, and low cost; it integrates a high-performance, low-power NPU. The NPU comprises sub-modules for matrix multiplication, convolution, general computation, copying, decompression, and more.
2. Toolchain and API
The Guoxin Neural Network Processing Unit Compiler (gxnpuc) is a model conversion tool for heterogeneous computing architectures. It runs in a Linux environment and converts network models from open-source frameworks into offline model files compatible with Guoxin AI processors.
Developers can deploy offline models on the GX8002 computing platform and run inference through the NPU-related API provided by the LPV framework, enabling applications such as speech recognition and object detection.
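The inference lifecycle on the device typically follows a load, transfer, run, release sequence. The sketch below illustrates that sequence only; the function names `npu_load_model`, `npu_run`, and `npu_release` are hypothetical stand-ins, not the real LPV framework API, and the bodies are stubs.

```python
# Hypothetical sketch of the usual NPU inference lifecycle.
# These names are stand-ins, NOT the real LPV API; they only
# illustrate the load -> infer -> release call sequence.

def npu_load_model(path):
    # Real code would parse the offline model file and allocate NPU resources.
    return {"path": path, "loaded": True}

def npu_run(model, input_data):
    # Real code would copy input_data to the NPU, start inference,
    # and read back the output buffer.
    return [0.0] * len(input_data)  # placeholder output

def npu_release(model):
    # Real code would free the model's NPU memory.
    model["loaded"] = False

model = npu_load_model("model.npu")
output = npu_run(model, [1.0, 2.0])
npu_release(model)
```

The actual function names, argument types, and buffer handling are defined by the LPV SDK headers; consult its API reference for the concrete calls.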
3. Steps to Use
- Generate a network model file in the specified format on a PC using a supported open-source framework. The currently supported deep-learning frameworks are TensorFlow and PyTorch; for the conversion process to the designated model format, refer to the model compilation examples: TensorFlow Example, PyTorch Example.
- Write a model conversion configuration file, then use the gxnpuc model conversion tool to compile the open-source-framework network model into an offline model file supported by the NPU.
- Invoke the NPU-related APIs to complete resource loading, data transfer, model inference, and resource release, enabling development of the application.
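To give a sense of what the second step involves, a gxnpuc configuration file is a small YAML document. The key names below are illustrative assumptions based on typical gxnpuc configurations and may differ between tool releases; consult the gxnpuc documentation for the authoritative field names.

```yaml
# Illustrative gxnpuc configuration sketch -- key names are assumptions,
# not verified syntax; check your gxnpuc release's documentation.
CORENAME: GRUS            # target NPU core (assumed name for GX8002)
MODEL_FILE: model.pb      # frozen TensorFlow graph (or exported model)
OUTPUT_FILE: model.h      # generated offline model artifact
INPUT_OPS:
  input: [1, 64]          # input op name and shape
OUTPUT_OPS: [output]      # output op name(s)
```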
4. List of TensorFlow op operators supported by the NPU of GX8002
OP name | Restrictions |
---|---|
Abs | |
Add | |
AddV2 | |
AvgPool | The data format only supports NHWC; pooling window H and W must each be in the range 1-15; pooling window H and W cannot both be 1; pooling window H must equal stride_h, and pooling window W must equal stride_w |
BatchMatMulV2 | The product of H and W of the second parameter (weight), rounded up to a multiple of 32, must be less than 65536 |
BatchToSpaceND | Only supported when it can be converted to a convolution with dilation>1 |
BiasAdd | |
Concat | Dimension information must be determined at compile time |
ConcatV2 | Dimension information must be determined at compile time |
Const | |
Conv2D | The value of the second parameter (weight) must be determined at compile time; the data format only supports NHWC; when the input channel is 1, only VALID padding is supported, and the convolution kernel must satisfy H*W<=49, H<=11, W<=11, stride<=4; when the input channel is not 1, the convolution kernel must satisfy H<=15, W<=15, stride<=15 |
DepthwiseConv2dNative | The value of the second parameter (weight) must be determined at compile time; the data format only supports NHWC; only VALID padding is supported, and the convolution kernel must satisfy H*W<=49, H<=11, W<=11, stride<=4 |
Div | |
Exp | |
ExpandDims | The value of the second parameter must be determined at compile time |
FusedBatchNorm | |
FusedBatchNormV2 | |
FusedBatchNormV3 | |
Identity | |
Log | |
MatMul | The product of H and W of the second parameter (weight), rounded up to a multiple of 32, must be less than 65536 |
MaxPool | The data format only supports NHWC; pooling window H and W must each be in the range 1-15; pooling window H and W cannot both be 1; pooling window H must equal stride_h, and pooling window W must equal stride_w |
Mean | Dimension information must be determined at compile time |
Mul | |
Neg | |
Pack | |
Pad | |
Placeholder | |
Pow | The value of the second parameter (exponent) must be determined at compile time; the first parameter (data) must be greater than 0 |
RealDiv | |
Reciprocal | |
Relu | |
Relu6 | |
Reshape | The value of the second parameter must be determined at compile time |
Rsqrt | |
Selu | |
Shape | |
Sigmoid | |
Slice | Dimension information must be determined at compile time |
SpaceToBatchND | Only supported when it can be converted to a convolution with dilation>1 |
Split | |
Sqrt | |
Square | |
SquaredDifference | The two input Tensors must have the same shape |
Squeeze | |
StridedSlice | Dimension information must be determined at compile time |
Sub | |
Sum | Dimension information must be determined at compile time |
Tanh | |
Transpose | The value of the second parameter must be determined at compile time; only two-dimensional transposes, or operations that can be regarded as a two-dimensional transpose, are supported |
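The Conv2D limits above lend themselves to a quick pre-flight check before invoking gxnpuc. The helper below is an illustrative sketch, not part of gxnpuc or the LPV SDK; it simply restates the table's Conv2D rules for GX8002.

```python
def conv2d_supported(in_channels, kernel_h, kernel_w, stride, padding):
    """Check a Conv2D configuration against the GX8002 NPU limits above."""
    if in_channels == 1:
        # Single input channel: VALID padding only, small kernels.
        return (padding == "VALID"
                and kernel_h * kernel_w <= 49
                and kernel_h <= 11 and kernel_w <= 11
                and stride <= 4)
    # Multi-channel case: larger kernels and strides are allowed.
    return kernel_h <= 15 and kernel_w <= 15 and stride <= 15

print(conv2d_supported(1, 7, 7, 2, "VALID"))     # True  (7*7 = 49 <= 49)
print(conv2d_supported(1, 3, 3, 2, "SAME"))      # False (SAME not allowed for C=1)
print(conv2d_supported(32, 15, 15, 15, "SAME"))  # True
```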
5. List of PyTorch op operators supported by the NPU of GX8002
op type | supported torch API | Limitations |
---|---|---|
Conv2d | 1. torch.nn.Conv2d 2. torch.nn.functional.conv2d | kernel_h and kernel_w must be <=15; stride_h and stride_w must be <=15; dilation_h and dilation_w must be <=15 |
DepthwiseConv2d | 1. torch.nn.Conv2d 2. torch.nn.functional.conv2d | kernel_h and kernel_w must be <=11; kernel_h*kernel_w must be <=49; stride_h and stride_w must be <=4; dilation_h and dilation_w must be ==1; padding is not supported |
Conv1d | 1. torch.nn.Conv1d 2. torch.nn.functional.conv1d | stride must be <=15; dilation must be <=15; kernel must be <=15 |
DepthwiseConv1d | 1. torch.nn.Conv1d 2. torch.nn.functional.conv1d | stride must be <=4; dilation must be ==1; kernel must be <=11; padding is not supported |
MaxPool2d | 1. torch.nn.MaxPool2d 2. torch.nn.functional.max_pool2d | kernel_h and kernel_w must be <=15; kernel_h and kernel_w must not both be 1; kernel_h must equal stride_h, and kernel_w must equal stride_w; input height must be divisible by kernel_h, and input width must be divisible by kernel_w; dilation_h and dilation_w must be ==1 |
AvgPool2d | 1. torch.nn.AvgPool2d 2. torch.nn.functional.avg_pool2d | kernel_h and kernel_w must be <=15; kernel_h and kernel_w must not both be 1; kernel_h must equal stride_h, and kernel_w must equal stride_w; input height must be divisible by kernel_h, and input width must be divisible by kernel_w; dilation_h and dilation_w must be ==1 |
Relu | 1. torch.nn.ReLU 2. torch.nn.functional.relu | |
Relu6 | 1. torch.nn.ReLU6 2. torch.nn.functional.relu6 | |
PRelu | 1. torch.nn.PReLU 2. torch.nn.functional.prelu | |
Selu | 1. torch.nn.SELU 2. torch.nn.functional.selu | |
HardTanh | 1. torch.nn.Hardtanh 2. torch.nn.functional.hardtanh | min_val param must be 0 |
Sigmoid | 1. torch.nn.Sigmoid 2. torch.nn.functional.sigmoid | |
Tanh | 1. torch.nn.Tanh 2. torch.nn.functional.tanh | |
Flatten | 1. torch.nn.Flatten 2. torch.flatten | Only supports reshaping the input tensor into a one-dimensional tensor |
Linear | 1. torch.nn.Linear 2. torch.nn.functional.linear | |
Permute | 1. torch.permute 2. Tensor.permute 3. torch.transpose 4. Tensor.transpose | |
BatchNorm2d | 1. torch.nn.BatchNorm2d 2. torch.nn.functional.batch_norm | |
BatchNorm1d | 1. torch.nn.BatchNorm1d 2. torch.nn.functional.batch_norm | |
Pad | 1. torch.nn.ZeroPad2d 2. torch.nn.ConstantPad2d 3. torch.nn.ConstantPad1d | Input tensor dimensions must be <=4; pad value must be 0 |
Reshape | 1. torch.reshape 2. Tensor.reshape | |
Concat | 1. torch.concatenate 2. torch.concat 3. torch.cat | |
Squeeze | 1. torch.squeeze | |
UnSqueeze | 1. torch.unsqueeze | |
Add | 1. torch.add 2. + operator | Output tensor dimensions must be <=5 |
Mul | 1. torch.mul 2. * operator 3. torch.multiply | Output tensor dimensions must be <=5 |
Sub | 1. torch.sub 2. - operator 3. torch.subtract | Output tensor dimensions must be <=5 |
Div | 1. torch.div 2. / operator 3. torch.divide | Output tensor dimensions must be <=5 |
Slice | 1. Tensor[x0:y0, ..., xn:yn] | |
ReduceSum | 1. torch.sum | |
ReduceMean | 1. torch.mean | |
Exp | 1. torch.exp | |
Log | 1. torch.log | |
Sqrt | 1. torch.sqrt | |
Square | 1. torch.square | |
Reciprocal | 1. torch.reciprocal | |
Neg | 1. torch.neg 2. torch.negative | |
Rsqrt | 1. torch.rsqrt | |
Abs | 1. torch.abs 2. torch.absolute | |
Pow | 1. torch.pow | |
UpSample | 1. torch.nn.functional.upsample 2. torch.nn.functional.upsample_nearest | Only the scale_factor param is supported; scale_h and scale_w must be the same; input tensor dimension must be 4 |
Split | 1. torch.split | |
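The MaxPool2d/AvgPool2d rules are the most intricate in the table (window must equal stride, input must divide evenly), so they can be checked mechanically before export. The helper below is an illustrative sketch, not part of the official toolchain; it restates the pooling constraints above.

```python
def pool2d_supported(kernel_h, kernel_w, stride_h, stride_w,
                     in_h, in_w, dilation=1):
    """Check a MaxPool2d/AvgPool2d config against the GX8002 limits above."""
    return (kernel_h <= 15 and kernel_w <= 15
            and not (kernel_h == 1 and kernel_w == 1)  # window can't be 1x1
            and kernel_h == stride_h and kernel_w == stride_w
            and in_h % kernel_h == 0 and in_w % kernel_w == 0
            and dilation == 1)

print(pool2d_supported(2, 2, 2, 2, 64, 64))  # True
print(pool2d_supported(2, 2, 1, 1, 64, 64))  # False: stride must equal kernel
print(pool2d_supported(1, 1, 1, 1, 64, 64))  # False: 1x1 window not allowed
```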
6. List of op operators supported by the NPU for GX8010/GX8009/GX8008
OP name | Restrictions |
---|---|
Abs | |
Add | |
AddN | |
All | The value of the second parameter must be determined at compile time |
Any | The value of the second parameter must be determined at compile time |
Assert | |
AvgPool | Computation is more efficient when the data format is NCHW; stride<=63 |
BatchMatMul | The value of the second parameter (weight) must be determined at compile time |
BatchToSpaceND | The value of the second and third parameters must be determined at compile time |
BiasAdd | |
Cast | only supports compile-time evaluation |
Concat | Dimension information must be determined at compile time |
ConcatV2 | Dimension information must be determined at compile time |
Const | |
Conv2D | The value of the second parameter (weight) must be determined at compile time; computation is more efficient when the data format is NCHW; convolution kernel H<=11, W<=11, with higher efficiency when H and W are equal and odd; stride<=63 |
Conv2DBackpropInput | The values of the first and second parameters must be determined at compile time; only the NCHW data format is supported; convolution kernel H<=11, W<=11, with higher efficiency when H and W are equal and odd; stride<=63 |
DepthwiseConv2dNative | The value of the second parameter (weight) must be determined at compile time; only the NCHW data format is supported; convolution kernel H<=11, W<=11, with higher efficiency when H and W are equal and odd; stride<=63 |
Div | |
Enter | |
Equal | |
Exit | |
Exp | |
ExpandDims | The value of the second parameter must be determined at compile time |
Fill | only supports compile-time evaluation |
FloorDiv | |
FloorMod | |
Gather | The value of the second parameter must be determined at compile time |
GatherV2 | The value of the second and third parameters must be determined at compile time |
GreaterEqual | |
Identity | |
Less | |
LessEqual | |
ListDiff | only supports compile-time evaluation |
Log | |
LogSoftmax | only supports compile-time evaluation |
LogicalAnd | |
LogicalNot | |
LogicalOr | |
LogicalXor | |
LoopCond | |
MatMul | The value of the second parameter (weight) must be determined at compile time |
Max | The value of the second parameter must be determined at compile time |
MaxPool | Computation is more efficient when the data format is NCHW; stride<=63 |
Maximum | |
Mean | The value of the second parameter must be determined at compile time |
Merge | |
Min | The value of the second parameter must be determined at compile time |
Minimum | |
Mul | |
Neg | |
NextIteration | |
Pack | |