# NPU Compiler Usage
Before using the NPU compiler gxnpuc, please read the following two technical documents carefully:
## 1. Introduction to gxnpuc Toolchain
gxnpuc compiles model files into npu files that can run on the NPU.
```
$ gxnpuc -h
usage: gxnpuc [-h] [-V] [-L] [-v] [-m] [-c CMD [CMD ...]] [-w]
              [config_filename]

NPU Compiler

positional arguments:
  config_filename       config file

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -L, --list            list supported ops
  -v, --verbose         verbosely list the processed ops
  -m, --meminfo         verbosely list memory info of ops
  -c CMD [CMD ...], --cmd CMD [CMD ...]
                        use command line configuration
  -w, --weights         print compressed weights(GRUS only)
```
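For example, as noted in the usage restrictions in section 4, combining `--list` with `-c` prints the OPs supported on a given chip:

```
$ gxnpuc --list -c LEO
```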
### 1.1 Configuration File (config_filename) Explanation
Configuration Item | Options | Description |
---|---|---|
CORENAME | LEO | Chip model (LEO for 8008/8009) |
PB_FILE | | PB file containing the weights |
OUTPUT_FILE | | Output filename after compilation |
NPU_UNIT | NPU32 | NPU model (must be NPU32) |
COMPRESS | true / false | Whether to enable fully connected weight compression |
COMPRESS_QUANT_BITS | 8 | Number of quantization bits for compression; the LEO NRE chip only supports 8 bits |
COMPRESS_TYPE | LINEAR / GAUSSIAN | Linear or Gaussian compression. Linear compression is more accurate, but its compression rate is lower than Gaussian compression |
OUTPUT_TYPE | c_code | Currently must be c_code |
CONV2D_COMPRESS | true / false | Whether to enable convolution weight compression (default: false) |
INPUT_OPS | op_name: [shape] ... | Input OP names and their shapes |
OUTPUT_OPS | [out_op_names, ...] | List of output OP names |
FP16_OUT_OPS | [out_op_names, ...] | OPs in this list output float16; OPs not in the list output float32 |
FUSE_BN | true / false | Whether to fold BN parameters into the convolution (default: false; 1.5.2rc6 and above) |
WEIGHT_CACHE_SIZE | | SRAM size to use when weights must be split between SRAM and flash (1.5.3rc6 and above) |
## 2. Compiling the Model

### 2.1 Preparing the Model File
- Prepare the PB and CKPT files generated by TensorFlow, or the model files generated by the saved_model method.
- Generate a frozen PB file using TensorFlow's freeze_graph.py script (see the sketch after this list).
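A minimal sketch of a freeze_graph.py invocation, assuming TensorFlow 1.13; the file names and the output node name (`output`) are placeholders for your own model:

```
$ python freeze_graph.py \
    --input_graph=model.pb \
    --input_binary=true \
    --input_checkpoint=model.ckpt \
    --output_node_names=output \
    --output_graph=frozen_model.pb
```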
### 2.2 Writing the Configuration File
- Write the yaml configuration file, specifying the PB file name, the output file name and type, whether to compress, the input node names and their shapes, the output node names, etc. (a sketch follows).
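A minimal config.yaml sketch using the configuration items from section 1.1; the file names, OP names, and input shape are placeholders:

```yaml
CORENAME: LEO
PB_FILE: frozen_model.pb
OUTPUT_FILE: model.c
NPU_UNIT: NPU32
COMPRESS: true
COMPRESS_QUANT_BITS: 8
COMPRESS_TYPE: LINEAR
OUTPUT_TYPE: c_code
INPUT_OPS:
  input: [1, 1, 16, 16]
OUTPUT_OPS: [output]
```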
### 2.3 Compiling to Generate the Model File
Compile using the following command:
```
$ gxnpuc config.yaml
```
**Note:** The NPU toolchain must be used in an environment with TensorFlow installed. For the format of the generated model file, see NPU Model Format.
## 3. Model Optimization

To make the model run more efficiently on the NPU, some optimizations are needed.
- The data formats of convolution and downsampling OPs should be NCHW (see the sketch after this list). The optimization process is described here.
- The dimensions of each Placeholder must be fully specified.
- The shape of each OP must be determinable, i.e., any OP values related to shapes must be fixed at compile time.
- It is not recommended to run Softmax on the NPU, because the NPU uses the FP16 data format, which can easily overflow.
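A minimal TensorFlow 1.x sketch of the first two points above (NCHW data format and fully specified Placeholder dimensions), assuming TF 1.13; the OP names and shapes are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x (e.g., 1.13)

# Placeholder with every dimension fixed, so all downstream OP shapes
# can be determined at compile time. Layout is NCHW: [1, 1, 16, 16].
inp = tf.placeholder(tf.float32, shape=[1, 1, 16, 16], name="input")

# Convolution in NCHW format, as the NPU requires.
w = tf.get_variable("w", shape=[3, 3, 1, 8])  # HWIO filter layout
conv = tf.nn.conv2d(inp, w, strides=[1, 1, 1, 1], padding="SAME",
                    data_format="NCHW")

# Downsampling (max pooling) also in NCHW; ksize/strides are [N, C, H, W].
pool = tf.nn.max_pool(conv, ksize=[1, 1, 2, 2], strides=[1, 1, 2, 2],
                      padding="VALID", data_format="NCHW", name="output")
```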
## 4. LEO NPU Usage Restrictions
- Only TensorFlow 1 is supported; TensorFlow 1.13 is recommended.
- Conv2D, Pool, BiasAdd, and other OPs must use the NCHW format.
- Using Softmax in the model is not recommended.
- Conv2D, DepthwiseConv2d, and similar operators are split into 32 × 32 two-dimensional matrix multiply-accumulate operations, so it is best not to use too many convolution channels; otherwise the generated instructions become large.
- Other OP restrictions can be viewed with `gxnpuc --list -c LEO`.