# NPU Compiler Usage
Before using the NPU compiler gxnpuc, please read the following two technical documents carefully:
## 1. Introduction to gxnpuc Toolchain
gxnpuc compiles model files into npu files that can run on the NPU.
```
$ gxnpuc -h
usage: gxnpuc [-h] [-V] [-L] [-v] [-m] [-c CMD [CMD ...]] [-w]
              [config_filename]

NPU Compiler

positional arguments:
  config_filename       config file

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -L, --list            list supported ops
  -v, --verbose         verbosely list the processed ops
  -m, --meminfo         verbosely list memory info of ops
  -c CMD [CMD ...], --cmd CMD [CMD ...]
                        use command line configuration
  -w, --weights         print compressed weights(GRUS only)
```
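For example, as noted in the usage restrictions in section 4, combining `--list` with `-c` prints the OPs supported on a given chip:

```
$ gxnpuc --list -c LEO
```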
### 1.1 Configuration File (config_filename) Explanation
Configuration Item | Options | Description |
---|---|---|
CORENAME | LEO | Chip model (LEO for 8008/8009) |
PB_FILE | | PB file containing the weights |
OUTPUT_FILE | | Output filename after compilation |
NPU_UNIT | NPU32 | NPU model (must be NPU32) |
COMPRESS | true / false | Whether to enable fully connected weight compression |
COMPRESS_QUANT_BITS | 8 | Number of quantization bits for compression; the LEO NRE chip only supports 8 bits |
COMPRESS_TYPE | LINEAR / GAUSSIAN | Linear or Gaussian compression. Linear compression is more accurate, but its compression rate is lower than Gaussian compression |
OUTPUT_TYPE | c_code | Currently must be c_code |
CONV2D_COMPRESS | true / false | Whether to enable convolution weight compression (default: false) |
INPUT_OPS | op_name: [shape] ... | Input OP names and their shapes |
OUTPUT_OPS | [out_op_names, ...] | List of output OP names |
FP16_OUT_OPS | [out_op_names, ...] | OPs in this list output float16; OPs not in the list output float32 |
FUSE_BN | true / false | Whether to fold BN parameters into the convolution (default: false; 1.5.2rc6 and above) |
WEIGHT_CACHE_SIZE | | SRAM size to use when weights must be split between SRAM and flash (1.5.3rc6 and above) |
## 2. Compiling the Model

### 2.1 Preparing the Model File
- Prepare the PB and CKPT files generated by TensorFlow, or the model files generated by the saved_model method.
- Generate a frozen PB file using TensorFlow's freeze_graph.py script (see the sketch after this list).
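A minimal sketch of a freeze_graph.py invocation, assuming TensorFlow 1.13; the file names and the output node name (`output`) are placeholders for your own model:

```
$ python freeze_graph.py \
    --input_graph=model.pb \
    --input_binary=true \
    --input_checkpoint=model.ckpt \
    --output_node_names=output \
    --output_graph=frozen_model.pb
```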
### 2.2 Writing the Configuration File
- Write the yaml configuration file, specifying the PB file name, the output file name and type, whether to compress, the input node names and their shapes, the output node names, etc. (a sketch follows).
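A minimal config.yaml sketch using the configuration items from section 1.1; the file names, OP names, and input shape are placeholders:

```yaml
CORENAME: LEO
PB_FILE: frozen_model.pb
OUTPUT_FILE: model.c
NPU_UNIT: NPU32
COMPRESS: true
COMPRESS_QUANT_BITS: 8
COMPRESS_TYPE: LINEAR
OUTPUT_TYPE: c_code
INPUT_OPS:
  input: [1, 1, 16, 16]
OUTPUT_OPS: [output]
```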
### 2.3 Compiling to Generate the Model File
Compile using the following command:
```
$ gxnpuc config.yaml
```
**Note:** The NPU toolchain must be used in an environment with TensorFlow installed. For the format of the generated model file, see NPU Model Format.
## 3. Model Optimization

To make the model run more efficiently on the NPU, some optimizations are needed.
- The data formats of convolution and downsampling OPs should be NCHW (see the sketch after this list). The optimization process is described here.
- The dimensions of each Placeholder must be fully specified.
- The shape of each OP must be determinable, i.e., any OP values related to shapes must be fixed at compile time.
- It is not recommended to run Softmax on the NPU, because the NPU uses the FP16 data format, which can easily overflow.
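A minimal TensorFlow 1.x sketch of the first two points above (NCHW data format and fully specified Placeholder dimensions), assuming TF 1.13; the OP names and shapes are illustrative:

```python
import tensorflow as tf  # TensorFlow 1.x (e.g., 1.13)

# Placeholder with every dimension fixed, so all downstream OP shapes
# can be determined at compile time. Layout is NCHW: [1, 1, 16, 16].
inp = tf.placeholder(tf.float32, shape=[1, 1, 16, 16], name="input")

# Convolution in NCHW format, as the NPU requires.
w = tf.get_variable("w", shape=[3, 3, 1, 8])  # HWIO filter layout
conv = tf.nn.conv2d(inp, w, strides=[1, 1, 1, 1], padding="SAME",
                    data_format="NCHW")

# Downsampling (max pooling) also in NCHW; ksize/strides are [N, C, H, W].
pool = tf.nn.max_pool(conv, ksize=[1, 1, 2, 2], strides=[1, 1, 2, 2],
                      padding="VALID", data_format="NCHW", name="output")
```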
## 4. LEO NPU Usage Restrictions
- Only TensorFlow 1 is supported; TensorFlow 1.13 is recommended.
- Conv2D, Pool, BiasAdd, and other OPs must use the NCHW format.
- Using Softmax in the model is not recommended.
- Conv2D, DepthwiseConv2d, and similar operators are split into 32 × 32 two-dimensional matrix multiply-accumulate operations, so it is best not to use too many convolution channels; otherwise the generated instructions become large.
- Other OP restrictions can be viewed with `gxnpuc --list -c LEO`.