NPU Model Deployment Guide

This guide explains how a software engineer deploys, on the SDK, a model trained by an algorithm engineer. It is intended for third-party algorithm companies.

Algorithm engineer role: trains the model
Software engineer role: deploys the model

warn

In most companies, algorithm engineers train the model and software engineers deploy it, although some engineers handle both training and deployment.

1. NPU model compilation [for the algorithm engineer role]

Read the NPU compiler usage guide to compile the NPU model, then publish it to the software engineer role.

2. NPU model format description [for the software engineer role]

After receiving the NPU model, the software engineer needs to understand its format; see the NPU model format description.

3. Guide to model package generation

We provide an automated tool for generating model packages. It is located at lvp_tws/tools/auto_model/. Usage is described below:

3.1 Understand the automated tool

This tool assists in deploying KWS models to LVP projects.
The following files are required to use this tool:
- NPU model file, for example lvp_tws/tools/auto_model/example_npu_model.c:
  - Published by an algorithm engineer
- Automatic deployment configuration file:
  - The default path is config.json. Use the -c parameter to specify another path.
  - The file is a JSON file in the following format:
```
{
    "model_name": "user_model",
    "version": "v0.1.0",
    "source_path": "./example_npu_model.c",
    "support_SoftMax": "Y",
    "normal_ctc": "Y",
    "kws_list_path": "./kws.txt",
    "new_model": "Y",
    "input_stride": 4,
    "decoder": "user"
}
```
  - model_name: deployment name of the model, chosen according to the model's application. For an iteration of an already deployed model, keep the name identical to the existing one. Separate words with underscores if necessary, for example user_model.
  - version: release version of the model, which can be obtained from the model release information. The format is the letter v followed by three dot-separated numbers, for example v0.1.0.
  - source_path: path of the model file. A relative path can be used for a file under the current directory; use an absolute path otherwise.
  - support_SoftMax: whether the model supports SoftMax. The value can be obtained from the model release information. Set it to Y if the model does, N otherwise. Unless instructed otherwise, set it to Y.
  - normal_ctc: whether to use normal_ctc. Set it to Y if required, N otherwise. Unless instructed otherwise, set it to Y.
  - kws_list_path: path of the keyword list file.
  - new_model: set it to Y when deploying a new model and N when deploying an iteration of an existing model, because some files do not need to be regenerated for an iteration.
  - input_stride: input stride of the model, obtained from the model release information. It corresponds to the "PCM Frame Number in a Context" option in the compilation configuration.
  - decoder: user denotes a third-party customer algorithm model.
    
    notice
    
    The fields marked in red in the original document are mandatory. Third-party algorithm companies do not need to change the other fields.
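
The field rules above can be checked mechanically before running the tool. The following is a minimal sketch in C and is not part of the SDK: the AutoModelConfig struct and all function names are hypothetical, and it assumes the configuration has already been parsed from JSON by some other means. It only encodes the constraints stated above (version format, Y/N flags, a positive input_stride).

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory mirror of config.json; not an SDK type. */
typedef struct {
    const char *model_name;
    const char *version;
    const char *source_path;
    const char *support_softmax; /* "Y" or "N" */
    const char *normal_ctc;      /* "Y" or "N" */
    const char *kws_list_path;   /* not validated in this sketch */
    const char *new_model;       /* "Y" or "N" */
    int input_stride;
    const char *decoder;         /* not validated in this sketch */
} AutoModelConfig;

/* True if s is 'v' followed by three dot-separated numbers, e.g. "v0.1.0". */
static bool is_valid_version(const char *s)
{
    if (s == NULL || s[0] != 'v') {
        return false;
    }
    s++;
    for (int part = 0; part < 3; part++) {
        if (!isdigit((unsigned char)*s)) {
            return false;
        }
        while (isdigit((unsigned char)*s)) {
            s++;
        }
        if (part < 2) {
            if (*s != '.') {
                return false;
            }
            s++;
        }
    }
    return *s == '\0';
}

/* True if s is exactly "Y" or "N". */
static bool is_yes_no(const char *s)
{
    return s != NULL && (strcmp(s, "Y") == 0 || strcmp(s, "N") == 0);
}

/* Checks only the constraints stated in the field descriptions. */
static bool config_is_plausible(const AutoModelConfig *c)
{
    return c != NULL
        && c->model_name != NULL && c->model_name[0] != '\0'
        && is_valid_version(c->version)
        && c->source_path != NULL && c->source_path[0] != '\0'
        && is_yes_no(c->support_softmax)
        && is_yes_no(c->normal_ctc)
        && is_yes_no(c->new_model)
        && c->input_stride > 0;
}
```

With the sample values from the configuration file shown earlier, config_is_plausible returns true, while a version such as v0.1 is rejected.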

3.2 Generate model packages using the automated tool

After preparing the configuration file as described in the previous section, run auto_model to generate the model package:
```
$ ./auto_model
```
Deployment:
- After the program runs successfully, two folders, lvp/ and simulate/, are generated in ./output
- lvp/ and simulate/ are used for deploying the model to lvp_tws/ and vpa_simulate/, respectively
- Taking lvp/ as an example:
  - It contains a subfolder named after model_name
  - Inside the model_name folder there is a subfolder named after version
  - If a new model is deployed, there are three additional files: kws_list.h (not generated for third-party customer algorithm models), kws.name, and kws_version.list
  - For a new model deployment, simply copy the model_name folder to lvp_tws/lvp/vui/kws/models
  - For an iteration of an existing model, simply copy the version folder to lvp_tws/lvp/vui/kws/models/
- The deployment path for simulate/ is vpa_simulate/vpa/lvp/src/vui/kws/models/

3.3 Select the model package in menuconfig

make menuconfig → VUI Settings as shown below:

notice

For vpa_simulate, use make menuconfig → LVP Settings → VUI Settings.

Keyword Decoder Type: must be set to User Decoder
KeyWord select: choose the model package you want, which is user_model in this guide

important

If your model is not an RNN, do not check the Model Use Recurrent Neural Network option in the figure above.

4. Directory structure of the model package

This section describes the directory structure produced by the steps in section 3 and explains the files it contains:

4.1 Directory structure

lvp_tws/lvp/vui/kws/models/user_model
```
$ tree
.
├── kws.name
├── kws_version.list
└── v0.1.0
    ├── Kconfig # Model package parameters, such as the SNPU buffer size and the number of model input frames
    ├── Makefile
    ├── ctc_model.c # API of the model package: model initialization, model version, input and output dimensions, and so on
    ├── ctc_model.h
    ├── kws_version.name
    └── model.h # Model file; it contains more information than example_npu_model.c (compare the two to see the differences)
```

4.2 Kconfig of the model package

lvp_tws/lvp/vui/kws/models/user_model
```
menu "Model Param Setting:"
    depends on LVP_KWS_USER_MODEL_V0DOT1DOT0_2022_0331
    config NORMAL_CTC_SCORE
        default y

    config KWS_MODEL_SUPPORT_SOFTMAX
        default y

    # Byte
    config KWS_SNPU_BUFFER_SIZE
        default 2996

    # Frames
    config KWS_MODEL_FEATURES_DIM_PER_FRAME
        default 40

    config KWS_MODEL_INPUT_STRIDE_LENGTH
        default 4

    config KWS_MODEL_INPUT_WIN_LENGTH
        default 15

    config KWS_MODEL_OUTPUT_LENGTH
        default 65

    config KWS_MODEL_DECODER_STRIDE_LENGTH
        int "KWS Lantency (unit of Context)"
        default 1
        range 1 4

    config KWS_MODEL_DECODER_WIN_LENGTH
        int "KWS Model Decoder Window Length (unit of context)"
        default 25
endmenu
```

KWS_SNPU_BUFFER_SIZE: equal to the size of the in_out structure in model.h. The snpu_buffer of each context has this size. For details, see lvp_tws/include/lvp_buffer.h.
KWS_MODEL_FEATURES_DIM_PER_FRAME: equal to the size of the last dimension of npu_data_t Feats[1][15][40] in model.h.
KWS_MODEL_INPUT_STRIDE_LENGTH: the number of frames updated on each run of the NPU model. It is equal to input_stride in the config.json used by the automated deployment tool.
KWS_MODEL_INPUT_WIN_LENGTH: the total number of input frames of the NPU model.
KWS_MODEL_OUTPUT_LENGTH: the output dimension of the NPU model.
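
One way to picture the relationship between KWS_MODEL_INPUT_WIN_LENGTH and KWS_MODEL_INPUT_STRIDE_LENGTH is a sliding window over feature frames: on each run, the oldest 4 frames leave a 15-frame window and 4 new frames are appended. The sketch below illustrates that reading only. The function name is hypothetical, float stands in for npu_data_t (whose real type is not given here), and the SDK's actual buffer management may differ.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from the Kconfig defaults above. */
#define FEATURES_DIM_PER_FRAME 40 /* KWS_MODEL_FEATURES_DIM_PER_FRAME */
#define INPUT_STRIDE_LENGTH     4 /* KWS_MODEL_INPUT_STRIDE_LENGTH */
#define INPUT_WIN_LENGTH       15 /* KWS_MODEL_INPUT_WIN_LENGTH */

/*
 * Hypothetical helper: drop the oldest INPUT_STRIDE_LENGTH frames from the
 * window and append the new ones at the end.
 * new_frames holds INPUT_STRIDE_LENGTH * FEATURES_DIM_PER_FRAME values,
 * stored frame by frame.
 */
static void feats_window_push(float window[INPUT_WIN_LENGTH][FEATURES_DIM_PER_FRAME],
                              const float *new_frames)
{
    const size_t keep = INPUT_WIN_LENGTH - INPUT_STRIDE_LENGTH;

    /* Shift the frames that stay in the window toward the front. */
    memmove(window[0], window[INPUT_STRIDE_LENGTH], keep * sizeof window[0]);
    /* Copy the new frames into the freed slots at the end. */
    memcpy(window[keep], new_frames, INPUT_STRIDE_LENGTH * sizeof window[0]);
}
```

After one call, frame 4 of the old window becomes frame 0, and the four new frames occupy positions 11 to 14.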

important

The value of I/O Buffer Settings → PCM Frame Number in a Context must be equal to KWS_MODEL_INPUT_STRIDE_LENGTH.
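
A mismatch between these two values can be caught at build time rather than at run time. The following is a minimal sketch using a C11 static assertion. PCM_FRAME_NUM_PER_CONTEXT is a hypothetical macro name standing in for the "PCM Frame Number in a Context" option, and both values are hard-coded here for illustration; in a real project they would come from the generated configuration.

```c
/* Hypothetical stand-ins for values from the generated configuration. */
#define PCM_FRAME_NUM_PER_CONTEXT     4 /* hypothetical name for "PCM Frame Number in a Context" */
#define KWS_MODEL_INPUT_STRIDE_LENGTH 4 /* default from the model package Kconfig */

/* The build fails with this message if the two settings diverge. */
_Static_assert(PCM_FRAME_NUM_PER_CONTEXT == KWS_MODEL_INPUT_STRIDE_LENGTH,
               "PCM Frame Number in a Context must equal KWS_MODEL_INPUT_STRIDE_LENGTH");
```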

4.3 ctc_model.c and model.h of the model package

This section describes the API of ctc_model.c. Please read the NPU model format description first.

API	Description
int LvpModelGetOpsSize(void)	Returns the model ops size
int LvpModelGetDataSize(void)	Returns the model data size
int LvpModelGetTmpSize(void)	Returns the model tmp size
int LvpModelGetCmdSize(void)	Returns the model cmd size
int LvpModelGetWeightSize(void)	Returns the model weight size
void LvpSetSnpuTask(GX_SNPU_TASK* snpu_task)	Input: snpu_task (SNPU task structure containing the model's parameter information). Output: none. Function: copies the snpu_task to be initialized into the model package's global s_snpu_task
int LvpCTCModelInitSnpuTask(GX_SNPU_TASK *snpu_task)	Input: snpu_task (output parameter). Returns: 0 on success, -1 on failure. Function: assigns the global s_snpu_task to the output parameter snpu_task
const char *LvpCTCModelGetKwsVersion(void)	Returns the model version defined in model.h
void *LvpCTCModelGetSnpuOutBuffer(void *snpu_buffer)	Returns the address of the last-layer output in snpu_buffer
void *LvpCTCModelGetSnpuFeatsBuffer(void *snpu_buffer)	Returns the address of the input (feats) in snpu_buffer
void *LvpCTCModelGetSnpuStateBuffer(void *snpu_buffer)	Returns the address of the state in snpu_buffer
unsigned int LvpCTCModelGetSnpuFeatsDim(void)	Returns the dimension of the model feature input (feats)
unsigned int LvpCTCModelGetSnpuStateDim(void)	Returns the dimension of the model state

5. Third-party model package decoder

The GX8002 chip SDK framework also provides a porting interface for the decoder of a third-party model package, named LvpDoUserDecoder. Engineers only need to port their decoder on top of this interface.

lvp/vui/kws/user_decoder.c
```
int LvpDoUserDecoder(LVP_CONTEXT *context)
{
#ifdef CONFIG_LVP_ENABLE_KEYWORD_RECOGNITION
    /* Invalidate the data cache so the CPU reads the NPU's latest output. */
    gx_dcache_invalid_range((uint32_t *)context->snpu_buffer, context->ctx_header->snpu_buffer_size);
    /* Address of the model's last-layer output inside the SNPU buffer. */
    float *output = (float *)LvpCTCModelGetSnpuOutBuffer(context->snpu_buffer);
    printf("moudel output: %f , %f\n", output[0], output[1]);
    /*
     * TODO: decode the model output
     */
#endif
    return 0;
}
```

output is the raw output of the model.
Once a keyword is determined to be activated, the interface only needs to assign the ID of the activation word to context->kws.
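
As an illustration of the decoding step left as a TODO, the sketch below picks the highest-scoring output and reports it only if it reaches a threshold. This is a deliberately simple argmax-with-threshold rule, not CTC decoding and not the SDK's method. DemoContext is a hypothetical stand-in for LVP_CONTEXT that models only the kws field, and the mapping from output index to keyword ID (index + 1) and the function names are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for LVP_CONTEXT; only the kws field is modeled. */
typedef struct {
    int kws;
} DemoContext;

/*
 * Return the index of the highest score that is at least `threshold`,
 * or -1 if no score reaches it. On ties, the lowest index wins.
 */
static int pick_keyword(const float *output, size_t length, float threshold)
{
    int best = -1;

    for (size_t i = 0; i < length; i++) {
        if (output[i] >= threshold && (best < 0 || output[i] > output[best])) {
            best = (int)i;
        }
    }
    return best;
}

/*
 * Assumed mapping for this sketch: output index i activates keyword ID i + 1.
 * context->kws is left unchanged when nothing is activated.
 */
static void demo_decode(DemoContext *context, const float *output,
                        size_t length, float threshold)
{
    int index = pick_keyword(output, length, threshold);

    if (index >= 0) {
        context->kws = index + 1;
    }
}
```

A real decoder would typically accumulate evidence over several contexts before declaring an activation; the single-frame rule here only shows where context->kws is assigned.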

6. Sample steps for model package deployment

6.1 Generate the model package

By default, lvp_tws/tools/auto_model contains an NPU model file published by the algorithm engineer, example_npu_model.c. Follow section 3.2 to generate the model package with the automated tool.

6.2 vpa_simulate simulator

6.2.1 Copy the model

Copy the model package generated for the vpa_simulate simulator (lvp_tws/tools/auto_model/output/simulate/user_model) to the vpa_simulate/vpa/lvp/src/vui/kws/models/ directory.

6.2.2 Select the model package

There are two methods in this example:
- Method 1: Follow the menuconfig model package selection steps in section 3.3 and select the user_model model package.
- Method 2: Use the default configuration file provided by vpa_simulate (vpa_simulate/vpa/lvp/configs/user_model_decoder.config).

6.2.3 Compile and run

$ make menuconfig # Exit and save
$ make
$ ./output/vpa_main -f audio/tmjl.wav
audio_data/tmjl.wav
[2022-03-31 20:16:05][WAV]riff_id:          RIFF
[2022-03-31 20:16:05][WAV]riff_size:        266276
[2022-03-31 20:16:05][WAV]riff_format:          WAVE
[2022-03-31 20:16:05][WAV]format_id:        fmt
[2022-03-31 20:16:05][WAV]format_size:          16
[2022-03-31 20:16:05][WAV]compression_code:     1
[2022-03-31 20:16:05][WAV]channels:         1
[2022-03-31 20:16:05][WAV]sample_rate:          16000
[2022-03-31 20:16:05][WAV]average_bytes_per_second:     32000
[2022-03-31 20:16:05][WAV]block_align:      2
[2022-03-31 20:16:05][WAV]bits_per_sample:      16
[2022-03-31 20:16:05][WAV]data_id:          data
[2022-03-31 20:16:05][WAV]data_size:        266240
[2022-03-31 20:16:05]
[2022-03-31 20:16:05]wav_duration:8320 ms, duration:8320 ms
[2022-03-31 20:16:05][LVP_TWS]Kws Version: [user_model_v0.1.0_2022_0331]
[2022-03-31 20:16:05]Enter: ThreadRunAudioInRecoderSim, 60000
[2022-03-31 20:16:05]Enter: ThreadRunLvpSim
[2022-03-31 20:16:05][LVP_AUD]vad param 3 [0]
[2022-03-31 20:16:05][LVP_TWS]Ctx:0, Vad:1, Ns:0
moudel output: 0.000000 , 0.000157
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.011093
moudel output: 0.000068 , 0.003191
moudel output: 0.000337 , 0.033539
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.001714
moudel output: 0.000000 , 0.000815
moudel output: 0.000794 , 0.002771
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000114
[2022-03-31 20:16:06][LVP_TWS]Ctx:15, Vad:1, Ns:0

6.3 lvp_tws solution

6.3.1 Copy the model

Copy the generated model package (lvp_tws/tools/auto_model/output/lvp/user_model) to lvp_tws/lvp/vui/kws/models/.

6.3.2 Select the model package

There are two methods in this example:
- Method 1: Follow the menuconfig model package selection steps in section 3.3 and select the user_model model package.
- Method 2: Use the default configuration file provided by lvp_tws (lvp_tws/configs/grus_user_model_decoder.config).

6.3.3 Compile, flash and run

Compile:
```
$ make menuconfig # Exit and save
$ make
```
Flash: see the Serial port upgrade document.

Run:

Serial port output:

[LVP]Low-Power Voice Preprocess
[LVP]Copyright (C) 2001-2020 NationalChip Co., Ltd
[LVP]ALL RIGHTS RESERVED!
[LVP]Board Model: [grus_gx8002b_dev_1v]
[LVP]MCU Version: [bcd1a41]
[LVP]Release Ver: [0x42555858]
[LVP]Build Date : [2022-03-31, 20:21:35]
[LVP]Flash vendor:[PUYA]
[LVP]Flash type:  [p25q40l]
[LVP]Flash ID:    [0x856013]
[LVP]Flash size:  [520192 Byte]
[LVP]CPU   Freq:  [8192000 Hz][fix]
[LVP]SRAM  Freq:  [8192000 Hz]
[LVP]NPU   Freq:  [8192000 Hz]
[LVP]FLASH Freq:  [24576000 Hz]
[LVP]Ldo   Trim:  [924 mV]
[LVP_KWS ]Kws Use:237 ms
[AB]amic          [1 Channel]
[AB]amic pga_gain:[24 dB]
[AB]amic Ain_gain:[0 dB]
[LVP_AUD]vad param 3 [0]
[LVP_TWS]Ctx:0, Vad:1, Ns:0, R:0
moudel output: 0.000063 , 0.000669
moudel output: 0.000000 , 0.000540
moudel output: 0.000000 , 0.001191
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000523
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000299
moudel output: 0.000000 , 0.003605
moudel output: 0.000000 , 0.001276
moudel output: 0.000000 , 0.013138
moudel output: 0.000000 , 0.000000
moudel output: 0.000000 , 0.000138
moudel output: 0.000126 , 0.001716
moudel output: 0.000000 , 0.000000