VSP Algorithm Porting Guide

To help algorithm engineers deploy their own algorithms faster, some of the configurations in this guide are based on the GX8008C_Wukong_Prime development board, referred to below as the Wukong development board. If you do not have this board, please contact our local sales personnel.

1. Development board system block diagram:

2. Example firmware compilation and download

2.1 Example configuration

You can modify the following configurations according to your needs.

example_lib: an algorithm package that includes TX and RX processing
- TX: a simple multichannel mixing algorithm that mixes the input n-channel microphone data as output
- RX: applies a gain to the RX audio
8008c_wukong_v1.4_example_lib_16k_2Amic_1Aref_UAC4ch_SPKtxout.config
- Sampling rate: 16k
- UAC 1.0 sound card mode
- UAC upstream output 4-channel data: 1ch-TX_OUT, 2ch-Amic, 1ch-Aref
- AudioOut outputs TX_OUT
8008c_wukong_v1.4_example_lib_16k_2Amic_1Aref_UAC6ch_SPKrxout.config
- Sampling rate: 16k
- UAC 1.0 sound card mode
- UAC upstream output 6-channel data: 1ch-TX_OUT, 1ch-RX_OUT, 2ch-Amic, 1ch-Aref, 1ch-RX_IN
- AudioOut outputs RX_OUT
8008c_wukong_v1.4_example_lib_16k_4Dmic_1Aref_UAC6ch_SPKtxout.config
- Sampling rate: 16k
- UAC 1.0 sound card mode
- UAC upstream output 6-channel data: 1ch-TX_OUT, 4ch-Dmic, 1ch-Aref
- AudioOut outputs TX_OUT
8008c_wukong_v1.4_example_lib_16k_4Dmic_1Aref_UAC8ch_SPKrxout.config
- Sampling rate: 16k
- UAC 1.0 sound card mode
- UAC upstream output 8-channel data: 1ch-TX_OUT, 1ch-RX_OUT, 4ch-Dmic, 1ch-Aref, 1ch-RX_IN
- AudioOut outputs RX_OUT
8008c_wukong_v1.4_example_lib_48k_UAC2ch_SPKrxout2ch.config
- Sampling rate: 48k
- UAC 1.0 sound card mode
- UAC upstream output 2-channel data: 2ch-RX_OUT, UAC downstream dual-channel 48k
- AudioOut outputs RX_OUT

2.2 How to compile:

After downloading vsp_sdk to your local computer, execute the following commands in the vsp_sdk directory:
```
$ cp configs/example_lib/8008c_wukong_v1.4_example_lib_16k_2Amic_1Aref_UAC4ch_SPKtxout.config .config
$ make menuconfig # Open menuconfig, save and exit
$ make clean;
$ make
```
After compilation completes, the generated firmware is in the output directory: mcu_nor.bin is the MCU firmware, dsp.fw is the DSP firmware, and vsp.bin is the merged firmware containing both.

2.3 How to burn firmware [two methods]:

Burn vsp.bin:

$ sudo tools/bootx/bootx -m leo_mini -tu -c "download 0 output/vsp.bin;reboot"

Burn mcu_nor.bin and dsp.fw separately:
```
$ cd tools/bootx
$ ./flash_nor_mini.sh
```
After the download is complete, the PC recognizes the board as a sound card device, and you can record from it with Audacity (a recording tool).

3. Configuration of input and output channels

Execute make menuconfig and enter VSP I/O Buffer settings:

Channel settings: Set the mic and ref channels your algorithm requires, and configure the output channels the same way. For UAC recording, you must select Interlaced output, check Interlaced OUT Channels, and set the number of channels to record over UAC.
Frame settings: Configure the sampling rate and frame length.
Context settings: The DSP algorithm processes data one context at a time; set Frame Number in a Context as needed.

4. How to build your own algorithm package

The vsp_sdk/dsp/vpa directory contains independent algorithm packages, and you select the one to build through menuconfig. This section uses example_lib as the template for building your own algorithm package.

Copy the example_lib algorithm directory and rename it after the algorithm to be ported, for example gx_lib.
Modify vpa.name, Makefile, and Kconfig in the gx_lib directory, mainly to update paths and macros:
- Replace the contents of vpa.name with
```
config VSP_VPA_GX_LIB
  bool "GX [Library]"
```
- Replace VSP_VPA_EXAMPLE_LIB in Kconfig with VSP_VPA_GX_LIB
- Replace CONFIG_VSP_VPA_EXAMPLE_LIB in Makefile with CONFIG_VSP_VPA_GX_LIB, replace SRC_DIR=vpa/example_lib with SRC_DIR=vpa/gx_lib
After building the algorithm package, execute make menuconfig and select the newly added gx_lib under Voice Process Algorithm select; the firmware you compile then uses your new algorithm package.

5. Algorithm porting

5.1 Directory introduction

The entire directory structure of example_lib is as follows. The highlighted parts are the parts that engineers need to focus on.

dsp/vpa/example_lib
├── include # header file directory
│ ├── usr_alg.h # The external header file of the algorithm
│ └── vsp_algorithm.h
├── Kconfig
├── lib
│ └── libusr_alg.a # The library file generated by compiling the algorithm source code
                   # The source code in usr_alg will be compiled first to generate the library file in the current directory.
                   # If it is already a library file, put the library file here directly, ignoring the directory usr_alg
├── Makefile
├── src
│ ├── usr_alg # The source code of the algorithm can be stored here to facilitate unified maintenance and tailoring
│ │ ├── Makefile
│ │ └── usr_alg.c
│ ├── vsp_algorithm.c # Algorithm integration code is placed here for easy unified maintenance
│ └── vsp_process.c
└── vpa.name

5.1 Algorithm initialization

Place the algorithm's initialization code in the VspInitialize interface, which is called once after power-on.

XIP_TEXT_ATTR int VspInitialize(VSP_CONTEXT_HEADER *context_header)
{
    VspDoExampleInit(context_header);
    return 0;
}

IRAM0_TEXT_ATTR int VspDoExampleInit(VSP_CONTEXT_HEADER *context_header)
{
    /* Algorithm initialization code */
    return 0;
}

5.2 Algorithm processing

The entry function for algorithm processing is VspProcessActive in vsp_process.c; all algorithm processing happens in this function. After initialization it is called back once per context, at the configured context duration. For example, if Frame Number in a Context is 2 and the frame length in Frame settings is 16 ms, VspProcessActive is called back every 32 ms.
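The relationship between the frame settings and the callback period can be sketched as below. This is only an illustration: the `ContextTiming` struct is a hypothetical stand-in that reuses the field names of `VSP_CONTEXT_HEADER` seen in the TX example, and it assumes `frame_length` is expressed in milliseconds.

```c
#include <assert.h>

/* Hypothetical mirror of the timing fields in VSP_CONTEXT_HEADER. */
typedef struct {
    int sample_rate;            /* Hz, e.g. 16000 */
    int frame_length;           /* ms, e.g. 16 (assumed unit) */
    int frame_num_per_context;  /* frames per context, e.g. 2 */
} ContextTiming;

/* Period between two VspProcessActive callbacks, in milliseconds. */
int context_period_ms(const ContextTiming *t)
{
    assert(t->frame_length > 0 && t->frame_num_per_context > 0);
    return t->frame_length * t->frame_num_per_context;
}

/* Number of samples per channel delivered in one context. */
int context_sample_num(const ContextTiming *t)
{
    int samples_per_frame;

    assert(t->sample_rate > 0);
    samples_per_frame = t->frame_length * t->sample_rate / 1000;
    return samples_per_frame * t->frame_num_per_context;
}
```

With a 16 kHz sampling rate, 16 ms frames, and 2 frames per context, each callback would cover 32 ms and 512 samples per channel.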

5.2.1 TX Processing

IRAM0_TEXT_ATTR int VspDoTxExampleProc(VSP_CONTEXT *context, int *output_index)
{
    VSP_CONTEXT_HEADER *ctx_header  = context->ctx_header;
    int frame_length               = ctx_header->frame_length * ctx_header->sample_rate / 1000;
    int context_sample_num          = frame_length * ctx_header->frame_num_per_context;

    int ref_num                    = ctx_header->ref_num;
    int mic_num                    = ctx_header->mic_num;
    int channel_num                = mic_num + ref_num;

    short *mic_buffer[mic_num];
    short *ref_buffer[ref_num];

    short out[context_sample_num];

    int i, j;

    for (i = 0; i < mic_num; i++) {
        mic_buffer[i] = VspProcessGetMicFrame(context,i,0); //Get N mic data
    }
    for (i = 0; i < ref_num; i++) {
        ref_buffer[i] = VspProcessGetRefFrame(context,i,0); //Get N ref data
    }

    for (j = 0; j < context_sample_num; j++) { // Organize mic and ref according to the format required by the algorithm api
        for (i = 0; i < mic_num; i++) {
            all_data[i+j*channel_num] = mic_buffer[i][j];
        }
        for (i = 0; i < ref_num; i++) {
            all_data[mic_num+i+j*channel_num] = ref_buffer[i][j];
        }
    }

    // TX data processing. eg:
    MixAudio((short *)all_data, (short *)out, mic_num, ref_num, context_sample_num);

    memcpy(context->out_buffer, out, context_sample_num*sizeof(short)); // Copy algorithm output data to context->out_buffer
    return 0;
}
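The guide does not show the body of MixAudio. As a rough sketch of what a simple multichannel mix could look like, the hypothetical `MixAudioSketch` below averages the mic channels of a buffer interleaved the same way as `all_data` above (mic samples first, then ref samples, for each sample index) and ignores the ref channels. The name and the averaging strategy are assumptions, not the SDK's implementation.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for MixAudio: for each sample index, average the
 * mic channels of the interleaved input and write one mono output sample.
 * Ref channels are present in the layout but are not mixed in.
 */
void MixAudioSketch(const short *all_data, short *out,
                    int mic_num, int ref_num, int sample_num)
{
    int channel_num = mic_num + ref_num;
    int i, j;

    assert(mic_num > 0);
    for (j = 0; j < sample_num; j++) {
        int sum = 0;
        for (i = 0; i < mic_num; i++) {
            sum += all_data[i + j * channel_num];
        }
        /* The average of 16-bit samples always fits in 16 bits. */
        out[j] = (short)(sum / mic_num);
    }
}
```

Averaging rather than summing keeps the result inside the 16-bit range without a separate saturation step.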

5.2.2 RX Processing

IRAM0_TEXT_ATTR int VspDoRxExampleProc(VSP_CONTEXT *context, int *output_index)
{
    VSP_CONTEXT_HEADER *ctx_header  = context->ctx_header;

    if (ctx_header->rx_num == 2) { // Dual channel processing
        VspCopyRxChannelToOut(context, 0, *output_index);
        VspCopyRxChannelToOut(context, 1, *output_index+1);
        // RX data processing. eg:
        VSPDoSetOutputGain(context, *output_index, -3);  // -3 dB gain on RX data
        VSPDoSetOutputGain(context, *output_index+1, -3);// -3 dB gain on RX data

        *output_index += 2;
    }
    else if (ctx_header->rx_num == 1) { // Single channel processing
        VspCopyRxChannelToOut(context, 0, *output_index);
        // RX data processing. eg:
        VSPDoSetOutputGain(context, *output_index, -3);  // -3 dB gain on RX data

        *output_index += 1;
    }

    return 0;
}
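The guide does not show how VSPDoSetOutputGain works internally. As an illustration of applying a fixed dB gain to 16-bit samples, the sketch below scales a buffer by a Q15 fixed-point factor with saturation. The function `apply_gain_q15` and the constant `GAIN_MINUS_3DB_Q15` are hypothetical names introduced for this example.

```c
#include <assert.h>

/* -3 dB as a Q15 factor: round(10^(-3/20) * 32768) = 23198. */
#define GAIN_MINUS_3DB_Q15 23198

/* Scale each 16-bit sample by a Q15 gain factor, saturating on overflow. */
void apply_gain_q15(short *buf, int sample_num, int gain_q15)
{
    int i;

    assert(gain_q15 >= 0);
    for (i = 0; i < sample_num; i++) {
        /* Division truncates toward zero, so positive and negative
           samples are scaled symmetrically. */
        long long scaled = (long long)buf[i] * gain_q15 / 32768;
        if (scaled > 32767) {
            scaled = 32767;
        }
        if (scaled < -32768) {
            scaled = -32768;
        }
        buf[i] = (short)scaled;
    }
}
```

Gains above 0 dB use a factor larger than 32768, which is where the saturation branch matters.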

6. Algorithm-related `helper` interface

All related helper interfaces are in vsp_sdk/dsp/vsp/vsp_helper.c.

Function Interface	Usage
int `VspProcessInvalidateMicBuffer`(VSP_CONTEXT *context)	Invalidate cache data based on corresponding `mic_buffer` address and size in `context`
int `VspProcessWritebackMicBuffer`(VSP_CONTEXT *context)	Write back cache data to `sram` based on corresponding `mic_buffer` address and size in `context`
int `VspProcessInvalidateRefBuffer`(VSP_CONTEXT *context)	Invalidate cache data based on corresponding `ref_buffer` address and size in `context`
int `VspProcessWritebackRefBuffer`(VSP_CONTEXT *context)	Write back cache data to `sram` based on corresponding `ref_buffer` address and size in `context`
int `VspProcessInvalidateRxBuffer`(VSP_CONTEXT *context)	Invalidate cache data based on corresponding `rx_buffer` address and size in `context`
short * `VspProcessGetMicFrame`(VSP_CONTEXT *context, unsigned int channel_num, int frame_index)	`Function`: Get start address of a `frame` of mic data from a `mic channel` in the current `context` `channel_num`: `mic` channel number `frame_index`: Frame number in current `context`
short * `VspProcessGetRefFrame`(VSP_CONTEXT *context, unsigned int channel_num, int frame_index)	`Function`: Get start address of a `frame` of ref data from a `ref channel` in the current `context` `channel_num`: `ref` channel number `frame_index`: Frame number in current `context`
short * `VspProcessGetRxFrame`(VSP_CONTEXT *context, unsigned int channel_num, int frame_index)	`Function`: Get start address of a `frame` of rx data from an `rx channel` in the current `context` `channel_num`: `rx` channel number `frame_index`: Frame number in current `context`
short * `VspProcessGetOutFrame`(VSP_CONTEXT *context, unsigned int channel_num, unsigned int frame_index)	`Function`: Get start address of a `frame` of output data from an `output channel` in the current `context` `channel_num`: `output` channel number `frame_index`: Frame number in current `context`
VSP_CONTEXT * `VspProcessGetContext`(const VSP_CONTEXT *context, unsigned int index)	Function: Get address of the `index`th `context` with current `context` as start point

Note

Before VspProcessActive runs, call VspProcessInvalidateMicBuffer and VspProcessInvalidateRefBuffer to invalidate the relevant cache data on the current DSP. This ensures that the mic and ref data the algorithm reads is up to date.

Note

When using VspProcessGetRxFrame to get the dual-channel UAC downlink data, the data is stored interleaved.
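If an algorithm expects separate left and right buffers, the interleaved data has to be split first. The sketch below assumes the samples alternate as left, right, left, right, starting with the left channel; the guide does not state the channel order, and `deinterleave_stereo` is a hypothetical helper, not part of the SDK.

```c
#include <assert.h>

/*
 * Split an interleaved stereo buffer (assumed order: L0, R0, L1, R1, ...)
 * into two per-channel buffers of samples_per_channel samples each.
 */
void deinterleave_stereo(const short *interleaved, short *left, short *right,
                         int samples_per_channel)
{
    int i;

    assert(samples_per_channel >= 0);
    for (i = 0; i < samples_per_channel; i++) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}
```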

7. Computing power view

The SDK can also report the DSP computing load in real time.

Execute make menuconfig and enter DSP settings.
Enable both Enable Process Cycle Statistic and Enable log printing on DSP; the default options under Enable log printing on DSP are fine. Then recompile the firmware and connect to the DSP serial port, which prints the computing load. A value above 100% means the algorithm needs more computing power than is available and must be optimized.

Reminder: How to measure the computing power of a hot function

We provide unsigned xthal_get_ccount(void), which returns the current value of the CCOUNT register. The DSP increments this register by 1 on every clock cycle, so at a DSP frequency of 400 MHz it advances by 400M per second. To measure a hot function, call xthal_get_ccount() immediately before and after it; the difference between the two values is the number of cycles the function consumed.
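The arithmetic around the two CCOUNT readings can be sketched as follows. The helpers `cycles_elapsed` and `load_percent` are hypothetical, and the sketch assumes that load is expressed relative to the cycles available in one context period; the guide does not define how the printed percentage is computed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles between two CCOUNT readings. Unsigned 32-bit subtraction gives
 * the right answer even if the counter wrapped once between the readings.
 */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/*
 * Share of the cycle budget used, in percent, where the budget is the
 * number of DSP cycles available in one context period (an assumption).
 */
uint32_t load_percent(uint32_t cycles, uint32_t dsp_hz, uint32_t period_ms)
{
    uint64_t budget = (uint64_t)dsp_hz * period_ms / 1000;

    assert(budget > 0);
    return (uint32_t)((uint64_t)cycles * 100 / budget);
}

/*
 * On the DSP, the measurement would look roughly like this
 * (hot_function is a placeholder for the code being measured):
 *
 *   unsigned start = xthal_get_ccount();
 *   hot_function();
 *   unsigned used = cycles_elapsed(start, xthal_get_ccount());
 */
```

At 400 MHz with a 32 ms context, the budget is 12.8M cycles, so a function that takes 6.4M cycles would account for 50% of it.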

8. Memory usage

8.1 SRAM

Both the 8008 and the 8008C have 1536 KB of SRAM, and code runs from SRAM by default. The 1536 KB is shared by the MCU and the DSP: the amount of SRAM given to the DSP is configurable, and the remainder is reserved for the MCU.
Execute make menuconfig, enter DSP settings, and configure (1300) SRAM size kept for DSP(KB).

How to determine whether MCU memory is sufficient

The MCU does not use dynamic memory, so as long as make mcu compiles normally, MCU memory is sufficient.

8.2 DRAM0 and IRAM0

In addition to SRAM, the DSP has 64 KB of DRAM0 and 64 KB of IRAM0. Code and data are placed in SRAM by default; to place them in DRAM0 or IRAM0, use the following macros:

#define IRAM0_TEXT_ATTR __attribute__((section(".iram0.text")))
#define DRAM0_BSS_ATTR __attribute__((section(".dram0.bss")))
#define DRAM0_DATA_ATTR __attribute__((section(".dram0.data")))
#define DRAM0_RODATA_ATTR __attribute__((section(".dram0.rodata")))

Place data in DRAM0, for example:

static short all_data[6*FRAME_LEN] DRAM0_DATA_ATTR;

Place code in IRAM0, for example:

IRAM0_TEXT_ATTR int VspProcessActive(VSP_CONTEXT *context)
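A small sketch of both placements used together is shown below. The `#ifdef __XTENSA__` fallback to empty macros is an addition for this example only, so the snippet also compiles on a host machine; `SKETCH_FRAME_LEN`, `scratch`, and `frame_peak` are hypothetical names.

```c
#include <assert.h>

#ifdef __XTENSA__
#define IRAM0_TEXT_ATTR __attribute__((section(".iram0.text")))
#define DRAM0_DATA_ATTR __attribute__((section(".dram0.data")))
#else
/* Host build: no special sections, the macros expand to nothing. */
#define IRAM0_TEXT_ATTR
#define DRAM0_DATA_ATTR
#endif

#define SKETCH_FRAME_LEN 256  /* hypothetical working-buffer size */

/* Working buffer placed in DRAM0 when built for the DSP. */
static short scratch[SKETCH_FRAME_LEN] DRAM0_DATA_ATTR = {0};

/* Hot function placed in IRAM0: copy a frame and return its peak magnitude. */
IRAM0_TEXT_ATTR int frame_peak(const short *in, int sample_num)
{
    int i;
    int peak = 0;

    assert(sample_num >= 0 && sample_num <= SKETCH_FRAME_LEN);
    for (i = 0; i < sample_num; i++) {
        int mag;

        scratch[i] = in[i];
        mag = scratch[i] < 0 ? -scratch[i] : scratch[i];
        if (mag > peak) {
            peak = mag;
        }
    }
    return peak;
}
```

Because DRAM0 and IRAM0 are only 64 KB each, this kind of placement is best reserved for the buffers and functions on the hottest path.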

8.3 XIP

On the 8008C (not supported on the 8008), you can also use XIP: code that runs infrequently and read-only data can be placed in the XIP segment with the following macros:

#define XIP_TEXT_ATTR __attribute__((section(".xip.text")))
#define XIP_RODATA_ATTR __attribute__((section(".xip.rodata")))

Execute make menuconfig, enter DSP settings, and select [*] Enable XIP.
Place code in XIP, for example:

XIP_TEXT_ATTR int VspInitialize(VSP_CONTEXT_HEADER *context_header)

Place read-only data in XIP, for example:

short data[3] XIP_RODATA_ATTR = {1, 2, 3};

See here for more detailed XIP usage

After the DSP compilation toolchain is installed, open Xplorer and click Help→PDF Documentation to find extensive HiFi4 documentation, including the HiFi4 specifications, the instruction set reference, and the Xtensa compilation toolchain manuals.

10. Xtensa IDE LSP Modification Instructions

Xtensa+IDE+Lsp+Modification Guide.doc