Skip to content

Latest commit

 

History

History
216 lines (130 loc) · 10.8 KB

yq-linkvoice.md

File metadata and controls

216 lines (130 loc) · 10.8 KB

Intelligent voice based on AliOS Things

With the progress of AI technology, intelligent voice is no longer a rigid tool, but more like a life assistant, bringing people more convenient and interesting experience. "Help me open the air conditioner.", "Do I need an umbrella tomorrow?", "Where is my express?"... In IoT era, all your needs can be realized by your sentence. The Link Voice SDK integrated in AliOS Things can achieve intelligent voice interaction.

Ali intelligent voice service

Ali intelligent voice service can enable voice interaction and smart home control, and can customize device skills (for example, it can manipulate running machines, massage chairs and other devices by voice). Services provided by us Include :

  • General services: search for songs, columns, radio stations, weather; refer to encyclopedia, four arithmetic operation table, etc.
  • Services provided by Ali: smart home control, recharge mobile phone, Tmall shopping, etc. (You need to connect to our account system, which can refer to SDS access);
  • Private service: equipment control, customer service inquiry, etc. (If customized skills are needed, please provide product requirements when signing the contract).

Functions integration

To use Ali Link Voice service, you need to integrate Alink SDK and Link Voice SDK. Alink SDK provides devices with access to Ali IoT platform, account system, network distribution, OTA, etc. Link Voice SDK provides devices with Ali intelligent voice service. Devices should first integrate Alink SDK to be a device in SDS platform and then integrate Link Voice SDK to use intelligent voice service. Besides using Alink to manage devices in the platform, Link Voice SDK needs modules as listed in the table to complete its work.

As listed below, WebSockets is used to complete voice data interaction; opus is used to transform voice recording from PCM format to opus format (the server can only receive opus format); Cjson is used for JSON analysis; Mbedtls is used to encrypt the connection in underlying layer of Alink and WebSockets, securing data transmission.

依赖组件

AliOS Things has integrated all these modules in the list, so you can directly develop intelligent voice service based on it. Suggestion for MCU capability :

  • Flash>=512KB
  • RAM>=200KB
  • CPU>=180Mhz

Flow chart of an intelligent voice task

It simplifies buffer processing and other details:

img

Preparation for development platform

You need development boards that meet the performance requirements mentioned in the previous section, together with audio recording and playing functions. Also, you need to complete AliOS Tings migration adaptation. This article takes xr871evb as an example (OS matching has been completed). The resources of our platform:

  • CPU : 192Mhz cortext-M4f
  • RAM:448KB(some code related to hardwares need to be operated in ram; about 280KB can actually be used)
  • FLASH:2MB SPI FLASH

Environment setup and code compilation

You should first set up AliOS development environment ( take linux as an example): AliOS-Things-Linux-Environment-Setup

Then download the latest version of AliOS Things source code in github (https://github.com/alibaba/AliOS-Things): also take linux as an example :

`git clone [email protected]:alibaba/AliOS-Things.git`

switch to master branch:

`git checkout master`

We suggest you to add your development branch based on the master one :

`git checkout –b dev-xxx(yourname):`

Till now on, environment has been set up, code has been prepared and you can start compilation and programming test. Example procedure of link-voice test :

`aos make linkvoiceapp@xr871evb xr871=1`

Program your code:

`cd platform/mcu/xr871/tools/`

Modify serial port configuration :

`vim settings.ini`  

Replace the serial poet number with your board's, you can check it in ls /dev/tty*. Reserve and exit. uart settingBurn your code in. You should first turn the switch to ON position as is shown in the picture :flash settingThen execute :

`./phoenixMC_linux`  to start code burning. After burning,  set the switch back to the serial port position and restart.

Open minicom or other serial port supervision device to input information, with baud rate as 115200. Operation example: start network distribution after first power on :

`netmgr connect ssid psswd`

Replace ssid and psswd with the name and password of your wireless network. When connecting to network, the following notice will appear in your terminal :press keyPress button 2 (AK2) to enable speech recognition function. talkingAt this time, speak to the development board, and your voice data will be encoded, uploaded to the cloud, and then corresponding feedback will return no matter the identification has succeeded or failed.

Example: 1: Do I need an umbrella tomorrow? 2. Tell me a ghost story. 3. Recommend me a suspense movie. 4. Open the air conditioner.

Introduction to API

1)Initiate

int pal_init(const struct pal_config *config);

  • Description: SDK initiation. You only need to call it once.
  • Parameters:config transfer necessary parameters to SDK.
  • Returns:0 on success, negative error on failure.

2)Destroy

void pal_destroy();

  • Description : destroy SDK and release the resources.
  • Parameters:no
  • Returns:no

3)Get SDK version number

int pal_version();

  • Description : return SDK version number
  • Parameters : no
  • Returns : SDK version number

4)Set log level

void pal_set_log_level(int level);

  • Description:set SDK log level. In testing stage, you can set it as PAL_LOG_LEVEL_DEBUG to make testing easier. After testing, set it as PAL_LOG_LEVEL_ERROR.
  • Parameters:level is the log level of SDK.
  • Returns:no

5)Environment setup

void pal_set_env(int env);

  • Description:set up SDK environment, and it is PAL_ENV_RELEASE by fault. You can connect to Alibaba online environment in vendors' environment. There is no need for vendors to use this interface.
  • Parameters:env is the environment of SDK
  • Returns : no

6)Vendor player notify SDK of message

int pal_notify_msg(const char *msg);

  • Description:vendor players notify SDK of its status or key events in json format as defined in Link_Voice_SDK_播控协议_v1.0.0.xlsx.
  • Parameters:msg is the message that vendors want to report to SDK
  • Returns:0 on success, negative error on failure.

7)Vendor player post ALink message through SDK

int pal_post_alink_msg(const char *msg);

  • Description : ALINK will be initiated during SDK initiation, and keeps the connection to Alibaba platform. Vendors can post message to ALink through this interface, where SDK will post it to ALink server side.
  • Parameters:msg is the message that vendors want to post to ALink through SDK. The format is the message format defined by ALink.
  • Returns:0 on success, negative error on failure.

8)Start speech recognition

int pal_asr_start();

  • Description : It will be called when speech recognition function is triggered through button or far field awakening.
  • Parameters:no
  • Returns:0 on success, negative error on failure.

9)Send out voice data

int pal_asr_send_buffer(const char *buffer, int buffer_len);

  • Description:This interface should be called when calling of pal_asr_start has been completed. When calling, voice data will be sent. If the data is in PCM format, it should be 640 bytes. If return is PAL_VAD_STATUS_STOP, which means the voice is detected by the cloud to end, vendors can call pal_asr_stop or pal_asr_stop_async to obtain the recognition result.
  • Parameters:buffer is voice data. buffer_len is the length and byte of this voice data.
  • Returns:the VAD status detected by the cloud

10)End voice recognition (synchro interface)

struct pal_rec_result* pal_asr_stop();

  • Description : As for synchro interface, pal_rec_result will return the result of voice recognition. "Status" shows the statues of voice recognition; "should_restore_player_status" shows whether the vendor player recover to its former status when completing voice recognition, where 0 means not recover and 1 means recover; "asr_result" shows the text recognized by ASR; "task_status" shows the status of the task; "PAL_REC_TASK_STATUS_END" shows the end of this task; "PAL_REC_TASK_STATUS_WAITING" shows it's a multi - wheel dialogue.
  • Parameters:no
  • Returns:the result of voice recognition

11)End voice recognition(Asynchronous interface)

void pal_asr_stop_async(pal_asr_callback callback, void *user);

  • Description:Interface for asynchronous return, whose function is similar to pal_asr_stop.
  • Parameters:callback is a registered callback function, which is used to return recognition result; user is user-defined pointer, which will be returned to users in callback.
  • Returns:no

12)Cancel voice recognition

void pal_asr_cancel();

  • Description:Cancel voice recognition
  • Parameters:no

  • Returns:no

13)Destroy the result of voice recognition

void pal_rec_result_destroy(struct pal_rec_result *result);

  • Description:pal_asr_stop and pal_asr_stop_async can return the result of voice recognition. You can use this interface to release resources.
  • Parameters:result is the result that need to be destroyed.
  • Returns:no

14)Transform text to voice (synchronous interface)

struct pal_rec_result* pal_get_tts(const char *text);

  • Description:Transform text to voice. The return result, pal_rec_result, needs to be destroyed by pal_rec_result_destroy.
  • Parameters:text is the text that needs to be transformed
  • Returns:The returned structure. The voice transformed from text is a playable url in tts field.