Support for ESP32-S3-BOX peripherals + voice_assistant #2239
Comments
Thanks for this overview! I have already created a component for the touchscreen in esphome/esphome#4793, which is working fine on my Box and is ready for review. I am currently working on the I2C control component for the ES8311 (no PR yet; I'm still trying to figure out the best solution for MCLK).
Also, the ILI9342C driver needs some additions to allow enabling x-mirroring for the ESP32-S3-BOX. I have started implementing that, and a PR will follow as well. For now, you can check out my sample config linked in esphome/esphome#4793 (comment), which I will update periodically.
Awesome, nice progress. There is a very rough implementation for the ES8388 here; I'm not sure how helpful it is, as the register map is quite different and MCLK is currently hard-coded. It's also worth looking into how Willow and the default firmware handle MCLK. I think the codec has a mode in which it can derive its LRCLK and BCLK from its MCLK/SCLK and distribute them to the ADC on the board. That may be necessary if MCLK has to be synchronous with LRCLK and BCLK? I'm not too familiar with I2S or the ESP32/ESPHome implementation of it. Also, is it worth trying to get I2S support for the esp-idf framework as well, to make hacking in wake word stuff with esp-adf/esp-sr easier later? I haven't looked into this much, but maybe not, since I think I saw an Arduino framework wrapper for esp-adf somewhere.
Yeah, the ES8311 can theoretically work without a dedicated MCLK by generating it internally from SCLK, but since the ESP32-S3-BOX has MCLK wired to GPIO2 anyway, I guess we should figure out the best way to implement that in ESPHome. My ES8311 branch is at https://github.com/kroimon/esphome/tree/es8311 if you're interested, but it's still a WIP and a few days away from a proper PR. Also, getting the whole I2S stuff working on esp-idf would be great, because we could probably integrate libraries such as WakeNet much more easily. However, I could not even get a very simple ESPHome config to run on my S3, because it kept resetting due to a watchdog. I did not debug this any further, because esp-idf wasn't of much use without the I2S components anyway.
Adding MCLK to i2s_audio seems like the most straightforward path to me. Is there any case where two devices might share an LRCLK and BCLK but have different MCLKs? I simply added MCLK as an optional parameter for i2s_audio: esphome/esphome@dev...rpatel3001:esphome:add_i2s_mclk
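For anyone unsure how MCLK relates to the other I2S clocks discussed above, here is a small C++ sketch of the usual arithmetic, assuming standard I2S framing: LRCLK equals the sample rate, BCLK is sample rate × bits per slot × slots, and MCLK is sample rate × an MCLK multiple (256 is a common choice). The struct and function names are hypothetical illustrations, not ESPHome or ESP-IDF APIs; the codec datasheet is the authority on which MCLK/LRCLK ratios it actually accepts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types; not part of ESPHome or ESP-IDF.
struct I2SClocks {
  uint32_t lrclk_hz;  // frame (word select) clock: one period per sample frame
  uint32_t bclk_hz;   // bit clock: one period per transmitted bit
  uint32_t mclk_hz;   // master clock fed to the codec
};

// Assumes standard I2S framing with `slots` channels of `bits_per_slot` bits each.
inline I2SClocks compute_i2s_clocks(uint32_t sample_rate_hz, uint32_t bits_per_slot,
                                    uint32_t slots, uint32_t mclk_multiple) {
  assert(sample_rate_hz > 0 && bits_per_slot > 0 && slots > 0 && mclk_multiple > 0);
  return I2SClocks{sample_rate_hz, sample_rate_hz * bits_per_slot * slots,
                   sample_rate_hz * mclk_multiple};
}

// Sanity check, assuming the codec wants integer clock ratios: BCLK should be an
// integer multiple of LRCLK, and MCLK an integer multiple of BCLK.
inline bool clocks_are_integer_related(const I2SClocks &c) {
  return c.bclk_hz % c.lrclk_hz == 0 && c.mclk_hz % c.bclk_hz == 0;
}
```

For the 16 kHz, 16-bit stereo case this gives BCLK = 512 kHz and MCLK = 4.096 MHz, an MCLK/BCLK ratio of 8. Under this framing, the same 256× multiple with 24-bit slots would not give an integer ratio, which is one reason codecs list several allowed multiples.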
Also, I forked your box.yaml gist to add the RGB LED that comes with the kit, invert the sense of the settings button, and add my MCLK change and your ES8311 components.
I've successfully gotten Home Assistant to stream TTS and radio audio to the ESP-BOX using the config in my gist. The volume is quite low, though I expect your work on the codec interface will help with that.
I probably won't have time to look into it before Friday afternoon, but that already sounds awesome!
I started some ADC code at https://github.com/rpatel3001/esphome/tree/es7210. This I2S stuff makes very little sense to me right now: the frequencies I measure on the pins are not at all what the i2s components appear to configure. It's difficult to debug the ADC without access to the raw audio; trying to send it to the Home Assistant pipeline with Whisper actually causes an error in Whisper, so it's clearly doing something wrong. Also, the ADC datasheet is terrible. I could only find a register map on a sketchy Chinese site by googling, and it's version 2.0, compared to the most recent version 23 (which has no registers).
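Since it's hard to tell from the voice pipeline whether the ADC data is sane, a quick local check on a raw 16-bit PCM buffer can help. All zeros, a near-flat RMS, or many samples pinned at the int16 rails can point to a format or slot mismatch rather than a real signal. This is a generic C++ sketch with hypothetical names; how you get at the raw samples (for example from a microphone data callback) is deliberately left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical summary of a raw PCM buffer, used only for debugging.
struct PcmStats {
  double rms;      // root-mean-square level, in raw sample units
  int32_t peak;    // largest absolute sample value
  size_t clipped;  // samples sitting exactly at INT16_MAX or INT16_MIN
  bool all_zero;   // true if every sample is 0 (e.g. reading the wrong slot)
};

PcmStats analyze_pcm16(const std::vector<int16_t> &samples) {
  PcmStats s{0.0, 0, 0, true};
  if (samples.empty()) return s;
  double sum_sq = 0.0;
  for (int16_t v : samples) {
    const int32_t wide = v;              // widen so abs(-32768) does not overflow
    const int32_t mag = std::abs(wide);
    sum_sq += static_cast<double>(wide) * wide;
    if (mag > s.peak) s.peak = mag;
    if (v == INT16_MAX || v == INT16_MIN) ++s.clipped;
    if (v != 0) s.all_zero = false;
  }
  s.rms = std::sqrt(sum_sq / static_cast<double>(samples.size()));
  return s;
}
```

Logging these numbers while speaking versus staying silent should show a clear difference if the microphone path is configured correctly.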
Dumping some thoughts here, stream-of-consciousness style: I think ideally i2s_audio would have options for MCLK frequency and sample rate, and those would be pulled into i2s_audio_media_player and i2s_audio_microphone to set up the I2S peripheral, in the same way the pin numbers are currently pulled in. The DAC and ADC I2C components would need options as well, to set up the chips correctly based on the clock settings. It's unclear to me why i2s_audio_microphone and i2s_audio_speaker call esp-idf I2S functions but i2s_audio_media_player does not. Is the media player library handling it internally? How do these two components work together?
The media player library is a little difficult that way. It's kind of a black box right now: we tell it what to play and it just does it.
This is because the Audio library handles the streaming, decoding, and playing to I2S. It's not the best solution, but it was the easiest at the time, given the timeframe I had. The weird thing is that the library actually supports passing the I2S data to a callback function instead of sending it out, but it still insists on setting up the I2S peripheral itself 🤦
I mean, we could probably make changes to the Audio library, as it is already a modified fork. The question is how close to upstream you want it to stay.
I spent some more time learning the inner workings of I2S and how the different components use it right now. The following is a list of findings and 'challenges' I ran into:

The main issue is that there is currently no central instance that controls the parameters of the I2S bus. This makes it very hard to implement external ADCs and DACs whose configuration depends on the current clock speeds and sampling rates. Those audio codec components need a central instance they can register with for configuration change events, so the new settings can be forwarded to the external controllers.

With the current architecture, there is also no way to operate the same I2S port in full duplex. The In general, full-duplex operation can only work if input and output use the same clock parameters. The microphone and speaker components currently use a fixed 16000 Hz sampling rate at 16 bits per sample, while the media player switches sampling rates based on the file or stream currently being played.

ESP-IDF 5.0 introduced the concept of 'channels' in the new I2S driver, which would make full-duplex operation somewhat easier. (For reference, the latest currently available version of arduino-esp32, 2.0.9, is based on ESP-IDF 4.4.4.)

In summary, I think we need a major refactoring of the
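The 'central instance' described above could look roughly like the following C++ sketch: one object owns the clock settings for an I2S port, codec components register a callback to be told when those settings change, and a change is refused while both directions (full duplex) are using the port. Everything here, including `I2SBus`, `request_config`, and the simple active-user count, is hypothetical and not an existing ESPHome API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical clock settings shared by everything on one I2S port.
struct I2SBusConfig {
  uint32_t sample_rate_hz;
  uint8_t bits_per_sample;
  uint8_t channels;
  bool operator==(const I2SBusConfig &o) const {
    return sample_rate_hz == o.sample_rate_hz &&
           bits_per_sample == o.bits_per_sample && channels == o.channels;
  }
};

// Hypothetical central owner of an I2S port's parameters (not an ESPHome API).
class I2SBus {
 public:
  using Listener = std::function<void(const I2SBusConfig &)>;

  explicit I2SBus(I2SBusConfig initial) : config_(initial) {}

  // External codecs (e.g. an ES8311 DAC or ES7210 ADC driver) register here so
  // they can reprogram their registers whenever the clocks change.
  void add_on_config_change(Listener listener) { listeners_.push_back(std::move(listener)); }

  // Speaker, microphone, or media player components mark themselves as active.
  void acquire() { ++active_users_; }
  void release() {
    assert(active_users_ > 0);
    --active_users_;
  }

  // Ask for new settings. Refused while more than one user (e.g. mic + speaker
  // in full duplex) is active, since both directions must share one clock.
  bool request_config(const I2SBusConfig &wanted) {
    if (wanted == config_) return true;
    if (active_users_ > 1) return false;
    config_ = wanted;
    for (const auto &listener : listeners_) listener(config_);
    return true;
  }

  const I2SBusConfig &config() const { return config_; }

 private:
  I2SBusConfig config_;
  std::vector<Listener> listeners_;
  int active_users_ = 0;
};
```

A real implementation would also have to reconfigure the I2S peripheral itself and decide which component gets priority; the sketch only shows the registration and notification pattern.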
See how many ideas are out there for the media player in ESPHome:
@rpatel3001 I found the full datasheet for the ES7210 here (Backup). I continued a bit on your work in my branch, mostly formatting and cleanup for now.
I made the "mistake" of trying to save some bucks and bought the ESP32-S3-Box-Lite instead of the full one. That one has no touchscreen but three additional buttons, and (apparently) an ST7789V display instead of an ILI9342C. For some reason I can't yet explain, I can show things on the display using the ILI9342C configuration from @kroimon (https://gist.github.com/kroimon/f6692879f9c00702990801ae9dfa433b); it just doesn't need the mirroring, but the colors are somehow offset (e.g. Red is (255, 255, 0), Green is (255, 0, 255) and Blue is (0, 255, 255), while White and Black come out as expected). I haven't managed to show anything useful using the standard st7789 component. Does anyone have an idea why this would be? Is it worth tracking S3-Box-Lite support here as well, or would it be better to create a separate feature request? (Most of the components would be the same anyway.)
The peripherals of the ESP32-S3-Korvo-1 seem to be really similar to the ESP32-S3-BOX as well. One main difference is that the ES7210 is on a different I2S bus from the ES8311. I have an ESP32-S3-Korvo-1 running this config, and the LED ring and buttons are working. Audio is not working at all yet, so I'm not sure whether I have the two I2S buses configured correctly or maybe two Waiting on an ESP32-S3-BOX to be able to do more testing, but the Korvo is currently in stock on Amazon for 50 USD if anyone else is curious about it.
@guillempages I can add the Lite's display to the top post, but I can't promise anyone will work on it, as I don't have a Lite to play with. You'll probably get more visibility/help by creating a bug report for the st7789 component. @mattkasa I think two I2S buses ought to work, but I'm not totally sure. Does the codec work by itself if you comment out the ADC config? The current tip of the ES8311 PR sets the volume to 0; try an earlier commit or my es8311 branch for now.
@rpatel3001 I'm testing like this, but I have no idea how
Not getting any audible sound, but logs look like:
So I wonder if it's just my
Hm, I can't speak to speaker.play; I've been using Home Assistant to send audio to the media_player component. Do you at least get clicks when the PA is muted/unmuted? Maybe try the media player component as well, since its I2S code is different.
Ah yeah, I'm using the edit: I tried building with arduino to test with
I did some testing with i2s_audio_speaker and it seems to be partially working (on Arduino). With a much longer data vector (8k samples = half a second; a full second crashed the board when played), I mostly just hear clicks, but occasionally the tone plays for a fraction of the duration. Interestingly, the tone is twice the frequency it should be, which may be a clue about what's wrong. I also tried compiling a barebones config for esp-idf, but it bootloops. Fixed the bootloop with
but then it just hangs after booting. I haven't found a fix for that; it does this even with the most recent esp-idf version/platform_version.
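One possible explanation for the doubled pitch, not confirmed anywhere in this thread, is that a mono buffer is being consumed by an I2S channel configured for stereo: each frame then takes two samples, so each channel hears every other sample, and the tone plays at twice the frequency for half the duration. A 2x sample-rate mismatch would sound the same. The following C++ sketch, with hypothetical names, generates a 16-bit test tone either as mono or as duplicated left/right pairs and computes the pitch you would hear in each case. At 16 kHz, 8000 frames is the half second mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generate a 16-bit sine test tone. If `stereo` is true, each sample is written
// twice (left, right), so one I2S frame consists of two samples.
std::vector<int16_t> make_tone(double freq_hz, double sample_rate_hz, size_t frames,
                               bool stereo, double amplitude = 0.5) {
  assert(amplitude >= 0.0 && amplitude <= 1.0);
  const double kTwoPi = 6.283185307179586;
  std::vector<int16_t> out;
  out.reserve(stereo ? frames * 2 : frames);
  for (size_t n = 0; n < frames; ++n) {
    const double x = amplitude * 32767.0 * std::sin(kTwoPi * freq_hz * n / sample_rate_hz);
    const int16_t s = static_cast<int16_t>(std::lround(x));
    out.push_back(s);
    if (stereo) out.push_back(s);
  }
  return out;
}

// Pitch heard when a buffer generated as MONO at gen_rate is read as interleaved
// frames of playback_channels samples at playback_rate: each channel then sees
// every playback_channels-th sample, scaled again by the rate ratio.
double heard_frequency(double freq_hz, double gen_rate, double playback_rate,
                       int playback_channels) {
  return freq_hz * (playback_rate / gen_rate) * playback_channels;
}
```

If the stereo version of the tone plays at the right pitch and the mono one does not, that would point to a channel/slot mismatch rather than a clock problem.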
@rpatel3001 for esp-idf try:

```yaml
esp32:
  board: esp32s3box
  framework:
    type: esp-idf
  variant: ESP32S3
```

I was able to get Arduino working on the Korvo with this:

```yaml
esp32:
  board: esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type: arduino
```

And
Adding the variant and/or changing the board didn't change anything unfortunately. |
Huh, it was resolved?
@rpatel3001 I use HA OS, so I have Home Assistant running in a VM on my hypervisor. I can send audio to the ESP32 box (cloud TTS works, as you pointed out; Piper does not work yet, and it's about 5x as slow anyway). If my HA instance is accessible over the internet, is there a port I need to forward or hard-code somewhere to get it to work?
Just to chime in here: I have been working on getting the S3-BOX working as a voice assistant for Home Assistant.
@jesserockz Great work, it seems to work well in VAD mode (I'm assuming remote wake word needs a dev build of HA?). I tried to get the tt21100 touchscreen working at the same time, but there seems to be a conflict between the i2c component and esp-adf, where the microphone and speaker don't provide/produce any audio:
@llamaonaskateboard I managed to get wake word working simply by installing the openWakeWord add-on (https://github.com/rhasspy/hassio-addons), without any issues.
Hm... so now I have to decide between using the wake word or the display...
I seem to have the same problem (BLE proxy + VAD/wake word coexistence) on an M5Stack Echo. I wanted to make a super-duper sensor controller out of it, and I have it running fine as a BLE proxy + mmWave sensor node. But when I additionally add the VAD/wake word stuff (and avoid the switch behaviour clash), the sensor either doesn't do anything or goes haywire, needing a physical reboot. Sad ... :(
Adding my experience here as well... I'm using the example provided by ESPHome (nearly verbatim) to get the ESP32-S3-Box working as a voice assistant. Here's what works as of Nov 2023:
Here's what I'm struggling to get working:
I have all of the same results as you. I get
I used the standard config as linked in the HA docs (plus my Wi-Fi info): https://github.com/esphome/firmware/blob/main/voice-assistant/esp32-s3-box.yaml
I'd be curious whether others have the same issue.
Once all features are implemented, will this also work with the newest model: ESP32-S3-Box3? https://github.com/espressif/esp-box/blob/master/docs/hardware_overview/esp32_s3_box_3/hardware_overview_for_box_3.md
I have the new ESP32-S3-Box-3 and currently have Willow installed, but I would much rather use ESPHome. I'd be happy to try a build on it and provide feedback if it helps.
@rpatel3001 I've been testing your es7210 component on a Lilygo T-Embed ESP32-S3, and it's been working well for me (I've had some issues with other aspects of the T-Embed, but mic input via the ES7210 has worked without issue). Would you mind putting up a PR for the es7210 component? It'd be nice to have it in ESPHome rather than having to use your fork as an external component.
@pauln I think the ES8311 and ES7210 components are somewhat redundant now that esp-adf is being integrated. The ES8311 PR went stale and was closed. Your best bet is probably to make a PR adding the Lyra to the esp-adf supported boards after testing that it works. In other news, I was unable to get the esp-adf example YAML to work; it complained at runtime about needing a patch to be applied to some FreeRTOS files. The ES8311 and ES7210 external components worked fine, however, with both the Arduino and esp-idf frameworks. I expect this is an issue with my local install, since others have reported it working.
Interesting. My Box device works incredibly well with that voice assistant example, apart from the fact that there's not much headroom, processor- and memory-wise. Audio responses are somewhat cut off, and there's no way to use it for announcements because there's no media_player component for ADF, but it's still usable. I replaced the static icon in the idle state with a clock, and have focused on the Assist capabilities themselves so far.
@rpatel3001 I've been keeping half an eye on the esp-adf PR, but it seems to be taking its time to land. If it'll be usable for all boards with these audio devices, having one component that handles them all rather than one per audio device does sound a bit cleaner. But my understanding is that it (currently, at least) uses board configs from esp-adf, and there isn't one for the T-Embed. None of the various Lyra configs (nor any of the others) seem to be particularly close matches either. Was there a specific one you thought might be suitable?
Hm, no, there wasn't; I didn't realize esp-adf had a specific subset of supported boards. The ES7210 and ES8311 PRs in this issue basically reimplement the esp-adf drivers. There's probably a way to instantiate the esp-adf ones directly instead, if you're inclined to figure out how and make your own PR.
Curious if you could share your config for displaying the clock instead of the static icon.
I created a fork and added my changes. See THIS commit, but first read the following:
Cheers!
@formatBCE thanks a ton! I was able to modify the existing default firmware by extending some of the properties, so I didn't have to fork the repo.
Are you keeping an eye on all the related experimental components in the esphome organization's new voice-kit repository? https://github.com/esphome/voice-kit https://github.com/esphome/voice-kit/tree/dev/esphome/components
Is there currently a way to use the ESP32-S3-BOX-3B as a media player and voice assistant?
Yes, using either the work-in-progress Voice Kit components or @gnumpi's add stack.
Is there already a YAML I can add to the voice box template? I can't get it to work.
There is this custom one, https://github.com/BigBobbas/ESP32-S3-Box3-Custom-ESPHome by @BigBobbas, which includes media_player + voice assistant + customisable touchscreen controls.
Wow, trying it out now.
Describe the problem you have/What new integration you would like
Main features: support for peripherals on the ESP32-S3-BOX dev kit:
ICM-42607-P IMU (I'm going to ignore this; I don't think most people have any use for it)
To get voice_assistant working:
Architectural changes to support wakeword and esp-idf framework (probably out of scope here and will be transferred to a new issue or 3 once the S3-BOX works for on-demand voice commands):
Please describe your use case for this integration and alternatives you've tried:
Use the peripherals on the board. Working on-demand voice_assistant.
Additional context
This device has recently gotten a bit of attention due to posts about Willow on Hacker News and elsewhere. Willow is fantastic, but I'd like to be able to use the full extent of existing ESPHome components, and I bet others would too. Adding hardware peripherals is the smallest part of this; wake word detection is the major missing feature needed to make ESPHome a viable alternative (out of scope for this feature request, though).
Reference links:
https://github.com/espressif/esp-box
https://github.com/toverainc/willow
https://github.com/hugobloem/esp-ha-speech
espressif/esp-dev-kits#24 (comment)
https://components.espressif.com/components/espressif/es8311
https://components.espressif.com/components/espressif/es7210
https://github.com/espressif/esp-bsp/
https://github.com/espressif/esp-adf/