Configuring ReSpeaker XVF3800 as Voice Assistant with ESPHome

This tutorial shows how to configure Seeed Studio’s reSpeaker XMOS XVF3800—featuring the XIAO ESP32-S3 microcontroller and XMOS XVF3800 voice processor—as a voice assistant in Home Assistant using ESPHome.
Tested on a Raspberry Pi 4 Model B running Home Assistant OS 2025.12.3, this setup uses the open-source Vosk toolkit for offline speech-to-text.

XMOS XVF3800 voice processor

The XMOS XVF3800 features cutting-edge algorithms like multi-adaptive beamforming, acoustic echo cancellation (AEC), de-reverberation, noise suppression, direction-of-arrival (DoA), and voice activity detection (VAD) for clear voice pickup in noisy environments.

ESP32-S3 microcontroller

The onboard XIAO ESP32-S3 delivers native Wi-Fi and Bluetooth connectivity, enabling seamless Home Assistant integration via ESPHome for voice-controlled smart home projects.

4-Mic array design

The reSpeaker integrates 4 MEMS microphones that output Pulse-Density Modulation (PDM) signals. PDM microphones offer a high signal-to-noise ratio, high sensitivity, low power consumption, robust resistance to RF interference, and a smooth frequency response.

AIC3104 Low-Power audio codec

The TLV320AIC3104 is a low-power stereo audio codec from Texas Instruments featuring an integrated DAC (Digital-to-Analog Converter), ADC (Analog-to-Digital Converter), Class-D amplifiers, and digital signal processing (DSP).

The codec receives processed audio streams from XVF3800, performs final analog conversion/amplification, and drives the speaker.

I²C and I²S

Though I²C and I²S have similar names and are both serial protocols, they serve distinct purposes with different characteristics.

I²C (Inter-Integrated Circuit) is a multi-device serial protocol for intra-board communication between microcontrollers and peripherals like sensors.

I²S (Inter-Integrated Circuit Sound) is a point-to-point serial protocol for transmitting two-channel, uncompressed digital audio between components like DACs, ADCs, or processors.

ESP32-S3 Pinout

On the ESP32-S3, the following pins are configured for I²C bus:

  • I²C SDA: GPIO5
  • I²C SCL: GPIO6

On the ESP32-S3, the following pins are configured for I²S bus:

  • I²S LRCLK/WS (Left/Right Clock or Word Select) → GPIO7
  • I²S BCLK/SCK (Bit Clock or Serial Clock) → GPIO8
  • I²S MCLK (Master Clock) → GPIO9
  • I²S DIN (Serial Data In) → GPIO43
  • I²S DOUT (Serial Data Out) → GPIO44
Pinout of ESP32-S3 on the ReSpeaker XVF3800

ReSpeaker schematic

XIAO ESP32-S3 connects to both the XVF3800 voice processor and TLV320AIC3104 codec via I²C for configuration and control.

  • ESP32-S3 ↔ I²C : XVF3800 (configuration)
  • ESP32-S3 ↔ I²C : AIC3104 (configuration)

XVF3800 connects to both ESP32-S3 and AIC3104 for digital audio streaming.

  • ESP32-S3 ↔ I²S : XVF3800 (bi-directional audio)
  • XVF3800 → I²S : AIC3104 (playback stream)

The 12 RGB LEDs are controlled by the XVF3800, and the ESP32‑S3 configures them via I²C commands sent to the XVF3800.

The 4 MEMS microphones transmit audio to the XVF3800 using pulse‑density modulation (PDM).

The mute button is directly connected to the XVF3800 GPI (General-Purpose Input) pin.

Both the ESP32-S3 and XVF3800 feature integrated USB connectivity.

ReSpeaker XVF3800 schematic

XVF3800 firmware update

With its default firmware, the XVF3800 mic array needs a 12.288 MHz master clock (MCLK), but ESPHome (used in Home Assistant) cannot generate it because of limitations in its API. The workaround is a modified XVF3800 firmware in which the XVF3800 acts as the I²S master, so it generates its own clocks and no longer needs an MCLK from the ESP32-S3. This is the purpose of the firmware provided by Seeed Studio: it lets the microphone array work properly with Home Assistant.

To upload the XVF3800 firmware to the ReSpeaker, connect it to your computer with a USB cable plugged into the USB port on the side opposite the ESP32-S3 module.

USB connection for XVF3800 firmware update

Put the ReSpeaker in Safe Mode:

  • Power off the device completely.
  • Press and hold the Mute button.
  • While holding the mute button, reconnect the power.

The red LED starts blinking, which confirms that the device is now in Safe Mode. You can now follow the procedure described on the Seeed Studio website.

Under Linux, the procedure is straightforward:

Bash
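sudo apt install dfu-util
# List DFU devices; the XVF3800 in Safe Mode should appear here
sudo dfu-util -l
# -e: detach, -a 1: alternate setting 1, -D: download the firmware file, -R: reset when done
sudo dfu-util -R -e -a 1 -D /path-to-dfu-firmware.bin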

Integration into ESPHome

The first step is to integrate the ReSpeaker into ESPHome and to manually upload the initial ESPHome firmware into its ESP32-S3.

To upload ESPHome firmware to the ReSpeaker, connect it to your computer using a USB cable plugged into the USB port on the ESP32-S3 module.

Add Vosk speech recognition to Home Assistant

The voice assistant requires a speech recognition toolkit (Speech-To-Text, STT) to process voice commands received through the ReSpeaker. Vosk is an open‑source offline speech recognition toolkit that supports more than 20 languages and dialects. It is an excellent choice for Raspberry Pi‑based Home Assistant setups.

You have to install the Vosk add-on from this repository:

Plaintext
https://github.com/rhasspy/hassio-addons

To do so, you must add this repository from the Repositories menu, accessible via the three‑dots menu in Settings → Add‑ons → Add‑on store.

Then install the Vosk add-on:

Then add the Vosk integration to Home Assistant:

Note: Whisper is another local speech-to-text option, but it is considerably slower and does not perform well on a Raspberry Pi 4B.

Add Piper Text-To-Speech to Home Assistant

Vosk provides offline, local speech-to-text recognition. For text-to-speech, the default configuration uses the online Google translation service. If you prefer a fully local and offline assistant pipeline, you can install Piper as a text-to-speech add-on. It runs efficiently on a Raspberry Pi 4, offering excellent performance and low latency.

Piper add-on for local offline Text-To-Speech

Then add the Piper integration to Home Assistant:

Add a voice assistant

Add a voice assistant for the ReSpeaker that uses Vosk for speech-to-text and Piper for text-to-speech:

The entities exposed to the voice assistant are configured in the Expose tab of the Voice assistants menu. I recommend defining short aliases for the entities you want to control by voice; this makes the voice assistant much easier to use.

Example of an alias added to control a bedroom light bulb by voice

ESPHome configuration

The ESPHome YAML configuration required to enable the ReSpeaker XVF3800 as a Home Assistant voice assistant is available in the formatBCE GitHub repository (Respeaker-XVF3800-ESPHome-integration).

Simply copy and paste this configuration into your ReSpeaker setup, then customize the API and OTA settings with the values you defined when creating the device (or stored in your secrets).

After compiling and uploading the configuration to the ReSpeaker for the first time (using ESPHome Web via the ESP32-S3’s USB port), add the ReSpeaker integration in Home Assistant. The device should then appear automatically in the list of available devices.

You now need to configure this new device. Select the voice assistant configuration you created for the ReSpeaker, then choose a wake word (for example, "Okay Nabu"). When the wake word is detected, the ReSpeaker wakes up and listens for your voice command.

Focus on LED management

The ReSpeaker XVF3800 ESPHome component provides a set_led_ring() method for individual control of each LED in its 12-LED ring.

C++
  // Individual LED ring control (12 LEDs)
  void set_led_ring(uint32_t *rgb_array);

For each LED, color and brightness are encoded in a 32-bit value (4 bytes with 0x00RRGGBB format):

  • Byte 3 (MSB, bits 24-31): 0x00 (unused/reserved)
  • Byte 2 (bits 16-23): (RED component × brightness) & 0xFF
  • Byte 1 (bits 8-15): (GREEN component × brightness) & 0xFF
  • Byte 0 (LSB, bits 0-7): (BLUE component × brightness) & 0xFF

An interval triggers a script every 50 ms to create smooth LED animations. The example below lights a single purple LED that rotates around the 12-LED ring while its brightness cycles.

YAML

esphome:
  name: respeaker-xvf3800-tutorial
  friendly_name: ReSpeaker XVF3800 Tutorial

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

wifi:
  id: wifi_id
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:
  level: DEBUG

api:
  id: api_id  
  encryption:
    key: !secret api_key

ota:
  - platform: esphome
    id: ota_esphome
    password: !secret ota_password

i2c:
  - id: internal_i2c
    sda: GPIO5
    scl: GPIO6
    scan: true
    frequency: 100kHz

psram:
  mode: octal
  speed: 80MHz

globals:
  - id: led_brightness
    type: uint8_t
    restore_value: no
    initial_value: '0'
  - id: illuminated_led
    type: uint8_t
    restore_value: no
    initial_value: '0'
     

interval:
  - interval: 50ms
    id: led_animation_interval
    then:
      - lambda: |-
          id(update_led).execute();

script:
  - id: update_led
    then:
      - lambda: |-
          constexpr int NUM_LEDS = 12;
          uint32_t colors[NUM_LEDS];
          
          // Set all LEDs to black (0x00000000)
          for (int i = 0; i < NUM_LEDS; i++) {
            colors[i] = 0x00000000;
          }
          
          // Purple: high RED + BLUE, no GREEN. Use 0x00RRGGBB format
          uint8_t bright = (id(led_brightness)+0x30)%0xFF;
          colors[id(illuminated_led)] = (0x00 << 24) | (bright << 16) | (0x00 << 8) | bright;
          
          // LED ON position
          id(illuminated_led) = (id(illuminated_led) + 1) % NUM_LEDS;
          
          // Increase led brightness
          id(led_brightness)++;
          
          // Send led info to XVF3800 chip via I2C command
          id(respeaker).set_led_ring(colors);

external_components:
  - source:
      type: git
      url: https://github.com/formatBCE/Respeaker-XVF3800-ESPHome-integration
      ref: main
    components: 
      - respeaker_xvf3800
      - aic3104
    refresh: 0s


respeaker_xvf3800:
  id: respeaker
  address: 0x2C
  dfu_version:
    name: "Firmware Version" 
    update_interval: 120s
  led_beam_sensor:
    name: "Voice Beam Direction"
    id: beam_direction
    internal: true
  firmware:
    url: https://github.com/formatBCE/Respeaker-XVF3800-ESPHome-integration/raw/refs/heads/main/respeaker_xvf3800_i2s_master_dfu_firmware_v1.0.5_48k.bin
    version: "1.0.5"
    md5: be440ddef7d34571f01b74513005887f

    

ESP32-S3 configuration

The ESP32-S3 microcontroller configuration of the ReSpeaker XVF3800 is optimized for Home Assistant voice integration, real-time I²S audio streaming and flicker-free LED animations.

The configuration raises the CPU clock from the default 160 MHz to the maximum of 240 MHz. Its cache and PSRAM settings make Micro Wake Word detection more efficient and let code keep executing during flash operations (such as OTA updates) instead of stalling, which is critical for continuous voice detection.

YAML
esp32:
  board: esp32-s3-devkitc-1
  cpu_frequency: 240MHz
  variant: esp32s3
  flash_size: 8MB
  framework:
    type: esp-idf
    version: recommended
    sdkconfig_options:
      CONFIG_ESP32S3_DATA_CACHE_64KB: "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B: "y"
      CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB: "y"

      # Moves instructions and read only data from flash into PSRAM on boot.
      # Both enabled allows instructions to execute while a flash operation 
      # is in progress without needing to be placed in IRAM.
      # Considerably speeds up mWW at the cost of using more PSRAM.
      CONFIG_SPIRAM_RODATA: "y"
      CONFIG_SPIRAM_FETCH_INSTRUCTIONS: "y"

      CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST: "y"
      CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY: "y"

      CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC: "y"
      CONFIG_MBEDTLS_SSL_PROTO_TLS1_3: "y"  # TLS1.3 support isn't enabled by default in IDF 5.1.5

Let’s test it!

Watch this complete ReSpeaker XVF3800 + Home Assistant demo showing:

  • “Okay Nabu” wake word detection
  • Beamforming LEDs pointing toward the person speaking
  • Real voice commands (“turn on lights”)
  • Smooth LED feedback through all phases
