
Speaking Duolingo Owl — Voice-Enabled Reminder Toy

Birthday Gift Project
ESP32-S2 • I2S Audio • LittleFS • Duolingo API
Hardware • Embedded • IoT • Voice

What is it?

A voice-enabled Duolingo plush toy that checks your daily lesson status every 30 seconds. If you haven't completed your lesson, it plays friendly reminders at 9 PM, 9:15 PM, and 9:30 PM (in both Polish and Spanish). When you do complete it, it immediately celebrates with a message tailored to the time of day. The ESP32-S2 polls the Duolingo API directly via Wi‑Fi and plays audio files stored in LittleFS.

I built this for my dad's birthday because he's a huge Duolingo fan. As far as I can remember, he learned German there and is now learning Spanish, and his streak is well above 1000 days. I bought a Duolingo owl plushie, cut it open, and put the electronics inside, so it can be plugged into any wall socket and run 24/7.

The owl stays on all day, polling the Duolingo public API every 30 seconds to check whether today's lesson was completed. If no lesson is detected, it plays reminders at 9 PM, 9:15 PM, and 9:30 PM. For example, it plays "Krzysiu! czas na twoją lekcję hiszpańskiego!" in Polish, followed by "Krzysiu, ¡es hora de tu clase de español!" in Spanish (both mean "Krzysiu, it's time for your Spanish lesson!"). When a lesson is completed, it immediately plays a celebration message chosen by time of day: early morning (00:00-08:00), daytime (08:00-19:30), or evening (19:30-23:59). Every message plays in Polish first and then in Spanish, with a 300 ms gap between them.

Speaking Duolingo Owl Build Guide

Turn a Duolingo plush into a voice-enabled reminder system that keeps your learning streak alive.

The Story Behind It

My dad has been learning languages on Duolingo for years, German first and then Spanish, and his streak is now over 1000 days. For his birthday, I wanted to make something personal that connected to his daily routine. So I got a Duolingo owl plush, hollowed it out, and stuffed it full of electronics.

The idea was simple: the owl should check if he did his lesson today, and if not, remind him at 9 PM, 9:15 PM, and 9:30 PM. If he did do it, celebrate it immediately with a message that fits the time of day. Nothing complicated—just a friendly nudge from a plush toy.

I didn't do anything special to isolate the electronics—just put them in a plastic bag and stuffed them inside the plush. It's been working fine plugged into the wall for months now.

How It Works

The system stays on all day and works like this:

  1. ESP32-S2 connects to Wi‑Fi (IoT Wi-Fi network) and syncs time via NTP
  2. Polls Duolingo API every 30 seconds using the public endpoint https://www.duolingo.com/2017-06-30/users?username=%%MY_DADS_USERNAME%%
  3. Checks multiple indicators to determine if today's lesson was completed:
    • First checks streak_today_increase flag (boolean or int)
    • Then looks at calendar entries for today's date (local and UTC)
    • Finally checks streakData.currentStreak dates as a heuristic
  4. If no lesson detected: Plays a reminder message at 9:00 PM, 9:15 PM, and 9:30 PM (each reminder plays at most once per day)
  5. If lesson completed: Immediately plays celebration message based on time band (00:00-08:00, 08:00-19:30, or 19:30-23:59)
  6. Audio playback: Each message plays in Polish, then Spanish with a 300ms gap between them
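The three-stage completion check in step 3 can be sketched as a pure function. This is a minimal Python sketch, not the ESP32 code. It assumes `user` is the user object already parsed from the JSON response. It also assumes that calendar entries carry a `"date"` string in `YYYY-MM-DD` form and that `streakData.currentStreak` carries an `"endDate"` in the same format. These shapes are assumptions based on the indicators listed above, not a documented Duolingo schema, so check them against a real response (for example with check_duo.py) before relying on them.

```python
from typing import Any


def lesson_done_today(user: dict[str, Any], today_local: str, today_utc: str) -> bool:
    """Return True if any indicator says today's lesson is done.

    `today_local` and `today_utc` are "YYYY-MM-DD" strings. The field names
    mirror the indicators described above; their exact shapes are assumed.
    """
    today = {today_local, today_utc}

    # 1. Explicit flag: may arrive as a bool or an int (0/1), so use truthiness.
    if user.get("streak_today_increase"):
        return True

    # 2. Calendar entries (assumed shape: {"date": "YYYY-MM-DD", ...}).
    for entry in user.get("calendar") or []:
        if entry.get("date") in today:
            return True

    # 3. Heuristic: the current streak (assumed to carry "endDate") ends today.
    current = (user.get("streakData") or {}).get("currentStreak") or {}
    return current.get("endDate") in today
```

The checks run from most to least specific, so a clear flag short-circuits before the calendar and streak heuristics are consulted.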

Why direct API polling? I wanted to keep it simple—no backend server to maintain, no AWS Lambda costs, no authentication tokens to manage. The ESP32 just queries the public Duolingo API endpoint directly every 30 seconds. The trade-off is it needs to stay on all day, but since it's plugged into the wall, that's fine. The code handles timezone conversion (Europe/Warsaw with DST), date comparison, and all the logic locally on the ESP32.
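Keeping both a local and a UTC date string comes down to converting one instant into two calendar dates. Below is a minimal Python sketch using the standard zoneinfo module. It assumes IANA time zone data is available on the host, for example via the tzdata package on Windows. On the ESP32, the same rules are usually expressed as a POSIX TZ string such as CET-1CEST,M3.5.0,M10.5.0/3; the exact call sound.ino uses is not shown here.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

WARSAW = ZoneInfo("Europe/Warsaw")  # CET/CEST; DST rules come from the tz database


def today_strings(now_utc: datetime) -> tuple[str, str]:
    """Return (local_date, utc_date) as "YYYY-MM-DD" strings for an aware instant."""
    if now_utc.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    local_date = now_utc.astimezone(WARSAW).date().isoformat()
    utc_date = now_utc.astimezone(timezone.utc).date().isoformat()
    return local_date, utc_date
```

At 23:30 UTC in July, Warsaw is already on the next day. `today_strings(datetime(2024, 7, 1, 23, 30, tzinfo=timezone.utc))` gives `("2024-07-02", "2024-07-01")`, which is exactly the case where checking only one of the two dates could miss a lesson.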

What You'll Need

Hardware (BOM)

  • OLIMEX ESP32-S2-DEVKIT-LIPO — Wi‑Fi prototyping board with ESP32-S2-WROOM module and built-in LiPo charging
  • MAX98357A I2S Audio Amplifier (or similar I2S DAC/amp) — Digital audio amplifier breakout board
  • Full-range Speaker, 75mm, 3W, 8Ω — Wideband loudspeaker with paper cone
  • Duolingo Owl Plush — The victim... I mean, the host
  • USB-C Cable & Power Adapter — For 24/7 power

Note on audio storage: I originally planned to use a DFPlayer Mini with a microSD card for audio playback. The DFPlayer gave me trouble (more on that later), so I ended up storing the audio files directly in the ESP32's LittleFS file system instead. I created a custom partition table that allocates about 3 MB (0x2F0000 bytes) to LittleFS. The partition is labeled "spiffs" because that is the default partition label the Arduino LittleFS library looks for. After converting and compressing the audio, all 12 WAV files fit in internal storage, in the /Reminders_converted/Not_done/ and /Reminders_converted/Done/ directories.

Software & Services

  • Arduino IDE with ESP32 board support
  • Python for audio conversion (pydub, ffmpeg, audioop)
  • Duolingo Public API — The ESP32 queries https://www.duolingo.com/2017-06-30/users?username=USERNAME directly
  • ElevenLabs (or similar) for text-to-speech audio generation
  • Wi‑Fi Network — I created a dedicated IoT network (Orange_IoT_7B40) for better reliability

Step-by-Step Build Guide

Step 1: Generate Audio Files

I used ElevenLabs to generate the voice messages. I created:

  • 3 reminder messages in Polish: "Nie zrobiłeś dzisiaj lekcji!" ("You didn't do your lesson today!") and variations
  • 3 reminder messages in Spanish: "¡No hiciste la lección hoy!" (same meaning) and variations
  • 3 celebration messages for when the lesson was done, each recorded in both Polish and Spanish: "Yuppiiii!" and variations

Each message was saved as an MP3 file, then converted to the format the ESP32 can play.

Step 2: Convert Audio Files

The audio files need to be converted to WAV format (mono, 22050 Hz, 16-bit PCM) for the ESP32's I2S playback. I used Python with pydub and ffmpeg. The project includes convertWAV.py, which automatically finds the audio files in the Reminders directory and converts them. Its core conversion function is:

from pathlib import Path

from pydub import AudioSegment  # pydub needs ffmpeg installed to read MP3 files

# Convert MP3 to WAV: mono, 22050 Hz, 16-bit PCM
def convert(src_path: Path, dst_path: Path) -> None:
    audio = AudioSegment.from_file(str(src_path))
    # Enforce: mono, 22050 Hz, 16-bit (2 bytes per sample)
    audio = audio.set_channels(1).set_frame_rate(22050).set_sample_width(2)
    dst_path.parent.mkdir(parents=True, exist_ok=True)  # create output folders if missing
    audio.export(str(dst_path), format="wav")

The repository also includes shrink.py, which can shrink files further by resampling them to 16 kHz when they don't fit in the LittleFS partition.

Convert all your audio files, then upload them to the ESP32's LittleFS file system using Arduino IDE's "Tools → ESP32 Sketch Data Upload" or PlatformIO's filesystem upload feature.
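Before uploading, it can be worth confirming that every converted file really is uncompressed mono 16-bit PCM at an expected sample rate. This is a minimal sketch using only Python's standard wave module, which raises wave.Error for files that are not PCM. The function names are hypothetical. Accepting 16 kHz alongside 22050 Hz is an assumption meant to cover files processed by shrink.py.

```python
import wave
from pathlib import Path

# 16 kHz is accepted too, for files resampled by shrink.py (an assumption).
ALLOWED_RATES = (16000, 22050)


def check_wav(path: Path, allowed_rates: tuple[int, ...] = ALLOWED_RATES) -> list[str]:
    """Return a list of problems; an empty list means the file looks playable.

    wave.open() itself raises wave.Error for files that are not PCM.
    """
    problems = []
    with wave.open(str(path), "rb") as w:
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected mono")
        if w.getsampwidth() != 2:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
        if w.getframerate() not in allowed_rates:
            problems.append(f"{w.getframerate()} Hz, expected one of {allowed_rates}")
    return problems


def check_tree(root: Path) -> dict[str, list[str]]:
    """Check every .wav under root and report only the files with problems."""
    report = {}
    for path in sorted(root.rglob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[str(path)] = problems
    return report
```

Running `check_tree(Path("sound/data/Reminders_converted"))` from the repository root should return an empty dict when every file is ready to upload.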

Step 3: Wiring

Connect the MAX98357A I2S amplifier (or similar I2S DAC/amp) to the ESP32-S2. The code defines these pins in audio_wav.cpp:

ESP32-S2 ↔ MAX98357A I2S Connections

MAX98357A pin    ESP32-S2 pin    Notes
VIN              5V              Power supply (3.3V also works)
GND              GND             Ground
LRCLK (WS)       GPIO 15         Left/right clock (word select)
BCLK             GPIO 14         Bit clock
DIN              GPIO 13         Data input (I2S DOUT)

Connect the speaker to the MAX98357A's speaker outputs (SPK+ and SPK-). The 8Ω speaker works well with the MAX98357A. The I2S configuration in the code outputs 16-bit stereo PCM at the WAV file's native sample rate (usually 22050 Hz).
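Outputting 16-bit stereo PCM from WAV files means each source sample has to be widened and duplicated before it reaches I2S. The Python sketch below models that conversion on the host. It illustrates the technique and is not a port of audio_wav.cpp; `to_i2s_frames` and `bclk_hz` are hypothetical names. It also computes the resulting bit clock, which is sample rate × 16 bits × 2 channels, or 705.6 kHz for 22050 Hz audio.

```python
import struct


def to_i2s_frames(pcm: bytes, sample_width: int, channels: int) -> bytes:
    """Convert WAV PCM bytes to interleaved 16-bit little-endian stereo frames."""
    if sample_width == 1:
        # 8-bit WAV samples are unsigned (128 = silence); recentre and scale to 16-bit.
        samples = [(b - 128) << 8 for b in pcm]
    elif sample_width == 2:
        count = len(pcm) // 2
        samples = list(struct.unpack(f"<{count}h", pcm[: count * 2]))
    else:
        raise ValueError("this sketch only handles 8- and 16-bit PCM")

    if channels == 1:
        # Duplicate each mono sample into the left and right slots.
        samples = [s for s in samples for _ in range(2)]
    elif channels != 2:
        raise ValueError("this sketch only handles mono or stereo")
    return struct.pack(f"<{len(samples)}h", *samples)


def bclk_hz(sample_rate: int, bits_per_sample: int = 16, channels: int = 2) -> int:
    """I2S bit clock: one clock per bit, per channel slot, per sample."""
    return sample_rate * bits_per_sample * channels
```

Sending identical left and right samples suits the MAX98357A, which by default plays a mix of both channels on its single speaker output.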

Step 4: Configure Wi‑Fi and Duolingo Username

In sound.ino, set your Wi‑Fi credentials and Duolingo username:

// Wi‑Fi configuration
static const char* WIFI_SSID = "Orange_IoT_7B40";  // Change to your network
static const char* WIFI_PASS = "YOUR_PASSWORD";        // Change to your password

// Duolingo username
static const char* DUO_USER = "MY_DADS_USERNAME";    // Change to target user

The code includes robust Wi‑Fi connection handling with automatic retries, radio tuning for better reliability, and disconnect recovery. I had issues with my regular Wi‑Fi network, so I created a dedicated IoT network (2.4 GHz) which worked perfectly.
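The core of connection handling with automatic retries is a bounded retry loop. The sketch below models it in Python. `connect` stands in for whatever single attempt the firmware makes; on the ESP32 this would wrap WiFi.begin() and polling WiFi.status(). The exponential backoff with a cap is an assumed design choice, not something the original sketch is confirmed to do.

```python
import time
from typing import Callable


def connect_with_retries(connect: Callable[[], bool], attempts: int = 5,
                         first_delay: float = 1.0, max_delay: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call a hypothetical connect() until it succeeds, doubling the pause each time."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        if connect():
            return True
        if attempt < attempts:  # no pointless wait after the final failure
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Capping the delay keeps a long router outage from pushing retries minutes apart.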

Step 5: Set Up Partition Table

The project uses a custom partition table (partitions.csv) that allocates about 3 MB to LittleFS:

# 4MB flash: 0x2F0000 bytes (~3.08 MB, ~2.94 MiB) of LittleFS, labeled "spiffs"
# Name,   Type, SubType, Offset,  Size,     Flags
nvs,      data, nvs,     0x9000,  0x5000,
app0,     app,  factory, 0x10000, 0x100000,
spiffs,   data, spiffs,  0x110000,0x2F0000,

In Arduino IDE, go to "Tools → Partition Scheme → Custom Partition Table" and point it to your partitions.csv file.
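Hand-edited partition tables are easy to get wrong, so a quick host-side check can help. This Python sketch parses partitions.csv text and skips comment lines. It then verifies that app partitions start on a 64 KB boundary, that no entries overlap, and that the last entry ends within the 4 MB flash. It is a hypothetical helper, not part of the repository or of the ESP-IDF tooling, and it only understands hex or decimal sizes, not K/M suffixes.

```python
import csv
import io

FLASH_SIZE = 0x400000  # 4 MB, as stated in the partitions.csv comment
APP_ALIGN = 0x10000    # app partitions must start on a 64 KB boundary


def check_partitions(text: str, flash_size: int = FLASH_SIZE) -> list[tuple[str, int, int]]:
    """Parse partitions.csv text into (name, offset, size) rows and sanity-check them."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        fields = [f.strip() for f in fields]
        if not fields or not fields[0] or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        name, ptype, _subtype, offset_text, size_text = fields[:5]
        offset, size = int(offset_text, 0), int(size_text, 0)  # base 0 accepts 0x... hex
        if ptype == "app" and offset % APP_ALIGN:
            raise ValueError(f"{name}: app offset {offset:#x} is not 64 KB aligned")
        rows.append((name, offset, size))

    rows.sort(key=lambda row: row[1])
    end = 0
    for name, offset, size in rows:
        if offset < end:
            raise ValueError(f"{name} overlaps the previous partition")
        end = offset + size
    if end > flash_size:
        raise ValueError(f"partitions end at {end:#x}, past flash size {flash_size:#x}")
    return rows
```

For the table above, the spiffs entry runs from 0x110000 to exactly 0x400000, so the layout uses all 4 MB. That partition is 0x2F0000 = 3,080,192 bytes, about 2.94 MiB.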

Step 6: Program the ESP32

The main code (sound/sound.ino) does the following:

  • Setup: Initializes LittleFS, I2S audio, Wi‑Fi, and time sync
  • Wi‑Fi connection: Robust blocking connection with retries and radio tuning
  • Time sync: Syncs with NTP servers using Europe/Warsaw timezone (CET/CEST with DST)
  • Initial check: On boot, checks Duolingo status and plays appropriate message
  • Main loop:
    • Checks for midnight rollover to reset daily flags
    • Every minute: checks if it's time for reminder (9:00, 9:15, or 9:30 PM)
    • Every 30 seconds: polls Duolingo API to check if lesson was completed
    • If lesson completed: plays celebration message based on current time band
  • Audio playback: Uses I2S to play WAV files from LittleFS, supports 8/16-bit mono/stereo
  • Persistent storage: Uses Preferences (NVS) to track which reminders were already played today

The code includes comprehensive error handling, Wi‑Fi disconnect recovery, and detailed serial logging for debugging.
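The daily bookkeeping in the main loop is a small state machine: reset at midnight, play each reminder at most once, and celebrate once when a lesson shows up. Below is a minimal Python sketch of it. It assumes the caller passes Warsaw local time and the result of the completion check; `DayState` and `tick` are hypothetical names. The state lives in memory rather than in Preferences/NVS. When a tick arrives late, only the most recent due reminder plays, which is a design choice here and not confirmed behavior of sound.ino.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time
from typing import Optional

REMINDER_SLOTS = (time(21, 0), time(21, 15), time(21, 30))


@dataclass
class DayState:
    day: date
    played: set[time] = field(default_factory=set)  # reminder slots already played today
    celebrated: bool = False


def tick(state: DayState, now: datetime, lesson_done: bool) -> tuple[DayState, Optional[str]]:
    """Decide what to play at local time `now`; returns (state, action or None)."""
    if now.date() != state.day:
        state = DayState(now.date())  # midnight rollover: reset the daily flags

    if lesson_done:
        if not state.celebrated:
            state.celebrated = True
            return state, "celebrate"
        return state, None

    due = [slot for slot in REMINDER_SLOTS if now.time() >= slot]
    if due and due[-1] not in state.played:
        state.played.update(due)  # older missed slots are marked as played, not replayed
        return state, "remind " + due[-1].strftime("%H:%M")
    return state, None
```

Calling `tick` once a minute with the latest poll result reproduces the schedule described above. Persisting `played` and `celebrated` across reboots is what the firmware's NVS storage adds on top.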

Step 7: Assembly

Time to stuff everything inside the owl:

  1. Cut a small opening in the plush (I did it at the bottom seam)
  2. Put all the electronics in a plastic bag (seriously, that's all I did for insulation)
  3. Stuff the bagged electronics into the plush
  4. Route the USB cable out through the opening
  5. Sew or tape the opening closed (leaving the cable accessible)

Make sure the speaker is positioned so the sound can actually come out. I placed mine near the opening so the sound isn't muffled.

Troubleshooting

DFPlayer Problems (Original Plan)

I originally wanted to use a DFPlayer Mini with a microSD card for audio playback. The ESP32 was successfully sending command frames to the DFPlayer (I could see "Sent frame: …" in the serial output), but:

  • No audible sound from the speaker
  • No confirmed response from the DFPlayer
  • The red light on my card reader wasn't working when connected, which might have been a sign

Rather than debug the DFPlayer issue, I switched to using the ESP32's internal storage with LittleFS and direct I2S audio output. The ESP32-S2 has enough flash to store the 12 audio files after conversion and compression. This worked immediately and was actually simpler—no need to manage a separate audio module and SD card.

Wi‑Fi Connection Issues

I had trouble connecting to the regular Wi‑Fi network. The solution was to create a dedicated IoT Wi‑Fi network (Orange_IoT_7B40, separate 2.4 GHz network) and connect the ESP32 to that. The code also includes Wi‑Fi radio tuning (country code, protocol settings, power management) which significantly improved reliability. This worked perfectly and is actually a better practice anyway—keeps IoT devices isolated from your main network.

Audio File Size

The LittleFS partition is roughly 3 MB, which is enough for the 12 WAV files at 22050 Hz, 16-bit, mono. I used shrink.py to resample larger files down to 16 kHz where needed. Optimize your audio files for size while keeping acceptable quality. The files must be uncompressed PCM WAV, not MP3, because the playback code reads the PCM samples directly rather than decoding compressed audio.
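The size budget is simple arithmetic: uncompressed PCM costs sample rate × bytes per sample × channels, per second. Below is a Python sketch of that calculation for the 0x2F0000-byte partition from partitions.csv. It ignores LittleFS metadata overhead unless you pass a `usable_fraction` below 1.0. That parameter is a hypothetical knob, since the real overhead depends on block usage.

```python
PARTITION_BYTES = 0x2F0000  # size of the "spiffs" partition in partitions.csv


def max_audio_seconds(partition_bytes: int, sample_rate: int, bytes_per_sample: int = 2,
                      channels: int = 1, usable_fraction: float = 1.0) -> float:
    """Upper bound on seconds of uncompressed PCM that fit in the partition."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return partition_bytes * usable_fraction / bytes_per_second
```

At 22050 Hz, 16-bit mono (44,100 bytes per second), the partition holds at most about 69.8 seconds of audio, or under 6 seconds per file on average across 12 files. Resampling to 16 kHz (32,000 bytes per second) raises that ceiling to about 96.3 seconds.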

Timezone and Date Handling

The code handles timezone conversion carefully. It maintains both local (Warsaw) and UTC date strings and checks both when looking at the Duolingo calendar entries. This is important because the API might return dates in UTC while your local date might differ. The code also handles DST (daylight saving time) transitions automatically.
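DST matters mostly for when the reminders fire, because 9 PM in Warsaw is a different UTC instant in winter than in summer. Below is a minimal Python sketch using zoneinfo, assuming IANA time zone data is installed; `local_to_utc` is a hypothetical helper.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

WARSAW = ZoneInfo("Europe/Warsaw")


def local_to_utc(day: date, hour: int, minute: int) -> datetime:
    """Express a Warsaw wall-clock time on `day` as an aware UTC datetime."""
    local = datetime(day.year, day.month, day.day, hour, minute, tzinfo=WARSAW)
    return local.astimezone(timezone.utc)
```

In 2024, Poland switched to summer time on March 31 and back on October 27. The same 21:00 reminder therefore falls at 20:00 UTC on March 30 and at 19:00 UTC on March 31. Scheduling against local wall-clock time avoids having to track that shift by hand.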

Repository Structure

The GitHub repository is organized like this:

Speaking-Duolingo-Owl/
├── sound/
│   ├── sound.ino              # Main ESP32 code
│   ├── audio_wav.h            # Audio playback header
│   ├── audio_wav.cpp          # I2S audio implementation
│   ├── partitions.csv         # Custom partition table (3MB LittleFS)
│   └── data/
│       └── Reminders_converted/
│           ├── Not_done/      # Reminder WAV files
│           │   ├── 9pm_pl.wav
│           │   ├── 9pm_sp.wav
│           │   ├── 9pm15_pl.wav
│           │   ├── 9pm15_sp.wav
│           │   ├── 9pm30_pl.wav
│           │   └── 9pm30_sp.wav
│           └── Done/          # Celebration WAV files
│               ├── 0_8_pl.wav
│               ├── 0_8_sp.wav
│               ├── 8_7pm30_pl.wav
│               ├── 8_7pm30_sp.wav
│               ├── 7pm30_11pm59_pl.wav
│               └── 7pm30_11pm59_sp.wav
├── Reminders/                 # Original MP3 files (before conversion)
│   ├── Done/
│   └── Not_done/
├── check_duo.py               # Python script to test Duolingo API checking
├── convertWAV.py              # Audio conversion script (MP3 → WAV)
├── convertMP3.py              # Original MP3 conversion (for DFPlayer attempt)
├── shrink.py                  # Audio compression script
├── schedule.txt               # Detailed schedule and file mapping
└── testing_wifi_connection_esp32/
    └── testing_wifi_connection_esp32.ino  # Wi‑Fi connection test sketch

Audio File Schedule

The audio files are organized by type, language, and timing:

  • Reminders (Not_done): Played at 9:00 PM, 9:15 PM, and 9:30 PM while no lesson has been completed
    • 9pm_pl.wav / 9pm_sp.wav: "Krzysiu! czas na twoją lekcję hiszpańskiego!" / "Krzysiu, ¡es hora de tu clase de español!" ("Krzysiu, it's time for your Spanish lesson!")
    • 9pm15_pl.wav / 9pm15_sp.wav: "Hola, señor! pamiętasz o mnie? to tylko parę minut — twoja żona poczeka." / "¡Hola, señor! ¿Te acuerdas de mí? Solo serán unos minutos; tu esposa esperará." ("Hello, sir! Remember me? It's only a few minutes; your wife can wait.")
    • 9pm30_pl.wav / 9pm30_sp.wav: "No dobra, przesadziłeś — lekcja sama się nie zrobi..." / "Bueno, te pasaste; la lección no se aprende sola..." ("Okay, you've gone too far; the lesson won't do itself...")
  • Celebrations (Done): Played immediately when a lesson is completed, chosen by time of day
    • 0_8_pl.wav / 0_8_sp.wav: For lessons done between 00:00 and 08:00 ("Brawo Ty! jesteś GOAT-em!", "Well done, you! You're the GOAT!")
    • 8_7pm30_pl.wav / 8_7pm30_sp.wav: For lessons done between 08:00 and 19:30 ("Świetnie! masz cały wieczór dla siebie — albo dla Renaty.", "Great! You have the whole evening to yourself, or for Renata.")
    • 7pm30_11pm59_pl.wav / 7pm30_11pm59_sp.wav: For lessons done between 19:30 and 23:59 ("Ekstra! nie trać tempa!", "Awesome! Don't lose momentum!")

The code automatically selects the correct file based on the current time when a lesson is detected. Each message plays in Polish first, then Spanish with a 300ms gap between them.
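The file naming turns that selection into a lookup. Below is a Python sketch of the mapping. It assumes the /Reminders_converted/ paths from the repository layout and that each band starts inclusively at its lower bound, so 08:00 counts as daytime and 19:30 as evening. The helper names are hypothetical.

```python
from datetime import time

BASE = "/Reminders_converted"
LANGS = ("pl", "sp")  # Polish first, then Spanish

# (band start, file stem); each band runs until the next band's start.
CELEBRATION_BANDS = (
    (time(0, 0), "0_8"),
    (time(8, 0), "8_7pm30"),
    (time(19, 30), "7pm30_11pm59"),
)
REMINDER_STEMS = {time(21, 0): "9pm", time(21, 15): "9pm15", time(21, 30): "9pm30"}


def celebration_files(now: time) -> list[str]:
    """Return the Polish and Spanish celebration clips for local time `now`."""
    stem = [s for start, s in CELEBRATION_BANDS if now >= start][-1]
    return [f"{BASE}/Done/{stem}_{lang}.wav" for lang in LANGS]


def reminder_files(slot: time) -> list[str]:
    """Return the Polish and Spanish reminder clips for a 21:00/21:15/21:30 slot."""
    stem = REMINDER_STEMS[slot]
    return [f"{BASE}/Not_done/{stem}_{lang}.wav" for lang in LANGS]
```

The caller would then play the two returned paths in order, with a 300 ms pause between them.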

Code & Resources

📦 All code is available on GitHub.

Final Thoughts

This project was more about the gesture than the tech. My dad loved it, even though it's not the most polished build. The owl sits on his desk, plugged in, quietly checking every 30 seconds if he's done his lesson. Sometimes it reminds him at 9 PM, sometimes it celebrates with him as soon as he completes it. It's become part of his routine.

What I learned: Sometimes the simplest solution (direct API polling, internal storage instead of DFPlayer, plastic bag insulation) works just fine. You don't always need the fanciest tech—you just need it to work reliably. The ESP32-S2 is perfectly capable of handling all the logic locally, including timezone conversions, date parsing, and audio playback.

If you build one, remember: the hardware is the easy part. The tricky bit is handling timezone conversions properly (especially DST transitions), parsing the Duolingo API response correctly (it has multiple ways to indicate completion), and making sure the audio files fit in LittleFS. But once it's running, it's pretty hands-off. Just plug it in and let it do its thing.

The code includes comprehensive serial logging, so you can debug issues easily by watching the Serial Monitor. It logs Wi‑Fi connection status, API responses, date calculations, and which audio files are being played.