From the bench
Existing wired doorbell chimes are weird: most are 16V AC, not DC. I wired my relay's NO contact into the chime line assuming DC, got nothing, then a faint hum, then a friend pointing at the transformer in the wall. The relay handles AC fine. What you cannot do is power the ESP32 straight from the same line without an AC-DC converter, which seems obvious in hindsight and was not obvious to me at the time. Trace the existing wiring before you cut anything.
The smart doorbell market is roughly Ring vs Nest vs everyone else, and all of them want a monthly subscription to keep your own video accessible. The DIY version of the same thing is genuinely competitive: an ESP32-CAM, a momentary button, a small enclosure, and a Telegram bot or your own server. No subscription, no data going to a cloud you don't control, and the parts cost is around $25.
This is different from the motion-activated security camera we built earlier. That one uses PIR to wake on motion. This one wakes on a button press, which means it has a much narrower trigger and can run on continuous power without exhausting itself. It is also designed to identify visitors quickly — a single still image plus a short audio clip if you add a microphone — rather than to record general activity.
What we are building
[Diagram: the ESP32 triggers the camera (capture 800x600 JPEG) and, optionally, an I2S microphone (5 sec clip). Both return their data to the ESP32, which sends an HTTPS POST to the Telegram bot API; Telegram delivers the image and caption to your phone. The ESP32 also sets a GPIO HIGH to drive the optional chime relay.]
Press button, capture image, send via Telegram. Optional indoor chime via relay. The whole flow takes 3–5 seconds end-to-end.
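The "send via Telegram" step can be sketched as follows. Telegram's Bot API exposes a sendPhoto method that accepts a multipart/form-data upload with chat_id, caption, and photo fields. This is a minimal host-side sketch of the text framing that goes around the JPEG bytes, not the code from the project sketch; the names MultipartParts, buildSendPhotoBody, contentLength, and sendPhotoPath, and the filename visitor.jpg, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Text framing for a multipart/form-data upload to Telegram's sendPhoto method.
struct MultipartParts {
    std::string head;  // everything sent before the raw JPEG bytes
    std::string tail;  // closing boundary sent after the JPEG bytes
};

// Builds the framing. The boundary is an arbitrary string that must not
// occur inside the payload (hypothetical helper, not from the article's sketch).
MultipartParts buildSendPhotoBody(const std::string& chatId,
                                  const std::string& caption,
                                  const std::string& boundary) {
    assert(!boundary.empty());
    MultipartParts p;
    p.head  = "--" + boundary + "\r\n";
    p.head += "Content-Disposition: form-data; name=\"chat_id\"\r\n\r\n";
    p.head += chatId + "\r\n";
    p.head += "--" + boundary + "\r\n";
    p.head += "Content-Disposition: form-data; name=\"caption\"\r\n\r\n";
    p.head += caption + "\r\n";
    p.head += "--" + boundary + "\r\n";
    p.head += "Content-Disposition: form-data; name=\"photo\"; "
              "filename=\"visitor.jpg\"\r\n";
    p.head += "Content-Type: image/jpeg\r\n\r\n";
    p.tail  = "\r\n--" + boundary + "--\r\n";
    return p;
}

// Value for the Content-Length header: framing plus the JPEG itself.
std::size_t contentLength(const MultipartParts& p, std::size_t jpegBytes) {
    return p.head.size() + jpegBytes + p.tail.size();
}

// Request path on the host api.telegram.org for a given bot token.
std::string sendPhotoPath(const std::string& botToken) {
    return "/bot" + botToken + "/sendPhoto";
}
```

On the device, the idea would be to write head, then the camera frame buffer, then tail to the TLS connection, with "Content-Type: multipart/form-data; boundary=..." in the request headers. Sending the frame buffer directly avoids copying the whole image into a second string, which matters on a board with limited RAM.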
Hardware
- ESP32-CAM (AI Thinker) — $8
- FTDI USB-TTL adapter for flashing — $5 (one-time)
- Momentary push button (waterproof if outdoor) — $3
- 5V 2A power supply (run wired from inside) — $5
- 3D-printed or off-the-shelf doorbell enclosure — $5–15
- Optional: INMP441 I2S microphone for audio — $4
- Optional: relay module + indoor chime/buzzer — $3
About $25 base, $30 with audio. Outdoor enclosure quality matters more than the electronics — the ESP32-CAM is fine in moderate weather inside a sealed box; direct rain or freeze-thaw cycles will kill it within a year.
Why wired power, not battery
The motion-activated security camera in our earlier project ran on 18650 cells because PIR triggers are infrequent. A doorbell button press is also infrequent, but a doorbell needs to respond fast — within 1–2 seconds — and that means staying connected to WiFi continuously. Continuous WiFi is around 80–120 mA on the ESP32-CAM, which kills any reasonable battery in a day or two.
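The "day or two" figure is simple arithmetic: capacity divided by average current. A minimal sketch, assuming a single 3000 mAh 18650 cell (the capacity is an assumption, not a figure from this project) and ignoring regulator losses and camera current peaks:

```cpp
#include <cassert>

// Estimated runtime in hours for a battery feeding a constant load.
// capacityMah and loadMa are illustrative inputs, not measured values.
double runtimeHours(double capacityMah, double loadMa) {
    assert(loadMa > 0.0);
    return capacityMah / loadMa;
}

// runtimeHours(3000.0, 80.0)  gives 37.5 hours
// runtimeHours(3000.0, 120.0) gives 25.0 hours
```

Either end of the 80–120 mA range lands between one and two days, and real-world losses only shorten it.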
Most existing doorbells already have wired power for the chime, usually 16–24V AC from a transformer. You can convert that to 5V DC with a small AC-DC module ($3) and reuse the existing wire run. If you have to run new wire, a USB-C cable through a hidden conduit works.
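One thing worth checking when picking the converter: the 16–24V figure on a transformer is an RMS value, and a module that rectifies before regulating sees roughly the peak, which is higher by a factor of the square root of two. A small sketch of that conversion, ignoring diode drops and the tendency of lightly loaded transformers to read above their nameplate voltage:

```cpp
#include <cassert>
#include <cmath>

// Peak voltage of a sine wave given its RMS value.
double peakVolts(double rmsVolts) {
    assert(rmsVolts >= 0.0);
    return rmsVolts * std::sqrt(2.0);
}

// peakVolts(16.0) is about 22.6 V
// peakVolts(24.0) is about 33.9 V
```

So a module's maximum input rating should be compared against the peak figure, with some margin, rather than against the number printed on the transformer.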
What goes wrong
- WiFi range. Doorbells live near doors, often the worst spot for WiFi. Add an external antenna (the AI Thinker board has a u.FL connector, though selecting it typically means moving a tiny 0-ohm resistor next to the connector) or place a WiFi extender nearby.
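- Mechanical button bounce. A 3-second cooldown between accepted presses swallows contact bounce and matches typical visitor behaviour (no normal person mashes the button repeatedly). The main loop shown under "Key code" does this with the COOLDOWN_MS check while polling the pin, rather than in an ISR. Lower it for a hyperactive household.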
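- Cold weather. The ESP32 chip itself is rated to −40°C, but the camera sensor and the rest of the board may have a narrower range (check the datasheets), and the lithium battery in any battery-powered version is not rated that low. For wired builds, fine.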
- The bot's response delay. Telegram is fast (sub-second) but only after WiFi is connected. From cold start the chain is: button press → WiFi authenticate → TLS handshake → Telegram POST → notification. Total around 3 seconds in good conditions, 6–8 in poor ones.
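The cooldown idea from the button-bounce item can be isolated into a small testable unit. This is a sketch under stated assumptions, not the project's code: PressGate is a hypothetical class, and unlike a plain "now minus last press" comparison it uses a flag so that the very first press after boot is always accepted.

```cpp
#include <cassert>
#include <cstdint>

// Accepts a press only if the cooldown has elapsed since the last accepted one.
class PressGate {
public:
    explicit PressGate(std::uint32_t cooldownMs) : cooldownMs_(cooldownMs) {
        assert(cooldownMs > 0);
    }

    // pressed: true while the button reads active.
    // nowMs:   a millis()-style counter that may wrap around.
    bool accept(bool pressed, std::uint32_t nowMs) {
        if (!pressed) return false;
        // Unsigned subtraction stays correct across counter wrap-around.
        if (seenPress_ && (nowMs - lastMs_) <= cooldownMs_) return false;
        seenPress_ = true;
        lastMs_ = nowMs;
        return true;
    }

private:
    std::uint32_t cooldownMs_;
    std::uint32_t lastMs_ = 0;
    bool seenPress_ = false;
};
```

In an Arduino loop this would be called as gate.accept(digitalRead(BUTTON_PIN) == LOW, millis()). Keeping the logic free of hardware calls means it can be exercised on a desktop before flashing.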
Going further
- Two-way audio. Add an INMP441 microphone and a small speaker. Stream voice via WebRTC.
- Face recognition for known visitors. The ESP32-S3 (different chip, but available in CAM-board variants) has enough horsepower for face detection via TensorFlow Lite Micro.
- Local recording. Add a microSD card; save every visitor image with timestamp.
- Integration with smart home. Replace the Telegram bot with MQTT to Home Assistant.
- Better camera. The OV2640 is mediocre in low light. Swap for an OV5640.
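For the local recording idea, the timestamped filename is the only part that needs any thought. A minimal sketch, assuming the clock has already been set (for example over NTP) and using a hypothetical /visitors directory; visitorFilename is an illustrative name, not part of the project sketch:

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>

// Builds a sortable UTC filename such as /visitors/20231114_221320.jpg.
std::string visitorFilename(std::time_t t) {
    // gmtime returns a pointer to shared static storage, so copy it at once.
    const std::tm* p = std::gmtime(&t);
    assert(p != nullptr);
    std::tm utc = *p;

    char buf[40];  // the formatted name is 29 characters plus the terminator
    std::strftime(buf, sizeof buf, "/visitors/%Y%m%d_%H%M%S.jpg", &utc);
    return std::string(buf);
}
```

Putting year, month, and day first means an alphabetical directory listing is also a chronological one. Some SD libraries only accept short 8.3 filenames, in which case the pattern would need shortening.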
Frequently Asked Questions
How is this different from a Ring doorbell?
Cheaper (one-time $25 vs $100–200 + $5/mo subscription), fully owned (your video, your bot, your storage), but less polished — no slick app, no community-detected porch pirates, no facial-recognition alerts out of the box. Tradeoff is yours.
Can I use this without Telegram?
Yes. Replace the Telegram POST with HTTP POST to your own webhook, or with MQTT publish, or with a SignalCLI bridge. The image is just a JPEG; any service that accepts uploads will do.
How long does the SD card last for recording?
A full-quality JPEG is ~50–100 KB on the OV2640 at 800×600. With one capture per visitor and ~10 visitors a day, that is at most about 1 MB per day, so a 32 GB card holds decades of doorbell history.
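The estimate can be checked with a few lines of arithmetic. A rough sketch, assuming decimal units (1 GB = 1,000,000,000 bytes) and ignoring filesystem overhead; daysOfHistory is an illustrative helper:

```cpp
#include <cassert>
#include <cstdint>

// Whole days of images that fit on a card of the given size.
std::uint64_t daysOfHistory(std::uint64_t cardBytes,
                            std::uint64_t bytesPerImage,
                            std::uint64_t imagesPerDay) {
    assert(bytesPerImage > 0 && imagesPerDay > 0);
    return cardBytes / (bytesPerImage * imagesPerDay);
}

// daysOfHistory(32000000000ULL, 100000, 10) gives 32000 days
// daysOfHistory(32000000000ULL,  50000, 10) gives 64000 days
```

Even at the larger image size, 32,000 days is more than 80 years, so card wear and physical failure will matter long before capacity does.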
Key code: main loop
This is the heart of the firmware, taken from the working sketch. The complete file (with config template, library list, and the rest of the helpers) is around 79 lines and is included in the downloadable project package — request it via the form below.
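void loop() {
  // The button is read as active-low (LOW when pressed), which assumes a pull-up on BUTTON_PIN.
  // The cooldown check ignores bounce and repeat presses for COOLDOWN_MS after an accepted press.
  if (digitalRead(BUTTON_PIN) == LOW && millis() - lastPress > COOLDOWN_MS) {
    Serial.println("[btn] PRESS");
    lastPress = millis();
    ringChime();       // sound the optional indoor chime first so feedback is immediate
    captureAndSend();  // then capture the image and send it
  }
  delay(20);  // poll the button roughly every 20 ms
}
Get the complete project package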
The article above shows the core firmware and the principles behind it. The complete project package — assembled, tested, and ready to flash — is available by email request. We send it manually, and we read every request.
- Complete Arduino sketch (.ino) with full error handling
- List of required libraries with version numbers
- Printable wiring diagram (PDF)
- Bill of materials with current part numbers
- Build guide and troubleshooting tips
- Configuration template (WiFi, MQTT, etc.)
Share your thoughts
Worked with this in production and have a story to share, or disagree with a tradeoff? Email us at support@mybytenest.com — we read everything.