Skip to content

fix(power): sleep/battery reliability hardening#10425

Closed
DatanoiseTV wants to merge 6 commits into
meshtastic:developfrom
DatanoiseTV:fix/power-sleep-reliability
Closed

fix(power): sleep/battery reliability hardening#10425
DatanoiseTV wants to merge 6 commits into
meshtastic:developfrom
DatanoiseTV:fix/power-sleep-reliability

Conversation

@DatanoiseTV
Copy link
Copy Markdown
Contributor

Summary

A bundle of small power-management fixes uncovered while auditing the sleep state machine and battery-presence logic. All cross-cutting (no board-specific pin changes); each commit independently revertable.

Fixes

  • sleep: arm deep-sleep hold so per-pin gpio_hold_en actually holdsdoDeepSleep calls gpio_hold_en on BUTTON_PIN and LORA_CS, but the ESP-IDF API contract is that gpio_hold_en alone only retains pin state during light sleep. Deep sleep additionally needs a one-time gpio_deep_sleep_hold_en() before esp_deep_sleep_start to arm the holds across the boundary. Without it the pins float in DSLP, which on some boards leaks current through LORA_CS and on others triggers spurious wake from a noisy button line.
  • gps: soft-sleep the receiver during CPU light-sleepGPS only subscribed to notifyDeepSleep. During light sleep (the default non-router path) the receiver kept drawing ~25 mA the entire time the MCU was asleep — the dominant drain on devices that spend most of their time in LIGHT_SLEEP. Subscribe to notifyLightSleep / notifyLightSleepEnd and use GPS_SOFTSLEEP instead of full disable() so the state machine survives across the short window without paying re-init / re-acquire on every cycle.
  • power: latch low-battery decision in RTC memory and add recovery hysteresis — the previous handler counted 11 consecutive sub-OCV[min] readings and triggered EVENT_LOW_BATTERY, but on SDS wake low_voltage_counter reset to 0. With the cell still flat, the count climbed back to 11 and re-deep-slept — wake-thrash burning boot energy each cycle. There was also no recovery hysteresis: a brief spike across threshold cleared the count even with the cell effectively flat. Persist the latch in RTC_DATA_ATTR (plain static on non-ESP32 — same effective semantics since SDS doesn't reset there). Once latched, clearing requires K consecutive readings above OCV[min] + 100 mV. If the device wakes still latched-flat, it immediately re-enters SDS.
  • sleep: replace assert(0) on stuck preflight with controlled restartwaitEnterSleep panicked with assert(0) when an observer vetoed sleep for >30 s (FIXME wait to sleep bug - "wait to sleep" is starving the router transmit timer handler #167). On a field device that turns into a noisy crash log every 30 s rather than a clean reboot. Use the same per-arch restart primitives Power::reboot already uses (ESP.restart / NVIC_SystemReset / rp2040.reboot / HAL_NVIC_SystemReset).
  • power: use BQ25896 ADC for both battery voltage and presence — on the BQ27220+BQ25896 combo, getBattVoltage returned the fuel-gauge reading while isBatteryConnect derived presence from the BQ25896 charger ADC. The two could disagree on a freshly-attached cell before the gauge had learned: hasBattery would say false but voltage 4.0 V, and the upstream PowerFSM would treat the unit as externally powered. Read voltage from the same source as presence so both views always agree.

Build verification

  • pio run -e t-deck-tft — SUCCESS, 38.4% RAM / 55.3% flash. No new warnings.
  • Not yet run on hardware — please request community testing for the GPS soft-sleep behaviour and the low-battery latch in particular.

Deferred (scope kept tight)

  • T-Deck Pro v1.1 board-specific power gating (KB_BL pin 42, GPS_EN pin 39, LORA_EN pin 46, DRV2605 pin 2, BMS IRQ wiring) — separate follow-up branch when desired.
  • I²C-keyboard polling-rate reduction in DARK/LS — needs a small refactor to expose PowerFSM state to kbI2cBase, didn't want to land a half-fix here.
  • setCPUFast carve-outs on T-Deck — risks TFT/WiFi instability without bench testing.

Attestations

  • I have tested that my proposed changes behave as described — review/static-analysis only.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other — build-verified t-deck-tft only

Notes for reviewers

  • Each commit is independent; happy to split into 5 PRs if preferred.
  • The low-battery latch is the highest-impact fix in this set.
  • Edits-by-maintainers should be enabled.

doDeepSleep called gpio_hold_en for BUTTON_PIN and LORA_CS, but the
ESP-IDF API contract is that gpio_hold_en alone only covers light
sleep — deep sleep additionally requires gpio_deep_sleep_hold_en
once before esp_deep_sleep_start to arm the holds across the deep-
sleep boundary. Without it the pins float in DSLP, which on some
boards leaks current through LORA_CS and on others triggers
spurious wake from a noisy button line.
GPS only subscribed to notifyDeepSleep, so during light-sleep the
receiver kept drawing ~25 mA on a non-router device — the dominant
drain on devices that spend most of their time in LIGHT_SLEEP.
Subscribe to notifyLightSleep/notifyLightSleepEnd as well, but use
GPS_SOFTSLEEP instead of disable()+enable() since the state machine
survives across the short light-sleep window and full power-cycle
re-acquisition would cost more than it saves.
…eresis

The previous low-voltage handler counted 11 consecutive readings under
OCV[min] and triggered EVENT_LOW_BATTERY, but on SDS wake the counter
reset to 0. With the cell still flat, the count climbed back to 11 and
deep-slept again — wake-thrash burning boot energy each cycle. There
was also no recovery hysteresis: a brief spike across the threshold
would clear the count even with the battery still effectively flat.

Persist the latch in RTC memory so SDS doesn't lose it. Once latched,
clearing requires K consecutive readings above OCV[min] + 100 mV. If
the device wakes and is still latched-flat, immediately re-enter SDS
without running a normal boot.
waitEnterSleep panicked with assert(0) when an observer vetoed sleep
for >30s (FIXME meshtastic#167). On a field device that turns into a noisy
crash log every 30s rather than a clean reboot. Use the same
per-arch restart primitives Power::reboot uses, fall back to assert
only for archs we don't have a primitive for.
getBattVoltage on the BQ27220+BQ25896 combo returned the fuel-gauge
reading while isBatteryConnect derived presence from the BQ25896
charger ADC. The two could disagree on a freshly-attached cell before
the fuel gauge had learned: hasBattery would say false but voltage
4.0V, and the upstream PowerFSM would treat the unit as externally
powered. Read voltage from the same source as presence so both views
move together.
@github-actions github-actions Bot added the bugfix Pull request that fixes bugs label May 8, 2026
@thebentern thebentern requested a review from Copilot May 8, 2026 20:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens sleep and battery-handling behavior to reduce power drain and avoid unstable boot/sleep loops, primarily by improving sleep-state transitions and making low-battery behavior more deterministic across resets.

Changes:

  • Replace the “stuck sleep preflight” panic with a controlled platform reset, and enable ESP32 deep-sleep GPIO hold globally so per-pin holds actually apply in deep sleep.
  • Add ESP32 light-sleep hooks to soft-sleep the GPS during CPU light sleep and restore it on wake.
  • Add a persistent low-battery latch with hysteresis to prevent repeated wake→detect-low→sleep thrash; align BQ25896 voltage reads with presence detection.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/sleep.cpp Controlled restart when sleep preflight is vetoed too long; arm ESP32 deep-sleep hold before entering deep sleep.
src/Power.cpp Add RTC-persisted low-battery latch + recovery hysteresis; use BQ25896 ADC for battery voltage in the BQ25896/BQ27220 path.
src/gps/GPS.h Add ESP32-only light-sleep observer state and callbacks.
src/gps/GPS.cpp Register/unregister light-sleep observers; soft-sleep GPS during CPU light sleep and restore on wake.

Comment thread src/Power.cpp Outdated
Comment thread src/Power.cpp
Comment thread src/sleep.cpp
- Scale OCV by NUM_CELLS in the low-battery threshold check.
  getBattVoltage() returns pack voltage but OCV[] is per-cell, so the
  comparison was wrong on multi-cell variants (chatter2, station-g1).
  Scale both the latch and the recovery-hysteresis threshold so
  behaviour is correct across all NUM_CELLS configurations.
- Correct the getBattVoltage docstring on the BQ27220+BQ25896 path:
  the function returns uint16_t with 0 as the unknown sentinel, not
  NAN as the comment claimed.
- Notify rebootObservers from waitEnterSleep before the platform
  reset primitive, matching Power::reboot's contract so modules like
  InkHUD that register a notifyReboot observer get a last-second
  cleanup pass.
@DatanoiseTV
Copy link
Copy Markdown
Contributor Author

Thanks for the review — all three points addressed in ab77d4dba:

  1. NUM_CELLS scaling. Good catch. getBattVoltage() returns pack voltage and OCV[] is per-cell; the comparison was wrong on multi-cell variants (chatter2, station-g1). Scaled both the low-battery latch threshold and the recovery hysteresis offset by NUM_CELLS so behaviour matches the existing percentage-calc convention at Power.cpp:890.
  2. getBattVoltage docstring. Fixed — returns uint16_t with 0 as the unknown sentinel, not NAN.
  3. notifyReboot from waitEnterSleep. Added the notifyReboot.notifyObservers(NULL) call before the platform reset primitives, matching Power::reboot's contract so observers like InkHUD get the last-second cleanup pass.

Build still clean on pio run -e t-deck-tft. No new warnings.

@DatanoiseTV
Copy link
Copy Markdown
Contributor Author

Splitting this proactively per the same atomic-PR feedback @thebentern gave on #10424. Re-opened as 5 separate PRs, one fix per PR (all incorporate the Copilot review feedback from this thread):

Each is independent and revertable.

@DatanoiseTV DatanoiseTV closed this May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants