Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watchdog may reboot in the middle of storage compact #2075

Closed
fanoush opened this issue Oct 25, 2021 · 4 comments
Closed

watchdog may reboot in the middle of storage compact #2075

fanoush opened this issue Oct 25, 2021 · 4 comments

Comments

@fanoush
Copy link
Contributor

fanoush commented Oct 25, 2021

the jsfCompactInternal is tight loop so when the compact takes longer than watchdog interval, device is rebooted in the middle of compact leaving it corrupted or erased.

With nRF52840 there is >500KB of free internal storage. When it is used for storage and is almost full, compact can take up to 50 seconds(!). This is too long for any typical watchdog period.

Workaround is to add jshKickWatchDog(); e.g. to end of while loop here https://github.com/espruino/Espruino/blob/master/src/jsflash.c#L349 or here
https://github.com/espruino/Espruino/blob/master/src/jsflash.c#L401

However in case of possible bug in this code causing neverending loop we will kick watchdog indefinitely preventing reboot.
Would you accept PR with just adding jshKickWatchDog() somewhere there? Better solution would be to actually track the progress somehow and ping watchdog only if we are sure it is going somewhere but not sure how to do that. or maybe some timer that is much longer and would limit watchdog kicking to e.g. 5 minutes.

This is may be related to #2009 as some corruptions seen can be because of watchdog firing unnoticed but otherwise is a separate issue.

Another issue is why compact of 500KB of internal storage takes 50 seconds while external SPI flash of larger size is faster. But I guess it may be because of BLE being enabled and softdevice flash erase/write scheduling being slow due to nature of Bluetooth timing(?)

@gfwilliams
Copy link
Member

Thanks for spotting this!

But yes, as you say, it sounds strange that accessing the flash takes so long. I remember page erases always took surprisingly long, but not that much!

Maybe, just in case we could do something like:

if (addr > wdtAddr) {
  wdtAddr = addr;
  jshKickWatchDog();
}

Obviously it'd need resetting but it has the bonus that if there were an infinite loop then the WDT would at least stop firing

@fanoush
Copy link
Contributor Author

fanoush commented Oct 25, 2021

I remember page erases always took surprisingly long, but not that much!

oh, just checked
https://infocenter.nordicsemi.com/topic/com.nordic.infocenter.nrf52832.ps.v1.1/nvmc.html?cp=4_2_0_10_1#concept_erase_page_code_memory and one known thing is the "The CPU is halted while the NVMC performs ..." for both write and erase. However then I clicked through tERASEPAGE time and erasing one page takes up to 90ms! so compacting 140 pages by shifting data and erasing each page takes full 12s while the CPU is completely halted. Since the BLE stack must schedule this between BLE traffic it may take much longer to find right time to do all this.

since I recall 52840 has partial page erase feature I did some search and found this https://devzone.nordicsemi.com/f/nordic-q-a/63810/sd_flash_page_erase-time-execution-on-nrf52840 see answer on the bottom, 1 year ago
"I checked with the softdevice developers. The softdevice does not support partial page erase, and will disable all interrupts during the erase (up to 90 ms). You can implement partial erase in timeslots"

as for the watchdog, if addr is going in one direction (up in this case) then this looks like nice solution :-) I'll try it.

@fanoush
Copy link
Contributor Author

fanoush commented Oct 26, 2021

tested it yesterday and put it near the end of do { ... } while (jsfGetNextFileHeader(&addr, &header, GNFH_GET_ALL)) loop in jsfCompactInternal but it was not enough and it still rebooted on larger files (15s watchdog, mix of files 5,50,250KB) so I ended up with putting jshKickWatchDog() into the jsfCompactWriteBuffer - while (*swapBufferUsed) loop unconditionally and that worked. However maybe it is unlikely that someone will have 250KB file in internal storage - maybe some logs(?) Will test more, still don't understand what exactly jsfCompactWriteBuffer does - compacts exactly one file? or one continuous chunk of storage blocks?

@gfwilliams
Copy link
Member

Ok, well I think kicking the watchdog is preferable to crashing - so I'd be happy to have a PR. Even if it crashes it's drawing so much power it'll flatten itself quite quick ;)

So... The idea of jsfCompactWriteBuffer is suppose you've got this:

FILE1
old file
FILE2
old file
FILE3
--page boundary---
FILE3-continued over lots of pages
FILE4

You need to remove the old files, BUT you can't just erase the first page, because it's got FILE1/2/3 in it.

So you have to read FILE1/2/3 into RAM, erase the page and write them back out into it... but it's not even that easy because FILE3 itself won't fit into RAM, so you have to read just the first bit of it, then copy the rest over 4k at a time.

Anyway, that's why it's so complicated :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants