Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

advertising broken with S140 >= 6.1.0 #2000

Closed
fanoush opened this issue Apr 21, 2021 · 12 comments · Fixed by #2367
Closed

advertising broken with S140 >= 6.1.0 #2000

fanoush opened this issue Apr 21, 2021 · 12 comments · Fixed by #2367

Comments

@fanoush
Copy link
Contributor

fanoush commented Apr 21, 2021

with 6.0.0 it works but with 6.1.0 and up no device is visible. Found the issue and solution described here

https://devzone.nordicsemi.com/f/nordic-q-a/37473/advertising-does-not-start-with-s140-6-1-0-same-hex-run-without-issue-with-s140-6-0-0/145157#145157

without understanding details of espruino advertising data lifecycle, quick fix like this seems to work (= device is visible so I can at least connect to console over bluetooth repeatedly)

diff --git a/targets/nrf5x/bluetooth.c b/targets/nrf5x/bluetooth.c
index b667c0ff..1d9815da 100644
--- a/targets/nrf5x/bluetooth.c
+++ b/targets/nrf5x/bluetooth.c
@@ -2513,7 +2513,7 @@ uint32_t jsble_advertising_start() {
   adv_params.interval = bleAdvertisingInterval;

   uint32_t err_code = 0;
-  uint8_t m_enc_scan_response_data[31]; // BLE_GAP_ADV_SET_DATA_SIZE_MAX
+static  uint8_t m_enc_scan_response_data[31]; // BLE_GAP_ADV_SET_DATA_SIZE_MAX
   uint16_t m_enc_scan_response_data_len = sizeof(m_enc_scan_response_data);
 #if NRF_SD_BLE_API_VERSION<5
   err_code = adv_data_encode(&scanrsp, m_enc_scan_response_data, &m_enc_scan_response_data_len);
@@ -2528,8 +2528,10 @@ uint32_t jsble_advertising_start() {

   //jsiConsolePrintf("adv_data_set %d %d\n", advPtr, advLen);
 #if NRF_SD_BLE_API_VERSION>5
-  ble_gap_adv_data_t d;
-  d.adv_data.p_data = (uint8_t*)advPtr;
+static  ble_gap_adv_data_t d;
+static  uint8_t m_adv_data[BLE_GAP_ADV_SET_DATA_SIZE_MAX];
+memcpy(m_adv_data,advPtr,MIN(advLen,BLE_GAP_ADV_SET_DATA_SIZE_MAX));
+  d.adv_data.p_data = m_adv_data;
   d.adv_data.len = advLen;
   d.scan_rsp_data.p_data = m_enc_scan_response_data;
   d.scan_rsp_data.len = m_enc_scan_response_data_len;

however I guess there is maybe better way? the
JSV_GET_AS_CHAR_ARRAY(advPtr, advLen, advDataVar); could read it into static array directly?

@gfwilliams
Copy link
Member

Wow, thanks! That's a really good one to know.

Actually, SD 6 always had an issue with setting the scan response (because you had to update scan response and advertising at the same time IIRC and there weren't global buffers) so having static advertising buffers like this would mean that was trivial to solve as well.

Also, if you're on SD 6 you're probably on something with enough RAM that it doesn't matter.

@fanoush
Copy link
Contributor Author

fanoush commented Apr 27, 2021

as for static buffers the examples and documentation mention that any change while advertising requires second set of buffers and switch :-(

https://infocenter.nordicsemi.com/topic/com.nordic.infocenter.s132.api.v6.0.0/structble__gap__adv__data__t.html?cp=4_7_3_5_3_0_14

  • To update advertising data while advertising, provide new buffers to sd_ble_gap_adv_set_configure

but when thinking about it maybe sd_ble_gap_adv_set_configure could be called twice - once with new buffer on stack, then copied from stack to static buffer and switched again to static copy

@gfwilliams
Copy link
Member

Wow, just saw this again. Are you still hitting this?

Interestingly Bangle.js 2 has been running with non-static buffers for ages on 6.0.0 now. Having said that, while scan response is non-static, I believe the way we now write advertising data into JsVars ensures that it is now a flat string of data, so the pointer returned by JSV_GET_AS_CHAR_ARRAY should effectively be static.

But since for Espruino 6.0.0+ generally means nRF52840, we could definitely afford to drop <64 bytes of RAM on 2 buffers for this.

@fanoush
Copy link
Contributor Author

fanoush commented Jun 10, 2022

Yes, it is still an issue, just tried with Q3/Bangle 2, flashed S140-6.1.0 (have S140 dfu packages here https://github.com/fanoush/ds-d6/tree/master/espruino/DFU/Magic3) and it is broken I can't find the watch properly advertising and mostly can't reconnect via WebIDE over remembered connection. It does work sometimes. I can also see the device via nrfconnect but it has no name and advertising data is random.

It is feature of 6.1.0 and up, as described in that devzone link they call 6.1.0 compatible with 6.0.0 because the requirement is documented here https://infocenter.nordicsemi.com/topic/com.nordic.infocenter.s132.api.v6.0.0/structble__gap__adv__data__t.html?cp=4_7_3_6_3_0_14 and worked by coincidence with 6.0.0 when not static.

BTW so far 6.0.0 was behaving better with espruino than 6.1.x with advertising fixed - @enaon reported connection issues from espruino device to other devices with 6.1.0 and 6.1.1 - opening connection seemed to be slower with 6.1.x.
Could be also because of new features in 6.1 The main new feature for s140_nrf52_6.1.0 compared to s140_nrf52_6.0.0 is the full support for all mandatory LE Advertising Extensions features and qualified LE Coded PHY feature. Maybe it needs some extra handling in espruino to work better. We'll see when you eventually move to 6.1 with Bangle 2

@fanoush
Copy link
Contributor Author

fanoush commented Jun 10, 2022

Oh, BTW I just managed to brick Bangle 2 when downgrading back to S140-6.0.0, the DFU bootloader does not handle case with no valid app correctly, it jumps into the app even if not valid. When upgrading to 6.1 I was lucky as it tried repeatedly to jump to application and just rebooted via watchdog few times until I managed to download https://www.espruino.com/binaries/espruino_2v14_banglejs2.zip However second time when trying S140 6.0.0 dfu zip the bootloader did this again but for some reason display went blank and watchdog didn't help and I can't enter dfu bootlader anymore. So if you want you can try softdevice upgrade/downgrade via S140-6.x.x-magic3.zip packages from https://github.com/fanoush/ds-d6/tree/master/espruino/DFU/Magic3 and see that it tries to jump into invalid app instead of staying in DFU indefinitely. EDIT: building full hex from source and flashing over SWD fixed it

@gfwilliams
Copy link
Member

Thanks for the update! This definitely seems worth fixing properly then :)

Interesting about the 6.1.x delay issues too

The bootloader+app thing is odd - but maybe the full CRC check got disabled because the update from external flash doesn't update the saved CRC. Either way it should start the watchdog before loading the app, so it should reboot after a few seconds and then it's trivial to hold the button down to re-enter DFU?

@fanoush
Copy link
Contributor Author

fanoush commented Jun 14, 2022

it should start the watchdog before loading the app, so it should reboot after a few seconds and then it's trivial to hold the button down to re-enter DFU

Yes it should work like that but it did not for some reason. When doing DFU with https://github.com/fanoush/ds-d6/raw/master/espruino/DFU/Magic3/S140-6.0.0-magic3.zip bootloader started showing white screen, then it timed out and jumper into invalid data, display went off/black and it froze completely. Even reset/reboot via SWD did not show bootloader white screen so something was really broken. i did not save flash but I will retry this again.

Maybe when entering DFU by button it should never time out since the watchdog is running? You could exit it by holding button and letting watchdog reboot it again. At least the timeout is really short now, even when nrfconnect is started on the phone and I quickly connect to dfu, click dfu icon and then picking up zip file it often times out before I manage to pick the file from longer list. Or maybe it should at least not timeout when it is already connected?

BTW those softdevice packages do work for other 52840 watches with stock nordic bootloader. Most people are currently downgrading to 6.0.0 with 52840 watches for running Espruino so that they can build from unpatched source and also because enaon reported that connection stability issue.

As for the 6.0 vs 6.1 connection speed I found the conversation, it was actually the other way - 6.0 connection was slower than 6.1 but was 100% reliable, Newer 6.1.0 6.1.1 needed retries but were faster when it succeeded.

here is what @enaon wrote back then, device came with 6.1.1 by default

enaon: all BT stuff looks stable and fast, except for the initial connect part, it looks like it drops trying it very fast with "timeout", I cannot make a pattern yet, sometimes it needs 2-3 tries, sometimes it needs more that 10 retries.
fanoush: from watch to some other device?
enaon: yes, watch to two wheels I have here, different makers.
enaon: ok, 6.0.0 solved it 🙂
enaon: like on p8, only faster 🙂
fanoush: oh, really?
enaon: it sure must be a matter of delay in the initial connection.
on old 23 MTU images, the connection was almost instant
on the 59 MTU it added a ~500ms delay.
on the images with the 6.1.1, both 59 and the 131 one, when it connected it was as fast as on the old 23 long images
now at the 6.0.0, it has that ~300ms delay and never fails.

@fanoush
Copy link
Contributor Author

fanoush commented May 8, 2023

funny thing, I just hit one nasty bug with S140 6.1.1 - writing 4096 page of zeroes to internal flash crashes the device. The 6.0.0 works!
It is because of this issue https://devzone.nordicsemi.com/f/nordic-q-a/40088/sd_flash_write-cause-nrf_fault_id_sd_assert and is related to this line - value 4096 is not safe here with 6.1.x so beware when possibly upgrading. Reducing to 2048 as per comment https://devzone.nordicsemi.com/f/nordic-q-a/40088/sd_flash_write-cause-nrf_fault_id_sd_assert/244918 seems to work.

@fanoush
Copy link
Contributor Author

fanoush commented May 9, 2023

And BTW I was looking into this 6.1.x advertising issue again and made a bit more invasive changes. I noticed it is not just this one place I patched, there is also 2 more places https://github.com/search?q=repo%3Aespruino%2FEspruino%20sd_ble_gap_adv_set_configure&type=code one even in common jswrap_bluetooth.c

So I moved static arrays out of this method and used it also here

uint32_t jsble_advertising_update_advdata(char *dPtr, unsigned int dLen) {
#if NRF_SD_BLE_API_VERSION>5
ble_gap_adv_data_t d;
d.adv_data.p_data = (uint8_t *)dPtr;
d.adv_data.len = dLen;
d.scan_rsp_data.p_data = NULL;
d.scan_rsp_data.len = 0;
// TODO: scan_rsp_data? Does not setting this remove it?
return sd_ble_gap_adv_set_configure(&m_adv_handle, &d, NULL);
and also I added symmetrical jsble_advertising_update_scanresponse to nrf52 source below it and moved nrf5x code out of common code
#ifdef NRF5X
#if NRF_SD_BLE_API_VERSION<5
err_code = sd_ble_gap_adv_data_set(NULL, 0, (uint8_t *)respPtr, respLen);
#else
extern uint8_t m_adv_handle;
// Get existing advertising data as on SDK15 we have to be able to re-supply this
// when changing the scan response
JsVar *advData = jswrap_ble_getCurrentAdvertisingData();
JSV_GET_AS_CHAR_ARRAY(advPtr, advLen, advData);
ble_gap_adv_data_t d;
d.adv_data.p_data = (uint8_t *)advPtr;
d.adv_data.len = advLen;
d.scan_rsp_data.p_data = (uint8_t *)respPtr;
d.scan_rsp_data.len = respLen;
if (bleStatus & BLE_IS_ADVERTISING) sd_ble_gap_adv_stop(m_adv_handle);
err_code = sd_ble_gap_adv_set_configure(&m_adv_handle, &d, NULL);
if (bleStatus & BLE_IS_ADVERTISING) sd_ble_gap_adv_start(m_adv_handle, APP_BLE_CONN_CFG_TAG);
jsvUnLock(advData);
#endif
into this , do you want to have PR with this?

Also I am not sure why advertising is restarted around line here

if (bleStatus & BLE_IS_ADVERTISING) sd_ble_gap_adv_stop(m_adv_handle);
err_code = sd_ble_gap_adv_set_configure(&m_adv_handle, &d, NULL);
if (bleStatus & BLE_IS_ADVERTISING) sd_ble_gap_adv_start(m_adv_handle, APP_BLE_CONN_CFG_TAG);
and not here
return sd_ble_gap_adv_set_configure(&m_adv_handle, &d, NULL);
it is same type of code. I would remove the restart. Since both packets are now in static array and are known, both can be set, so it solves this comment
// TODO: scan_rsp_data? Does not setting this remove it?
return sd_ble_gap_adv_set_configure(&m_adv_handle, &d, NULL);
if it was an issue and maybe it was the reason for the advertising restart too?

Also one more related thing - scan response is now also kept in variable here

// only set data if we managed to decode it ok
jsvObjectSetOrRemoveChild(execInfo.hiddenRoot, BLE_NAME_SCAN_RESPONSE_DATA, data);
maybe this can be removed when having static array? and funnily while this one is in JS variable (and advertising packet is not), there is actually no NRF.getScanResponse(Data?) to get it like there is https://www.espruino.com/Reference#l_NRF_getAdvertisingData . OTOH there is both NRF.setScanResponse(data) and NRF.setAdvertising(data, options) so it is a bit of a mess (possibly for reasons) - EDIT: now I see this is also related to #1966

@gfwilliams
Copy link
Member

Ouch, ok - definitely worth adding an ifdef for 2048 vs 4096 then! Or maybe we just go down to 2048 for both - I guess it won't hurt write speed too much.

Those advertising changes sound good - I think... do you have a link to them? I didn't see an obvious branch on https://github.com/fanoush/Espruino

Scan Response has always been a bit of a pain because it was hacked it at an earlier API level when I think it could be changed independently. As you noted #1966 is very relevant here - I'm very open to suggestions for how to make a nicer API around it without breaking existing code.

it is same type of code. I would remove the restart. Since both packets are now in static array and are known, both can be set

I'm not sure I understand - you mean they can now be changed without even needing an API call and the changes will take effect? I'd be up for that but we need to be sure it still works on older API levels too (we're still using SDK11 for Microbit 1 I think?)

this one is in JS variable (and advertising packet is not)

I was pretty sure it is (although I think on later SDKs we had to have the static var as well). For instance when rotating advertising packets we do it here:

if ((bleStatus & BLE_IS_ADVERTISING) && (bleStatus & BLE_IS_ADVERTISING_MULTIPLE)) {
int idx = (bleStatus&BLE_ADVERTISING_MULTIPLE_MASK)>>BLE_ADVERTISING_MULTIPLE_SHIFT;
JsVar *advData = jsvObjectGetChildIfExists(execInfo.hiddenRoot, BLE_NAME_ADVERTISE_DATA);
bool ok = true;

I am not sure why advertising is restarted around line here and not here

I think it's that jsble_advertising_update_advdata was also called from SWI1_IRQHandler so we didn't want to mess with when advertising starts/stops at that point.

I assume that at least for some API levels the advertising had to be stopped before the scan response could be updated? Maybe not but it seems odd I'd have done that if not.

@fanoush
Copy link
Contributor Author

fanoush commented May 10, 2023

do you have a link to them?

No, I have like 5 types of different changes together so need to make branch just with that - will do and post a link.

I assume that at least for some API levels the advertising had to be stopped before the scan response could be updated? Maybe not but it seems odd I'd have done that if not.

Oh maybe that. Or I was not sure whether the stop/start could build both advertising data and scan response from the scratch again while taking the stored change of one or the other into consideration(?)

Because sometimes you want to set each separately without affecting the other as you know what you are doing and then there is api like NRF.setAdvertising which may (or may not?) also affect scan response when you pass object structure (yes?) vs raw array(not) or otherwise it would be named setAdvertisingData ?

Anyway I'll prepare the branch so we can discuss it over that.

@fanoush
Copy link
Contributor Author

fanoush commented May 10, 2023

ok made #2367 but will retest with S132 3.0 and below, or maybe I can put the restart back into a6c4118#diff-19dab7787aab7c6471994842e692eceeb00f5a9d4742dbaa3374aae2d5051e5b around call to err_code=jsble_advertising_update_scanresponse((uint8_t *)respPtr, respLen);

oh, but now I see it is there only in #ifdef for version 6!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants