Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bthrm] Various fixes #1655

Merged
merged 16 commits into from May 24, 2022
Merged

Conversation

GrandVizierOlaf
Copy link
Contributor

After opening espruino/Espruino#2178 and getting great support from @gfwilliams I spent a while tinkering with the bthrm app and found some issues with reading values versus handling events. This sorts that out and handles some housekeeping like var not being specified and whitespace.

I'm marking this draft until we sort out !2178 so we know if active: true is necessary, but am also happy to split that out into a separate PR if the other changes would be beneficial in the meantime.

@gfwilliams
Copy link
Member

Thanks! As I just mentioned in espruino/Espruino#2178 I really think it'd be a massive improvement to just filter based on device ID rather than name (in crowded BT environments, scan response doesn't always work).

You can then use NRF.connect if you know the device ID (which will be faster than doing a findDevice)

In createMenuFromScan in settings you could then just do an active NRF.findDevices based on service UUID, and then name the item with: dev.name||dev.id.substr(0,17).

That way it'll work on existing 2v12 devices (but will just show the MAC address) but when users update to latest firmware, the actual device name will be shown.

@GrandVizierOlaf
Copy link
Contributor Author

@gfwilliams I've implemented the changes you described along with a little more cleanup and storing the btid while still keeping btname so it can be displayed to the user later. I've tested it on the latest version of Espruino and am going to start testing on 2v12, but I wanted to get your opinion on one more change.

I think it's worth removing the serviceFilters scan from boot.js altogether now that the logic to find devices is more robust. This will make boot significantly less chatty, but will require users to have set a device in the BTHRM app, which is not the previous behavior. I think the decreased noise is worth the breaking change, but wanted to get your opinion before I remove that logic.

@GrandVizierOlaf
Copy link
Contributor Author

I've run into a potential problem with the proposed implementation, but I'm not sure how serious it is; when trying to connect by ID it will fail when the ID is of any type but public (the four types are described here and here) and the BLE device has changed the ID (for random the spec says this will not happen until at least a reboot of the BLE device). This was not happening when connecting by name because the names are less likely to randomize.

For my device it will not be a problem until it is time to replace the battery because it is always on, but for others I could see this being a problem if they change the ID every reboot, but are not always on or use one of the ID schemes that change more frequently. It would be good to know if this is a common occurrence or not before making this change. If it is not then we are probably ok to push on, but otherwise I believe the solution is to keep the initial settings.js connection alone and then when it connects via gatt in boot.js it can bond and somehow store the bonding information for later.

Separately, it might be helpful to add connected_addr to BluetoothRemoteGATTServer.getSecurityStatus as it has already been added to NRF.getSecurityStatus for the latest Espruino. I assume it would still return the random address, but if it does expose the reconnectable one, that might be the path forward, though that would have 2v12 implications if so.

@gfwilliams
Copy link
Member

I think it's worth removing the serviceFilters scan from boot.js

Yes, that sounds good to me. Maybe just mention in the README you have to configure the HRM you're using first?

it will fail when the ID is of any type but public and the BLE device has changed the ID

Have you actually experienced this? My understanding was that 'random' type addresses didn't ever really change. It was more that public addresses were guaranteed to be globally unique, but there might be more than one device in the world with the same random MAC address (although the chances of two the same in the same place is vanishingly small)

@GrandVizierOlaf
Copy link
Contributor Author

On mine if you pull the battery and put it back it will randomize the address. This would only be needed when you are going to replace the battery and on this device it shouldn't be needed for a couple hundred hours of use. It is a significant enough event that you need to go through initial setup of the device again.

I'm most specifically worried about Polar's other type of sensors that are worn on the wrist and require powering on and off between use, such as the OH1. They might use a public address or hold onto a random address, but that's a guess and I'd hate to introduce breaking changes with a guess.

Someone on the forum mentioned having an OH1 in the run app review thread so I'll see if they can help with getting some more data points.

@gfwilliams
Copy link
Member

Someone on the forum mentioned having an OH1 in the run app review thread

That's be great. Honestly, having the address change feels like a very odd quirk.

Operating systems like Linux (and so probably Android too) seem to store all the pairing info based on MAC address - so when the device changes address you'd have to redo everything.

@GrandVizierOlaf
Copy link
Contributor Author

Given what Mi tested in the forums, I think we're good on the ID issue.

@gfwilliams One final question for you before I clean up this PR; are you ok if I enable bonding as part of the connection process? I've tried already and run into issues (to the point where the bangle decided storage was corrupt and wiped itself during a run yesterday), but think the security/reconnection benefits are worth the effort to get it working, unless you say otherwise.

@gfwilliams
Copy link
Member

to the point where the bangle decided storage was corrupt and wiped itself during a run yesterday

That's odd - it may be that's not actually related though?

but think the security/reconnection benefits are worth the effort to get it working, unless you say otherwise.

Can you elaborate on what the reconnection benefits are? Personally I'd say it's not worth adding (or if you do it should be a default-off option... maybe just a single button in the settings menu that triggers bonding is enough?). Problem IMO is if a passkey is required on some devices and you don't implement that, I think you've then just made them unusable.

In terms of security, what confidential information are you concerned about? Heart rate? Pretty sure I've seen some examples of people reading heartrate just using the image from a video camera, so unless you're going to run around with your face covered I think there are other, easier ways to get heart rate than trying to bluetooth sniff :)

@GrandVizierOlaf
Copy link
Contributor Author

To the first point about it deciding storage was corrupt, adding a call to startBonding and waiting for its promise was causing the retry mechanism to fire, regardless of how high I was setting the retry timeout. This caused some sort of resource consumption to where the only thing I could do was reboot the watch, but since the BTHRM code was firing on boot it would end up right back there. Since I was already on a run I left it alone to debug when I got home and it eventually gave me the message about storage being corrupt and came back reset. Once I get my watch reflashed I'll do some more debugging.

I'll preface the next two paragraphs that I totally agree that bonding shouldn't default to on for BTHRM, but I would like to provide the option.

To the point about reconnecting, during all my reading about BLE the last couple of days I keep seeing references to bonds storing keys that allow for reconnecting without scans or peripheral advertising. I think the terminology I should have been using is "direct advertising" which is dependent on the peripheral implementation so that's not as likely to be beneficial as I'd thought.

To the point about security, that's just my background wanting everything that can be secured to be so. There is no particular BTHRM information that I really care if anyone sees in practice right now (I was posting my local BT scans to the internet with all kinds of information), but one of my reasons for getting a bangle was to be able to tighten down things that I would otherwise say are too permissive or ready to reach out over a network. That said, my HRM has BLE characteristics enabled to return my name, age, weight, height, etc and while it doesn't store that today it has the capability to store training information and all it would take is Polar flipping a switch out of my control to have that more sensitive information available to anyone I run by and my perfect world is one in which the watch and HRM and known only to each other and there is no unencrypted broadcasting. I acknowledge that this is overly paranoid, but that's the work I do and tinkering with stuff like this helps me there too.

@gfwilliams
Copy link
Member

my HRM has BLE characteristics enabled to return my name, age, weight, height, etc

Wow, ok - yes, that makes a lot more sense.

I'm 100% behind having the option for bonding - I'd just leave it off by default so it 'just works' for as many people as possible :)

@GrandVizierOlaf
Copy link
Contributor Author

GrandVizierOlaf commented Apr 8, 2022

I haven't been able to add the bonding code yet because I've been dealing with a memory leak issue that I'm hoping I can get some more eyes on. If I don't have my HRM on the watch will continue to search and search and eventually run out of memory with lots of FIFO_FULL errors in the meantime. I'm not sure if it is memory that could be GCed and it is using the resources that GC would need or if it is memory that can't be GCed. It's exacerbated by unlocking the watch on the clock and then navigating to the BT HRM app which causes the boot.js code to run multiple times and there are conflicting BT connection/read/disconnection errors while they step on each other.

When running my pushed version of the app I see many definitions of Bangle.isBTHRMConnected in the output of trace(). This might be due to using function expressions rather than declarations, but I modified them based on what the Web IDE was telling me.

I loaded the released version of the app from the official app loader last night and had the same issues, but I find it hard to believe that no one else has seen this so I'm trying to figure out what I'm doing differently. I still need to run trace() with the official version of the app, but won't be able to until later.

My preference is to fix this issue, but I would also like to add an option for "stop trying after X connection failures" where the user would have to go and re-establish the connection when their BTHRM is available again. I assume most people are not wearing their monitors continuously and this option would allow for reduced scanning even in a memory-leak-free environment.

@gfwilliams
Copy link
Member

You could give https://github.com/espruino/EspruinoMemView to try and give you a better idea of what's happening with memory? trace may well show things twice as it'll show what's currently in RAM, plus the contents of .boot0

The 'X connections' thing sounds like a good idea - although I guess scanning for ~2 secs each time an app changes probably isn't too bad.

Do you think these changes might be in a position to merge?

@GrandVizierOlaf
Copy link
Contributor Author

Unfortunately these changes aren't ready to merge; when using BTHRM I am unable to get a GPS fix at the same time (making it incompatible with the main benefits of the run app for me) and I think it's due to the memory issue. I've been busy with other things, but will check out EspruinoMemView in the next few days to try and get this back on track.

@GrandVizierOlaf
Copy link
Contributor Author

This isn't very scientific, but going into the BTHRM app with an HRM unavailable to connect, this is what happens when running process.memory(false) while letting it search for about 5 minutes.

>process.memory(false)
={ free: 353, usage: 11647, total: 12000, history: 9,
  blocksize: 15, stackEndAddress: 537087624, flash_start: 0, flash_binary_end: 598168, flash_code_start: 1610612736,
  flash_length: 1048576 }
2022-04-19T21:50:47.027Z - Error: "No device found matching filters"
2022-04-19T21:50:47.030Z - Disconnect: No device found matching filters
2022-04-19T21:50:47.033Z - GATT:
2022-04-19T21:50:47.035Z - Characteristics:  []
2022-04-19T21:50:47.042Z - Retry
2022-04-19T21:50:47.086Z - Set timeout for retry as 100
2022-04-19T21:50:47.191Z - Retrying
2022-04-19T21:50:47.193Z - initBt with blockInit: false
2022-04-19T21:50:47.197Z - Configured device id "d2:7b:18:2b:cf:f3 random"
2022-04-19T21:50:47.201Z - Requesting device with filters [{"id":"d2:7b:18:2b:cf:f3 random"}]
>process.memory(false)
={ free: 204, usage: 11796, total: 12000, history: 9,
  blocksize: 15, stackEndAddress: 537087624, flash_start: 0, flash_binary_end: 598168, flash_code_start: 1610612736,
  flash_length: 1048576 }
2022-04-19T21:50:49.343Z - Error: "No device found matching filters"
2022-04-19T21:50:49.352Z - Disconnect: No device found matching filters
2022-04-19T21:50:49.354Z - GATT:
2022-04-19T21:50:49.357Z - Characteristics:  []
2022-04-19T21:50:49.364Z - Retry
2022-04-19T21:50:49.367Z - Set timeout for retry as 100
2022-04-19T21:50:49.472Z - Retrying
2022-04-19T21:50:49.475Z - initBt with blockInit: false
2022-04-19T21:50:49.481Z - Configured device id "d2:7b:18:2b:cf:f3 random"
2022-04-19T21:50:49.485Z - Requesting device with filters [{"id":"d2:7b:18:2b:cf:f3 random"}]
>process.memory(false)
={ free: 171, usage: 11829, total: 12000, history: 9,
  blocksize: 15, stackEndAddress: 537087624, flash_start: 0, flash_binary_end: 598168, flash_code_start: 1610612736,
  flash_length: 1048576 }
2022-04-19T21:50:51.521Z - Error: "No device found matching filters"
2022-04-19T21:50:51.525Z - Disconnect: No device found matching filters
2022-04-19T21:50:51.527Z - GATT:
2022-04-19T21:50:51.530Z - Characteristics:  []
2022-04-19T21:50:51.537Z - Retry
2022-04-19T21:50:51.540Z - Set timeout for retry as 100
2022-04-19T21:50:51.646Z - Retrying
2022-04-19T21:50:51.648Z - initBt with blockInit: false
2022-04-19T21:50:51.652Z - Configured device id "d2:7b:18:2b:cf:f3 random"
2022-04-19T21:50:51.655Z - Requesting device with filters [{"id":"d2:7b:18:2b:cf:f3 random"}]
>process.memory(false)
={ free: 125, usage: 11875, total: 12000, history: 9,
  blocksize: 15, stackEndAddress: 537087624, flash_start: 0, flash_binary_end: 598168, flash_code_start: 1610612736,
  flash_length: 1048576 }
>process.memory(false)
={ free: 96, usage: 11904, total: 12000, history: 9,
  blocksize: 15, stackEndAddress: 537087624, flash_start: 0, flash_binary_end: 598168, flash_code_start: 1610612736,
  flash_length: 1048576 }
2022-04-19T21:50:53.749Z - Error: "No device found matching filters"
2022-04-19T21:50:53.753Z - Disconnect: No device found matching filters
2022-04-19T21:50:53.761Z - GATT:
2022-04-19T21:50:53.763Z - Characteristics:  []
2022-04-19T21:50:53.771Z - Retry
2022-04-19T21:50:53.774Z - Set timeout for retry as 100
2022-04-19T21:50:53.883Z - Retrying
2022-04-19T21:50:53.886Z - initBt with blockInit: false
2022-04-19T21:50:53.889Z - Configured device id "d2:7b:18:2b:cf:f3 random"
2022-04-19T21:50:53.893Z - Requesting device with filters [{"id":"d2:7b:18:2b:cf:f3 random"}]
>process.memory(false)
={ free: 35, usage: 11965, total: 12000, history: 4,
  blocksize: 15, stackEndAddress: 537087624, flash_start: 0, flash_binary_end: 598168, flash_code_start: 1610612736,
  flash_length: 1048576 }
Execution Interrupted during event processing.
Interpreter error: [
  "MEMORY"
 ]
New interpreter error: MEMORY
Execution Interrupted
New interpreter error:
ERROR: Ctrl-C while processing interval - removing it.
Execution Interrupted during event processing.
Interpreter error: [
  "FIFO_FULL",
  "CALLBACK"
 ]
New interpreter error: FIFO_FULL,CALLBACK

I did find one bug where retryTime was being set as a multiple of itself, rather than clampedTime, which is why it is always retrying at 100ms instead of backing off, but it seems like that fix will just make the memory leak take hours instead of minutes.

@gfwilliams
Copy link
Member

Just checking - I see a few more commits - is this ready to merge?

@GrandVizierOlaf
Copy link
Contributor Author

@gfwilliams it mostly works, but there is still the problem that if a device can't be found then the watch will run out of memory. When I go for a run I need to turn on my BTHRM then I switch the mode in the app from "off" to "default", then let it connect. When I'm done I make sure to revert the app to "off" before I turn the BTHRM off, but none of that is very user friendly. I would still prefer some solution where it handles this automatically and gracefully, but have not been able to address it yet.

@gfwilliams
Copy link
Member

Ok, thanks - I'll merge now then as it seems from the issue #1859 the bug was there before :)

At least any new fixes can start from your most recent code

@gfwilliams gfwilliams marked this pull request as ready for review May 24, 2022 15:26
@gfwilliams gfwilliams merged commit e517262 into espruino:master May 24, 2022
@halemmerich
Copy link
Collaborator

I have done a short test using the memory visualizer with about 10 minutes between left and right pane:
BTHRM 0.08:
grafik
BTHRM 0.09:
grafik
It seems to me that remaining memory leaking is a lot reduced if any is left. This is with uploading only Bangle.setBTHRMPower(1) as code to RAM to the watch. So essentially only stuff from my .boot is running. BTHRM is configured to default mode.

@gfwilliams
Copy link
Member

Great! Thanks for digging into that. It's hard to be sure but it seems there's very little difference in 0.09 now - I'd call this fixed :)

The 'floating' bits are interesting though. That almost looks like it's a memory leak in bangle.js

@gfwilliams
Copy link
Member

gfwilliams commented May 26, 2022

Thanks again for this - I just looked into this further, with 0.08, and narrowed it down to a bug in Espruino.

It seems in 2v13 there ended up being a regression which meant that every bluetooth address ever received ended up leaked. It's fixed now in cutting edge builds!

Fixed with espruino/Espruino@b7ebc81

@halemmerich
Copy link
Collaborator

Great! That seems to fix a lot of memory problems I had seen with several apps/use cases while BTHRM was active.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants