Bluetooth related question

  • Thread starter: Markus Humm

Markus Humm

Hello,

we're currently implementing a device search for Bluetooth devices to
find our device in order to be able to open it. This device search
source code has been put in a C/C++ DLL to avoid all those P/Invoke
declarations in our VB.net compact framework application. So it might
not have to do much with CF, but since the code is called from VB.net
(CF 1.0) it's not entirely out of the equation.

Normally the code just works and gives us a list of all devices
currently visible at the PDA's location. But I found out during
testing that it crashes with a 0xC0000005 exception if it's run in an
environment where many (a dozen or more) Bluetooth devices are
reachable. My developer then said he couldn't catch that exception
(he says eVC 4.x can't do it), so we started logging what the DLL did
into a file. To cut it short, we found out that the crash happened
after a WSALookupServiceNext call, and further research led us to the
conclusion that it must happen when a (null) (meaning empty) device
name is delivered. What we do not understand is who puts that (null)
text into our log, as we don't do it (we have an empty string in this
situation).

Looking at all those examples on the internet didn't help much until my
developer noticed one example where a sleep was included in the
WSALookupServiceNext loop. All other examples didn't have that sleep.
So we included a 500 ms sleep and now it works, even though it still
returns (null) names. So the main question is: why does it crash without
that sleep call? What goes on in the background? Apparently the (null)
names appear if a device can still answer the inquiry scan but not the
"full connect" being made to get the friendly name, I think mostly due
to getting out of range (e.g. a mobile phone).

We then also found an MS example (with VS2005/2008) which would simply
list all devices. This example didn't use sleep but also didn't crash!
It seemed simply to leave out those (null) devices, and investigation
showed that these are devices which sometimes report a name and
sometimes don't. The Windows Mobile GUI in this case displays the
Bluetooth MAC address instead of the name. Nice trick!
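
The fallback that the Windows Mobile GUI apparently uses (show the address when no name is available) can be sketched in portable C++. This is only an illustration under assumptions: BtAddr is a stand-in for the 64-bit BT_ADDR, and FormatBtAddr and DisplayName are hypothetical helper names, not part of any Microsoft API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>
#include <string>

// Stand-in for the 64-bit BT_ADDR type; only the low 48 bits are used.
typedef std::uint64_t BtAddr;

// Format the low 48 bits of the address as "XX:XX:XX:XX:XX:XX".
std::wstring FormatBtAddr(BtAddr addr) {
    wchar_t text[18]; // 17 characters plus the terminating zero
    int written = std::swprintf(text, 18, L"%02X:%02X:%02X:%02X:%02X:%02X",
                                (unsigned)((addr >> 40) & 0xFF),
                                (unsigned)((addr >> 32) & 0xFF),
                                (unsigned)((addr >> 24) & 0xFF),
                                (unsigned)((addr >> 16) & 0xFF),
                                (unsigned)((addr >> 8) & 0xFF),
                                (unsigned)(addr & 0xFF));
    assert(written == 17);
    (void)written;
    return std::wstring(text);
}

// Return the friendly name, or the address text when no name was delivered.
std::wstring DisplayName(const wchar_t* name, BtAddr addr) {
    if (name == NULL || name[0] == L'\0') {
        return FormatBtAddr(addr);
    }
    return std::wstring(name);
}
```

Called with a NULL or empty name, DisplayName returns the formatted address, so a device list never has to print or compare a missing name.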

I'm asking you now since I'd like to understand why our solution works,
or whether some bug still lurks in it (missed in my tests so far but
coming up later).

Here's our source used now:
/*
  Bluetooth Device Discovery by Name
  In   WCHAR* myDeviceName
  Out  BTH_ADDR Bluetooth Device Address
*/
BTH_ADDR CMicrosoftStack::FindDeviceEBM(WCHAR* myDeviceName) {
    BTH_ADDR retVal = 0;
    WSAQUERYSET wsaQuerySet;
    LPWSAQUERYSET pQuerySet;
    HANDLE hLookup;
    int nRet;
    WCHAR szError[1024]; // Debug Message

    //
    // Initialize the query
    //
    memset(&wsaQuerySet, 0, sizeof(wsaQuerySet));

    wsaQuerySet.dwSize = sizeof(WSAQUERYSET);
    wsaQuerySet.dwNameSpace = NS_BTH;
    wsaQuerySet.lpBlob = NULL;

    // Start looking for devices
    nRet = WSALookupServiceBegin(&wsaQuerySet, LUP_CONTAINERS, &hLookup);

    if (nRet == 0) {
        DebugMessage(_T("Success WSALookupServiceBegin"));
    } else {
        swprintf(szError, _T("Failed WSALookupServiceBegin %d"),
                 WSAGetLastError());
        DebugMessage(szError);
    }

    if (nRet == 0) {

        union {
            CHAR buf[5000];
            SOCKADDR_BTH __unused;
        };

        for (int i = 0; i < 255; ++i)
        {
            pQuerySet = (LPWSAQUERYSET)buf;
            DWORD dwLen = sizeof(buf);
            memset(pQuerySet, 0, sizeof(WSAQUERYSET));
            pQuerySet->dwSize = sizeof(WSAQUERYSET);

            // NS_BTH for bluetooth queries
            pQuerySet->dwNameSpace = NS_BTH;
            pQuerySet->lpBlob = NULL;

            // only name and address
            DWORD dwFlags = LUP_RETURN_NAME | LUP_RETURN_ADDR;
            nRet = WSALookupServiceNext(hLookup, dwFlags, &dwLen, pQuerySet);

            Sleep(500);

            if (nRet != 0) // ERROR or WSA_E_NO_MORE
            {
                swprintf(szError, _T("WSALookupServiceNext :: Error %d"),
                         WSAGetLastError());
                DebugMessage(szError);
                break;
            } else {
                swprintf(szError,
                         _T("WSALookupServiceNext :: Success WSAGetLastError: %d"),
                         WSAGetLastError());
                DebugMessage(szError);
            }

            SOCKADDR_BTH *pBtAddr =
                (SOCKADDR_BTH*)pQuerySet->lpcsaBuffer->RemoteAddr.lpSockaddr;
            BT_ADDR btAddr = pBtAddr->btAddr;
            swprintf(szError, _T("SOCKADDR_BTH: %x"), btAddr);
            DebugMessage(szError);

            swprintf(szError, _T("%s"), pQuerySet->lpszServiceInstanceName);

            DebugMessage(szError);
            if (wcscmp(szError, myDeviceName) == 0) {
                retVal = btAddr; // found what we are looking for
                break;
            }

        } // End For
    }

    // Cleanup
    WSALookupServiceEnd(hLookup);

    DebugMessage(_T("Exit"));

    return retVal;
}

Summary of my questions:
1. Why does it log (null) as the name even though we don't assign it to
our log string? Who puts it there?
2. What goes on in the background of the MS stack implementation such
that a sleep avoids a crash?
3. Why does the above source sometimes crash with a general protection
exception (0xC0000005 is the GPF) if the sleep is not done?

Greetings

Markus
 
Based on your observation I'd say there's a race condition. WSALookup... is
returning/unblocking before pQuerySet is fully populated. The Sleep allows
it to get populated before you try reading it. That's just a guess of
course since I don't have the source for the WSALookup... stuff, but it
makes sense.

The reason it GPFs is because you're not doing parameter checking. You're
probably trying to call swprintf with a source address of NULL.
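
A minimal sketch of that kind of parameter check, assuming the name pointer can come back as NULL. Many C runtimes, Microsoft's included, print the text (null) when a NULL pointer is passed for %s, which would fit the log output described in the question; functions such as wcscmp have no such safety net. CopyDeviceName is a hypothetical helper, not part of Winsock.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Copy a possibly-NULL device name into dest (capacity counted in wchar_t).
// Returns true when a real name was copied (truncated if it does not fit),
// false when the name was missing and dest was set to an empty string.
bool CopyDeviceName(wchar_t* dest, std::size_t capacity, const wchar_t* name) {
    assert(dest != NULL && capacity > 0);
    if (name == NULL) {
        dest[0] = L'\0';
        return false;
    }
    std::wcsncpy(dest, name, capacity - 1);
    dest[capacity - 1] = L'\0'; // wcsncpy does not terminate on truncation
    return true;
}
```

With a check like this in front of the logging and comparison calls, a missing name becomes an ordinary "no name" case instead of a NULL handed to a string function.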


--

Chris Tacke, Embedded MVP
OpenNETCF Consulting
Giving back to the embedded community
http://community.OpenNETCF.com


 
Chris Tacke said:
Based on your observation I'd say there's a race condition. WSALookup... is
returning/unblocking before pQuerySet is fully populated. The Sleep allows
it to get populated before you try reading it. That's just a guess of
course since I don't have the source for the WSALookup... stuff, but it
makes sense.

Yes, that makes some sense, but how do we know when it is safe to use the
values returned? In other words: how do we avoid this race condition?

Chris Tacke said:
The reason it GPFs is because you're not doing parameter checking. You're
probably trying to call swprintf with a source address of NULL.

Aha. No, that's wrong. We only added those printfs after we had observed
the crashes, so they're a reaction to the crashes, meant to find out what
crashed. After including the sleep it doesn't crash anymore, but we're
wondering, since only one other code sample on the internet had a sleep
(a shorter one than ours!).

Is there really no way to catch such exceptions in C/C++?

Greetings

Markus
 
You can wrap with __try/__except, but WinMo doesn't have SEH.

Are you saying the exception is happening outside your code? So the
exception occurs and then you have nothing in the name (that's expected
since nothing after the exception is valid any more). I'm not sure what
line in your code is the one that's executing when things go sideways on
you.


--

Chris Tacke, Embedded MVP
OpenNETCF Consulting
Giving back to the embedded community
http://community.OpenNETCF.com
 
Hello,

the exception happening is a 0xc0000005 which is the general
protection fault. Our application then crashes.

Greetings

Markus
 
Hello,

another thing we observed: these crashes only happen if a device is
scanned while it goes out of range. That is, it does answer the normal
inquiry scan (returning MAC and class of device), but when the MS stack
tries to get the friendly name (via a full connect) the device is
already out of reach, so this fails, a (null) name is returned to us,
and then the exception is raised.

Even if we try to catch it on the managed side it crashes our
application. I'd simply like to get rid of this crash! At this time
nothing more and nothing less.

Greetings

Markus
 
Hello,

further investigation led us to the conclusion that even a check like
pQuerySet->lpszServiceInstanceName == "(null)"
to see whether the name is missing will crash.

Is there any method available to check, before accessing
lpszServiceInstanceName, whether it is "assigned"?
I'm thinking along the lines of Delphi code: if not
assigned(lpszServiceInstanceName) then continue
or something in that direction.

Any clue? Searching the internet doesn't turn up anybody else with
that problem,
so we must be doing something wrong. But what?
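
In C or C++ the closest counterpart to Delphi's "if not assigned(...) then continue" is a plain comparison against NULL. The sketch below shows that pattern on stand-in data: DeviceRecord and FindByName are hypothetical and carry only the two fields discussed here, they are not Winsock types. A NULL test only detects a NULL pointer; it cannot tell whether a non-NULL pointer is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>

// Hypothetical stand-in for one lookup result: only the fields used here.
struct DeviceRecord {
    const wchar_t* name; // NULL when no friendly name was obtained
    std::uint64_t addr;  // Bluetooth device address
};

// Return the address of the first record named `wanted`, or 0 if none matches.
std::uint64_t FindByName(const DeviceRecord* records, std::size_t count,
                         const wchar_t* wanted) {
    assert(records != NULL || count == 0);
    if (wanted == NULL) {
        return 0;
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (records[i].name == NULL) {
            continue; // name not "assigned": skip this device
        }
        if (std::wcscmp(records[i].name, wanted) == 0) {
            return records[i].addr;
        }
    }
    return 0;
}
```

The comparison against the literal "(null)" is not needed at all: that text only appears once a NULL pointer has been formatted, so the pointer itself is what has to be tested.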

Greetings

Markus
 
Sounds like pQuerySet is an invalid pointer. When you dereference it to
check lpszServiceInstanceName, you're dereferencing an invalid pointer,
hence the crash.
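
As a rough illustration of checking before dereferencing: in the posted code pQuerySet points at a local buffer, so the pointers stored inside the result (lpcsaBuffer, lpSockaddr, lpszServiceInstanceName) are candidates as well. The structs below are minimal stand-ins that only mirror the field names from that code, and TryGetAddress is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal stand-ins mirroring only the field names used in the posted code.
struct SockaddrBth { std::uint64_t btAddr; };
struct SocketAddr  { SockaddrBth* lpSockaddr; };
struct CsAddrInfo  { SocketAddr RemoteAddr; };
struct QuerySet {
    const wchar_t* lpszServiceInstanceName;
    CsAddrInfo* lpcsaBuffer;
};

// Extract the device address, checking every pointer on the way.
// Returns false and leaves *out untouched if any link in the chain is NULL.
bool TryGetAddress(const QuerySet* qs, std::uint64_t* out) {
    assert(out != NULL);
    if (qs == NULL) return false;
    if (qs->lpcsaBuffer == NULL) return false;
    if (qs->lpcsaBuffer->RemoteAddr.lpSockaddr == NULL) return false;
    *out = qs->lpcsaBuffer->RemoteAddr.lpSockaddr->btAddr;
    return true;
}
```

Checks like these catch NULL pointers only; a pointer to memory that is no longer valid would still get through.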

Paul T.
 
Hello,

sounds logical, and IsBadPtr seems to cure it. It only raises the
question: why does the MS BT stack give us such invalid pointers? If a
device can't answer a complete connect (needed to obtain the friendly
name), it should simply return an empty string or so, but not something
which leads to a crash if dereferenced.

Greetings

Markus
 
Hello,

some further insight:
- my developer claims that this crash only happens when being called
from the managed side
- and the documentation on MSDN seems to be incomplete and sometimes
buggy; is there any chance to get this corrected?

Greetings

Markus
 
Hmmm. Maybe true, but doubt that it means anything.

If the documentation is wrong, use the Send Feedback link on the page that's
wrong and tell them what's wrong.

Paul T.
 
If it's happening only when called from managed code we'd need to see more
on the context of how it's called. The only way that would happen that I
can think of is if you're passing an unpinned pointer from managed to native
and then the GC does a compaction, invalidating that pointer, and then the
native side tries to use it. It doesn't look like your code does that, so
I'm dubious.


--

Chris Tacke, Embedded MVP
OpenNETCF Consulting
Giving back to the embedded community
http://community.OpenNETCF.com
 
Hello,

my programmer yesterday did wrap this on the managed side as recommended
and took care that the GC doesn't interfere with it, but had no success.
But: your solution with IsBadPtr is really great. First tests are
positive; more tests will be done tomorrow. Thanks! If only we had known
that function earlier...

Greetings

Markus
 