View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0002704||Ham Radio Deluxe||Bug||public||2018-05-06 16:22||2018-10-01 18:28|
|Priority||normal||Severity||crash||Reproducibility||have not tried|
|Target Version||Fixed in Version||126.96.36.1998|
|Summary||0002704: DM780: User reports crash with dump file|
|Description||In this ticket: https://support.hamradiodeluxe.com/scp/tickets.php?id=18169|
a user has reported a crash in DM780 with an attached dump file.
|Steps To Reproduce||No repro steps were made available.|
|Tags||No tags attached.|
The minidump provided shows a crash in the thread that handles requests being sent to the Logbook (and the responses received). When such a request is queued, it is recorded in a list that implements the queue. Each element in the list is a pointer to a CBkgPacket object which holds the operation type and includes some parameters for the operation. In the case of a logbook request, one of the parameters is a pointer to an HRDLOG_DATA structure that contains information about the specific request for the Logbook application.
HRDLOG_DATA looks like it was written by someone who didn't have much C++ experience. HRDLOG_DATA is a structure which contains a mixture of C++ objects and plain data types; it has an implicit destructor, no accessors, and instead of using a constructor a macro (?!) is implemented that initializes the fields.
The crash is shown below; we see that it's in the destructor for HRDLOG_DATA.
0:007> kb *** Stack trace for last set context - .thread/.cxr resets it # ChildEBP RetAddr Args to Child 00 0c029b9c 01408fba 2f646c73 01d69a90 01d69a90 Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA+0x9c 01 0c08bad4 0140b00c 2f642d4f 0e9f2f70 01d69a90 Digital_Master!CBackgroundProcessingThread::LogbookCommand+0x7ba [c:\ham radio\digital master\digital master\backgroundprocessingthreadlog.cpp @ 292] 02 0c08fbe8 0140a201 0e9f2f70 2f642d5f 00000000 Digital_Master!CBackgroundProcessingThread::ProcessData+0xdac [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1704] 03 0c08fc24 015a7ee5 2f642aeb 0a46c068 007e5bd0 Digital_Master!CBackgroundProcessingThread::DoWork+0xd1 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1069] 04 0c08fc4c 015a7f76 2f642a2f 00000000 0a46c068 Digital_Master!CThinThread::Run+0x85 [c:\ham radio\digital master\digital master\thinthread.cpp @ 186] 05 0c08fc88 0170ff9a 01d69a90 2f642a67 00000000 Digital_Master!CThinThread::Start+0x46 [c:\ham radio\digital master\digital master\thinthread.cpp @ 236] 06 0c08fcc0 017100c2 00000000 0c08fcd8 76c5343d Digital_Master!_callthreadstartex+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 07 0c08fccc 76c5343d 0a46c068 0c08fd18 77379832 Digital_Master!_threadstartex+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 08 0c08fcd8 77379832 0a46c068 7b463c07 00000000 kernel32!BaseThreadInitThunk+0xe 09 0c08fd18 77379805 01710046 0a46c068 00000000 ntdll!__RtlUserThreadStart+0x70 0a 0c08fd30 00000000 01710046 0a46c068 00000000 ntdll!_RtlUserThreadStart+0x1b
The call to the destructor is from the LogbookCommand() function, which implements the performance of the Logbook Command operation. The work is already done at this point, so the HRDLOG_DATA object is deleted since it is no longer needed.
Code for the destructor call is here:
0:007> u Digital_Master!CBackgroundProcessingThread::LogbookCommand+0x7b3 [c:\ham radio\digital master\digital master\backgroundprocessingthreadlog.cpp @ 292]: 01408fb3 8bcf mov ecx,edi 01408fb5 e8a6f6ffff call Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA (01408660
where we can see the address of the HRDLOG_DATA object is copied from the EDI register to the ECX register, the normal protocol for passing the this pointer to a function. At the time of the crash, we have these registers:
0:007> r Last set context: eax=ffffffff ebx=0e9f2f01 ecx=fffffffc edx=fffffff0 esi=ffffffff edi=003aeae8 eip=014086fc esp=0c029b80 ebp=0c029b9c iopl=0 nv up ei ng nz na pe cy cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010287 Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA+0x9c: 014086fc f00fc101 lock xadd dword ptr [ecx],eax ds:002b:fffffffc=????????
so we can see the HRDLOG_DATA structure in use is at 003aeae8, and has this content:
0:007> dx -r1 ((Digital_Master!CLogbookInterface::HRDLOG_DATA *) 0x003aeae8) ((Digital_Master!CLogbookInterface::HRDLOG_DATA *) 0x003aeae8) : 0x3aeae8 [Type: CLogbookInterface::HRDLOG_DATA *] [+0x000] bCallsignWorked : 0 [Type: int] [+0x004] bConnected : 0 [Type: int] [+0x008] bCountryWorked : 123832288 [Type: int] [+0x00c] bSignal : 6 [Type: int] [+0x010] strBand [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x014] strCallsign [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x018] strCallsignLastQSO [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x01c] strContinent [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x020] strCountry [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x024] strMode [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x028] strValue [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x02c] strClass [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >] [+0x030] astrBands [Type: CStringArray] [+0x044] astrDXCCs [Type: CStringArray] [+0x058] astrFields [Type: CStringArray] [+0x06c] astrModes [Type: CStringArray] [+0x080] astrValues [Type: CStringArray] [+0x094] astrCallsignModes [Type: CStringArray] [+0x0a8] anKeys [Type: CUIntArray] [+0x0bc] hEvent : 0x7610b80 [Type: void *] [+0x0c0] hReplyWnd : 0x0 [Type: HWND__ *] [+0x0c4] lpszData : 0x0 [Type: wchar_t *] [+0x0c8] lpszLog : 0x0 [Type: wchar_t *] [+0x0cc] szAddress : "" [Type: wchar_t ] [+0x2cc] szDatabase : "???" [Type: wchar_t ] [+0x4cc] szField : "--- memory read error at address 0x003aefb4 ---" [Type: wchar_t ] [+0x6cc] szValue : "--- memory read error at address 0x003af1b4 ---" [Type: wchar_t ] [+0x8cc] nCQZone : Unable to read memory at Address 0x3af3b4 [+0x8d0] nDXCC : Unable to read memory at Address 0x3af3b8 [+0x8d4] nITUZone : Unable to read memory at Address 0x3af3bc [+0x8d8] nReplyMsg : Unable to read memory at Address 0x3af3c0 [+0x8dc] wPort : Unable to read memory at Address 0x3af3c4 [+0x8e0] eOption : Unable to read memory at Address 0x3af3c8 [+0x8e4] nChannel : Unable to read memory at Address 0x3af3cc [+0x8e8] nCount : Unable to read memory at Address 0x3af3d0 [+0x8ec] nReturn : Unable to read memory at Address 0x3af3d4
The state of the object is curious; 12383288 doesn't seem like a particularly good value for a BOOL. This structure is very large; about 2288 bytes, as there are four TCHAR arrays of 256 bytes each. The data of the full structure is not in the minidump.
The crash is at a point in the dectructor where the CString objects in the HRDLOG_DATA object are being released. Becaues the dump doesn't include enough memory to show the state of those objects, we don't get too much insight into the state of those objects, making it hard to guess what might have gone wrong.
||The minidump is in the team's Google Drive at the Dumps folder with the name Mantis0002704.zip|
I spent the afternoon looking into 2704, which reports a crash in DM780. I’m not able to reproduce this crash, but I’m told by support it happens often. The issue has also been reported in a dump that was added to 2732, with the explanation that Logbook ends up hanging when it’s connected to DM780, and DM780 experiences the crash documented here.
The crash shows a problem surrounding the HRDLOG_DATA structure. An object of this structure appears to be freed twice, resulting in the crash we see in this bug (and in that one report in 2732). The memory in the structure being freed doesn’t seem to match the structure; we should see the partially deconstructed members even at the time of the crash, but we don’t.
As such, a double free of the structure might not be the right diagnosis. But it does seem mostly correct and worth pursuing.
In studying the code, I notice a problem in the management of the work items.
If a request is sent synchronously, it is posted to the queue with a handle to an event object in the HRDLOG_DATA structure. The caller resets that event, the working sets it when it is done working, and the caller waits. Once the caller has the signal, it knows the work is done and it can take ownership of the HRDLOG_DATA and its results.
The problem is that the event is shared; one event is created and used for all work items. If two work items are concurrently processing, the event is shared between them, and the state of the work becomes ambiguous. I have changed the code to create one event per synchronous HRDLOG_DATA invocation.
Since the event in question is not auto-reset, any number of waiters may be released when only one waiter is finished. An HRDLOG_DATA instance that isn’t ready might be in any state; uninitialized, not initialized enough to indicate that it shouldn’t be deleted, and so on.
This speculative path explains the crashes, and also explains the observation that the faulting object isn’t partially destructed; it might have never been completely constructed, in fact.
Without the ability to reproduce this issue directly, I’m obliged to make a fix only on static analysis, which is difficult. But it’s impossible to prove that the fix prescribed by the analysis actually addresses the user’s concern, since the fix cannot be validated.
This checkin was made:
||This fix shipped in 888|
|2018-05-06 16:22||K7ZCZ||New Issue|
|2018-05-06 16:23||K7ZCZ||Note Added: 0004967|
|2018-05-06 16:30||K7ZCZ||OS||Windows 10 Professional x64 =>|
|2018-05-06 16:30||K7ZCZ||OS Version||16299 =>|
|2018-05-06 16:30||K7ZCZ||Platform||Intel i7-5960X =>|
|2018-05-06 16:30||K7ZCZ||Product Version||=> 188.8.131.526|
|2018-05-06 17:22||K7ZCZ||Note Added: 0004969|
|2018-07-10 18:43||K7ZCZ||Relationship added||related to 0002732|
|2018-07-12 23:04||K7ZCZ||Note Added: 0005669|
|2018-07-12 23:09||K7ZCZ||Note Added: 0005670|
|2018-07-13 13:10||K7ZCZ||Assigned To||=> K7ZCZ|
|2018-07-13 13:10||K7ZCZ||Status||new => assigned|
|2018-09-18 09:31||K7ZCZ||Relationship added||related to 0002901|
|2018-09-30 17:49||K7ZCZ||Status||assigned => resolved|
|2018-09-30 17:49||K7ZCZ||Resolution||open => fixed|
|2018-09-30 17:49||K7ZCZ||Fixed in Version||=> 184.108.40.2068|
|2018-09-30 17:49||K7ZCZ||Note Added: 0006256|
|2018-10-01 18:28||WA9PIE||Status||resolved => closed|
|2018-10-01 18:28||WA9PIE||Testing||Not Started => Beta Successful|
|2018-10-01 18:28||WA9PIE||Project||3 - Current Dev List => Ham Radio Deluxe|