View Issue Details

IDProjectCategoryView StatusLast Update
0002704Ham Radio DeluxeBugpublic2018-10-01 18:28
ReporterK7ZCZ 
Assigned ToK7ZCZ 
PrioritynormalSeveritycrashReproducibilityhave not tried
Status closedResolutionfixed 
Product Version6.4.0.806 
Target VersionFixed in Version6.4.0.888 
Summary0002704: DM780: User reports crash with dump file
DescriptionIn this ticket: https://support.hamradiodeluxe.com/scp/tickets.php?id=18169

a user has reported a crash in DM780 with an attached dump file.
Steps To Reproduce No repro steps were made available.
TagsNo tags attached.
ModuleDM780
Sub-ModuleCall lookup
Testing Beta Successful

Relationships

related to 0002901 closedK7ZCZ Ham Radio Deluxe Digital Master: crash when freeing HRDLOG_DATA structure 
related to 0002732 closedPD9FER 4 - Closed w/o Action Logbook crashing when idle 

Activities

K7ZCZ

2018-05-06 16:23

manager   ~0004967

The minidump provided shows a crash in the thread that handles requests being sent to the Logbook (and the responses received). When such a request is queued, it is recorded in a list that implements the queue. Each element in the list is a pointer to a CBkgPacket object which holds the operation type and includes some parameters for the operation. In the case of a logbook request, one of the parameters is a pointer to an HRDLOG_DATA structure that contains information about the specific request for the Logbook application.

HRDLOG_DATA looks like it was written by someone who didn't have much C++ experience. HRDLOG_DATA is a structure which contains a mixture of C++ objects and plain data types; it has an implicit destructor, no accessors, and instead of using a constructor a macro (?!) is implemented that initializes the fields.

The crash is shown below; we see that it's in the destructor for HRDLOG_DATA.

0:007> kb
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  Args to Child              
00 0c029b9c 01408fba 2f646c73 01d69a90 01d69a90 Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA+0x9c
01 0c08bad4 0140b00c 2f642d4f 0e9f2f70 01d69a90 Digital_Master!CBackgroundProcessingThread::LogbookCommand+0x7ba [c:\ham radio\digital master\digital master\backgroundprocessingthreadlog.cpp @ 292] 
02 0c08fbe8 0140a201 0e9f2f70 2f642d5f 00000000 Digital_Master!CBackgroundProcessingThread::ProcessData+0xdac [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1704] 
03 0c08fc24 015a7ee5 2f642aeb 0a46c068 007e5bd0 Digital_Master!CBackgroundProcessingThread::DoWork+0xd1 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1069] 
04 0c08fc4c 015a7f76 2f642a2f 00000000 0a46c068 Digital_Master!CThinThread::Run+0x85 [c:\ham radio\digital master\digital master\thinthread.cpp @ 186] 
05 0c08fc88 0170ff9a 01d69a90 2f642a67 00000000 Digital_Master!CThinThread::Start+0x46 [c:\ham radio\digital master\digital master\thinthread.cpp @ 236] 
06 0c08fcc0 017100c2 00000000 0c08fcd8 76c5343d Digital_Master!_callthreadstartex+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
07 0c08fccc 76c5343d 0a46c068 0c08fd18 77379832 Digital_Master!_threadstartex+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
08 0c08fcd8 77379832 0a46c068 7b463c07 00000000 kernel32!BaseThreadInitThunk+0xe
09 0c08fd18 77379805 01710046 0a46c068 00000000 ntdll!__RtlUserThreadStart+0x70
0a 0c08fd30 00000000 01710046 0a46c068 00000000 ntdll!_RtlUserThreadStart+0x1b


The call to the destructor is from the LogbookCommand() function, which implements the performance of the Logbook Command operation. The work is already done at this point, so the HRDLOG_DATA object is deleted since it is no longer needed.

Code for the destructor call is here:


0:007> u
Digital_Master!CBackgroundProcessingThread::LogbookCommand+0x7b3 [c:\ham radio\digital master\digital master\backgroundprocessingthreadlog.cpp @ 292]:
01408fb3 8bcf            mov     ecx,edi
01408fb5 e8a6f6ffff      call    Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA (01408660


where we can see the address of the HRDLOG_DATA object is copied from the EDI register to the ECX register, the normal protocol for passing the this pointer to a function. At the time of the crash, we have these registers:


0:007> r
Last set context:
eax=ffffffff ebx=0e9f2f01 ecx=fffffffc edx=fffffff0 esi=ffffffff edi=003aeae8
eip=014086fc esp=0c029b80 ebp=0c029b9c iopl=0         nv up ei ng nz na pe cy
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010287
Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA+0x9c:
014086fc f00fc101        lock xadd dword ptr [ecx],eax ds:002b:fffffffc=????????


so we can see the HRDLOG_DATA structure in use is at 003aeae8, and has this content:

0:007> dx -r1 ((Digital_Master!CLogbookInterface::HRDLOG_DATA *) 0x003aeae8)
((Digital_Master!CLogbookInterface::HRDLOG_DATA *) 0x003aeae8)                 : 0x3aeae8 [Type: CLogbookInterface::HRDLOG_DATA *]
    [+0x000] bCallsignWorked  : 0 [Type: int]
    [+0x004] bConnected       : 0 [Type: int]
    [+0x008] bCountryWorked   : 123832288 [Type: int]
    [+0x00c] bSignal          : 6 [Type: int]
    [+0x010] strBand          [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x014] strCallsign      [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x018] strCallsignLastQSO [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x01c] strContinent     [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x020] strCountry       [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x024] strMode          [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x028] strValue         [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x02c] strClass         [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x030] astrBands        [Type: CStringArray]
    [+0x044] astrDXCCs        [Type: CStringArray]
    [+0x058] astrFields       [Type: CStringArray]
    [+0x06c] astrModes        [Type: CStringArray]
    [+0x080] astrValues       [Type: CStringArray]
    [+0x094] astrCallsignModes [Type: CStringArray]
    [+0x0a8] anKeys           [Type: CUIntArray]
    [+0x0bc] hEvent           : 0x7610b80 [Type: void *]
    [+0x0c0] hReplyWnd        : 0x0 [Type: HWND__ *]
    [+0x0c4] lpszData         : 0x0 [Type: wchar_t *]
    [+0x0c8] lpszLog          : 0x0 [Type: wchar_t *]
    [+0x0cc] szAddress        : "" [Type: wchar_t [256]]
    [+0x2cc] szDatabase       : "???" [Type: wchar_t [256]]
    [+0x4cc] szField          : "--- memory read error at address 0x003aefb4 ---" [Type: wchar_t [256]]
    [+0x6cc] szValue          : "--- memory read error at address 0x003af1b4 ---" [Type: wchar_t [256]]
    [+0x8cc] nCQZone          : Unable to read memory at Address 0x3af3b4
    [+0x8d0] nDXCC            : Unable to read memory at Address 0x3af3b8
    [+0x8d4] nITUZone         : Unable to read memory at Address 0x3af3bc
    [+0x8d8] nReplyMsg        : Unable to read memory at Address 0x3af3c0
    [+0x8dc] wPort            : Unable to read memory at Address 0x3af3c4
    [+0x8e0] eOption          : Unable to read memory at Address 0x3af3c8
    [+0x8e4] nChannel         : Unable to read memory at Address 0x3af3cc
    [+0x8e8] nCount           : Unable to read memory at Address 0x3af3d0
    [+0x8ec] nReturn          : Unable to read memory at Address 0x3af3d4


    
The state of the object is curious; 12383288 doesn't seem like a particularly good value for a BOOL. This structure is very large; about 2288 bytes, as there are four TCHAR arrays of 256 bytes each. The data of the full structure is not in the minidump.

The crash is at a point in the dectructor where the CString objects in the HRDLOG_DATA object are being released. Becaues the dump doesn't include enough memory to show the state of those objects, we don't get too much insight into the state of those objects, making it hard to guess what might have gone wrong.

K7ZCZ

2018-05-06 17:22

manager   ~0004969

The minidump is in the team's Google Drive at the Dumps folder with the name Mantis0002704.zip

K7ZCZ

2018-07-12 23:04

manager   ~0005669

I spent the afternoon looking into 2704, which reports a crash in DM780. I’m not able to reproduce this crash, but I’m told by support it happens often. The issue has also been reported in a dump that was added to 2732, with the explanation that Logbook ends up hanging when it’s connected to DM780, and DM780 experiences the crash documented here.

The crash shows a problem surrounding the HRDLOG_DATA structure. An object of this structure appears to be freed twice, resulting in the crash we see in this bug (and in that one report in 2732). The memory in the structure being freed doesn’t seem to match the structure; we should see the partially deconstructed members even at the time of the crash, but we don’t.

As such, a double free of the structure might not be the right diagnosis. But it does seem mostly correct and worth pursuing.

In studying the code, I notice a problem in the management of the work items.

If a request is sent synchronously, it is posted to the queue with a handle to an event object in the HRDLOG_DATA structure. The caller resets that event, the working sets it when it is done working, and the caller waits. Once the caller has the signal, it knows the work is done and it can take ownership of the HRDLOG_DATA and its results.

The problem is that the event is shared; one event is created and used for all work items. If two work items are concurrently processing, the event is shared between them, and the state of the work becomes ambiguous. I have changed the code to create one event per synchronous HRDLOG_DATA invocation.

Since the event in question is not auto-reset, any number of waiters may be released when only one waiter is finished. An HRDLOG_DATA instance that isn’t ready might be in any state; uninitialized, not initialized enough to indicate that it shouldn’t be deleted, and so on.

This speculative path explains the crashes, and also explains the observation that the faulting object isn’t partially destructed; it might have never been completely constructed, in fact.
Without the ability to reproduce this issue directly, I’m obliged to make a fix only on static analysis, which is difficult. But it’s impossible to prove that the fix prescribed by the analysis actually addresses the user’s concern, since the fix cannot be validated.

K7ZCZ

2018-07-12 23:09

manager   ~0005670

This checkin was made:
https://hrdsoftware.visualstudio.com/HRD/_versionControl/changeset/4228

K7ZCZ

2018-09-30 17:49

manager   ~0006256

This fix shipped in 888

Issue History

Date Modified Username Field Change
2018-05-06 16:22 K7ZCZ New Issue
2018-05-06 16:23 K7ZCZ Note Added: 0004967
2018-05-06 16:30 K7ZCZ OS Windows 10 Professional x64 =>
2018-05-06 16:30 K7ZCZ OS Version 16299 =>
2018-05-06 16:30 K7ZCZ Platform Intel i7-5960X =>
2018-05-06 16:30 K7ZCZ Product Version => 6.4.0.806
2018-05-06 17:22 K7ZCZ Note Added: 0004969
2018-07-10 18:43 K7ZCZ Relationship added related to 0002732
2018-07-12 23:04 K7ZCZ Note Added: 0005669
2018-07-12 23:09 K7ZCZ Note Added: 0005670
2018-07-13 13:10 K7ZCZ Assigned To => K7ZCZ
2018-07-13 13:10 K7ZCZ Status new => assigned
2018-09-18 09:31 K7ZCZ Relationship added related to 0002901
2018-09-30 17:49 K7ZCZ Status assigned => resolved
2018-09-30 17:49 K7ZCZ Resolution open => fixed
2018-09-30 17:49 K7ZCZ Fixed in Version => 6.4.0.888
2018-09-30 17:49 K7ZCZ Note Added: 0006256
2018-10-01 18:28 WA9PIE Status resolved => closed
2018-10-01 18:28 WA9PIE Testing Not Started => Beta Successful
2018-10-01 18:28 WA9PIE Project 3 - Current Dev List => Ham Radio Deluxe