View Issue Details

IDProjectCategoryView StatusLast Update
0002901Ham Radio DeluxeBugpublic2018-11-11 00:34
ReporterK7ZCZ 
Assigned ToK7ZCZ 
PrioritynormalSeveritycrashReproducibilityhave not tried
Status closedResolutionfixed 
Product Version6.4.0.886 
Target VersionFixed in Version6.4.0.902 
Summary0002901: Digital Master: crash when freeing HRDLOG_DATA structure
DescriptionTicket 508543 contains several minidumps; DigitalMaster_20180917_161340.mdmp shows a crash when one of the background processing threads
tries to free a HRDLOG_DATA structure.

This crash is similar to, but not precisely the same as, Mantis 2704.

This issue is architecturally the same as Mantis 2813; different data structure, and in the Logbook not Digital Master. But the same design pattern has been poorly implemented in both issues.
Steps To ReproduceNo repro steps were provided.
Additional Information
0:007> .ecxr
eax=ffffffff ebx=156efc40 ecx=fffffffc edx=fffffff0 esi=ffffffff edi=0024e888
eip=00c4824c esp=0cf9b654 ebp=0cf9b670 iopl=0         nv up ei ng nz na pe cy
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010287
Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA+0x9c:
00c4824c f00fc101        lock xadd dword ptr [ecx],eax ds:002b:fffffffc=????????
0:007> kb
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  Args to Child              
00 0cf9b670 00c48a69 8054432b 015a59d0 015a59d0 Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA+0x9c
01 0cf9bb1c 00c4aa93 80540407 156efc40 015a59d0 Digital_Master!CBackgroundProcessingThread::LogbookCommand+0x719 [c:\ham radio\digital master\digital master\backgroundprocessingthreadlog.cpp @ 297] 
02 0cf9fc30 00c49d01 156efc40 80540477 0c357060 Digital_Master!CBackgroundProcessingThread::ProcessData+0xd33 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1700] 
03 0cf9fc68 00de4635 805404a7 00000470 00390830 Digital_Master!CBackgroundProcessingThread::DoWork+0xd1 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1069] 
04 0cf9fc90 00de46c6 805404fb 0c357060 00000470 Digital_Master!CThinThread::Run+0x85 [c:\ham radio\digital master\digital master\thinthread.cpp @ 182] 
05 0cf9fccc 00e23dc2 015a59d0 80540543 00000000 Digital_Master!CThinThread::Start+0x46 [c:\ham radio\digital master\digital master\thinthread.cpp @ 234] 
06 0cf9fd74 00f4d42f 0024d4c4 8054059b 00000000 Digital_Master!_AfxThreadEntry+0xff [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\thrdcore.cpp @ 109] 
07 0cf9fdac 00f4d557 00000000 0cf9fdc4 76fc343d Digital_Master!_callthreadstartex+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
08 0cf9fdb8 76fc343d 0c365088 0cf9fe04 77e19802 Digital_Master!_threadstartex+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
09 0cf9fdc4 77e19802 0c365088 7fbd26e6 00000000 kernel32!BaseThreadInitThunk+0xe
0a 0cf9fe04 77e197d5 00f4d4db 0c365088 00000000 ntdll!__RtlUserThreadStart+0x70
0b 0cf9fe1c 00000000 00f4d4db 0c365088 00000000 ntdll!_RtlUserThreadStart+0x1b
0:007> kp
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  
00 0cf9b670 00c48a69 Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA(void)+0x9c
01 0cf9bb1c 00c4aa93 Digital_Master!CBackgroundProcessingThread::LogbookCommand(struct CLogbookInterface::HRDLOG_DATA * pData = 0xfffffffc)+0x719 [c:\ham radio\digital master\digital master\backgroundprocessingthreadlog.cpp @ 297] 
02 0cf9fc30 00c49d01 Digital_Master!CBackgroundProcessingThread::ProcessData(class CBkgPacket * pPkt = 0x156efc40)+0xd33 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1700] 
03 0cf9fc68 00de4635 Digital_Master!CBackgroundProcessingThread::DoWork(void)+0xd1 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1069] 
04 0cf9fc90 00de46c6 Digital_Master!CThinThread::Run(void)+0x85 [c:\ham radio\digital master\digital master\thinthread.cpp @ 182] 
05 0cf9fccc 00e23dc2 Digital_Master!CThinThread::Start(void * pv = 0x015a59d0)+0x46 [c:\ham radio\digital master\digital master\thinthread.cpp @ 234] 
06 0cf9fd74 00f4d42f Digital_Master!_AfxThreadEntry(void * pParam = 0x0024d4c4)+0xff [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\thrdcore.cpp @ 109] 
07 0cf9fdac 00f4d557 Digital_Master!_callthreadstartex(void)+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
08 0cf9fdb8 76fc343d Digital_Master!_threadstartex(void * ptd = <Value unavailable error>)+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
09 0cf9fdc4 77e19802 kernel32!BaseThreadInitThunk+0xe
0a 0cf9fe04 77e197d5 ntdll!__RtlUserThreadStart+0x70
0b 0cf9fe1c 00000000 ntdll!_RtlUserThreadStart+0x1b

TagsNo tags attached.
ModuleDM780
Sub-ModuleInterfacing
Testing Not Tested

Relationships

related to 0002845 closedWA9PIE V6.4.0.876 - Random crashes\Lockups\closing of LB and DM where both are affected at the same time. 
related to 0002704 closedK7ZCZ DM780: User reports crash with dump file 

Activities

K7ZCZ

2018-09-18 09:31

manager   ~0006221

The dump for bug is in the archive "OSTicket #508543 DigitalMaster_20180917_161340.zip" in the Mantis2845 directory of the Google Drive dumps folder.

PD9FER

2018-09-18 09:56

viewer   ~0006222

Mike....

Could you pse write a registry entry to enable WER ? (as instructed in 0002845)
We then can merge it to the customers Registry... (Want to be sure to do it right)

K7ZCZ

2018-09-18 22:19

manager   ~0006223

The pData pointer as a paramter to LogbookCommand() is not directly available. We can find the pPkt pointer to ProcessData(), though:


0:007> kp
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  
00 0cf9b670 00c48a69 Digital_Master!CLogbookInterface::HRDLOG_DATA::~HRDLOG_DATA(void)+0x9c
01 0cf9bb1c 00c4aa93 Digital_Master!CBackgroundProcessingThread::LogbookCommand(struct CLogbookInterface::HRDLOG_DATA * pData = 0xfffffffc)+0x719 [c:\ham radio\digital master\digital master\backgroundprocessingthreadlog.cpp @ 297] 
02 0cf9fc30 00c49d01 Digital_Master!CBackgroundProcessingThread::ProcessData(class CBkgPacket * pPkt = 0x156efc40)+0xd33 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1700] 
03 0cf9fc68 00de4635 Digital_Master!CBackgroundProcessingThread::DoWork(void)+0xd1 [c:\ham radio\digital master\digital master\backgroundprocessingthreadpacket.cpp @ 1069] 
04 0cf9fc90 00de46c6 Digital_Master!CThinThread::Run(void)+0x85 [c:\ham radio\digital master\digital master\thinthread.cpp @ 182] 
05 0cf9fccc 00e23dc2 Digital_Master!CThinThread::Start(void * pv = 0x015a59d0)+0x46 [c:\ham radio\digital master\digital master\thinthread.cpp @ 234] 
06 0cf9fd74 00f4d42f Digital_Master!_AfxThreadEntry(void * pParam = 0x0024d4c4)+0xff [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\thrdcore.cpp @ 109] 
07 0cf9fdac 00f4d557 Digital_Master!_callthreadstartex(void)+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
08 0cf9fdb8 76fc343d Digital_Master!_threadstartex(void * ptd = <Value unavailable error>)+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
09 0cf9fdc4 77e19802 kernel32!BaseThreadInitThunk+0xe
0a 0cf9fe04 77e197d5 ntdll!__RtlUserThreadStart+0x70
0b 0cf9fe1c 00000000 ntdll!_RtlUserThreadStart+0x1b


and we know that structure tells us the command that was being procsesed and a pointer to the pData that should be used:

0:007> dx -r1 ((Digital_Master!CBkgPacket *)0x156efc40)
((Digital_Master!CBkgPacket *)0x156efc40)                 : 0x156efc40 [Type: CBkgPacket *]
    [+0x000] m_bUsed          : true [Type: bool]
    [+0x004] m_hWnd           : 0x0 [Type: HWND__ *]
    [+0x008] m_nCommand       : 0x41 [Type: unsigned int]
    [+0x00c] m_nValue1        : 0x0 [Type: unsigned int]
    [+0x010] m_nValue2        : 0x0 [Type: unsigned int]
    [+0x014] m_pData          : 0x24e888 [Type: void *]
    [+0x018] m_szComment      : "" [Type: wchar_t [256]]
    [+0x218] m_szFilename     : "" [Type: wchar_t [260]]
    [+0x420] m_szSoundcard    : "--- memory read error at address 0x156f0060 ---" [Type: wchar_t [128]]
    [+0x520] m_dRawData       [Type: double [2048]]


The structure looks like trash:

0:007> dx -r1 ((Digital_Master!CLogbookInterface::HRDLOG_DATA *)0x24e888)
((Digital_Master!CLogbookInterface::HRDLOG_DATA *)0x24e888)                 : 0x24e888 [Type: CLogbookInterface::HRDLOG_DATA *]
    [+0x000] bCallsignWorked  : 2011168562 [Type: int]
    [+0x004] bConnected       : 2011445351 [Type: int]
    [+0x008] bCountryWorked   : 2419024 [Type: int]
    [+0x00c] bSignal          : 0 [Type: int]
    [+0x010] strBand          [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x014] strCallsign      : "" [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x018] strCallsignLastQSO [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x01c] strContinent     [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x020] strCountry       [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x024] strMode          [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x028] strValue         [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x02c] strClass         [Type: ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >]
    [+0x030] astrBands        [Type: CStringArray]
    [+0x044] astrDXCCs        [Type: CStringArray]
    [+0x058] astrFields       [Type: CStringArray]
    [+0x06c] astrModes        [Type: CStringArray]
    [+0x080] astrValues       [Type: CStringArray]
    [+0x094] astrCallsignModes [Type: CStringArray]
    [+0x0a8] anKeys           [Type: CUIntArray]
    [+0x0bc] hEvent           : 0x24eb68 [Type: void *]
    [+0x0c0] hReplyWnd        : 0x22c000 [Type: HWND__ *]
    [+0x0c4] lpszData         : 0x250000 : "捁硴 " [Type: wchar_t *]
    [+0x0c8] lpszLog          : 0x10007 : "???" [Type: wchar_t *]
    [+0x0cc] szAddress        : "軀ట???" [Type: wchar_t [256]]
    [+0x2cc] szDatabase       : "㽟眏???" [Type: wchar_t [256]]
    [+0x4cc] szField          : "???" [Type: wchar_t [256]]
    [+0x6cc] szValue          : "" [Type: wchar_t [256]]
    [+0x8cc] nCQZone          : 0x2b1 [Type: unsigned int]
    [+0x8d0] nDXCC            : 0x0 [Type: unsigned int]
    [+0x8d4] nITUZone         : 0xffffffff [Type: unsigned int]
    [+0x8d8] nReplyMsg        : 0xc2d0 [Type: unsigned int]
    [+0x8dc] wPort            : 0x4b20 [Type: unsigned short]
    [+0x8e0] eOption          : 49665 [Type: CLogbookInterface::eOptions]
    [+0x8e4] nChannel         : 2421160 [Type: int]
    [+0x8e8] nCount           : 462338 [Type: int]
    [+0x8ec] nReturn          : 65535 [Type: int]



We can se that the command is 0x41, which is 65, which is REQUEST_LOGBOOK_COMMAND, which is added to the queue by CBackgroundProcessingThread::RequestLogbookCommand(). This "command" isn't too descriptive; the structure at pData actually contains the ral command to be executed by the logbook at the other side of the interface. Becaue we don't have a valid copy of that structure, we can't guess which one is causing the problem.

The CLogbookInterface::SendCommand() takes a parameter that makes it wait (or not) for the worker thread to finish. If the caller is not waiting ("async"), then the pData structure is copied because it's assumed the thread will delete it. If the caller is waiting ("sync"), the worker thread will _not_ delete the pData structure.

When waiting (for the "sync" mode), the wait can time out after one minute. A minute seems like a long time, but it's conceivable that customers have long-waiting work on slow machines; or that hangs or delays end up taking more than 60 seconds. After a timeout, the waiting code just gives up and returns, which means the caller will destroy the stack-local copy of the data it had. The data is still readable, but changed -- wihch makes the worker thread that the flag indiciating synchronous work is set and will end up deleting memory that's on the caller's stack -- which causes the crash.

Sure enough the 0x24e888 pointer above is inside the main thread's call stack, but that stack frame is long gone. (The main thread's call stack is 57 frames long in the dump, and not pasted here for brevity ... but these are the first and last few frames shown:

00 0024d8fc 775f1f89 000310ba 0000000e 00000000 user32!NtUserMessageCall+0x15
01 0024d93c 775e96c5 028e8780 00000000 731a99e8 user32!SendMessageWorker+0x5e9
02 0024d960 732381b9 000310ba 0000000e 00000000 user32!SendMessageW+0x7f
03 0024d9dc 775e62fa 000310b8 0000000e 00000000 comctl32!ComboBox_WndProc+0x6e9
:
:
:
55 0024f844 77e19802 7efde000 73602066 00000000 kernel32!BaseThreadInitThunk+0xe
56 0024f884 77e197d5 00f47909 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70
57 0024f89c 00000000 00f47909 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b


I think the correct fix is to make the wait in SendCommand() infinite; so that a slow work item never times out and the stack is never corrupted.

This pattern (and a similar fix) plauge the Logbook's background thread processing. See Mantis issue 2704.

K7ZCZ

2018-09-18 22:19

manager   ~0006224

I'm sorry, Ferry, but I don't understand what you're asking for, or how it relates to this issue.

K7ZCZ

2018-09-18 22:24

manager   ~0006225

This shelf removes the timeouts
https://hrdsoftware.visualstudio.com/HRD/_versionControl/shelveset?ss=SendCommand%20timeout%20fix%20%28Mantis%202901%29%3Bmikeblas%40msn.com

K7ZCZ

2018-10-20 13:18

manager   ~0006312

Fixed with this checkin:
https://hrdsoftware.visualstudio.com/HRD/_versionControl/changeset/4382

g3ucq

2018-10-21 04:31

viewer   ~0006321

Unable to test. Not a problem for me.

WA9PIE

2018-11-05 11:48

administrator   ~0006377

Accepted without testing, due to the nature of the change.

Issue History

Date Modified Username Field Change
2018-09-18 09:24 K7ZCZ New Issue
2018-09-18 09:25 K7ZCZ Relationship added related to 0002845
2018-09-18 09:31 K7ZCZ Note Added: 0006221
2018-09-18 09:31 K7ZCZ Relationship added related to 0002704
2018-09-18 09:56 PD9FER Note Added: 0006222
2018-09-18 22:19 K7ZCZ Note Added: 0006223
2018-09-18 22:19 K7ZCZ Note Added: 0006224
2018-09-18 22:24 K7ZCZ Note Added: 0006225
2018-10-20 13:18 K7ZCZ Assigned To => K7ZCZ
2018-10-20 13:18 K7ZCZ Status new => resolved
2018-10-20 13:18 K7ZCZ Resolution open => fixed
2018-10-20 13:18 K7ZCZ Note Added: 0006312
2018-10-20 22:19 K7ZCZ Fixed in Version => 6.4.0.899
2018-10-21 04:31 g3ucq Note Added: 0006321
2018-11-04 07:49 K7ZCZ Fixed in Version 6.4.0.899 => 6.4.0.900
2018-11-05 11:48 WA9PIE Status resolved => closed
2018-11-05 11:48 WA9PIE Testing Not Started => Not Tested
2018-11-05 11:48 WA9PIE Note Added: 0006377
2018-11-11 00:34 WA9PIE Fixed in Version 6.4.0.900 => 6.4.0.902
2018-11-11 00:34 WA9PIE Project 3 - Current Dev List => Ham Radio Deluxe