View Revisions: Issue #2891

Summary 0002891: DM780: crash at shutdown due to poor concurrency control around CLogfile object
Revision 2018-09-28 07:48 by K7ZCZ
Additional Information The thread with the crash has this stack:

0:007> kp
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 0706fc18 00a8432c 0x46746547
01 0706fd30 00a840ae Digital_Master!CLogfile::WriteToFile(class ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > > * strText = 0x0706fd48 "HRD Logbook: Closing thread")+0xcc [c:\ham radio\digital master\digital master\logfile.cpp @ 164] 
02 0706fd6c 00a3544f Digital_Master!CLogfile::Add(wchar_t * lpszFormat = 0x01029a58 "%s: Closing thread")+0xae [c:\ham radio\digital master\digital master\logfile.cpp @ 99] 
03 0706fd90 00bd4642 Digital_Master!CBackgroundProcessingThread::EndWork(void)+0x1f [c:\ham radio\digital master\digital master\backgroundprocessingthread.cpp @ 418] 
04 0706fdb8 00bd46c6 Digital_Master!CThinThread::Run(void)+0x92 [c:\ham radio\digital master\digital master\thinthread.cpp @ 200] 
05 0706fdf4 00c13dc2 Digital_Master!CThinThread::Start(void * pv = 0x013959d0)+0x46 [c:\ham radio\digital master\digital master\thinthread.cpp @ 234] 
06 0706fe9c 00d3d42f Digital_Master!_AfxThreadEntry(void * pParam = 0x0293d898)+0xff [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\thrdcore.cpp @ 109] 
07 0706fed4 00d3d557 Digital_Master!_callthreadstartex(void)+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
08 0706fee0 76a08674 Digital_Master!_threadstartex(void * ptd = <Value unavailable error>)+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
09 0706fef4 77d95d87 kernel32!BaseThreadInitThunk+0x24
0a 0706ff3c 77d95d57 ntdll!__RtlUserThreadStart+0x2f
0b 0706ff4c 00000000 ntdll!_RtlUserThreadStart+0x1b


We can see that it's trying to shut down; the EndWork() function wants to log a message that the thread is closing.

Thing is, we can see that the main thread is already exited and the application object (which owns the CLogfile object) is already partially destroyed:

   0  Id: 82c.1484 Suspend: 0 Teb: 027f6000 Unfrozen
 # ChildEBP RetAddr  Args to Child              
00 0293fa48 76f12499 000003e4 00000000 0293fa90 ntdll!NtWaitForSingleObject+0xc
01 0293fabc 76f123f2 000003e4 00001388 00000000 KERNELBASE!WaitForSingleObjectEx+0x99
02 0293fad0 00c1f540 000003e4 00001388 0293faf0 KERNELBASE!WaitForSingleObject+0x12
03 0293fae0 00c1f70b 00001388 013959d0 0293fb1c Digital_Master!CSyncObject::Lock+0xf [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\mtcore.cpp @ 42] 
04 0293faf0 00a353d5 00001388 fce573e3 0000040e Digital_Master!CSingleLock::Lock+0x10 [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\mtex.cpp @ 108] 
05 0293fb1c 00a34def fce573b7 01591804 0000040e Digital_Master!CBackgroundProcessingThread::CloseDown+0x65 [c:\ham radio\digital master\digital master\backgroundprocessingthread.cpp @ 400] 
06 0293fb48 00a4a7c1 fce5738b 0ce5d248 00f73b70 Digital_Master!CBackgroundProcessingThread::~CBackgroundProcessingThread+0x3f [c:\ham radio\digital master\digital master\backgroundprocessingthread.cpp @ 269] 
07 0293fb74 00d415d2 fce57347 00d37909 00000000 Digital_Master!CDigitalMasterApp::~CDigitalMasterApp+0x2b1 [c:\ham radio\digital master\digital master\digital master.cpp @ 200] 
08 0293fbb8 00d41679 00000000 00000000 00000000 Digital_Master!doexit+0x97 [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 628] 
09 0293fbcc 00d3789e 00000000 fce574f3 00d37909 Digital_Master!exit+0xf [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 417] 
0a 0293fc0c 76a08674 027f3000 76a08650 8b43d7b4 Digital_Master!__tmainCRTStartup+0x10c [f:\dd\vctools\crt\crtw32\startup\crt0.c @ 264] 
0b 0293fc20 77d95d87 027f3000 8a0e8c03 00000000 kernel32!BaseThreadInitThunk+0x24
0c 0293fc68 77d95d57 ffffffff 77dbad0f 00000000 ntdll!__RtlUserThreadStart+0x2f
0d 0293fc78 00000000 00d37909 027f3000 00000000 ntdll!_RtlUserThreadStart+0x1b



so the problem is that there's inadquate synchronization between the "background processing thread" and the main thread at shutdown.

The CThinThread::Run() function has a CSingleLock object which doesn't actually lock:

    CSingleLock csl(m_pWorkEvent);	//	Synch on the work event.
    StartWork();					//	Do derived startup (not needed).


The CSingleLock constructor needs another parameter "TRUE" to acquire the lock by default. In this function, it doesn't actually do anything. I haven't looked deeply enough to see if this is the actual problem, but it is of note.

Revision 2018-09-11 16:08 by K7ZCZ
Additional Information
The thread with the crash has this stack:

0:007> kp
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 0706fc18 00a8432c 0x46746547
01 0706fd30 00a840ae Digital_Master!CLogfile::WriteToFile(class ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > > * strText = 0x0706fd48 "HRD Logbook: Closing thread")+0xcc [c:\ham radio\digital master\digital master\logfile.cpp @ 164] 
02 0706fd6c 00a3544f Digital_Master!CLogfile::Add(wchar_t * lpszFormat = 0x01029a58 "%s: Closing thread")+0xae [c:\ham radio\digital master\digital master\logfile.cpp @ 99] 
03 0706fd90 00bd4642 Digital_Master!CBackgroundProcessingThread::EndWork(void)+0x1f [c:\ham radio\digital master\digital master\backgroundprocessingthread.cpp @ 418] 
04 0706fdb8 00bd46c6 Digital_Master!CThinThread::Run(void)+0x92 [c:\ham radio\digital master\digital master\thinthread.cpp @ 200] 
05 0706fdf4 00c13dc2 Digital_Master!CThinThread::Start(void * pv = 0x013959d0)+0x46 [c:\ham radio\digital master\digital master\thinthread.cpp @ 234] 
06 0706fe9c 00d3d42f Digital_Master!_AfxThreadEntry(void * pParam = 0x0293d898)+0xff [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\thrdcore.cpp @ 109] 
07 0706fed4 00d3d557 Digital_Master!_callthreadstartex(void)+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
08 0706fee0 76a08674 Digital_Master!_threadstartex(void * ptd = <Value unavailable error>)+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
09 0706fef4 77d95d87 kernel32!BaseThreadInitThunk+0x24
0a 0706ff3c 77d95d57 ntdll!__RtlUserThreadStart+0x2f
0b 0706ff4c 00000000 ntdll!_RtlUserThreadStart+0x1b


We can see that it's trying to shut down; the EndWork() function wants to log a message that the thread is closing.

Thing is, we can see that the main thread is already exited and the application object (which owns the CLogfile object) is already partially destroyed:

   0  Id: 82c.1484 Suspend: 0 Teb: 027f6000 Unfrozen
 # ChildEBP RetAddr  Args to Child              
00 0293fa48 76f12499 000003e4 00000000 0293fa90 ntdll!NtWaitForSingleObject+0xc
01 0293fabc 76f123f2 000003e4 00001388 00000000 KERNELBASE!WaitForSingleObjectEx+0x99
02 0293fad0 00c1f540 000003e4 00001388 0293faf0 KERNELBASE!WaitForSingleObject+0x12
03 0293fae0 00c1f70b 00001388 013959d0 0293fb1c Digital_Master!CSyncObject::Lock+0xf [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\mtcore.cpp @ 42] 
04 0293faf0 00a353d5 00001388 fce573e3 0000040e Digital_Master!CSingleLock::Lock+0x10 [f:\dd\vctools\vc7libs\ship\atlmfc\src\mfc\mtex.cpp @ 108] 
05 0293fb1c 00a34def fce573b7 01591804 0000040e Digital_Master!CBackgroundProcessingThread::CloseDown+0x65 [c:\ham radio\digital master\digital master\backgroundprocessingthread.cpp @ 400] 
06 0293fb48 00a4a7c1 fce5738b 0ce5d248 00f73b70 Digital_Master!CBackgroundProcessingThread::~CBackgroundProcessingThread+0x3f [c:\ham radio\digital master\digital master\backgroundprocessingthread.cpp @ 269] 
07 0293fb74 00d415d2 fce57347 00d37909 00000000 Digital_Master!CDigitalMasterApp::~CDigitalMasterApp+0x2b1 [c:\ham radio\digital master\digital master\digital master.cpp @ 200] 
08 0293fbb8 00d41679 00000000 00000000 00000000 Digital_Master!doexit+0x97 [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 628] 
09 0293fbcc 00d3789e 00000000 fce574f3 00d37909 Digital_Master!exit+0xf [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 417] 
0a 0293fc0c 76a08674 027f3000 76a08650 8b43d7b4 Digital_Master!__tmainCRTStartup+0x10c [f:\dd\vctools\crt\crtw32\startup\crt0.c @ 264] 
0b 0293fc20 77d95d87 027f3000 8a0e8c03 00000000 kernel32!BaseThreadInitThunk+0x24
0c 0293fc68 77d95d57 ffffffff 77dbad0f 00000000 ntdll!__RtlUserThreadStart+0x2f
0d 0293fc78 00000000 00d37909 027f3000 00000000 ntdll!_RtlUserThreadStart+0x1b



so the problem is that there's inadquate synchronization between the "background processing thread" and the main thread at shutdown.

The CThinThread::Run() function has a CSingleLock object which doesn't actually lock:

    CSingleLock csl(m_pWorkEvent);	//	Synch on the work event.
    StartWork();					//	Do derived startup (not needed).


The CSingleLock constructor needs another parameter "TRUE" to acquire the lock by default. In this function, it doesn't actually do anything. I haven't looked deeply enough to see if this is the actual problem, but it is of note.