View Issue Details

IDProjectCategoryView StatusLast Update
0002862Ham Radio DeluxeBugpublic2018-09-11 13:18
ReporterK7ZCZ 
Assigned ToK7ZCZ 
PrioritynormalSeveritycrashReproducibilityhave not tried
Status closedResolutionfixed 
Product Version6.4.0.873 
Target VersionFixed in Version6.4.0.886 
Summary0002862: Logbook: Bad thread synchronization around lookup calls and REQUEST_SIGNAL_DATA structure
DescriptionWe've been trying to find a crash in the Logbook that corrupts memory and causes the Logbook application to shut down. See the related bugs for more details and more minidump files, including a couple of changes made in speculation of a fix.

I recently found a dump file on the Dev Dashboard which is stored in the Dumps directory on Google Drive for this issue. The call stack for the dump is provided below. Note that the nThreadID parameter to the CountryLookup() member is -1.

This dump comes from build 873 being run on current Windows 10 bits (build 10.0.17134.112)

Additional Information
0:021> kb
  *** Stack trace for last set context - .thread/.cxr resets it
 # ChildEBP RetAddr  Args to Child              
00 171ed234 774f4e2e 00000001 00000009 774f2bb3 ntdll!RtlReportCriticalFailure+0x88
01 171ed240 774f2bb3 00000002 774a2a5f e1a95355 ntdll!RtlpReportHeapFailure+0x2f
02 171ed248 774a2a5f e1a95355 00000000 1d31f844 ntdll!RtlpHeapHandleError+0x11
03 171ed27c 774f9af9 1d31f844 1d31f844 00000009 ntdll!RtlpHpHeapHandleError+0x65
04 171ed290 774afb7d 1d31f844 00000000 00000000 ntdll!RtlpLogHeapFailure+0x41
05 171ed2ec 00f4e2b1 00920000 00000000 1d31f844 ntdll!RtlFreeHeap+0x74fad
06 171ed300 012dfbee 1d31f844 12f365f9 1bfe7a50 HRDLogbook!free+0x1a [f:\dd\vctools\crt\crtw32\heap\free.c @ 51] 
07 171ef908 012e820c 1d31f844 1d31f6c4 ffffffff HRDLogbook!CBackgroundProcessingThreadLookups::CountryLookup+0x894e [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadlookups.cpp @ 5308] 
08 171ef93c 012d04ff 1bfe7a50 00000000 12f36585 HRDLogbook!CBackgroundProcessingThreadLookups::ProcessData+0xbc [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadlookups.cpp @ 178] 
09 171ef974 014b3a60 12f3656d 0dd22e60 00988a90 HRDLogbook!CBackgroundProcessingThread::DoWork+0xaf [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthread.cpp @ 305] 
0a 171ef99c 014b39a6 12f36529 00f534d7 0dd22e60 HRDLogbook!CThinThread::Run+0x80 [c:\ham radio\logbook\hrdlogbook\thinthread.cpp @ 188] 
0b 171ef9d8 00f5342b 0179a718 12f366e1 00f534d7 HRDLogbook!CThinThread::Start+0x46 [c:\ham radio\logbook\hrdlogbook\thinthread.cpp @ 236] 
0c 171efa10 00f53553 00f534d7 171efa30 7734a1a4 HRDLogbook!_callthreadstartex+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
0d 171efa1c 7734a1a4 0dd22e60 7734a180 8d2ac2f5 HRDLogbook!_threadstartex+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
0e 171efa30 7746174e 0dd22e60 e1a97b51 00000000 kernel32!BaseThreadInitThunk+0x24
0f 171efa78 77461722 ffffffff 774a62f3 00000000 ntdll!__RtlUserThreadStart+0x2b
10 171efa88 00000000 00f534d7 0dd22e60 00000000 ntdll!_RtlUserThreadStart+0x1b

TagsNo tags attached.
ModuleLogbook
Sub-ModuleCall lookup
Testing Not Tested

Relationships

related to 0002813 closedK7ZCZ Ham Radio Deluxe Logbook: crashes when freeing REQUEST_SIGNAL_DATA structure in CountryLookup() function 
related to 0002789 closedPD9FER 4 - Closed w/o Action Beta Build 858 Logbook crashing on opening 
related to 0002735 resolvedK7ZCZ 1 - Backlog Logbook Lockup 

Activities

K7ZCZ

2018-08-28 16:47

manager   ~0006015

There's only a couple of places in the code which make the call to perform this lookup. One is asynchronous, and doesn't wait for a return -- it passes pointers to help-allocated memory. The other location uses stack-based structures and is synchronous; it has to be, because otherwise the stack frame would disappear, rendering the pointers passed to the worker function invalid.

The whole notion of having the same code handle synchronous and asynchronous requests is a bit iffy; why use a separate thread just to wait for it to complete its work?

The -1 parameter noted above gives a clue; we know that the call must be coming from the synchronous code. It turns out that the synchronous code is synchronous -- but has a time out of seven seconds. The time out duration was arbitrarily chosen -- but how could it possibly take more than 7 seconds to do a callsign lookup?

If the time out is exceeded, the stack frame of the calling function is torn down, taking the REQUEST_SIGNAL_DATA and lookup result data structures with it. When the lookup code in the other thread derefereneces those pointers, it's walking over memory it doesn't own. The first symptom is this crash, where the REQUEST_SIGNAL_DATA structure is deallocated. It's puzzling that the structure is deallocated, but that's because it's a flag in the structure itself that indicates it is meant to be deallocated; as soon as the structure becomes corrupt (overwritten with other data after the calling stack frame disappears) the flag indicating deletion is required is very likely to look like a non-zero "TRUE" value, causing the deallocation to work.

The chorus of offsetting problems disguised this issue pretty well, but the fix should take care of things. Instead of waiting indefinitely, we still wait for seven seconds -- but log, and conitnue waiting. That way, if a hang actually occurs, the code is still showing some output and some evidence of where it might be hung.

K7ZCZ

2018-09-01 13:47

manager   ~0006036

Last edited: 2018-09-01 14:41

View 2 revisions

Fixed with this checkin, which first went out in the 877 build:
https://hrdsoftware.visualstudio.com/HRD/_versionControl/changeset/4311

Jose made a couple of good code review comments, so I'll address those before resolving this issue fully ...

K7ZCZ

2018-09-01 16:20

manager   ~0006044

I addressed Jose's CR comments in this checkin, so let's resolve this one now.
https://hrdsoftware.visualstudio.com/HRD/_versionControl/changeset/4318


g3ucq

2018-09-10 02:45

viewer   ~0006167

Unable to test

WA9PIE

2018-09-10 11:05

administrator   ~0006179

Putting FIV back... I changed it by accident.

WA9PIE

2018-09-10 16:53

administrator   ~0006192

Given that we can't test this (it was a dump collected on the dev dashboard), I'm going to give this item a pass.

Issue History

Date Modified Username Field Change
2018-08-28 15:33 K7ZCZ New Issue
2018-08-28 15:33 K7ZCZ Relationship added related to 0002813
2018-08-28 15:34 K7ZCZ Relationship added related to 0002789
2018-08-28 15:34 K7ZCZ Relationship added related to 0002735
2018-08-28 16:47 K7ZCZ Note Added: 0006015
2018-09-01 13:47 K7ZCZ Note Added: 0006036
2018-09-01 14:41 K7ZCZ Note Edited: 0006036 View Revisions
2018-09-01 16:20 K7ZCZ Assigned To => K7ZCZ
2018-09-01 16:20 K7ZCZ Status new => resolved
2018-09-01 16:20 K7ZCZ Resolution open => fixed
2018-09-01 16:20 K7ZCZ Note Added: 0006044
2018-09-03 00:07 K7ZCZ Fixed in Version => 6.4.0.878
2018-09-10 02:45 g3ucq Note Added: 0006167
2018-09-10 11:01 WA9PIE Fixed in Version 6.4.0.878 =>
2018-09-10 11:05 WA9PIE Fixed in Version => 6.4.0.878
2018-09-10 11:05 WA9PIE Description Updated View Revisions
2018-09-10 11:05 WA9PIE Note Added: 0006179
2018-09-10 16:53 WA9PIE Status resolved => closed
2018-09-10 16:53 WA9PIE Testing Not Started => Not Tested
2018-09-10 16:53 WA9PIE Note Added: 0006192
2018-09-11 13:15 WA9PIE Project 3 - Current Dev List => Ham Radio Deluxe
2018-09-11 13:18 WA9PIE Fixed in Version 6.4.0.878 => 6.4.0.886