View Issue Details

IDProjectCategoryView StatusLast Update
0002735Ham Radio DeluxeBugpublic2018-09-28 21:42
ReporterKB3NPH 
Assigned ToK7ZCZ 
PriorityhighSeveritymajorReproducibilityalways
Status closedResolutionfixed 
Product Version 
Target VersionFixed in Version6.4.0.888 
Summary0002735: Logbook Lockup
DescriptionAnother customer reporting the Logbook just stops working for no reason. We have received many of these reports with V6.4.0.840 and really need something done to find the issue. I've attached a screenshot of the display when this happens. I also got a dump file that can be looked at to see if there is any indication there as to what is causing the problem.

Additional InformationTicket #986640
TagsNo tags attached.
ModuleLogbook
Sub-ModuleFunctional
Testing Beta Successful

Relationships

related to 0002813 closedK7ZCZ Ham Radio Deluxe Logbook: crashes when freeing REQUEST_SIGNAL_DATA structure in CountryLookup() function 
related to 0002732 closedPD9FER 4 - Closed w/o Action Logbook crashing when idle 
related to 0002862 closedK7ZCZ Ham Radio Deluxe Logbook: Bad thread synchronization around lookup calls and REQUEST_SIGNAL_DATA structure 

Activities

KB3NPH

2018-05-23 10:25

administrator  

Logbook Failure.jpg (27,901 bytes)
Logbook Failure.jpg (27,901 bytes)

KB3NPH

2018-05-23 10:32

administrator   ~0005095

Dump file located:

\Team Drives\HRD Software\Dumps\Mantis ID 0002735Minideluxe.v2.1.zip

K7ZCZ

2018-05-27 10:08

manager   ~0005115

The dump file doesn't contain a dump. Instead, it's got a program named "MiniDeluxe" in it.

Please provide a dump file so this issue can be investigated.

KB3NPH

2018-06-27 08:45

administrator   ~0005497

Ticket #787583 - Here is a dump file from a customer who I instructed on how to do the "forced" dump using Task Manager. I had spent considerable time in his computer trying to get the issue to act up so I could get the dump myself, but was unable to. Hope this helps get this problem resolved once and for all. The dump file is located in the Google drive DUMPS folder.

Mantis ID 0002735 - HRDLogbook.zip

KB3NPH

2018-06-27 13:06

administrator   ~0005502

I had several customers tell me that this problem with the Logbook stopping working for no reason seemed to start in the release where we removed the IE from the software and started sending all those calls to the external browsers. Over Field Day, Kevin was working using the current version. He called me on the phone and told me he was having that problem where the Logbook stopped working quite often. Today we were talking about that and he said that during Field Day, he was NOT connected to the internet. He said that when he got home and was able to connect to the internet, he still had the problem, however it was not as frequent.

Is it possible that something that was put in or not removed from the code in order to access the external browser could be the problem with the current issue we're having?

K7ZCZ

2018-06-28 20:20

manager   ~0005528

To believe the browser change was related to this issue, I'd need some amount of evidence greater than coincidence. Many changes were made in that same release, and any one of them could have caused (or revealed) this problem.

K7ZCZ

2018-06-28 21:30

manager   ~0005531

Debugging hangs without any repro case is exceptionally hard, since the apps state doesn't stop at the problem like it does with a crash. The dump provides a snapshot of the app as it's hanging -- we get no other clues. The report provided here contains no information other than the report that the "logbook just stops working for no reason".

In examining the associated ticket (#986640) I see that it was opened on April 8, 2018. The code to remove the internal browser wasn't checked in until April 26, and didn't ship until a week after that. These facts exclude not only the external browser change, but that whole build altogether, as the customer reports very frequent hangs ("10 time in one hours").

The investigation of this issue is hampered by the lack of information. Further examination of the ticket indicates that what the user has reported doesn't match what's reported here in Mantis. The customer says he's using Build 805 of the product, but the dump is from Build 846.

The customer says that he gets a "looking for problem" message from Windows. That indicates there was a crash, not a hang. It also means that our crash handler has not been hit (if it were, it would write the minidump). But in this condition, it should be possible to get Windows to write a minidump of its own directly, right at the crash point. Instructions for doing this can be found here: https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps

There should also be an entry in the Windows Event Log about the crash, which might indicate a location of a collected dump.

Recovering that information should give us a much better lead toward solving the problem. It would also help to know:

Was the customer using the Logbook alone, or was he using other apps in the suite?

What were the last actions he took before encountering the crash?

Which features of the logbook were active? (DX Cluster connected? Background log uploads? Backup? Anything else ...?)

The more information we have about the symptoms of this issue, the better chance we have at discovering it's cause.

K7ZCZ

2018-06-28 21:57

manager   ~0005532

Debugging hangs without any repro case is exceptionally hard, since the apps state doesn't stop at the problem like it does with a crash. The dump provides a snapshot of the app as it's hanging -- we get no other clues. The report provided here contains no information other than the report that the "logbook just stops working for no reason".

In examining the associated ticket (#986640) I see that it was opened on April 8, 2018. The code to remove the internal browser wasn't checked in until April 26, and didn't ship until a week after that. These facts exclude not only the external browser change, but that whole build altogether, as the customer reports very frequent hangs ("10 time in one hours").

The investigation of this issue is hampered by the lack of information. Further examination of the ticket indicates that what the user has reported doesn't match what's reported here in Mantis. The customer says he's using Build 805 of the product, but the dump is from Build 846.

The customer says that he gets a "looking for problem" message from Windows. That indicates there was a crash, not a hang. It also means that our crash handler has not been hit (if it were, it would write the minidump). But in this condition, it should be possible to get Windows to write a minidump of its own directly, right at the crash point. Instructions for doing this can be found here: https://docs.microsoft.com/en-us/windows/desktop/wer/collecting-user-mode-dumps

There should also be an entry in the Windows Event Log about the crash, which might indicate a location of a collected dump.

Recovering that information should give us a much better lead toward solving the problem. It would also help to know:

Was the customer using the Logbook alone, or was he using other apps in the suite?

What were the last actions he took before encountering the crash?

Which features of the logbook were active? (DX Cluster connected? Background log uploads? Backup? Anything else ...?)

The more information we have about the symptoms of this issue, the better chance we have at discovering it's cause.


Turning my attention to the attached minidump to see what I can learn:

The supplied dump is a 64-bit image, and requires the use of the "!wow64exts.sw" to switch windbg to 32-bit mode. The supplied dump is of build 846.

The attached dump contains on crash record. The process is active, so it's not in the state that it would generate an "application not responding" message. All the threads in the dump are marked "frozen" which makes it hard to tell if they were runnable or not.

One thread sticks out, which is this one:

0:009:x86> kb
 # ChildEBP RetAddr  Args to Child              
00 0ecfc8dc 77a8879f 0000167c 00000001 00000000 ntdll_779e0000!NtWaitForSingleObject+0x15
01 0ecfc960 77a888cd 0ecfcb00 0ecfcb50 00000000 ntdll_779e0000!RtlReportExceptionEx+0x14b
02 0ecfc9b8 77aae98e 0ecfcb00 0ecfcb50 00000000 ntdll_779e0000!RtlReportException+0x86
03 0ecfc9cc 77aaea0b c0000374 0ecfca00 77a54fb4 ntdll_779e0000!RtlpTerminateFailureFilter+0x14
04 0ecfc9d8 77a54fb4 00000000 0ecfceb0 77a0d0d0 ntdll_779e0000!RtlReportCriticalFailure+0x67
05 0ecfc9ec 77a54e59 00000000 00000000 00000000 ntdll_779e0000!_EH4_CallFilterFunc+0x12
06 0ecfca14 77a434c1 fffffffe 0ecfcea0 0ecfcb50 ntdll_779e0000!_except_handler4+0x8e
07 0ecfca38 77a43493 0ecfcb00 0ecfcea0 0ecfcb50 ntdll_779e0000!ExecuteHandler2+0x26
08 0ecfca5c 77a43434 0ecfcb00 0ecfcea0 0ecfcb50 ntdll_779e0000!ExecuteHandler+0x24
09 0ecfcae8 779f0163 00cfcb00 0ecfcb50 0ecfcb00 ntdll_779e0000!RtlDispatchException+0x127
0a 0ecfcb08 77aae9fb 00000001 77ae4270 79638fc1 ntdll_779e0000!KiUserExceptionDispatcher+0xf
0b 0ecfceb0 77aaf901 c0000374 77ae4270 0ecfcef4 ntdll_779e0000!RtlReportCriticalFailure+0x57
0c 0ecfcec0 77aaf9e1 00000002 79638fa1 00000000 ntdll_779e0000!RtlpReportHeapFailure+0x21
0d 0ecfcef4 77a5d9ac 00000009 01b10000 0ee6f974 ntdll_779e0000!RtlpLogHeapFailure+0xa1
0e 0ecfcf24 754014ad 01b10000 00000000 0ee6f974 ntdll_779e0000!RtlFreeHeap+0x64
0f 0ecfcf38 0042db82 01b10000 00000000 0ee6f974 kernel32!HeapFree+0x14
10 0ecfcf4c 007b287d 0ee6f974 efb54ef1 162cffb0 HRDLogbook!free+0x1a [f:\dd\vctools\crt\crtw32\heap\free.c @ 51] 
11 0ecffac8 0079a20d 0ee6f974 0ee6f7ec efb54ec1 HRDLogbook!CBackgroundProcessingThread::CountryLookup+0x894d [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadlookups.cpp @ 5259] 
12 0ecffaf8 0079a526 162cffb0 00c5a854 efb54f09 HRDLogbook!CBackgroundProcessingThread::ProcessData+0x36d [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthread.cpp @ 582] 
13 0ecffb30 0097e210 efb54f61 06e43bb8 01b47518 HRDLogbook!CBackgroundProcessingThread::DoWork+0xa6 [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthread.cpp @ 311] 
14 0ecffb58 0097e156 efb54fad 00000000 06e43bb8 HRDLogbook!CThinThread::Run+0x80 [c:\ham radio\logbook\hrdlogbook\thinthread.cpp @ 188] 
15 0ecffb94 00432cfb 00c5a808 efb54ff5 00000000 HRDLogbook!CThinThread::Start+0x46 [c:\ham radio\logbook\hrdlogbook\thinthread.cpp @ 236] 
16 0ecffbcc 00432e23 00000000 0ecffbe4 7540343d HRDLogbook!_callthreadstartex+0x1b [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] 
17 0ecffbd8 7540343d 06e43bb8 0ecffc24 77a19832 HRDLogbook!_threadstartex+0x7c [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354] 
18 0ecffbe4 77a19832 06e43bb8 7963bd71 00000000 kernel32!BaseThreadInitThunk+0xe
19 0ecffc24 77a19805 00432da7 06e43bb8 00000000 ntdll_779e0000!__RtlUserThreadStart+0x70
1a 0ecffc3c 00000000 00432da7 06e43bb8 00000000 ntdll_779e0000!_RtlUserThreadStart+0x1b



The CountryLookup() function in the BackgroundProcessingThreadLookups thread in the Logbook is an example of terrible software engineering. The function has awful error handling. It's also approximately 6000 lines long. Yes: one single function is 6000 lines long.

The call trapped here shows that the function is about to clean up, it's freeing memory. But the call to free() ends up throwing an exception. It's very odd that the call stack goes through the _except_handler() call without hitting our exception handler (which writes our minidumps).

From this call stack, it would seem that the pSignalData parameter to the lookup function contains bad data. The parameter is essentially unused: the only member from the structure ever touched is the m_hWnd in side it, which is used to respond to the request.

A good guess might be that the data has already been freed, but in the debugger I have no problem viewing the data. If it were previously deleted, I shouldn't be able to dump the data at the pointer.

The only lead we have, then, is that something's wrong with this memory at the moment the thread tried to free it.

K7ZCZ

2018-06-28 23:10

manager   ~0005533

The referenced memory is meant to be a REQUEST_SIGNAL_DATA structure. But it's got crap data in it:

0:009:x86> dx -r1 ((HRDLogbook!REQUEST_SIGNAL_DATA *)0xee6f974)
((HRDLogbook!REQUEST_SIGNAL_DATA *)0xee6f974)                 : 0xee6f974 [Type: REQUEST_SIGNAL_DATA *]
    [+0x000] bFree            : 153 [Type: int]
    [+0x004] hEvent           : 0x3 [Type: void *]
    [+0x008] lpszReturn       : 0x229 : "--- memory read error at address 0x00000229 ---" [Type: wchar_t *]
    [+0x00c] hWnd             : 0x1f0 [Type: HWND__ *]
    [+0x010] lpCallback       : 0x3cc [Type: void (*)(wchar_t *,long)]
    [+0x014] lParam           : 3 [Type: long]
    [+0x018] bEnableLogbook   : -2130673663 [Type: int]
    [+0x01c] bEnableCountry   : 368 [Type: int]
0:009:x86> db 0x0ee6f974
0ee6f974  99 00 00 00 03 00 00 00-29 02 00 00 f0 01 00 00  ........).......
0ee6f984  cc 03 00 00 03 00 00 00-01 80 00 81 70 01 00 00  ............p...
0ee6f994  02 02 00 00 f0 ce 51 16-01 80 00 81 a9 01 00 00  ......Q.........
0ee6f9a4  a6 92 be ef ec f8 e6 0e-7d 2f 52 75 30 fa e6 0e  ........}/Ru0...
0ee6f9b4  67 2f ca 71 ce 07 90 90-fe ff ff ff 40 fa e6 0e  g/.q........@...
0ee6f9c4  3e 67 52 75 40 00 00 00-e0 fa e6 0e 00 00 00 00  >gRu@...........
0ee6f9d4  00 00 00 00 94 fa e6 0e-24 fa e6 0e d1 c8 9e ef  ........$.......
0ee6f9e4  28 6e 4b 16 01 00 00 00-68 b0 20 16 35 88 0c 00  (nK.....h. .5...


It might mean that the data was freed the a new block allocated at the same address; or the data was overwritten.

The CountryLookup() method is only invoked by two spots in the cdoe. One is synchronous and the memory is allocated on the stack. The other is dynamically allocated and handed off.

If the synchronous code were the sender, it would still be waiting for its reply. There's no call stack at that place (which is rom the IP Listener, where a "CountryLookup" or "CountryCallsignLookup" command is being handled.)

So the other, asynchronous sender must be the culprit. Thing is, that code is pretty simple: it allocates its memory; then it queues the request to the thread. There's no apparent opportunity for the memory to be freed early and reallocated.

K7ZCZ

2018-09-11 19:04

manager   ~0006201

I believe this is fixed with Mantis 2862

g3ucq

2018-09-28 10:57

viewer   ~0006246

Logbook is stable for me.

WA9PIE

2018-09-28 21:42

administrator   ~0006250

Resolved as part of the 888 build.

Issue History

Date Modified Username Field Change
2018-05-23 10:25 KB3NPH New Issue
2018-05-23 10:25 KB3NPH File Added: Logbook Failure.jpg
2018-05-23 10:32 KB3NPH Note Added: 0005095
2018-05-27 10:08 K7ZCZ Assigned To => KB3NPH
2018-05-27 10:08 K7ZCZ Status new => feedback
2018-05-27 10:08 K7ZCZ Note Added: 0005115
2018-06-23 17:38 WA9PIE Summary Ticket #986640 - Logbook Lockup => Logbook Lockup
2018-06-23 17:38 WA9PIE Additional Information Updated View Revisions
2018-06-27 08:45 KB3NPH Note Added: 0005497
2018-06-27 13:06 KB3NPH Note Added: 0005502
2018-06-27 14:14 KB3NPH Assigned To KB3NPH => K7ZCZ
2018-06-28 20:20 K7ZCZ Note Added: 0005528
2018-06-28 21:30 K7ZCZ Note Added: 0005531
2018-06-28 21:57 K7ZCZ Note Added: 0005532
2018-06-28 23:10 K7ZCZ Note Added: 0005533
2018-07-23 10:52 K7ZCZ Relationship added related to 0002813
2018-07-29 16:44 K7ZCZ Relationship added related to 0002732
2018-08-28 15:34 K7ZCZ Relationship added related to 0002862
2018-09-11 19:04 K7ZCZ Status feedback => resolved
2018-09-11 19:04 K7ZCZ Resolution open => fixed
2018-09-11 19:04 K7ZCZ Note Added: 0006201
2018-09-28 07:47 K7ZCZ Project 1 - Backlog => 3 - Current Dev List
2018-09-28 07:47 K7ZCZ Fixed in Version => 6.4.0.887
2018-09-28 10:57 g3ucq Note Added: 0006246
2018-09-28 21:40 WA9PIE Project 3 - Current Dev List => Ham Radio Deluxe
2018-09-28 21:41 WA9PIE Testing Not Started => Beta Successful
2018-09-28 21:41 WA9PIE Fixed in Version 6.4.0.887 => 6.4.0.888
2018-09-28 21:42 WA9PIE Status resolved => closed
2018-09-28 21:42 WA9PIE Note Added: 0006250