View Issue Details

ID: 0002821
Project: 3 - Current Dev List
Category: Bug
View Status: public
Last Update: 2018-12-01 12:47
Reporter: K7ZCZ
Assigned To:
Priority: normal
Severity: crash
Reproducibility: have not tried
Status: new
Resolution: open
Product Version: 6.4.0.840
Target Version:
Fixed in Version:
Summary: 0002821: Rig control: crash when reading on/off commands
Description
I don't have a repro for this, and information is pretty scarce, as the issue was collected from the Microsoft Dev Center. The dashboard reports a pretty frequent crash in Rig Control when calling ReadOnOffCommon() in CRadioOptions.

The crash is attached as a spreadsheet. I'm afraid the call stack isn't completely accurate, as ReadOnOffCommon() doesn't directly call abort(). Perhaps there's a hidden call to abort(), or maybe the call stack got trashed and I'm missing a frame or two.

According to the dashboard, this is a frequent cause of crashes, so I want to record it, collect evidence, and see what we can do.
Tags: No tags attached.
Module: Rig Control
Sub-Module: Rig Control
Testing: Not Started

Activities

K7ZCZ

2018-07-28 11:32

administrator  

Mantis2821.xlsx (13,195 bytes)

K7ZCZ

2018-08-02 18:57

administrator   ~0005906

This check-in refactors the KenwoodReadString() function a bit so that debugging this issue might be a little easier.

https://hrdsoftware.visualstudio.com/web/cs.aspx?pcguid=024933d8-393e-4d7b-806f-280bdbd42f73&cs=4262

K7ZCZ

2018-08-15 07:51

administrator   ~0005978

Here's another stack trace for the same issue, with a slightly different call site pattern, reported against build 873. The fix I made on August 2 is not in build 873. It first appears in the 874 beta build and the 876 release build.

Mantis2821Stack2.xlsx (13,270 bytes)

K7ZCZ

2018-10-22 23:25

administrator   ~0006333

The refactoring helps just a bit, but still no conclusion. The attached stack is from the 893 build. This is a common crash, about 17% of the crashes for RigControl, according to the Microsoft dashboard.

It seems a call to a checked string function is going out of range, but I have no smoking gun because the stack trace has a bit of a funny shape.

Mantis2821Stack893.xlsx (9,805 bytes)

K7ZCZ

2018-12-01 08:26

administrator   ~0006500

This is still happening in 903. Two stacks are attached; very different paths, but they're both heading for KenwoodReadString() and end up calling abort().

Mantis2821Stack903_2.xlsx (12,408 bytes)
Mantis2821Stack903_1.xlsx (9,623 bytes)

K7ZCZ

2018-12-01 12:47

administrator   ~0006507

The attached Mantis2821Stack903_3Icom spreadsheet has a similar crash but is against an Icom radio.

The call site before the crash is here:
0:000> u
HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2e41 [c:\ham radio\hamradiodeluxe\radiooptions.cpp @ 19277]:
0051f731 81fe80220000    cmp     esi,2280h
0051f737 72cb            jb      HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2e14 (0051f704)
0051f739 e9e2000000      jmp     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2f30 (0051f820)
0051f73e 8d347f          lea     esi,[edi+edi*2]
0051f741 8a04f588d0a200  mov     al,byte ptr HamRadioDeluxe!aButtonsOnOffCIV2+0x8 (00a2d088)[esi*8]
0051f748 8a0cf589d0a200  mov     cl,byte ptr HamRadioDeluxe!aButtonsOnOffCIV2+0x9 (00a2d089)[esi*8]
0051f74f 8a14f58ad0a200  mov     dl,byte ptr HamRadioDeluxe!aButtonsOnOffCIV2+0xa (00a2d08a)[esi*8]
0051f756 88442478        mov     byte ptr [esp+78h],al
0:000> u
HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2e6a [c:\ham radio\hamradiodeluxe\radiooptions.cpp @ 19285]:
0051f75a 888c24ac000000  mov     byte ptr [esp+0ACh],cl
0051f761 88942424010000  mov     byte ptr [esp+124h],dl
0051f768 84c0            test    al,al
0051f76a 7513            jne     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2e8f (0051f77f)
0051f76c 84c9            test    cl,cl
0051f76e 750f            jne     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2e8f (0051f77f)
0051f770 8b442418        mov     eax,dword ptr [esp+18h]
0051f774 8b905c010000    mov     edx,dword ptr [eax+15Ch]
0:000> u
HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2e8a [c:\ham radio\hamradiodeluxe\radiooptions.cpp @ 19296]:
0051f77a e977000000      jmp     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2f06 (0051f7f6)
0051f77f 80f9ff          cmp     cl,0FFh
0051f782 7513            jne     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2ea7 (0051f797)
0051f784 3ad1            cmp     dl,cl
0051f786 752a            jne     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2ec2 (0051f7b2)
0051f788 ff742478        push    dword ptr [esp+78h]
0051f78c 8b4c241c        mov     ecx,dword ptr [esp+1Ch]
0051f790 e80b3ef4ff      call    HamRadioDeluxe!CConnection::CIVReadStatus (004635a0)
0:000> u
HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2ea5 [c:\ham radio\hamradiodeluxe\radiooptions.cpp @ 19304]:
0051f795 eb39            jmp     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2ee0 (0051f7d0)
0051f797 80faff          cmp     dl,0FFh
0051f79a 7516            jne     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2ec2 (0051f7b2)
0051f79c ffb424ac000000  push    dword ptr [esp+0ACh]
0051f7a3 8b4c241c        mov     ecx,dword ptr [esp+1Ch]
0051f7a7 ff74247c        push    dword ptr [esp+7Ch]
0051f7ab e81040f4ff      call    HamRadioDeluxe!CConnection::CIVReadStatus (004637c0)
0051f7b0 eb1e            jmp     HamRadioDeluxe!CRadioOptions::ReadOnOffCommon+0x2ee0 (0051f7d0)


The return address would be 0051f7b0, but we end up at _abort() instead. That means the previous call, CConnection::CIVReadStatus(), either called _abort() itself or called something that did. This is the same pattern as in the other stacks, except that it's against an Icom radio instead of a Kenwood radio. The Icom code in CIVReadStatus() is far simpler than the similar code for the Kenwood radios.
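
As a sanity check on those addresses, the call bytes in the listing can be decoded by hand. The sketch below is only illustrative: the function names are made up for this note and are not part of the HRD source. It assumes the standard x86 near-call encoding (opcode E8 followed by a little-endian 32-bit displacement, relative to the next instruction).

```cpp
#include <cassert>
#include <cstdint>

// An x86 near call encoded as E8 + rel32 is 5 bytes long.
constexpr std::uint32_t kNearCallLength = 5;

// The CPU pushes the address of the instruction following the call.
constexpr std::uint32_t return_address(std::uint32_t call_site) {
    return call_site + kNearCallLength;
}

// Assemble the displacement from the four bytes that follow the E8 opcode.
constexpr std::uint32_t rel32_from_bytes(std::uint8_t b0, std::uint8_t b1,
                                         std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0) |
           (static_cast<std::uint32_t>(b1) << 8) |
           (static_cast<std::uint32_t>(b2) << 16) |
           (static_cast<std::uint32_t>(b3) << 24);
}

// Target = return address + rel32, wrapping modulo 2^32.
constexpr std::uint32_t call_target(std::uint32_t call_site, std::uint32_t rel32) {
    return return_address(call_site) + rel32;
}

// Checks the values against the line "0051f7ab e81040f4ff call ...".
inline void check_listing_values() {
    const std::uint32_t rel = rel32_from_bytes(0x10, 0x40, 0xf4, 0xff);
    assert(rel == 0xfff44010u);
    assert(return_address(0x0051f7abu) == 0x0051f7b0u);
    assert(call_target(0x0051f7abu, rel) == 0x004637c0u);
}
```

The decoded target, 004637c0, is the CIVReadStatus() entry point disassembled in this note, so the listing is at least self-consistent about which overload sits just before the 0051f7b0 return address.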

Disassembling CIVReadStatus() shows that it doesn't do much work. As far as I can tell, the only function call in that implementation that could plausibly end up calling _abort() is the CString constructor. Here's the code:

0:000> u 004637c0
HamRadioDeluxe!CConnection::CIVReadStatus [c:\ham radio\hamradiodeluxe\connectionciv.cpp @ 902]:
004637c0 55              push    ebp
004637c1 8bec            mov     ebp,esp
004637c3 6aff            push    0FFFFFFFFh
004637c5 68f1889300      push    offset HamRadioDeluxe!OleUIAddVerbMenuW+0x8665 (009388f1)
004637ca 64a100000000    mov     eax,dword ptr fs:[00000000h]
004637d0 50              push    eax
004637d1 81ec2c010000    sub     esp,12Ch
004637d7 a1e0f1b600      mov     eax,dword ptr [HamRadioDeluxe!__security_cookie (00b6f1e0)]
0:000> u
HamRadioDeluxe!CConnection::CIVReadStatus+0x1c [c:\ham radio\hamradiodeluxe\connectionciv.cpp @ 902]:
004637dc 33c5            xor     eax,ebp
004637de 8945f0          mov     dword ptr [ebp-10h],eax
004637e1 53              push    ebx
004637e2 56              push    esi
004637e3 57              push    edi
004637e4 50              push    eax
004637e5 8d45f4          lea     eax,[ebp-0Ch]
004637e8 64a300000000    mov     dword ptr fs:[00000000h],eax
0:000> u
HamRadioDeluxe!CConnection::CIVReadStatus+0x2e [c:\ham radio\hamradiodeluxe\connectionciv.cpp @ 902]:
004637ee 8bf9            mov     edi,ecx
004637f0 6a01            push    1
004637f2 8d4728          lea     eax,[edi+28h]
004637f5 50              push    eax
004637f6 8d8dd8feffff    lea     ecx,[ebp-128h]
004637fc e854931600      call    HamRadioDeluxe!CSingleLock::CSingleLock (005ccb55)
00463801 6a01            push    1
00463803 8d4708          lea     eax,[edi+8]
0:000> u
HamRadioDeluxe!CConnection::CIVReadStatus+0x46 [c:\ham radio\hamradiodeluxe\connectionciv.cpp @ 904]:
00463806 c745fc00000000  mov     dword ptr [ebp-4],0
0046380d 50              push    eax
0046380e 8d8dccfeffff    lea     ecx,[ebp-134h]
00463814 e83c931600      call    HamRadioDeluxe!CSingleLock::CSingleLock (005ccb55)
00463819 c645fc01        mov     byte ptr [ebp-4],1
0046381d e8aed91400      call    HamRadioDeluxe!AfxGetStringManager (005b11d0)
00463822 33c9            xor     ecx,ecx
00463824 8bd0            mov     edx,eax
0:000> u
HamRadioDeluxe!CConnection::CIVReadStatus+0x66 [c:\ham radio\hamradiodeluxe\connectionciv.cpp @ 907]:
00463826 85d2            test    edx,edx
00463828 0f95c1          setne   cl
0046382b 85c9            test    ecx,ecx
0046382d 750a            jne     HamRadioDeluxe!CConnection::CIVReadStatus+0x79 (00463839)
0046382f 6805400080      push    80004005h
00463834 e8178ffaff      call    HamRadioDeluxe!ATL::AtlThrowImpl (0040c750)
00463839 8b02            mov     eax,dword ptr [edx]
0046383b 8bca            mov     ecx,edx
0:000> u
HamRadioDeluxe!CConnection::CIVReadStatus+0x7d [c:\ham radio\hamradiodeluxe\connectionciv.cpp @ 907]:
0046383d ff500c          call    dword ptr [eax+0Ch]
00463840 83c010          add     eax,10h
00463843 8985e8feffff    mov     dword ptr [ebp-118h],eax
00463849 0fb6450c        movzx   eax,byte ptr [ebp+0Ch]
0046384d 33db            xor     ebx,ebx
0046384f 50              push    eax
00463850 8b4508          mov     eax,dword ptr [ebp+8]
00463853 0fb6c0          movzx   eax,al



The most obvious candidate is a branch that tests the string manager pointer against NULL and throws an exception if the pointer is NULL. The throw code starts at 0046382f. I've written a test program that does the same thing, forcing a null string manager. In a release build, that does throw an exception, but the exception goes to the unhandled exception filter, which ends up calling abort() from the filter rather than from the current stack. My test program crash looks like this:

0:000> kb
 # ChildEBP RetAddr  Args to Child              
00 00f5efb0 0101cbb8 4d0d5135 0101bc66 00000000 FormattingWidth!abort+0x28 [f:\dd\vctools\crt\crtw32\misc\abort.c @ 88] 
01 00f5efe0 0101bca6 00f5f084 74675cf0 00f5f0b4 FormattingWidth!terminate+0x33 [f:\dd\vctools\crt\crtw32\eh\hooks.cpp @ 96] 
02 00f5efe8 74675cf0 00f5f0b4 6353fea6 00000000 FormattingWidth!__CxxUnhandledExceptionFilter+0x40 [f:\dd\vctools\crt\crtw32\eh\unhandld.cpp @ 39] 
03 00f5f084 77b2d4e0 00f5f0b4 77afe822 00f5f854 KERNELBASE!UnhandledExceptionFilter+0x1a0
04 00f5f854 77af2ffa ffffffff 77b0ec42 00000000 ntdll!__RtlUserThreadStart+0x3a4e3
05 00f5f864 00000000 01015099 00cf5000 00000000 ntdll!_RtlUserThreadStart+0x1b
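
For reference, the shape of that branch can be sketched in portable C++. This is not the actual test program and not the ATL/MFC code: StringManager, HresultError, and the function names are hypothetical stand-ins, and only the value 80004005h is taken from the listing.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the string manager interface (not the real ATL type).
struct StringManager {
    virtual ~StringManager() = default;
    virtual const char* nil_string() const = 0;
};

// Value pushed before the throw at 0046382f in the listing.
constexpr std::uint32_t kFailureCode = 0x80004005u;

// Hypothetical exception type carrying the failure code.
struct HresultError : std::runtime_error {
    explicit HresultError(std::uint32_t hr)
        : std::runtime_error("string manager unavailable"), code(hr) {}
    std::uint32_t code;
};

// Same shape as the branch at 00463826: test the pointer, throw if it is
// null, otherwise call through the manager's virtual table.
inline const char* make_empty_string(const StringManager* mgr) {
    if (mgr == nullptr) {
        throw HresultError(kFailureCode);
    }
    return mgr->nil_string();
}

// Forces the null-manager path and reports the code carried by the
// exception, or 0 if nothing was thrown.
inline std::uint32_t probe_null_manager() {
    try {
        make_empty_string(nullptr);
    } catch (const HresultError& e) {
        return e.code;
    }
    return 0;
}

// Confirms that the null-manager path throws with the expected code.
inline void check_null_manager_throws() {
    assert(probe_null_manager() == kFailureCode);
}
```

Here the exception is caught so that the sketch can report the code. If nothing catches it, standard C++ ends in std::terminate(), whose default handler calls std::abort(); that matches frames 00 through 02 of the stack above, where abort() is reached from the unhandled exception filter rather than from the throwing function's own frames.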



My guess was that the call pattern was crashing into abort() because the string manager wasn't available (maybe this thread is untamed and still running at shutdown, for example). But given this experiment, I'd have to expect a different call stack in that case.

We do know the call to _abort() eliminates some possibilities; it's not a call to _security_failure(), for example. Maybe the unhandled exception is the correct guess, or maybe there's something else that can call _abort() directly.

Mantis2821Stack903_3Icom.xlsx (9,626 bytes)

Issue History

Date Modified Username Field Change
2018-07-28 11:31 K7ZCZ New Issue
2018-07-28 11:32 K7ZCZ File Added: Mantis2821.xlsx
2018-08-02 18:57 K7ZCZ Note Added: 0005906
2018-08-15 07:51 K7ZCZ File Added: Mantis2821Stack2.xlsx
2018-08-15 07:51 K7ZCZ Note Added: 0005978
2018-10-22 23:25 K7ZCZ File Added: Mantis2821Stack893.xlsx
2018-10-22 23:25 K7ZCZ Note Added: 0006333
2018-12-01 08:26 K7ZCZ File Added: Mantis2821Stack903_2.xlsx
2018-12-01 08:26 K7ZCZ File Added: Mantis2821Stack903_1.xlsx
2018-12-01 08:26 K7ZCZ Note Added: 0006500
2018-12-01 12:47 K7ZCZ File Added: Mantis2821Stack903_3Icom.xlsx
2018-12-01 12:47 K7ZCZ Note Added: 0006507