View Issue Details

IDProjectCategoryView StatusLast Update
0002812Ham Radio DeluxeBugpublic2019-01-16 22:04
ReporterK7ZCZ 
Assigned ToK7ZCZ 
PrioritynormalSeverityminorReproducibilityhave not tried
Status closedResolutionfixed 
Product Version6.4.0.840 
Target VersionFixed in Version6.5.0.183 
Summary0002812: Logbook: EQSL logging code may over-step array bounds
DescriptionA loop that reads a response from a web request sent to EQSL.com for logging may over-step an array declared on the stack and cause the Logbook to shut down.

I don't have a specific repro for this problem. It comes from reports sent to the Microsoft Dev Center Dashboard for our application. The Dev Center reports about 1.25% of crashes collected for build 840 of the Logbook are due to this exception: SOFTWARE_NX_FAULT_FAIL_FAST_RANGE_CHECK_FAILURE_c0000409_HRDLogbook.exe!Unknown

The provided stack trace is attached as LogbookRangeCheck.tsv.
Additional InformationThe visible symptom of this problem is that the logbook will close without showing the HRD minidump dialog. It might write a Minidump if Windows is configured to do so, but will store it in the system-defined directory rather than HRD's own directory. Note that there are many possible causes for the Logbook closing without warning; this is just one of them.
TagsNo tags attached.
ModuleLogbook
Sub-ModuleeQSL
Testing Not Tested

Relationships

Activities

K7ZCZ

2018-07-19 13:44

manager  

LogbookRangeCheck.tsv (377 bytes)

K7ZCZ

2018-07-19 14:21

manager   ~0005744

Amazingly, the Dev Center Dashboard doesn't provide minidumps anymore. Maybe this is because we're not registered dev partners with Microsoft, or maybe it's because of some data collection/retention laws.

Instead of a minidump, we get this TSV file with the crash. The stack might show what's going on -- or it might not. In this case, we're pretty lucky, since the stack makes the problem apparent. The TSV file is a Tab Separated Variable file that shows the module and the offset of the stack trace at the time of the crash. Even then, it's a really weak stack trace.

The main problem is that the TSV file doesn't contain any parameters; no memmory, either. Not even registers. Worse, it doesn't even contain our private symbols. To make the stack trace in the TSV file useful, I have to manually traverse it in WinDBG. That isn't too hard, but it's pretty tedious.

With symbols and a copy of HRDLogbook.EXE from the 840 build, I can load the image with "windbg -z HRDLogbook.exe". Once loaded, the "lm" command shows the loaded modules:

0:000> lm
start    end        module name
00400000 01afa000   HRDLogbook   (private pdb symbols)  f:\mysymbolcache\HRDLogbook.pdb\3F0851EAC6B84DCEA9755FAA643E50491\HRDLogbook.pdb


Of course, this just gets the logbook code and its symbols into memory. We're not running the code; and the code has never run, so we've got no context. We do have, however, the code laide out in memory and the symbols loaded.

The top callstack frame in th TSV file says it's at offset 0x17DDB5 in the image. The above lm command shows that the Logbook loaded at 0x00400000, so we have to do some math before we can find where the stack frame is:

0:000> ? 0x00400000 + 0x17DDB5
Evaluate expression: 5758389 = 0057ddb5


Disassembling code at that address reveals we're inside the report security failure handler:

0:000> u 0x0057ddb5
HRDLogbook!__report_securityfailure+0x17 [f:\dd\vctools\crt\crtw32\misc\gs_report.c @ 409]:
0057ddb5 cd29            int     29h
0057ddb7 a3905bd900      mov     dword ptr [HRDLogbook!GS_ContextRecord+0xb0 (00d95b90)],eax
0057ddbc 890d8c5bd900    mov     dword ptr [HRDLogbook!GS_ContextRecord+0xac (00d95b8c)],ecx
0057ddc2 8915885bd900    mov     dword ptr [HRDLogbook!GS_ContextRecord+0xa8 (00d95b88)],edx
0057ddc8 891d845bd900    mov     dword ptr [HRDLogbook!GS_ContextRecord+0xa4 (00d95b84)],ebx
0057ddce 8935805bd900    mov     dword ptr [HRDLogbook!GS_ContextRecord+0xa0 (00d95b80)],esi
0057ddd4 893d7c5bd900    mov     dword ptr [HRDLogbook!GS_ContextRecord+0x9c (00d95b7c)],edi
0057ddda 668c15a85bd900  mov     word ptr [HRDLogbook!GS_ContextRecord+0xc8 (00d95ba8)],ss


That's not useful, but it does get us one frame into the dump. I've copied and pasted the symbolic address into a spreadsheet built from the TSV file and attached it.

We repeat the process for the next stack entry in the TSV, at offset 0x17DD9C:

0:000> ? 0x00400000 + 0x17DD9C
Evaluate expression: 5758364 = 0057dd9c
0:000> u 0x57dd9c
HRDLogbook!__report_rangecheckfailure+0xa [f:\dd\vctools\crt\crtw32\misc\gs_report.c @ 498]:
0057dd9c 5d              pop     ebp
0057dd9d c3              ret
HRDLogbook!__report_securityfailure [f:\dd\vctools\crt\crtw32\misc\gs_report.c @ 398]:
0057dd9e 55              push    ebp
0057dd9f 8bec            mov     ebp,esp
0057dda1 81ec1c030000    sub     esp,31Ch
0057dda7 6a17            push    17h
0057dda9 e822270200      call    HRDLogbook!IsProcessorFeaturePresent (005a04d0)
0057ddae 85c0            test    eax,eax


This gives us the next frame, which is at the end of the report rangecheckfialure handler. That matches the bucket we saw: a range check has failed. Normally, a minidump would include the stack -- and also include all the parameters the range check failure carried; those would point straight to the problem. But we dont' get any of that information from these minimal TSV reports.

So, on to the next frame, at 0x4ED966:

0:000> ? 0x0400000 + 0x4ED966
Evaluate expression: 9361766 = 008ed966
0:000> u 0x008ed966
HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x3d6 [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadeqsl.cpp @ 290]:
008ed966 8b85b4efffff    mov     eax,dword ptr [ebp-104Ch]
008ed96c 3dc8000000      cmp     eax,0C8h
008ed971 741b            je      HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x3fe (008ed98e)
008ed973 85c0            test    eax,eax
008ed975 7417            je      HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x3fe (008ed98e)
008ed977 50              push    eax
008ed978 8d85dcefffff    lea     eax,[ebp-1024h]
008ed97e 685048c400      push    offset HRDLogbook!`string' (00c44850)


We're finally inside the logbook code. Note that the stack frame address given is the return address, whatver popped this exception is before the address we computed, and where we started disassembly. For a reason I couldn't detect, the "ub" command doesn't work here so I disassemble a range. I guess I could disassemble the whole EqslUploadSend function, but I don't need quite that much. By trial and error, I have this range disassembled. Instead of disassembling a range (with "u 008ed778 008ed966"), I repeatedly execute a single "u" command starting at 008ed800 to get the periodic symbolic addresses ... which include references to matching positions in the source code. The source references are valuable landmarks for interpreting the assembly code:

0:000> u
HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x261 [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadeqsl.cpp @ 259]:
008ed7f1 e88d6eb7ff      call    HRDLogbook!CHttpFile::QueryInfoStatusCode (00464683)
008ed7f6 68ff0f0000      push    0FFFh
008ed7fb 8d85edefffff    lea     eax,[ebp-1013h]
008ed801 c685ecefffff00  mov     byte ptr [ebp-1014h],0
008ed808 6a00            push    0
008ed80a 50              push    eax
008ed80b e8c066c9ff      call    HRDLogbook!memset (00583ed0)
008ed810 8b06            mov     eax,dword ptr [esi]
0:000> u
HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x282 [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadeqsl.cpp @ 265]:
008ed812 8d8decefffff    lea     ecx,[ebp-1014h]
008ed818 83c40c          add     esp,0Ch
008ed81b 6800100000      push    1000h
008ed820 51              push    ecx
008ed821 8bce            mov     ecx,esi
008ed823 ff5038          call    dword ptr [eax+38h]
008ed826 8d8d90efffff    lea     ecx,[ebp-1070h]
008ed82c 8bf0            mov     esi,eax
0:000> u
HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x29e [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadeqsl.cpp @ 266]:
008ed82e e8f6b5b5ff      call    HRDLogbook!CStringArray::CStringArray (00448e29)
008ed833 c78590efffff5480ca00 mov dword ptr [ebp-1070h],offset HRDLogbook!COwnStringArray::`vftable' (00ca8054)
008ed83d c645fc09        mov     byte ptr [ebp-4],9
008ed841 81fe00100000    cmp     esi,1000h
008ed847 0f8314010000    jae     HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x3d1 (008ed961)
008ed84d c68435ecefffff00 mov     byte ptr [ebp+esi-1014h],0


As the source references indicate, we're around line 259 of BackgroundProcessingThreadEQSL.cpp, which looks like this:

    pFile->QueryInfoStatusCode(dwReturn);	// line 259

    //
    //	Get any reply.
    //
    char            szBuff[4096] = "";
    UINT            nRead        = pFile->Read(szBuff, ARRAYSIZE(szBuff));
    COwnStringArray astrStatus;

    szBuff[nRead] = 0;

    while (nRead > 0)
    {


The call to QueryInfoStatusCode() is on line 259. The disassembly shows that call.

The next instructions in the disassembly initialize the szBuff array by filling it full of zeroes. One byte is stored directly, the remaining bytes are filled with a call to memset().

At 08ED810, the code sets up the call to pFile-Read() with the parameters it needs: the address of szBuff, and then the array size of 0x1000 (which is 4096).

The call is made, and the return value comes back in EAX, which is copied to ESI. At this point, the ESI register is effectively the nRead variable. The local CStringArray is constructed next.

Finally, at address 08ED841, the return value (ESI, symbolically nRead) is compared to the maximum allowable index of the array -- 0x1000. If above or equal, a branch is taken to 8ED961. That's the code that pitches the exception chain we found in the call stack:

0:000> u 008ED961
HRDLogbook!CBackgroundProcessingThread::EqslUploadSend+0x3d1 [c:\ham radio\logbook\hrdlogbook\backgroundprocessingthreadeqsl.cpp @ 284]:
008ed961 e82c04c9ff      call    HRDLogbook!__report_rangecheckfailure (0057dd92)


The bug here is that pFile->Read() is called, and the number of bytes read is taken back. The bytes read might be exactly equal to the size of the given array, so the code that sets 0 into szBuff[nRead] could walk past the end of the array. The disassembled code shows that the compiler has emitted code to detect this situation and pitch the execpetion, shutting it down and protecting the user from any security issue or data damage.

K7ZCZ

2018-07-19 14:21

manager  

Mantis2812Stack.xlsx (13,199 bytes)

K7ZCZ

2018-07-19 14:29

manager   ~0005745

Another bug is that the stream of bytes coming back from the web request is assumed to be broken at newlines. Looking at the loop, we can see a block of 4096 bytes is read and assumed to end at a newline. It might, or might not, end on a newline -- but this code forces it to do so.

At this point, the code has an off-by-one mistake in counting the number of bytes available in a buffer. We also have a problem with building a string out of the blocks retrieved into a cohesive response.

But here's what really makes me hit the ceiling: this code -- a loop that sends a web request, builds a response and chops it into a string array at each newline -- is copied and pasted. A sensible approach would be to build a function or class and re-use the code properly, in a modular fashion. Instead, what's happened here is that the cdoe was written and pasted around.

The two bugs I've identified (are there more?) exist in nine different places:


  • BackgroundProcessingThreadEqsl.cpp around line 264

  • BackgroundprocessingThreadQRZ.cpp around line 350

  • BackgroundProcessingThreadWebLog.cpp around line 710

  • EqslUpload.cpp around line 618

  • PrefsLOTW.cpp around line 1904

  • PrefsLOTW.cpp around line 2193

  • PrefsLOTW.cpp around line 2234

  • PrefsLOTWDownload.cpp around line 1089

  • qrzupload.cpp around line 756



A correct fix aggressively refactors the copied-and-pasted code into a function (or class), eliminates the identified bugs just once, and re-writes the nine calling locations listed above to make use of the new function/class.

K7ZCZ

2018-07-19 19:39

manager   ~0005749

Partially fixed with this checkin -- just the overrun, not the refactoring.
https://hrdsoftware.visualstudio.com/HRD/_versionControl/changeset/4243

K7ZCZ

2018-08-11 11:58

manager   ~0005964

The WebLog.XLSX file shows the call stack in 846 of this problem occuring when doing an upload to the WebLog site. Same bug, already fixed with Change 4243 above.

(But still not the complete fix with refactoring ...)

HRDMantis2812Stack WebLog.xlsx (9,766 bytes)

K7ZCZ

2018-12-01 14:51

manager   ~0006508

I'm no longer seeing this issue in the Microsoft Dashboard.

WA9PIE

2018-12-21 00:56

administrator   ~0006760

I'll accept this as complete.

(If there's something we need to do to register with Microsoft to get access to the dumps, I'm open to that.)

Issue History

Date Modified Username Field Change
2018-07-19 13:44 K7ZCZ New Issue
2018-07-19 13:44 K7ZCZ File Added: LogbookRangeCheck.tsv
2018-07-19 14:21 K7ZCZ Note Added: 0005744
2018-07-19 14:21 K7ZCZ File Added: Mantis2812Stack.xlsx
2018-07-19 14:29 K7ZCZ Note Added: 0005745
2018-07-19 18:11 K7ZCZ Description Updated View Revisions
2018-07-19 18:11 K7ZCZ Steps to Reproduce Updated View Revisions
2018-07-19 18:11 K7ZCZ Additional Information Updated View Revisions
2018-07-19 19:39 K7ZCZ Note Added: 0005749
2018-08-11 11:58 K7ZCZ File Added: HRDMantis2812Stack WebLog.xlsx
2018-08-11 11:58 K7ZCZ Note Added: 0005964
2018-12-01 14:51 K7ZCZ Assigned To => K7ZCZ
2018-12-01 14:51 K7ZCZ Status new => resolved
2018-12-01 14:51 K7ZCZ Resolution open => fixed
2018-12-01 14:51 K7ZCZ Note Added: 0006508
2018-12-09 15:44 K7ZCZ Fixed in Version => 6.5.0.132
2018-12-21 00:56 WA9PIE Status resolved => closed
2018-12-21 00:56 WA9PIE Additional Information Updated View Revisions
2018-12-21 00:56 WA9PIE Testing Not Started => Not Tested
2018-12-21 00:56 WA9PIE Note Added: 0006760
2019-01-16 22:04 WA9PIE Fixed in Version 6.5.0.132 => 6.5.0.183
2019-01-16 22:04 WA9PIE Project 3 - Current Dev List => Ham Radio Deluxe