Recall from blog/2006-07-11 and TSRT-06-02 that any code relying on the implicit message size limitation of second-class mailslots could be exposing a vulnerability. I mentioned at the time that the rarity of Mailslot usage would severely mitigate the impact of this new "class" of vulnerability, and the fact that no Mailslot bugs have emerged since the initial disclosure is evidence that the assumption was true.
casdscsvc.exe -> Asbrdcst.dll

20C14E8C push    0                   ; lpSecurityAttributes
20C14E8E push    0                   ; lReadTimeout
20C14E90 push    0                   ; nMaxMessageSize
20C14E92 push    offset Name         ; "\\\\.\\mailslot\\CheyenneDS"
20C14E97 stosb
20C14E98 call    ds:CreateMailslotA
20C14E9E cmp     eax, INVALID_HANDLE_VALUE
20C14EA1 mov     mailslot_handle, eax
Later, the mailslot handle is read from into a 4k buffer. The read data is then passed to a routine which calls vsprintf into a 1k stack buffer.
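To make the data flow concrete, here is a minimal Python model of the bug (my own illustration, not code from the advisory; the buffer sizes and format string come from the disassembly): up to 4096 bytes arrive from the mailslot, then get formatted through "ReadMailSlot: %s" into a 1024-byte stack buffer with no bounds check.

```python
# Hypothetical model of the vulnerable data flow, for illustration only.
# Sizes taken from the disassembly: a 4k ReadFile buffer and a 1k
# vsprintf destination buffer.

READ_BUF_SIZE = 4096          # ReadFile destination on the stack
FMT_BUF_SIZE  = 1024          # vsprintf destination on the stack
FMT_PREFIX    = "ReadMailSlot: "

def bytes_written(message):
    """Bytes vsprintf("ReadMailSlot: %s", message) writes, incl. the NUL."""
    return len(FMT_PREFIX) + len(message) + 1

def overflows(message):
    """True when the formatted string no longer fits in the 1k buffer."""
    return bytes_written(message) > FMT_BUF_SIZE

# A maximum-size mailslot message overruns the 1k buffer by roughly 3k:
print(overflows(b"A" * READ_BUF_SIZE))   # True
print(overflows(b"A" * 256))             # False -- short messages fit
```

Since the mailslot was created with nMaxMessageSize = 0 (no limit), nothing stops a remote writer from delivering a message large enough to trip this condition.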
casdscsvc.exe -> Asbrdcst.dll

20C15024 mov     eax, mailslot_handle
20C15029 lea     edx, [esp+1044h+Buffer_4k]
20C1502D push    ecx                 ; nNumberOfBytesToRead
20C1502E push    edx                 ; lpBuffer
20C1502F push    eax                 ; hFile
20C15030 call    edi                 ; ReadFile
20C15032 test    eax, eax
20C15034 jz      short read_failed
20C15036 lea     ecx, [esp+3Dh]
20C1503A push    ecx                 ; char
20C1503B push    offset str_ReadmailslotS   ; "ReadMailSlot: %s\n"
20C15040 call    not_interesting_call_to_vsnprtinf
20C15045 add     esp, 8
20C15048 lea     edx, [esp+3Dh]
20C1504C push    edx                 ; va_list
20C1504D push    offset str_ReadmailslotS_0 ; "ReadMailSlot: %s"
20C15052 push    0                   ; for_debug_log
20C15054 call    vsprintf_into_1024_stack_buf_and_debug_log
One would imagine that at least one other instance of a Mailslot handling bug must exist elsewhere. Anyone?
Just wanted to drop a quick note regarding eEye's recently released bin-diffing tool.
For those of you who are interested, Ero and I are giving a 2-day Reverse Engineering on Windows course at Blackhat in Vegas. While the class will have a malware-centric focus, the main purpose of the course is for students to glean general reversing knowledge and techniques. In the process, PaiMei will certainly be covered, explored and experimented with. With that shameless self-plug said and done, on to the main reason behind this posting.
hooks = utils.hook_container()

Next, resolve and add hooks for your target functions. In our case, we will need to hook RtlAllocateHeap, RtlFreeHeap and RtlReAllocateHeap, all located within NTDLL.DLL:
a = dbg.func_resolve("ntdll", "RtlAllocateHeap")
f = dbg.func_resolve("ntdll", "RtlFreeHeap")
r = dbg.func_resolve("ntdll", "RtlReAllocateHeap")

hooks.add(dbg, a, 3, None, RtlAllocateHeap)
hooks.add(dbg, f, 3, None, RtlFreeHeap)
hooks.add(dbg, r, 4, None, RtlReAllocateHeap)

The first argument to the hook container's add() routine is an instance of PyDbg, the second is the address of the API to hook, followed by the number of arguments the API supports, a callback function for when the API is entered and finally a callback function for when the API exits. The entry-point callback provides you with the argument list, allowing you to instrument the arguments prior to passing control back to the API. The exit-point callback provides you with the argument list and return value, allowing you to instrument the return value prior to passing control back to the caller.
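For reference, the exit-point callbacks themselves are ordinary functions receiving the PyDbg instance, the captured argument list and the return value. Here is a sketch of what they might look like (the bookkeeping here is my own simplified illustration, not the code from heap_trace.py), maintaining a dict of live allocations from the three events:

```python
# Sketch of exit-point hook callbacks (simplified illustration, not the
# actual heap_trace.py code). Exit callbacks receive the PyDbg instance,
# the captured argument list and the API's return value.

live_allocs = {}    # buffer address -> requested size

def RtlAllocateHeap(dbg, args, ret):
    # args = (HeapHandle, Flags, Size); ret is the new buffer address.
    if ret:
        live_allocs[ret] = args[2]

def RtlFreeHeap(dbg, args, ret):
    # args = (HeapHandle, Flags, BaseAddress); forget the freed buffer.
    live_allocs.pop(args[2], None)

def RtlReAllocateHeap(dbg, args, ret):
    # args = (HeapHandle, Flags, BaseAddress, Size); the buffer may move,
    # so drop the old address and record the new one.
    live_allocs.pop(args[2], None)
    if ret:
        live_allocs[ret] = args[3]
```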
With that out of the way, it's fairly trivial to generate and dynamically maintain a pgraph structure as well as display the results in real time through uDraw. Whenever RtlAllocateHeap is called, we'll create an orange node containing the address of the calling instruction, a blue node containing the allocation size, and connect the two nodes together. This is sufficient for a demo, but as we are hooking the lowest user-mode heap manipulation routines, the calling instruction address will likely lie within a Windows DLL and is not all that interesting. To improve this we could examine dbg.stack_unwind() and utilize the first address that lies within a non-Microsoft DLL. Whenever RtlFreeHeap is called, we will examine the arguments and remove the buffer address from the graph. Finally, whenever RtlReAllocateHeap is called, we'll resize the target buffer and paint the node yellow. We can then easily tie it all to uDraw through the udraw_connector. All said and done, here is a flash excerpt from the code in action:
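The graph bookkeeping just described can be sketched like so (a simplified stand-in using plain dicts and sets; the real code uses pgraph node/edge classes and pushes updates over the udraw_connector):

```python
# Simplified stand-in for the pgraph bookkeeping described above. Colors
# match the demo: orange caller nodes, blue allocation nodes, yellow
# after a realloc.

nodes = {}      # node id -> {"label": ..., "color": ...}
edges = set()   # (caller_node, buffer_node) pairs

def on_alloc(caller, address, size):
    # Orange node for the calling instruction, blue node for the buffer.
    nodes.setdefault(caller, {"label": "0x%08x" % caller, "color": "orange"})
    nodes[address] = {"label": "%d bytes" % size, "color": "blue"}
    edges.add((caller, address))

def on_free(address):
    # Drop the buffer node and any edges pointing at it.
    nodes.pop(address, None)
    for edge in [e for e in edges if e[1] == address]:
        edges.discard(edge)

def on_realloc(old_address, new_address, size):
    # The buffer may move: re-home its edges, resize it, paint it yellow.
    for caller, buf in [e for e in edges if e[1] == old_address]:
        edges.discard((caller, buf))
        edges.add((caller, new_address))
    nodes.pop(old_address, None)
    nodes[new_address] = {"label": "%d bytes" % size, "color": "yellow"}
```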
You can grab the code behind this application from heap_trace.py. As an experiment, I tossed in some to-disk rendering that kicks in once the graph node count reaches 1000.
It's all pretty simple. One of the nice things about this class is that it (I think / hope) transparently takes care of the various thread-related race conditions that make pairing arguments with return values trickier than it sounds.
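The pairing problem, in a nutshell: the entry breakpoint fires on one thread while another thread may be mid-call in the same API, so a single global "last seen arguments" slot would mismatch them. One way a hook container can cope (my own illustration, not PaiMei's actual internals) is to key saved argument lists by thread id:

```python
# Illustration of per-thread argument/return pairing (not PaiMei's actual
# internals). Each thread gets its own stack of saved argument lists, so
# concurrent or recursive calls into the hooked API pair up correctly.

from collections import defaultdict

pending = defaultdict(list)   # thread id -> stack of saved arg lists

def on_entry(tid, args):
    # Entry breakpoint: snapshot the arguments under this thread's id.
    pending[tid].append(args)

def on_exit(tid, ret):
    # Exit breakpoint: pair the return value with this same thread's
    # most recent entry snapshot.
    args = pending[tid].pop()
    return (args, ret)
```

With this scheme, an interleaving like entry(thread 1), entry(thread 2), exit(thread 2), exit(thread 1) still pairs each return value with the arguments from its own thread.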