TDengine R&D Sharing: Practices and Experience in Solving Memory Leak Problems with WinDbg

Mondo Technology Updated on 2024-02-28

Memory leaks are a common problem: a program's memory footprint gradually increases, eventually leading to system resource exhaustion or a crash. AddressSanitizer (ASAN) and Valgrind are good memory error detection tools, and TDengine's CI process uses ASAN. This time, however, the memory leak occurred on Windows, which our CI does not yet cover, so the TDengine R&D team chose the WinDbg toolset to track it down. It turned out that on Windows this is also a good option.

Memory leaks typically occur in the following situations:

The program did not properly free the allocated memory.

There are circular references in the program that prevent the garbage collector from reclaiming memory.

Third-party libraries or components used by the program contain memory leaks.

The main detection methods for memory leaks are as follows:

Static analysis tools: these detect issues such as unreleased pointers or memory allocation errors in the source code, but they cannot observe how memory is dynamically allocated while the program is running.

Dynamic analysis tools: these track the memory allocation and release operations of a running program and detect whether memory has leaked. However, some tools, such as Valgrind, can have a noticeable impact on the program's performance.

Debuggers: WinDbg and GDB.

Pros & Cons:

Static analysis tools can catch problems early, but they can't detect how memory is dynamically allocated while the program is running.

Dynamic analysis tools can detect problems while the program is running, but they can slow the program down and can require significant time and resources when instrumenting large applications. In a well-resourced test environment this is not a problem; ASAN, for example, has helped us find many issues.

The debugger can detect problems while the program is running and provides powerful profiling tools.

To locate memory leaks with the WinDbg toolset, we rely on the GFlags component to enable recording of the memory the program allocates and frees at run time, together with the call stack at each allocation. While the program is running, the UMDH component takes two snapshots; comparing them shows the memory that was allocated but not freed in the interval between the two snapshots. If there is a memory leak, the first entry in the diff result is usually the call stack of the leak point. The leak should be triggered as many times as possible between the two snapshots so that it stands out more clearly. The diff result will also contain a few call stacks for memory that the application simply has not yet had time to free, but because the diff shows the allocation count for each stack, these are easy to tell apart.
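The comparison between two snapshots can be pictured with a small C sketch. It is only a simplified model, not UMDH's file format or implementation: the types SnapshotEntry and DiffEntry and the function diff_snapshots are hypothetical names introduced here, and each snapshot is assumed to be a plain array of (call stack ID, outstanding bytes) pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified snapshot row: bytes still allocated per call stack. */
typedef struct {
    unsigned long trace_id;   /* identifies one allocation call stack */
    long long bytes;          /* bytes outstanding when the snapshot was taken */
} SnapshotEntry;

/* Growth of one call stack between the two snapshots. */
typedef struct {
    unsigned long trace_id;
    long long delta;          /* bytes in second snapshot minus bytes in first */
} DiffEntry;

/* Looks up a call stack in a snapshot; a missing stack counts as 0 bytes. */
static long long bytes_for(const SnapshotEntry *snap, size_t n, unsigned long id) {
    for (size_t i = 0; i < n; i++) {
        if (snap[i].trace_id == id) return snap[i].bytes;
    }
    return 0;
}

/* qsort comparator: largest growth first. */
static int by_delta_desc(const void *a, const void *b) {
    long long da = ((const DiffEntry *)a)->delta;
    long long db = ((const DiffEntry *)b)->delta;
    return (da < db) - (da > db);
}

/* Fills out[] (capacity >= n_second) with per-stack growth, sorted so that
 * the most likely leak comes first. Returns the number of entries written. */
size_t diff_snapshots(const SnapshotEntry *first, size_t n_first,
                      const SnapshotEntry *second, size_t n_second,
                      DiffEntry *out) {
    assert(second != NULL && out != NULL);
    for (size_t i = 0; i < n_second; i++) {
        out[i].trace_id = second[i].trace_id;
        out[i].delta = second[i].bytes
                     - bytes_for(first, n_first, second[i].trace_id);
    }
    qsort(out, n_second, sizeof out[0], by_delta_desc);
    return n_second;
}
```

In this model, a call stack that leaks on every request grows far more than the others between the two snapshots, so it sorts to the top, which mirrors why the first entry of the real diff is usually the leak point.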

taosdump reports an error when importing data on Windows:

Build and install the latest tdengine 3.0 branch on Windows.

Use "taosbenchmark -i stmt -y" to create a lot of tables and data (10000 * 10000).

Use "taosdump -d test -o outputfile" to dump the data out.

Use "taos -s 'drop database test'" to drop the database.

Use "taosdump -i inputfile" to dump the data back in.

Error log: taosd "tsem init failed, errno: 28".

taosdump: dumpin**rodataimpl() ln7039 taos_stmt_execute() failed! reason: out of memory, timestamp: 1500000009256

The GFlags tool should be located at C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\gflags.exe. If it is not there, you can install it from Microsoft's official site.

After the installation is complete, running gflags.exe /i your_application.exe on the command line sets the tracking target and the relevant parameters. You can also double-click gflags.exe to run it: the Image File tab corresponds to the /i parameter, so enter your_application.exe there and then select the configuration you need.

1. Enable user-mode stack trace collection (+ust) for your application's .exe before launching it (I am debugging taosdump.exe, so the commands below use taosdump.exe).

"c:\program files (x86)\windows kits\10\debuggers\x64\gflags" -i taosdump.exe +ust

2. Copy the PDB file to the mysymbols directory. The PDB file stores the debugging information of the compiled program; it is generated together with the executable and can be found in the application's build output directory.

3. Set the PDB (symbol) directory.

set _nt_symbol_path=c:\mysymbols;srv*c:\mycache*

4. Take the first memory snapshot.

"c:\program files (x86)\windows kits\10\debuggers\x64\umdh" -pn:taosdump.exe -f:c:\xstest\umdhlog\taosdump11.log

5. Take a second memory snapshot.

"c:\program files (x86)\windows kits\10\debuggers\x64\umdh" -pn:taosdump.exe -f:c:\xstest\umdhlog\taosdump12.log

6. Generate the snapshot comparison result (UMDH).

"c:\program files (x86)\windows kits\10\debuggers\x64\umdh" c:\xstest\umdhlog\taosdump11.log c:\xstest\umdhlog\taosdump12.log -f:c:\xstest\umdhlog\taosdumpdiff11_12.log
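
Because taosdump is busy with its import work from start to exit, the leak is triggered continuously and readily shows up between the two snapshots. In the first line of the diff result, "988040 - 6ecf0" is the number of bytes outstanding for this call stack in the second snapshot minus the number in the first (all values are hexadecimal), a growth of 0x919350 bytes. The second line shows the number of outstanding allocations growing from 0x1754 to 0x201b0. A memory leak has clearly occurred, and the leak point is the sem_init call in the buildrequest function.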

+ 919350 ( 988040 - 6ecf0) 201b0 allocs backtrace9cb6973f
+ 1ea5c ( 201b0 - 1754) backtrace9cb6973f allocations

ntdll!rtlpallocateheapinternal+948d5
taos!heap_alloc_dbg_internal+1f6 (minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp, 359)
taos!heap_alloc_dbg+4d (minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp, 450)
taos!_calloc_dbg+6c (minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp, 518)
taos!calloc+2e (minkernel\crts\ucrt\src\appcrt\heap\calloc.cpp, 30)
taos!sem_init+5d (c:\workroom\tdengine\contrib\pthread\sem_init.c, 109)
taos!buildrequest+209 (c:\workroom\tdengine\source\client\src\clientimpl.c, 192)
taos!stmtcreaterequest+73 (c:\workroom\tdengine\source\client\src\clientstmt.c, 15)
taos!stmtsettbname+115 (c:\workroom\tdengine\source\client\src\clientstmt.c, 588)
taos!taos_stmt_set_tbname+7f (c:\workroom\tdengine\source\client\src\clientmain.c, 1350)
taosdump!dumpin**rodataimpl+e25 (c:\workroom\tdengine\tools\taos-tools\src\taosdump.c, 6260)
taosdump!dumpinone**rofile+3d2 (c:\workroom\tdengine\tools\taos-tools\src\taosdump.c, 7229)
taosdump!dumpin**roworkthreadfp+20b (c:\workroom\tdengine\tools\taos-tools\src\taosdump.c, 7306)
taosdump!ptw32_threadstart+cd (c:\workroom\tdengine\contrib\pthread\ptw32_threadstart.c, 233)
taosdump!thread_start+9c (minkernel\crts\ucrt\src\appcrt\startup\thread.cpp, 97)
kernel32!basethreadinitthunk+10
ntdll!rtluserthreadstart+2b

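The arithmetic in the two header lines of the diff entry can be checked mechanically. The following is a minimal sketch that assumes the header lines have the shape seen in the log above ("+ DELTA ( NEW - OLD) ..." with hexadecimal fields); UmdhDelta and parse_umdh_delta are hypothetical names introduced for illustration and are not part of UMDH.

```c
#include <assert.h>
#include <stdio.h>

/* The three hexadecimal fields of a "+ DELTA ( NEW - OLD) ..." header line. */
typedef struct {
    unsigned long long delta;      /* growth between the two snapshots */
    unsigned long long new_value;  /* value in the second snapshot */
    unsigned long long old_value;  /* value in the first snapshot */
} UmdhDelta;

/* Parses a growth header line. Returns 1 if the line matches the assumed
 * shape and DELTA equals NEW - OLD, otherwise 0. Only lines starting with
 * '+' are handled in this sketch. */
int parse_umdh_delta(const char *line, UmdhDelta *out) {
    assert(line != NULL && out != NULL);
    int fields = sscanf(line, " + %llx ( %llx - %llx",
                        &out->delta, &out->new_value, &out->old_value);
    if (fields != 3) return 0;
    return out->delta == out->new_value - out->old_value;
}
```

Feeding it the first header line from the log yields delta 0x919350, new_value 0x988040 and old_value 0x6ecf0; the second header line yields 0x1ea5c, 0x201b0 and 0x1754. Stack frame lines are rejected because they do not start with '+'.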
Next, review the code and fix it. C gives the programmer a great deal of freedom in how memory is used, which also makes it more error-prone. In this case, some code paths missed the call to tsem_destroy.
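One common way to keep every path paired with its destroy call in C is a single cleanup label. The sketch below is an illustration of that pattern only, not TDengine's actual fix: demo_sem_t, demo_sem_init, demo_sem_destroy, demo_build_request and the live-object counter are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a semaphore API; it only counts live objects. */
static int g_live_sems = 0;

typedef struct { int *state; } demo_sem_t;

static int demo_sem_init(demo_sem_t *s) {
    s->state = calloc(1, sizeof *s->state);   /* heap allocation, as in sem_init */
    if (s->state == NULL) return -1;
    g_live_sems++;
    return 0;
}

static void demo_sem_destroy(demo_sem_t *s) {
    free(s->state);
    s->state = NULL;
    g_live_sems--;
}

typedef struct { demo_sem_t sem; int id; } demo_request_t;

/* Builds a request. Every failure path jumps to the same cleanup label,
 * so the semaphore is destroyed no matter where the function bails out. */
int demo_build_request(int id, demo_request_t **out) {
    assert(out != NULL);
    int sem_ready = 0;
    demo_request_t *req = calloc(1, sizeof *req);
    if (req == NULL) goto cleanup;
    if (demo_sem_init(&req->sem) != 0) goto cleanup;
    sem_ready = 1;
    if (id < 0) goto cleanup;                 /* simulated later failure */
    req->id = id;
    *out = req;
    return 0;

cleanup:
    if (sem_ready) demo_sem_destroy(&req->sem);
    free(req);
    return -1;
}

/* Releases a successfully built request, including its semaphore. */
void demo_free_request(demo_request_t *req) {
    if (req == NULL) return;
    demo_sem_destroy(&req->sem);
    free(req);
}

/* Number of semaphores initialized but not yet destroyed. */
int demo_live_sems(void) { return g_live_sems; }
```

Calling demo_build_request with a negative id exercises the failure path after the semaphore has been initialized; demo_live_sems() still reports 0 afterwards, which is the property the leaking paths lacked.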


As the saying goes, to do a good job you must first sharpen your tools. Mastering more tools and techniques lets you solve problems more calmly. Locating memory leaks with the WinDbg toolset is simple but very effective. Note, however, that it relies on the PDB file, which contains the program's symbol information and helps us pinpoint the problem during debugging, so remember to keep the PDB file when you release your application.

In addition, the problematic code shows that managing memory this way is still rather error-prone. The RAII mechanism helps avoid resource leaks, and C can achieve a similar effect by simulating RAII, although not as smoothly as C++. This may be worth considering as a future optimization.

RAII (Resource Acquisition Is Initialization) is an important resource management technique that ties the acquisition of a resource to the lifetime of an object. By acquiring resources in an object's constructor and releasing them in its destructor, we ensure that resources are managed properly and prevent problems such as resource leaks and memory leaks. RAII is widely used in languages such as C++ and is an effective way to manage resources.
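C has no constructors or destructors, but one portable way to approximate RAII is a wrapper function that acquires the resource, runs a caller-supplied body, and always releases the resource afterwards. The following is a minimal sketch under that assumption; scoped_res_t, with_scoped_res, store_if_positive and the counter are hypothetical names, not TDengine code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource; the counter lets callers check that nothing leaked. */
typedef struct { int *slots; } scoped_res_t;
static int g_open_resources = 0;

static int scoped_res_acquire(scoped_res_t *r) {
    r->slots = calloc(4, sizeof *r->slots);
    if (r->slots == NULL) return -1;
    g_open_resources++;
    return 0;
}

static void scoped_res_release(scoped_res_t *r) {
    free(r->slots);
    r->slots = NULL;
    g_open_resources--;
}

/* Signature of the code that runs while the resource is held. */
typedef int (*scoped_body_fn)(scoped_res_t *res, void *ctx);

/* Acquire, run the body, always release. The release plays the role of a
 * destructor: the body may return from any path without doing cleanup. */
int with_scoped_res(scoped_body_fn body, void *ctx) {
    assert(body != NULL);
    scoped_res_t res;
    if (scoped_res_acquire(&res) != 0) return -1;
    int rc = body(&res, ctx);
    scoped_res_release(&res);
    return rc;
}

/* Example body with an early-return error path. */
int store_if_positive(scoped_res_t *res, void *ctx) {
    int value = *(int *)ctx;
    if (value <= 0) return -1;    /* early return, no cleanup needed here */
    res->slots[0] = value;
    return res->slots[0];
}

/* Number of resources acquired but not yet released. */
int open_resources(void) { return g_open_resources; }
```

Because the release lives in exactly one place, adding a new early return to a body cannot introduce a leak. The cost is that the body must be written as a separate function and receive its inputs through a context pointer, which is part of why this is less smooth than C++ destructors.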
