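Recently we helped analyze a memory leak in a .NET process. Here is how the investigation went.
Symptom: the customer's server-side .NET process showed minute-scale CPU spikes, climbing to nearly 100% and then dropping back.
Figure 1
Analysis: a support engineer captured a full dump of the process with procdump.exe, configured to trigger once the process's CPU usage stayed above 80% for 1 second: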
procdump.exe -ma -s 1 -c 80 10672 f:\aliyun
Loading Dump File [E:\temp\201127\GameServer.exe_201119_171245.dmp\GameServer.exe_201119_171245.dmp]
User Mini Dump File with Full Memory: Only application data is available
Comment:
*** e:\soft\procdump\procdump.exe -ma -s 1 -c 80 10672 f:\aliyun
*** Process exceeded 80% CPU (system scale) for 1 second. Value: 88%. Hottest Thread: 4196 (0x1064).
************* Path validation summary **************
Response Time (ms) Location
Deferred srv*F:\symbols*https://msdl.microsoft.com/download/symbols
Symbol search path is: srv*F:\symbols*https://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 8.1 Version 9600 MP (32 procs) Free x64
Product: Server, suite: TerminalServer DataCenter SingleUserTS
6.3.9600.18217 (winblue_ltsb.160124-0053)
Machine Name:
Debug session time: Thu Nov 19 17:12:45.000 2020 (UTC + 8:00)
System Uptime: 38 days 3:36:48.460
Process Uptime: 0 days 0:33:22.000
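The trigger recorded in the dump comment (-c 80 -s 1) can be modeled roughly as follows: take one CPU sample per second and fire once the value has stayed above the threshold for the requested number of consecutive seconds. This is a simplified sketch under that assumption, not ProcDump's actual implementation; the function name first_trigger and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def first_trigger(samples: Sequence[float], threshold: float = 80.0,
                  seconds: int = 1) -> Optional[int]:
    """Return the index of the sample at which a CPU trigger would fire.

    Simplified model (an assumption, not ProcDump's code): one sample per
    second, and the trigger fires once `seconds` consecutive samples are
    strictly above `threshold`. Returns None if it never fires.
    """
    run = 0  # length of the current streak of samples above the threshold
    for i, cpu in enumerate(samples):
        run = run + 1 if cpu > threshold else 0
        if run >= seconds:
            return i
    return None


# Hypothetical per-second samples of the process's CPU usage (%).
samples = [35.0, 60.0, 88.0, 40.0]
print(first_trigger(samples))             # 2: one sample above 80% is enough
print(first_trigger(samples, seconds=2))  # None: no 2-second streak
```

Under this model, a larger -s value filters out short bursts, while -s 1 fires on the first sample above the threshold, which fits the "Value: 88%" recorded in the comment.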
At the moment the dump was captured, the sampled system CPU load was as high as 91%:
0:059> .loadby sos clr
0:059> !threadpool
CPU utilization: 91%
Worker Thread: Total: 57 Running: 3 Idle: 49 MaxLimit: 32767 MinLimit: 32
Work Request in Queue: 0
--------------------------------------
Number of Timers: 1
--------------------------------------
Completion Port Thread:Total: 89 Free: 88 MaxFree: 64 CurrentLimit: 89 MaxLimit: 1000 MinLimit: 65
Looking at the moment the dump was taken, only a handful of threads were using CPU:
1) Thread 37
0:059> ~37s
mscorlib_ni!System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef)$##600184B+0x86:
00007ffb`c5923d76 48894de0 mov qword ptr [rbp-20h],rcx ss:000000e2`16b0ee20=000000e3cde2fc10
0:037> kL
# Child-SP RetAddr Call Site
00 000000e2`16b0edf0 00007ffb`c5923cc2 mscorlib_ni!System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef)$##600184B+0x86
01 000000e2`16b0ee50 00007ffb`c5923aa7 mscorlib_ni!System.IO.FileStream.WriteCore(Byte[], Int32, Int32)$##600183D+0x62
02 000000e2`16b0eec0 00007ffb`c5923a34 mscorlib_ni!System.IO.FileStream.FlushInternalBuffer()$##600182F+0x57
03 000000e2`16b0ef00 00007ffb`c58d4f4c mscorlib_ni!System.IO.FileStream.Flush(Boolean)$##600182E+0x24
04 000000e2`16b0ef40 00007ffb`69ccac7b mscorlib_ni!System.IO.StreamWriter.Flush(Boolean, Boolean)$##60019BE+0x8c
05 000000e2`16b0efa0 00007ffb`69ccaaae 0x00007ffb`69ccac7b
06 000000e2`16b0efe0 00007ffb`69ca103e 0x00007ffb`69ccaaae
07 000000e2`16b0f030 00007ffb`c58eca72 0x00007ffb`69ca103e
08 000000e2`16b0f070 00007ffb`c58ec904 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A95+0x162
09 000000e2`16b0f140 00007ffb`c58ec8c2 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A94+0x14
0a 000000e2`16b0f170 00007ffb`c5926472 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)$##6003A93+0x52
0b 000000e2`16b0f1c0 00007ffb`c8fd6793 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()$##6003B8E+0x52
0c 000000e2`16b0f200 00007ffb`c8fd6665 clr!CallDescrWorkerInternal+0x83
0d 000000e2`16b0f240 00007ffb`c8fd736d clr!CallDescrWorkerWithHandler+0x4e
0e 000000e2`16b0f280 00007ffb`c90bbf59 clr!MethodDescCallSite::CallTargetWorker+0xf8
0f 000000e2`16b0f380 00007ffb`c8fd7ce5 clr!ThreadNative::KickOffThread_Worker+0x109
10 000000e2`16b0f5e0 00007ffb`c8fd7c60 clr!Frame::Push+0x59
11 000000e2`16b0f620 00007ffb`c8fd7b9e clr!FillInRegTypeMap+0x198
12 000000e2`16b0f720 00007ffb`c8fd7d1f clr!FillInRegTypeMap+0xc1
13 000000e2`16b0f7b0 00007ffb`c90bbe3b clr!FillInRegTypeMap+0x47
14 000000e2`16b0f810 00007ffb`c919159f clr!ThreadNative::KickOffThread+0xdb
15 000000e2`16b0f8e0 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86
16 000000e2`16b0fa20 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22
17 000000e2`16b0fa50 00000000`00000000 ntdll!RtlUserThreadStart+0x34
0:037> ub rip
mscorlib_ni!System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef)$##600184B+0x6c:
00007ffb`c5923d5c c9 leave
00007ffb`c5923d5d 4903d1 add rdx,r9
00007ffb`c5923d60 4533c9 xor r9d,r9d
00007ffb`c5923d63 4c894c2420 mov qword ptr [rsp+20h],r9
00007ffb`c5923d68 4c8d4de8 lea r9,[rbp-18h]
00007ffb`c5923d6c 448bc0 mov r8d,eax
00007ffb`c5923d6f e8e40bf2ff call mscorlib_ni!System.Runtime.Remoting.Activation.ActivationServices.GetActivator()$##6005B45 (mscorlib_ni+0x434958) (00007ffb`c5844958)
00007ffb`c5923d74 33c9 xor ecx,ecx
2) Thread 39
0:042> ~39s
mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x1df:
00007ffb`c5963cef f7c280ff80ff test edx,0FF80FF80h
0:039> kL
# Child-SP RetAddr Call Site
00 000000e2`16d0ee40 00007ffb`c58d50be mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x1df
01 000000e2`16d0eed0 00007ffb`c58d4f17 mscorlib_ni!System.Text.EncoderNLS.GetBytes(Char[], Int32, Int32, Byte[], Int32, Boolean)$##6006608+0x11e
02 000000e2`16d0ef60 00007ffb`69ccac7b mscorlib_ni!System.IO.StreamWriter.Flush(Boolean, Boolean)$##60019BE+0x57
03 000000e2`16d0efc0 00007ffb`69ccaaae 0x00007ffb`69ccac7b
04 000000e2`16d0f000 00007ffb`69ca103e 0x00007ffb`69ccaaae
05 000000e2`16d0f050 00007ffb`c58eca72 0x00007ffb`69ca103e
06 000000e2`16d0f090 00007ffb`c58ec904 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A95+0x162
07 000000e2`16d0f160 00007ffb`c58ec8c2 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)$##6003A94+0x14
08 000000e2`16d0f190 00007ffb`c5926472 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)$##6003A93+0x52
09 000000e2`16d0f1e0 00007ffb`c8fd6793 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart()$##6003B8E+0x52
0a 000000e2`16d0f220 00007ffb`c8fd6665 clr!CallDescrWorkerInternal+0x83
0b 000000e2`16d0f260 00007ffb`c8fd736d clr!CallDescrWorkerWithHandler+0x4e
0c 000000e2`16d0f2a0 00007ffb`c90bbf59 clr!MethodDescCallSite::CallTargetWorker+0xf8
0d 000000e2`16d0f3a0 00007ffb`c8fd7ce5 clr!ThreadNative::KickOffThread_Worker+0x109
0e 000000e2`16d0f600 00007ffb`c8fd7c60 clr!Frame::Push+0x59
0f 000000e2`16d0f640 00007ffb`c8fd7b9e clr!FillInRegTypeMap+0x198
10 000000e2`16d0f740 00007ffb`c8fd7d1f clr!FillInRegTypeMap+0xc1
11 000000e2`16d0f7d0 00007ffb`c90bbe3b clr!FillInRegTypeMap+0x47
12 000000e2`16d0f830 00007ffb`c919159f clr!ThreadNative::KickOffThread+0xdb
13 000000e2`16d0f900 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86
14 000000e2`16d0fb40 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22
15 000000e2`16d0fb70 00000000`00000000 ntdll!RtlUserThreadStart+0x34
0:039> ub rip
mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x1c0:
00007ffb`c5963cd0 4889442438 mov qword ptr [rsp+38h],rax
00007ffb`c5963cd5 e979050000 jmp mscorlib_ni!System.Text.UTF8Encoding.GetBytes(Char*, Int32, Byte*, Int32, System.Text.EncoderNLS)$##600675A+0x743 (00007ffb`c5964253)
00007ffb`c5963cda 488b4c2440 mov rcx,qword ptr [rsp+40h]
00007ffb`c5963cdf 448b19 mov r11d,dword ptr [rcx]
00007ffb`c5963ce2 488b4c2440 mov rcx,qword ptr [rsp+40h]
00007ffb`c5963ce7 8b4904 mov ecx,dword ptr [rcx+4]
00007ffb`c5963cea 418bd3 mov edx,r11d
00007ffb`c5963ced 0bd1 or edx,ecx
3) Thread 52
0:039> ~52s
clr!SVR::gc_heap::background_mark_simple1+0x48:
00007ffb`c9162f38 488bd7 mov rdx,rdi
0:052> kL
# Child-SP RetAddr Call Site
00 000000e2`17e0f140 00007ffb`c91631ae clr!SVR::gc_heap::background_mark_simple1+0x48
01 000000e2`17e0f1b0 00007ffb`c9163f14 clr!SVR::gc_heap::background_mark_simple+0x91
02 000000e2`17e0f1e0 00007ffb`c91628b4 clr!SVR::gc_heap::background_drain_mark_list+0x50
03 000000e2`17e0f210 00007ffb`c934f660 clr!SVR::gc_heap::background_mark_phase+0x3bf
04 000000e2`17e0f2a0 00007ffb`c9162244 clr! ?? ::FNODOBFM::`string'+0x8082a
05 000000e2`17e0f2f0 00007ffb`c919159f clr!SVR::gc_heap::bgc_thread_function+0x132
06 000000e2`17e0f340 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86
07 000000e2`17e0fb80 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22
08 000000e2`17e0fbb0 00000000`00000000 ntdll!RtlUserThreadStart+0x34
0:052> ub rip
clr!SVR::gc_heap::background_mark_simple1+0x26:
00007ffb`c9162f16 488bfa mov rdi,rdx
00007ffb`c9162f19 488bd9 mov rbx,rcx
00007ffb`c9162f1c 4c8989f01e0000 mov qword ptr [rcx+1EF0h],r9
00007ffb`c9162f23 4d8d04c1 lea r8,[r9+rax*8]
00007ffb`c9162f27 4c89442478 mov qword ptr [rsp+78h],r8
00007ffb`c9162f2c 4533db xor r11d,r11d
00007ffb`c9162f2f 4885ff test rdi,rdi
00007ffb`c9162f32 0f84e1010000 je clr!SVR::gc_heap::background_mark_simple1+0x901 (00007ffb`c9163119)
4) Thread 113
0:052> ~113s
MSVCR120_CLR0400!memset+0x23:
00007ffb`c8f0f8b3 f3aa rep stos byte ptr [rdi]
0:113> kL
# Child-SP RetAddr Call Site
00 000000e2`1c7fd580 000000ec`7cb76810 MSVCR120_CLR0400!memset+0x23
01 000000e2`1c7fd588 00007ffb`c918f750 0x000000ec`7cb76810
02 000000e2`1c7fd590 00007ffb`c918f3d2 clr!SVR::gc_heap::adjust_limit_clr+0xe0
03 000000e2`1c7fd5e0 00007ffb`c914625f clr!SVR::gc_heap::allocate_small+0x3ae
04 000000e2`1c7fd6a0 00007ffb`c58e0e5c clr!JIT_New+0x61f
*** WARNING: Unable to verify checksum for System.Core.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for System.Core.ni.dll
05 000000e2`1c7fdae0 00007ffb`c36fb3ae mscorlib_ni!System.Collections.Generic.List`1[System.__Canon].System.Collections.Generic.IEnumerable<T>.GetEnumerator()$##60039A3+0x4c
06 000000e2`1c7fdb40 00007ffb`6ad21c2d System_Core_ni+0x2db3ae
07 000000e2`1c7fdbb0 00007ffb`6ad21544 0x00007ffb`6ad21c2d
08 000000e2`1c7fdc00 00007ffb`6ad0a240 0x00007ffb`6ad21544
09 000000e2`1c7fdc60 00007ffb`6ad09cfb 0x00007ffb`6ad0a240
0a 000000e2`1c7fdcc0 00007ffb`6ad0920c 0x00007ffb`6ad09cfb
0b 000000e2`1c7fdd00 00007ffb`6ad07790 0x00007ffb`6ad0920c
0c 000000e2`1c7fdd40 00007ffb`6ace5a30 0x00007ffb`6ad07790
0d 000000e2`1c7fdda0 00007ffb`6ace38e8 0x00007ffb`6ace5a30
0e 000000e2`1c7fde90 00007ffb`6ace1a7b 0x00007ffb`6ace38e8
0f 000000e2`1c7fdf80 00007ffb`6ace1407 0x00007ffb`6ace1a7b
10 000000e2`1c7fe080 00007ffb`6a80981e 0x00007ffb`6ace1407
11 000000e2`1c7fe0b0 00007ffb`6a8081db 0x00007ffb`6a80981e
12 000000e2`1c7fe110 00007ffb`c58eca72 0x00007ffb`6a8081db
...
20 000000e2`1c7fe710 00007ffb`c8fd6665 clr!CallDescrWorkerInternal+0x83
21 000000e2`1c7fe750 00007ffb`c8fd736d clr!CallDescrWorkerWithHandler+0x4e
22 000000e2`1c7fe790 00007ffb`c8fdaf69 clr!MethodDescCallSite::CallTargetWorker+0xf8
23 000000e2`1c7fe890 00007ffb`c8fd7ce5 clr!QueueUserWorkItemManagedCallback+0x2a
24 000000e2`1c7fe980 00007ffb`c8fd7c60 clr!Frame::Push+0x59
25 000000e2`1c7fe9c0 00007ffb`c8fd7b9e clr!FillInRegTypeMap+0x198
26 000000e2`1c7feac0 00007ffb`c8fd7d1f clr!FillInRegTypeMap+0xc1
27 000000e2`1c7feb50 00007ffb`c8fdaa70 clr!FillInRegTypeMap+0x47
28 000000e2`1c7febb0 00007ffb`c8fd82b8 clr!ManagedPerAppDomainTPCount::DispatchWorkItem+0xa0
29 000000e2`1c7fed30 00007ffb`c8fd8195 clr!ThreadpoolMgr::ExecuteWorkRequest+0x64
2a 000000e2`1c7fed60 00007ffb`c919159f clr!ThreadpoolMgr::WorkerThreadStart+0xf5
2b 000000e2`1c7fee00 00007ffb`d90d13d2 clr!Thread::intermediateThreadProc+0x86
2c 000000e2`1c7ffbc0 00007ffb`d92254f4 kernel32!BaseThreadInitThunk+0x22
2d 000000e2`1c7ffbf0 00000000`00000000 ntdll!RtlUserThreadStart+0x34
0:113> ub rip
MSVCR120_CLR0400!memset+0x6:
00007ffb`c8f0f896 4983f810 cmp r8,10h
00007ffb`c8f0f89a 0f825c010000 jb MSVCR120_CLR0400!memset+0x16c (00007ffb`c8f0f9fc)
00007ffb`c8f0f8a0 0fba25b08e0a0001 bt dword ptr [MSVCR120_CLR0400!_favor (00007ffb`c8fb8758)],1
00007ffb`c8f0f8a8 730e jae MSVCR120_CLR0400!memset+0x28 (00007ffb`c8f0f8b8)
00007ffb`c8f0f8aa 57 push rdi
00007ffb`c8f0f8ab 488bf9 mov rdi,rcx
00007ffb`c8f0f8ae 8bc2 mov eax,edx
00007ffb`c8f0f8b0 498bc8 mov rcx,r8
However, this machine is a server with 32 logical processors:
0:113> !cpuid
CP F/M/S Manufacturer MHz
0 6,5,7 2500
1 6,5,7 2500
2 6,5,7 2500
3 6,5,7 2500
4 6,5,7 2500
5 6,5,7 2500
6 6,5,7 2500
7 6,5,7 2500
8 6,5,7 2500
9 6,5,7 2500
10 6,5,7 2500
11 6,5,7 2500
12 6,5,7 2500
13 6,5,7 2500
14 6,5,7 2500
15 6,5,7 2500
16 6,5,7 2500
17 6,5,7 2500
18 6,5,7 2500
19 6,5,7 2500
20 6,5,7 2500
21 6,5,7 2500
22 6,5,7 2500
23 6,5,7 2500
24 6,5,7 2500
25 6,5,7 2500
26 6,5,7 2500
27 6,5,7 2500
28 6,5,7 2500
29 6,5,7 2500
30 6,5,7 2500
31 6,5,7 2500
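So these few threads alone could not drive the CPU of such a server that high. In other words, at the moment the dump was captured this process was not actually using much CPU, which means we cannot find the cause of the high CPU directly from the thread activity in this dump.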
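A quick upper bound makes this argument concrete. Assuming each thread can keep at most one logical processor busy at a time, n running threads on a machine with P logical processors can account for at most n/P of total (system-scale) CPU. The helper name below is hypothetical; the inputs are the four threads inspected above and the 32 processors reported by !cpuid.

```python
def max_system_share(running_threads: int, logical_cpus: int) -> float:
    """Upper bound, in percent of total machine CPU, that `running_threads`
    threads can consume if each occupies at most one logical CPU."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return 100.0 * min(running_threads, logical_cpus) / logical_cpus


# Threads 37, 39, 52 and 113 on a machine with 32 logical processors.
print(max_system_share(4, 32))  # 12.5
```

12.5% is far below the 91% CPU utilization reported by !threadpool, so the threads visible in this snapshot cannot by themselves explain a load of that size.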
What we did notice is that the dump itself is very large, about 20 GB, and that most of this memory belongs to the .NET managed heap:
0:113> !address -summary
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 166 7ff5`c19eb000 ( 127.960 TB) 99.97%
<unknown> 999 a`1af16000 ( 40.421 GB) 98.65% 0.03%
Stack 862 0`10240000 ( 258.250 MB) 0.62% 0.00%
Image 712 0`0a0b6000 ( 160.711 MB) 0.38% 0.00%
Heap 61 0`08fef000 ( 143.934 MB) 0.34% 0.00%
TEB 284 0`00238000 ( 2.219 MB) 0.01% 0.00%
Other 9 0`001d1000 ( 1.816 MB) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 1845 a`31a9f000 ( 40.776 GB) 99.52% 0.03%
MEM_IMAGE 1048 0`0ab32000 ( 171.195 MB) 0.41% 0.00%
MEM_MAPPED 35 0`02034000 ( 32.203 MB) 0.08% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 166 7ff5`c19eb000 ( 127.960 TB) 99.97%
MEM_RESERVE 655 5`2804b000 ( 20.625 GB) 50.34% 0.02%
MEM_COMMIT 2273 5`165ba000 ( 20.349 GB) 49.66% 0.02%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 1252 5`0a7de000 ( 20.164 GB) 49.21% 0.02%
PAGE_EXECUTE_READ 74 0`073c5000 ( 115.770 MB) 0.28% 0.00%
PAGE_READONLY 360 0`02d12000 ( 45.070 MB) 0.11% 0.00%
PAGE_WRITECOPY 175 0`0102e000 ( 16.180 MB) 0.04% 0.00%
PAGE_EXECUTE_READWRITE 79 0`005d8000 ( 5.844 MB) 0.01% 0.00%
PAGE_READWRITE|PAGE_GUARD 284 0`00582000 ( 5.508 MB) 0.01% 0.00%
PAGE_EXECUTE_WRITECOPY 36 0`0016f000 ( 1.434 MB) 0.00% 0.00%
PAGE_NOACCESS 11 0`0000b000 ( 44.000 kB) 0.00% 0.00%
PAGE_EXECUTE 2 0`00003000 ( 12.000 kB) 0.00% 0.00%
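The behavior of these .NET managed objects in memory is therefore well worth examining.
Some types in the customer's own namespace (ShowHand, anonymized) have already reached millions of instances:
0:113> !dumpheap -stat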
...
00007ffb6a1fab68 1062320 95485536 ShowHand.ConfigData.CommonPropertyInfo[]
00007ffb6ac2c4c0 1310734 104858720 ShowHand.ProjectU.Common.LBStaticActorProcessingAchievement
00007ffb6ac20db0 1633682 109318576 ShowHand.ProjectU.Common.IPropertiesProvider[]
00007ffb6ac0ad88 253100 114645888 ShowHand.ProjectU.Common.ICommonActorComp[]
00007ffbc5b16948 1809532 116227168 System.String
00007ffb6a5a84a8 1076959 120619408 ShowHand.ProjectU.Common.BattleGrid
00007ffb6b04bf40 2549394 122370912 behaviac.Action+ActionTask
00007ffb6a1f8a50 5651702 135640848 ShowHand.ConfigData.CommonPropertyInfo
00007ffb6b04ae10 1986359 143017848 behaviac.Selector+SelectorTask
00007ffb6abe65f8 31814 151783472 ShowHand.ProjectU.Common.IGameEventPipeListener[]
00007ffb6b04bdc8 3176057 152450736 behaviac.Assignment+AssignmentTask
00007ffb6ac03478 14993 155447424 System.Collections.Generic.Dictionary`2+Entry[[System.Int32, mscorlib],[ShowHand.ProjectU.Common.WayPointInfo, CommonDefine]][]
00007ffb6b0ba198 3976124 159044960 ShowHand.ProjectU.Common.BattleGrid+GridLink
00007ffb6afe0ef8 228402 164720576 ShowHand.ProjectU.Common.BattleGrid[]
00007ffb6afe0698 344120 185049680 ShowHand.ProjectU.Common.BattleGridInfo4Select[]
00007ffb6abda5f0 4078096 195748608 ShowHand.ProjectU.Common.WayPointInfo
00007ffb6b04bac0 4306198 206697504 behaviac.Condition+ConditionTask
00007ffb6b04bc88 2827987 226238960 behaviac.ReferencedBehavior+ReferencedBehaviorTask
00007ffb6aba7b88 5975977 239039080 ShowHand.ProjectU.Common.ProcessingMissionInfo
00007ffb6abebab0 6016203 240648120 ShowHand.ProjectU.Common.GameEventIdDefine[]
00007ffb6abe98e0 6050943 242037720 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.GameEventIdDefine, CommonDefine]]
00007ffb6afe0c78 515019 275971336 ShowHand.ProjectU.Common.BattleGridInfo4Attack[]
...
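When a !dumpheap -stat listing runs to hundreds of types, post-processing the text can help. The sketch below is an illustration, not part of the original analysis: it parses rows of the form MT, Count, TotalSize, Class Name and totals the instances and bytes for one namespace prefix. The function names are hypothetical, and it assumes the class names were printed intact on a single line.

```python
from typing import List, NamedTuple, Tuple


class TypeStat(NamedTuple):
    mt: str          # MethodTable address
    count: int       # number of instances
    total_size: int  # total bytes for all instances
    class_name: str


def parse_dumpheap_stat(text: str) -> List[TypeStat]:
    """Parse the data rows of SOS !dumpheap -stat output."""
    rows = []
    for line in text.splitlines():
        parts = line.split(maxsplit=3)
        # Skip headers, '...' elisions and other non-data lines.
        if len(parts) < 4 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        rows.append(TypeStat(parts[0], int(parts[1]), int(parts[2]), parts[3]))
    return rows


def summarize(rows: List[TypeStat], prefix: str) -> Tuple[int, int]:
    """Total instance count and bytes of types whose name starts with prefix."""
    matched = [r for r in rows if r.class_name.startswith(prefix)]
    return sum(r.count for r in matched), sum(r.total_size for r in matched)


sample = """\
MT Count TotalSize Class Name
00007ffb6a1fab68 1062320 95485536 ShowHand.ConfigData.CommonPropertyInfo[]
00007ffbc5b16948 1809532 116227168 System.String
00007ffb6aba7b88 5975977 239039080 ShowHand.ProjectU.Common.ProcessingMissionInfo
...
"""
rows = parse_dumpheap_stat(sample)
print(summarize(rows, "ShowHand."))  # (7038297, 334524616)
mission = rows[-1]
print(mission.total_size // mission.count)  # 40 bytes per instance
```

The 40 bytes per ProcessingMissionInfo instance agrees with the size column of the !dumpheap -mt listing for this type later in the analysis.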
Whether there is a managed memory leak here therefore deserves a closer look.
Inspecting the 32 GC heaps confirms that they are indeed in very poor shape:
0:113> !eeheap -gc
Number of GC Heaps: 32
------------------------------
Heap 0 (000000e275cd3290)
generation 0 starts at 0x000000e29cfa59a8
generation 1 starts at 0x000000e29b152070
generation 2 starts at 0x000000e277b31000
ephemeral segment allocation context: none
segment begin allocated size
000000e277b30000 000000e277b31000 000000e29d2959c0 0x257649c0(628509120)
Large object heap starts at 0x000000ea77b31000
segment begin allocated size
000000ea77b30000 000000ea77b31000 000000ea7811ccc8 0x5ebcc8(6208712)
Heap Size: Size: 0x25d50688 (634717832) bytes.
------------------------------
Heap 1 (000000e275cd6620)
generation 0 starts at 0x000000e2d363e938
generation 1 starts at 0x000000e2d182a800
generation 2 starts at 0x000000e2b7b31000
ephemeral segment allocation context: none
segment begin allocated size
000000e2b7b30000 000000e2b7b31000 000000e2d5478678 0x1d947678(496268920)
Large object heap starts at 0x000000ea87b31000
segment begin allocated size
000000ea87b30000 000000ea87b31000 000000ea87fa0e68 0x46fe68(4652648)
Heap Size: Size: 0x1ddb74e0 (500921568) bytes.
------------------------------
Heap 2 (000000e275cda830)
generation 0 starts at 0x000000e31f5516c8
generation 1 starts at 0x000000e31d26b3b8
generation 2 starts at 0x000000e2f7b31000
ephemeral segment allocation context: none
segment begin allocated size
000000e2f7b30000 000000e2f7b31000 000000e321b9c7b0 0x2a06b7b0(705083312)
Large object heap starts at 0x000000ea97b31000
segment begin allocated size
000000ea97b30000 000000ea97b31000 000000ea98130d60 0x5ffd60(6290784)
Heap Size: Size: 0x2a66b510 (711374096) bytes.
------------------------------
Heap 3 (000000e275cdf480)
generation 0 starts at 0x000000e35263b398
generation 1 starts at 0x000000e350719af8
generation 2 starts at 0x000000e337b31000
ephemeral segment allocation context: none
segment begin allocated size
000000e337b30000 000000e337b31000 000000e3529b73b0 0x1ae863b0(451437488)
Large object heap starts at 0x000000eaa7b31000
segment begin allocated size
000000eaa7b30000 000000eaa7b31000 000000eaa7f90b30 0x45fb30(4586288)
Heap Size: Size: 0x1b2e5ee0 (456023776) bytes.
------------------------------
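The remaining 28 heaps are omitted here. Taking GC heap 0 as an example, its gen2 has already grown to about 593 MB (593,629,296 bytes), computed as the distance from the start of gen2 to the start of gen1:
0:059> ? 0x000000e29b152070-0x000000e277b31000
Evaluate expression: 593629296 = 00000000`23621070
Gen0 and gen1, by contrast, are only a few MB and a few tens of MB respectively. A distribution with a tiny gen0 and gen1 but a huge gen2 is very abnormal, and suggests that there are managed objects the GC cannot reclaim.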
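The generation sizes quoted above follow from simple address arithmetic on the !eeheap -gc output for heap 0. The sketch assumes, as is the case for heap 0 here, that gen2, gen1 and gen0 all live in a single small-object-heap segment laid out in that order; with multiple segments, gen2 would also include the older segments. The constants are copied from the output above; the function name is hypothetical.

```python
# Addresses for GC heap 0, copied from the !eeheap -gc output above.
GEN2_START = 0x000000E277B31000
GEN1_START = 0x000000E29B152070
GEN0_START = 0x000000E29CFA59A8
ALLOCATED = 0x000000E29D2959C0  # end of allocated space in the segment


def generation_sizes(gen2_start: int, gen1_start: int, gen0_start: int,
                     allocated: int) -> dict:
    """Byte sizes of gen2, gen1 and gen0 for a single-segment small object
    heap laid out as gen2 | gen1 | gen0 up to the allocated mark."""
    return {
        "gen2": gen1_start - gen2_start,
        "gen1": gen0_start - gen1_start,
        "gen0": allocated - gen0_start,
    }


sizes = generation_sizes(GEN2_START, GEN1_START, GEN0_START, ALLOCATED)
for gen, size in sizes.items():
    print(f"{gen}: {size:,} bytes ({size / 1e6:.1f} MB)")
# gen2: 593,629,296 bytes (593.6 MB)
# gen1: 31,799,608 bytes (31.8 MB)
# gen0: 3,080,216 bytes (3.1 MB)
print(sum(sizes.values()))  # 628509120
```

The three sizes add up exactly to the 628,509,120-byte segment size that !eeheap reports for heap 0, which is a handy consistency check.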
Listing the objects in heap 0's gen2 shows that, in this single heap, the most numerous types in the customer's namespace already exceed 200,000 instances:
0:059> !dumpheap -stat 0x000000e277b31000 0x000000e29b152070
00007ffb6a5a84a8 35146 3936352 ShowHand.ProjectU.Common.BattleGrid
00007ffb6b04ae10 62375 4491000 behaviac.Selector+SelectorTask
00007ffb6a1f8a50 190065 4561560 ShowHand.ConfigData.CommonPropertyInfo
00007ffb6b04bdc8 99780 4789440 behaviac.Assignment+AssignmentTask
00007ffb6b0ba198 129947 5197880 ShowHand.ProjectU.Common.BattleGrid+GridLink
00007ffb6ac03478 508 5266944 System.Collections.Generic.Dictionary`2+Entry[[System.Int32, mscorlib],[ShowHand.ProjectU.Common.WayPointInfo, CommonDefine]][]
00007ffb6afe0698 10768 5802880 ShowHand.ProjectU.Common.BattleGridInfo4Select[]
00007ffb6b04bac0 133954 6429792 behaviac.Condition+ConditionTask
00007ffb6abda5f0 138160 6631680 ShowHand.ProjectU.Common.WayPointInfo
00007ffb6b04bc88 87804 7024320 behaviac.ReferencedBehavior+ReferencedBehaviorTask
00007ffb6aba7b88 204523 8180920 ShowHand.ProjectU.Common.ProcessingMissionInfo
00007ffb6abebab0 205528 8221120 ShowHand.ProjectU.Common.GameEventIdDefine[]
00007ffb6abe98e0 207062 8282480 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.GameEventIdDefine, CommonDefine]]
00007ffb6afe0c78 16152 8657472 ShowHand.ProjectU.Common.BattleGridInfo4Attack[]
00007ffb6b04af38 222880 8915200 System.Collections.Generic.List`1[[behaviac.BehaviorTask, BehaviacRuntime]]
00007ffb6b0ea0c0 5248 9385248 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[FixMath.NET.Fix64, Fix64]][]
00007ffbc5aea7f0 121480 9718400 System.Collections.Generic.Dictionary`2[[System.Int32, mscorlib],[System.Int32, mscorlib]]
00007ffb6b04b730 147779 10640088 behaviac.Sequence+SequenceTask
Taking ShowHand.ProjectU.Common.ProcessingMissionInfo as an example, we randomly picked some instances of this type and examined what keeps them alive:
0:059> !dumpheap -mt 00007ffb6aba7b88 0x000000e277b31000 0x000000e29b152070
...
000000e27c723178 00007ffb6aba7b88 40
000000e27c723348 00007ffb6aba7b88 40
000000e27c723500 00007ffb6aba7b88 40
000000e27c7236b8 00007ffb6aba7b88 40
000000e27c723888 00007ffb6aba7b88 40
000000e27c723a40 00007ffb6aba7b88 40
000000e27c723bf8 00007ffb6aba7b88 40
000000e27c723dc8 00007ffb6aba7b88 40
000000e27c723f80 00007ffb6aba7b88 40
000000e27c724138 00007ffb6aba7b88 40
000000e27c724308 00007ffb6aba7b88 40
000000e27c7244c0 00007ffb6aba7b88 40
000000e27c724678 00007ffb6aba7b88 40
000000e27c724848 00007ffb6aba7b88 40
000000e27c724a00 00007ffb6aba7b88 40
000000e27c724bb8 00007ffb6aba7b88 40
000000e27c724d88 00007ffb6aba7b88 40
000000e27c724f40 00007ffb6aba7b88 40
000000e27c7250f8 00007ffb6aba7b88 40
000000e27c7252c8 00007ffb6aba7b88 40
000000e27c725480 00007ffb6aba7b88 40
000000e27c725638 00007ffb6aba7b88 40
000000e27c725808 00007ffb6aba7b88 40
000000e27c7259c0 00007ffb6aba7b88 40
000000e27c725b78 00007ffb6aba7b88 40
000000e27c725d48 00007ffb6aba7b88 40
000000e27c725f00 00007ffb6aba7b88 40
000000e27c7260b8 00007ffb6aba7b88 40
000000e27c726288 00007ffb6aba7b88 40
000000e27c726440 00007ffb6aba7b88 40
000000e27c7265f8 00007ffb6aba7b88 40
000000e27c7267c8 00007ffb6aba7b88 40
...
We randomly picked two of them, 000000e27c723348 and 000000e27c723f80, and examined their reference chains:
0:059> !gcroot 000000e27c723348
Thread 2f78:
000000e216b0efa0 00007ffb69ccac7b log4net.Appender.FileAppender.Append(log4net.Core.LoggingEvent)
rbp+10: 000000e216b0efe0
-> 000000e377b3f738 log4net.Appender.AsyncRollingFileAppender
-> 000000e377b3f850 System.Collections.Concurrent.ConcurrentQueue`1[[log4net.Core.LoggingEvent, log4net]]
-> 000000e623ca3f08 System.Collections.Concurrent.ConcurrentQueue`1+Segment[[log4net.Core.LoggingEvent, log4net]]
-> 000000e623ca3f48 log4net.Core.LoggingEvent[]
-> 000000e9555e8218 log4net.Core.LoggingEvent
-> 000000e377b3b310 log4net.Repository.Hierarchy.Hierarchy
-> 000000e377b527a0 log4net.Repository.LoggerRepositoryShutdownEventHandler
-> 000000e377b526c8 log4net.Core.WrapperMap
-> 000000e377b526f0 System.Collections.Hashtable
-> 000000e377b52740 System.Collections.Hashtable+bucket[]
-> 000000e377b527e0 System.Collections.Hashtable
-> 000000e377b52830 System.Collections.Hashtable+bucket[]
-> 000000e377b522e8 log4net.Repository.Hierarchy.DefaultLoggerFactory+LoggerImpl
-> 000000e377b3e720 log4net.Repository.Hierarchy.RootLogger
-> 000000e377b46980 log4net.Util.AppenderAttachedImpl
-> 000000e377b469a0 log4net.Appender.AppenderCollection
-> 000000e377b51130 log4net.Appender.IAppender[]
-> 000000e377b469c0 log4net.Appender.AsyncRollingFileAppender
-> 000000e377b46b70 System.Threading.Thread
-> 000000e6b7b33cb8 System.Runtime.Remoting.Contexts.Context
-> 000000e277b31560 System.AppDomain
-> 000000e6b7b67160 System.UnhandledExceptionEventHandler
-> 000000e277b32a60 ShowHand.ProjectU.GameServer.GameServer
-> 000000e377b53750 ShowHand.ServerBase.PlayerContextManager
-> 000000e377b53a48 System.Collections.Concurrent.ConcurrentDictionary`2[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]]
-> 000000e9c04a04d0 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]]
-> 000000ec47cb0218 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]][]
-> 000000e9c048e440 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.UInt64, mscorlib],[ShowHand.ServerBase.IManagedContext, ServerBase]]
-> 000000e707424da8 ShowHand.ProjectU.GameServer.GameServerPlayerContext
-> 000000e707424cf0 ShowHand.NetSharp.Client
-> 000000e707425610 System.Threading.SemaphoreSlim
-> 000000e707425668 System.Threading.SemaphoreSlim+TaskNode
-> 000000e685d6a2e0 System.Collections.Generic.List`1[[System.Object, mscorlib]]
-> 000000e6c2a468d0 System.Object[]
-> 000000e2d362b088 System.Threading.Tasks.TaskFactory+CompleteOnInvokePromise
-> 000000e707425840 System.Action
-> 000000e707425820 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
-> 000000e7074258d0 ShowHand.NetSharp.Client d__21
-> 000000e707425880 System.Threading.Tasks.Task`1[[System.Threading.Tasks.VoidTaskResult, mscorlib]]
-> 000000e707425980 System.Action
-> 000000e707425960 System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner
-> 000000e7074259c0 ShowHand.NetSharp.Endpoint d__8
-> 000000e2b7b495f8 ShowHand.NetSharp.Endpoint
-> 000000e2b7b49630 System.Collections.Concurrent.ConcurrentDictionary`2[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]
-> 000000e5bccc1f00 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]
-> 000000eb47bd1150 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]][]
-> 000000e5bccbbee8 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]
-> 000000e43c0e1d48 ShowHand.NetSharp.Client
-> 000000e43c0e2cb8 ShowHand.ProjectU.GameServer.GameServerPlayerContext
-> 000000e43c0e2e50 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.Components.Server.IServerDataSection, ProjectU.LogicComp]]
-> 000000e27c6b4f48 ShowHand.ProjectU.Common.Components.Server.IServerDataSection[]
-> 000000e27c6a12f8 ShowHand.ProjectU.Common.Components.Server.MissionDataSection
-> 000000e27c6a1330 System.Collections.Generic.List`1[[ShowHand.ProjectU.Common.ProcessingMissionInfo, CommonDefine]]
-> 000000e27c71d400 ShowHand.ProjectU.Common.ProcessingMissionInfo[]
-> 000000e27c723348 ShowHand.ProjectU.Common.ProcessingMissionInfo
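We can see that 000000e27c723348 is indirectly referenced by a ShowHand.NetSharp.Client. Likewise, 000000e27c723f80 and all the other ShowHand.ProjectU.Common.ProcessingMissionInfo objects in gen2 are indirectly referenced by ShowHand.NetSharp.Client objects, and all of these chains pass through the same ShowHand.NetSharp.Endpoint object, 000000e2b7b495f8. Next we examine that object.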
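Comparing many !gcroot chains by hand is tedious. One way to find the shared retainer mechanically is to intersect the chains and take the common node nearest the target objects. The sketch below does that; the function name is hypothetical, the first chain is abbreviated from the !gcroot output above, and the second chain (including its client placeholder) is hypothetical, since that output is not shown.

```python
from typing import Optional, Sequence


def closest_common_retainer(chains: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the object that appears in every root chain and sits closest
    to the target (each chain is ordered root first, target last)."""
    if not chains:
        return None
    common = set(chains[0]).intersection(*chains[1:])
    closest = None
    for node in chains[0]:  # walk from the root toward the target
        if node in common:
            closest = node  # keep the deepest shared node seen so far
    return closest


chain_a = [  # abbreviated from the !gcroot output for 000000e27c723348
    "000000e277b32a60",  # ShowHand.ProjectU.GameServer.GameServer
    "000000e2b7b495f8",  # ShowHand.NetSharp.Endpoint
    "000000e43c0e1d48",  # ShowHand.NetSharp.Client
    "000000e27c723348",  # ProcessingMissionInfo
]
chain_b = [  # hypothetical chain for 000000e27c723f80
    "000000e277b32a60",
    "000000e2b7b495f8",
    "client-placeholder",  # hypothetical Client address
    "000000e27c723f80",
]
print(closest_common_retainer([chain_a, chain_b]))  # 000000e2b7b495f8
```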
0:059> !do 000000e2b7b495f8
Name: ShowHand.NetSharp.Endpoint
MethodTable: 00007ffb69f641d0
EEClass: 00007ffb69f53fd8
Size: 56(0x38) bytes
File: e:\ServerRelease\Server\GameServer\NetSharp.dll
Fields:
MT Field Offset Type VT Attr Value Name
00007ffbc5b19288 400002d 28 System.Int32 1 instance 65536 k__BackingField
00007ffb698a8d80 400002e 8 ...pointEventHandler 0 instance 000000e277b32a60 m_endPointEventHandler
00007ffbc4bf2088 400002f 10 ...ckets.TcpListener 0 instance 000000e2b7b4ad60 m_listener
00007ffb69f64078 4000030 2c System.Int32 1 instance 0 m_state
00007ffbc5b27b38 4000031 18 ...eading.Tasks.Task 0 instance 000000e2b7b4af70 m_mainTask
00007ffb69f64888 4000032 20 ...olean, mscorlib]] 0 instance 000000e2b7b49630 m_clientActiveness
0:059> !do 000000e2b7b49630
Name: System.Collections.Concurrent.ConcurrentDictionary`2[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]
MethodTable: 00007ffb69f64888
EEClass: 00007ffb69e90240
Size: 64(0x40) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
00007ffb69d33f48 4001830 8 ...olean, mscorlib]] 0 instance 000000e5bccc1f00 m_tables
00007ffbc5b34300 4001831 10 ...Canon, mscorlib]] 0 instance 0000000000000000 m_comparer
00007ffbc5b21f28 4001832 30 System.Boolean 1 instance 1 m_growLockArray
00007ffbc5b19288 4001833 20 System.Int32 1 instance 0 m_keyRehashCount
00007ffbc5b19288 4001834 24 System.Int32 1 instance 32 m_budget
00007ffbc65fc070 4001835 18 ...ean, mscorlib]][] 0 instance 0000000000000000 m_serializationArray
00007ffbc5b19288 4001836 28 System.Int32 1 instance 0 m_serializationConcurrencyLevel
00007ffbc5b19288 4001837 2c System.Int32 1 instance 0 m_serializationCapacity
00007ffbc5b21f28 400183b 10 System.Boolean 1 static
0:059> !do 000000e5bccc1f00
Name: System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]]
MethodTable: 00007ffb69f65790
EEClass: 00007ffb69e90968
Size: 48(0x30) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
0000000000000000 400341d 8 SZARRAY 0 instance 000000eb47bd1150 m_buckets
00007ffbc5b16fc0 400341e 10 System.Object[] 0 instance 000000e737ef1220 m_locks
00007ffbc5b19220 400341f 18 System.Int32[] 0 instance 000000e5bcc76a00 m_countPerLock
00007ffbc5b34300 4003420 20 ...Canon, mscorlib]] 0 instance 000000e2b7b41d18 m_comparer
0:059> !do 000000eb47bd1150
Name: System.Collections.Concurrent.ConcurrentDictionary`2+Node[[ShowHand.NetSharp.Client, NetSharp],[System.Boolean, mscorlib]][]
MethodTable: 00007ffb69f65610
EEClass: 00007ffbc54daa00
Size: 266240(0x41000) bytes
Array: Rank 1, Number of elements 33277, Type CLASS (Print Array)
Fields:
None
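We can see that a single ShowHand.NetSharp.Endpoint object references tens of thousands of ShowHand.NetSharp.Client objects (its bucket array alone has 33,277 slots; the bucket count only approximates the number of entries, which could be confirmed by summing m_countPerLock), and those clients in turn indirectly reference tens of millions of the customer's own objects. Because these objects remain referenced, they survive collection after collection and are eventually promoted to gen2.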
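The retention pattern described here, a long-lived registry holding clients that in turn hold large per-player object graphs, can be illustrated in a few lines. The sketch below is a Python analogy, not the customer's code: the Client and Endpoint classes are hypothetical stand-ins (client_activeness mirrors Endpoint.m_clientActiveness), and CPython's reference counting plus cycle collector differs from the .NET generational GC. The principle is the same, though: as long as the registry entry exists, the client and everything reachable from it cannot be reclaimed. Whether NetSharp actually fails to remove disconnected clients is not established by the dump.

```python
import gc
import weakref


class Client:
    """Hypothetical per-connection object (cf. ShowHand.NetSharp.Client)."""

    def __init__(self, client_id: int):
        self.client_id = client_id
        # Per-player state reachable only through this client.
        self.missions = [object() for _ in range(1000)]


class Endpoint:
    """Hypothetical listener that tracks connected clients."""

    def __init__(self):
        self.client_activeness = {}  # client -> active flag

    def on_connect(self, client: Client) -> None:
        self.client_activeness[client] = True

    def on_disconnect(self, client: Client) -> None:
        # Skipping this removal keeps the client and its object graph alive.
        self.client_activeness.pop(client, None)


endpoint = Endpoint()
client = Client(1)
probe = weakref.ref(client)  # observes the client without keeping it alive
endpoint.on_connect(client)
del client
gc.collect()
print(probe() is not None)  # True: still reachable from the endpoint
endpoint.on_disconnect(probe())
gc.collect()
print(probe() is None)  # True: nothing references the client any more
```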
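This is unhealthy behavior: a gen2 collection is far more expensive than a gen0 or gen1 collection. To be rigorous, we cannot treat the analysis above as conclusive proof that it caused the CPU spikes described at the beginning, although that is indeed possible (large-scale gen2 collections can drive CPU usage up). Either way, the excessive number of gen2 objects needs to be addressed: whether or not it is directly related to the CPU spikes, it poses a serious risk to the program's stable operation.
Based on this analysis, we recommended that the customer assess, in light of their own business logic, whether this situation is expected, and if it is not, optimize the program accordingly.
We are the SRE team of Alibaba Cloud Intelligence Global Technical Services. We aim to be an engineering team built on technology, oriented toward service, and focused on keeping business systems highly available. We provide professional, systematic SRE services that help customers make better use of the cloud, build more stable and reliable business systems on it, and improve business stability. We hope to share more techniques that help enterprise customers move to the cloud, use it well, and run their cloud workloads more stably and reliably.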