前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Issue record: “No thread for socket” about Memcached

Issue record: “No thread for socket” about Memcached

作者头像
四火
发布2022-07-15 21:47:42
1630
发布2022-07-15 21:47:42
举报
文章被收录于专栏:四火的唠叨四火的唠叨

We own a service supplying not-real-time data to the downstream clients. In order to reduce the average latency and the pressure on database, Memcached was imported as the cluster center cache, and our service consumed it. I encountered an issue when Memcached was firstly deployed on production environment:

代码语言:javascript
复制
[WARN] (Acceptor ServerSocket[addr=/0.0.0.0,port=0,localport=3961]) amazon.platform.servertoolkit.http.jetty.JettyServerManager: No thread for Socket[addr=/10.0.77.98,port=10545,localport=3961] : [Acceptor ServerSocket[addr=/0.0.0.0,port=0,localport=3961]]org.mortbay.util.ThreadPool.run(ThreadPool.java:350)

The server totally gave no response. So I checked the connection number:

代码语言:javascript
复制
netstat -an | grep ESTABLISHED | wc –l
54

And process number:

代码语言:javascript
复制
ps -ef|wc -l
564

The both seemed normal. One teammate remaindered me I should have a look at the threads of Jetty:

代码语言:javascript
复制
ps uH p 31636 | wc -l

I couldn’t find anything abnormal. I knew I might increase the thread up limit but that would not have any help to resolve the issue, at most it might only delay the issue’s happening.

Actually, this issue was a little more complex than we thought indeed.

The threads were running out from this morning, and client could not connect to our service. I took a snapshot (jstack -F -l 31636) for the stacks in JVM, finding that almost all the threads were hung reading from cache server:

代码语言:javascript
复制
Thread 19795: (state = IN_NATIVE)
 - sun.nio.ch.FileDispatcher.read0(java.io.FileDescriptor, long, int) @bci=0 (Compiled frame; information may be imprecise)
 - sun.nio.ch.SocketDispatcher.read(java.io.FileDescriptor, long, int) @bci=4, line=21 (Compiled frame)
 - sun.nio.ch.IOUtil.readIntoNativeBuffer(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher, java.lang.Object) @bci=116, line=198 (Compiled frame)
 - sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher, java.lang.Object) @bci=31, line=166 (Compiled frame)
 - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=101, line=245 (Compiled frame)
 - com.schooner.MemCached.SockInputStream.readFromChannel() @bci=25 (Interpreted frame)
 - com.schooner.MemCached.SockInputStream.read() @bci=31 (Compiled frame)
 - com.schooner.MemCached.SockInputStream.getLine() @bci=9 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String, java.lang.String, java.lang.Integer, boolean) @bci=583 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String, java.lang.Integer) @bci=6 (Compiled frame)
 - com.schooner.MemCached.AscIIClient.get(java.lang.String) @bci=3 (Compiled frame)
 - com.danga.MemCached.MemCachedClient.get(java.lang.String) @bci=5 (Compiled frame)
 - amazon.forecast.util.cache.memcached.BSFMemcachedClient.get(java.lang.String, java.lang.String) @bci=14, line=80 (Compiled frame)
 - amazon.forecast.distribution.Forecaster.getForecastDistributions(amazon.forecast.distribution.IForecastDistributionProvider, amazon.forecast.distribution.ForecastRequest)
 ...

On my understanding there should be some error on the connection from the client to cache server, then I checked the cache server but found it’s correctly and stably running, and of course, the network was good. We invoked?Memcached service manually several times thru telnet, it did work normally.?I took several snapshots in series again and got the same result — it seemed all the threads were being hung on the “read0” method.

However, after restarting the client the issue disappeared. By then nothing could explain why all the threads were being hung reading from cache server. One of my teammate told me if the tcp connection was dropped and the client was trying to read, it would not fail until the retransmission time out. The retransmission would happen in a back-off manner, and it would take long time for “read” to fail. Besides,?It didn’t matter whether or not we were able to connect to the cache server. The exact tcp connection was broken on client, and it didn’t detect and was still using the broken socket to read.

Finally, we were still unclear about the root cause, but found a loop discussing about this issue:?

https://github.com/gwhalin/Memcached-Java-Client/issues/20

A working solution is to set socket timeout to release the connection in time if unforeseen issue happens.

文章未经特殊标明皆为本人原创,未经许可不得用于任何商业用途,转载请保持完整性并注明来源链接 《四火的唠叨》

×Scan to share with WeChat

本文参与?腾讯云自媒体分享计划,分享自作者个人站点/博客。
如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客?前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与?腾讯云自媒体分享计划? ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档
http://www.vxiaotou.com