Jeff, thanks for your reply. On my side, kubectl get apiservice | grep metrics shows the APIService in a failed state:
v1beta1.metrics.k8s.io kube-system/metrics-server False (FailedDiscoveryCheck) 22d
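As the logs I posted above show, a new metrics-server pod is already up, but kube-apiserver still tries to connect to the pod on the lost node, 10.244.235.186. Once I kill kube-apiserver, it connects correctly to 10.244.180.55.
We saw a similar error earlier with UDP, but in that case running conntrack -D to clear the connection tracking entries was enough for a new connection to be created. This time it does not help: even after conntrack -D, traffic is still pointed back at the wrong pod IP, 10.244.235.186. It looks like the old connection is being remembered somewhere.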
https://github.com/kubernetes/kubernetes/issues/59368?from=singlemessage
metrics-server-8b7689b66-xm6mf 1/1 Running 0 36s 10.244.180.55 master2 <none> <none>
metrics-server-8b7689b66-z9hk9 1/1 Unknown 0 3m58s 10.244.235.186 worker1 <none> <none>
conntrack -L | grep 10.101.186.48
tcp 6 278 ESTABLISHED src=10.101.186.48 dst=10.101.186.48 sport=45842 dport=443 src=10.244.235.186 dst=192.168.210.71 sport=443 dport=19158 [ASSURED] mark=0 use=1
tcp 6 298 ESTABLISHED src=10.101.186.48 dst=10.101.186.48 sport=45820 dport=443 src=10.244.235.186 dst=192.168.210.71 sport=443 dport=15276 [ASSURED] mark=0 use=2
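I tried the following and it works: I changed the kernel parameter net.ipv4.tcp_retries2 from 15 to 1. In the power-loss scenario, the stale connection is then released after about 1 minute, and traffic is directed to 10.244.180.55.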
https://blog.csdn.net/gao1738/article/details/42839697
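To show why lowering net.ipv4.tcp_retries2 makes such a difference, here is a rough sketch of the timeout arithmetic. It assumes the retransmission timeout starts at 200 ms, doubles on every retry, and is capped at 120 s, which is the model the kernel documentation uses for its "924.6 seconds" figure for the default of 15. The function name tcp_retries2_timeout_ms is made up for this example. The time seen on a real node also depends on the measured RTT and on how the client handles the connection, so it can differ from this estimate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how long (in ms) an established TCP
# connection keeps retransmitting before the kernel gives up, for a
# given net.ipv4.tcp_retries2 value.
# Assumptions: initial RTO 200 ms, doubled per retry, capped at 120 s.
tcp_retries2_timeout_ms() {
  local retries=$1
  local rto=200        # assumed minimum RTO in ms
  local max=120000     # assumed maximum RTO in ms
  local total=0
  local i
  for ((i = 0; i <= retries; i++)); do
    total=$((total + rto))
    rto=$((rto * 2))
    if ((rto > max)); then
      rto=$max
    fi
  done
  echo "$total"
}

# Compare the default value with the lowered one.
echo "tcp_retries2=15 -> $(tcp_retries2_timeout_ms 15) ms"
echo "tcp_retries2=1  -> $(tcp_retries2_timeout_ms 1) ms"
```

The current value can be read with sysctl net.ipv4.tcp_retries2 and changed at runtime (as root) with sysctl -w net.ipv4.tcp_retries2=1. Keep in mind that such a low value affects every TCP connection on the node, so a short network hiccup can also drop healthy connections.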