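Background
After one master had joined the cluster, I realized I had forgotten to change its hostname. Renaming a node in place in a k8s cluster is very cumbersome, so it is easier to remove the master from the cluster, rename it, and join it again (this assumes a high-availability cluster, so the remaining masters keep the control plane running).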
Steps
SSH into another master node and remove the master that is to be renamed from the cluster.
# kubectl drain k8s-master1 --delete-local-data --force --ignore-daemonsets
# kubectl delete node k8s-master1
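Log in to the master that has left the cluster, reset its kubelet configuration, and rejoin the cluster. The hostname has to be changed before kubeadm join runs, so that the node registers under its new name.
# kubeadm reset
# kubeadm join k8s:6443 --token ***** \
    --discovery-token-ca-cert-hash ***** \
    --control-plane --certificate-key *****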
The join fails and hangs indefinitely at the etcd health check:
W1118 16:00:30.164179 27403 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W1118 16:00:30.179472 27403 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W1118 16:00:30.181257 27403 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[check-etcd] Checking that the etcd cluster is healthy
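Judging from the error message, the etcd cluster does not know that 10.0.1.81 has left the k8s cluster. kubectl delete node only removes the Node object and does not touch etcd membership, so etcd still holds the member entry for 10.0.1.81 and the join fails when it tries to connect to 10.0.1.81.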
Log in to one of the healthy masters and open a shell in the etcd container:
# docker exec -it $(docker ps -f name=etcd_etcd -q) /bin/sh
List the members of the etcd cluster. Sure enough, master1, the server that has already left the cluster, is still listed:
# etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/peer.crt \
    --key=/etc/kubernetes/pki/etcd/peer.key \
    member list
2fb26b18a5693bae, started, k8s-master2, https://10.0.0.90:2380, https://10.0.0.90:2379, false
609555ab7e1569cf, started, k8s-master3, https://10.0.0.91:2380, https://10.0.0.91:2379, false
64a231df7832b3a0, started, k8s-master1, https://10.0.0.89:2380, https://10.0.0.89:2379, false
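Copying the member ID by hand is easy to get wrong. As a minimal sketch, assuming the default comma-separated output format shown above (ID, status, name, peer URL, client URL, learner flag), the ID can be extracted by name. The helper name etcd_member_id is hypothetical and not part of etcdctl.

```shell
# Hypothetical helper, not part of etcdctl.
# Reads `etcdctl member list` output (default simple format) on stdin
# and prints the ID of the member whose name equals the first argument.
etcd_member_id() {
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Example use inside the etcd container (same certificate flags as above):
#   etcdctl --cacert=... --cert=... --key=... member list | etcd_member_id k8s-master1
```

Matching on the exact name field rather than grepping the whole line avoids picking up a member whose name merely contains the search string.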
Still inside the container, remove this server from the etcd cluster:
# etcdctl --endpoints 127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key \
    member remove 64a231df7832b3a0
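Before retrying the join, it may be worth confirming that the stale member is really gone. The following is a sketch only, again assuming the default member list output format. The function name etcd_member_absent is hypothetical.

```shell
# Hypothetical helper, not part of etcdctl.
# Reads `etcdctl member list` output on stdin.
# Exit status 0 if no member has the given name, 1 if one is still present.
etcd_member_absent() {
    awk -F', ' -v name="$1" '$3 == name { found = 1 } END { exit (found ? 1 : 0) }'
}

# Example use inside the etcd container:
#   etcdctl --cacert=... --cert=... --key=... member list \
#       | etcd_member_absent k8s-master1 && echo "k8s-master1 removed from etcd"
```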
After the removal, run the reset and join commands again:
# kubeadm reset
# kubeadm join k8s:6443 --token ***** \
    --discovery-token-ca-cert-hash ***** \
    --control-plane --certificate-key *****
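Once the join returns, one way to check the result from any master is to look for the new node name in the node list. This is a sketch under assumptions: it expects the output of kubectl get nodes --no-headers on stdin (name in the first column, status in the second), the helper name node_is_ready is hypothetical, and NEW_NAME stands for whatever hostname was chosen, which the steps above do not specify.

```shell
# Hypothetical helper, not part of kubectl.
# Reads `kubectl get nodes --no-headers` output on stdin.
# Exit status 0 only if the named node is listed with status exactly "Ready"
# (a cordoned node shows "Ready,SchedulingDisabled" and is treated as not ready).
node_is_ready() {
    awk -v name="$1" '$1 == name && $2 == "Ready" { ok = 1 } END { exit (ok ? 0 : 1) }'
}

# Example use (NEW_NAME is a placeholder for the new hostname):
#   kubectl get nodes --no-headers | node_is_ready NEW_NAME && echo "node rejoined"
```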