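I installed KubeSphere 2.0.2 and then 2.1 in MultiNode mode using the offline installer. I hit plenty of pitfalls along the way, but both installations eventually succeeded and are running well.
Here is a record of the installation process: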
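References:
Installation instructions
Read the official installation documentation at least twice. This is important!
If you run into a problem, check these first:
Developer community
kubesphere issues
ks-install issues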
KubeSphere 2.0.2 offline installation
Prepare 2 clean CentOS 7.5 virtual machines:
master 8C 16G 2T
node 8C 16G 2T
KubeSphere 2.1 offline installation
Prepare 4 clean CentOS 7.7 virtual machines:
master1 4C 8G 2T
node1 4C 8G 2T
node2 4C 8G 2T
node4 4C 8G 2T
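Any configuration that meets the official minimum requirements will do. For KubeSphere 2.1 I originally used 2C 4G and enabled every pluggable component except DevOps. After installation the memory could not keep up and one node crashed. Once I changed the VMs to 4C 8G and rebooted, there were no more problems.
The walkthrough below uses my 2.1 installation as the example:
1. Set the hostname. This is actually optional: the KubeSphere installer renames each machine to the hostname written in hosts.ini.
$ hostnamectl set-hostname master1
2. Every machine needs a fixed IP so that the address does not change after a reboot!
How to do this depends on the network. Our company network does not allow static mode; an IP is fixed by binding it to a MAC address. So I set the VM's MAC address directly in VirtualBox, and once that MAC was bound the IP stayed fixed. The interface still uses DHCP.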
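If you fix the IP with static mode instead, the steps are:
Check the current network interface name:
$ ifconfig
Check the gateway of this machine:
$ route -n
Check the DNS servers:
$ cat /etc/resolv.conf
On CentOS 7 the network configuration files are under /etc/sysconfig/network-scripts.
The configuration file for the enp0s3 interface is ifcfg-enp0s3. Edit it with vim as follows: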
BOOTPROTO="static" # use a static IP address (the default is dhcp)
IPADDR="192.168.43.224" # the static IP address
NETMASK="255.255.255.0" # subnet mask
GATEWAY="192.168.43.216" # gateway address
DNS1="192.168.43.216" # DNS server
After the change, restart the network:
$ service network restart
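The edits to ifcfg-enp0s3 can also be scripted instead of made by hand in vim. The following is only a sketch and not part of the original procedure: set_ifcfg_key is a hypothetical helper that rewrites an existing KEY=... line or appends the key if it is missing. It assumes the value contains no |, & or backslash characters, since those are special to sed.

```shell
# Hypothetical helper: set KEY="VALUE" in an ifcfg-style file.
# $1 = file, $2 = key, $3 = value
set_ifcfg_key() {
    if grep -q "^$2=" "$1"; then
        # Key already present: rewrite that line, keep everything else.
        sed "s|^$2=.*|$2=\"$3\"|" "$1" > "$1.new" &&
            cat "$1.new" > "$1" && rm -f "$1.new"
    else
        # Key missing: append it.
        printf '%s="%s"\n' "$2" "$3" >> "$1"
    fi
}

# Example (run as root; the values are the ones used in this post):
#   f=/etc/sysconfig/network-scripts/ifcfg-enp0s3
#   set_ifcfg_key "$f" BOOTPROTO static
#   set_ifcfg_key "$f" IPADDR 192.168.43.224
```

Writing through cat rather than mv keeps the original file's owner and permissions.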
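3. Check DNS. Important!
Make sure the DNS servers configured in your network are actually reachable. Our company's internal DNS server cannot be reached from Linux systems, and this is a big pitfall: the unreachable DNS entries have to be commented out before installing. After the installation, reboot the machine and the DNS configuration is restored.
$ sudo vi /etc/resolv.conf
# Comment out the unreachable DNS addresses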
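If several machines need the same change, commenting out the entry can be scripted. This is a sketch under assumptions: comment_nameserver is a hypothetical helper, and it only touches lines of the form "nameserver <ip>" that match the given address exactly.

```shell
# Hypothetical helper: comment out one nameserver entry in a resolv.conf-style file.
# $1 = file, $2 = IP address of the unreachable DNS server
comment_nameserver() {
    # Escape the dots so that 10.0.0.1 does not also match e.g. 10.0.0.11 or 10a0b0c1.
    ip_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed "s/^\(nameserver[[:space:]][[:space:]]*$ip_re\)[[:space:]]*$/#\1/" "$1" > "$1.new" &&
        cat "$1.new" > "$1" && rm -f "$1.new"
}

# Example (run as root; the address is a placeholder, not from this post):
#   comment_nameserver /etc/resolv.conf 10.0.0.1
```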
4. Disable the firewall:
$ systemctl stop firewalld
$ systemctl disable firewalld
5. Disable SELinux:
$ sudo setenforce 0
$ sudo vi /etc/selinux/config
SELINUX=disabled
6. Turn off the swap partition:
$ sudo swapoff -a
$ sudo vi /etc/fstab
# Comment out the swap auto-mount entry, then use free -m to confirm that swap is off
$ free -m
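The /etc/fstab edit can be sketched as a small function for when there are many nodes to prepare. comment_swap_entries is a hypothetical name; it assumes the standard fstab layout, where the third field is the filesystem type.

```shell
# Hypothetical helper: comment out every active swap entry in an fstab-style file.
# $1 = file
comment_swap_entries() {
    # Lines that are already comments are left alone; lines whose type field
    # (third column) is "swap" get a leading '#'.
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' "$1" > "$1.new" &&
        cat "$1.new" > "$1" && rm -f "$1.new"
}

# Example (run as root, after "swapoff -a"):
#   comment_swap_entries /etc/fstab
```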
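7. Confirm that the master can ssh to every node.
8. Storage configuration. I chose to set up a local NFS server; the official documentation lists other options: https://kubesphere.io/docs/v2.1/zh-CN/installation/storage-configuration/
NFS server (128.160.184.190):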
$ sudo yum install -y nfs-utils.x86_64
$ sudo rpm -qa nfs-utils
nfs-utils-1.3.0-0.61.el7.x86_64
$ sudo systemctl start nfs
$ sudo systemctl enable nfs
$ sudo systemctl status nfs
$ sudo mkdir /nfsdata
$ sudo vi /etc/exports
/nfsdata 128.160.184.0/24(rw,no_root_squash)
$ sudo systemctl reload nfs
$ showmount -e 127.0.0.1
Export list for 127.0.0.1:
/nfsdata 128.160.184.0/24
NFS client, mounting the NFS share:
$ sudo mkdir /nfsdata
$ sudo mount -t nfs 128.160.184.190:/nfsdata /nfsdata
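After mounting, a quick write test on the client can catch export options that are wrong (for example a share that ended up read-only). This is a sketch only: nfs_rw_check is a hypothetical helper that checks whether a directory can be written to and read back. It does not verify that the directory is actually an NFS mount.

```shell
# Hypothetical helper: check that a directory can be written to and read back.
# $1 = mount point
nfs_rw_check() {
    probe="$1/.rw-probe-$$"
    # Try to create a probe file; hide the shell's own error message on failure.
    { echo ok > "$probe"; } 2>/dev/null || { echo "cannot write to $1"; return 1; }
    if [ "$(cat "$probe")" != "ok" ]; then
        echo "cannot read back from $1"
        rm -f "$probe"
        return 1
    fi
    rm -f "$probe"
    echo "$1 is readable and writable"
}

# Example (as root, on the client that mounted the share):
#   nfs_rw_check /nfsdata
```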
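With the basic environment above in place, start the installation:
9. Download the offline packages; see https://kubesphere.com.cn/forum/d/230-v2-1-0
I used master1 as the taskbox: I downloaded kubesphere-all-offline-v2.1.0.tar.gz and additional-images.tar and uploaded them to master1 with sftp put.
Using the additional images (they include the istio and s2i related images):
Before installation: download additional-images.tar, copy it to Repos/images-v2.1.0, and run the installation as usual.
After installation: download additional-images.tar, copy it to Repos/images-v2.1.0, and run scripts/os/tag.sh.
10. Edit hosts.ini: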
[all]
master1 ansible_connection=local ip=128.160.184.110
node1 ansible_host=128.160.184.120 ip=128.160.184.120 ansible_ssh_pass=root
node2 ansible_host=128.160.184.121 ip=128.160.184.121 ansible_ssh_pass=root
node4 ansible_host=128.160.184.112 ip=128.160.184.112 ansible_ssh_pass=root
[local-registry]
master1
[kube-master]
master1
[kube-node]
node1
node2
node4
[etcd]
master1
[k8s-cluster:children]
kube-node
kube-master
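I originally wanted several masters and listed multiple master nodes under kube-master, but the installation failed with a message asking for loadbalancer_apiserver. Reading the install scripts, I found the check: if kube-master in hosts.ini lists more than one node, loadbalancer_apiserver must be configured in common.yaml.
So for a highly available cluster, see:
https://kubesphere.io/docs/v2.1/zh-CN/installation/master-ha/
https://docs.qingcloud.com/product/network/loadbalancer
https://kubesphere.com.cn/forum/d/431-keepalived-haproxy-kubesphere
I have not tried this myself; I will look into it when I need it.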
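The rule described above (more than one kube-master requires loadbalancer_apiserver) can be checked before running the installer. The sketch below is my own approximation, not the installer's actual code: count_group_hosts and check_master_lb are hypothetical names, and the check assumes that an uncommented loadbalancer_apiserver: key in common.yaml means the setting is configured.

```shell
# Hypothetical helper: count the hosts listed under one [group] of an INI inventory.
# $1 = inventory file, $2 = group name
count_group_hosts() {
    awk -v group="[$2]" '
        /^\[/ { active = ($1 == group); next }       # section header: enter or leave the group
        active && NF && $0 !~ /^[[:space:]]*#/ { n++ } # count non-empty, non-comment lines
        END { print n + 0 }
    ' "$1"
}

# Hypothetical pre-flight check. $1 = hosts.ini, $2 = common.yaml
check_master_lb() {
    masters=$(count_group_hosts "$1" kube-master)
    if [ "$masters" -gt 1 ] && ! grep -Eq '^[[:space:]]*loadbalancer_apiserver:' "$2"; then
        echo "kube-master lists $masters hosts but loadbalancer_apiserver is not set in $2"
        return 1
    fi
    echo "ok: $masters master(s)"
}

# Example (the conf/ paths are an assumption about the installer layout):
#   check_master_lb conf/hosts.ini conf/common.yaml
```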
11. Edit common.yaml:
Switch the storage from Local Volume to NFS:
local_volume_enabled: false
local_volume_is_default_class: false
local_volume_storage_class: local
nfs_client_enabled: true
nfs_client_is_default_class: true
# Hostname of the NFS server(ip or hostname)
nfs_server: 128.160.184.190
# Basepath of the mount point
nfs_path: /nfsdata
nfs_vers3_enabled: false
nfs_archiveOnDelete: false
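Enable the pluggable components:
I enabled everything except DevOps, Harbor and GitLab.
12. Run scripts/install.sh as root.
Installation problems with KubeSphere 2.1:
The problems I ran into during the KubeSphere 2.0.2 offline installation were basically all caused by the environment configuration covered above. If you follow that configuration and still have problems, check the official resources.
New problems I hit during the KubeSphere 2.1 offline installation: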
Requirement already satisfied: pycparser in /usr/lib/python2.7/site-packages (from cffi>=1.4.1->cryptography->ansible==2.7.12->-r os/requirements.txt (line 1)) (2.14)
ERROR: boto3 1.4.6 requires botocore<1.7.0,>=1.6.0, which is not installed.
ERROR: msrest 0.5.4 requires typing, which is not installed.
ERROR: ipaserver 4.6.5 requires dbus-python, which is not installed.
ERROR: ipaserver 4.6.5 requires dogtag-pki, which is not installed.
ERROR: ipaserver 4.6.5 requires jwcrypto>=0.4.2, which is not installed.
ERROR: ipaserver 4.6.5 has requirement dnspython>=1.15, but you'll have dnspython 1.12.0 which is incompatible.
ERROR: ipaserver 4.6.5 has requirement python-ldap>=3.0.0b1, but you'll have python-ldap 2.4.15 which is incompatible.
ERROR: requests 2.22.0 has requirement idna<2.9,>=2.5, but you'll have idna 2.4 which is incompatible.
ERROR: azure-keyvault 1.0.0 has requirement cryptography>=2.1.4, but you'll have cryptography 1.7.2 which is incompatible.
ERROR: ipapython 4.6.5 has requirement dnspython>=1.15, but you'll have dnspython 1.12.0 which is incompatible.
ERROR: ipapython 4.6.5 has requirement python-ldap>=3.0.0b1, but you'll have python-ldap 2.4.15 which is incompatible.
Installing collected packages: netaddr, pbr, chardet, urllib3, certifi, requests, hvac, jmespath, ruamel.ordereddict, ruamel.yaml
Found existing installation: netaddr 0.7.5
ERROR: Cannot uninstall 'netaddr'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
The fix for this is to install netaddr manually with --ignore-installed:
$ sudo pip install --ignore-installed netaddr --no-index --find-links=/kubeinstaller/pip_repo/pip27/iso
$ sudo pip install --ignore-installed requests --no-index --find-links=/kubeinstaller/pip_repo/pip27/iso
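The other missing packages also have to be downloaded and installed. Search https://pypi.org/ for the matching versions and download the whl or tar.gz files for offline installation: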
$ sudo pip install msrest==0.5.4
$ sudo pip install python_dateutil-2.8.0-py2.py3-none-any.whl
$ sudo pip install botocore-1.6.8-py2.py3-none-any.whl
$ sudo pip install typing-3.7.4.1-py2-none-any.whl
$ sudo pip install dnspython-1.16.0-py2.py3-none-any.whl
$ sudo pip install dogtag_pki-10.7.4.1-py2.py3-none-any.whl
$ sudo pip install dbus-python-1.2.12.tar.gz
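To work out which packages to fetch from PyPI, the "which is not installed" lines can be pulled out of the pip output mechanically. This is a sketch: missing_pip_deps is a hypothetical helper, and it relies on the exact wording of the ERROR lines shown above, which may differ in other pip versions.

```shell
# Hypothetical helper: read pip output on stdin and print, once each, the
# requirement named in every "requires X, which is not installed." line.
missing_pip_deps() {
    sed -n 's/^ERROR: .* requires \(.*\), which is not installed\.$/\1/p' | sort -u
}

# Example (install.log is a placeholder for wherever the pip output was saved):
#   missing_pip_deps < install.log
```

The version constraint is kept in the output (for example botocore<1.7.0,>=1.6.0), since it tells you which release to download.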
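Running the installer again produced a new error:
NameError: name 'platform_system' is not defined
failed!
This means the ansible command itself is broken; ansible --version reports the same error.
I tried upgrading ansible to the latest version:
$ sudo pip install ansible-2.9.2.tar.gz
$ ansible --version
That worked, and the installation continued without problems: ansible distributed Docker and the other files to the nodes successfully. But distributing kubeadm failed: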
TASK [kubernetes/node : install | Copy kubeadm binary from download dir] ***************************************************************************************************************************************
Wednesday 11 December 2019 20:10:02 +0800 (0:00:00.250) 0:08:57.314 ****
skipping: [master1]
fatal: [node4 -> 128.160.184.112]: FAILED! => {
    "changed": false,
    "cmd": "sshpass",
    "rc": 2
}
MSG:
[Errno 2] No such file or directory
I looked at the file k8s/roles/kubernetes/node/tasks/install.yml:
- name: install | Copy kubeadm binary from download dir
  synchronize:
    src: "{{ local_release_dir }}/kubeadm-{{ kubeadm_version }}-{{ image_arch }}"
    dest: "{{ bin_dir }}/kubeadm"
    compress: no
    perms: yes
    owner: no
    group: no
  delegate_to: "{{ inventory_hostname }}"
  tags:
    - kubeadm
  when:
    - not inventory_hostname in groups['kube-master']
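The src here appears to resolve to /tmp/releases. I checked, and it contains these files:
calicoctl cni-plugins-linux-amd64-v0.8.1.tgz helm hyperkube-v1.15.5-amd64 images istioctl kubeadm-v1.15.5-amd64
So the kubeadm-v1.15.5-amd64 file is there. Why the [Errno 2] No such file or directory error? Note that the cmd field in the error is sshpass, so the missing file may be the sshpass executable rather than the kubeadm binary.
Running grep -r -e "kube_version" .
I found configuration files with kube_version set to 1.15.3, so as an experiment I edited these files:
$ vi ./k8s/roles/kubespray-defaults/defaults/main.yaml
$ vi ./k8s/inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml
changing kube_version from 1.15.3 to 1.15.5.
Re-running ./install.sh still produced the same error.
So the cause had to be an incompatibility between the installer's ansible scripts and the ansible version. I rolled back to 2.7.12: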
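Uninstall ansible:
$ sudo pip uninstall ansible
Reinstall version 2.7.12:
$ sudo pip install ansible-2.7.12.tar.gz
$ ansible --version
NameError: name 'platform_system' is not defined appeared again.
Update setuptools:
$ sudo pip install -U setuptools
or install it offline:
$ sudo pip install setuptools-42.0.2-py2.py3-none-any.whl
Reinstalling requests with the requests==2.22.0 version from the offline package still did not help, so I installed an older requests:
$ pip install requests-2.6.0-py2.py3-none-any.whl
ansible --version now ran successfully.
Running ./install.sh again, kubeadm was distributed to the nodes and the installation succeeded.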
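PS:
When you hit an installation problem, read the install log, then search every file under the installer directory for a keyword from it. This is very useful:
$ grep -r -e "kubeadm binary" ~/kubesphere-all-offline-v2.1.0
This usually points you to the script that produced the error, and you can trace the problem from there.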
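A slightly more convenient variant of this search is sketched below. find_task is a hypothetical wrapper; it adds line numbers and treats the search text as a literal string, which helps when the text copied from the log contains characters such as [ or | that grep would otherwise interpret as a pattern.

```shell
# Hypothetical wrapper around the grep search shown above.
# $1 = text copied from the log, $2 = directory to search
find_task() {
    # -r recurse into the directory, -n print line numbers,
    # -F match the text literally instead of as a regular expression
    grep -rnF -- "$1" "$2"
}

# Example:
#   find_task "install | Copy kubeadm binary" ~/kubesphere-all-offline-v2.1.0
```

Each hit is printed as path:line:text, so the task definition can be opened directly at the reported line.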