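Server | |
---|---|
Quantity | >1 (configure according to actual needs) |
Configuration | 8 cores / 16 GB memory / 500 GB disk / 10 Mbps bandwidth |
Operating system | CentOS Linux 7.2 or later / Ubuntu 16.04 or later |
Dependencies | (see 4.5 Software environment initialization) |
User | User: app, group: apps (the app user must be able to run sudo su root without a password) |
File system | 1. Mount the 500 GB disk at /data; 2. Create the /data/projects directory, owned by app:apps |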
party | partyid | Hostname | IP address | Operating system | Installed software | Services |
---|---|---|---|---|---|---|
PartyA | 10000 | VM_0_1_centos | 192.168.0.1 | CentOS 7.2/Ubuntu 16.04 | fate, eggroll, mysql | fate_flow, fateboard, clustermanager, nodemanager, mysql |
PartyA | 10000 | VM_0_2_centos | 192.168.0.2 | CentOS 7.2/Ubuntu 16.04 | fate, eggroll | nodemanager, rollsite |
PartyB | 9999 | VM_0_3_centos | 192.168.0.3 | CentOS 7.2/Ubuntu 16.04 | fate, eggroll, mysql | all |
Architecture diagram:
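Software | Component | Port | Description |
---|---|---|---|
fate | fate_flow | 9360; 9380 | Federated learning job pipeline management module; each party can have only one instance of this service |
fate | fateboard | 8080 | Federated learning process visualization module; each party can have only one instance of this service |
eggroll | clustermanager | 4670 | The cluster manager manages the cluster; each party can have only one instance of this service |
eggroll | nodemanager | 4671 | The node manager manages the resources of each machine; a party can have several instances, but each server can run only one |
eggroll | rollsite | 9370 | Cross-site (cross-party) communication component, equivalent to proxy + federation; each party can have only one instance of this service |
mysql | mysql | 3306 | Data storage, required by clustermanager and fateflow; each party needs only one instance of this service |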
1) Set the hostname

Run as root on 192.168.0.1:

    hostnamectl set-hostname VM_0_1_centos

Run as root on 192.168.0.2:

    hostnamectl set-hostname VM_0_2_centos

Run as root on 192.168.0.3:

    hostnamectl set-hostname VM_0_3_centos

2) Add host mappings

Run as root on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

    vim /etc/hosts
    192.168.0.1 VM_0_1_centos
    192.168.0.2 VM_0_2_centos
    192.168.0.3 VM_0_3_centos

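
Editing /etc/hosts by hand on three machines is easy to get wrong, so the mapping step can also be scripted. The following is a minimal sketch, not part of FATE: the function name `add_host_entry` and the `HOSTS_FILE` variable are hypothetical. It appends an entry only when the hostname is not yet mapped, so it can be re-run safely.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of FATE: append "IP HOSTNAME" to a hosts file
# unless an entry for that hostname already exists.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
    local ip="$1" name="$2"
    # Ignore comment lines; compare the hostname against every field after the IP.
    if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$HOSTS_FILE"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
}

# Usage (as root), with the addresses from the deployment plan:
#   add_host_entry 192.168.0.1 VM_0_1_centos
#   add_host_entry 192.168.0.2 VM_0_2_centos
#   add_host_entry 192.168.0.3 VM_0_3_centos
```
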
Run as root on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):
Check whether SELinux is installed:
On CentOS, run: rpm -qa | grep selinux
On Ubuntu, run: apt list --installed | grep selinux
If SELinux is installed, run: setenforce 0
Run as root on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

    vim /etc/security/limits.conf
    * soft nofile 65536
    * hard nofile 65536

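
Entries in limits.conf are applied at login, so the new limit is only visible in a fresh session. A small check like the one below can confirm it; this is a sketch, and the function name `check_nofile` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the soft open-file limit of the current
# shell is at least the requested minimum (65536 in limits.conf above).
check_nofile() {
    local min="${1:-65536}" cur
    cur="$(ulimit -Sn)"
    [ "$cur" = "unlimited" ] && return 0
    [ "$cur" -ge "$min" ]
}

# Usage, in a new login session of the app user:
#   check_nofile 65536 && echo "nofile limit OK" || echo "nofile limit too low"
```
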
Run as root on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

On CentOS:

    systemctl disable firewalld.service
    systemctl stop firewalld.service
    systemctl status firewalld.service

On Ubuntu:

    ufw disable
    ufw status

Run as root on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

1) Create the user

    groupadd -g 6000 apps
    useradd -s /bin/bash -g apps -d /home/app app
    passwd app

2) Create the directories

    mkdir -p /data/projects/fate
    mkdir -p /data/projects/install
    chown -R app:apps /data/projects

3) Install dependencies

    #centos
    yum -y install gcc gcc-c++ make openssl-devel gmp-devel mpfr-devel libmpc-devel libaio numactl autoconf automake libtool libffi-devel snappy snappy-devel zlib zlib-devel bzip2 bzip2-devel lz4-devel libasan lsof sysstat telnet psmisc
    #ubuntu
    apt-get install -y gcc g++ make openssl supervisor libgmp-dev libmpfr-dev libmpc-dev libaio1 libaio-dev numactl autoconf automake libtool libffi-dev libssl1.0.0 libssl-dev liblz4-1 liblz4-dev liblz4-1-dbg liblz4-tool zlib1g zlib1g-dbg zlib1g-dev
    cd /usr/lib/x86_64-linux-gnu
    if [ ! -f "libssl.so.10" ];then
       ln -s libssl.so.1.0.0 libssl.so.10
       ln -s libcrypto.so.1.0.0 libcrypto.so.10
    fi

Run as root on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

In production, in-memory computation requires an additional 128 GB of swap space. For reference:

    cd /data
    dd if=/dev/zero of=/data/swapfile128G bs=1024 count=134217728
    mkswap /data/swapfile128G
    swapon /data/swapfile128G
    cat /proc/swaps
    echo '/data/swapfile128G swap swap defaults 0 0' >> /etc/fstab

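
The `count` passed to `dd` follows from the target size and the block size: 128 GiB divided into 1024-byte blocks. The sketch below reproduces that arithmetic so the value can be recomputed for a different swap size; the function name `swap_block_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of dd blocks needed for a swap file of
# SIZE_GIB gibibytes at a given block size in bytes (default 1024).
swap_block_count() {
    local size_gib="$1" bs="${2:-1024}"
    echo $(( size_gib * 1024 * 1024 * 1024 / bs ))
}

# 128 GiB at bs=1024 gives the count used in the dd command above.
swap_block_count 128   # prints 134217728
```
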
Note: this guide assumes the installation directory /data/projects/install and the user app. Adjust both to your actual environment when installing.

Run as the app user on the target server (192.168.0.1, which has Internet access):

    mkdir -p /data/projects/install
    cd /data/projects/install
    wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/python-env-1.4.0-rc3.tar.gz
    wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/jdk-8u192-linux-x64.tar.gz
    wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/mysql-1.4.0-rc3.tar.gz
    wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/FATE_install_1.4.0-release.tar.gz
    #Copy to 192.168.0.2 and 192.168.0.3
    scp *.tar.gz app@192.168.0.2:/data/projects/install
    scp *.tar.gz app@192.168.0.3:/data/projects/install

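
Before continuing on each host, it can help to confirm that all four archives arrived and are not empty. The sketch below lists any that are missing; the function name `missing_packages` is hypothetical, and the file names are the ones downloaded above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the expected archives that are absent from DIR
# or have zero size. The file names are the four packages downloaded above.
missing_packages() {
    local dir="${1:-/data/projects/install}" f
    for f in \
        python-env-1.4.0-rc3.tar.gz \
        jdk-8u192-linux-x64.tar.gz \
        mysql-1.4.0-rc3.tar.gz \
        FATE_install_1.4.0-release.tar.gz
    do
        [ -s "$dir/$f" ] || echo "$f"
    done
}

# Usage on every target server; no output means everything is in place:
#   missing_packages /data/projects/install
```
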
Run as the app user on the target servers (192.168.0.1, 192.168.0.3):

1) Install MySQL:

    #Create the MySQL root directories
    mkdir -p /data/projects/fate/common/mysql
    mkdir -p /data/projects/fate/data/mysql
    #Unpack the package
    cd /data/projects/install
    tar xzvf mysql-1.4.0-rc3.tar.gz
    cd mysql
    tar xf mysql-8.0.13.tar.gz -C /data/projects/fate/common/mysql
    #Configuration
    mkdir -p /data/projects/fate/common/mysql/mysql-8.0.13/{conf,run,logs}
    cp service.sh /data/projects/fate/common/mysql/mysql-8.0.13/
    cp my.cnf /data/projects/fate/common/mysql/mysql-8.0.13/conf
    #Initialize
    cd /data/projects/fate/common/mysql/mysql-8.0.13/
    ./bin/mysqld --initialize --user=app --basedir=/data/projects/fate/common/mysql/mysql-8.0.13 --datadir=/data/projects/fate/data/mysql > logs/init.log 2>&1
    cat logs/init.log |grep root@localhost
    #Note: the text after "root@localhost:" in the output is the initial password of the MySQL root user. Record it; it is needed below to change the password
    #Start the service
    cd /data/projects/fate/common/mysql/mysql-8.0.13/
    nohup ./bin/mysqld_safe --defaults-file=./conf/my.cnf --user=app >>logs/mysqld.log 2>&1 &
    #Change the MySQL root password
    cd /data/projects/fate/common/mysql/mysql-8.0.13/
    ./bin/mysqladmin -h 127.0.0.1 -P 3306 -S ./run/mysql.sock -u root -p password "fate_dev"
    Enter Password: [enter the initial root password]
    #Verify the login
    cd /data/projects/fate/common/mysql/mysql-8.0.13/
    ./bin/mysql -u root -p -S ./run/mysql.sock
    Enter Password: [enter the new root password: fate_dev]

2) Create databases, grant privileges and insert the service configuration

    cd /data/projects/fate/common/mysql/mysql-8.0.13/
    ./bin/mysql -u root -p -S ./run/mysql.sock
    Enter Password: [fate_dev]
    #Create the eggroll database and tables
    mysql>source /data/projects/install/mysql/create-eggroll-meta-tables.sql;
    #Create the fate_flow database
    mysql>CREATE DATABASE IF NOT EXISTS fate_flow;
    #Create remote users and grant privileges
    1) Run on 192.168.0.1
    mysql>CREATE USER 'fate'@'192.168.0.1' IDENTIFIED BY 'fate_dev';
    mysql>GRANT ALL ON *.* TO 'fate'@'192.168.0.1';
    mysql>CREATE USER 'fate'@'192.168.0.2' IDENTIFIED BY 'fate_dev';
    mysql>GRANT ALL ON *.* TO 'fate'@'192.168.0.2';
    mysql>flush privileges;
    2) Run on 192.168.0.3
    mysql>CREATE USER 'fate'@'192.168.0.3' IDENTIFIED BY 'fate_dev';
    mysql>GRANT ALL ON *.* TO 'fate'@'192.168.0.3';
    mysql>flush privileges;
    #Insert the configuration data (clustermanager listens on 4670, nodemanager on 4671)
    1) Run on 192.168.0.1
    mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.1', '4670', 'CLUSTER_MANAGER', 'HEALTHY');
    mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.1', '4671', 'NODE_MANAGER', 'HEALTHY');
    mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.2', '4671', 'NODE_MANAGER', 'HEALTHY');
    2) Run on 192.168.0.3
    mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.3', '4670', 'CLUSTER_MANAGER', 'HEALTHY');
    mysql>INSERT INTO server_node (host, port, node_type, status) values ('192.168.0.3', '4671', 'NODE_MANAGER', 'HEALTHY');
    #Verify
    mysql>select User,Host from mysql.user;
    mysql>show databases;
    mysql>use eggroll_meta;
    mysql>show tables;
    mysql>select * from server_node;

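
When a party has more nodemanager hosts than in this example, the `server_node` rows can be generated instead of typed. The sketch below only prints SQL text and does not connect to MySQL; the function name `server_node_sql` is hypothetical, and the ports are the clustermanager (4670) and nodemanager (4671) ports from the port table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one INSERT statement for the server_node table.
# node_type is CLUSTER_MANAGER or NODE_MANAGER, as in the statements above.
server_node_sql() {
    local host="$1" port="$2" node_type="$3"
    printf "INSERT INTO server_node (host, port, node_type, status) values ('%s', '%s', '%s', 'HEALTHY');\n" \
        "$host" "$port" "$node_type"
}

# Statements for the party hosted on 192.168.0.1 and 192.168.0.2:
server_node_sql 192.168.0.1 4670 CLUSTER_MANAGER
server_node_sql 192.168.0.1 4671 NODE_MANAGER
server_node_sql 192.168.0.2 4671 NODE_MANAGER
```

The output can be reviewed and then pasted into the mysql prompt after the eggroll tables have been created.
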
Run as the app user on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

    #Create the JDK installation directory
    mkdir -p /data/projects/fate/common/jdk
    #Unpack
    cd /data/projects/install
    tar xzf jdk-8u192-linux-x64.tar.gz -C /data/projects/fate/common/jdk
    cd /data/projects/fate/common/jdk
    mv jdk1.8.0_192 jdk-8u192

Run as the app user on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

    #Create the directory for the Python virtual environment
    mkdir -p /data/projects/fate/common/python
    #Install miniconda3
    cd /data/projects/install
    tar xvf python-env-1.4.0-rc3.tar.gz
    cd python-env
    sh Miniconda3-4.5.4-Linux-x86_64.sh -b -p /data/projects/fate/common/miniconda3
    #Install virtualenv and create the virtual environment
    /data/projects/fate/common/miniconda3/bin/pip install virtualenv-20.0.18-py2.py3-none-any.whl -f . --no-index
    /data/projects/fate/common/miniconda3/bin/virtualenv -p /data/projects/fate/common/miniconda3/bin/python3.6 --no-wheel --no-setuptools --no-download /data/projects/fate/common/python/venv
    #Install the dependency packages
    tar xvf pip-packages-fate-*.tar.gz
    source /data/projects/fate/common/python/venv/bin/activate
    pip install setuptools-42.0.2-py2.py3-none-any.whl
    pip install -r pip-packages-fate-1.4.0/requirements.txt -f ./pip-packages-fate-1.4.0 --no-index
    pip list | wc -l
    #The result should be 158


    #Deploy the software
    #Run as the app user on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):
    cd /data/projects/install
    tar xf FATE_install_1.4.0-release.tar.gz
    cd FATE_install_1.4*
    tar xvf python.tar.gz -C /data/projects/fate/
    tar xvf eggroll.tar.gz -C /data/projects/fate
    #Run as the app user on the target servers (192.168.0.1, 192.168.0.3):
    tar xvf fateboard.tar.gz -C /data/projects/fate
    #Create the environment variable file
    #Run as the app user on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):
    cat >/data/projects/fate/init_env.sh <<EOF
    export PYTHONPATH=/data/projects/fate/python:/data/projects/fate/eggroll/python
    export EGGROLL_HOME=/data/projects/fate/eggroll/
    venv=/data/projects/fate/common/python/venv
    source \${venv}/bin/activate
    export JAVA_HOME=/data/projects/fate/common/jdk/jdk-8u192
    export PATH=\$PATH:\$JAVA_HOME/bin
    EOF

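This configuration file is shared by rollsite, clustermanager and nodemanager and must be identical on all hosts of the same party. The following items need to be modified:
Database driver, and the connection IP, port, username and password of the database used by the party. The port can usually be left at its default.
eggroll.resourcemanager.clustermanager.jdbc.driver.class.name
eggroll.resourcemanager.clustermanager.jdbc.url
eggroll.resourcemanager.clustermanager.jdbc.username
eggroll.resourcemanager.clustermanager.jdbc.password
IP and port of the party's clustermanager, the nodemanager port and the process tag. The ports can usually be left at their defaults.
eggroll.resourcemanager.clustermanager.host
eggroll.resourcemanager.clustermanager.port
eggroll.resourcemanager.nodemanager.port
eggroll.resourcemanager.process.tag
Python virtual environment path, PYTHONPATH of the business code and JAVA_HOME path. If these paths have not changed, keep the defaults.
eggroll.resourcemanager.bootstrap.egg_pair.venv
eggroll.resourcemanager.bootstrap.egg_pair.pythonpath
eggroll.resourcemanager.bootstrap.roll_pair_master.javahome
IP and port of the party's rollsite and the Party ID of this party. The rollsite port can usually be left at its default.
eggroll.rollsite.host
eggroll.rollsite.port
eggroll.rollsite.party.id
The parameters above can be set manually by following the example below, or by running the following commands:
Configuration file: /data/projects/fate/eggroll/conf/eggroll.properties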

    #Modify and run as the app user on the target servers (192.168.0.1, 192.168.0.2)
    cat > /data/projects/fate/eggroll/conf/eggroll.properties <<EOF
    [eggroll]
    #db connect inf
    eggroll.resourcemanager.clustermanager.jdbc.driver.class.name=com.mysql.cj.jdbc.Driver
    eggroll.resourcemanager.clustermanager.jdbc.url=jdbc:mysql://192.168.0.1:3306/eggroll_meta?useSSL=false&serverTimezone=UTC&characterEncoding=utf8&allowPublicKeyRetrieval=true
    eggroll.resourcemanager.clustermanager.jdbc.username=fate
    eggroll.resourcemanager.clustermanager.jdbc.password=fate_dev
    eggroll.data.dir=data/
    eggroll.logs.dir=logs/
    #clustermanager & nodemanager
    eggroll.resourcemanager.clustermanager.host=192.168.0.1
    eggroll.resourcemanager.clustermanager.port=4670
    eggroll.resourcemanager.nodemanager.port=4671
    eggroll.resourcemanager.process.tag=fate-host
    eggroll.bootstrap.root.script=bin/eggroll_boot.sh
    eggroll.resourcemanager.bootstrap.egg_pair.exepath=bin/roll_pair/egg_pair_bootstrap.sh
    #python env
    eggroll.resourcemanager.bootstrap.egg_pair.venv=/data/projects/fate/common/python/venv
    #pythonpath, very important, do not modify.
    eggroll.resourcemanager.bootstrap.egg_pair.pythonpath=/data/projects/fate/python:/data/projects/fate/eggroll/python
    eggroll.resourcemanager.bootstrap.egg_pair.filepath=python/eggroll/roll_pair/egg_pair.py
    eggroll.resourcemanager.bootstrap.roll_pair_master.exepath=bin/roll_pair/roll_pair_master_bootstrap.sh
    #javahome
    eggroll.resourcemanager.bootstrap.roll_pair_master.javahome=/data/projects/fate/common/jdk/jdk-8u192
    eggroll.resourcemanager.bootstrap.roll_pair_master.classpath=conf/:lib/*
    eggroll.resourcemanager.bootstrap.roll_pair_master.mainclass=com.webank.eggroll.rollpair.RollPairMasterBootstrap
    eggroll.resourcemanager.bootstrap.roll_pair_master.jvm.options=
    # for roll site. rename in the next round
    eggroll.rollsite.coordinator=webank
    eggroll.rollsite.host=192.168.0.1
    eggroll.rollsite.port=9370
    eggroll.rollsite.party.id=10000
    eggroll.rollsite.route.table.path=conf/route_table.json
    eggroll.session.processors.per.node=4
    eggroll.session.start.timeout.ms=180000
    eggroll.rollsite.adapter.sendbuf.size=1048576
    eggroll.rollpair.transferpair.sendbuf.size=4150000
    EOF

    #Modify and run as the app user on the target server (192.168.0.3)
    cat > /data/projects/fate/eggroll/conf/eggroll.properties <<EOF
    [eggroll]
    #db connect inf
    eggroll.resourcemanager.clustermanager.jdbc.driver.class.name=com.mysql.cj.jdbc.Driver
    eggroll.resourcemanager.clustermanager.jdbc.url=jdbc:mysql://192.168.0.3:3306/eggroll_meta?useSSL=false&serverTimezone=UTC&characterEncoding=utf8&allowPublicKeyRetrieval=true
    eggroll.resourcemanager.clustermanager.jdbc.username=fate
    eggroll.resourcemanager.clustermanager.jdbc.password=fate_dev
    eggroll.data.dir=data/
    eggroll.logs.dir=logs/
    #clustermanager & nodemanager
    eggroll.resourcemanager.clustermanager.host=192.168.0.3
    eggroll.resourcemanager.clustermanager.port=4670
    eggroll.resourcemanager.nodemanager.port=4671
    eggroll.resourcemanager.process.tag=fate-guest
    eggroll.bootstrap.root.script=bin/eggroll_boot.sh
    eggroll.resourcemanager.bootstrap.egg_pair.exepath=bin/roll_pair/egg_pair_bootstrap.sh
    #python env
    eggroll.resourcemanager.bootstrap.egg_pair.venv=/data/projects/fate/common/python/venv
    #pythonpath, very important, do not modify.
    eggroll.resourcemanager.bootstrap.egg_pair.pythonpath=/data/projects/fate/python:/data/projects/fate/eggroll/python
    eggroll.resourcemanager.bootstrap.egg_pair.filepath=python/eggroll/roll_pair/egg_pair.py
    eggroll.resourcemanager.bootstrap.roll_pair_master.exepath=bin/roll_pair/roll_pair_master_bootstrap.sh
    #javahome
    eggroll.resourcemanager.bootstrap.roll_pair_master.javahome=/data/projects/fate/common/jdk/jdk-8u192
    eggroll.resourcemanager.bootstrap.roll_pair_master.classpath=conf/:lib/*
    eggroll.resourcemanager.bootstrap.roll_pair_master.mainclass=com.webank.eggroll.rollpair.RollPairMasterBootstrap
    eggroll.resourcemanager.bootstrap.roll_pair_master.jvm.options=
    # for roll site. rename in the next round
    eggroll.rollsite.coordinator=webank
    eggroll.rollsite.host=192.168.0.3
    eggroll.rollsite.port=9370
    eggroll.rollsite.party.id=9999
    eggroll.rollsite.route.table.path=conf/route_table.json
    eggroll.session.processors.per.node=4
    eggroll.session.start.timeout.ms=180000
    eggroll.rollsite.adapter.sendbuf.size=1048576
    eggroll.rollpair.transferpair.sendbuf.size=4150000
    EOF

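
Because every host of a party must carry the same eggroll.properties, it is worth reading back the party-specific values after writing the file. The sketch below prints the value of one key; the function name `get_prop` is hypothetical, and it assumes plain `key=value` lines without line continuations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a Java-style properties
# file. Everything after the first "=" is the value, so JDBC URLs that
# contain "=" themselves are returned intact.
get_prop() {
    local file="$1" key="$2"
    awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$file"
}

# Usage:
#   conf=/data/projects/fate/eggroll/conf/eggroll.properties
#   get_prop "$conf" eggroll.rollsite.party.id
#   get_prop "$conf" eggroll.resourcemanager.clustermanager.host
```

Comparing the output across the hosts of one party is a quick way to spot a file that was written with the wrong party's values.
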
This configuration file is used by rollsite to configure routing. It can be set manually by following the example below, or by running the following commands:

Configuration file: /data/projects/fate/eggroll/conf/route_table.json

    #Modify and run as the app user on the target server (192.168.0.2)
    cat > /data/projects/fate/eggroll/conf/route_table.json << EOF
    {
      "route_table":
      {
        "10000":
        {
          "default":[
            {
              "port": 9370,
              "ip": "192.168.0.2"
            }
          ],
          "fateflow":[
            {
              "port": 9360,
              "ip": "192.168.0.1"
            }
          ]
        },
        "9999":
        {
          "default":[
            {
              "port": 9370,
              "ip": "192.168.0.3"
            }
          ]
        }
      },
      "permission":
      {
        "default_allow": true
      }
    }
    EOF

    #Modify and run as the app user on the target server (192.168.0.3)
    cat > /data/projects/fate/eggroll/conf/route_table.json << EOF
    {
      "route_table":
      {
        "9999":
        {
          "default":[
            {
              "port": 9370,
              "ip": "192.168.0.3"
            }
          ],
          "fateflow":[
            {
              "port": 9360,
              "ip": "192.168.0.3"
            }
          ]
        },
        "10000":
        {
          "default":[
            {
              "port": 9370,
              "ip": "192.168.0.2"
            }
          ]
        }
      },
      "permission":
      {
        "default_allow": true
      }
    }
    EOF

- fateflow
  - fateflow IP: host: 192.168.0.1, guest: 192.168.0.3
  - grpc port: 9360
  - http port: 9380
- fateboard
  - fateboard IP: host: 192.168.0.1, guest: 192.168.0.3
  - fateboard port: 8080
- proxy
  - proxy IP: host: 192.168.0.2, guest: 192.168.0.3 (the IP of the rollsite component)
  - proxy port: 9370
This file must be valid JSON, otherwise an error is reported. It can be set manually by following the example below, or by running the following commands.

Configuration file: /data/projects/fate/python/arch/conf/server_conf.json

    #Modify and run as the app user on the target servers (192.168.0.1, 192.168.0.2)
    cat > /data/projects/fate/python/arch/conf/server_conf.json << EOF
    {
        "servers": {
            "fateflow": {
                "host": "192.168.0.1",
                "grpc.port": 9360,
                "http.port": 9380
            },
            "fateboard": {
                "host": "192.168.0.1",
                "port": 8080
            },
            "proxy": {
                "host": "192.168.0.2",
                "port": 9370
            },
            "servings": [
                "127.0.0.1:8000"
            ]
        }
    }
    EOF

    #Modify and run as the app user on the target server (192.168.0.3)
    cat > /data/projects/fate/python/arch/conf/server_conf.json << EOF
    {
        "servers": {
            "fateflow": {
                "host": "192.168.0.3",
                "grpc.port": 9360,
                "http.port": 9380
            },
            "fateboard": {
                "host": "192.168.0.3",
                "port": 8080
            },
            "proxy": {
                "host": "192.168.0.3",
                "port": 9370
            },
            "servings": [
                "127.0.0.1:8000"
            ]
        }
    }
    EOF

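
Since malformed JSON only surfaces as an error when the services start, the files can be checked right after they are written. The sketch below relies on the `json` module from the Python standard library; the function name `validate_json` and the `PYTHON_BIN` variable are hypothetical, and `PYTHON_BIN` may need to point at the interpreter of the FATE virtual environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE parses as JSON.
# PYTHON_BIN is an assumption; any Python 3 interpreter works.
PYTHON_BIN="${PYTHON_BIN:-python3}"

validate_json() {
    "$PYTHON_BIN" -c 'import json, sys; json.load(open(sys.argv[1]))' "$1" 2>/dev/null
}

# Usage:
#   validate_json /data/projects/fate/python/arch/conf/server_conf.json \
#       && echo "server_conf.json is valid JSON"
#   validate_json /data/projects/fate/eggroll/conf/route_table.json \
#       && echo "route_table.json is valid JSON"
```
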
work_mode (1 means cluster mode, the default)
Connection IP, port, account and password of the database
redis IP, port and password (redis is not used for now, so no configuration is needed)
This configuration file must be valid YAML, otherwise parsing fails. It can be set manually by following the example below, or by running the following commands.

Configuration file: /data/projects/fate/python/arch/conf/base_conf.yaml

    #Modify and run as the app user on the target server (192.168.0.1)
    cat > /data/projects/fate/python/arch/conf/base_conf.yaml <<EOF
    work_mode: 1
    fate_flow:
      host: 0.0.0.0
      http_port: 9380
      grpc_port: 9360
    database:
      name: fate_flow
      user: fate
      passwd: fate_dev
      host: 192.168.0.1
      port: 3306
      max_connections: 100
      stale_timeout: 30
    redis:
      host: 127.0.0.1
      port: 6379
      password: WEBANK_2014_fate_dev
      max_connections: 500
      db: 0
    default_model_store_address:
      storage: redis
      host: 127.0.0.1
      port: 6379
      password: fate_dev
      db: 0
    EOF

    #Modify and run as the app user on the target server (192.168.0.3)
    cat > /data/projects/fate/python/arch/conf/base_conf.yaml <<EOF
    work_mode: 1
    fate_flow:
      host: 0.0.0.0
      http_port: 9380
      grpc_port: 9360
    database:
      name: fate_flow
      user: fate
      passwd: fate_dev
      host: 192.168.0.3
      port: 3306
      max_connections: 100
      stale_timeout: 30
    redis:
      host: 127.0.0.1
      port: 6379
      password: WEBANK_2014_fate_dev
      max_connections: 500
      db: 0
    default_model_store_address:
      storage: redis
      host: 127.0.0.1
      port: 6379
      password: fate_dev
      db: 0
    EOF

1) application.properties
Service port
server.port: keep the default
Access URL of fateflow
fateflow.url: host: http://192.168.0.1:9380, guest: http://192.168.0.3:9380
Database connection string, account and password
fateboard.datasource.jdbc-url: host: mysql://192.168.0.1:3306, guest: mysql://192.168.0.3:3306
fateboard.datasource.username: fate
fateboard.datasource.password: fate_dev
The parameters above can be set manually by following the example below, or by running the following commands:

Configuration file: /data/projects/fate/fateboard/conf/application.properties

    #Modify and run as the app user on the target server (192.168.0.1)
    cat > /data/projects/fate/fateboard/conf/application.properties <<EOF
    server.port=8080
    fateflow.url=http://192.168.0.1:9380
    spring.datasource.driver-Class-Name=com.mysql.cj.jdbc.Driver
    spring.http.encoding.charset=UTF-8
    spring.http.encoding.enabled=true
    server.tomcat.uri-encoding=UTF-8
    fateboard.datasource.jdbc-url=jdbc:mysql://192.168.0.1:3306/fate_flow?characterEncoding=utf8&characterSetResults=utf8&autoReconnect=true&failOverReadOnly=false&serverTimezone=GMT%2B8
    fateboard.datasource.username=fate
    fateboard.datasource.password=fate_dev
    server.tomcat.max-threads=1000
    server.tomcat.max-connections=20000
    EOF

    #Modify and run as the app user on the target server (192.168.0.3)
    cat > /data/projects/fate/fateboard/conf/application.properties <<EOF
    server.port=8080
    fateflow.url=http://192.168.0.3:9380
    spring.datasource.driver-Class-Name=com.mysql.cj.jdbc.Driver
    spring.http.encoding.charset=UTF-8
    spring.http.encoding.enabled=true
    server.tomcat.uri-encoding=UTF-8
    fateboard.datasource.jdbc-url=jdbc:mysql://192.168.0.3:3306/fate_flow?characterEncoding=utf8&characterSetResults=utf8&autoReconnect=true&failOverReadOnly=false&serverTimezone=GMT%2B8
    fateboard.datasource.username=fate
    fateboard.datasource.password=fate_dev
    server.tomcat.max-threads=1000
    server.tomcat.max-connections=20000
    EOF

2) service.sh

    #Modify and run as the app user on the target servers (192.168.0.1, 192.168.0.3)
    cd /data/projects/fate/fateboard
    vi service.sh
    export JAVA_HOME=/data/projects/fate/common/jdk/jdk-8u192

Run as the app user on the target server (192.168.0.2):

    #Start the eggroll services
    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/eggroll
    sh ./bin/eggroll.sh rollsite start
    sh ./bin/eggroll.sh nodemanager start

Run as the app user on the target server (192.168.0.1):

    #Start the eggroll services
    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/eggroll
    sh ./bin/eggroll.sh clustermanager start
    sh ./bin/eggroll.sh nodemanager start
    #Start the fate services. fateflow depends on rollsite and mysql, so start it only after eggroll is up on all nodes; otherwise it hangs and reports errors
    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/fate_flow
    sh service.sh start
    cd /data/projects/fate/fateboard
    sh service.sh start

Run as the app user on the target server (192.168.0.3):

    #Start the eggroll services
    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/eggroll
    sh ./bin/eggroll.sh all start
    #Start the fate services
    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/fate_flow
    sh service.sh start
    cd /data/projects/fate/fateboard
    sh service.sh start

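
Because fateflow must not start before rollsite and mysql are reachable, the start order can be enforced with a small wait loop. The sketch below is an assumption-laden helper, not part of FATE: the function name `wait_for_port` is hypothetical, and it relies on bash's `/dev/tcp` redirection, which requires bash built with network redirection support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until HOST:PORT accepts TCP connections, giving
# up after roughly TIMEOUT seconds (one connection attempt per second).
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-60}" waited=0
    # The subshell opens the connection and closes it again on exit.
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}

# Usage on 192.168.0.1, before starting fate_flow:
#   wait_for_port 192.168.0.1 3306 && wait_for_port 192.168.0.2 9370 \
#       && sh service.sh start
```

A refused connection fails immediately, but a connection attempt to a host that silently drops packets can block longer than one second, so the timeout is approximate.
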
1) eggroll logs
/data/projects/fate/eggroll/logs/eggroll/bootstrap.clustermanager.err
/data/projects/fate/eggroll/logs/eggroll/clustermanager.jvm.err.log
/data/projects/fate/eggroll/logs/eggroll/nodemanager.jvm.err.log
/data/projects/fate/eggroll/logs/eggroll/bootstrap.nodemanager.err
/data/projects/fate/eggroll/logs/eggroll/bootstrap.rollsite.err
/data/projects/fate/eggroll/logs/eggroll/rollsite.jvm.err.log
2) fateflow logs
/data/projects/fate/python/logs/fate_flow/
3) fateboard logs
/data/projects/fate/fateboard/logs
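
When a service fails to start, the eggroll error logs listed above are the first place to look. The sketch below lists the ones that are not empty; the function name `nonempty_err_logs` is hypothetical, and a non-empty file is only a starting point for investigation, not proof of a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list error logs under DIR that are not empty.
# The name patterns follow the eggroll log file names listed above.
nonempty_err_logs() {
    local dir="${1:-/data/projects/fate/eggroll/logs/eggroll}"
    find "$dir" -type f \( -name '*.err' -o -name '*.err.log' \) -size +0 2>/dev/null | sort
}

# Usage:
#   nonempty_err_logs                      # default eggroll log directory
#   nonempty_err_logs /some/other/logdir   # any other directory
```
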
This test requires three parameters: guest_partyid, host_partyid and work_mode.

1) Run on 192.168.0.1, with both guest_partyid and host_partyid set to 10000:

    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/examples/toy_example/
    python run_toy_example.py 10000 10000 1

A result similar to the following indicates success:
"2020-04-28 18:26:20,789 - secure_add_guest.py[line:126] - INFO: success to calculate secure_sum, it is 1999.9999999999998"

2) Run on 192.168.0.3, with both guest_partyid and host_partyid set to 9999:

    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/examples/toy_example/
    python run_toy_example.py 9999 9999 1

A result similar to the following indicates success:
"2020-04-28 18:26:20,789 - secure_add_guest.py[line:126] - INFO: success to calculate secure_sum, it is 1999.9999999999998"

Choose 9999 as the guest party and run on 192.168.0.3:

    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/examples/toy_example/
    python run_toy_example.py 9999 10000 1

A result similar to the following indicates success:
"2020-04-28 18:26:20,789 - secure_add_guest.py[line:126] - INFO: success to calculate secure_sum, it is 1999.9999999999998"

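
For scripted checks, the success message can be matched instead of read by eye. The sketch below assumes the toy example prints the message shown above to its standard output or standard error; the function name `toy_example_succeeded` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read toy example output on stdin and succeed if it
# contains the success message shown above.
toy_example_succeeded() {
    grep -q 'success to calculate secure_sum'
}

# Usage: capture the output first, then check it.
#   python run_toy_example.py 9999 10000 1 > toy.log 2>&1
#   toy_example_succeeded < toy.log && echo "toy example passed"
```
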
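On any one egg node of each of the guest and host parties, set the fields guest_id, host_id and arbiter_id in run_task.py as needed.
The file is in the /data/projects/fate/python/examples/min_test_task/ directory.

Run on the host node:

    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/examples/min_test_task/
    sh run.sh host fast

Take the values of "host_table" and "host_namespace" from the test output and pass them as arguments to the guest command below.

Run on the guest node:

    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/examples/min_test_task/
    sh run.sh guest fast ${host_table} ${host_namespace}

Wait a few minutes. A result containing the "success" field means the run succeeded; if the run reports a failure or hangs, the test has failed.
For normal mode, simply replace "fast" with "normal" in the commands; everything else is the same as in fast mode.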
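Fateboard is a web service. Once the fateboard service has started successfully, task information can be viewed at http://192.168.0.1:8080 and http://192.168.0.3:8080; if a firewall is in use, the port must be opened. If fateboard and fateflow are not deployed on the same server, configure the login information of the host running fateflow on the fateboard page: click the gear button at the top right of the page, choose add, then fill in the fateflow host IP, OS user, SSH port and password.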
Run as the app user on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/eggroll

Start/stop/check/restart all modules:

    sh ./bin/eggroll.sh all start/stop/status/restart

Start/stop/check/restart a single module (options: clustermanager, nodemanager, rollsite):

    sh ./bin/eggroll.sh clustermanager start/stop/status/restart

Start/stop/check/restart the fate_flow service:

    source /data/projects/fate/init_env.sh
    cd /data/projects/fate/python/fate_flow
    sh service.sh start|stop|status|restart

When starting modules one by one, start eggroll before fateflow, because fateflow depends on eggroll.

Start/stop/check/restart the fateboard service:

    cd /data/projects/fate/fateboard
    sh service.sh start|stop|status|restart

Start/stop/check/restart the mysql service:

    cd /data/projects/fate/common/mysql/mysql-8.0.13
    sh ./service.sh start|stop|status|restart

Run as the app user on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3):

    #Check whether the processes are running, according to the deployment plan
    ps -ef | grep -i clustermanager
    ps -ef | grep -i nodemanager
    ps -ef | grep -i rollsite
    ps -ef | grep -i fate_flow_server.py
    ps -ef | grep -i fateboard

    #Check whether the ports are listening, according to the deployment plan
    #clustermanager
    netstat -tlnp | grep 4670
    #nodemanager
    netstat -tlnp | grep 4671
    #rollsite
    netstat -tlnp | grep 9370
    #fate_flow_server
    netstat -tlnp | grep 9360
    #fateboard
    netstat -tlnp | grep 8080

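
A plain `grep 4671` also matches longer port numbers such as 44671, so a stricter check compares the port field exactly. The sketch below reads `netstat -tln` style output and prints every expected port that has no listening socket; the function name `missing_ports` is hypothetical, and it assumes the usual netstat column layout with the local address in the fourth column and the state in the sixth.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -tln` style output on stdin and print
# every port from the argument list that has no listening socket.
missing_ports() {
    local listing port
    listing="$(cat)"
    for port in "$@"; do
        # The port is the part of the local address after the last colon.
        printf '%s\n' "$listing" |
            awk -v p="$port" '$6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 } END { exit !found }' ||
            echo "$port"
    done
}

# Usage; pass only the ports that the deployment plan assigns to this host:
#   netstat -tln | missing_ports 4670 4671 9360 8080    # on 192.168.0.1
#   netstat -tln | missing_ports 4671 9370              # on 192.168.0.2
```
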
Service | Log path |
---|---|
eggroll | /data/projects/fate/eggroll/logs |
fate_flow & job logs | /data/projects/fate/python/logs |
fateboard | /data/projects/fate/fateboard/logs |
mysql | /data/projects/fate/common/mysql/mysql-8.0.13/logs |
See the build guide.