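Spark is a solid platform that supports RDDs, analytics, streaming, machine learning, and more.
This post describes how to deploy it on Kubernetes, along with a few things to watch out for. The container images are prebuilt ones published by others (bde2020).
Deploy YAML file
deploy-k8s.yaml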
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: spark-master
  namespace: big-data
  labels:
    app: spark-master
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: spark-master
    spec:
      containers:
        - name: spark-master
          image: bde2020/spark-master:2.3.1-hadoop2.7
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 7077
            - containerPort: 8080
          env:
            - name: ENABLE_INIT_DAEMON
              value: "false"
            - name: SPARK_MASTER_PORT
              value: "7077"
---
apiVersion: v1
kind: Service
metadata:
  name: spark-master-service
  namespace: big-data
spec:
  type: NodePort
  ports:
    - port: 7077
      targetPort: 7077
      protocol: TCP
      name: master
  selector:
    app: spark-master
---
apiVersion: v1
kind: Service
metadata:
  name: spark-webui-service
  namespace: big-data
spec:
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
      name: ui
  selector:
    app: spark-master
  type: NodePort
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: spark-webui-ingress
  namespace: big-data
spec:
  rules:
    - host: spark-webui.data.com
      http:
        paths:
          - backend:
              serviceName: spark-webui-service
              servicePort: 8080
            path: /
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: spark-worker
  namespace: big-data
  labels:
    app: spark-worker
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: bde2020/spark-worker:2.3.1-hadoop2.7
          imagePullPolicy: IfNotPresent
          env:
            - name: SPARK_MASTER
              value: spark://spark-master-service:7077
            - name: ENABLE_INIT_DAEMON
              value: "false"
            - name: SPARK_WORKER_WEBUI_PORT
              value: "8081"
          ports:
            - containerPort: 8081
---
apiVersion: v1
kind: Service
metadata:
  name: spark-worker-service
  namespace: big-data
spec:
  type: NodePort
  ports:
    - port: 8081
      targetPort: 8081
      protocol: TCP
      name: worker
  selector:
    app: spark-worker
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: spark-worker-ingress
  namespace: big-data
spec:
  rules:
    - host: spark-worker.data.com
      http:
        paths:
          - backend:
              serviceName: spark-worker-service
              servicePort: 8081
            path: /
Deploy and run
- Deploy
kubectl apply -f deploy-k8s.yaml
- Result
Access the Spark web UI through the Ingress at the hostname spark-webui.data.com
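Notes
- Naming problem
My usual habit is to give the Deployment and its Service the same name, but here that causes a problem. Kubernetes injects environment variables for every Service into pods by default, so a Service named spark-master produces a variable called SPARK_MASTER_PORT (with a value of the form tcp://<cluster-ip>:<port>), which conflicts with a variable the image already defines. The fix is to rename the Service (here, spark-master-service) and redeploy. The variable in question is set in the image's Dockerfile:
ENV SPARK_MASTER_PORT 7077
- Running Spark jobs
For running jobs, refer to the official demos. I will add details here later.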
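The naming conflict noted above can be illustrated with a small sketch. It models how Kubernetes derives Service environment variable names (Service name uppercased, dashes turned into underscores) and checks them against the variable the image defines. The function names and the cluster IP 10.0.0.1 are hypothetical, chosen only for illustration, and only a subset of the injected variables is modeled (Kubernetes also adds per-port variables such as SPARK_MASTER_PORT_7077_TCP).

```python
# Sketch only: models a subset of the env vars Kubernetes injects per Service.
# Function names and the cluster IP below are illustrative assumptions.

def service_env_vars(service_name, cluster_ip, port, protocol="tcp"):
    """Return the main env vars Kubernetes would inject for one Service."""
    # The Service name is uppercased and dashes become underscores.
    prefix = service_name.upper().replace("-", "_")
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        # Docker-links-style variable: this is the one that causes trouble.
        f"{prefix}_PORT": f"{protocol}://{cluster_ip}:{port}",
    }

# Variable defined by the image's Dockerfile (from the post).
IMAGE_ENV = {"SPARK_MASTER_PORT": "7077"}

def colliding_vars(service_name, image_env, cluster_ip="10.0.0.1", port=7077):
    """Map each colliding name to (image value, injected value)."""
    injected = service_env_vars(service_name, cluster_ip, port)
    return {k: (image_env[k], v) for k, v in injected.items() if k in image_env}

if __name__ == "__main__":
    # Service named like the Deployment: SPARK_MASTER_PORT collides.
    print(colliding_vars("spark-master", IMAGE_ENV))
    # Renamed Service: no collision.
    print(colliding_vars("spark-master-service", IMAGE_ENV))
```

With the Service named spark-master, the injected SPARK_MASTER_PORT holds a tcp:// URL where the image expects a bare port number. With spark-master-service, the injected names all start with SPARK_MASTER_SERVICE_, so nothing overlaps.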