IT Panda Blog

Train Machine Learning models with MLflow, Deploy with Seldon

Posted on 2019-11-10 In machine learning

MLflow

MLflow is an open-source platform for managing the entire machine learning lifecycle. It provides three main services:

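  • MLflow Tracking: records and maintains the code, data, metrics, config, and results of machine learning runs, and displays them in a UI
  • MLflow Projects: packages machine learning code in a reproducible format (for example, with a Docker image as the environment) so that it can run anywhere
  • MLflow Models: standardizes how a machine learning model and its configuration files are packaged, so that it can be developed and deployed together with other platforms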

It supports nearly all machine learning frameworks on the market: TensorFlow, PyTorch, Spark, scikit-learn, R, and more.

It is open source, with committers from Databricks, Microsoft, and a number of other companies.

Seldon

Seldon is an open-source platform that helps deploy machine learning models. Its main focus is model deployment:

  • Can deploy nearly any machine learning model on the market
  • Can deploy both in the cloud and on-premises
  • Exposes monitoring information such as metrics and HTTP traces

Example: Train ML model using MLflow, Deploy using Seldon

Train Model with MLflow

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file (expected in the same directory as this script)
    wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
    data = pd.read_csv(wine_path)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Hyperparameters come from the command line and default to 0.5
    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    mlflow.set_experiment('test')

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print(" RMSE: %s" % rmse)
        print(" MAE: %s" % mae)
        print(" R2: %s" % r2)
        # Log the parameters, metrics, and fitted model to the active run
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        mlflow.sklearn.log_model(lr, "model")
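
To make the three logged metrics concrete, here is a small worked example that computes them by hand in plain Python instead of going through scikit-learn. The function name `eval_metrics_by_hand` and the sample values are made up for illustration. The formulas are the standard definitions of RMSE, MAE, and R2 for a single target column.

```python
import math


def eval_metrics_by_hand(actual, pred):
    """Compute RMSE, MAE and R2 for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    # RMSE: square root of the mean squared error
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAE: mean of the absolute errors
    mae = sum(abs(e) for e in errors) / n
    # R2: 1 - (residual sum of squares / total sum of squares around the mean)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical quality scores, not taken from the wine dataset
actual = [5, 6, 7, 5]
pred = [5.5, 6.0, 6.0, 5.0]
rmse, mae, r2 = eval_metrics_by_hand(actual, pred)
print("RMSE: %.4f  MAE: %.4f  R2: %.4f" % (rmse, mae, r2))
```

Lower RMSE and MAE mean smaller prediction errors, while an R2 closer to 1 means the model explains more of the variance in the target.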

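In the corresponding model storage location you can see an MLmodel file. It describes the model: the serialized model itself (model.pkl), the environment the model was produced in (conda.yaml), and so on. Seldon later reads this information to deploy the model.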

artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.6.5
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.21.3
run_id: 26f04f36493b4982a064bb8d6e9d9b30
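
As a rough illustration of how a serving tool can use this file, the sketch below takes the MLmodel content as a plain dict (what `yaml.safe_load` would return for the file above) and resolves the files that the `python_function` flavor points to. The helper `resolve_pyfunc_flavor` and the directory `/mnt/models` are hypothetical names chosen for the example. This is not Seldon's or MLflow's actual loading code.

```python
import posixpath

# The MLmodel content shown above, written out as the dict a YAML parser would return
mlmodel = {
    "artifact_path": "model",
    "flavors": {
        "python_function": {
            "data": "model.pkl",
            "env": "conda.yaml",
            "loader_module": "mlflow.sklearn",
            "python_version": "3.6.5",
        },
        "sklearn": {
            "pickled_model": "model.pkl",
            "serialization_format": "cloudpickle",
            "sklearn_version": "0.21.3",
        },
    },
    "run_id": "26f04f36493b4982a064bb8d6e9d9b30",
}


def resolve_pyfunc_flavor(model_dir, mlmodel):
    """Return the loader module plus the model and environment file paths."""
    flavor = mlmodel["flavors"].get("python_function")
    if flavor is None:
        raise ValueError("model has no python_function flavor")
    return {
        "loader_module": flavor["loader_module"],
        "model_path": posixpath.join(model_dir, flavor["data"]),
        "env_path": posixpath.join(model_dir, flavor["env"]),
    }


# /mnt/models is a hypothetical directory the artifacts were downloaded to
resolved = resolve_pyfunc_flavor("/mnt/models", mlmodel)
for key, value in resolved.items():
    print(key, "=", value)
```

The point is that everything a deployer needs (which module loads the model, where the pickle lives, which environment to recreate) can be read from the MLmodel file alone.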

Deploy ML model with Seldon

Prerequisites:

  • a k8s cluster
  • Helm 2 installed (the commands below rely on Tiller and the --name flag, both of which were removed in Helm 3)
    curl https://raw.githubusercontent.com/helm/helm/master/scripts/get > get_helm.sh
    chmod 700 get_helm.sh
    ./get_helm.sh
    # Install Tiller; its deployment, service, and pod are created automatically in the kube-system namespace
    helm init
    # Create a service account for Tiller and bind it to the cluster-admin role
    kubectl create serviceaccount --namespace kube-system tiller
    kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
    # Point the Tiller deployment at the new service account
    kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'

Install Seldon

helm install \
seldon-core-operator \
--name seldon-core \
--repo https://storage.googleapis.com/seldon-charts \
--namespace seldon-system \
--set usagemetrics.enabled=true \
--set ambassador.enabled=true

Install Ambassador (a cloud-native gateway for k8s)

helm install stable/ambassador --name ambassador --set crds.keep=false
kubectl rollout status deployment.apps/ambassador

Port forwarding:

# Run the command below in another terminal
kubectl port-forward $(kubectl get pods -l app.kubernetes.io/name=ambassador -o jsonpath='{.items[0].metadata.name}') 8003:8080
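
Once the port-forward is running, a quick way to confirm that something is actually listening on the local port is a plain TCP connection attempt. The sketch below uses only the Python standard library. The helper name `is_port_open` is made up for this example, and 8003 is the local port chosen in the command above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8003 is the local port chosen in the port-forward command above
    if is_port_open("127.0.0.1", 8003):
        print("Ambassador is reachable on localhost:8003")
    else:
        print("Nothing is listening on localhost:8003; is the port-forward running?")
```

This only checks that the port accepts connections. It does not verify that Ambassador routes requests to a model.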

Install Seldon Analytics

# Install Seldon Analytics with Prometheus and Grafana
helm install seldon-core-analytics --name seldon-core-analytics \
--repo https://storage.googleapis.com/seldon-charts \
--set grafana_prom_admin_password=password \
--set persistence.enabled=false

# Run the command below in another terminal
kubectl port-forward \
$(kubectl get pods \
-l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') \
3000:3000

Deploy Model

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: test
spec:
  name: rex
  predictors:
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: s3://mlflow/xxx/artifacts/model
        envSecretRefName: s3-secret
        name: classifier
      name: default
      replicas: 1
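
After the deployment is up, the model can be queried through the Ambassador port-forward. The sketch below builds such a request with the Python standard library. Several things here are assumptions rather than facts from this post: the route `/seldon/<namespace>/<deployment name>/api/v0.1/predictions`, the `default` namespace, the helper name `build_prediction_request`, and the sample row (11 illustrative feature values, in the same column order as `train_x`). Check the route against the Seldon version you installed.

```python
import json
import urllib.request


def build_prediction_request(rows, deployment="test", namespace="default",
                             base_url="http://localhost:8003"):
    """Build a urllib Request carrying rows in Seldon's ndarray payload format."""
    # Assumed route: /seldon/<namespace>/<deployment name>/api/v0.1/predictions
    url = "%s/seldon/%s/%s/api/v0.1/predictions" % (base_url, namespace, deployment)
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# One hypothetical wine sample with 11 feature values (illustrative numbers only)
sample = [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]]
request = build_prediction_request(sample)
print(request.get_method(), request.full_url)

# Uncomment to send the request once the port-forward is running:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.loads(response.read().decode("utf-8")))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before anything touches the cluster.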
Rex

© 2019 – 2020 The author holds the copyright; please credit the source when reposting.