Slide Overview
This is the presentation material by Yasuo Yamamoto of Yahoo's data science division, given at a study session on machine learning hosted by Sansan Innovation Lab. Event details: https://sansan.connpass.com/event/129358/ #33SIL
These slides moved to Speaker Deck in October 2023. For the latest version, see https://speakerdeck.com/lycorptech_jp
Beyond Machine Learning Modeling via SysML / Yasuo YAMAMOTO / 2019.05.28 / SIL Study Session: Machine Learning Edition
5 key questions
How easily can an entirely new algorithmic approach be tested at full scale? (1/5)
What is the transitive closure of all data dependencies? (2/5)
How precisely can the impact of a new change to the system be measured? (3/5)
Does improving one model or signal degrade others? (4/5)
How quickly can new members of the team be brought up to speed? (5/5)
Problems
Slide annotation: "This row only". Machine Learning that Matters, ICML 2012. https://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf
Slide annotation: "So tiny!" (referring to the small "ML code" box in the paper's figure, dwarfed by the surrounding infrastructure). Hidden Technical Debt in Machine Learning Systems, NIPS 2015. https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Conclusions: Measuring Debt and Paying it Off
✓ How easily can an entirely new algorithmic approach be tested at full scale?
✓ What is the transitive closure of all data dependencies?
✓ How precisely can the impact of a new change to the system be measured?
✓ Does improving one model or signal degrade others?
✓ How quickly can new members of the team be brought up to speed?
"Perhaps the most important insight to be gained is that technical debt is an issue that engineers and researchers both need to be aware of. Research solutions that provide a tiny accuracy benefit at the cost of massive increases in system complexity are rarely wise practice. Even the addition of one or two seemingly innocuous data dependencies can slow further progress."
Hidden Technical Debt in Machine Learning Systems, NIPS 2015. https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Microservices, ML Pipelines, and Best Practices
Microservices Architecture (25 March 2014) • James Lewis / Martin Fowler
The term "Microservice Architecture" has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data.
Left of the figure: a monolith; right of the figure: microservices.
https://martinfowler.com/articles/microservices.html
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform, KDD 2017
"We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions."
http://stevenwhang.com/tfx_paper.pdf
AI Platform, Google Cloud. https://cloud.google.com/ai-platform/?hl=ja Google has published AI Platform; Microsoft has published Kubeflow-labs.
Kubeflow ML Pipeline Kubeflow is an open source Kubernetes native platform for developing, orchestrating, deploying, and running scalable and portable machine learning workloads. https://www.kubeflow.org/docs/about/kubeflow/
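To make the pipeline idea concrete, here is a minimal sketch of a two-step pipeline declared with the Kubeflow Pipelines SDK. The kfp v1 API is assumed; the step names and container images are hypothetical.

```python
# Minimal Kubeflow Pipelines sketch (kfp v1 SDK assumed).
# The step names and container images below are hypothetical.
import kfp
from kfp import dsl

@dsl.pipeline(name="train-and-serve", description="Toy two-step ML pipeline")
def train_and_serve():
    preprocess = dsl.ContainerOp(
        name="preprocess",
        image="example.com/preprocess:latest",        # hypothetical image
        file_outputs={"features": "/out/features.csv"},
    )
    train = dsl.ContainerOp(
        name="train",
        image="example.com/train:latest",             # hypothetical image
        arguments=["--features", preprocess.outputs["features"]],
    )
    train.after(preprocess)  # explicit ordering (also implied by the data dependency)

if __name__ == "__main__":
    # Compile to a workflow spec that a Kubeflow cluster can run.
    kfp.compiler.Compiler().compile(train_and_serve, "train_and_serve.yaml")
```

Each step runs as its own container on Kubernetes, which is what makes the workloads "scalable and portable" in the sense of the quote above.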
Machine Learning Logistics • Model Management in the Real World
"Turns out that 90% of the effort required for success in machine learning is not the algorithm or the model or the learning - it's the logistics. Ted Dunning and Ellen Friedman identify what matters in machine learning logistics, what challenges arise, especially in a production setting, and they introduce an innovative solution: the rendezvous architecture."
Best practices for the inference side of a production environment.
https://mapr.com/ebook/machine-learning-logistics/
Input Data as a Stream • The Rendezvous Architecture
"One strength of the rendezvous architecture is that a model can be 'warmed up' before its outputs are actually used so that the stability of the model under production conditions and load can be verified. Another advantage is that models can be 'deployed' or 'undeployed' simply by instructing the rendezvous server to stop (or start) ignoring their output."
An architecture that feeds streaming data to multiple models in parallel.
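A minimal single-process sketch of the rendezvous idea follows. Real deployments use message streams (Kafka, MapR Streams); the model names and scores here are hypothetical. Every model scores every request, all results are recorded, and only the designated live model's answer is returned.

```python
# Toy rendezvous sketch: all models score every request in parallel,
# every result is logged, but only the designated "live" model's
# answer is returned to the caller. Names and scores are hypothetical.
from concurrent.futures import ThreadPoolExecutor

MODELS = {
    "champion": lambda x: 0.9 * x,    # the model whose output is used
    "challenger": lambda x: 0.8 * x,  # warmed up; output recorded but ignored
}
LIVE_MODEL = "champion"

results_log = []  # stand-in for a results stream

def rendezvous(request_id, features):
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, features) for name, fn in MODELS.items()}
        scores = {name: f.result() for name, f in futures.items()}
    # Record everything so challengers can be evaluated under real load...
    results_log.append({"request_id": request_id, "scores": scores})
    # ...but only the live model's output leaves the system.
    return scores[LIVE_MODEL]

print(rendezvous("req-1", 1.0))  # -> 0.9
```

Switching LIVE_MODEL is the "deploy"/"undeploy" operation: the rendezvous simply starts or stops ignoring a model's output.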
The Decoy Model • Handling external state
"Nothing is ever quite as real as real data. As a result, recording live input data is extraordinarily helpful for developing and evaluating machine learning models. This doesn't seem at first like it would be much of an issue, and it is common for new data scientists to make the mistake of trusting that a database or log file is a faithful record of what data was or would have been given to a model."
A mechanism for recording streaming data.
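The decoy is just another "model" subscribed to the same input stream; instead of scoring, it archives exactly what the real models saw. A minimal sketch (the archive path is hypothetical):

```python
# Toy decoy model: it consumes the same inputs as the real models,
# but its only job is to persist them verbatim for later replay.
import json

class DecoyModel:
    def __init__(self, archive_path="decoy_archive.jsonl"):  # hypothetical path
        self.archive_path = archive_path

    def predict(self, request):
        # Archive the raw request exactly as received; return no score.
        with open(self.archive_path, "a") as f:
            f.write(json.dumps(request) + "\n")
        return None
```

Because the decoy records inputs at the same point in the stream where the real models consume them, it captures external state that a database or log file would not faithfully reproduce.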
The Canary Model
"For detecting input shifts, the distribution of outputs for the canary can be recorded and recent distributions can be compared to older distributions. For simple scores, distribution of score can be summarized over short periods of time using a sketch like the t-digest. ... we can use the request identifier to match up all the model results and compare each result against all others."
A mechanism for detecting shifts in the input data and in the model outputs.
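A minimal sketch of the distribution check the canary enables: the book proposes t-digest sketches for summarizing streaming score distributions, but here a two-sample Kolmogorov-Smirnov test over two score windows stands in as a simple batch approximation. The significance threshold is hypothetical.

```python
# Compare the canary's recent score distribution against an older
# reference window; a large divergence suggests an input shift.
# The book suggests t-digest sketches for the streaming case;
# ks_2samp is a batch stand-in, and the 0.05 threshold is hypothetical.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference_scores = rng.beta(2, 5, size=10_000)   # older window (synthetic)
recent_scores = rng.beta(2.5, 5, size=10_000)    # recent window (synthetic)

stat, p_value = ks_2samp(reference_scores, recent_scores)
if p_value < 0.05:
    print(f"possible input shift: KS={stat:.3f}, p={p_value:.3g}")
```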
Env. Between Production and Development • Stream replication
"During development, the raw and input streams can be replicated into a development environment, ... if the new model requires a change to the external data injector, you should replicate the raw stream, instead."
An architecture that replicates production inputs into the development (post-production) environment.
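A minimal sketch of replicating the raw stream from production into a development cluster, assuming Kafka topics and the kafka-python client; the broker addresses and topic name are hypothetical.

```python
# Replicate the production raw stream into the development cluster so
# new models (and, if needed, a changed data injector) can be exercised
# on real traffic. Brokers and topic names here are hypothetical.
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("raw-events", bootstrap_servers="prod-broker:9092")
producer = KafkaProducer(bootstrap_servers="dev-broker:9092")

for message in consumer:
    producer.send("raw-events", message.value)  # mirror byte-for-byte
```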
Q-Q Plot
"A Q-Q plot is a graphical method in statistics for comparing two probability distributions by plotting their quantiles against each other. First, a set of quantile intervals is chosen. A point (x, y) on the plot corresponds to one of the quantiles of the second distribution (y-coordinate) plotted against the same quantile of the first distribution (x-coordinate). The line is thus a parametric curve, parameterized by the quantile interval, that connects the quantiles." (from Wikipedia, "Q-Q plot")
Used to evaluate the differences between the predictions of two models.
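A minimal sketch of a Q-Q plot comparing two models' score distributions with NumPy and Matplotlib; the scores are synthetic, for illustration only.

```python
# Q-Q plot of two models' prediction scores: matching quantiles of one
# distribution are plotted against the other. Points near the diagonal
# mean the two models produce similar score distributions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
scores_a = rng.beta(2, 5, size=5_000)   # e.g. production model scores
scores_b = rng.beta(2, 4, size=5_000)   # e.g. candidate model scores

qs = np.linspace(0.01, 0.99, 99)
plt.plot(np.quantile(scores_a, qs), np.quantile(scores_b, qs), "o", ms=3)
plt.plot([0, 1], [0, 1], "--")          # reference line: identical distributions
plt.xlabel("model A quantiles")
plt.ylabel("model B quantiles")
plt.show()
```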
Beware of Hidden Dependencies • Data Coupling
A: 80% red fraud, 0% blue fraud
B: 0% red fraud, 80% blue fraud
↓ (A is "improved" to A')
A': 40% red fraud, 100% blue fraud
"A' will find 100 percent of the blue frauds and 40 percent of the red frauds. B, working on the transactions A said were clean, will have no blue frauds to find, and it never finds any red frauds anyway, so B won't find any frauds at all."
An example where improving one model degrades the accuracy of the whole system: before the change the two models together catch 80% of red and 80% of blue frauds; afterwards the system catches all blue frauds but only 40% of the red ones (worked through in the sketch below).
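The arithmetic behind the example, worked as a quick sketch; the fraud counts are illustrative.

```python
# Worked example of the data coupling: B only sees transactions that
# A passed through, so "improving" A starves B and halves red recall.
red, blue = 100, 100  # illustrative fraud counts

# Before: A catches 80% of red, 0% of blue; B catches 80% of the blue
# frauds that reach it (all of them, since A flags no blue).
caught_before = 0.80 * red + 0.80 * blue       # 160 frauds

# After: A' catches 40% of red and 100% of blue; B receives no blue
# frauds at all and never caught red ones, so it contributes nothing.
caught_after = 0.40 * red + 1.00 * blue + 0.0  # 140 frauds

print(caught_before, caught_after)  # 160.0 140.0
```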
Rules of Machine Learning: Best Practices for ML Engineering • Martin Zinkevich (Google)
"This document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine-learned model, then you have the necessary background to read this document."
43 rules in total. http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
To make great products: "do machine learning like the great engineer you are, not like the great machine learning expert you aren't."
ML Phase I: Your First Pipeline • Rule #4: Keep the first model simple and get the infrastructure right. “The first model provides the biggest boost to your product, so it doesn't need to be fancy. But you will run into many more infrastructure issues than you expect. … Your simple model provides you with baseline metrics and a baseline behavior that you can use to test more complex models.”
ML Phase I: Your First Pipeline • Rule #5: Test the infrastructure independently from the machine learning. “Make sure that the infrastructure is testable, and that the learning parts of the system are encapsulated so that you can test everything around it.”
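One common way to follow this rule is to hide the learner behind an interface and run the full pipeline with a trivial stand-in model, so infrastructure failures can't be confused with modeling failures. A minimal sketch, with all names hypothetical:

```python
# Test the serving/feature plumbing with a constant "model" so that
# any failure is an infrastructure bug, not a modeling bug.
# All names here are hypothetical.
class ConstantModel:
    """Stand-in model: always predicts 0.5."""
    def predict(self, features):
        return 0.5

def extract_features(request):
    return [request["age"], request["clicks"]]

def serve(request, model):
    features = extract_features(request)   # the plumbing under test
    return {"score": model.predict(features)}

def test_pipeline_with_dummy_model():
    response = serve({"age": 30, "clicks": 7}, ConstantModel())
    assert response == {"score": 0.5}
```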
ML Phase I: Your First Pipeline • Rule #6: Be careful about dropped data when copying pipelines. “Often we create a pipeline by copying an existing pipeline (i.e. cargo cult programming), and the old pipeline drops data that we need for the new pipeline. ... This pipeline was copied to use for Google Plus Stream, where older posts are still meaningful, but the pipeline was still dropping old posts.”
Your First Objective • Rule #14: Starting with an interpretable model makes debugging easier. “Linear regression, logistic regression, and Poisson regression are directly motivated by a probabilistic model. Each prediction is interpretable as a probability or an expected value. This makes them easier to debug than models that use objectives (zero one loss, various hinge losses, et cetera) that try to directly optimize classification accuracy or ranking performance.”
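For instance, with scikit-learn a logistic regression's outputs can be read directly as probabilities, which makes sanity checks straightforward; the data below is synthetic, for illustration only.

```python
# Logistic regression predictions are probabilities, so each output can
# be sanity-checked directly ("does a score of 0.9 really mean ~90%?").
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1_000) > 0).astype(int)

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X[:5])[:, 1]   # interpretable: P(y=1 | x)
print(np.round(probs, 3))
```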
ML Phase II: Feature Engineering • Rule #17: Start with directly observed and reported features as opposed to learned features. “A learned feature is a feature generated either by an external system (such as an unsupervised clustering system) or by the learner itself (e.g. via a factored model or deep learning). Both of these can be useful, but they can have a lot of issues, so they should not be in the first model.”
Human Analysis of the System • Rule #24: Measure the delta between models. “One of the easiest, and sometimes most useful measurements you can make before any users have looked at your new model is to calculate just how different the new results are from production. ... If the difference is very small, then you can tell without running an experiment that there will be little change. If the difference is very large, then you want to make sure that the change is good.”
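A minimal sketch of measuring the delta for a ranking system: for a sample of queries, compare each model's top-k results and average the divergence. The function name, data, and choice of k are hypothetical.

```python
# Measure how different a candidate model's top-k results are from
# production, averaged over a sample of queries. The symmetric
# difference of the two top-k sets is a simple, stable delta metric.
def topk_delta(prod_rankings, cand_rankings, k=10):
    deltas = []
    for query in prod_rankings:
        prod = set(prod_rankings[query][:k])
        cand = set(cand_rankings[query][:k])
        deltas.append(len(prod ^ cand) / (2 * k))  # 0 = identical, 1 = disjoint
    return sum(deltas) / len(deltas)

prod = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
cand = {"q1": ["a", "c", "b"], "q2": ["d", "x", "y"]}
print(topk_delta(prod, cand, k=3))  # 0.333... (q1 identical, q2 mostly changed)
```

A small, stable delta suggests the change needs no experiment; a large one means the change should be verified as good before launch, exactly as the rule states.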
Conclusion
Summary
✓ Operating machine learning models in production requires systematic knowledge
✓ Microservices and container technologies have evolved around that knowledge
✓ Technology development is most active at companies and organizations that run cloud businesses or their own data platforms
✓ Kubeflow, TFX, and similar tools are spreading as machine learning pipeline platforms
EOP