Machine Learning with TensorFlow on Google Cloud Platform
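1, To be successful at ML, you need to think not just about creating models but also about serving ML predictions
2, We should make sure that we can process batch data and stream data the same way.
This is the same issue as #1. Many companies fail when they try to improve their business with ML: for example, they build a model but do not know how to keep feeding production data into it for training. Batch data means things like log files and images; stream data means collected metrics and events. Have you thought about how to convert these different kinds of real-world data into the data format your model defines, and then feed it into the model?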
3, You need to be good at data engineering:
Machine Learning
Data Pipeline
Data Analytics
Data Collection
Scalability Reliability Engineering
4, What's the difference between ML and AI?
AI (artificial intelligence) is a discipline: making machines act like humans
ML (machine learning) is a toolset, like neural networks
AI contains ML (ML is one part of AI technology)
5, Older neural networks had just one hidden layer, because of limits in:
computer power
computational tricks
6, Every product at Google has a dozen ML models
Some examples:
Predict product demand
Predict inventory
Predict restocking time
Google photos
Google Translate: when you take a photo of a sign that you do not recognize
Model1: Identify the Sign
Model2: OCR the Characters
Model3: Identify Language
Model4: Translate Language
Model5: Superimpose Text
Model6: Select Correct Font
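For example, when you photograph a sign with Google Translate and it is recognized automatically, at least 6 models are involved: 1, identify the sign (a parking sign, a speed limit sign, and so on) 2, extract the characters 3, identify the language of the characters 4, translate 5, superimpose the translated text on the original sign 6, select a suitable font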
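A minimal sketch of how the six models could be chained, assuming each model is just a function that takes the previous result and returns a new one. The stage names mirror the list above; `make_stage` and `run_pipeline` are hypothetical helpers for illustration, and the stages are placeholders that only record that they ran, not real models.

```python
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage; a real stage would call a trained model."""
    def stage(trace: List[str]) -> List[str]:
        # Return a new list so earlier results are never mutated.
        return trace + [name]
    return stage


# Same order as Model1..Model6 in the notes.
STAGES: List[Stage] = [
    make_stage(name)
    for name in (
        "identify_sign",
        "ocr_characters",
        "identify_language",
        "translate_language",
        "superimpose_text",
        "select_font",
    )
]


def run_pipeline(stages: List[Stage], initial: List[str]) -> List[str]:
    """Feed the output of each stage into the next one."""
    result = initial
    for stage in stages:
        result = stage(result)
    return result


trace = run_pipeline(STAGES, ["photo"])
print(" -> ".join(trace))
```

The point of the sketch is only the shape: a user-facing feature is a chain of small models, so each one can be trained, evaluated, and replaced on its own.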
Smart Reply in Inbox: a complicated ML application
sequence-to-sequence model
the output of the previous model becomes the input of the next model
7, What kinds of problems can ML solve?
Eric Schmidt said ML is about replacing programming, but most of us think of it as predicting from data.
Machine learning scales better than hand-coded rules
for example, when you search for "park" in a search engine:
hand-coded rules are really hard to maintain; ML scales better because it's automated
Google RankBrain (a deep neural network for search ranking that improved performance significantly)
So we reach a conclusion: what kinds of problems can ML solve?
The answer is: anything for which you are writing rules today
8, It's all about data
when you search "coffee near me"
an example is a piece of labelled data; the label for the example above is "Does the user like the result or not?"
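9, Framing an ML problem
cast it as a learning problem (what data is used for training, what is being predicted?)
At the machine learning level: which data to train on, which information to predict, and how to model them as train_data and label (read this together with the TensorFlow sample code: for the digits 0-9 the label can naturally be the digit itself, but for a real problem, how do you define the label of the data?)
cast it as a software problem (what is the API for the service, who will use the service, how is it done today?)
cast it in the framework of a data problem (key actions: collect, analyze)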
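A small sketch of the "learning problem" framing, reusing the "coffee near me" example from item 8. The log records, field names, and the `to_example` helper are assumptions made up for illustration; the only idea taken from the notes is that each example becomes a feature vector plus a label answering "did the user like the result?".

```python
from typing import Dict, List, Tuple

# Hypothetical search logs; the field names and values are illustrative only.
raw_logs: List[Dict[str, object]] = [
    {"query": "coffee near me", "distance_km": 0.4, "rating": 4.5, "user_clicked": True},
    {"query": "coffee near me", "distance_km": 7.2, "rating": 3.1, "user_clicked": False},
    {"query": "coffee near me", "distance_km": 1.1, "rating": 4.8, "user_clicked": True},
]


def to_example(record: Dict[str, object]) -> Tuple[List[float], int]:
    """Turn one raw record into (features, label)."""
    features = [float(record["distance_km"]), float(record["rating"])]
    # The label encodes "did the user like the result?" as 1 or 0.
    label = 1 if record["user_clicked"] else 0
    return features, label


examples = [to_example(record) for record in raw_logs]
train_data = [features for features, _ in examples]
labels = [label for _, label in examples]

print(train_data)
print(labels)
```

For digit images the label is the digit itself; here the label had to be designed (a click stands in for "liked"), which is exactly the framing decision the note asks about.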
Some scenarios
10, Infuse your apps with ML
AUCNET as an example
11, What is a pre-trained model?
GCP provides:
Vision API
Speech API
Jobs API
Translation API
Natural Language API
Video Intelligence API
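12, The ML marketplace is moving towards increasing levels of ML abstraction
The ML market is heading toward higher levels of abstraction (how to understand this? My own take is that it is like the mathematics you meet from primary school through university and on to a master's or PhD)
The core of mathematics is explaining reality through models, and an equation like y = kx + b clearly captures far fewer real-world problems than Fourier analysis can abstract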
13,Build a data strategy around ML
14,Simple ML and More Data > Fancy ML and Small Data
so spend your energy collecting more data, not only in quantity but also in variety
15, How to apply ML successfully?
Collecting data is often the longest and hardest part of an ML project, and the most likely to fail
collecting data includes rating; rating means finding labels for the data
ML is a journey towards automation and scale
when we talk about ML, most engineers keep thinking about training, but the true utility of ML comes during prediction
your models have to work on streaming data
models sometimes fail because of something called training-serving skew
to reduce this skew, take the same code that was used to process historical data during training and reuse it during prediction
your data pipeline has to process both batch and stream
Your data pipeline needs to handle batch and stream data at the same time; this says the same thing as "work on streaming data" above. Batch data is easy to understand and easy to implement, but stream data is harder to handle (this is easy to see, especially for a data-hungry workload like machine learning: if you have built and run a distributed messaging engine, you know the trouble stream data brings)
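A minimal sketch of the "reuse the same code" advice, assuming the preprocessing can be written as one pure function per record. The field names (`latency_ms`, `bytes`) and the three function names are hypothetical; the batch path and the stream path both call the same `preprocess`, so the features cannot drift apart between training and serving.

```python
import math
from typing import Dict, Iterable, Iterator, List


def preprocess(record: Dict[str, float]) -> List[float]:
    """Single source of truth for feature engineering."""
    return [math.log1p(record["latency_ms"]), record["bytes"] / 1024.0]


def batch_features(records: Iterable[Dict[str, float]]) -> List[List[float]]:
    """Training path: process a whole historical dataset at once."""
    return [preprocess(record) for record in records]


def stream_features(events: Iterator[Dict[str, float]]) -> Iterator[List[float]]:
    """Serving path: process events one by one as they arrive."""
    for event in events:
        yield preprocess(event)


historical = [
    {"latency_ms": 0.0, "bytes": 2048.0},
    {"latency_ms": 99.0, "bytes": 512.0},
]

training_rows = batch_features(historical)
# Replaying the same records as a stream must give identical features.
serving_rows = list(stream_features(iter(historical)))
print(training_rows == serving_rows)
```

In a real system the batch and stream runners would be a data processing framework rather than plain Python loops, but the design choice is the same: one transformation, two ways of feeding it.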
During prediction, the key performance aspect is speed of response
the magic of ML comes with quantity, not complexity
Unstructured data accounts for 90% of enterprise data (like email, video footage, texts, reports, catalogs, events)
pre-trained models make processing unstructured data easier
So learn to use ready-made models provided by other companies and organizations for data processing (on one hand this advertises GCP's ML APIs; on the other hand it warns engineers who want to practice ML not to insist on implementing every stage of ML themselves)
How can a business benefit from ML?
1, Infuse your apps with ML: simplify user input, adapt to the user
2, Fine-tune your business: streamline your business processes
3, Anticipate users' needs: creatively fulfill intent
How Google Does ML
Google suggests that we should focus more on collecting data and building infrastructure than on optimizing the ML algorithm
Avoid these top 10 ML pitfalls
1,ML requires just as much software infrastructure
successful ML practice needs lots of things around the algorithm, like a whole software stack to serve it
2,no data collected yet
there is no point in talking about ML before collecting great data or getting access to great data
3,assume the data is ready for use
4,keep human in loop
5,product launch for the wrong thing
6,ML optimizing for the wrong thing
7,is your ML improving things in the real world
8,using a pre-trained ML algorithm vs building your own
9,ML algorithms are trained more than once
10,trying to design your own perception or NLP algorithm
the good news: most of the value comes along the way.
as you march towards ML, you may not get there, but you will still greatly improve everything you're working on. When you do get there, ML improves almost everything it touches.
if the process to build and use ML is hard for your company, it's likely hard for the other members of your industry.
ML and business processes
Look at 5 phases:
1, Individual contributor
2, Delegation
3, Digitization
4, Big data and Analytics
5, Analytics Machine Learning
finally, great ML systems will need humans in the loop.
and you should think about ML as a way to expand or scale the impact of your people, not as a way of completely removing them.
the more people you have in your organization, the more voices you have saying that automation is impossible
Learn how to identify the origins of bias in ML, make models inclusive, and evaluate ML models for bias
ML and human bias
Imagine a picture of a shoe: different people will picture different shoes, and that is human bias
but just because something is based on data doesn't automatically make it neutral
Because models are trained by humans, and different people have different leanings even about the same thing, human bias is an issue that needs attention
a common way that we evaluate performance in ML is by using a confusion matrix.
statistical measurement and acceptable tradeoff
we should focus on the False Positive Rate (the label says something doesn't exist, but the model predicts that it does)
False Negative Rate = False Negatives / (False Negatives + True Positives)
False positive rate (α) = type I error = 1 − specificity = FP / (FP + TN) = 180 / (180 + 1820) = 9%
False negative rate (β) = type II error = 1 − sensitivity = FN / (TP + FN) = 10 / (20 + 10) = 33%
True positive rate (TPR), Recall, Sensitivity, probability of detection = Σ True positive/Σ Condition positive
Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
Precision = Σ True positive/Σ Predicted condition positive
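The formulas above can be checked with a short script. This is a sketch, not a library API: `confusion_metrics` is a hypothetical helper, and the counts (TP = 20, FN = 10, FP = 180, TN = 1820) are the ones used in the false positive rate and false negative rate lines above.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the rates listed above from the four confusion matrix cells."""
    total = tp + fp + fn + tn
    return {
        "false_positive_rate": fp / (fp + tn),  # type I error
        "false_negative_rate": fn / (fn + tp),  # type II error
        "true_positive_rate": tp / (tp + fn),   # recall / sensitivity
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
    }


metrics = confusion_metrics(tp=20, fp=180, fn=10, tn=1820)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts the accuracy is about 90.6% while the precision is only 10%, which is why a single number is not enough and the acceptable tradeoff between error types has to be chosen deliberately.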
This product is an online editor similar to Google Docs