Introduction
1 Concepts
- Business Analytics (BA) is the practice and art of bringing quantitative data to bear on decision-making.
- Business Intelligence (BI) refers to data visualization and reporting for understanding "what happened and what is happening".
- Nowadays, BA has overtaken the earlier term BI as the name for advanced analytics, while BI now refers mainly to data visualization and reporting.
Understanding overfitting:
In engineering terms, an overfitted model is fitting the noise in the data, not just the signal.
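To make "fitting the noise" concrete, here is a minimal pure-Python sketch. It assumes data generated as y = 2x + 1 plus Gaussian noise (standard deviation 1) and compares two hypothetical models: a least-squares straight line, and a "memorizer" that returns the y of the nearest training record (1-nearest neighbor). The memorizer reproduces the training data exactly, noise included, so its training error is zero, yet its error on fresh records from the same process is larger than the line's.

```python
import random


def make_data(n, rng):
    """Simulate n records from the assumed process y = 2x + 1 + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_memorizer(xs, ys):
    """1-nearest neighbor: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model over the records (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(500, rng)
test_x, test_y = make_data(500, rng)

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

train_line = mse(line, train_x, train_y)
test_line = mse(line, test_x, test_y)
train_memo = mse(memo, train_x, train_y)
test_memo = mse(memo, test_x, test_y)

print(f"line:      train {train_line:.2f}  test {test_line:.2f}")
print(f"memorizer: train {train_memo:.2f}  test {test_memo:.2f}")
```

The exact numbers depend on the seed; the pattern is what matters. The line's training and test errors are both close to the noise variance, while the memorizer's perfect training score comes from copying the noise, which does not carry over to new records.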
2 Terminology used in this book
In this book, we use the term machine learning to refer to algorithms that learn directly from data, especially local patterns, often in a layered or iterative fashion.
We use the term statistical models to refer to methods that apply global structure to the data.
A simple example is a linear regression model (statistical) vs. a k-nearest-neighbors algorithm (machine learning). Linear regression treats a given record according to a single linear equation that applies to all the records. k-nearest neighbors instead classifies that record according to the values of a small number of nearby records.
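The contrast between global structure and local patterns can be sketched in a few lines of pure Python. The toy records, the choice k = 3, and the function names are illustrative assumptions, and the k-nearest-neighbors variant here averages the neighbors' numeric values (k-NN regression) rather than voting on a class, so both methods produce a comparable number. Changing one record far from the query shifts the fitted line, and hence the linear regression prediction, but leaves the k-NN prediction untouched because that record is not among the query's nearest neighbors.

```python
def linear_regression(xs, ys):
    """Least-squares line y = a + b*x fitted to ALL records (global structure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def knn_predict(xs, ys, query, k=3):
    """Average y of the k records whose x is closest to query (local pattern)."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical toy records.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 8.9, 10.1]
query = 2.0

before_lr = linear_regression(xs, ys)(query)
before_knn = knn_predict(xs, ys, query)

# Change one record far from the query (the record at x = 10).
ys_changed = ys[:-1] + [30.0]
after_lr = linear_regression(xs, ys_changed)(query)
after_knn = knn_predict(xs, ys_changed, query)

print(f"linear regression:   {before_lr:.2f} -> {after_lr:.2f}")
print(f"3-nearest neighbors: {before_knn:.2f} -> {after_knn:.2f}")
```

The linear prediction at x = 2 moves because every record contributes to the single fitted equation; the 3-NN prediction depends only on the records at x = 1, 2 and 3, so it stays the same.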
Lastly, many practitioners, particularly those from the IT and computer science communities, use the term machine learning to refer to all the methods discussed in this book.
3 Big Data
The challenge Big Data presents is often characterized by the four V's: volume, velocity, variety, and veracity.
- Volume refers to the amount of data.
- Velocity refers to the flow rate: the speed at which data is being generated and changed.
- Variety refers to the different types of data being generated (currency, dates, numbers, text, etc.).
- Veracity refers to the fact that data is being generated by organic, distributed processes (e.g., millions of people signing up for services or free downloads) and is not subject to the controls or quality checks that apply to data collected for a study.