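In this article, we introduce several of the top automated EDA (exploratory data analysis) tools and use examples to show what each one produces.
Code: https://www.kaggle.com/andreshg/automatic-eda-libraries-comparisson/notebook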
📊 AutoViz 📚
AutoViz stands out among the many free Python tools for rapid, automated EDA because it runs very fast, noticeably faster than its closest free competitors, SweetViz and Pandas Profiling.
Installation:
!pip install git+https://github.com/AutoViML/AutoViz.git
!pip install xlrd
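from autoviz.AutoViz_Class import AutoViz_Class

# df must already be a pandas DataFrame that contains the 'target' column passed as depVar
AV = AutoViz_Class()
dftc = AV.AutoViz(
    filename='',               # empty because the data is passed in through dfte
    sep=',',                   # only used when reading from filename
    depVar='target',           # name of the target (dependent) variable
    dfte=df,
    header=0,
    verbose=1,
    lowess=False,              # LOWESS regression lines; only advisable for small datasets
    chart_format='png',
    max_rows_analyzed=300000,  # larger datasets are sampled down to this many rows
    max_cols_analyzed=30,
)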
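Before it draws any charts, AutoViz reports how it classified the variables, because a categorical column needs a different chart than a continuous one. The standard-library sketch below illustrates that step with a deliberately simple heuristic. The function `infer_column_type`, the `max_categories` threshold and the sample values are hypothetical assumptions for illustration; AutoViz's own classification rules are more detailed.

```python
def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def infer_column_type(values, max_categories=10):
    """Classify a column with a simplified, hypothetical heuristic."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "empty"
    n_distinct = len(set(present))
    # Few distinct values are treated as categories, even if they look numeric (e.g. 0/1 flags)
    if n_distinct <= max_categories:
        return "categorical"
    # Many distinct numbers: treat as a continuous variable
    if all(_is_number(v) for v in present):
        return "numeric"
    # Non-numeric text where every value is unique behaves like an identifier
    if n_distinct == len(present):
        return "id"
    return "text"


# Made-up sample columns in the style of the Titanic data
sample = {
    "Survived": ["0", "1", "1", "0", "1", "0", "0", "1", "0", "0", "1", "1"],
    "Fare": ["7.25", "71.28", "7.93", "53.1", "8.05", "8.46",
             "51.86", "21.08", "11.13", "30.07", "16.7", "26.55"],
    "Name": [f"Passenger {i}" for i in range(12)],
    "Cabin": [""] * 12,
}
for name, values in sample.items():
    print(name, infer_column_type(values))
```

Checking cardinality before numeric-ness is a design choice: it keeps low-cardinality integer codes such as `Survived` in the categorical group, which is usually how EDA tools want to plot them.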
📊 Pandas Profiling 📚
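import datetime as dt
import pandas as pd
from pandas_profiling import ProfileReport  # newer releases are published as ydata-profiling: from ydata_profiling import ProfileReport
df = pd.read_csv('/kaggle/input/titanic/train.csv')
# Start timing before the report is created so the measurement covers building and rendering it
start_time = dt.datetime.now()
print("Started at ", start_time)
report = ProfileReport(df)
report.to_notebook_iframe()  # render the report inside the notebook
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
print("Elapsed time: ", finish_time - start_time)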
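A Pandas Profiling report is built largely from per-column statistics: how many values are missing, how many are distinct and, for numeric columns, descriptive statistics such as the mean, minimum and maximum. As a rough illustration of what those numbers mean, the sketch below computes a few of them for a tiny inline CSV using only the standard library. The helper `profile_column` and the sample rows are hypothetical; the real report computes far more.

```python
import csv
import io
import statistics

# Tiny made-up CSV with one missing Age value
CSV_TEXT = """PassengerId,Age,Embarked
1,22,S
2,38,C
3,26,S
4,,S
5,35,Q
"""


def profile_column(values):
    """Return a few per-column statistics similar to those in an EDA report."""
    present = [v for v in values if v != ""]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        return summary  # non-numeric column: counts only
    if numbers:
        summary.update(mean=statistics.mean(numbers), min=min(numbers), max=max(numbers))
    return summary


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for column in rows[0]:
    print(column, profile_column([row[column] for row in rows]))
```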
📊 SweetViz 📚
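!pip install sweetviz
import datetime as dt
import pandas as pd
import sweetviz as sv
# Only the first 2,000 rows are used to keep the run short
df = pd.read_csv('/kaggle/input/credit-card-customers/BankChurners.csv').head(2000)
start_time = dt.datetime.now()
print("Started at ", start_time)
sweet_report = sv.analyze([df, 'Data'])  # [DataFrame, display name]
sweet_report.show_html()  # saves the report as an HTML file (SWEETVIZ_REPORT.html by default)
print('SweetViz finished!')
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
elapsed = finish_time - start_time
print("Elapsed time: ", elapsed)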
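Both the Pandas Profiling and SweetViz snippets time themselves by hand with `dt.datetime.now()`. When comparing several tools, a small context manager keeps the measurement identical for each one. The `timed` helper below is a hypothetical convenience, not part of any of these libraries; it uses `time.perf_counter()`, which is intended for measuring durations and is not affected by system clock adjustments the way wall-clock `datetime` values are.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Measure the elapsed time of the with-block and store it in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed runs are still timed
        elapsed = time.perf_counter() - start
        results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


timings = {}
with timed("dummy workload", timings):
    sum(i * i for i in range(100_000))
```

In the notebook, each report would be wrapped the same way, for example `with timed("SweetViz", timings): sv.analyze([df, 'Data']).show_html()`, and `timings` then holds one comparable number per tool.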
📊 D-Tale 📚
Installation:
!pip install dtale
import dtale
dtale.show(df)  # opens an interactive web interface for exploring df
Official repository: https://github.com/man-group/dtale
📊 Dataprep 📚
!pip install -U dataprep
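Examples:
from dataprep.eda import plot, plot_correlation
plot(df)                            # overview of every column in the dataset
plot_correlation(df)                # correlation matrices for the numeric columns
plot(df, "Customer_Age")            # detailed analysis of a single column
plot(df, "Customer_Age", "Gender")  # relationship between two columns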
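`plot_correlation` summarizes pairwise relationships as correlation matrices. As a worked illustration of what a single cell of such a matrix contains, the sketch below computes the Pearson coefficient and the Spearman rank coefficient (Pearson applied to ranks) for two short lists with the standard library. The function names and the sample values are hypothetical, and this is not DataPrep's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations.
    Undefined (division by zero) if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank values starting at 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up values for illustration only
age = [26, 35, 45, 52, 61]
months_on_book = [20, 30, 38, 45, 50]
print(round(pearson(age, months_on_book), 3))
print(spearman(age, months_on_book))
```

Pearson measures linear association, while Spearman only asks whether the relationship is monotonic, so a curved but steadily increasing relationship scores 1.0 on Spearman and less than 1.0 on Pearson.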