Data Standardization with Python


Author: 王吉林 | Published: 2019-02-21 09:55

    <section class="output_wrapper"><h2><span>Reading the data</span></h2><p>First, import pandas and numpy, then read the data.</p><pre><code class="python language-python hljs">import pandas as pd
import numpy as np

# detail.csv uses a Chinese (GBK) encoding; the first column is the index
detail = pd.read_csv('detail.csv', index_col=0, encoding='gbk')
</code></pre><h2><span>A custom min-max scaling function</span></h2><pre><code class="python language-python hljs">def minmaxscale(data):
    # Map the values linearly onto [0, 1]: (x - min) / (max - min)
    data = (data - data.min()) / (data.max() - data.min())
    return data

# Min-max scale the sales counts and prices in the dish order detail table
data1 = minmaxscale(detail['counts'])
data2 = minmaxscale(detail['amounts'])
data3 = pd.concat([data1, data2], axis=1)
print('Sales counts and prices before min-max scaling:\n',
      detail[['counts', 'amounts']].head())
print('Sales counts and prices after min-max scaling:\n', data3.head())
</code></pre><p>Output:</p><pre><code class="hljs plaintext">Sales counts and prices before min-max scaling:
            counts  amounts
detail_id                 
2956            1       49
2958            1       48
2961            1       30
2966            1       25
2968            1       13
Sales counts and prices after min-max scaling:
            counts   amounts
detail_id                  
2956          0.0  0.271186
2958          0.0  0.265537
2961          0.0  0.163842
2966          0.0  0.135593
2968          0.0  0.067797
</code></pre><h2><span>Alternatively, use the minmax_scale function from sklearn</span></h2><pre><code class="python language-python hljs">from sklearn import preprocessing

preprocessing.minmax_scale(detail['amounts'])
</code></pre><p>Output:</p><pre><code class="hljs plaintext">Out[141]: 
array([0.27118644, 0.26553672, 0.16384181, ..., 0.21468927, 0.03389831,
       0.14689266])
</code></pre><h2><span>A custom z-score standardization function</span></h2><pre><code class="python language-python hljs">def StandardScaler(data):
    # Subtract the mean and divide by the standard deviation.
    # pandas' std() uses the sample standard deviation (ddof=1) by default.
    data = (data - data.mean()) / data.std()
    return data

# Standardize the sales counts and prices in the dish order detail table
data4 = StandardScaler(detail['counts'])
data5 = StandardScaler(detail['amounts'])
data6 = pd.concat([data4, data5], axis=1)
print('Sales counts and prices before z-score standardization:\n',
      detail[['counts', 'amounts']].head())
print('Sales counts and prices after z-score standardization:\n', data6.head())
</code></pre><p>Output:</p><pre><code class="hljs plaintext">Sales counts and prices before z-score standardization:
            counts  amounts
detail_id                 
2956            1       49
2958            1       48
2961            1       30
2966            1       25
2968            1       13
Sales counts and prices after z-score standardization:
              counts   amounts
detail_id                    
2956      -0.177571  0.116671
2958      -0.177571  0.088751
2961      -0.177571 -0.413826
2966      -0.177571 -0.553431
2968      -0.177571 -0.888482
</code></pre><h2><span>Alternatively, use the scale function from sklearn</span></h2><pre><code class="python language-python hljs">from sklearn import preprocessing

preprocessing.scale(detail['amounts'])
</code></pre><p>Output:</p><pre><code class="hljs plaintext">Out[143]: 
array([ 0.11667727,  0.08875496, -0.41384669, ..., -0.16254587,
       -1.05605991, -0.49761363])
</code></pre></section>
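The custom z-score output and the sklearn output above agree only to about four decimal places (for example 0.116671 versus 0.11667727). A likely explanation is the choice of standard deviation: pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), while `preprocessing.scale` divides by the population standard deviation (`ddof=0`). The sketch below illustrates this on a small made-up series; `amounts` here is a hypothetical stand-in built from the five values shown in the output, not the real `detail.csv` data.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for detail['amounts'] (the real file is not available here)
amounts = pd.Series([49, 48, 30, 25, 13], name='amounts', dtype=float)
n = len(amounts)

# Custom z-score as in the article: pandas std() uses ddof=1 by default
z_sample = (amounts - amounts.mean()) / amounts.std()

# Same formula with the population standard deviation (ddof=0)
z_population = (amounts - amounts.mean()) / amounts.std(ddof=0)

# sklearn's scale on the underlying 1-D array
z_sklearn = preprocessing.scale(amounts.to_numpy())

# The ddof=0 version matches sklearn; the ddof=1 version does not
print(np.allclose(z_population, z_sklearn))
print(np.allclose(z_sample, z_sklearn))

# The two versions differ only by the constant factor sqrt((n - 1) / n)
print(np.allclose(z_sample, z_sklearn * np.sqrt((n - 1) / n)))

# Min-max scaling has no such ambiguity: the custom formula matches minmax_scale
custom_minmax = (amounts - amounts.min()) / (amounts.max() - amounts.min())
print(np.allclose(custom_minmax, preprocessing.minmax_scale(amounts.to_numpy())))
```

With thousands of rows the factor sqrt((n - 1) / n) is very close to 1, which would explain why the two results differ only in the later decimal places. Passing `ddof=0` to `std()` in the custom function should make it agree with sklearn.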


          Article link: https://www.haomeiwen.com/subject/kxnlyqtx.html