文件格式转换UTF-8

作者: 罗蓁蓁 | 来源:发表于2020-05-01 12:30 被阅读0次

    很多时候,需要把文件格式转换成标准的utf-8,但往往在很多时候很多文件都是ASCII或者gbk2312,这很容易产生乱码!

    如果手动转码,无疑是非常浪费时间的事情,因此,我也是因为工作的需要,烦其手工,而网上也无好用的工具,因此,用Python写了一个脚本,实现了文件格式转换UTF-8的需求。

    代码如下:

    #!/usr/bin/env python
    #-*- coding: utf-8 -*-
    #modify the file code format
    #
    ##############################################################################
    
    import os
    import shutil
    import codecs
    import chardet
    import sys
    reload (sys)
    sys.setdefaultencoding('utf-8')
    
    #judge the file code format
    def check_file(filePath):
        
        f = open(filePath,'rb')
        data = f.read()
        f.close()
        dict_code = chardet.detect(data)
        code_file =  dict_code['encoding']
        print (code_file)
        return code_file
    
    #read file flow as unicode
    def read_file(filePath,encoding = "UTF-8"):
    
        f=codecs.open(filePath,"rb",encoding)
        print 'read:   '+filePath
        data=f.read()
        f.close()
        return data
    
    #specify to write the file format
    def write_file(filePath,uuu,encoding = 'UTF-8'):
    
        f = codecs.open(filePath,'wb',encoding)
        print 'write:   '+filePath
        f.write(uuu)
        f.close()
    
    #if the file format is utf-8 with bom,modify it to no bom.
    def utf_nobom(dst):
        f = open(dst,'r')
        data = f.read()
        f.close()
        f = open(dst,'w')
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        f.write(data)
        f.close()
    
    
    #copy the file with directory,and search all of the file's with path
    def search_file(src,dst):
        file_list = []
        if src is None:
            raise Exception('source_path is None')
        #if dst is not None:
        #    raise Exception('dest_path has existed')
        print src
        print dst
        shutil.copytree(src,dst)
        for dirpath,dirnames,filenames in os.walk(dst):
            for name in filenames:
                file_list.append(dirpath + '\\' + name)
    
        return file_list
    
    
    
    #the main function,call other programs to achive the point of 
    #modify the file format to utf-8 with NO BOM.
    def other_2_utf8(src,dst):
        print (src)
        print (dst)
        file_list = search_file(src,dst)
        for inode in file_list:
            code_file = check_file(inode)
            data = read_file(inode,code_file)
            write_file(inode,data,'UTF-8')
            utf_nobom(inode)
        else:
            print ('transform is ok')
    

    注意:

    该脚本为Python2编写,如果想使用Python3,需要微调,很简单,自行操作。

    小尾巴

    出差必备:
    买火车票、高铁票、机票,订酒店都打9折的出行工具TRIP,点击注册

    优惠购物:
    你还在傻傻的原价淘宝吗?来这里领取内部优惠券,折扣力度非常大!点击注册,注册需要邀请码UWD9Q9E。

    相关文章

      网友评论

        本文标题:文件格式转换UTF-8

        本文链接:https://www.haomeiwen.com/subject/fbafghtx.html