Python in Practice | Converting SRT Subtitle Files to TXT Text Files

Author: Biosciman | Published 2019-07-02 20:52

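    Watching films or TV shows in a foreign language is a very useful way to learn that language, and subtitle files (.srt files) can usually be found on subtitle websites. These files are hard to read as plain text, however, because every subtitle is marked with a counter and a timestamp. The code in this post therefore converts an SRT subtitle file into a plain TXT text file.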

    Opened in a text editor, an SRT subtitle file looks like this:

    172
    00:11:20,639 --> 00:11:24,393
    To try to quote Ellen Yindel's
    outstanding record in the time I have...

    173
    00:11:24,560 --> 00:11:26,103
    would do her a disservice.

    174
    00:11:26,270 --> 00:11:29,190
    Instead I offer the new commissioner
    my sympathy...

    175
    00:11:29,357 --> 00:11:32,526
    knowing the impossible job
    she is about to face.
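
Each block in the sample above has the same layout: a counter line, a timing line, then one or more text lines, with a blank line closing the block. As a minimal sketch of that layout, the hypothetical helper `parse_cue` below splits a single block into its parts. The name `parse_cue` and the regular expression are assumptions made for illustration and are not part of the original script; the pattern only covers the plain `HH:MM:SS,mmm --> HH:MM:SS,mmm` form seen in the sample.

```python
import re

# Timing line as it appears in the sample, e.g. "00:11:20,639 --> 00:11:24,393"
# (assumed pattern; real files may carry extra positioning data after the times)
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_cue(block):
    """Split one subtitle block into (index, start, end, text_lines)."""
    lines = block.strip().splitlines()
    index = int(lines[0])              # first line: the counter
    match = TIMING.match(lines[1])     # second line: the timing
    if match is None:
        raise ValueError(f"not a timing line: {lines[1]!r}")
    start, end = match.groups()
    return index, start, end, lines[2:]  # remaining lines: the subtitle text


sample = """172
00:11:20,639 --> 00:11:24,393
To try to quote Ellen Yindel's
outstanding record in the time I have..."""

print(parse_cue(sample))
```

Only the last element of the tuple, the list of text lines, matters for the conversion; the counter and the timing are what gets thrown away.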

    What we want to see instead is a text file like this:

    To try to quote Ellen Yindel's outstanding record in the time I have...
    would do her a disservice.
    Instead I offer the new commissioner my sympathy...
    knowing the impossible job she is about to face.

    The following Python code converts the SRT subtitle file into a TXT text file:

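    # Parser states: which line of a subtitle block is expected next
    INDEX, TIMING, TEXT = 1, 2, 3
    state = INDEX
    parts = []   # text lines of the current subtitle block
    result = []  # one joined line per subtitle block
    # 'utf-8-sig' strips the \ufeff byte order mark at the start of the file, if present
    with open('test1.srt', 'r', encoding='utf-8-sig') as f:
        for line in f:  # iterate over the SRT file line by line
            line = line.strip()
            if state == INDEX:       # skip the counter line
                if line:             # tolerate extra blank lines between blocks
                    state = TIMING
            elif state == TIMING:    # skip the timestamp line
                state = TEXT
            elif line:               # subtitle text: collect it
                parts.append(line)
            else:                    # a blank line ends the block
                if parts:
                    result.append(' '.join(parts))  # join the lines of one time span
                parts = []
                state = INDEX
    if parts:  # last block when the file does not end with a blank line
        result.append(' '.join(parts))
    with open('test1.txt', 'w', encoding='utf-8') as fa:  # write the TXT file once
        fa.write('\n'.join(result) + '\n')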
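
An alternative to walking the file line by line is to split its contents on blank lines and drop the first two lines of every block. The sketch below takes that approach; it is not the method used in the script above. The function name `srt_to_text` is hypothetical, and the code assumes every block starts with exactly one counter line and one timing line, as in the sample.

```python
import re


def srt_to_text(srt):
    """Return one line of text per subtitle block in the SRT string."""
    # Normalise Windows line endings and trim leading/trailing blank lines
    srt = srt.replace("\r\n", "\n").strip()
    out = []
    # Blocks are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt):
        lines = [ln.strip() for ln in block.split("\n")]
        # lines[0] is the counter, lines[1] the timing; the rest is text
        text = " ".join(ln for ln in lines[2:] if ln)
        if text:
            out.append(text)
    return "\n".join(out)


demo = (
    "172\n"
    "00:11:20,639 --> 00:11:24,393\n"
    "To try to quote Ellen Yindel's\n"
    "outstanding record in the time I have...\n"
    "\n"
    "173\n"
    "00:11:24,560 --> 00:11:26,103\n"
    "would do her a disservice.\n"
)

print(srt_to_text(demo))
```

To use it on a file, read the whole file with `open(path, encoding='utf-8-sig').read()` and pass the resulting string in. Because the first line of every block is discarded anyway, a leftover byte order mark would not reach the output even without `utf-8-sig`.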
    
