Converting an HTML Table to Markdown

Author: 李云龙_ | Published: 2018-03-30 17:28 | Views: 1126

Method 1: Recommended

Just install the [Copy as Markdown] (拷贝为 Markdown) extension in Google Chrome.

[Image: 拷贝为markdown.png]

Method 2: Not recommended

None of the approaches I found online worked, so I wrote one myself. I'm a Python beginner.

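1. How to use

1.1 Install Python

Official site: http://www.python.org/download/

1.2 Press F12 and copy the table node of the table you want from the page
1.3 Replace the contents of the tabStr variable with the copied node
1.4 Run generate.py

Remember to add the downloaded python.exe to the PATH environment variable (Windows).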
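
Step 1.3 means editing the script for every table. A minimal sketch of one way around that, assuming the copied node is saved to a file first: the file name table.html and the helper load_table_html are hypothetical and are not part of generate.py.

```python
from pathlib import Path

def load_table_html(path):
    """Return the first <table>...</table> node found in the file at path.

    Hypothetical helper for illustration; generate.py itself uses a
    hard-coded tabStr instead.
    """
    text = Path(path).read_text(encoding="utf-8")
    start = text.find("<table")
    end = text.find("</table>", start)
    if start == -1 or end == -1:
        raise ValueError("no <table> node found in " + str(path))
    return text[start:end + len("</table>")]
```

With this in place, the assignment in generate.py could become tabStr = load_table_html("table.html").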

2. Output

[Image: QQ截图20180330172742.png]

3. generate.py source

import os

tabStr = '''
<table>
  <thead>
    <tr>
      <th>事件</th>
      <th style="text-align: center">手指数量</th>
      <th>编号变化</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>一个手指按下(命名为A)</td>
      <td style="text-align: center">1</td>
      <td>A手指的编号为0,id为0</td>
    </tr>
    <tr>
      <td>一个手指按下(命名为B)</td>
      <td style="text-align: center">2</td>
      <td>B手指的编号为1,id为1</td>
    </tr>
    <tr>
      <td>A手指抬起</td>
      <td style="text-align: center">1</td>
      <td>B手指编号变更为0,id不变为1</td>
    </tr>
    <tr>
      <td>一个手指按下(命名为C)</td>
      <td style="text-align: center">2</td>
      <td>C手指编号为0,id为0,B手指编号为1,id为1</td>
    </tr>
  </tbody>
</table>
'''

# Markdown table syntax, for reference:
# | Tables        | Are           | Cool  |
# | ------------- |:-------------:| -----:|
# | col 3 is      | right-aligned | $1600 |
# | col 2 is      | centered      |   $12 |
# | zebra stripes | are neat      |    $1 |

divStr = ":-------------:"
maxTdSpace = len(divStr)

# tdStr: the text of one cell. Hanzi are counted separately because len() counts
# a hanzi as 1, the same as an ASCII character, although it is displayed wider.
def getTdRemainSpaceCount(tdStr):
    hanziCharCount = 0
    for ch in tdStr:
        # Count CJK unified ideographs (hanzi)
        if '\u4e00' <= ch <= '\u9fff':
            hanziCharCount += 1
    return maxTdSpace - len(tdStr) - hanziCharCount

def getSpaceStr(spaceCount):
    # A negative count yields an empty string, the same as the empty loop it replaces
    return " " * spaceCount

# Print one row (one <tr>). targetPreSplitStr is the start of the opening tag, e.g. "<td",
# which also matches <td style="text-align: center">; targetBackSplitStr is </th> or </td>.
def printTabRow(rowStr,targetPreSplitStr,targetBackSplitStr):
    cellCount = rowStr.count(targetBackSplitStr)
    splitStr = rowStr
    for index in range(cellCount):
        # print("loop: splitStr=" + splitStr)
        backIndex = splitStr.find(targetBackSplitStr)
        preStr = splitStr[:backIndex]
        targetPreIndex = preStr.find(targetPreSplitStr)
        splitedStr = preStr[targetPreIndex:]
        preIndex = splitedStr.find(">") + 1
        targetStr = splitedStr[preIndex:]
        if index == 0:
            print("|",end='')

        # Replace <strong> tags with Markdown bold markers before measuring the width
        targetStr = targetStr.replace("<strong>","**")
        targetStr = targetStr.replace("</strong>","**")

        # Split the remaining width into left and right padding;
        # when the remainder is odd, the extra space goes on the right
        tempRemainSpaceCount = max(getTdRemainSpaceCount(targetStr), 0)
        remainPreSpaceCount = tempRemainSpaceCount // 2
        remainBackSpaceCount = tempRemainSpaceCount - remainPreSpaceCount

        preSpaceStr = getSpaceStr(remainPreSpaceCount)
        backSpaceStr = getSpaceStr(remainBackSpaceCount)
        print( preSpaceStr + targetStr + backSpaceStr + "|",end='')
        splitStr = splitStr[backIndex+len(targetBackSplitStr):]
        # print("loop: remaining splitStr=" + splitStr)
    print("") # print("\n") would produce two line breaks instead of one
    
# divCount is the number of columns
def printTabDivision(divCount):
    for index in range(0,divCount):
        if index == 0:
            print("|",end='')
        print(divStr +"|",end='')
    print("")

# Parse <thead>
tHeadPreIndex = tabStr.find('<thead>') + len('<thead>')
tHeadBackIndex = tabStr.find('</thead>')
tHeadStr = tabStr[tHeadPreIndex:tHeadBackIndex]
# print("tHeadStr=" + tHeadStr)
printTabRow(tHeadStr,"<th","</th>")
divCount = tHeadStr.count("</th>",0,len(tHeadStr))
printTabDivision(divCount)

# Parse <tbody>
tbodyPreIndex = tabStr.find('<tbody>') + len('<tbody>')
tbodyBackIndex = tabStr.find('</tbody>')
tbodyStr = tabStr[tbodyPreIndex:tbodyBackIndex]
trCount = tbodyStr.count("</tr>",0,len(tbodyStr))

tempTbodyStr = tbodyStr
for index in range(0,trCount):
    lastTrIndex = tempTbodyStr.find("</tr>") + len("</tr>")
    preTrIndex = tempTbodyStr.find("<tr>")
    targetTrStr = tempTbodyStr[preTrIndex:lastTrIndex]
    printTabRow(targetTrStr,"<td","</td>")
    tempTbodyStr = tempTbodyStr[lastTrIndex:len(tempTbodyStr)]

# Keep the console window open so the output can be read (the pause command is Windows-only)
os.system("pause")
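
The string-splitting script above needs both <thead> and <tbody> to be present and pads every cell to one fixed width. As a possible alternative, here is a minimal sketch built on the standard library's html.parser and unicodedata modules. It is a swapped-in approach, not the method used in generate.py. The names TableParser, display_width and html_table_to_markdown are made up for this illustration, the first row is assumed to be the header, and columns are left-aligned rather than centered.

```python
import unicodedata
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row being read, None outside <tr>
        self.cell = None  # text pieces of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row, self.cell = [], None
        elif tag in ("th", "td") and self.row is not None:
            self.cell = []
        elif tag == "strong" and self.cell is not None:
            self.cell.append("**")

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None and self.row is not None:
            # Collapse whitespace and escape pipes so the cell stays on one table line
            text = " ".join("".join(self.cell).split()).replace("|", "\\|")
            self.row.append(text)
            self.cell = None
        elif tag == "strong" and self.cell is not None:
            self.cell.append("**")
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row, self.cell = None, None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

def display_width(text):
    """Count wide and fullwidth characters (such as hanzi) as two columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

def html_table_to_markdown(html):
    """Return a Markdown table for the first-row-is-header table in html."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    rows = parser.rows
    if not rows:
        return ""
    colCount = max(len(r) for r in rows)
    # Pad short rows with empty cells so every row has the same number of columns
    rows = [r + [""] * (colCount - len(r)) for r in rows]
    widths = [max(3, max(display_width(r[i]) for r in rows))
              for i in range(colCount)]

    def fmt(cells):
        padded = [c + " " * (w - display_width(c)) for c, w in zip(cells, widths)]
        return "| " + " | ".join(padded) + " |"

    divider = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([fmt(rows[0]), divider] + [fmt(r) for r in rows[1:]])
```

Calling print(html_table_to_markdown(tabStr)) with the tabStr defined above would print the table. Because column widths are computed from the cell contents, long cells do not need a fixed maxTdSpace.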
