Downloading Jianshu Articles

Author: 区块东西 | Published: 2023-10-29 09:57

    title: Downloading Jianshu Articles
    date: '2023-10-30 09:00'
    tags: ['code', 'tools']
    draft: false
    summary: Use Python to download your own Jianshu articles and save them as Markdown


    Downloading Jianshu Articles

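    I used to keep all my notes on Jianshu because it is simple and convenient. The online Markdown editor is pleasant to use, there is no "scientific internet access" (proxy) problem to deal with as there is with tools like Notion,
    and I can log in and write at any time. However, Jianshu places restrictions on publishing articles, so I set up a simple blog of my own and migrated my notes to it.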

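    I wanted my own blog without paying for an expensive VPS, so I looked for frameworks that can be deployed on Vercel. Most of these produce static pages, which would be limiting if I wanted to add more features later.
    Vercel's own Next.js framework is easy to deploy and supports SSR + SSG. However, there are few blog frameworks based on Next.js and I did not want to write one myself. I eventually found this framework, liked it,
    and migrated over.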

    https://www.www.animeirl.top

    The code is as follows:

    import os
    import requests
    import datetime
    import threadpool
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/5.3.8.2000 Chrome/78.0.3904.108 Safari/537.36'
    }
    
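    # Paste the Cookie header string copied from a logged-in browser session here.
    cookies = ''
    # Split each "name=value" pair on the first "=" only and skip empty fragments,
    # so an empty string or a value containing "=" does not raise an error.
    cookies_dict = dict(cook.split('=', 1) for cook in cookies.split('; ') if '=' in cook)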
    
    def create_file(file_path, content):
        if os.path.exists(file_path):
            # Skip the write when the existing file has the same size in bytes as the new content.
            if os.path.getsize(file_path) == len(content.encode("utf-8")):
                print("File already exists, no need to overwrite.", file_path)
                return
        with open(file_path, "w", encoding="utf-8", newline="\n") as f:
            f.write(content)
        print("File created and written.", file_path)
    
    def saveMarkdown(noteName:str, title:str, noteId:str):
      # Create the notebook directory; another worker thread may already have created it.
      try:
        os.mkdir(noteName)
        print("Created directory:", noteName)
      except FileExistsError:
        print("Directory already exists", noteName)
      res = requests.get("https://www.jianshu.com/author/notes/"+noteId+"/content", headers=headers, cookies=cookies_dict)
      if not res.ok:
        print(f"Failed to fetch note content: {res.status_code}")
        exit(-1)
      note = res.json()
      content = note['content']
      create_file(noteName+"/"+title+".md", content)
    
    def saveMarkdownT(arg:list):
      # Thread-pool entry point: unpack [noteName, title, noteId, ...] for saveMarkdown.
      saveMarkdown(arg[0], arg[1], arg[2])
    
    def getJianshuTaskArgs():
      url = 'https://www.jianshu.com/author/notebooks'
      args = []
    
      # Fetch the list of notebooks for the logged-in account.
      res = requests.get(url, headers=headers, cookies=cookies_dict)
      if not res.ok:
        print(f"Failed to fetch the notebook list: {res.status_code}")
        exit(-1)
      # print(type(res.json()),res.json(), len(res.json()))
      for v in res.json():
        print(v['name'], v['id'])
        notebookName = v['name']
        notebookId = v['id']
        # Fetch the notes inside this notebook.
        res = requests.get("https://www.jianshu.com/author/notebooks/"+str(notebookId)+"/notes", headers=headers, cookies=cookies_dict)
        if not res.ok:
          print(f"Failed to fetch notes for notebook {notebookId}: {res.status_code}")
          exit(-1)
    
        notesInfo = res.json()
        # print(type(notesInfo),notesInfo)
    
        for noteInfo in notesInfo:
          title = noteInfo['title']
          timestamp = noteInfo['content_updated_at']
          # Convert the Unix timestamp to UTC+8 and use that local time for date and time.
          tz_utc8 = datetime.timezone(datetime.timedelta(hours=8))
          local_dt_object = datetime.datetime.fromtimestamp(timestamp, tz=tz_utc8)
          date = local_dt_object.date().isoformat()
          time = local_dt_object.time().isoformat()
          print(title, date, time)
    
          filePath = notebookName+"/"+title+".md"
          print(filePath)
          # saveMarkdown(notebookName, title, str(noteInfo['id']))
          args.append([notebookName, title, str(noteInfo['id']), date, time])
      return args
    
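    def downloadJianshuNotes():
      args = getJianshuTaskArgs()
      # Download the notes concurrently with 20 worker threads.
      pool = threadpool.ThreadPool(20)
      tasks = threadpool.makeRequests(saveMarkdownT, args)
      for task in tasks:
        pool.putRequest(task)
      pool.wait()
      print("Thread pool finished")
    
    if __name__ == "__main__":
      downloadJianshuNotes()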
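
    The `threadpool` package used above is a third-party dependency. As a hedged alternative, the same fan-out can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. The names `run_in_pool`, `worker`, and `fake_save` below are hypothetical and not part of the original script; `fake_save` stands in for `saveMarkdown` so the sketch runs without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(worker, tasks, max_workers=20):
    """Call worker(*task) for every task on a thread pool.

    Returns a list of (task, result, error) tuples in the order of `tasks`.
    `run_in_pool`, `worker`, and `tasks` are hypothetical names for this sketch.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, *task) for task in tasks]
        for task, future in zip(tasks, futures):
            try:
                results.append((task, future.result(), None))
            except Exception as exc:  # one failed download should not stop the rest
                results.append((task, None, exc))
    return results


if __name__ == "__main__":
    # Stand-in for saveMarkdown so the sketch runs offline.
    def fake_save(notebook, title, note_id):
        return f"{notebook}/{title}.md"

    demo_tasks = [("diary", "first", "1"), ("diary", "second", "2")]
    for task, path, error in run_in_pool(fake_save, demo_tasks):
        print(task, path, error)
```

    With the script above, a call might look like `run_in_pool(saveMarkdown, [a[:3] for a in args])`, since each entry in `args` also carries the date and time, which `saveMarkdown` does not accept.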
    
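
    The script collects `date` and `time` for each note but does not write them anywhere. If the target blog expects front matter like the block at the top of this post, a minimal sketch for generating it could look like the following. The function names, the `---` delimiters, and the quoting of every string value are assumptions, not something the original script or the blog framework is confirmed to require.

```python
def yaml_quote(text):
    """Wrap text in YAML single quotes, doubling any embedded single quote."""
    return "'" + str(text).replace("'", "''") + "'"


def build_front_matter(title, date, time, tags=(), draft=False, summary=""):
    """Build a front-matter block from a note's metadata (hypothetical helper).

    `date` is expected as 'YYYY-MM-DD' and `time` as 'HH:MM:SS', matching the
    strings the download script produces; seconds are dropped.
    """
    tag_list = ", ".join(yaml_quote(tag) for tag in tags)
    lines = [
        "---",  # assumed delimiter
        f"title: {yaml_quote(title)}",
        f"date: {yaml_quote(date + ' ' + time[:5])}",
        f"tags: [{tag_list}]",
        f"draft: {'true' if draft else 'false'}",
        f"summary: {yaml_quote(summary)}",
        "---",
    ]
    # End with a blank line so the note body starts cleanly after the block.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(build_front_matter("My note", "2023-10-29", "09:57:00", tags=["code"]))
```

    The returned string could be prepended to `content` before it is passed to `create_file`.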

    Fill in your own cookie and it is ready to use.
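
    To avoid pasting the cookie into the source file, one option is to read it from an environment variable and parse it into the dict that `requests` expects. This is a minimal sketch; the variable name `JIANSHU_COOKIE` and the helper names are hypothetical.

```python
import os


def parse_cookie_header(raw):
    """Turn a browser Cookie header string ("a=1; b=2") into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # Split on the first "=" only, since cookie values may contain "=".
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # ignore empty fragments and entries without "="
            cookies[name] = value
    return cookies


def load_cookies(env_var="JIANSHU_COOKIE"):
    """Read the cookie string from an environment variable (hypothetical name)."""
    return parse_cookie_header(os.environ.get(env_var, ""))


if __name__ == "__main__":
    # Demo value only; a real run would export the variable in the shell.
    os.environ.setdefault("JIANSHU_COOKIE", "a=1; token=x=y")
    print(load_cookies())
```

    The resulting dict can be passed as the `cookies=` argument of `requests.get`, in place of `cookies_dict`.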

          Article link: https://www.haomeiwen.com/subject/ewfvidtx.html