[Python] 算法心得—字符串篇

作者: 敲代码的密斯想 | 来源:发表于2018-06-15 11:04 被阅读23次

[Python] 算法心得—字符串篇
简单对称加密算法
日常笔记
连载02-Python列表的方法有什么，附实际案例源码【乐搏Te
K近邻(KNN)算法详解及Python实现
字符串匹配算法总结
AI 学习之路——轻松初探 Python 篇（三）
[Python] 算法心得—排序
leetcode算法类型（Python）：9. 回文数，难度：简
KMP算法文章合集

六个算法问题，使用python学习算法。

1.1 旋转字符串

给定字符串，要求把字符串前面若干个字符移动到字符串尾部。要求时间复杂度O(n)，空间复杂度O(1)。
如：'abcdefg'前面的2个字符'a'和'b'移到字符串尾部，就是'cdefgab'。

解法一：暴力移位法

def left_shift_one(s):
    slist = list(s)
    temp = slist[0]
    for i in range(1, len(s)):
        slist[i - 1] = slist[i]
    slist[len(s) - 1] = temp
    s = ''.join(slist)
    return s


def left_rotate_str(s, n):
    while n > 0:
        s = left_shift_one(s)
        n -= 1
    return s

时间复杂度O(n*len(s))，空间复杂度O(1)。

解法二：三步反转法

例如给定字符串'abcdefg'，若要让ab翻转到最后，只需三步操作：
step 1. 字符串分为两部分，即：X=ab, Y=cdefg；
step 2. 将X、Y均反转，得到：X=ba, Y=gfedc；
step 3. 将step2得到的结果反转，即将'bagfedc'反转，得到：cdefgab，即为想要的结果。

def reserve_str(s):
    s_list = list(s)
    start, end = 0, len(s)-1
    while start < end:
        s_list[start], s_list[end] = s_list[end], s_list[start]
        start += 1
        end -= 1
    return ''.join(s_list)


def left_rotate_str(s, n):
    x, y = list(s[:n]), list(s[n:])
    x = reserve_str(x)
    y = reserve_str(y)
    res = reserve_str(x+y)
    return ''.join(res)

时间复杂度O(n)，空间复杂度O(1)。

1.2 链表反转

给出一个链表和一个数k，比如：链表为1->2->3->4->5->6, k=2，翻转后为2->1->6->5->4->3；若k=3，翻转后3->2->1->6->5->4。

链表反转的方式

解法一：循环迭代

基本思想就是将每个节点的向前、向后指向变换。

def reverse_loop(head):
    if not head or not head.next:
        return head
    pre = None
    while head:
        next = head.next # 缓存当前节点的向后指针，下次迭代用
        head.next = pre # 把当前节点的向前指针作为当前节点的向后指针
        pre = head # 作为下次迭代(当前节点)的向前指针
        head = next # 作为下次迭代的(当前)节点
    return pre

解法二：递归

递归的思想就是：

head.next = None
head.next.next = head.next
head.next.next.next = head.next.next
...

代码实现：

def reverse_recursion(head):
    if not head or not head.next:
       return head
    new_head = reverse_recursion(head.next)
    
    head.next.next = head
    head.next = None
    
    return new_head

解法

def divide_linked_list(head, k):
    if head is None or head.next is None:
        return head
    p1 = head
    p2 = head.next
    while k >0:
        tmp = p2.next
        p2.next = p1
        p1 = p2
        p2 = tmp
        k -= 1
    p3 = reverse_recursion(p2)   # 后部分翻转
    head.data = p3.data
    head.next = p3.next
    return p1

2.1 字符串包含

给定一长一短的两个字符串A，B，假设A长B短，要求判断B是否包含在字符串A中。

解法一：暴力法

def string_contain(string_a, string_b):
    for b in string_b:
        if b not in string_a:
            return False
    return True

假设A字符串长n，B字符串长m，那么这种暴力解需要O(n*m)次操作。

解法二：线性扫描

这种方法首先需要将A, B字符串排序，随后同时对这两个字符串依次轮询。

def string_contain(string_a, string_b):
    list_a = sorted(string_a)
    list_b = sorted(string_b)
    pa, pb = 0, 0
    while pb < len(list_b):
        while pa < len(list_a) and (list_a[pa] < list_b[pb]):
            pa += 1
        if (pa >= len(list_a)) or (list_a[pa] > list_b[pb]):
            return False
        pb += 1
    return True

排序需要O(mlogm) + O(nlogn)次操作，之后的线性扫描则只需要O(m+n)次操作。

解法三：素数对应

将每个字母与一个素数相对应。遍历长字符串A，将每个字母对应的素数相乘得到一个乘积。再遍历短字符串B，将每个字符对应的素数除A的乘积，若不能整除，说明B中对应的字符不在A字符串中。

from functools import reduce

def string_contain(string_a, string_b):
    char_map = {'a': 2, 'b': 3, 'c': 5, 'd': 7, 'e': 11, 'f': 13, 'g': 17, 'h': 19, 'i': 23, 'j': 29, 'k': 31, 'l': 37,
                'm': 41, 'n': 43, 'o': 47, 'p': 53, 'q': 59, 'r': 61, 's': 67, 't': 71, 'u': 73, 'v': 79, 'w': 83,
                'x': 89, 'y': 97, 'z': 101}
    pd_a = reduce(lambda x, y: x * y, [char_map[x] for x in string_a])
    for b in list(string_b):
        if pd_a % char_map[b] != 0:
            return False
    return True

需要O(m+n)次操作，但要考虑素数相乘的结果有可能导致正数溢出。

3.1 回文字符串

判断一个字符串正着读和倒着读是否一样，比如：'abcba'即为回文字符串。

解法一：头尾指针向中间扫描

同时从字符串的头尾向中间扫描，知道头尾相遇。如果所有字符都一样，那么这就是一个回文字符串。

def is_palindrome(s):
    s = list(s)
    if len(s)>0:
        start, end = 0, len(s)-1
        while start<end:
            if s[start] != s[end]:
                return False
            start += 1
            end -= 1
        return True
    return True

解法二：中间向头尾扫描

也可以从字符串的中间向头尾扫描，如果所有字符都一样，就认为这是个回文字符串。

def is_palindrome(s):
    s = list(s)
    if len(s) > 0:
        front = len(s) // 2 - 1 if len(s) % 2 == 0 else len(s) // 2
        back = len(s) // 2
        while front >= 0 and back <= len(s)-1:
            if s[front] != s[back]:
                return False
            front -= 1
            back += 1
        return True
    return True

3.2 回文链表

如何判断一条单向链表是不是'回文'呢？
对于单项链表也可以像字符串那样，找到其中点，然后将后半翻转（见 1.1.2 链表反转），再同时从链表头部和中间开始遍历比较。

def is_palindrome(self, head):
    fast = slow = head
    while fast and fast.next:
        fast = fast.next.next
        slow = slow.next
    node = None
    while slow:
        current = slow
        slow = slow.next
        current.next = node
        node = current
    while node:
        if node.data != head.data:
            return False
        node = node.next
        head = head.next
    return True

4.1 最长回文子串

给定一个字符串，求它的最长回文子串的长度。

解法一：遍历所有元素

首先想到的是遍历字符串中的所有元素，用两个指针以某个元素为中心向左右扩展，记录并更新得到的最长的回文长度。在这里需要注意，奇数长度与偶数长度的回文字符串的判断稍有区别，需要分开处理。

def _odd(i, max, s, id):
    j, temp = 0, 0
    while j <= i and i + j < len(s):
        if s[i - j] != s[i + j]:
            break
        temp = j * 2 + 1
        j += 1
    if temp > max:
        max = temp
        id = i
    return max, id

def _even(i, max, s, id):
    j, temp = 0, 0
    while j <= i and i + j + 1 < len(s):
        if s[i - j] != s[i + j + 1]:
            break
        temp = j * 2 + 2
        j += 1
    if temp > max:
        max = temp
        id = i
    return max, id

def longest_palindrome(s):
    if len(s) == 0:
        return 0
    max, id = -1, 0
    for i in range(len(s)):
        odd_res = _odd(i, max, s, id)
        max, id = odd_res[0], odd_res[1]
        even_res = _even(i, max, s, id)
        max, id = even_res[0], even_res[1]
    return s[id - max // 2:id + (max + 1) // 2] if max % 2 != 0 else s[id - max // 2+1:id + max // 2 + 1]

解法二：统一处理奇偶

上述解法中，我们需要特别区分奇偶回文字符串的情况。那么有没有方法可以将所有情况都转换为奇数处理呢？

可以通过在每个字符的两边都插入一个特殊的符号，将所有可能的奇数或偶数长度的回文子串都转换成了奇数长度。比如 abba 变成 #a#b#b#a#，aba变成 #a#b#a#。

此外，还可以在首位各加上特殊符号，这样就不用特殊处理越界问题，比如^#a#b#a#$。

def longest_palindrome(s):
    s_j = '#'.join('^{}$'.format(s))
    if len(s_j) == 3:
        return 0
    max, id = -1, 0
    for i in range(len(s_j)):
        j = 0
        temp = 0
        while j <= i and i + j < len(s_j):
            if s_j[i - j] != s_j[i + j]:
                break
            temp = j * 2
            j += 1
        if temp > max:
            max = temp
            id = i // 2 - 1
    return s[id - max // 4:id + (max + 1) // 4 + 1] if max % 4 != 0 else s[id - max // 4 + 1:id + max // 4 + 1]

解法三： Manacher算法

下面将用到神奇的Manacher算法，使求最长回文子串的时间复杂度为线性O(N)。

Manacher算法解析

def longest_palindrome(s):
    # Transform S into T.
    # For example, S = "abba", T = "^#a#b#b#a#$".
    # ^ and $ signs are sentinels appended to each end to avoid bounds checking
    s_j = '#'.join('^{}$'.format(s))
    n = len(s_j)
    P = [0] * n
    id = mx = 0
    for i in range(1, n - 1):
        P[i] = (mx > i) and min(mx - i, P[2 * id - i])  # equals to i' = id - (i-id)
        # Attempt to expand palindrome centered at i
        while s_j[i + 1 + P[i]] == s_j[i - 1 - P[i]]:
            P[i] += 1

        # If palindrome centered at i expand past mx,
        # adjust center based on expanded palindrome.
        if i + P[i] > mx:
            id, mx = i, i + P[i]

    # Find the maximum element in P.
    maxLen, centerIndex = max((n, i) for i, n in enumerate(P))
    return s[(centerIndex - maxLen) // 2: (centerIndex + maxLen) // 2]

5 交替字符串

输入三个字符串s1、s2和s3，判断第三个字符串s3是否由前两个字符串s1和s2交错而成，即不改变s1和s2中各个字符原有的相对顺序，例如当s1 = “aabcc”，s2 = “dbbca”，s3 = “aadbbcbcac”时，则输出True，但如果s3=“accabdbbca”，则输出False。

这道题我们考虑用动态规划的方法，定义dp[i][j]为s1的前i和字串和s2和前j个字串是否构成交替字符串。

可以分两种情况讨论：

1. 如果s1当前字符(即s1[i-1])等于s3当前字符(即s3[i+j-1])，且dp[i-1][j]为真，那么可以取s1当前字符而忽略s2的情况，dp[i][j]返回真;
1. 如果s2当前字符等于s3当前字符，并且dp[i][j-1]为真，那么可以取s2而忽略s1的情况，dp[i][j]返回真，其它情况，dp[i][j]返回假

状态转移为：
dp[i][j] = (dp[i-1][j]&& s1[i-1]==s3[i+j-1]) || dp[i][j-1] && s2[j-1]==s3[i+j-1]

代码实现如下：

def is_inter_leave(s1, s2, s3):
    l_1, l_2, l_3 = len(s1), len(s2), len(s3)
    if l_1 + l_2 != l_3:
        return False
    dp = [[False for _ in range(l_1 + 1)] for _ in range(l_2 + 1)]
    dp[0][0] = True
    for i in range(1, l_1 + 1):
        dp[0][i] = dp[0][i - 1] and (s3[i - 1] == s1[i - 1])
    for i in range(1, l_2 + 1):
        dp[i][0] = dp[i - 1][0] and (s3[i - 1] == s2[i - 1])
    for i in range(1, l_2 + 1):
        for j in range(1, l_1 + 1):
            if (dp[i][j - 1] and s1[j - 1] == s3[i + j - 1]) or (dp[i - 1][j] and s2[i - 1] == s3[i + j - 1]):
                dp[i][j] = True
    return dp[l_2][l_1]

6 字符串编辑距离

给定一个源串和目标串，能够对源串进行如下操作：

在给定位置上插入一个字符
替换任意字符
删除任意字符
写一个程序，返回最小操作数，使得对源串进行这些操作后等于目标串，源串和目标串的长度都小于2000。

这题我们依旧可以采用动态规划的方法。例如字符串'ALGORITHM'变成'ALTRUISTIC'，那么把相关字符各自对齐后，如下图：

把源串S[0...i]='ALGORITHM'编辑成下面的目标串T[0...j]='ALTRUISTIC'，对于字符s[i]和t[j]的对应，有以下四种情况：

目标串空白，即S + 字符X，T + 空白，意味着源串要删字符，则操作数为dp[i-1, j] + 1
源串空白，即S + 空白，T + 字符，意味着源串需要插入字符，则操作数为dp[i, j-1] + 1
源串的字符和目标串的字符不一样，即S + 字符X，T + 字符Y，S变成T，意味着源串要修改字符，操作数为dp[i - 1, j - 1]
源串的字符和目标串的字符一样，不用修改，操作数为dp[i - 1, j - 1] + 1

代码如下：

def edit_distance(source, target):
    len_s = len(source)
    len_t = len(target)
    dp = [[0 for _ in range(len_t+1)] for _ in range(len_s+1)]
    for i in range(len_s+1):
        dp[i][0] = i
    for j in range(len_t+1):
        dp[0][j] = j
    for i in range(1, len_s+1):
        for j in range(1, len_t+1):
            if source[i-1] == target[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1+ min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    return dp[len_s][len_t]

[Python] 算法心得—字符串篇
1.1.1 旋转字符串给定字符串，要求把字符串前面若干个字符移动到字符串尾部。要求时间复杂度O(n)，空间复杂度...
简单对称加密算法
字符串对称加密算法python module this
日常笔记
字符串匹配的KMP算法和朴素算法，及其python实现 http://blog.csdn.net/chinwufo...
连载02-Python列表的方法有什么，附实际案例源码【乐搏Te
[**上一篇文章： Python字符串详细知识点讲解附代码【从0到1学Python】**](Python字符串详细...
K近邻(KNN)算法详解及Python实现
K近邻(KNN)算法详解及Python实现今天浏览网页看到一篇用Python实现K近邻(KNN)算法的详解教程，...
字符串匹配算法总结
面写过一篇字符串匹配算法，总共涉及BF算法，RK算法，BM算法，还有一种算法是KMP算法。这几种算法思想和代码我都...
AI 学习之路——轻松初探 Python 篇（三）
这是「AI 学习之路」的第 3 篇，「Python 学习」的第 3 篇 Python 字符串使用和 C 语言比较类...
[Python] 算法心得—排序
1. 冒泡排序正如其名，不断地将最大/小的数冒泡上来，其时间复杂度为O(N^2)，具体实现代码如下： 2. 选择...
leetcode算法类型（Python）：9. 回文数，难度：简
leetcode算法类型（Python）：9. 回文数，难度：简单解答：解题思路就是把其转化为字符串并与...
KMP算法文章合集
字符串的查找：朴素查找算法和KMP算法暴力匹配算法与KMP算法（串的匹配）字符串查找算法BF和KMP 字符串匹...