序列比对（二十一）——中间字符串问题的算法及实现代码

作者: 生信了 | 来源:发表于2019-11-15 08:21 被阅读0次

序列比对（二十一）——中间字符串问题的算法及实现代码
序列比对(二)——Needleman-Wunsch算法之仿射罚分
全局比对代码实现
序列比对（二十）——基序发现问题的算法及实现代码
序列比对(三)——局部联配Smith-Waterman算法
打印bam文件比对后的碱基矩阵python脚本
3️⃣ 多序列比对(2):方法
序列比对原理
序列比对（十九）——基序发现和中间字符串问题
R语言实现动态规划

原创：hxj7

前文介绍了基序发现问题和中间字符串问题。本文给出了中间字符串的算法和实现代码。

中间字符串问题的简单算法及伪代码

前文《序列比对（19）基序发现和中间字符串问题》介绍了基序发现问题和中间字符串问题；《序列比对（20）基序发现问题的算法及实现代码》给出了基序问题的算法和实现代码。本文将介绍中间字符串问题的算法，并给出实现代码。

简单回顾一下，中间字符串问题其实就是要找到使得总距离最小的一个 $l$ -元组。即
$\underset{v}{\mathrm{argmin}} \, \underset{\boldsymbol{\vec{s}}}{\min} \, d_H(v,\boldsymbol{\vec{s}})$

或者：
$\underset{v}{\mathrm{argmin}} \, TotalDistance(v,DNA)$

由于要遍历所有可能的起始位点，如前文《序列比对（20）基序发现问题的算法及实现代码》一样，我们采用树结构以及DFS（深度优先搜索）。其简单算法如下（伪代码）：

image

image
图1到图2引自《生物信息学算法导论》

从上面的代码可以看出，算法构建了一个深度为 $l$ 的搜索树，除了叶顶点外，每个顶点有 $4$ 个子顶点（表示 $l$ -元组的每个位置都有可能有4种碱基）。当遍历到叶顶点时，计算总距离 $TotalDistance(v,DNA)$ ，如果该总距离比目前的最小总距离还小，那么就将该总距离记为新的最小总距离，并且将此时的 $l$ -元组记录下来。

分支定界策略改进简单算法

分支定界法已经在前文《序列比对（20）基序发现问题的算法及实现代码》中介绍过。一个深度为 $i$ 的顶点，如果其距离 $TotalDistance(v,i,DNA)$ 比目前的最小总距离还大，那么以这个顶点为根的分支树就没有继续遍历的意义了。

用分支定界策略改进上述简单算法的伪代码如下：

image

图3到图4引自《生物信息学算法导论》

注意，上述顶点的分支界限是很宽松的，后续文章我们会分析更紧的界。

实验效果

我们用《生物信息学算法导论》书中的两组序列进行实验，分别用简单算法（medianStr_BF.exe)和分支定界法（medianStr_BB.exe）去计算中间字符串。
第一组序列（存储在identity.txt文件中，下面示例结果中用到）：

CGGGGCTATGCAACTGGGTCGTCACATTCCCCTTTCGATA
TTTGAGGGTGCCCAATAAATGCAACTCCAAAGCGGACAAA
GGATGCAACTGATGCCGTTTGACGACCTAAATCAACGGCC
AAGGATGCAACTCCAGGAGCGCCTTTGCTGGTTCTACCTG
AATTTTCTAAAAAGATTATAATGTCGGTCCATGCAACTTC
CTGCTGTACAACTGAGATCATGCTGCATGCAACTTTCAAC
TACATGATCTTTTGATGCAACTTGGATGAGGGAATGATGC

第二组序列（存储在mutated.txt文件中，下面示例结果中用到）：

CGGGGCTATcCAgCTGGGTCGTCACATTCCCCTTTCGATA
TTTGAGGGTGCCCAATAAggGCAACTCCAAAGCGGACAAA
GGATGgAtCTGATGCCGTTTGACGACCTAAATCAACGGCC
AAGGAaGCAACcCCAGGAGCGCCTTTGCTGGTTCTACCTG
AATTTTCTAAAAAGATTATAATGTCGGTCCtTGgAACTTC
CTGCTGTACAACTGAGATCATGCTGCATGCcAtTTTCAAC
TACATGATCTTTTGATGgcACTTGGATGAGGGAATGATGC

令人惊喜的是简单算法的效率也非常高：
（只要对基序发现问题和中间字符串问题的简单算法的运行时间做简单分析）

image

为identity.txt文件中的7条序列计算中间字符串

image

为mutated.txt文件中的7条序列计算中间字符串

分支定界法的结果如下：

image

为identity.txt文件中的7条序列计算中间字符串

image

为mutated.txt文件中的7条序列计算中间字符串

具体代码

上文及前文都假定多条序列的长度是一样的，但是实际情况并不总是如此。代码实现过程中考虑到这一点，做了改进，使得多条序列长度不一致的情况下也可以用此代码来计算中间字符串。

（公众号：生信了）

首先是简单算法

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#define MAXSEQ 1000

struct Sequence {
    char* s;
    int* a;      // 该向量存储每个字符在字符集中的序号
    int l;
};
typedef struct Sequence* Seq;

const int nalpha = 4;     // 字符种数
const char sortedAlpha[] = {'A', 'C', 'G', 'T'};    // 排过序的字符集

char* sec2time(time_t t);
void strUpper(char *s);
int search(char c);                      /* 在排序后的字符集中寻找字符c，如果找到返回序号；找不到返回-1 */
Seq* readSeq(char* filename, const int t);          /* 从文件中读取多条序列 */
Seq create(char* s, const int n, const int i);  
void destroy(Seq seq);
int nextVertex(int* a, int d, const int l);     /* 得到下一个顶点 */
int getDistance(Seq* mulSeq, const int t, int* a, const int l);        /* 计算一个l-元组和DNA的最小总距离 */
int* getStartPoints(Seq* mulSeq, const int t, int* a, const int l);    /* 获取起始位点向量 */
void findMedianStr(Seq* mulSeq, const int t, const int l);           /* 寻找中间字符串 */

int main(void) {
    time_t start, end;
    char* tstr;
    int i, t, l;
    char filename[30];
    Seq* mulSeq;
    start = time(NULL);  // 开始时间
    printf("t: ");
    scanf(" %d", &t);
    printf("l: ");
    scanf(" %d", &l);
    printf("seq_file path: ");
    scanf(" %s", filename);
    mulSeq = readSeq(filename, t);
    findMedianStr(mulSeq, t, l);
    for (i = 0; i < t; i++)
        destroy(mulSeq[i]);
    free(mulSeq);
    end = time(NULL);   // 结束时间
    tstr = sec2time(end - start);
    printf("time spent: %s\n", tstr);
    free(tstr);
    return 0;
}

char* sec2time(time_t t) {
/* 将秒数转化为时间字符串 */
    int h, m, s;
    char* tstr;
    if ((tstr = malloc(sizeof(char) * 50)) == NULL) {
        fputs("Error: out of space!\n", stderr);
        exit(8);
    }
    h = t / 3600;
    m = (t % 3600) / 60;
    s = t % 60;
    if (sprintf(tstr, "%02d:%02d:%02d", h, m, s) > 0)
        return tstr;
    else {
        sprintf(tstr, "more than %d hours", h);
        return tstr;
    }
}

void strUpper(char *s) {
    while (*s != '\0') {
        if (*s >= 'a' && *s <= 'z')
            *s -= 32;
        s++;
    }
}

int search(char c) {
/* 在排序后的字符集中寻找字符c，如果找到返回序号；找不到返回-1 */
    int i;
    int th = 5;
    int h, l, m;
    if (nalpha < th) {
        for (i = 0; i < nalpha; i++)
            if (sortedAlpha[i] == c)
                return i;
        return -1;
    } else {
        l = 0;
        h = nalpha - 1;
        while (l <= h) {
            m = (l + h) / 2;
            if (sortedAlpha[m] == c)
                return m;
            else if (sortedAlpha[m] < c)
                l = m + 1;
            else 
                h = m - 1;
        }
        return -1;
    }
}

Seq* readSeq(char* filename, const int t) {
/* 从文件中读取多条序列 */
    int i, n;
    char buf[MAXSEQ];
    FILE* fp;
    Seq* mulSeq;
    if ((fp = fopen(filename, "r")) == NULL) {
        fprintf(stderr, "Error: cannot open '%s'\n", filename);
        exit(1);
    }
    if ((mulSeq = (Seq*) malloc(sizeof(Seq) * t)) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);        
    }
    for (i = 0; i < t; i++) {
        if (fgets(buf, MAXSEQ, fp) == NULL) {
            fprintf(stderr, "Error: reading seq from '%s'\n", filename);
            exit(2);
        }
        n = strlen(buf);
        if (buf[n - 1] != '\n' && ! feof(fp)) {
            fprintf(stderr, "Error: line %d is too long.\n", i + 1);
            exit(3);
        }
        if (! feof(fp)) {
            buf[n - 1] = '\0';
            mulSeq[i] = create(buf, n - 1, i);
        } else {
            mulSeq[i] = create(buf, n, i);
        }
    }
    fclose(fp);
    return mulSeq;
}

Seq create(char* s, const int n, const int i) {
/* n: s的长度，不包括末尾的\0 */
    Seq seq;
    int j;
    if ((seq = (Seq) malloc(sizeof(struct Sequence))) == NULL || \
        (seq->s = (char*) malloc(sizeof(char) * (n + 1))) == NULL || \
        (seq->a = (int*) malloc(sizeof(int) * n)) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);
    }
    strcpy(seq->s, s);
    // 将序列中所有字符变为大写
    strUpper(seq->s);
    // 找到每个字符在字符集中的序号。如果是在后期做这个工作，那么就有很多重复，降低效率。
    for (j = 0; j < n; j++)
        if ((seq->a[j] = search(seq->s[j])) < 0) {
            fprintf(stderr, "Error: char %d of seq %d is %c, which is out of range.\n", j + 1, i + 1, seq->s[j]);
            exit(16);
        }
    seq->l = n;
    return seq;
}

void destroy(Seq seq) {
    free(seq->a);
    free(seq->s);
    free(seq);
}

int nextVertex(int* a, int d, const int l) {
/* 得到下一个顶点 */
    int i;
    if (d < l - 1) {
        a[d + 1] = 0;
        return d + 1;
    }
    for (i = l - 1; i >= 0; i--) {
        if (a[i] < nalpha - 1) {
            a[i]++;
            return i;
        }
    }
    return -1;
}

int getDistance(Seq* mulSeq, const int t, int* a, const int l) {
/* 计算一个l-元组和DNA的最小总距离 */
    int i, j, k;
    int dist, minDist, totalDist;
    for (i = 0, totalDist = 0; i < t; i++) {
        for (j = 0, minDist = l; j <= mulSeq[i]->l - l; j++) {
            for (k = 0, dist = 0; k < l; k++)
                dist += a[k] == mulSeq[i]->a[j + k] ? 0 : 1;
            if (dist < minDist)
                minDist = dist;
        }
        totalDist += minDist;
    }
    return totalDist;
}       

int* getStartPoints(Seq* mulSeq, const int t, int* a, const int l) {
/* 获取起始位点向量 */
    int* st;
    int i, j, k;
    int dist, minDist;
    if ((st = (int*) malloc(sizeof(int) * t)) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);
    }
    for (i = 0; i < t; i++) {
        for (j = 0, st[i] = 0, minDist = l; j <= mulSeq[i]->l - l; j++) {
            for (k = 0, dist = 0; k < l; k++)
                dist += a[k] == mulSeq[i]->a[j + k] ? 0 : 1;
            if (dist < minDist) {
                st[i] = j;
                minDist = dist;
            }
        }
    }
    return st;
}

void findMedianStr(Seq* mulSeq, const int t, const int l) {
/* 寻找中间字符串 */
    int* a;     // l-元组
    int* ma;    // 最小距离对应的l-元组
    int i, d, j;
    int dist, minDist;
    char* cs;    // consensus.
    int* st;     // start point vector
    if ((a = (int*) malloc(sizeof(int) * l)) == NULL || \
        (ma = (int*) malloc(sizeof(int) * l)) == NULL || \
        (cs = (char*) malloc(sizeof(char) * (l + 1))) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);
    }
    // 遍历所有可能的l-元组，寻找最小距离
    for (i = 0; i < l; i++)
        ma[i] = a[i] = 0;
    minDist = l * t;
    d = 0;
    while (d >= 0) {
        if (d < l - 1)
            d = nextVertex(a, d, l);
        else {
            dist = getDistance(mulSeq, t, a, l);
            if (dist < minDist) {  // 注意最小值对应的l-元组可能有多种，这里只保留了其中一种。其实还可以设置当minDist = 0时终止搜索。
                minDist = dist;
                for (i = 0; i < l; i++)
                    ma[i] = a[i];
            }
            d = nextVertex(a, d, l);
        }
    }
    // 获取共有序列
    for (i = 0; i < l; i++)
        cs[i] = sortedAlpha[ma[i]];
    cs[l] = '\0';
    // 获取最优起始位点向量
    st = getStartPoints(mulSeq, t, ma, l);
    // 输出结果
    printf("Min distance: %d\n", minDist);
    printf("Starting point vector: ");
    for (i = 0; i < t; i++)
        printf("%d ", st[i] + 1);
    printf("\n");
    printf("Target seq:\n");
    for (i = 0; i < t; i++) {
        printf("\t");
        for (j = st[i]; j < st[i] + l; j++)
            printf("%c", mulSeq[i]->s[j]);
        printf("\n");
    }
    printf("Consensus seq: %s\n", cs);
    // 释放内存
    free(a);
    free(ma);
    free(st);
    free(cs);
}

然后是分支定界法

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#define MAXSEQ 1000

struct Sequence {
    char* s;
    int* a;      // 该向量存储每个字符在字符集中的序号
    int l;
};
typedef struct Sequence* Seq;

const int nalpha = 4;     // 字符种数
const char sortedAlpha[] = {'A', 'C', 'G', 'T'};    // 排过序的字符集

char* sec2time(time_t t);
void strUpper(char *s);
int search(char c);                      /* 在排序后的字符集中寻找字符c，如果找到返回序号；找不到返回-1 */
Seq* readSeq(char* filename, const int t);          /* 从文件中读取多条序列 */
Seq create(char* s, const int n, const int i);  
void destroy(Seq seq);
int nextVertex(int* a, int d, const int l);     /* 得到下一个顶点 */
int byPass(int* a, int d, const int l);           /* 跳过某个分支，得到下一个顶点 */
int getDistance(Seq* mulSeq, const int t, int* a, const int l, const int d);        /* 计算一个l-元组和DNA的最小总距离 */
int* getStartPoints(Seq* mulSeq, const int t, int* a, const int l);    /* 获取起始位点向量 */
void findMedianStr(Seq* mulSeq, const int t, const int l);           /* 寻找中间字符串 */

int main(void) {
    time_t start, end;
    char* tstr;
    int i, t, l;
    char filename[30];
    Seq* mulSeq;
    start = time(NULL);  // 开始时间
    printf("t: ");
    scanf(" %d", &t);
    printf("l: ");
    scanf(" %d", &l);
    printf("seq_file path: ");
    scanf(" %s", filename);
    mulSeq = readSeq(filename, t);
    findMedianStr(mulSeq, t, l);
    for (i = 0; i < t; i++)
        destroy(mulSeq[i]);
    free(mulSeq);
    end = time(NULL);   // 结束时间
    tstr = sec2time(end - start);
    printf("time spent: %s\n", tstr);
    free(tstr);
    return 0;
}

char* sec2time(time_t t) {
/* 将秒数转化为时间字符串 */
    int h, m, s;
    char* tstr;
    if ((tstr = malloc(sizeof(char) * 50)) == NULL) {
        fputs("Error: out of space!\n", stderr);
        exit(8);
    }
    h = t / 3600;
    m = (t % 3600) / 60;
    s = t % 60;
    if (sprintf(tstr, "%02d:%02d:%02d", h, m, s) > 0)
        return tstr;
    else {
        sprintf(tstr, "more than %d hours", h);
        return tstr;
    }
}

void strUpper(char *s) {
    while (*s != '\0') {
        if (*s >= 'a' && *s <= 'z')
            *s -= 32;
        s++;
    }
}

int search(char c) {
/* 在排序后的字符集中寻找字符c，如果找到返回序号；找不到返回-1 */
    int i;
    int th = 5;
    int h, l, m;
    if (nalpha < th) {
        for (i = 0; i < nalpha; i++)
            if (sortedAlpha[i] == c)
                return i;
        return -1;
    } else {
        l = 0;
        h = nalpha - 1;
        while (l <= h) {
            m = (l + h) / 2;
            if (sortedAlpha[m] == c)
                return m;
            else if (sortedAlpha[m] < c)
                l = m + 1;
            else 
                h = m - 1;
        }
        return -1;
    }
}

Seq* readSeq(char* filename, const int t) {
/* 从文件中读取多条序列 */
    int i, n;
    char buf[MAXSEQ];
    FILE* fp;
    Seq* mulSeq;
    if ((fp = fopen(filename, "r")) == NULL) {
        fprintf(stderr, "Error: cannot open '%s'\n", filename);
        exit(1);
    }
    if ((mulSeq = (Seq*) malloc(sizeof(Seq) * t)) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);        
    }
    for (i = 0; i < t; i++) {
        if (fgets(buf, MAXSEQ, fp) == NULL) {
            fprintf(stderr, "Error: reading seq from '%s'\n", filename);
            exit(2);
        }
        n = strlen(buf);
        if (buf[n - 1] != '\n' && ! feof(fp)) {
            fprintf(stderr, "Error: line %d is too long.\n", i + 1);
            exit(3);
        }
        if (! feof(fp)) {
            buf[n - 1] = '\0';
            mulSeq[i] = create(buf, n - 1, i);
        } else {
            mulSeq[i] = create(buf, n, i);
        }
    }
    fclose(fp);
    return mulSeq;
}

Seq create(char* s, const int n, const int i) {
/* n: s的长度，不包括末尾的\0 */
    Seq seq;
    int j;
    if ((seq = (Seq) malloc(sizeof(struct Sequence))) == NULL || \
        (seq->s = (char*) malloc(sizeof(char) * (n + 1))) == NULL || \
        (seq->a = (int*) malloc(sizeof(int) * n)) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);
    }
    strcpy(seq->s, s);
    // 将序列中所有字符变为大写
    strUpper(seq->s);
    // 找到每个字符在字符集中的序号。如果是在后期做这个工作，那么就有很多重复，降低效率。
    for (j = 0; j < n; j++)
        if ((seq->a[j] = search(seq->s[j])) < 0) {
            fprintf(stderr, "Error: char %d of seq %d is %c, which is out of range.\n", j + 1, i + 1, seq->s[j]);
            exit(16);
        }
    seq->l = n;
    return seq;
}

void destroy(Seq seq) {
    free(seq->a);
    free(seq->s);
    free(seq);
}

int nextVertex(int* a, int d, const int l) {
/* 得到下一个顶点 */
    int i;
    if (d < l - 1) {
        a[d + 1] = 0;
        return d + 1;
    }
    for (i = l - 1; i >= 0; i--) {
        if (a[i] < nalpha - 1) {
            a[i]++;
            return i;
        }
    }
    return -1;
}

int byPass(int* a, int d, const int l) {
/* 跳过某个分支，得到下一个顶点 */
    int i;
    for (i = d; i >= 0; i--) {
        if (a[i] < nalpha - 1) {
            a[i]++;
            return i;
        }
    }
    return -1;
}           


int getDistance(Seq* mulSeq, const int t, int* a, const int l, const int d) {
/* 计算一个l-元组和DNA的最小总距离 */
    int i, j, k;
    int dist, minDist, totalDist;
    for (i = 0, totalDist = 0; i < t; i++) {
        for (j = 0, minDist = l; j <= mulSeq[i]->l - l; j++) {
            for (k = 0, dist = 0; k <= d; k++)
                dist += a[k] == mulSeq[i]->a[j + k] ? 0 : 1;
            if (dist < minDist)
                minDist = dist;
        }
        totalDist += minDist;
    }
    return totalDist;
}       

int* getStartPoints(Seq* mulSeq, const int t, int* a, const int l) {
/* 获取起始位点向量 */
    int* st;
    int i, j, k;
    int dist, minDist;
    if ((st = (int*) malloc(sizeof(int) * t)) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);
    }
    for (i = 0; i < t; i++) {
        for (j = 0, st[i] = 0, minDist = l; j <= mulSeq[i]->l - l; j++) {
            for (k = 0, dist = 0; k < l; k++)
                dist += a[k] == mulSeq[i]->a[j + k] ? 0 : 1;
            if (dist < minDist) {
                st[i] = j;
                minDist = dist;
            }
        }
    }
    return st;
}

void findMedianStr(Seq* mulSeq, const int t, const int l) {
/* 寻找中间字符串 */
    int* a;     // l-元组
    int* ma;    // 最小距离对应的l-元组
    int i, d, j;
    int dist, minDist;
    char* cs;    // consensus.
    int* st;     // start point vector
    if ((a = (int*) malloc(sizeof(int) * l)) == NULL || \
        (ma = (int*) malloc(sizeof(int) * l)) == NULL || \
        (cs = (char*) malloc(sizeof(char) * (l + 1))) == NULL) {
        fputs("Error: no space!\n", stderr);
        exit(8);
    }
    // 遍历所有可能的l-元组，寻找最小距离
    for (i = 0; i < l; i++)
        ma[i] = a[i] = 0;
    minDist = l * t;
    d = 0;
    while (d >= 0) {
        if (d < l - 1) {
            dist = getDistance(mulSeq, t, a, l, d);
            if (dist > minDist)
                d = byPass(a, d, l);
            else
                d = nextVertex(a, d, l);
        } else {
            dist = getDistance(mulSeq, t, a, l, l - 1);
            if (dist < minDist) {  // 注意最小值对应的l-元组可能有多种，这里只保留了其中一种。其实还可以设置当minDist = 0时终止搜索。
                minDist = dist;
                for (i = 0; i < l; i++)
                    ma[i] = a[i];
            }
            d = nextVertex(a, d, l);
        }
    }
    // 获取共有序列
    for (i = 0; i < l; i++)
        cs[i] = sortedAlpha[ma[i]];
    cs[l] = '\0';
    // 获取最优起始位点向量
    st = getStartPoints(mulSeq, t, ma, l);
    // 输出结果
    printf("Min distance: %d\n", minDist);
    printf("Starting point vector: ");
    for (i = 0; i < t; i++)
        printf("%d ", st[i] + 1);
    printf("\n");
    printf("Target seq:\n");
    for (i = 0; i < t; i++) {
        printf("\t");
        for (j = st[i]; j < st[i] + l; j++)
            printf("%c", mulSeq[i]->s[j]);
        printf("\n");
    }
    printf("Consensus seq: %s\n", cs);
    // 释放内存
    free(a);
    free(ma);
    free(st);
    free(cs);
}

（公众号：生信了）

序列比对（二十一）——中间字符串问题的算法及实现代码
原创：hxj7 前文介绍了基序发现问题和中间字符串问题。本文给出了中间字符串的算法和实现代码。中间字符串问题的简...
序列比对(二)——Needleman-Wunsch算法之仿射罚分
原创： hxj7 关于全局比对Needleman-Wunsch算法的介绍及线性罚分的实现代码请参考序列比对（一）全...
全局比对代码实现
一个用Python代码实现的简单程序，使用Needleman–Wunsch算法去对两条序列进行全局比对。
序列比对（二十）——基序发现问题的算法及实现代码
原创：hxj7 前文介绍了基序发现问题和中间字符串问题，本文给出了基序发现问题的具体算法和实现代码。基序发现问题...
序列比对(三)——局部联配Smith-Waterman算法
原创：hxj7 关于全局联配的介绍可参见前文：序列比对（一）全局比对Needleman-Wunsch算法序列比对（...
打印bam文件比对后的碱基矩阵python脚本
脚本的初衷是想看多序列比对后的序列整齐度；但多序列比对往往是无监督学习，需要耗费大量的内存和时间，所以以下代码实现...
3️⃣ 多序列比对(2):方法
序列比对和序列特征分析总目录多序列比对的算法很多，有 1 动态规划求解 2 渐进策略算法 3 基于一致性的方法 ...
序列比对原理
在生物信息学研究中，序列比对是一个非常基础的问题，在很多研究中都会用到。主要几种算法包括全局比对算法（Needle...
序列比对（十九）——基序发现和中间字符串问题
原创：hxj7 本文介绍了基序发现问题和中间字符串问题。引言：DNA调控元件我们知道，DNA调控元件往往是一段...
R语言实现动态规划
-by杀杀在生信中，动态规划的原理是实现序列比对算法的基础原理之一经典案例之一就是小偷问题：假如一个小偷需要偷...

序列比对（二十一）——中间字符串问题的算法及实现代码

中间字符串问题的简单算法及伪代码

分支定界策略改进简单算法

实验效果

具体代码

相关文章

序列比对（二十一）——中间字符串问题的算法及实现代码

序列比对(二)——Needleman-Wunsch算法之仿射罚分

全局比对代码实现

序列比对（二十）——基序发现问题的算法及实现代码

序列比对(三)——局部联配Smith-Waterman算法

打印bam文件比对后的碱基矩阵python脚本

3️⃣ 多序列比对(2):方法

序列比对原理

序列比对（十九）——基序发现和中间字符串问题

R语言实现动态规划

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读