iOS-字符串查找

作者: FlyElephant | 来源:发表于2016-06-10 11:20 被阅读0次

字符串查找通常有四种方式,暴力查找,KMP查找,BoyerMoore查找以及RabinKarp算法查找,查找最简单的莫过于暴力查找,如果内容是“FlyElephant”,需要查找的内容是“Elephant”,上面的四种方式会给出不同的方案~

暴力查找

最简单的暴力查找:

+(BOOL)simpleMatchPattern:(NSString *)text pattern:(NSString *)pattern{
   NSInteger cnlen=text.length;
   NSInteger ptlen=pattern.length;
   for (NSInteger i=0; i<=cnlen-ptlen; i++) {
       NSInteger j;
       for (j=0; j<ptlen; j++) {
           unichar tChar=[text characterAtIndex:i+j];
           unichar pChar=[pattern characterAtIndex:j];
           if (tChar!=pChar) {
               break;
           }
       }
       if (j==ptlen) {
           return YES;
       }
   }
   return NO;
}

遍历内容的长度,获取索引m,然后索引m...m+匹配长度,逐一比较,这个算法可以稍微改进一下:

+(NSInteger)simpleBackMatchPattern:(NSString *)text pattern:(NSString *)pattern{
   NSInteger i,j;
   NSInteger cnlen=text.length;
   NSInteger ptlen=pattern.length;
   for (i=0,j=0; i<cnlen && j<ptlen; i++) {
       unichar tChar=[text characterAtIndex:i];
       unichar pChar=[pattern characterAtIndex:j];
       if (tChar==pChar) {
           j++;
       }else{
           i=i-j;
           j=0;
       }
   }
   if (j==ptlen) {
       return i-ptlen;
   }else{
        return 0;
   }
}

KMP算法

KMP是knuth,Morris和Pratt发明的算法,暴力查找的每次只是循环是固定的加1,假设内容中只有A,B两种你字符,假设查找的字符串是BAAAABAAB,模式字符串为BAAB,我们发现前两个匹配,第三个不匹配,暴力中,我们的循环会从索引1开始继续比较,但是匹配我们知道其他的全是A,所以我们应该从第二个字母B开始匹配,减少回退的成本,实现代码:

static const NSInteger LetterCount=256;
@interface KMPSearch()

@property (strong,nonatomic) NSString *pattern;

@property (strong,nonatomic) NSMutableArray *dfa;

@end

@implementation KMPSearch

-(instancetype)initWithPattern:(NSString *)pattern{
   self=[super init];
   if (self) {
       self.pattern=pattern;
       [self setupData];
   }
   return self;
}

-(NSInteger)search:(NSString *)text{
   NSInteger i,j;
   NSInteger plen=self.pattern.length;
   NSInteger tlen=text.length;
   for (i=0,j=0; i<tlen & j<plen; i++) {
       unichar uc=[text characterAtIndex:i];
       NSArray *arr=self.dfa[uc];
       j=[arr[j] integerValue];
   }
   if (j==plen) {
       return i-plen;
   }else{
       return -1;
   }
}

-(void)setupData{
   NSInteger plen=self.pattern.length;

   self.dfa=[[NSMutableArray alloc]initWithCapacity:LetterCount];
   
   for (NSInteger i=0; i<LetterCount; i++) {
       NSMutableArray *arr=[[NSMutableArray alloc]init];
       for (NSInteger m=0; m<plen; m++) {
           [arr addObject:@(0)];
       }
       [self.dfa addObject:arr];
   }
   
   NSInteger index=[self.pattern characterAtIndex:0];
   self.dfa[index][0]=@(1);
   for (NSInteger X=0,j=1; j<plen;j++) {
       for (NSInteger c=0; c<LetterCount; c++) {
           self.dfa[c][j]=self.dfa[c][X];
       }
       NSInteger uc=[self.pattern characterAtIndex:j];
       self.dfa[uc][j]=@(j+1);
       X=[self.dfa[uc][X] integerValue];
   }
   
}

@end

BoyerMooer算法

之前的暴力查找和KMP都是从左向右扫描文本对比实现的,但是BoyerMoore是从右向组走扫描实现的,具体实现:

static const NSInteger LetterCount=256;

@interface BoyerMoore()

@property (strong,nonatomic) NSString *pattern;

@property (strong,nonatomic) NSMutableArray *rightArr;

@end

@implementation BoyerMoore

-(instancetype)initWithPattern:(NSString *)pattern{
   self=[super init];
   if (self) {
       self.pattern=pattern;
       [self setup];
   }
   return self;
}

-(NSInteger)search:(NSString *)text{
   NSInteger tlen=text.length;
   NSInteger plen=self.pattern.length;
   NSInteger skip;
   for (NSInteger i=0; i<=tlen-plen; i+=skip) {
       skip=0;
       for (NSInteger j=plen-1;j>=0;j=j-1) {
           if ([self.pattern characterAtIndex:j]!=[text characterAtIndex:(i+j)]) {
               NSInteger index=[text characterAtIndex:(i+j)];
               skip=j-[self.rightArr[index] integerValue];
               break;
           }
       }
       if (skip==0) {
           return i;//匹配成功
       }
   }
   return -1;//匹配失败
}

-(void)setup{
   self.rightArr=[[NSMutableArray alloc]init];
   for (NSInteger i=0; i<LetterCount; i++) {
       [self.rightArr addObject:@(-1)];
   }
   
   for (NSInteger j=0; j<self.pattern.length; j++) {
       NSInteger index=[self.pattern characterAtIndex:j];
       self.rightArr[index]=@(j);
   }
}

@end

Rabin-Karp指纹字符串查找

Rabin和Karp这种算法和前三种走了完全的不同的方向,基于散列的查找,也就是每一个字符串都有一个对应的Hash值,如果Hash值相同的话,那么匹配成功,基本效率保持在O(n).

static  const NSInteger Q=997;
static  const NSInteger LetterCount=10;
@interface RabinKarp()

@property (strong,nonatomic) NSString *pattern;

@property (assign,nonatomic) NSInteger pCount;
@property (assign,nonatomic) NSInteger RM;
@property (assign,nonatomic) NSInteger pathHash;

@end

@implementation RabinKarp

-(instancetype)initWithPattern:(NSString *)pattern{
   self=[super init];
   if (self) {
       self.pattern=pattern;
       self.pCount=self.pattern.length;
       [self setup];
   }
   return self;
}

-(NSInteger)search:(NSString *)text{
   NSInteger tlen=text.length;
   NSInteger txtHash=[self hash:text length:self.pCount];
   //开头匹配成功
   if (self.pathHash==txtHash) {
       return 0;
   }
   
   for (NSInteger j=self.pCount; j<tlen; j++) {
       //逐一匹配
       NSInteger chValue=[[text substringWithRange:NSMakeRange(j-self.pCount, 1)] integerValue];
       txtHash=(txtHash+Q-self.RM*chValue%Q)%Q;
       NSInteger num=[[text substringWithRange:NSMakeRange(j, 1)] integerValue];
       txtHash=(txtHash*LetterCount+num)%Q;
       if (self.pathHash==txtHash) {
           return j-self.pCount+1;
       }
   }
   return -1;
}

-(void)setup{
   self.RM=1;
   for (NSInteger i=1; i<=self.pCount-1; i++) {
       self.RM=(LetterCount*self.RM)%Q;
   }
   self.pathHash=[self hash:self.pattern length:self.pCount];
}

-(NSInteger)hash:(NSString *)key length:(NSInteger)len{
   NSInteger hash=0;
   for (NSInteger j=0; j<len; j++) {
       NSInteger chValue=[[key substringWithRange:NSMakeRange(j, 1)] integerValue];
       hash=(LetterCount*hash+chValue)%Q;
   }
   return hash;
}

-(NSInteger)hashStr:(NSString *)key length:(NSInteger)len{
   NSInteger hash=0;
   for (NSInteger j=0; j<len; j++) {
       NSInteger chValue=[key characterAtIndex:j];
       hash=(LetterCount*hash+chValue)%Q;
   }
   return hash;
}

@end

测试代码:

   //1.暴力查找
   NSString *text=@"She is a girl";
   NSString *pattern=@"girl";
   if ([SearchMatch simpleMatchPattern:text pattern:pattern]) {
       NSLog(@"%@ is contains %@",text,pattern);
   }else{
       NSLog(@"%@ is not contains %@",text,pattern);
   }
   //2.简单回退
   text=@"FlyElephant";
   pattern=@"Elem";
   NSInteger matchIndex=[SearchMatch simpleBackMatchPattern:text pattern:pattern];
   if (matchIndex>0) {
       NSLog(@"%@ contains %@ ,index %ld",text,pattern,matchIndex);
   }else{
       NSLog(@"%@ is not contains %@",text,pattern);
   }
   //3.KMP查找
   NSString *kmpText=@"BCBAABACAABABACAA";
   NSString *kmpPattern=@"ABABAC";
   kmpText=@"AAACCBBDEF";
   kmpPattern=@"A";
   KMPSearch *kmp=[[KMPSearch alloc]initWithPattern:kmpPattern];
   NSInteger result=[kmp search:kmpText];
   if (result>=0) {
       NSLog(@"%@ contains %@ ,Match Begin index:%ld",kmpText,kmpPattern,result);
   }else{
       NSLog(@"%@ not contains %@ ",kmpText,kmpPattern);
   }
   //4.BoyerMoore查找
   NSString *bmText=@"FlyElephant";
   NSString *bmPattern=@"Fly";
   BoyerMoore *bm=[[BoyerMoore alloc]initWithPattern:bmPattern];
   NSInteger bmResult=[bm search:bmText];
   if (bmResult>=0) {
       NSLog(@"%@ contains %@,Begin Index:%ld",bmText,bmPattern,bmResult);
   }else{
       NSLog(@"%@ not contains %@",bmText,bmPattern);
   }
   
   //5.RabinKarp 查找
   NSString *rmText=@"3141592653589793";
   NSString *rmPattern=@"26535";
   RabinKarp *rm=[[RabinKarp alloc]initWithPattern:rmPattern];
   NSInteger rmResult=[rm search:rmText];
   if (rmResult>=0) {
       NSLog(@"%@ contains %@,Begin Index:%ld",rmText,rmPattern,rmResult);
   }else{
       NSLog(@"%@ not contains %@",rmText,rmPattern);
   }
FlyElephant.png

相关文章

网友评论

    本文标题:iOS-字符串查找

    本文链接:https://www.haomeiwen.com/subject/avmddttx.html