Web Scraping for Beginners: A Java Implementation

Author: Yuu_CX | Published: 2016-11-22 21:29

1. Finding email addresses in a specified local file

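import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) throws IOException {
        List<String> list = getMails();
        for (String mail : list) {
            System.out.println(mail);
        }
    }

    // Scans the file line by line and collects every substring that matches the pattern.
    public static List<String> getMails() throws IOException {
        // word characters, '@', word characters, then one or more ".word" groups
        String regex = "\\w+@\\w+(\\.\\w+)+";
        List<String> list = new ArrayList<String>();
        Pattern p = Pattern.compile(regex);
        // A BugReport.txt file has been placed on the D: drive.
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(new FileReader("d:\\BugReport.txt"))) {
            String line = null;
            while ((line = br.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    list.add(m.group());
                }
            }
        }
        return list;
    }
}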
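
To see what the pattern above does and does not match, it can help to run it on a few in-memory strings before pointing it at a file. The following is only a minimal sketch: the class name MailRegexSketch, the helper findMails, and the sample addresses are made up for illustration and are not part of the original listing. Because \w covers only letters, digits, and underscores, an address with a dot or hyphen before the @ is cut off at that character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailRegexSketch {
    // Same pattern as in the listing above.
    static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Collects every match of the pattern in one line of text (hypothetical helper).
    static List<String> findMails(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [alice@example.com, bob_1@mail.example.org]
        System.out.println(findMails("contact: alice@example.com, bob_1@mail.example.org"));
        // prints [last@example.com] because '.' is not a word character
        System.out.println(findMails("first.last@example.com"));
    }
}
```

If truncated matches such as the second one matter, the character class in front of the @ would need to be widened; the simple pattern is kept here because it is the one the article uses.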

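2. Finding email addresses on any web page. Here a Tieba thread in which users post their email addresses serves as the example, and every address on that page is collected: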

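import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) throws IOException {
        List<String> list = getMailsByWEB();
        for (String mail : list) {
            System.out.println(mail);
        }
    }

    // Downloads the page and collects every substring that matches the pattern.
    public static List<String> getMailsByWEB() throws IOException {
        URL url = new URL("http://tieba.baidu.com/p/2314539885");
        String regex = "\\w+@\\w+(\\.\\w+)+";
        List<String> list = new ArrayList<String>();
        Pattern p = Pattern.compile(regex);
        // try-with-resources closes the network stream even if reading fails.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line = null;
            while ((line = br.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    list.add(m.group());
                }
            }
        }
        return list;
    }
}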
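
A forum thread often repeats the same address several times, and InputStreamReader without an explicit charset falls back to the platform default. One possible variation, shown here only as a sketch, separates the extraction from the download so it can be tried on an in-memory string, and keeps each address once in first-seen order. The names MailScrapeSketch, extractMails, and extractMailsFromUrl are hypothetical, and UTF-8 is an assumption about the page encoding, not something the article states.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailScrapeSketch {
    // Same pattern as in the listing above.
    static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Reads text line by line and keeps each distinct address once, in first-seen order.
    static Set<String> extractMails(Reader source) throws IOException {
        Set<String> mails = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = MAIL.matcher(line);
                while (m.find()) {
                    mails.add(m.group());
                }
            }
        }
        return mails;
    }

    // Fetches a page and extracts addresses; UTF-8 is assumed, adjust if the page uses another encoding.
    static Set<String> extractMailsFromUrl(String address) throws IOException {
        URL url = new URL(address);
        return extractMails(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String html = "<p>a@example.com</p>\n<p>b@example.org a@example.com</p>";
        // prints [a@example.com, b@example.org]
        System.out.println(extractMails(new StringReader(html)));
    }
}
```

Calling extractMailsFromUrl with the thread URL from the listing above would apply the same extraction to the live page.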

