Web Scraping for Beginners: A Java Implementation


Author: Yuu_CX | Published 2016-11-22 21:29 | Read 0 times

    1. Find email addresses in a specified local file

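    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TestDemo {
        public static void main(String[] args) throws IOException {
            // Print every email address found in the file
            for (String mail : getMails()) {
                System.out.println(mail);
            }
        }

        // Reads the file line by line and collects every match of the email pattern
        public static List<String> getMails() throws IOException {
            // Word characters, then '@', then a domain with at least one dot
            Pattern p = Pattern.compile("\\w+@\\w+(\\.\\w+)+");
            List<String> list = new ArrayList<String>();
            // A BugReport.txt file has been placed on drive D.
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader br = new BufferedReader(new FileReader("d:\\BugReport.txt"))) {
                String line;
                while ((line = br.readLine()) != null) {
                    Matcher m = p.matcher(line);
                    while (m.find()) {
                        list.add(m.group());
                    }
                }
            }
            return list;
        }
    }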
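
The pattern `\w+@\w+(\.\w+)+` is deliberately simple, so it helps to see what it accepts and what it misses. The following is a minimal sketch, not part of the original post: the class name `RegexDemo`, the helper `findMails`, and the sample strings are made up for illustration. Because `\w` covers only letters, digits, and underscore, a dot in the local part truncates the match and a hyphen in the domain prevents it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: runs the article's pattern against in-memory strings
class RegexDemo {
    // Same pattern as in the listing above
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Returns every substring of text that the pattern matches, in order
    public static List<String> findMails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The dot in "bob.smith" is not a \w character, so only "smith@..." is captured
        System.out.println(findMails("Contact alice@example.com or bob.smith@mail.example.org"));
        // prints [alice@example.com, smith@mail.example.org]

        // The hyphen in the domain is not a \w character, so nothing matches
        System.out.println(findMails("x@my-site.com"));
        // prints []
    }
}
```

If such addresses matter for a given data set, the character classes would need to be widened, for example by allowing `.` and `-` in the appropriate positions.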
    

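    2. Find email addresses on any web page. This example uses a Tieba thread in which users post their email addresses, and collects every address on that page: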

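    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TestDemo {
        public static void main(String[] args) throws IOException {
            // Print every email address found on the page
            for (String mail : getMailsByWEB()) {
                System.out.println(mail);
            }
        }

        // Downloads the page and collects every match of the email pattern
        public static List<String> getMailsByWEB() throws IOException {
            URL url = new URL("http://tieba.baidu.com/p/2314539885");
            // Word characters, then '@', then a domain with at least one dot
            Pattern p = Pattern.compile("\\w+@\\w+(\\.\\w+)+");
            List<String> list = new ArrayList<String>();
            // try-with-resources closes the network stream even if reading fails
            try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = br.readLine()) != null) {
                    Matcher m = p.matcher(line);
                    while (m.find()) {
                        list.add(m.group());
                    }
                }
            }
            return list;
        }
    }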
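
The file version and the web version differ only in where the `BufferedReader` comes from, so the matching loop can be shared. The sketch below shows one possible way to do that. The class name `MailExtractor` and the method `extract` are hypothetical, and collecting into a `LinkedHashSet` to drop repeated addresses is an addition that the original listings do not make.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one extraction loop for any character source
class MailExtractor {
    // Same pattern as in the two listings above
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Reads the source line by line, keeps each address once in first-seen order,
    // and closes the source when done
    public static Set<String> extract(Reader source) throws IOException {
        Set<String> mails = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = MAIL.matcher(line);
                while (m.find()) {
                    mails.add(m.group());
                }
            }
        }
        return mails;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a file or a downloaded page (made-up sample data)
        String page = "a@example.com\nb@example.org a@example.com\n";
        System.out.println(extract(new StringReader(page)));
        // prints [a@example.com, b@example.org]
    }
}
```

With this shape, a local file would be passed as `new FileReader("d:\\BugReport.txt")` and a web page as `new InputStreamReader(url.openStream())`, with no other change to the loop.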
    
