Scraping Web Page Content

Author: kanaSki | Published: 2019-07-03 20:20
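        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.URL;
        import java.nio.charset.StandardCharsets;

        // The class name is illustrative; the original snippet showed only the main method.
        public class FetchPage {
            public static void main(String[] args) throws Exception {
                URL url = new URL("https://www.jd.com");
                // try-with-resources closes the reader and the underlying stream, even if reading fails
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String str;
                    // Print the page source line by line
                    while ((str = br.readLine()) != null) {
                        System.out.println(str);
                    }
                }
            }
        }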
    

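    However, some websites reject requests that do not appear to come from a browser. In that case, you can set a browser-style User-Agent header so that the request looks like it was sent by a browser.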

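        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.nio.charset.StandardCharsets;

        // The class name is illustrative; the original snippet showed only the main method.
        public class FetchPageAsBrowser {
            public static void main(String[] args) throws Exception {
                URL url = new URL("https://www.dianping.com");
                HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
                urlConnection.setRequestMethod("GET");
                // Send a Chrome User-Agent string instead of Java's default one
                urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36");
                // Assumes the response is UTF-8 encoded; without an explicit charset the platform default is used
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(urlConnection.getInputStream(), StandardCharsets.UTF_8))) {
                    String s;
                    while ((s = br.readLine()) != null) {
                        System.out.println(s);
                    }
                }
            }
        }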
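
    To see the effect of the User-Agent header without depending on a real site, here is a minimal sketch that runs against a local stand-in server. Everything in it is an assumption made for illustration: the class `UserAgentDemo`, the helpers `startServer` and `fetchStatus`, and the rule that the server answers 403 unless the User-Agent starts with `Mozilla/`. Real sites may apply very different checks.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original article.
public class UserAgentDemo {
    public static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36";

    // Starts a local stand-in server on a free port. It replies 403 unless
    // the User-Agent header starts with "Mozilla/" (an assumed, simplified rule).
    public static HttpServer startServer() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean looksLikeBrowser = ua != null && ua.startsWith("Mozilla/");
            byte[] body = (looksLikeBrowser ? "<html>ok</html>" : "forbidden")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(looksLikeBrowser ? 200 : 403, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Sends a GET request and returns the HTTP status code.
    // Passing null for userAgent keeps Java's default User-Agent header.
    public static int fetchStatus(URL url, String userAgent) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            System.out.println("default User-Agent: " + fetchStatus(url, null));
            System.out.println("browser User-Agent: " + fetchStatus(url, BROWSER_UA));
        } finally {
            server.stop(0);
        }
    }
}
```

    With this stand-in server, the first request should be answered with 403 and the second with 200. Checking `getResponseCode()` before calling `getInputStream()` is a deliberate choice here: `getInputStream()` throws an `IOException` on an error status, while the status code tells you directly whether the request was refused.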
    


        Article link: https://www.haomeiwen.com/subject/fmxhhctx.html