Some Notes on Proxy IPs

Author: silencefun | Published: 2019-01-22 18:19

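    Once again, a crawler needs a pool of proxy IPs behind it.
    A quick search turns up plenty of free proxies, but they have to be filtered to find the ones that actually work.

    1. Getting free proxy IPs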

    https://www.xicidaili.com/nn/

    http://www.66ip.cn/nmtq.php?getnum=1

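    The first source is an HTML page that has to be parsed.
    The second lets you choose how many IPs to return (the getnum parameter).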

    2. Validation

    Check whether a normal request succeeds when sent through the proxy.

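    // Requires: java.io.InputStream, java.net.InetSocketAddress, java.net.MalformedURLException,
    //           java.net.Proxy, java.net.URL, java.net.URLConnection

    /**
     * Tests whether a proxy IP is usable by fetching a known page through it.
     *
     * @param ip   proxy host
     * @param port proxy port
     */
    public static void createIPAddress(String ip, int port) {
        URL url;
        try {
            url = new URL("http://www.baidu.com");
        } catch (MalformedURLException e) {
            System.out.println("invalid url");
            return; // without this, url would be null below
        }
        InetSocketAddress addr = new InetSocketAddress(ip, port);
        Proxy proxy = new Proxy(Proxy.Type.HTTP, addr); // HTTP proxy
        InputStream in = null;
        try {
            URLConnection conn = url.openConnection(proxy);
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(1000); // also bound the read, or a stalled proxy can block indefinitely
            in = conn.getInputStream();
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("ip " + ip + " is not available"); // unusable IP
        }
        // convertStreamToString returns "" for a null stream, so a failed connection simply falls through
        String s = convertStreamToString(in);

        if (s.contains("baidu")) { // usable IP
            System.err.println(ip + ":" + port + " is ok");

            CrawlerUtis.appendLog("C:\\Users\\21555\\Desktop\\ip_enable.txt",
                    ip + " " + port + "\r\n");
        }
    }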
    
    
    public static String convertStreamToString(InputStream is) {
        if (is == null)
            return "";
        BufferedReader reader = new BufferedReader(new InputStreamReader(is));
        StringBuilder sb = new StringBuilder();
        String line = null;
        try {
            while ((line = reader.readLine()) != null) {
                sb.append(line+"\r\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    
    }
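
    Since most free proxies are dead, it can help to run a cheap TCP connect check before the full HTTP request. The sketch below is not part of the original code: the class name ProxyPrecheck and the method canConnect are hypothetical, and it only tells you that the port accepts connections, not that it really behaves as an HTTP proxy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ProxyPrecheck {

    /**
     * Returns true if a TCP connection to ip:port can be opened within
     * timeoutMillis. Hypothetical helper: a pre-filter only, it does not
     * prove the endpoint works as an HTTP proxy.
     */
    static boolean canConnect(String ip, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(ip, port), timeoutMillis);
            return true;
        } catch (IOException | IllegalArgumentException e) {
            // Refused, timed out, unresolvable host, or port out of range
            return false;
        }
    }
}
```

    Candidates that fail this check can be dropped right away; the ones that pass would still go through createIPAddress for the real test.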
    

    3. Parsing

    3.1 Parsing the xicidaili site

    // Requires java.util.ArrayList and java.util.List, plus org.jsoup.Jsoup and org.jsoup.nodes/select classes
    public static List<String> AnalyIppool() {
        List<String> rows = new ArrayList<>(); // raw text of each parsed table row
        try {
            URL url = new URL("https://www.xicidaili.com/nn/");
            URLConnection connection = url.openConnection();
            // A User-Agent header is required
            connection.setRequestProperty("User-Agent",
                    "Mozilla/4.0 (compatible, MSIE 7.0, Windows NT 5.1, TencentTraveler 4.0)");
            connection.setRequestProperty("Accept-Charset", "UTF-8"); // request encoding
            connection.setRequestProperty("Content-Type", "application/json");
            connection.connect();
            InputStream in = connection.getInputStream();

            Document document = Jsoup.parse(convertStreamToString(in));

            // Rows with class "odd" are treated as proxy entries
            Elements ss = document.getElementsByClass("odd");
            for (Element element : ss) {
                rows.add(element.text());
                AnalyIpAndcheck(element.text());
                // System.out.println(element.text());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return rows;
    }
    
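    // Splits a row's text on spaces and validates the proxy; the first two fields are taken as IP and port
    private static String AnalyIpAndcheck(String iporign) {
        String[] ipp = iporign.split(" ");
        if (ipp.length < 2) {
            return null; // not a proxy row
        }
        createIPAddress(ipp[0], Integer.parseInt(ipp[1]));
        return null;
    }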
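
    Splitting the row text and calling Integer.parseInt directly throws on any row that is not "ip port ...", for example a header row. Below is a more defensive sketch of the same split. The class ProxyRowParser and its parse method are hypothetical names, and the row layout (IP first, port second, separated by whitespace) is assumed from the code above.

```java
import java.net.InetSocketAddress;
import java.util.Optional;

final class ProxyRowParser {

    /**
     * Parses row text such as "110.52.235.57 9999 ..." into host and port.
     * Returns Optional.empty() for anything that does not look like a proxy row.
     * Hypothetical helper; the "ip port ..." layout is an assumption.
     */
    static Optional<InetSocketAddress> parse(String rowText) {
        if (rowText == null) {
            return Optional.empty();
        }
        String[] parts = rowText.trim().split("\\s+");
        if (parts.length < 2) {
            return Optional.empty();
        }
        try {
            int port = Integer.parseInt(parts[1]);
            if (port < 1 || port > 65535) {
                return Optional.empty();
            }
            // createUnresolved avoids a DNS lookup while parsing
            return Optional.of(InetSocketAddress.createUnresolved(parts[0], port));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

    A caller could then write something like parse(element.text()).ifPresent(a -> createIPAddress(a.getHostString(), a.getPort())).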
    

    3.2 The second source is a plain API endpoint
    http://www.66ip.cn/nmtq.php?getnum=2000
    Request many IPs in one call, read the response one line at a time, and then run the checks on a thread pool.
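
    As a rough sketch of the line-by-line reading step, the helper below pulls ip:port pairs out of a response body with a regular expression. It assumes the response is text containing entries of the form a.b.c.d:port, possibly wrapped in markup; the exact response format is not shown here, and ProxyLineExtractor is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProxyLineExtractor {

    // Matches "a.b.c.d:port" that is not embedded in a longer run of digits or dots
    private static final Pattern IP_PORT = Pattern.compile(
            "(?<![\\d.])(\\d{1,3}(?:\\.\\d{1,3}){3}):(\\d{1,5})(?!\\d)");

    /** Returns every syntactically valid ip:port found in body, in order of appearance. */
    static List<String> extract(String body) {
        List<String> result = new ArrayList<>();
        if (body == null) {
            return result;
        }
        Matcher m = IP_PORT.matcher(body);
        while (m.find()) {
            int port = Integer.parseInt(m.group(2));
            if (port >= 1 && port <= 65535 && octetsValid(m.group(1))) {
                result.add(m.group(1) + ":" + port);
            }
        }
        return result;
    }

    // Each of the four octets must be in 0..255
    private static boolean octetsValid(String ip) {
        for (String part : ip.split("\\.")) {
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }
}
```

    The resulting strings have the same ip:port shape that the thread-pool code splits on ":".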

    Key code:

        private static ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 30, 300, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(3), new ThreadPoolExecutor.CallerRunsPolicy());
    

    Pass in the parsed list:

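    public static List<String> AnalyIppool2(String path) {

        // Converts the source into a list; here it is read from a local file, but the request could be made directly
        List<String> list = CrawlerUtis.filter(path);

        for (String string : list) {
            executor.execute(new Runnable() {

                @Override
                public void run() {
                    String[] ip = string.split(":");
                    createIPAddress(ip[0], Integer.parseInt(ip[1]));
                }
            });
        }
        // Assumed: the original snippet is cut off before its return statement
        return list;
    }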
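
    The executor.execute approach above fires the checks and relies on the log file for results. If you want the working proxies back as a list, one option is ExecutorService.invokeAll, sketched below. This is an alternative to the original ThreadPoolExecutor code, not a copy of it: ConcurrentProxyFilter, filter and the checker predicate are hypothetical names, and the checker stands in for whatever validation you use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class ConcurrentProxyFilter {

    /**
     * Runs checker on every candidate using a fixed pool of `threads` workers
     * and returns the candidates for which it returned true, in input order.
     * A candidate whose check throws is treated as unusable.
     */
    static List<String> filter(List<String> candidates, Predicate<String> checker, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> working = new ArrayList<>();
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String candidate : candidates) {
                tasks.add(() -> checker.test(candidate));
            }
            // invokeAll blocks until all tasks finish and keeps task order
            List<Future<Boolean>> futures = pool.invokeAll(tasks);
            for (int i = 0; i < futures.size(); i++) {
                try {
                    if (futures.get(i).get()) {
                        working.add(candidates.get(i));
                    }
                } catch (ExecutionException e) {
                    // The checker threw for this candidate: skip it
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            pool.shutdown();
        }
        return working;
    }
}
```

    Shutting the pool down in the finally block lets the JVM exit once the checks are done, which a long-lived static executor does not do on its own.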
