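HtmlUnit makes it easy to download the file behind a URL. The method below fetches the URL and returns its content as a string: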
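// Imports assume HtmlUnit 2.x (package com.gargoylesoftware.htmlunit)
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public static String httpDownload(String url, String encode) {
    WebClient webClient = new WebClient();
    // A plain download needs no ActiveX, JavaScript or CSS processing
    webClient.getOptions().setActiveXNative(false);
    webClient.getOptions().setJavaScriptEnabled(false);
    webClient.getOptions().setCssEnabled(false);
    InputStream is = null;
    String temp = null;
    try {
        Page page = webClient.getPage(url);
        is = page.getWebResponse().getContentAsStream();
        // Collect all bytes first and decode once: decoding each 4096-byte
        // chunk separately garbles multi-byte characters split across chunks
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] bytes = new byte[4096];
        int len;
        while ((len = is.read(bytes)) != -1) {
            bos.write(bytes, 0, len);
        }
        String content = new String(bos.toByteArray(), encode);
        // 0xC2 0xA0 is the UTF-8 encoding of U+00A0 (no-break space, HTML &nbsp;);
        // replace it with an ordinary space
        byte[] specialByte = { (byte) 0xC2, (byte) 0xA0 };
        String UTFSpace = new String(specialByte, StandardCharsets.UTF_8);
        temp = content.replace(UTFSpace, " ");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (null != is) {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        webClient.close();
    }
    return temp;
}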
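Note: this method is meant for HTML files. It replaces the no-break space character (UTF-8 bytes 0xC2 0xA0, the character HTML's &nbsp; produces) with an ordinary space. For any other file type, do not perform this replacement, or the file may fail to open or come out garbled. Binary files should not be decoded into a String at all; save their raw bytes instead.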
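To make the distinction in the note concrete, here is a minimal sketch that separates the two cases. The class `DownloadHelper` and its methods `decodeHtml` and `saveRaw` are hypothetical names for illustration, not part of HtmlUnit: HTML is decoded once and has its no-break spaces replaced, while any other file is copied byte for byte with no decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical helper class illustrating the note; not an HtmlUnit API
public class DownloadHelper {

    // U+00A0 NO-BREAK SPACE: UTF-8 bytes 0xC2 0xA0, written as &nbsp; in HTML
    static final String NBSP = "\u00A0";

    // HTML/text: decode all bytes at once, then turn no-break spaces into ordinary spaces
    static String decodeHtml(byte[] raw, Charset charset) {
        return new String(raw, charset).replace(NBSP, " ");
    }

    // Any other file type: copy the bytes to disk untouched (no decoding, no replacement)
    static void saveRaw(InputStream in, Path target) throws IOException {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = { 'a', (byte) 0xC2, (byte) 0xA0, 'b' };
        // Text path: the no-break space becomes a normal space
        System.out.println(decodeHtml(raw, StandardCharsets.UTF_8)); // prints "a b"

        // Binary path: the saved file contains exactly the original bytes
        Path tmp = Files.createTempFile("download", ".bin");
        saveRaw(new ByteArrayInputStream(raw), tmp);
        System.out.println(Arrays.equals(raw, Files.readAllBytes(tmp))); // prints "true"
        Files.delete(tmp);
    }
}
```

With HtmlUnit, the binary path could be fed directly from the response stream, e.g. `saveRaw(page.getWebResponse().getContentAsStream(), Paths.get("file.pdf"))`, where the target file name is only an example.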