1.背景描述
迁移数据时涉及到带中文的图片地址,类似:https://cccc.cn/cms/中国红_SPU001.jpg
DB在处理数据的时候报错
需要转换成这样的地址:https://cccc.cn/cms/%E4%B8%AD%E5%9B%BD%E7%BA%A2_SPU001.jpg
才能成功导入到数据库中。
2.需求
把excel中第3-5列中的url地址进行URICode,并更新到该excel中。
3.分析
网上有一些在线转换的工具,由于数量比较多,一个个的转比较费时,还容易出错,写一段代码来完成这个工作。
3.1 处理url的时候先检测到中文,然后把中文进行URICode,变成预期的url地址;
3.2 使用POI读取excel表格的url数据,进行3.1,转换后的url更新到原url在的位置。
4.编码
引入poi相关的jar包(gradle的方式)
implementation group: 'org.apache.poi', name: 'poi', version: '5.2.2'
implementation group: 'org.apache.poi', name: 'poi-ooxml', version: '5.2.2'
代码如下:
public class DealURLData {
private static final Integer INITIAL_COLUMN = 2;
private static final Integer END_COLUMN = 4;
private static final Integer INITIAL_ROW = 1;
private static final String FILE_PATH = "C:\\XXXX\\XXX\\Desktop\\XXX\\lotus.xlsx";
private static final String XLS_SUFFIX_NAME = "xls";
private static final String CHECK_CN_CHARACTERS = "[\\u4e00-\\u9fa5]|[\\u3002\\uff1b\\uff0c\\uff1a\\u201c\\u201d\\uff08\\uff09\\u3001\\uff1f\\u300a\\u300b\\u3010\\u3011]";
private static final String ESCAPE_CHARACTER_OF_NEWLINE = "\\N";
public static void main(String[] args) throws Exception {
readAndChangeImageUrl(FILE_PATH);
}
private static void readAndChangeImageUrl(String filePath) throws Exception {
FileInputStream fileInputStream = new FileInputStream(filePath);
Workbook workbook_des;
//to obtain the file suffix,xls uses HSSFWorkbook; xlsx uses the XSSFWorkbook
String suffix = getSuffix(filePath);
if (XLS_SUFFIX_NAME.equals(suffix)) {
workbook_des = new HSSFWorkbook(fileInputStream);
} else {
workbook_des = new XSSFWorkbook(fileInputStream);
}
Iterator<Sheet> iterator = workbook_des.sheetIterator();
//get the sheet by iterating through the iterator
while (iterator.hasNext()) {
Sheet sheet = iterator.next();
if (ObjectUtils.isEmpty(sheet)) {
break;
}
/* Iterate through columns 3 through 5, since these columns store the url's address*/
for (int columnNum = INITIAL_COLUMN; columnNum <= END_COLUMN; columnNum++) {
for (int rowNum = INITIAL_ROW; rowNum <= sheet.getLastRowNum(); rowNum++) {
// get the row
Row row = sheet.getRow(rowNum);
Cell cell = row.getCell(columnNum);
if (ObjectUtils.isEmpty(cell) || ESCAPE_CHARACTER_OF_NEWLINE.equals(cell)) {
break;
}
// set the Cell type is String
cell.setCellType(CellType.STRING);
// gets the data in the cell
String data = new DataFormatter().formatCellValue(cell);
// The url address is converted and updated to this cell
row.createCell(columnNum).setCellValue(urlChange(data));
}
}
}
FileOutputStream out = new FileOutputStream(filePath);
out.flush();
workbook_des.write(out);
out.close();
}
/**
* get the suffix of the file
* @param filepath
*/
private static String getSuffix(String filepath) {
if (StringUtils.isEmpty(filepath)) {
return "";
}
int index = filepath.lastIndexOf(".");
if (index == -1) {
return "";
}
return filepath.substring(index + 1, filepath.length());
}
/**
* @Description convert the Chinese characters in the url
* @Param url
**/
private static String urlChange(String url) {
Matcher matcher = Pattern.compile(CHECK_CN_CHARACTERS).matcher(url);
while (matcher.find()) {
String tmp = matcher.group();
try {
url = url.replaceAll(tmp, java.net.URLEncoder.encode(tmp, "utf-8"));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
}
return url;
}
}
5.验证
转换前的样子:
转换前.png
转换后的结果:
转换后的结果.png
6.回炉
当我满怀欣喜执行数据的时候,发现还有问题,像Crème-d’Nude这种字符,里面没有中文,上面的代码也不会转换,在更新数据库中还是有问题。
就是要把url转成前端用 encodeURIComponent 转码后的结果入下图:
FE-encodeURIComponent.png
后端用下面代码执行后的结果为:
BE-encodeUrl.png
在网上找了下述代码,测试可以
public static String urlEncode(String url) throws UnsupportedEncodingException {
if(url == null) {
return null;
}
final String reserved_char = ";/?:@=&";
String ret = "";
for(int i=0; i < url.length(); i++) {
String cs = String.valueOf( url.charAt(i) );
if(reserved_char.contains(cs)){
ret += cs;
}else{
ret += URLEncoder.encode(cs, "utf-8");
}
}
return ret.replace("+", "%20");
}
参考链接:
https://blog.csdn.net/zhemeban/article/details/124895170
https://blog.csdn.net/weixin_50002647/article/details/123986127
https://blog.csdn.net/qq_35856336/article/details/92629290
网友评论