美文网首页
JAVA实现批量URICode url中的中文并更新

JAVA实现批量URICode url中的中文并更新

作者: 燃灯道童 | 来源:发表于2022-11-16 22:53 被阅读0次
    1.背景描述

    迁移数据时涉及到带中文的图片地址,类似:https://cccc.cn/cms/中国红_SPU001.jpg
    DB在处理数据的时候报错

    错误图片.png
    需要转换成这样的地址:https://cccc.cn/cms/%E4%B8%AD%E5%9B%BD%E7%BA%A2_SPU001.jpg
    才能成功导入到数据库中。
    2.需求

    把excel中第3-5列中的url地址进行URICode,并更新到该excel中。

    3.分析

    网上有一些在线转换的工具,由于数量比较多,一个个的转比较费时,还容易出错,写一段代码来完成这个工作。
    3.1 处理url的时候先检测到中文,然后把中文进行URICode,变成预期的url地址;
    3.2 使用POI读取excel表格的url数据,进行3.1,转换后的url更新到原url在的位置。

    4.编码

    引入poi相关的jar包(gradle的方式)

     implementation group: 'org.apache.poi', name: 'poi', version: '5.2.2'
     implementation group: 'org.apache.poi', name: 'poi-ooxml', version: '5.2.2'
    

    代码如下:

    public class DealURLData {
    
        private static final Integer INITIAL_COLUMN = 2;
        private static final Integer END_COLUMN = 4;
        private static final Integer INITIAL_ROW = 1;
        private static final String FILE_PATH = "C:\\XXXX\\XXX\\Desktop\\XXX\\lotus.xlsx";
        private static final String XLS_SUFFIX_NAME = "xls";
        private static final String CHECK_CN_CHARACTERS = "[\\u4e00-\\u9fa5]|[\\u3002\\uff1b\\uff0c\\uff1a\\u201c\\u201d\\uff08\\uff09\\u3001\\uff1f\\u300a\\u300b\\u3010\\u3011]";
        private static final String ESCAPE_CHARACTER_OF_NEWLINE = "\\N";
    
        public static void main(String[] args) throws Exception {
    
            readAndChangeImageUrl(FILE_PATH);
    
        }
    
        private static void readAndChangeImageUrl(String filePath) throws Exception {
            FileInputStream fileInputStream = new FileInputStream(filePath);
            Workbook workbook_des;
            //to obtain the file suffix,xls uses HSSFWorkbook; xlsx uses the XSSFWorkbook
            String suffix = getSuffix(filePath);
            if (XLS_SUFFIX_NAME.equals(suffix)) {
                workbook_des = new HSSFWorkbook(fileInputStream);
            } else {
                workbook_des = new XSSFWorkbook(fileInputStream);
            }
    
            Iterator<Sheet> iterator = workbook_des.sheetIterator();
            //get the sheet by iterating through the iterator
            while (iterator.hasNext()) {
                Sheet sheet = iterator.next();
                if (ObjectUtils.isEmpty(sheet)) {
                    break;
                }
                /* Iterate through columns 3 through 5, since these columns store the url's address*/
                for (int columnNum = INITIAL_COLUMN; columnNum <= END_COLUMN; columnNum++) {
                    for (int rowNum = INITIAL_ROW; rowNum <= sheet.getLastRowNum(); rowNum++) {
                        // get the row
                        Row row = sheet.getRow(rowNum);
                        Cell cell = row.getCell(columnNum);
                        if (ObjectUtils.isEmpty(cell) || ESCAPE_CHARACTER_OF_NEWLINE.equals(cell)) {
                            break;
                        }
                        // set the Cell type is String
                        cell.setCellType(CellType.STRING);
                        // gets the data in the cell
                        String data = new DataFormatter().formatCellValue(cell);
                        // The url address is converted and updated to this cell
                        row.createCell(columnNum).setCellValue(urlChange(data));
                    }
                }
            }
            FileOutputStream out = new FileOutputStream(filePath);
            out.flush();
            workbook_des.write(out);
            out.close();
        }
    
        /**
         * get the suffix of the file 
         * @param filepath
         */
        private static String getSuffix(String filepath) {
            if (StringUtils.isEmpty(filepath)) {
                return "";
            }
            int index = filepath.lastIndexOf(".");
            if (index == -1) {
                return "";
            }
            return filepath.substring(index + 1, filepath.length());
        }
    
        /**
         * @Description convert the Chinese characters in the url
         * @Param url
         **/
        private static String urlChange(String url) {
            Matcher matcher = Pattern.compile(CHECK_CN_CHARACTERS).matcher(url);
            while (matcher.find()) {
                String tmp = matcher.group();
                try {
                    url = url.replaceAll(tmp, java.net.URLEncoder.encode(tmp, "utf-8"));
                } catch (UnsupportedEncodingException e) {
                    e.printStackTrace();
                }
            }
            return url;
        }
    
    }
    
    5.验证

    转换前的样子:


    转换前.png

    转换后的结果:


    转换后的结果.png
    6.回炉

    当我满怀欣喜执行数据的时候,发现还有问题,像Crème-d’Nude这种字符,里面没有中文,上面的代码也不会转换,在更新数据库中还是有问题。
    就是要把url转成前端用 encodeURIComponent 转码后的结果入下图:


    FE-encodeURIComponent.png

    后端用下面代码执行后的结果为:


    BE-encodeUrl.png

    在网上找了下述代码,测试可以

    public static String urlEncode(String url) throws UnsupportedEncodingException {
           if(url == null) {
               return null;
           }
           final String reserved_char = ";/?:@=&";
           String ret = "";
           for(int i=0; i < url.length(); i++) {
               String cs = String.valueOf( url.charAt(i) );
               if(reserved_char.contains(cs)){
                   ret += cs;
               }else{
                   ret += URLEncoder.encode(cs, "utf-8");
               }
           }
           return ret.replace("+", "%20");
       }
    

    参考链接:
    https://blog.csdn.net/zhemeban/article/details/124895170
    https://blog.csdn.net/weixin_50002647/article/details/123986127
    https://blog.csdn.net/qq_35856336/article/details/92629290

    相关文章

      网友评论

          本文标题:JAVA实现批量URICode url中的中文并更新

          本文链接:https://www.haomeiwen.com/subject/gksuxdtx.html