美文网首页工作生活
mysql和hbase关联查询统计及java8流处理实战

mysql和hbase关联查询统计及java8流处理实战

作者: mml_慢慢来 | 来源:发表于2019-07-03 10:55 被阅读0次

    新项目重构,由于数据量太大,采用了mysql存主表和hbase存记录表的方式

    (使用的phoenix操作hbase,通过mybatis多数据源连接mysql和phoenix,具体实现移步https://blog.csdn.net/qq_31349087/article/details/88535387)

    现有一个需求,分别按老师,班级,校区的维度查询学员的实操合格率,作业达标率,目前老师,班级,校区信息都在mysql,学员的做题记录在hbase,经过分析,按时间先在mysql查询班级列表,每条记录包含该班的学员和计划做题的题目id,然后根据学员id和题目id去hbase里做统计查询,之后在使用java8的分组查出老师和校区维度的数据

    @Service
    public class JobStatServiceImpl implements JobStatService {
     
        @Autowired
        private ErrorHistoryMapper errorHistoryMapper;
     
        @Autowired
        private HBaseErrorHistoryMapper hBaseErrorHistoryMapper;
     
        @Autowired
        private CacheRedisService cacheRedisService;
     
        @Override
        public List<JobStatResult> getJobStatList(LocalDateTime beginDate, LocalDateTime endDate) {
            List<JobStatResult> jobStatList = errorHistoryMapper.getJobStatList(beginDate, endDate);
            if (jobStatList != null && !jobStatList.isEmpty()) {
                jobStatList.parallelStream().iterator().forEachRemaining(jobStatResult -> {
                    List<Integer> studentIds = strToList(jobStatResult.getStudentIds(), true);
                    List<String> sectionCodes = strToList(jobStatResult.getSectionCodes(), false);
                    Map<String, Long> jobTrueCount = hBaseErrorHistoryMapper.getJobTrueCount(studentIds, sectionCodes);
                    jobStatResult.setPlanNumber(getPlanNumber(jobStatResult.getSectionCodes()));//计划做题数
                    jobStatResult.setDoneNumber(jobTrueCount.get("DONENUMBER"));//完成数
                    jobStatResult.setTrueNumber(jobTrueCount.get("TRUENUMBER"));//正确数
                });//班级达标率
     
                System.err.println(JSON.toJSONString(jobStatList));
     
     
                Collection<JobStatResult> values = jobStatList.parallelStream().filter(Objects::nonNull)
                    .collect(Collectors.groupingBy(JobStatResult::getTeacherId,
                    Collectors.reducing(new JobStatResult(), (obj1, obj2) -> {
                        JobStatResult jobStatResult = new JobStatResult();
                        BeanUtils.copyProperties(obj2, jobStatResult);
                        jobStatResult.setPlanNumber(obj1.getPlanNumber() + obj2.getPlanNumber());
                        jobStatResult.setDoneNumber(obj1.getDoneNumber() + obj2.getDoneNumber());
                        jobStatResult.setTrueNumber(obj1.getTrueNumber() + obj2.getTrueNumber());
                        return jobStatResult;
                    }))).values(); //老师达标率
     
                System.err.println(JSON.toJSONString(collect));
     
                Collection<JobStatResult> values = jobStatList.parallelStream().collect(Collectors.groupingBy(JobStatResult::getSchoolCode,
                        Collectors.reducing(new JobStatResult(), (obj1, obj2) -> {
                            obj1.setPlanNumber(obj1.getPlanNumber() + obj2.getPlanNumber());
                            obj1.setDoneNumber(obj1.getDoneNumber() + obj2.getDoneNumber());
                            obj1.setTrueNumber(obj1.getTrueNumber() + obj2.getTrueNumber());
                            return obj1;
                        }))).values();//校区达标率
     
                System.err.println(JSON.toJSONString(values));
     
            }
            return jobStatList;
        }
     
        /**
         * 计划做题数
         * @param str
         * @return
         */
        private Long getPlanNumber(String str) {
            AtomicLong planNumber = new AtomicLong(0);
            if (str == null || str.isEmpty()) {
                return planNumber.get();
            }
            String[] split = str.split(",");
            Arrays.asList(split).parallelStream().filter(s -> {
                if (s == null || s.isEmpty())
                    return false;
                return true;
            }).forEach(s -> {
                Long chapterPlanNumber = cacheRedisService.getChapterPlanNumber(s);
                planNumber.addAndGet(chapterPlanNumber);
            });
            return planNumber.get();
        }
     
        /**
         * str转list
         * @param str
         * @param f
         * @return
         */
        private List strToList(String str, boolean f) {
            List list = f ? new ArrayList(){{add(0);}} : new ArrayList(){{add("kckm");}}; //随意设个值,防止为空报错
            if (str == null || str.isEmpty()) {
                return list;
            }
            String[] split = str.split(",");
            if (f) {
                List<Integer> collect = Arrays.stream(split).filter(s -> {
                    if (s == null || s.isEmpty())
                        return false;
                    return true;
                }).mapToInt(Integer::valueOf).boxed().collect(Collectors.toList());
                list.addAll(collect);
            } else {
                List<String> collect = Arrays.stream(split).filter(s -> {
                    if (s == null || s.isEmpty())
                        return false;
                    return true;
                }).collect(Collectors.toList());
                list.addAll(collect);
            }
            return list;
        }
    }
    

    使用Collectors.groupingBy按字段分组,然后使用Collectors.reducing进行合并,这里的java8的mapReduce和hadoop的mapReduce都是一种编程模型,map(映射)reduce(规约),我这里用的是list.parallelStream(),内部会自己创建多线程跑你自定义的任务()好像是用的jdk7的forkjoin框架),所以需要注意线程安全问题,我这里统计计划做题数的时候定义了一个AtomicLong原子类,可以保证多线程环境下累加的数据正确性

    建议,hbase不支持事务,mysql+hbase不能保证数据一致性,最好hbase存一些比较久远的数据,新进的数据还是放mysql,这样关系型数据库也方便操作

    相关文章

      网友评论

        本文标题:mysql和hbase关联查询统计及java8流处理实战

        本文链接:https://www.haomeiwen.com/subject/ajxqhctx.html