Implementing GroupBy / Per-Group TopN in Java


Author: e4e9aa34f536 | Published 2018-11-06 21:15

Introduction

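Before Java 8 introduced lambdas and streams, implementing something like SQL's group-by aggregation in Java code was fairly difficult, because Java's support for functional programming was weak. Scala, by contrast, takes functional programming much further, and a group-by aggregation there is a matter of a couple of lines: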

```scala
val birds = List("Golden Eagle", "Gyrfalcon", "American Robin", "Mountain BlueBird", "Mountain-Hawk Eagle")
val groupByFirstLetter = birds.groupBy(_.charAt(0))
```

Output:

```
Map(M -> List(Mountain BlueBird, Mountain-Hawk Eagle), G -> List(Golden Eagle, Gyrfalcon), A -> List(American Robin))
```
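For comparison, on Java 8 and later the Stream API covers this case through `Collectors.groupingBy`. The sketch below is a rough Java equivalent of the Scala snippet; the class and method names are hypothetical. A `TreeMap` is requested so the keys print in sorted order, since the one-argument `groupingBy(classifier)` overload makes no guarantee about the map type or its ordering:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BirdGrouping {
    // Group strings by their first character, like Scala's birds.groupBy(_.charAt(0)).
    // TreeMap::new makes the resulting map sorted by key.
    static Map<Character, List<String>> groupByFirstLetter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> birds = Arrays.asList("Golden Eagle", "Gyrfalcon", "American Robin",
                "Mountain BlueBird", "Mountain-Hawk Eagle");
        // prints {A=[American Robin], G=[Golden Eagle, Gyrfalcon], M=[Mountain BlueBird, Mountain-Hawk Eagle]}
        System.out.println(groupByFirstLetter(birds));
    }
}
```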

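Java also has third-party libraries that help, such as Guava's Function and the Functional Java library. Overall, though, performing in-memory GroupBy, OrderBy and Limit (TopN) operations on Java collections remains tedious. This article implements a simple grouping utility that supports custom keys and custom aggregation functions. With just a few small classes it can do something that is awkward in plain SQL without window functions (for example in MySQL before 8.0): group first, then sort within each group, and finally take the top N of each group.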

The source code can be downloaded here.

Implementation

Suppose we have the following Person class:

```java
package me.lin;

class Person {
    private String name;
    private int age;
    private double salary;

    public Person(String name, int age, double salary) {
        super();
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public double getSalary() { return salary; }
    public void setSalary(double salary) { this.salary = salary; }

    public String getNameAndAge() {
        return this.getName() + "-" + this.getAge();
    }

    @Override
    public String toString() {
        return "Person [name=" + name + ", age=" + age + ", salary=" + salary + "]";
    }
}
```

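Given a List of Person, we want to group by age and compute per-group statistics, such as the first element or the entries with the highest salary. The implementation follows.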

Aggregation operations

Define an aggregation interface that aggregates the elements of each group, analogous to count(*) or sum() in MySQL:

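```java
package me.lin;

import java.util.List;

/**
 * Aggregation operation.
 *
 * Created by Brandon on 2016/7/21.
 */
public interface Aggregator<T> {
    /**
     * Aggregation applied to each group.
     *
     * @param key    key identifying the group
     * @param values elements belonging to the group
     * @return the aggregated value
     */
    Object aggregate(Object key, List<T> values);
}
```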
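Any class implementing this interface can be plugged in as an aggregator. As an illustration only, here is a hypothetical `MaxAggregator`, roughly analogous to SQL `max()`, which returns the largest element of a group under a supplied comparator. The interface is repeated so the snippet compiles on its own:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Copy of the Aggregator interface above, so this sketch is self-contained.
interface Aggregator<T> {
    Object aggregate(Object key, List<T> values);
}

// Hypothetical aggregator: largest element per the comparator, or null for an empty group.
class MaxAggregator<T> implements Aggregator<T> {
    private final Comparator<? super T> comparator;

    MaxAggregator(Comparator<? super T> comparator) {
        this.comparator = comparator;
    }

    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values == null || values.isEmpty()) {
            return null;
        }
        return Collections.max(values, comparator);
    }

    public static void main(String[] args) {
        Aggregator<Integer> max = new MaxAggregator<Integer>(Comparator.<Integer>naturalOrder());
        System.out.println(max.aggregate("any-key", Arrays.asList(5000, 15000, 500))); // prints 15000
    }
}
```

The group key is ignored here; it is passed in so that aggregators which need it (for example to label their result) can use it.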

Below we implement a few aggregations; more complex ones can be defined in exactly the same way.

CountAggregator:

```java
package me.lin;

import java.util.List;

/**
 * Counting aggregation.
 *
 * Created by Brandon on 2016/7/21.
 */
public class CountAggregator<T> implements Aggregator<T> {
    @Override
    public Object aggregate(Object key, List<T> values) {
        return values.size();
    }
}
```

FirstAggregator:

```java
package me.lin;

import java.util.List;

/**
 * Takes the first element of the group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class FirstAggregator<T> implements Aggregator<T> {
    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values.size() >= 1) {
            return values.get(0);
        } else {
            return null;
        }
    }
}
```

TopNAggregator:

```java
package me.lin;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Takes the top N elements of each group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class TopNAggregator<T> implements Aggregator<T> {
    private Comparator<? super T> comparator;
    private int limit;

    public TopNAggregator(Comparator<? super T> comparator, int limit) {
        this.limit = limit;
        this.comparator = comparator;
    }

    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values == null || values.size() == 0) {
            return null;
        }
        // Sort a copy so the group passed in is left untouched
        ArrayList<T> copy = new ArrayList<>(values);
        Collections.sort(copy, comparator);
        int toIndex = Math.min(limit, copy.size());
        return copy.subList(0, toIndex);
    }
}
```
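`TopNAggregator` sorts a full copy of every group, which costs O(m log m) for a group of m elements. When groups are large and N is small, a bounded heap is a common alternative (a different technique swapped in here, not the article's): keep at most N candidates in a `PriorityQueue` whose head is the worst of them, for O(m log N) overall. A minimal sketch with a hypothetical `HeapTopN.topN` helper. Unlike the sort-based version it returns an empty list for empty input, and when elements tie, which of them survive is not guaranteed to match the stable sort:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Comparator;
import java.util.PriorityQueue;

class HeapTopN {
    // Returns the first `limit` elements of `values` in `comparator` order, keeping
    // at most `limit` elements in a heap instead of sorting the whole group.
    static <T> List<T> topN(List<T> values, Comparator<? super T> comparator, int limit) {
        List<T> result = new ArrayList<T>();
        if (values == null || limit <= 0) {
            return result;
        }
        // Reversed comparator: the heap head is the lowest-ranked element kept so far.
        PriorityQueue<T> heap = new PriorityQueue<T>(limit, Collections.reverseOrder(comparator));
        for (T value : values) {
            heap.offer(value);
            if (heap.size() > limit) {
                heap.poll(); // evict the element that currently ranks last
            }
        }
        result.addAll(heap);
        Collections.sort(result, comparator); // heap order is not sorted order
        return result;
    }

    public static void main(String[] args) {
        List<Integer> salaries = Arrays.asList(5000, 500000, 1400000);
        // Highest first, keep two: prints [1400000, 500000]
        System.out.println(topN(salaries, Collections.<Integer>reverseOrder(), 2));
    }
}
```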

Grouping implementation

Next comes the grouping itself. For simplicity it is implemented as a utility class:

```java
package me.lin;

import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Utility class for grouping Collections.
 */
public class GroupUtils {

    /**
     * Group and aggregate.
     *
     * @param listToDeal    the data to group, analogous to the source table in SQL
     * @param clazz         element type of the data to group
     * @param groupBy       name of the field to group by
     * @param aggregatorMap aggregators; each key is the aggregator's name and becomes the key
     *                      of the per-group map of aggregated values in the result
     * @param <T>           element type
     * @return group key -> (aggregator name -> aggregated value)
     * @throws NoSuchFieldException
     * @throws SecurityException
     * @throws IllegalArgumentException
     * @throws IllegalAccessException
     */
    public static <T> Map<Object, Map<String, Object>> groupByProperty(
            Collection<T> listToDeal, Class<T> clazz, String groupBy,
            Map<String, Aggregator<T>> aggregatorMap)
            throws NoSuchFieldException, SecurityException, IllegalArgumentException, IllegalAccessException {
        // Look the field up once instead of once per element
        Field field = clazz.getDeclaredField(groupBy);
        field.setAccessible(true);
        Map<Object, Collection<T>> groupResult = new HashMap<Object, Collection<T>>();
        for (T ele : listToDeal) {
            Object key = field.get(ele);
            if (!groupResult.containsKey(key)) {
                groupResult.put(key, new ArrayList<T>());
            }
            groupResult.get(key).add(ele);
        }
        return invokeAggregators(groupResult, aggregatorMap);
    }

    /** Same as groupByProperty, but the key is the return value of a no-argument method. */
    public static <T> Map<Object, Map<String, Object>> groupByMethod(
            Collection<T> listToDeal, Class<T> clazz, String groupByMethodName,
            Map<String, Aggregator<T>> aggregatorMap)
            throws NoSuchMethodException, SecurityException, IllegalAccessException,
            IllegalArgumentException, InvocationTargetException {
        Method groupByMethod = clazz.getDeclaredMethod(groupByMethodName);
        groupByMethod.setAccessible(true);
        Map<Object, Collection<T>> groupResult = new HashMap<Object, Collection<T>>();
        for (T ele : listToDeal) {
            Object key = groupByMethod.invoke(ele);
            if (!groupResult.containsKey(key)) {
                groupResult.put(key, new ArrayList<T>());
            }
            groupResult.get(key).add(ele);
        }
        return invokeAggregators(groupResult, aggregatorMap);
    }

    private static <T> Map<Object, Map<String, Object>> invokeAggregators(
            Map<Object, Collection<T>> groupResult, Map<String, Aggregator<T>> aggregatorMap) {
        Map<Object, Map<String, Object>> aggResults = new HashMap<>();
        for (Object key : groupResult.keySet()) {
            Map<String, Object> aggValues = doInvokeAggregators(key, groupResult.get(key), aggregatorMap);
            if (aggValues != null && aggValues.size() > 0) {
                aggResults.put(key, aggValues);
            }
        }
        return aggResults;
    }

    private static <T> Map<String, Object> doInvokeAggregators(
            Object key, Collection<T> group, Map<String, Aggregator<T>> aggregatorMap) {
        Map<String, Object> aggResults = new HashMap<String, Object>();
        if (group != null && group.size() > 0) {
            // Apply every aggregator to the current group; each one receives a read-only copy
            for (String aggKey : aggregatorMap.keySet()) {
                Aggregator<T> aggregator = aggregatorMap.get(aggKey);
                Object aggResult = aggregator.aggregate(key,
                        Collections.unmodifiableList(new ArrayList<T>(group)));
                aggResults.put(aggKey, aggResult);
            }
        }
        return aggResults;
    }
}
```

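In the code above, the grouping key can be either a field of the element or the return value of one of its methods. By writing your own key methods and aggregation functions, you can build quite powerful grouping logic on top of it.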
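Because the key is named by a string and resolved through reflection, a misspelled field or method name only surfaces at runtime as an exception. On Java 8 and later, a key-extractor function is a type-checked alternative. A minimal sketch with a hypothetical `FunctionGroupBy.groupBy` helper; it uses a `LinkedHashMap` so that groups keep the order in which their keys first appear, whereas a `HashMap` leaves iteration order unspecified:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FunctionGroupBy {
    // Groups items by the key returned from keyFn (no reflection involved).
    // LinkedHashMap keeps groups in the order their keys are first seen.
    static <T, K> Map<K, List<T>> groupBy(Collection<T> items, Function<? super T, ? extends K> keyFn) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(keyFn.apply(item), k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        // prints {a=[apple, avocado], b=[banana, blueberry], c=[cherry]}
        System.out.println(groupBy(words, w -> w.charAt(0)));
    }
}
```

With the article's Person class, the equivalent of grouping by the `age` field would be `groupBy(persons, Person::getAge)`, and the result could then be handed to aggregators in the same way.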

Testing

Grouping by property

First, let's test grouping by a property:

```java
package me.lin;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByPropertyTest {
    public static void main(String[] args) throws NoSuchFieldException, SecurityException,
            IllegalArgumentException, IllegalAccessException {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("Brandon", 15, 5000));
        persons.add(new Person("Braney", 15, 15000));
        persons.add(new Person("Jack", 10, 5000));
        persons.add(new Person("Robin", 10, 500000));
        persons.add(new Person("Tony", 10, 1400000));

        Map<String, Aggregator<Person>> aggregatorMap = new HashMap<>();
        aggregatorMap.put("count", new CountAggregator<Person>());
        aggregatorMap.put("first", new FirstAggregator<Person>());
        // Order by salary, highest first
        Comparator<Person> comparator = new Comparator<Person>() {
            @Override
            public int compare(final Person o1, final Person o2) {
                return Double.compare(o2.getSalary(), o1.getSalary());
            }
        };
        aggregatorMap.put("top2", new TopNAggregator<Person>(comparator, 2));

        Map<Object, Map<String, Object>> aggResults =
                GroupUtils.groupByProperty(persons, Person.class, "age", aggregatorMap);
        for (Object key : aggResults.keySet()) {
            System.out.println("Key:" + key);
            Map<String, Object> results = aggResults.get(key);
            for (String aggKey : results.keySet()) {
                System.out.println(" " + aggKey + "->" + results.get(aggKey));
            }
        }
    }
}
```

Output (the order of group keys and aggregator entries follows HashMap iteration order, so it may differ between JDK versions):

```
Key:10
 count->3
 first->Person [name=Jack, age=10, salary=5000.0]
 top2->[Person [name=Tony, age=10, salary=1400000.0], Person [name=Robin, age=10, salary=500000.0]]
Key:15
 count->2
 first->Person [name=Brandon, age=15, salary=5000.0]
 top2->[Person [name=Braney, age=15, salary=15000.0], Person [name=Brandon, age=15, salary=5000.0]]
```
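On Java 8 and later, the same group-then-sort-then-limit result can be obtained with streams alone. A sketch under these assumptions: `Person` is a minimal stand-in for the class above whose `toString` prints only the name (to keep the output short), `StreamTopN.topNByAge` is a hypothetical helper, and a `TreeMap` orders the groups by age:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Minimal stand-in for the article's Person class.
class Person {
    private final String name;
    private final int age;
    private final double salary;

    Person(String name, int age, double salary) {
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    int getAge() { return age; }
    double getSalary() { return salary; }

    @Override
    public String toString() { return name; }
}

class StreamTopN {
    // Group by age, sort each group by salary (highest first) and keep the first n.
    static Map<Integer, List<Person>> topNByAge(List<Person> persons, int n) {
        return persons.stream().collect(Collectors.groupingBy(
                Person::getAge,
                TreeMap::new,
                Collectors.collectingAndThen(Collectors.toList(), group -> group.stream()
                        .sorted(Comparator.comparingDouble(Person::getSalary).reversed())
                        .limit(n)
                        .collect(Collectors.toList()))));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Brandon", 15, 5000), new Person("Braney", 15, 15000),
                new Person("Jack", 10, 5000), new Person("Robin", 10, 500000),
                new Person("Tony", 10, 1400000));
        System.out.println(topNByAge(persons, 2)); // prints {10=[Tony, Robin], 15=[Braney, Brandon]}
    }
}
```

The trade-off is flexibility: this version hard-codes one aggregation, whereas `GroupUtils` applies any number of named aggregators in a single pass over the groups.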

Grouping by method return value

Now test grouping by the return value of a method:

```java
package me.lin;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByMethodTest {
    public static void main(String[] args) throws Exception {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("Brandon", 15, 5000));
        persons.add(new Person("Brandon", 15, 15000));
        persons.add(new Person("Jack", 10, 5000));
        persons.add(new Person("Robin", 10, 500000));
        persons.add(new Person("Tony", 10, 1400000));

        Map<String, Aggregator<Person>> aggregatorMap = new HashMap<>();
        aggregatorMap.put("count", new CountAggregator<Person>());
        aggregatorMap.put("first", new FirstAggregator<Person>());
        // Order by salary, highest first
        Comparator<Person> comparator = new Comparator<Person>() {
            @Override
            public int compare(final Person o1, final Person o2) {
                return Double.compare(o2.getSalary(), o1.getSalary());
            }
        };
        aggregatorMap.put("top2", new TopNAggregator<Person>(comparator, 2));

        // Group key is the value returned by Person.getNameAndAge()
        Map<Object, Map<String, Object>> aggResults =
                GroupUtils.groupByMethod(persons, Person.class, "getNameAndAge", aggregatorMap);
        for (Object key : aggResults.keySet()) {
            System.out.println("Key:" + key);
            Map<String, Object> results = aggResults.get(key);
            for (String aggKey : results.keySet()) {
                System.out.println(" " + aggKey + "->" + results.get(aggKey));
            }
        }
    }
}
```

Test results (the order again follows HashMap iteration order):

```
Key:Robin-10
 count->1
 first->Person [name=Robin, age=10, salary=500000.0]
 top2->[Person [name=Robin, age=10, salary=500000.0]]
Key:Jack-10
 count->1
 first->Person [name=Jack, age=10, salary=5000.0]
 top2->[Person [name=Jack, age=10, salary=5000.0]]
Key:Tony-10
 count->1
 first->Person [name=Tony, age=10, salary=1400000.0]
 top2->[Person [name=Tony, age=10, salary=1400000.0]]
Key:Brandon-15
 count->2
 first->Person [name=Brandon, age=15, salary=5000.0]
 top2->[Person [name=Brandon, age=15, salary=15000.0], Person [name=Brandon, age=15, salary=5000.0]]
```
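In this output the groups do not appear in input order (the first Brandon was added first, yet `Brandon-15` is printed last), because `GroupUtils` collects its results in a `HashMap`. When first-seen order matters, a `LinkedHashMap` can be used instead. A sketch with streams on Java 8+, where `Person` is a minimal stand-in (salary omitted) and `CountByNameAndAge.count` is a hypothetical helper that counts elements per `getNameAndAge()` key:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal stand-in for the article's Person class (salary omitted).
class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getNameAndAge() {
        return name + "-" + age;
    }
}

class CountByNameAndAge {
    // Count elements per getNameAndAge() key; LinkedHashMap keeps first-seen key order.
    static Map<String, Long> count(List<Person> persons) {
        return persons.stream().collect(
                Collectors.groupingBy(Person::getNameAndAge, LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Brandon", 15), new Person("Brandon", 15),
                new Person("Jack", 10), new Person("Robin", 10), new Person("Tony", 10));
        System.out.println(count(persons)); // prints {Brandon-15=2, Jack-10=1, Robin-10=1, Tony-10=1}
    }
}
```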

That is a simple implementation of GroupBy. If you find any problems, please point them out.


Article link: https://www.haomeiwen.com/subject/cwspxqtx.html