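Abstract:
This paper proposes a Stencil auto-optimization framework built on an OSS (Optimal-Solution Space) model and machine-learning prediction. A feature extractor collects features along several dimensions (architecture, algorithm, input) to construct the OSS, and a model trained offline provides online prediction. Compared with state-of-the-art auto-optimization systems such as SDSL and PATUS, the framework, FAST, very quickly reaches comparable optimized performance.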
Motivation:
The paper rests on one main observation: two stencil computations share the same (near-)optimal solutions if they have high similarity in computing features.
OSS:
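OSS can be understood as a strategy of selecting the K best solutions rather than producing a single optimum.
The goal is to derive the OSS from a feature vector.
The learned mapping, however, is not f -> OSS but x (feature difference) -> OR (Overlapping Ratio).
See the original paper for the definition of y; no further formulas are reproduced here.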
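To make the x -> OR idea concrete, here is a minimal Python sketch. It is not the paper's formulation: the OR definition used here (shared solutions divided by the OSS size K), the element-wise form of the feature difference, and the configuration names and performance numbers are all assumptions chosen for illustration.

```python
def top_k_oss(perf_by_config, k):
    """Return the k best-performing configurations (higher value = better)."""
    ranked = sorted(perf_by_config, key=perf_by_config.get, reverse=True)
    return set(ranked[:k])


def overlapping_ratio(oss_a, oss_b):
    """Assumed OR definition: |OSS_a & OSS_b| / K for two OSSs of equal size K."""
    if not oss_a or len(oss_a) != len(oss_b):
        raise ValueError("both OSSs must be non-empty and of the same size")
    return len(oss_a & oss_b) / len(oss_a)


def feature_difference(f_a, f_b):
    """Assumed form of x: element-wise absolute difference of feature vectors."""
    return [abs(a - b) for a, b in zip(f_a, f_b)]


# Made-up performance numbers for two hypothetical stencils.
perf_a = {"b8": 3.1, "b16": 4.0, "b32": 4.4, "b64": 3.9}
perf_b = {"b8": 2.0, "b16": 4.2, "b32": 3.0, "b64": 4.1}

oss_a = top_k_oss(perf_a, k=2)  # {"b32", "b16"}
oss_b = top_k_oss(perf_b, k=2)  # {"b16", "b64"}
x = feature_difference([3, 7, 1], [3, 25, 4])
y = overlapping_ratio(oss_a, oss_b)
```

A training set for the regression would then consist of many such (x, y) pairs collected from pairs of known stencils.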
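One issue worth noting is the choice of OSS size. If it is too small, accuracy will certainly suffer; if it is too large, the overhead goes up. From the relationship between OSS size, OR, and the Performance Lower Bound, the authors draw the following conclusions:
a small OSS covers most of the solutions with the highest performance.
larger OSSs have higher OR: they share more optimal (near-optimal) solutions with each other.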
Code generation
eDSL code ---> high-level language (native code) ---> auto-tuned code (blocking, OpenMP, unrolling, SIMDization, compiler flags, etc.)
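The pipeline above can be sketched as a toy generator that turns a stencil description into C source with two of the listed tuning knobs applied (blocking and OpenMP). This is an illustrative sketch only, not FAST's generator: the function name `emit_stencil_c`, the tap format, and the HEAT-like coefficients are hypothetical.

```python
def emit_stencil_c(name, taps, radius=1, tile_y=None, openmp=True):
    """Emit C source for one sweep of a 3D stencil.

    taps:   list of ((dz, dy, dx), coefficient) pairs (hypothetical format)
    tile_y: optional tile size for blocking the y loop
    openmp: whether to parallelize the outermost loop
    """
    terms = " + ".join(
        f"{c!r} * in[IDX(k{dz:+d}, j{dy:+d}, i{dx:+d})]" for (dz, dy, dx), c in taps
    )
    r = radius
    lines = [
        "#define IDX(k, j, i) (((k) * ny + (j)) * nx + (i))",
        f"void {name}(const double *in, double *out, int nz, int ny, int nx) {{",
    ]
    if openmp:
        lines.append("#pragma omp parallel for")
    if tile_y:
        # Blocked variant: iterate over y tiles first, then sweep each tile.
        lines.append(f"  for (int jj = {r}; jj < ny - {r}; jj += {tile_y})")
        lines.append(f"  for (int k = {r}; k < nz - {r}; k++)")
        lines.append(f"  for (int j = jj; j < jj + {tile_y} && j < ny - {r}; j++)")
    else:
        lines.append(f"  for (int k = {r}; k < nz - {r}; k++)")
        lines.append(f"  for (int j = {r}; j < ny - {r}; j++)")
    lines.append(f"  for (int i = {r}; i < nx - {r}; i++)")
    lines.append(f"    out[IDX(k, j, i)] = {terms};")
    lines.append("}")
    return "\n".join(lines)


# A 7-point tap set with made-up coefficients.
heat_taps = [((0, 0, 0), 0.4)] + [
    (off, 0.1)
    for off in [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
]
source = emit_stencil_c("heat7", heat_taps, tile_y=8)
print(source)
```

An auto-tuner would call such a generator with different values of `tile_y` and `openmp`, compile each variant, and time it.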
Evaluation
Dataset
FDTD 3D 5-point stencil with order-1 computational electrodynamics
HEAT 3D 7-point stencil with order-1 chemical diffusion
WAVE 3D 25-point stencil with order-4 fluid dynamics
POISSON 3D 19-point stencil with order-1 mechanical engineering
HIMENO 3D 19-point stencil with order-1 UNKNOWN
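As a sanity check on the point counts, the short Python sketch below enumerates neighbor offsets for two common 3D stencil shapes. The shapes are assumptions, since the notes do not state them: a star stencil of order r has 6r + 1 points, which matches HEAT (order 1, 7 points) and WAVE (order 4, 25 points), and a 3x3x3 box without its 8 corners has 19 points, which matches POISSON and HIMENO. FDTD's 5-point count does not follow from either shape and is left out.

```python
from itertools import product


def star_offsets(order):
    """Center plus `order` points in both directions along each of the 3 axes."""
    offsets = {(0, 0, 0)}
    for axis in range(3):
        for d in range(1, order + 1):
            for sign in (1, -1):
                off = [0, 0, 0]
                off[axis] = sign * d
                offsets.add(tuple(off))
    return offsets


def box_without_corners_offsets():
    """3x3x3 neighborhood minus the 8 corners (offsets with no zero component)."""
    return {o for o in product((-1, 0, 1), repeat=3) if 0 in o}


print(len(star_offsets(1)))                # 7
print(len(star_offsets(4)))                # 25
print(len(box_without_corners_offsets()))  # 19
```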
Comparison
Baseline: straightforward implementation
SDSL: the stencil domain-specific language (SDSL)
Patus: the Patus stencil optimization framework
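Terminology
Auto-optimization strategies:
>search-based: the search space is large, so researchers apply optimizations such as pruning and heuristic searching
>prediction-based: low overhead, but hard to build (sensitive to the input, and near-optimal and optimal solutions are hard to tell apart)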
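The contrast between the two strategies can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: `toy_measure` stands in for compiling and timing a variant, the configuration space and the stored database entries are made up, and the nearest-neighbor lookup is a simple stand-in for a trained predictor rather than the paper's model.

```python
from itertools import product


def search_based(space, measure):
    """Exhaustive search: measure every configuration in the space."""
    best = max(space, key=measure)
    return best, len(space)


def prediction_based(features, database, measure):
    """Reuse the stored OSS of the most similar known stencil and measure only it."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(features, entry["features"]))

    nearest = min(database, key=sq_dist)
    best = max(nearest["oss"], key=measure)
    return best, len(nearest["oss"])


def toy_measure(cfg):
    """Hypothetical performance model: peak at block size 32, unroll factor 4."""
    block, unroll = cfg
    return -abs(block - 32) - abs(unroll - 4)


space = list(product((8, 16, 32, 64, 128), (1, 2, 4, 8)))  # 20 configurations
database = [
    {"features": (7, 1), "oss": [(32, 4), (16, 4), (32, 2)]},
    {"features": (25, 4), "oss": [(64, 8), (128, 8), (64, 4)]},
]

best_s, evals_s = search_based(space, toy_measure)
best_p, evals_p = prediction_based((9, 1), database, toy_measure)
print(best_s, evals_s)
print(best_p, evals_p)
```

In this toy setup both strategies land on the same configuration, but the prediction-based one measures 3 candidates instead of 20. That only holds when the nearest stored stencil really shares its best solutions with the new one, which is the sensitivity noted above.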
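DSL (Domain Specific Language)
For problems in a specific domain, a dedicated DSL is built to describe them.
A source-to-source transformation converts the DSL into a high-level language (C, CUDA, etc.); the high-level code is then optimized and used for code generation.
Ideally, domain experts can design algorithms in the DSL with little effort and without much programming-language knowledge. The DSL does, however, require a compiler to perform the code transformation, built for example with ROSE or LLVM/Clang.
Polyhedral compiler optimization
See the SDSL paper.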
Related articles
Multi-platform auto-optimization
S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for easy and efficient graph analysis. 2012.
M. Christen, O. Schenk, and H. Burkhart. Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. 2011.
T. Lutz, C. Fensch, and M. Cole. Partans: An autotuning framework for stencil computation on multi-GPU systems. 2013.
M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame. A script-based autotuning compiler system to generate high-performance CUDA code.