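Beyond graphics rendering, we can also exploit the GPU's hardware characteristics and hand it computations that are expensive on the CPU (for certain workloads, the GPU is faster by a wide margin). GPGPU programming (general-purpose GPU programming) has been around for a long time, but when talking to the GPU through OpenGL we could only practice it in roundabout ways, for example by embedding the computation in the graphics rendering pipeline. With Metal we no longer need such detours. Metal provides a dedicated compute pipeline, so we can schedule compute work on the GPU with more direct and readable code. Below, a simple example (adjusting the saturation of an image) shows how to do computation with Metal.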
- MTLDevice
- MTLCommandQueue
- MTLCommandBuffer
- MTLCommandEncoder
- MTLCommand
- MTLComputePipelineState & MTLLibrary & MTLFunction
// Init phase: create the long-lived Metal objects.
guard let device = MTLCreateSystemDefaultDevice() else {
    return nil
}
guard let commandQueue = device.makeCommandQueue() else {
    return nil
}
guard let library = device.makeDefaultLibrary() else {
    return nil
}
guard let kernelFunction = library.makeFunction(name: "adjust_saturation") else {
    return nil
}
let computePipelineState: MTLComputePipelineState
do {
    computePipelineState = try device.makeComputePipelineState(function: kernelFunction)
} catch {
    return nil
}
The name adjust_saturation used when creating the MTLFunction instance refers to a shader function defined in a .metal file. Its content is as follows:
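#include <metal_stdlib>
using namespace metal;
kernel void adjust_saturation(texture2d<float, access::read> inTexture [[texture(0)]],
                              texture2d<float, access::write> outTexture [[texture(1)]],
                              constant float *saturation [[buffer(0)]],
                              uint2 gid [[thread_position_in_grid]]) {
    // The dispatched grid may be larger than the image, so skip threads outside the texture.
    if (gid.x >= outTexture.get_width() || gid.y >= outTexture.get_height()) return;
    float4 inColor = inTexture.read(gid);
    // Weighted sum of the color channels, used as the fully desaturated (gray) value.
    float value = dot(inColor.rgb, float3(0.299, 0.587, 0.114));
    float4 grayColor(value, value, value, 1.0);
    // saturation = 0 yields gray, saturation = 1 keeps the original color.
    float4 outColor = mix(grayColor, inColor, *saturation);
    outTexture.write(outColor, gid);
}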
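The function takes two textures (one for input, one for output), a float parameter that serves as the saturation factor, and a gid parameter marked [[thread_position_in_grid]]. For now, think of gid as the ID of the current invocation within the whole compute job; in this kernel it is used as the coordinate of the pixel to read and write.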
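To make the per-pixel arithmetic concrete, here is a minimal CPU sketch of what the kernel computes for a single pixel. The `RGBA` struct and the `mix` and `adjustSaturation` functions are illustrative names introduced here, not part of Metal or of the original project; `mix` mirrors the linear interpolation the shader's `mix` performs on each component.

```swift
/// Hypothetical value type for illustration: one pixel with components in 0...1,
/// which is how the shader sees a normalized 8-bit texture.
struct RGBA: Equatable {
    var r: Float
    var g: Float
    var b: Float
    var a: Float
}

/// Scalar linear interpolation: returns x when t == 0 and approaches y as t goes to 1.
func mix(_ x: Float, _ y: Float, _ t: Float) -> Float {
    return x + (y - x) * t
}

/// CPU reference for one invocation of the adjust_saturation kernel.
func adjustSaturation(_ color: RGBA, saturation: Float) -> RGBA {
    // Same weights as the kernel's dot product.
    let value = 0.299 * color.r + 0.587 * color.g + 0.114 * color.b
    // The kernel's gray color has alpha 1.0, so alpha is interpolated from 1.0 too.
    return RGBA(r: mix(value, color.r, saturation),
                g: mix(value, color.g, saturation),
                b: mix(value, color.b, saturation),
                a: mix(1.0, color.a, saturation))
}
```

A reference like this can be handy for spot-checking a few pixels of the GPU output, allowing for the rounding introduced by 8-bit storage.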
// prepare input texture
let cmImage = cmImageFromUIImage(uiImage: image) // custom helper that loads the pixel data from a UIImage
let textureDescriptor = MTLTextureDescriptor()
textureDescriptor.width = cmImage.width
textureDescriptor.height = cmImage.height
// The CGContext used by cmImageFromUIImage writes bytes in RGBA order, so both textures
// use an RGBA format; with a BGRA format the shader would see red and blue swapped.
textureDescriptor.pixelFormat = MTLPixelFormat.rgba8Unorm
textureDescriptor.usage = .shaderRead
let inTexture = device.makeTexture(descriptor: textureDescriptor)!
let region = MTLRegion(origin: MTLOrigin(x: 0, y: 0, z: 0), size: MTLSize(width: cmImage.width, height: cmImage.height, depth: 1))
inTexture.replace(region: region, mipmapLevel: 0, withBytes: NSData(data: cmImage.data!).bytes, bytesPerRow: cmImage.width * 4)
// prepare output texture
let outTextureDescriptor = MTLTextureDescriptor()
outTextureDescriptor.width = cmImage.width
outTextureDescriptor.height = cmImage.height
outTextureDescriptor.pixelFormat = MTLPixelFormat.rgba8Unorm
outTextureDescriptor.usage = MTLTextureUsage.shaderWrite
let outTexture = device.makeTexture(descriptor: outTextureDescriptor)!
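guard let commandBuffer = commandQueue.makeCommandBuffer() else {
    return nil
}
guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else {
    return nil
}
// Bind the compute pipeline and the kernel arguments.
commandEncoder.setComputePipelineState(computePipelineState)
commandEncoder.setTexture(inTexture, index: 0)
commandEncoder.setTexture(outTexture, index: 1)
var saturation: Float = 0.1
commandEncoder.setBytes(&saturation, length: MemoryLayout<Float>.size, index: 0)
let width = cmImage.width
let height = cmImage.height
let groupSize = 16
// Round up so the grid covers the whole image; the kernel should ignore any gid that falls outside the texture.
let groupCountWidth = (width + groupSize - 1) / groupSize
let groupCountHeight = (height + groupSize - 1) / groupSize
commandEncoder.dispatchThreadgroups(MTLSize(width: groupCountWidth, height: groupCountHeight, depth: 1), threadsPerThreadgroup: MTLSize(width: groupSize, height: groupSize, depth: 1))
// Finish encoding, submit the work, and wait for the GPU before reading outTexture.
commandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()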
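This step creates the CommandBuffer and CommandEncoder objects, then uses the CommandEncoder to configure the compute pipeline and the inputs of the kernel function (inTexture, outTexture, saturation, and so on).
Finally, the dispatchThreadgroups method dispatches the compute work to the GPU. This introduces three more Metal Compute concepts: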
- thread
- thread group
- grid size
First, the grid size:
A compute pass must specify the number of times to execute a kernel function. This number corresponds to the grid size, which is defined in terms of threads and threadgroups.
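That is, the grid size defines the total number of times the shader function is executed in one GPU compute pass. The grid size is expressed as an MTLSize with three components; in this example it is (imageWidth, imageHeight, 1). According to the documentation, when using dispatchThreadgroups we do not set the grid size directly. Instead we set it indirectly, through the thread group size and the thread group count.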
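The relationship between thread groups and grid positions can be illustrated with a small CPU simulation. This is only a sketch: `simulateDispatch` is a hypothetical helper written for this explanation, and a real GPU runs these invocations in parallel rather than in nested loops. It enumerates every position a dispatch would launch and counts how many of them land inside the image.

```swift
/// Hypothetical CPU model of a 2D dispatch: walks every thread in every thread group
/// and reports how many invocations are launched and how many fall inside the image.
func simulateDispatch(groupCount: (x: Int, y: Int),
                      groupSize: (x: Int, y: Int),
                      imageSize: (width: Int, height: Int)) -> (launched: Int, inBounds: Int) {
    var launched = 0
    var inBounds = 0
    for groupY in 0..<groupCount.y {
        for groupX in 0..<groupCount.x {
            for localY in 0..<groupSize.y {
                for localX in 0..<groupSize.x {
                    // Position in grid = group position * group size + position within the group.
                    let gidX = groupX * groupSize.x + localX
                    let gidY = groupY * groupSize.y + localY
                    launched += 1
                    if gidX < imageSize.width && gidY < imageSize.height {
                        inBounds += 1
                    }
                }
            }
        }
    }
    return (launched, inBounds)
}
```

When the image size is not a multiple of the group size, `launched` exceeds `inBounds`. The extra invocations are the ones a kernel typically skips with a bounds check on `gid`.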
Next, the thread group size and thread group count:
A threadgroup is a 3D group of threads that are executed concurrently by a kernel function.
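The thread group size defines how many invocations are executed in parallel at a time. Its maximum depends on the GPU hardware (the pipeline state's maxTotalThreadsPerThreadgroup property reports the limit). In this example we use (16, 16, 1), so 256 invocations run in parallel per group. The thread group count then follows from the image resolution.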
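One common way to derive the thread group count from the resolution is ceiling division, so that a partially filled group at the right or bottom edge is still dispatched. The helper below is a minimal sketch; `threadgroupCount(for:groupSize:)` is a name made up for this example and is not a Metal API.

```swift
/// Hypothetical helper: number of thread groups needed to cover `length` pixels
/// along one axis when each group spans `groupSize` threads (ceiling division).
func threadgroupCount(for length: Int, groupSize: Int) -> Int {
    precondition(length > 0 && groupSize > 0, "length and groupSize must be positive")
    return (length + groupSize - 1) / groupSize
}

// Example: group counts for a 100 x 50 image with 16 x 16 thread groups.
let groupsAcross = threadgroupCount(for: 100, groupSize: 16)
let groupsDown = threadgroupCount(for: 50, groupSize: 16)
print(groupsAcross, groupsDown)
```

A formula that rounds down instead would leave the last few columns and rows unprocessed whenever the image size is not a multiple of the group size.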
// create image from out texture
let imageBytes = UnsafeMutablePointer<UInt8>.allocate(capacity: cmImage.width * cmImage.height * 4)
outTexture.getBytes(imageBytes, bytesPerRow: cmImage.width * 4, from: region, mipmapLevel: 0)
let context = CGContext(data: imageBytes, width: cmImage.width, height: cmImage.height, bitsPerComponent: 8, bytesPerRow: cmImage.width * 4, space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)!
let cgImage = context.makeImage()!
return UIImage(cgImage: cgImage, scale: 1.0, orientation: UIImage.Orientation.downMirrored)
UIImage --> MTLTexture
// Simple container for decoded bitmap data.
class CMImage: NSObject {
    var width: Int = 0
    var height: Int = 0
    var data: Data?
}
func cmImageFromUIImage(uiImage: UIImage) -> CMImage {
    let image = CMImage()
    image.width = Int(uiImage.size.width)
    image.height = Int(uiImage.size.height)
    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: image.width * image.height * 4)
    // Data(bytes:count:) copies the buffer, so it can be freed when the function returns.
    defer { bytes.deallocate() }
    let context = CGContext(data: bytes, width: image.width, height: image.height, bitsPerComponent: 8, bytesPerRow: image.width * 4, space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
    // Flip the context vertically before drawing.
    context?.translateBy(x: 0, y: uiImage.size.height)
    context?.scaleBy(x: 1, y: -1)
    context?.draw(uiImage.cgImage!, in: CGRect(x: 0, y: 0, width: uiImage.size.width, height: uiImage.size.height))
    image.data = Data(bytes: bytes, count: image.width * image.height * 4)
    return image
}
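For convenience, this example puts the Init Phase and Compute Pass code into a single method. According to Apple's best practices documentation, however, objects such as the Device, Library, CommandQueue, and ComputePipeline should be created only once, during app initialization, rather than re-created for every computation.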
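One possible way to follow that advice is to move the init phase into an object that is created once and reused for every compute pass. The sketch below is only an assumption about how this could be organized: `SaturationFilter` is a hypothetical class name, and the `#if canImport(Metal)` guard is there only so the snippet still compiles on platforms without Metal.

```swift
#if canImport(Metal)
import Metal

/// Hypothetical wrapper (not from the original project) that owns the long-lived Metal objects.
final class SaturationFilter {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineState: MTLComputePipelineState

    /// Runs the init phase once; fails if any of the Metal objects cannot be created.
    init?(kernelName: String = "adjust_saturation") {
        guard let device = MTLCreateSystemDefaultDevice(),
              let commandQueue = device.makeCommandQueue(),
              let library = device.makeDefaultLibrary(),
              let function = library.makeFunction(name: kernelName),
              let pipelineState = try? device.makeComputePipelineState(function: function)
        else {
            return nil
        }
        self.device = device
        self.commandQueue = commandQueue
        self.pipelineState = pipelineState
    }
}
#endif
```

Each compute pass would then only create a command buffer and encoder from the stored `commandQueue` and bind the stored `pipelineState`, leaving the per-image work (textures, arguments, dispatch) as the only code that runs repeatedly.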
This is only the Hello World of Metal compute. There is much more worth studying in depth, so if you are interested, let's keep at it together!