Swift-Vision图像识别框架

Author: 请叫我小陈陈 | Published 2019-10-12 09:40

    Background

    Recently the iOS tech-sharing sessions at my company have all touched on Vision in one way or another, so I decided to study it. After even a quick look, Vision turns out to be a very powerful framework.

    Vision use cases

    • Face detection
    • Image comparison/alignment analysis
    • QR code / barcode detection
    • Text detection
    • Object tracking

    When it comes to detection, besides Vision, Apple provides two other frameworks that can do the job: Core Image and AVFoundation.

    Apple also published a performance comparison:

    Vision vs. Core Image vs. AVFoundation
    Apple's chart shows that, compared with the existing Core Image and AVFoundation, Vision has the best accuracy and supports as many platforms as Core Image, at the cost of more processing time and power consumption. Vision is also Apple's higher-level wrapper: compared with a lower-level library like Core Image, its API is much friendlier and cuts down the amount of code we have to write.

    Key members of the Vision architecture

    1. RequestHandler

    • VNImageRequestHandler

    An object that processes one or more image analysis requests pertaining to a single image.

    • VNSequenceRequestHandler

    An object that processes image analysis requests for each frame in a sequence.

    2. VNRequest

    • VNImageBasedRequest

    The abstract superclass for image analysis requests that focus on a specific part of an image.

    (Image: VNImageBasedRequest.png)

    3. VNObservation

    The abstract superclass for analysis results.

    (Image: VNObservation.png)

    How to use Vision

    1. Create the RequestHandler appropriate for the requests you need.
    2. The RequestHandler holds the image to be analyzed and dispatches the results to each request's completionHandler.
    3. Read the Observation array from the request's results property.
    4. The concrete Observation type in that array depends on the kind of request.
    5. Each Observation carries properties such as boundingBox, which store the coordinates of the detected feature.
    6. Once we have the coordinates, we can do whatever we like with them.

    Roughly, in picture form:


    (Image: Vision.png)
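    The flow above can be sketched end-to-end as a minimal face-detection example (the function and variable names here are placeholders of mine, not from the official demo):

```swift
import Vision

// Minimal sketch of the flow: create a request, hand it to a request
// handler that holds the image, then read the observations.
func detectFaces(in image: CGImage) {
    // 1. The request's completionHandler receives the results.
    let request = VNDetectFaceRectanglesRequest { request, error in
        guard error == nil,
              let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // boundingBox is normalized (0...1) with a bottom-left origin.
            print("Face at \(face.boundingBox)")
        }
    }
    // 2. The handler holds the image and dispatches results to each request.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    // 3. perform(_:) is synchronous, so run it off the main queue.
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```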

    Suppose we need to mark all the faces, rectangles, QR codes, and text in a single image (see Apple's official demo). The recognition methods provided by VNImageRequestHandler and VNSequenceRequestHandler each take a [VNRequest]:

    //VNImageRequestHandler
    public init(cvPixelBuffer pixelBuffer: CVPixelBuffer, options: [VNImageOption : Any] = [:])
    public init(cvPixelBuffer pixelBuffer: CVPixelBuffer, orientation: CGImagePropertyOrientation, options: [VNImageOption : Any] = [:])
    public init(cgImage image: CGImage, options: [VNImageOption : Any] = [:])
    public init(cgImage image: CGImage, orientation: CGImagePropertyOrientation, options: [VNImageOption : Any] = [:])
    public init(ciImage image: CIImage, options: [VNImageOption : Any] = [:])
    public init(ciImage image: CIImage, orientation: CGImagePropertyOrientation, options: [VNImageOption : Any] = [:])
    public init(url imageURL: URL, options: [VNImageOption : Any] = [:])
    public init(url imageURL: URL, orientation: CGImagePropertyOrientation, options: [VNImageOption : Any] = [:])
    public init(data imageData: Data, options: [VNImageOption : Any] = [:])
    public init(data imageData: Data, orientation: CGImagePropertyOrientation, options: [VNImageOption : Any] = [:])
    open func perform(_ requests: [VNRequest]) throws
    
    //VNSequenceRequestHandler
    open func perform(_ requests: [VNRequest], on pixelBuffer: CVPixelBuffer) throws
    open func perform(_ requests: [VNRequest], on pixelBuffer: CVPixelBuffer, orientation: CGImagePropertyOrientation) throws
    open func perform(_ requests: [VNRequest], on image: CGImage) throws
    open func perform(_ requests: [VNRequest], on image: CGImage, orientation: CGImagePropertyOrientation) throws
    open func perform(_ requests: [VNRequest], on image: CIImage) throws
    open func perform(_ requests: [VNRequest], on image: CIImage, orientation: CGImagePropertyOrientation) throws
    open func perform(_ requests: [VNRequest], onImageURL imageURL: URL) throws
    open func perform(_ requests: [VNRequest], onImageURL imageURL: URL, orientation: CGImagePropertyOrientation) throws
    open func perform(_ requests: [VNRequest], onImageData imageData: Data) throws
    open func perform(_ requests: [VNRequest], onImageData imageData: Data, orientation: CGImagePropertyOrientation) throws
    

    So we create all the requests up front, bundle them into a request array, and submit that array in a single call; Vision runs each request and executes its completion handler on its own thread. With Core Image we would have to create four CIDetector instances, one per type (CIDetectorTypeFace, CIDetectorTypeRectangle, CIDetectorTypeQRCode, CIDetectorTypeText), which means processing the same image four times and wasting resources.

    The initializers above also show that Vision recognition is orientation-sensitive: if the orientation we pass does not match the image's actual orientation, Vision may fail to detect the features we want.
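    A common way to keep orientations consistent (and what Apple's samples typically do) is to map UIKit's UIImage.Orientation to the CGImagePropertyOrientation that Vision expects; a sketch:

```swift
import UIKit
import ImageIO

// Map UIKit's image orientation to the EXIF-style orientation Vision expects.
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up:            self = .up
        case .down:          self = .down
        case .left:          self = .left
        case .right:         self = .right
        case .upMirrored:    self = .upMirrored
        case .downMirrored:  self = .downMirrored
        case .leftMirrored:  self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default:    self = .up
        }
    }
}
```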

    The initializers of these two classes also show the image data types Vision accepts:

    • CVPixelBuffer
    • CGImage
    • CIImage
    • URL
    • Data

    For face detection, Core Image can recognize the following (see CIFaceFeature for details):

    open class CIFaceFeature : CIFeature {
        open var bounds: CGRect { get }
        open var hasLeftEyePosition: Bool { get }
        open var leftEyePosition: CGPoint { get }
        open var hasRightEyePosition: Bool { get }
        open var rightEyePosition: CGPoint { get }
        open var hasMouthPosition: Bool { get }
        open var mouthPosition: CGPoint { get }
        open var hasTrackingID: Bool { get }
        open var trackingID: Int32 { get }
        open var hasTrackingFrameCount: Bool { get }
        open var trackingFrameCount: Int32 { get }
        open var hasFaceAngle: Bool { get }
        open var faceAngle: Float { get }
        open var hasSmile: Bool { get }
        open var leftEyeClosed: Bool { get }
        open var rightEyeClosed: Bool { get }
    }
    

    Vision can recognize:

    open var boundingBox: CGRect { get }
    open var landmarks: VNFaceLandmarks2D? { get }
    open var roll: NSNumber? { get }
    open var yaw: NSNumber? { get }
    
    open class VNFaceLandmarks2D : VNFaceLandmarks {
        open var allPoints: VNFaceLandmarkRegion2D? { get }
        open var faceContour: VNFaceLandmarkRegion2D? { get } // points tracing the face outline from left cheek, over the chin, to right cheek
        open var leftEye: VNFaceLandmarkRegion2D? { get } // left eye contour
        open var rightEye: VNFaceLandmarkRegion2D? { get } // right eye contour
        open var leftEyebrow: VNFaceLandmarkRegion2D? { get } // left eyebrow contour
        open var rightEyebrow: VNFaceLandmarkRegion2D? { get } // right eyebrow contour
        open var nose: VNFaceLandmarkRegion2D? { get } // nose contour
        open var noseCrest: VNFaceLandmarkRegion2D? { get } // points tracing the center crest of the nose
        open var medianLine: VNFaceLandmarkRegion2D? { get } // points tracing the center line of the face
        open var outerLips: VNFaceLandmarkRegion2D? { get } // outer lip contour
        open var innerLips: VNFaceLandmarkRegion2D? { get } // inner lip contour
        open var leftPupil: VNFaceLandmarkRegion2D? { get } // left pupil contour
        open var rightPupil: VNFaceLandmarkRegion2D? { get } // right pupil contour
    }
    open class VNFaceLandmarkRegion2D : VNFaceLandmarkRegion {
        open var __normalizedPoints: UnsafePointer<CGPoint> { get } // pointer to the region's points in normalized coordinates
        open func __pointsInImage(imageSize: CGSize) -> UnsafePointer<CGPoint> // the region's points converted into the coordinate space of an image of the given size
    }
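    The underscored members above are the raw C-style accessors; in Swift you would normally use the bridged normalizedPoints array and pointsInImage(imageSize:), which returns the region's points converted into image coordinates. A sketch (the helper name is mine):

```swift
import Vision

// Collect a face observation's landmark points in image coordinates.
func landmarkPoints(for face: VNFaceObservation, imageSize: CGSize) -> [CGPoint] {
    guard let landmarks = face.landmarks else { return [] }
    // Each region is optional: not every feature is found on every face.
    let regions = [landmarks.faceContour, landmarks.leftEye,
                   landmarks.rightEye, landmarks.outerLips]
    return regions.compactMap { $0 }.flatMap { region in
        // pointsInImage(imageSize:) scales the normalized points up to
        // the coordinate space of an image of the given size.
        region.pointsInImage(imageSize: imageSize)
    }
}
```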
    
    

    Example:
    Detect the faces in an image and draw a rectangle to mark each one.

    lazy var faceDetectionRequest = VNDetectFaceRectanglesRequest(completionHandler: self.handleDetectedFaces)
    fileprivate func handleDetectedFaces(request: VNRequest?, error: Error?) {
            if let nsError = error as NSError? {
                self.presentAlert("Face Detection Error", error: nsError)
                return
            }
            // Perform drawing on the main thread.
            DispatchQueue.main.async {
                guard let drawLayer = self.pathLayer,
                    let results = request?.results as? [VNFaceObservation] else {
                        return
                }
                self.draw(faces: results, onImageWithBounds: drawLayer.bounds)
                drawLayer.setNeedsDisplay()
            }
        }
    

    Detect the faces in an image and draw curves to mark the facial features (face contour, eyes, eyebrows, and so on):

    lazy var faceLandmarkRequest = VNDetectFaceLandmarksRequest(completionHandler: self.handleDetectedFaceLandmarks)
    fileprivate func handleDetectedFaceLandmarks(request: VNRequest?, error: Error?) {
            if let nsError = error as NSError? {
                self.presentAlert("Face Landmark Detection Error", error: nsError)
                return
            }
            // Perform drawing on the main thread.
            DispatchQueue.main.async {
                guard let drawLayer = self.pathLayer,
                    let results = request?.results as? [VNFaceObservation] else {
                        return
                }
                self.drawFeatures(onFaces: results, onImageWithBounds: drawLayer.bounds)
                drawLayer.setNeedsDisplay()
            }
        }
    
    At this point, the obligatory screenshot: (Image: 上头姐妹.jpg)
    As the image shows, both the face contours and the text below them were detected.

    Text and QR-code recognition work much the same way, so I won't paste them here; just note the following:
    1. For text observations, locate individual characters by inspecting the characterBoxes property:

    // Tell Vision to report bounding box around each character.
    textDetectRequest.reportCharacterBoxes = true
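    Reading those boxes back in the completion handler might look like this (a sketch assuming a [VNTextObservation] result array):

```swift
import Vision

// With reportCharacterBoxes enabled, each text observation also carries
// one rectangle observation per detected character.
func handleDetectedText(request: VNRequest?, error: Error?) {
    guard let results = request?.results as? [VNTextObservation] else { return }
    for text in results {
        print("Text block at \(text.boundingBox)")
        for box in text.characterBoxes ?? [] {
            // Each characterBox is a VNRectangleObservation in
            // normalized image coordinates.
            print("  character at \(box.boundingBox)")
        }
    }
}
```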
    

    2. For barcode observations, use the symbologies property to restrict which symbologies are detected:

    // Restrict detection to most common symbologies.
    barcodeDetectRequest.symbologies = [.QR, .Aztec, .UPCE]
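    The decoded content is then read from each VNBarcodeObservation; a sketch:

```swift
import Vision

// Each barcode observation reports its symbology and decoded payload.
func handleDetectedBarcodes(request: VNRequest?, error: Error?) {
    guard let results = request?.results as? [VNBarcodeObservation] else { return }
    for barcode in results {
        // payloadStringValue is nil when the payload is not a string.
        print("\(barcode.symbology.rawValue): \(barcode.payloadStringValue ?? "<no string payload>")")
    }
}
```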
    

    3. For rectangle observations, a few properties let you filter the detection results:

     // Customize & configure the request to detect only certain rectangles.
    rectDetectRequest.maximumObservations = 8 // Vision currently supports up to 16.
    rectDetectRequest.minimumConfidence = 0.6 // Be confident.
    rectDetectRequest.minimumAspectRatio = 0.3 // height / width
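    A rectangle observation exposes its four corners in normalized coordinates; converting them for drawing might look like this (the helper is mine; note that Vision's origin is the bottom-left, so y is flipped for UIKit):

```swift
import Vision
import CoreGraphics

// Scale a rectangle observation's normalized corners into image coordinates,
// flipping y because Vision's origin is the bottom-left corner.
func corners(of rect: VNRectangleObservation, imageSize: CGSize) -> [CGPoint] {
    [rect.topLeft, rect.topRight, rect.bottomRight, rect.bottomLeft].map { p in
        CGPoint(x: p.x * imageSize.width,
                y: (1 - p.y) * imageSize.height)
    }
}
```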
    

    4. Horizon-angle detection isn't in the demo, but it's easy to understand too:

    lazy var horizonRequest = VNDetectHorizonRequest(completionHandler: self.handleDetectedHorizon)
    fileprivate func handleDetectedHorizon(request: VNRequest?, error: Error?) {
            if let nsError = error as NSError? {
                self.presentAlert("Horizon Detection Error", error: nsError)
                return
            }
    
            guard let results = request?.results as? [VNHorizonObservation] else {
                return
            }
            results.forEach({ observation in
            print(observation.angle)     // the observed angle of the horizon
            print(observation.transform) // the transform to apply to the detected horizon
            })
        }
    

    After creating all the requests, we add them to an array:

     /// - Tag: CreateRequests
        fileprivate func createVisionRequests() -> [VNRequest] {
            
            // Create an array to collect all desired requests.
            var requests: [VNRequest] = []
            
            // Create & include a request if and only if switch is ON.
            if self.rectSwitch.isOn {
                requests.append(self.rectangleDetectionRequest)
            }
            if self.faceSwitch.isOn {
                // Break rectangle & face landmark detection into 2 stages to have more fluid feedback in UI.
                requests.append(self.faceDetectionRequest)
                requests.append(self.faceLandmarkRequest)
            }
            if self.textSwitch.isOn {
                requests.append(self.textDetectionRequest)
            }
            if self.barcodeSwitch.isOn {
                requests.append(self.barcodeDetectionRequest)
            }
    
            requests.append(self.horizonRequest)
            // Return grouped requests as a single array.
            return requests
        }
    

    Then call perform to run the recognition. Note that the orientation we pass must match the actual orientation of the image (the demo code handles this). Also, since recognition is resource-intensive, run it on a background queue to avoid blocking the main queue.
    Back in the completionHandlers we wrote above, once we get the VNObservations we redraw the UI on the main thread.

    /// - Tag: PerformRequests
        fileprivate func performVisionRequest(image: CGImage, orientation: CGImagePropertyOrientation) {
            
            // Fetch desired requests based on switch status.
            let requests = createVisionRequests()
            // Create a request handler.
            let imageRequestHandler = VNImageRequestHandler(cgImage: image,
                                                            orientation: orientation,
                                                            options: [:])
            
            // Send the requests to the request handler.
            DispatchQueue.global(qos: .userInitiated).async {
                do {
                    try imageRequestHandler.perform(requests)
                } catch let error as NSError {
                    print("Failed to perform image request: \(error)")
                    self.presentAlert("Image Request Failed", error: error)
                    return
                }
            }
        }
    

    Everything above covers detecting objects in still images, but another important Vision capability is real-time detection, for example:
    real-time face tracking (demo link)
    The rough steps:
    1. Configure the camera to capture video.
    2. Detect faces in the video frames and extract their features.
    One thing to note: VNImageRequestHandler can detect objects in a still frame, but it cannot carry information from one frame to the next, so for real-time tracking we need VNSequenceRequestHandler, and for the request we need VNTrackObjectRequest:

    open class VNTrackObjectRequest : VNTrackingRequest {
        // Create a new object-tracking request from a detected object observation.
        public init(detectedObjectObservation observation: VNDetectedObjectObservation)
        public init(detectedObjectObservation observation: VNDetectedObjectObservation, completionHandler: VNRequestCompletionHandler? = nil)
    }
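    Bridging detection into tracking, i.e. turning the face observations from the detection step into tracking requests for subsequent frames, might be sketched as (helper name is mine):

```swift
import Vision

// Turn each detected face into a tracking request for the next frames.
func makeTrackingRequests(from faces: [VNFaceObservation]) -> [VNTrackObjectRequest] {
    faces.map { face in
        let request = VNTrackObjectRequest(detectedObjectObservation: face)
        // .accurate trades speed for precision.
        request.trackingLevel = .accurate
        return request
    }
}
```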
    

    The trickiest part of the whole pipeline is processing the video after capture. Once we get a sampleBuffer from the camera, if no face has been detected yet we create a VNImageRequestHandler to detect one; once a face is detected, we create a VNTrackObjectRequest to track it.
    The core code:

    guard let requests = self.trackingRequests, !requests.isEmpty else {
            // If no face has been detected yet, create a VNImageRequestHandler to detect one
                let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                                orientation: exifOrientation,
                                                                options: requestHandlerOptions)
                
                do {
                    guard let detectRequests = self.detectionRequests else {
                        return
                    }
                    try imageRequestHandler.perform(detectRequests)
                } catch let error as NSError {
                    NSLog("Failed to perform FaceRectangleRequest: %@", error)
                }
                return
            }
        // Once a face has been detected, track it with the sequence request handler
            do {
                try self.sequenceRequestHandler.perform(requests,
                                                         on: pixelBuffer,
                                                         orientation: exifOrientation)
            } catch let error as NSError {
                NSLog("Failed to perform SequenceRequest: %@", error)
            }
            
            var newTrackingRequests = [VNTrackObjectRequest]()
        //... (a fair amount of code omitted here)
            do {
                try imageRequestHandler.perform(faceLandmarkRequests)
            } catch let error as NSError {
                NSLog("Failed to perform FaceLandmarkRequest: %@", error)
            }
    

    Demo for tracking objects throughout a video: demo link

    Beyond all this, Vision can also be combined with Core ML for classification requests, image labeling, and more; in short, it is a framework well worth learning.
    Interested readers can download the demos and play with them.

    One more framework that is easy to confuse with Vision is VisionKit: it lets you scan documents with the iOS camera, like the capture feature in the Notes app. It is still in beta.

    Learning resources:
    Swift之Vision 图像识别框架
    iOS黑科技之(AVFoundation)动态人脸识别(二)
    基于iOS8以上版本的AV Foundation框架特性之--AVCaptureDevice


        Original link: https://www.haomeiwen.com/subject/beifuctx.html