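I think Udacity sequenced its projects well. Before this course I had never touched OpenCV or anything like it, and this project is where I learned how OpenCV is actually used. A fun detail: the whole project is done in Python, and before taking the course I did not even know what Python was. So the beginning was extremely painful. I did not know the syntax, could not use append, and had no idea how to use list, tuple, or class. You can imagine how hard it was.
People who are just starting with Python, as I was, often ask why we have to use Jupyter. My answer is that Jupyter lets you run and inspect code one piece at a time, which is a great feature (and only one of many). When I had written a pile of code and found that it contained an error whose source I could not locate (infuriating), the only option was to go back to the top in Jupyter and execute the cells one by one until I found the error and had fixed all of them.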
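1 Code link
2 Environment
I work in the Jupyter notebook that ships with Anaconda. For the exact steps, see the installation guides available online. (I may later add notes on installing TensorFlow and OpenCV (cv2); I found the installation fairly painful.)
3 Project goal
4 Concepts covered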
cv2.inRange() # for color selection
cv2.fillPoly() # for region selection
cv2.line() # to draw lines on an image given endpoints
cv2.addWeighted() # to coadd / overlay two images
cv2.cvtColor() # to grayscale or change color
cv2.imwrite() # to output images to file
cv2.bitwise_and() # to apply a mask to an image
ROI(region of interest)
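Hough transform (a line-detection algorithm)
Understanding what threshold comparisons such as x > low_threshold && x < high_threshold mean in code
The Jupyter notebook template that Udacity provides also uses an HTML library, but knowing its basic usage is enough; there is no need to dwell on it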
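To make the threshold comparisons in the list above concrete, here is a minimal sketch. The helper `in_range` and the bounds are hypothetical names and values made up for illustration; the idea is the same one cv2.inRange() applies to every pixel of an image: keep a pixel only if each channel lies between a lower and an upper bound.

```python
def in_range(pixel, lower, upper):
    """Return True if every channel of `pixel` lies within [lower, upper]."""
    return all(lo <= value <= hi for value, lo, hi in zip(pixel, lower, upper))

# Hypothetical RGB bounds for "nearly white" pixels (illustrative values only).
lower_white = (200, 200, 200)
upper_white = (255, 255, 255)

pixels = [(250, 250, 245), (90, 120, 60), (210, 190, 230)]
selected = [p for p in pixels if in_range(p, lower_white, upper_white)]
print(selected)
```

Only the first pixel passes: the third one fails because its green channel (190) is below the lower bound.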
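5 Code walkthrough
5.1 Importing libraries
The first step is importing libraries. A Python import plays much the same role as #include in C: it makes the packages and functions we need available. Here we import the libraries below. matplotlib and numpy are easy to install and import, but cv2 is really hard to install, and installing cv2 alongside the GPU version of TensorFlow later on is harder still. But I digress.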
#importing some useful packages
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2
%matplotlib inline
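%matplotlib inline is a Jupyter notebook feature called a magic command. What does it do? By default, matplotlib does not render plots between the notebook's code cells, so %matplotlib inline tells matplotlib to display each plot inline, directly below the cell that produced it.
import ... as ... imports a package (or module) under a short alias of our choosing.
5.2 Reading an image
Why do we read an image when the goal of the project is to process video? Because every video is a sequence of still images. FPS means frames per second, that is, how many images are shown each second. At 30 FPS, 30 images are shown per second. So processing an image and processing a video are basically the same task: our algorithm handles 30 images for each second of video, and other code then plays them back one after another or stitches them into a video.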
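As a rough sketch of that idea (not the notebook's actual video code), the hypothetical function `process_video` below applies a per-frame function to a sequence of frames in order. Plain integers stand in for images so the example runs without any video library.

```python
def process_video(frames, process_frame):
    """Apply `process_frame` to each frame, one at a time, preserving order."""
    for frame in frames:
        yield process_frame(frame)

fps = 30
duration_seconds = 4
total_frames = fps * duration_seconds  # 30 frames per second for 4 seconds

# Stand-in "frames": integers instead of real images.
frames = range(total_frames)
processed = list(process_video(frames, lambda frame: frame * 2))
print(total_frames, len(processed))
```

A 4-second clip at 30 FPS therefore means 120 calls to the per-frame function.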
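Here we use mpimg. Note that both mpimg and cv2 can read images, but they store the channels in a different order: mpimg returns the data in RGB order, while cv2 returns it in BGR order. The data itself is the same; only the order differs. RGB stands for red, green, blue, and BGR for blue, green, red. These three components are usually called the three channels. Why these three colors? Because they are the three primary colors of light, and mixing them in different proportions produces the other colors.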
#reading in an image
image = mpimg.imread('test_images/solidWhiteRight.jpg')
#printing out some stats and plotting
print('This image is:', type(image), 'with dimensions:', image.shape)
plt.imshow(image) #call as plt.imshow(gray, cmap='gray') to show a grayscaled image
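The RGB/BGR difference is easy to see on a tiny array. This is a minimal sketch using only numpy, with a made-up 1x2 image; reversing the last axis swaps the red and blue channels, which is the same reordering that cv2.cvtColor(img, cv2.COLOR_BGR2RGB) performs.

```python
import numpy as np

# A 1x2 RGB image: one pure-red pixel and one pure-blue pixel.
rgb = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the R and B channels (RGB <-> BGR).
bgr = rgb[..., ::-1]

print(rgb[0, 0], bgr[0, 0])
```

If an image read with cv2.imread() is shown with plt.imshow() without this swap, red and blue appear exchanged.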

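5.4 Building the helper functions
What are helper functions?
canny operator: used for edge detection. Keyword: edge detection
region of interest, ROI for short: as the name suggests, we only consider and compute what lies inside the area we care about. Keyword: area of interest
draw lines: given the coordinates of two points (x1, y1, x2, y2), draw a line on the image. Keyword: drawing lines
weighted img: used for visualization. Once the lane lines have been computed, the marked lines need to be overlaid on the original image. Keyword: overlay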
import math
def grayscale(img):
    """Applies the Grayscale transform
    This will return an image with only one color channel
    but NOTE: to see the returned image as grayscale
    you should call plt.imshow(gray, cmap='gray')"""
    return cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # Or use BGR2GRAY if you read an image with cv2.imread()
    # return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
def canny(img, low_threshold, high_threshold):
    """Applies the Canny transform"""
    return cv2.Canny(img, low_threshold, high_threshold)
def gaussian_blur(img, kernel_size):
    """Applies a Gaussian Noise kernel"""
    return cv2.GaussianBlur(img, (kernel_size, kernel_size), 0)
def region_of_interest(img, vertices):
    """
    Applies an image mask.
    Only keeps the region of the image defined by the polygon
    formed from `vertices`. The rest of the image is set to black.
    """
    #defining a blank mask to start with
    mask = np.zeros_like(img)
    #defining a 3 channel or 1 channel color to fill the mask with depending on the input image
    if len(img.shape) > 2:
        channel_count = img.shape[2] # i.e. 3 or 4 depending on your image
        ignore_mask_color = (255,) * channel_count
    else:
        ignore_mask_color = 255
    #filling pixels inside the polygon defined by "vertices" with the fill color
    cv2.fillPoly(mask, vertices, ignore_mask_color)
    #returning the image only where mask pixels are nonzero
    masked_image = cv2.bitwise_and(img, mask)
    return masked_image
def draw_lines(img, lines, color=[255, 0, 0], thickness=10):
    """
    NOTE: this is the function you might want to use as a starting point once you want to
    average/extrapolate the line segments you detect to map out the full
    extent of the lane (going from the result shown in raw-lines-example.mp4
    to that shown in P1_example.mp4).
    Think about things like separating line segments by their
    slope ((y2-y1)/(x2-x1)) to decide which segments are part of the left
    line vs. the right line. Then, you can average the position of each of
    the lines and extrapolate to the top and bottom of the lane.
    This function draws `lines` with `color` and `thickness`.
    Lines are drawn on the image inplace (mutates the image).
    If you want to make the lines semi-transparent, think about combining
    this function with the weighted_img() function below
    """
    for line in lines:
        for x1,y1,x2,y2 in line:
            cv2.line(img, (x1, y1), (x2, y2), color, thickness)
def hough_lines(img, rho, theta, threshold, min_line_len, max_line_gap):
    """
    `img` should be the output of a Canny transform.
    Returns an image with hough lines drawn.
    """
    lines = cv2.HoughLinesP(img, rho, theta, threshold, np.array([]), minLineLength=min_line_len, maxLineGap=max_line_gap)
    line_img = np.zeros((*img.shape, 3), dtype=np.uint8)
    draw_lines(line_img, lines)
    return line_img
# Python 3 has support for cool math symbols.
def weighted_img(img, initial_img, α=0.8, β=1., λ=0.):
    """
    `img` is the output of the hough_lines(), An image with lines drawn on it.
    Should be a blank image (all black) with lines drawn on it.
    `initial_img` should be the image before any processing.
    The result image is computed as follows:
    initial_img * α + img * β + λ
    NOTE: initial_img and img must be the same shape!
    """
    return cv2.addWeighted(initial_img, α, img, β, λ)
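#Removing noise slopes from the averaging performed below in lane_lines
def remove_noise(slopes, m = 2):
    mean_value = np.mean(slopes)
    stand_deviation = np.std(slopes)
    # NOTE: the body of the original if-branch is missing from the source.
    # This version keeps slopes within m standard deviations of the mean and
    # builds a new list rather than removing items from `slopes` while iterating.
    return [slope for slope in slopes if abs(slope - mean_value) <= (m * stand_deviation)]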
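The outlier rejection behind remove_noise can be tried without numpy. The sketch below is an illustration under my own assumptions, not the notebook's code: `filter_outliers` is a hypothetical name, the slope values are made up, and statistics.pstdev is used because it computes the population standard deviation, like np.std with its default settings. Removing items from a list while looping over it can skip elements, so the sketch builds a new list instead.

```python
from statistics import mean, pstdev

def filter_outliers(values, m=2):
    """Return the values within m population standard deviations of the mean."""
    center = mean(values)
    spread = pstdev(values)
    return [v for v in values if abs(v - center) <= m * spread]

# Hypothetical left-lane slopes with one obviously wrong segment (0.10).
slopes = [-0.70, -0.72, -0.68, -0.71, -0.69, -0.73, 0.10]
kept = filter_outliers(slopes)
print(kept)
```

Here the mean is -0.59 and the standard deviation about 0.28, so only the 0.10 segment lies more than two standard deviations away and is dropped.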
5.5 Main routine
Below is the main routine, process_image(img), which takes one image as input.
def process_image(img):
    #find the size of image
    xsize, ysize = img.shape[1], img.shape[0]
    #copy the image to modify
    origin_image = np.copy(img)
    #make a gray image
    gray = grayscale(img)
    #Smooth with Gaussian blur
    kernel_size = 9
    blur_gray = gaussian_blur(gray, kernel_size)
    #Use canny operator to extract edges
    low_threshold = 90
    high_threshold = 180
    edges = canny(blur_gray, low_threshold, high_threshold)
    #define the region of interest
    imshape = img.shape
    vertices = np.array([[(0,imshape[0]),(470, 320), (550, 320), (imshape[1],imshape[0])]], dtype=np.int32)
    masked_edges = region_of_interest(edges, vertices)
    # Define the Hough transform parameters
    # Make a blank the same size as our image to draw on
    rho = 6 # distance resolution in pixels of the Hough grid
    theta = np.pi/180 # angular resolution in radians of the Hough grid
    threshold = 50 # minimum number of votes (intersections in Hough grid cell)
    min_line_len = 25 #minimum number of pixels making up a line
    max_line_gap = 25 # maximum gap in pixels between connectable line segments
    line_image = np.copy(img)*0 # creating a blank to draw lines on
    # Run Hough on edge detected image
    # Output "lines" is an array containing endpoints [x1,y1,x2,y2] of detected line segments
    lines = cv2.HoughLinesP(masked_edges, rho, theta, threshold, np.array([]), min_line_len, max_line_gap)
    #Make lists of the lines and slopes for averaging
    left_lines = []
    left_slopes = []
    right_slopes = []
    right_lines = []
    for line in lines:
        for x1,y1,x2,y2 in line:
            if x2 == x1:
                continue # skip vertical segments to avoid dividing by zero
            slope = (y2 - y1) / (x2-x1)
            # NOTE: the branch bodies are missing from the source; the appends below are
            # reconstructed from the list names and the averaging code that follows
            if slope < 0:
                # image y grows downward, so the left lane line has a negative slope
                left_lines.append(line)
                left_slopes.append(slope)
            else:
                right_lines.append(line)
                right_slopes.append(slope)
    #Average line positions. zip(*lists) unpacks the lists and groups their elements column by column
    mean_left_pos = [sum(column)/len(column) for column in zip(*left_lines)]
    mean_right_pos = [sum(column)/len(column) for column in zip(*right_lines)]
    #Remove slope outliers, and take the average
    mean_left_slope = np.mean(remove_noise(left_slopes))
    mean_right_slope = np.mean(remove_noise(right_slopes))
    #Extrapolate the left and right lines to the mask boundaries: up to y_top = 320, down to y_bottom = 540
    mean_left_line = []
    mean_right_line = []
    for x1,y1,x2,y2 in mean_left_pos:
        x = int(np.mean([x1, x2])) #Midpoint x
        y = int(np.mean([y1, y2])) #Midpoint y
        slope = mean_left_slope
        #based on y = mx + b, calculate b = y - mx
        b = y -(slope * x) #Solving y=mx+b for b
        mean_left_line = [int((320-b) / slope), 320, int((540-b)/slope), 540]
    for x1,y1,x2,y2 in mean_right_pos:
        x = int(np.mean([x1, x2]))
        y = int(np.mean([y1, y2]))
        slope = mean_right_slope
        b = y - (slope * x)
        mean_right_line = [int((320-b)/slope), 320, int((540 - b)/slope), 540]
    #The final lines of the lane
    lines = [[mean_left_line], [mean_right_line]]
    #Draw the lines to the line_image
    draw_lines(line_image, lines)
    # Overlay the processed lines image on the original
    weighted_image = weighted_img(line_image, img)
    #return the weighted_image from the function process_image
    return weighted_image
xsize, ysize = img.shape[1], img.shape[0]
The next lines use the function we defined among the helper functions to convert the color image to grayscale.
#make a gray image
gray = grayscale(img)
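This step also uses a helper function, applying a Gaussian blur to the grayscale image. The blur averages each pixel with its neighbors, which suppresses noise and makes the image look softer, so Canny picks up fewer spurious edges. kernel_size is a hand-picked value (it must be odd); I chose it by feel.
#Smooth with Gaussian blur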
kernel_size = 9
blur_gray = gaussian_blur(gray, kernel_size)
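To see what a Gaussian blur does to noise, here is a small one-dimensional sketch in plain Python. It is an illustration under assumptions of my own: `gaussian_kernel` and `blur_1d` are hypothetical helpers, sigma is chosen by hand, and border pixels are simply repeated, so the numbers will not match cv2.GaussianBlur exactly.

```python
import math

def gaussian_kernel(size, sigma):
    """Build a normalized 1-D Gaussian kernel with an odd number of taps."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(signal, kernel):
    """Convolve `signal` with `kernel`, repeating the edge values at the borders."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), last)  # clamp the index to the valid range
            acc += w * signal[j]
        out.append(acc)
    return out

row = [10, 10, 10, 200, 10, 10, 10]  # one row of pixels with a single noisy spike
smoothed = blur_1d(row, gaussian_kernel(5, 1.0))
print([round(v, 1) for v in smoothed])
```

The spike is spread over its neighbors and its peak drops well below 200, which is why blurring first keeps isolated noisy pixels from being reported as edges.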
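Next, the Canny operator extracts the edges. Why detect edges? Because recognizing the outline of an object is the most basic way we identify it, and the same holds for computer vision. Canny convolves the image with small gradient kernels (3*3 matrices by default) to compute the gradient strength at every pixel. A pixel whose gradient is above high_threshold is accepted as an edge, one below low_threshold is rejected, and one in between is kept only if it is connected to an accepted edge pixel. So low_threshold and high_threshold are also tuned by hand: whichever values give good results are good values.
#Use canny operator to extract edges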
low_threshold = 90
high_threshold = 180
edges = canny(blur_gray, 90,180)
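The role of the two thresholds is easier to see in one dimension. The following is a simplified sketch of the double-threshold (hysteresis) step only, not OpenCV's implementation: `hysteresis_1d` is a hypothetical helper, and the gradient values are made up.

```python
def hysteresis_1d(magnitudes, low, high):
    """Keep strong responses (>= high) and weak ones (>= low) connected to a strong one."""
    n = len(magnitudes)
    keep = [m >= high for m in magnitudes]
    # Grow outward from every strong sample into neighbouring weak samples.
    stack = [i for i, strong in enumerate(keep) if strong]
    while stack:
        i = stack.pop()
        for j in (i - 1, i + 1):
            if 0 <= j < n and not keep[j] and magnitudes[j] >= low:
                keep[j] = True
                stack.append(j)
    return keep

gradient = [20, 100, 200, 120, 30, 110, 40]
edges = hysteresis_1d(gradient, low=90, high=180)
print(edges)
```

The values 100 and 120 survive because they touch the strong value 200, while 110 is discarded even though it is above the low threshold, because nothing strong is connected to it.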


#define the region of interest
imshape = img.shape
vertices = np.array([[(0,imshape[0]),(470, 320), (550, 320), (imshape[1],imshape[0])]], dtype=np.int32)
masked_edges = region_of_interest(edges, vertices)
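To get a feel for which pixels the trapezoid keeps, here is a plain-Python point-in-polygon test (ray casting). It is only a sketch of the idea behind the mask: `point_in_polygon` is a hypothetical helper, and the 960x540 frame size is my assumption for the bottom corners.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon `vertices`."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Trapezoid from the walkthrough, assuming a 960x540 frame.
roi = [(0, 540), (470, 320), (550, 320), (960, 540)]

print(point_in_polygon(500, 400, roi))  # a pixel on the road ahead
print(point_in_polygon(100, 350, roi))  # a pixel off to the left
print(point_in_polygon(480, 100, roi))  # a pixel in the sky
```

Edge pixels outside the trapezoid, such as trees or the horizon, are zeroed by the mask before the Hough transform runs.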
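Below is the code for the Hough transform. With the edge points found and the ROI defined, the next step is to find the lanes in the image. Lane lines are straight, and a person sees that at a glance. But have you ever thought about how a person decides that something is straight, and how a computer vision algorithm should do the same? People rely on perspective and on the assumption that lanes are mostly straight. Computer vision works along the same lines: the computer has to find points in the image that follow a regular pattern and mark them. How? Edge points that lie along a common line, with no large gaps between neighboring pixels, are treated as a line segment. That is the job of the Hough transform. A straight line in the x-y plane can be represented by a single point in Hough space, as in the pictures below. In the simplest form, the line y = mx + b becomes the point (m, b), its slope and intercept, and a single point in the image becomes a line in Hough space. So when many edge points lie on one image line, their Hough-space lines all pass through (or cluster around) one cell, and that cell is the detected line. OpenCV uses the polar form rho = x*cos(theta) + y*sin(theta) instead, which also handles vertical lines; in the code, rho and theta set the size of a Hough grid cell (distance resolution in pixels and angular resolution in radians), and threshold is the minimum number of votes a cell needs. I cannot explain it very clearly, so I recommend finding an animation online or reading the linked article "第三章 霍夫变换(Hough Transform)" (if its author sees this and objects, please let me know).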
# Define the Hough transform parameters
# Make a blank the same size as our image to draw on
rho = 6 # distance resolution in pixels of the Hough grid
theta = np.pi/180 # angular resolution in radians of the Hough grid
threshold = 50 # minimum number of votes (intersections in Hough grid cell)
min_line_len = 25 #minimum number of pixels making up a line
max_line_gap = 25 # maximum gap in pixels between connectable line segments
line_image = np.copy(img)*0 # creating a blank to draw lines on
# Run Hough on edge detected image
# Output "lines" is an array containing endpoints [x1,y1,x2,y2] of detected line segments
lines = cv2.HoughLinesP(masked_edges, rho, theta, threshold, np.array([]), min_line_len, max_line_gap)
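The voting idea can be sketched in a few lines of plain Python. This is a toy accumulator written for illustration, not how cv2.HoughLinesP is implemented: `hough_peak` is a hypothetical helper, and the five points are made up so that they all lie on the line x + y = 200.

```python
import math

def hough_peak(points, theta_step_deg=1, rho_step=1):
    """Vote in (rho, theta) space and return the most-voted cell and its vote count."""
    votes = {}
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            cell = (round(rho / rho_step) * rho_step, theta_deg)
            votes[cell] = votes.get(cell, 0) + 1
    best = max(votes, key=votes.get)
    return best, votes[best]

# Five collinear points on the line x + y = 200.
points = [(0, 200), (50, 150), (100, 100), (150, 50), (200, 0)]
(rho, theta_deg), count = hough_peak(points)
print(rho, theta_deg, count)
```

Every point votes for the cell near rho = 141, theta = 45 degrees, which is the line x*cos(45°) + y*sin(45°) = 200/√2. In the parameters above, rho and theta play the role of rho_step and theta_step_deg (the cell size), and threshold is the number of votes a cell needs before it counts as a line.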


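Which option on the right corresponds to the points on the left? The answer is A.

And which one does this correspond to? The answer is C.

The rest is simple. Thanks to the ROI we only see lines inside the lane. But there are many short line segments, and they are not continuous, so what do we do? We first sort the segments into left and right groups one by one, then compute one continuous line for each side.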
#Make lists of the lines and slopes for averaging
left_lines = []
left_slopes = []
right_slopes = []
right_lines = []
for line in lines:
    for x1,y1,x2,y2 in line:
        if x2 == x1:
            continue # skip vertical segments to avoid dividing by zero
        slope = (y2 - y1) / (x2-x1)
        # NOTE: the branch bodies are missing from the source; the appends below are
        # reconstructed from the list names and the averaging code that follows
        if slope < 0:
            # image y grows downward, so the left lane line has a negative slope
            left_lines.append(line)
            left_slopes.append(slope)
        else:
            right_lines.append(line)
            right_slopes.append(slope)
#Average line positions. zip(*lists) unpacks the lists and groups their elements column by column
mean_left_pos = [sum(column)/len(column) for column in zip(*left_lines)]
mean_right_pos = [sum(column)/len(column) for column in zip(*right_lines)]
#Remove slope outliers, and take the average
mean_left_slope = np.mean(remove_noise(left_slopes))
mean_right_slope = np.mean(remove_noise(right_slopes))
#Extrapolate the left and right lines to the mask boundaries: up to y_top = 320, down to y_bottom = 540
mean_left_line = []
mean_right_line = []
for x1,y1,x2,y2 in mean_left_pos:
    x = int(np.mean([x1, x2])) #Midpoint x
    y = int(np.mean([y1, y2])) #Midpoint y
    slope = mean_left_slope
    #based on y = mx + b, calculate b = y - mx
    b = y -(slope * x) #Solving y=mx+b for b
    mean_left_line = [int((320-b) / slope), 320, int((540-b)/slope), 540]
for x1,y1,x2,y2 in mean_right_pos:
    x = int(np.mean([x1, x2]))
    y = int(np.mean([y1, y2]))
    slope = mean_right_slope
    b = y - (slope * x)
    mean_right_line = [int((320-b)/slope), 320, int((540 - b)/slope), 540]
#The final lines of the lane
lines = [[mean_left_line], [mean_right_line]]
#Draw the lines to the line_image
draw_lines(line_image, lines)
# Overlay the processed lines image on the original
weighted_image = weighted_img(line_image, img)
#return the weighted_image from the function process_image
return weighted_image
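The averaging and extrapolation steps can be checked by hand on a small example. The sketch below uses made-up segments and a hypothetical helper `extrapolate`; unlike the notebook code it keeps the midpoint as a float rather than truncating it with int(), so results can differ by a pixel.

```python
def extrapolate(x_mid, y_mid, slope, y_top=320, y_bottom=540):
    """Return [x_top, y_top, x_bottom, y_bottom] for the line through the midpoint."""
    b = y_mid - slope * x_mid               # intercept, from y = m*x + b
    x_top = int((y_top - b) / slope)        # invert the line: x = (y - b) / m
    x_bottom = int((y_bottom - b) / slope)
    return [x_top, y_top, x_bottom, y_bottom]

# Two hypothetical left-lane segments (x1, y1, x2, y2), both with slope -0.75.
segments = [(200, 500, 300, 425), (220, 490, 320, 415)]

# zip(*segments) groups all x1 values, all y1 values, and so on, column by column.
mean_segment = [sum(col) / len(col) for col in zip(*segments)]
x1, y1, x2, y2 = mean_segment
left_line = extrapolate((x1 + x2) / 2, (y1 + y2) / 2, slope=-0.75)
print(mean_segment, left_line)
```

The averaged segment is [210, 495, 310, 420], its midpoint is (260, 457.5), and the intercept is b = 652.5, so the line reaches y = 320 at x = 443 and y = 540 at x = 150.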
6 Results

7 Summary