The Math and Code Behind Aligning Faces

by Sabbir Ahmed


Posted on: 2 years, 10 months ago


The figure shows the input unaligned image and output aligned image, video source: Silicon Valley

Facial alignment is a prerequisite for many machine learning and deep learning applications. We will be solving it with OpenCV and dlib (Python binding). So let’s get started.

First Things First

Before moving forward, I would like to show what we will be doing in this post.

Real-Time Face alignment (pun intended)

The source code is available in the git repository linked below. I will be discussing the most important chunks of the code.

by-sabbir/face_alignment

Shape Prediction

For aligning faces, the first step is predicting the shape. We can assume there is a straight line connecting the eyes. Our goal is to get the slope of that line. Now let’s see how we are going to arrange our workspace directory:

├── aligned_photos
├── files
│   ├── get_predictor_files.sh
│   └── shape_predictor_5_face_landmarks.dat
└── main.py

For now, we need to focus on the files directory. From dlib.net/files we need to download the 5-point facial landmark shape predictor. This model gives us the co-ordinates of two points for the right eye, two points for the left eye, and one point for the nose. Let’s take a look at the initial code for the face detector and shape predictor:


import os
import sys

import dlib

# use files/get_predictor_files.sh to download the model
PREDICTOR_PATH = os.path.join("files", "shape_predictor_5_face_landmarks.dat")

if not os.path.isfile(PREDICTOR_PATH):
    print("[ERROR] Use files/get_predictor_files.sh to download the predictor")
    sys.exit(1)

# these names are the ones used by the snippets that follow
face_detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor(PREDICTOR_PATH)

Now that our shape predictor and face detector are set up, we need to feed them image data. Here comes OpenCV, which can fetch data from various sources. The details of OpenCV are beyond the scope of this post; we will stick to what is necessary.


import cv2 as cv

source = 0  # 0 is the default camera; use the absolute path of a file to read from a video instead
cap = cv.VideoCapture(source)

while True:
    ret, frame = cap.read()
    # stop when no frame can be read (end of the video or a camera failure)
    if not ret:
        break
    # show the frame; waitKey() only receives key presses while a window is open
    cv.imshow("frame", frame)
    # press 'esc' to exit the program
    if cv.waitKey(10) & 0xff == 27:
        break
# cleaning up
cap.release()
cv.destroyAllWindows()

We created “cap”, an instance of OpenCV’s VideoCapture class. Each call to cap.read() gives us one frame as a NumPy array. The “source” can be a still image, a video file, or a camera.

Now we will first detect faces with the help of dlib’s frontal face detector. This will help us choose the region of interest (ROI) and predict the shape.


# optionally, you can draw the ROI box by calling the draw_rect() helper function
def draw_rect(image, rect):
    x, y = rect.tl_corner().x, rect.tl_corner().y
    w, h = rect.width(), rect.height()
    cv.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    
# summing up with detecting face and drawing rectangle
faces = face_detector(frame, 0)
for face in faces:
    draw_rect(frame, face)

The detector returns a list of bounding boxes, one for each face the algorithm detected. With this information we can now easily predict the shape. Here’s how we will do it.


# continues from line 10 of the previous snippet, dlib_face_detector.py
for face in faces:
    draw_rect(frame, face)
    # the 5 point shape predictor spits out four coordinates for the eyes and one for the nose
    points = shape_predictor(frame, face)

By passing the image and the face co-ordinates as parameters, we can predict the shape with just one line of code. I have written some helper functions for clarity. From the helper functions, we can get the straight line connecting the eyes.

We end this section here. By now we have, intuitively, lines connecting the eyes and the nose.


Math for Alignment

It’s time for us to get our hands dirty with some math! Not to worry, the math is simple. We all know from high school that if a line passes through two points (x1, y1) and (x2, y2), the equation of the line is:

$$ y - y_1 = m\,(x - x_1), \qquad m = \frac{y_2 - y_1}{x_2 - x_1} $$

Fig-1: Equation of a line given two points

From the line formula, we can easily derive the angle between the line and the x-axis:

$$ \theta = \arctan(m) = \arctan\left(\frac{y_2 - y_1}{x_2 - x_1}\right) $$

Fig-2: Angle from the gradient, m
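As a quick numeric check of the slope and angle described above, here is a minimal sketch in plain Python. The function name line_angle_deg and the sample coordinates are made up for illustration. It uses math.atan2(dy, dx) instead of dividing dy by dx, which gives the same angle whenever dx is positive and avoids a division by zero for a vertical line.

```python
import math


def line_angle_deg(p1, p2):
    """Angle in degrees between the x-axis and the line from p1 to p2."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    # atan2 takes dy and dx separately, so dx == 0 is not a problem
    return math.degrees(math.atan2(dy, dx))


# hypothetical eye centers in pixel co-ordinates (x, y)
eye_a = (100, 120)
eye_b = (200, 220)
# dx and dy are equal here, so the line sits at 45 degrees to the x-axis
print(line_angle_deg(eye_a, eye_b))
```

Keep in mind that in image co-ordinates the y-axis points down, so a positive angle here means the line tilts downwards towards the right on screen.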

We’ve got what we need the most: the angle! Now we simply rotate the image by this angle and we are good to go. An image is typically a 3-dimensional matrix (height, width and color channels), and to rotate it we have to transform it with a rotation matrix. We may have to shear the matrix a bit, but nothing serious. We will walk through this like a summer garden.
If we want to rotate a matrix about its origin (0, 0), the rotation matrix is given by:

$$ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} $$

Fig-3: Rotation Matrix
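To make the rotation about the origin concrete, here is a small sketch that applies the standard 2x2 rotation matrix [[cos θ, -sin θ], [sin θ, cos θ]] to a single point. The function name rotate_about_origin is hypothetical, and the sign convention is the usual textbook one, which may differ from the one a particular library uses.

```python
import math


def rotate_about_origin(point, angle_deg):
    """Rotate (x, y) about (0, 0) with the matrix [[cos, -sin], [sin, cos]]."""
    theta = math.radians(angle_deg)
    x, y = point
    x_new = x * math.cos(theta) - y * math.sin(theta)
    y_new = x * math.sin(theta) + y * math.cos(theta)
    return x_new, y_new


# rotating the unit vector on the x-axis by 90 degrees moves it onto the y-axis
x_new, y_new = rotate_about_origin((1.0, 0.0), 90)
print(round(x_new, 6), round(y_new, 6))
```

Because the y-axis of an image points down, this same matrix looks like a clockwise turn when the result is drawn on screen.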

But the probability of finding a face right at the origin is pretty close to zero, so we have to generalize the whole thing to an arbitrary point.
If we rotate the matrix and then translate it to the arbitrary point, we get what we want. It is as simple as it seems. A single matrix does both the rotation and the translation. In action, the matrix looks like this:


Fig-4: Final Transformation Matrix
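One way to write such a matrix, offered as a sketch under stated assumptions rather than the exact formula from the figure: a rotation about an arbitrary point can be expressed as moving that point to the origin, rotating, and moving it back. The symbols c_x, c_y (the rotation center) and M are names chosen for this sketch.

```latex
\begin{aligned}
x' &= (x - c_x)\cos\theta - (y - c_y)\sin\theta + c_x \\
y' &= (x - c_x)\sin\theta + (y - c_y)\cos\theta + c_y
\end{aligned}
\qquad\Longrightarrow\qquad
M =
\begin{bmatrix}
\cos\theta & -\sin\theta & c_x(1 - \cos\theta) + c_y\sin\theta \\
\sin\theta & \cos\theta  & c_y(1 - \cos\theta) - c_x\sin\theta
\end{bmatrix}
```

Setting c_x = c_y = 0 gives back the plain rotation about the origin. The matrix documented for OpenCV’s getRotationMatrix2D has the same form with θ replaced by -θ (plus an optional scale factor), because OpenCV treats a positive angle as a counter-clockwise rotation on screen.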

We don’t have to worry about the computation; NumPy will do that for us. What we need is the simplest form of the transformation, which is:


Fig-5: New coordinates x, y for face alignment
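To see the transformation at work on actual co-ordinates, here is a minimal sketch in plain Python. It builds the 2x3 matrix with the formula given in the documentation of OpenCV’s getRotationMatrix2D and applies it to two made-up eye centers. The helper names rotation_matrix_2d and transform_point and the sample co-ordinates are assumptions for illustration. It also assumes the matrix maps source pixels to destination pixels, which is how warpAffine interprets it by default.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix, following the formula in the getRotationMatrix2D docs."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [[alpha, beta, (1 - alpha) * cx - beta * cy],
            [-beta, alpha, beta * cx + (1 - alpha) * cy]]


def transform_point(M, point):
    """New co-ordinates (x', y') of a point under the 2x3 affine matrix M."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


# hypothetical eye centers of a tilted face, in pixel co-ordinates (x, y)
right_eye = (100, 120)
left_eye = (200, 170)
center = ((right_eye[0] + left_eye[0]) / 2, (right_eye[1] + left_eye[1]) / 2)

# angle of the line connecting the eyes
dx = left_eye[0] - right_eye[0]
dy = left_eye[1] - right_eye[1]
angle = math.degrees(math.atan2(dy, dx))

M = rotation_matrix_2d(center, angle)
new_right = transform_point(M, right_eye)
new_left = transform_point(M, left_eye)
# after the transformation both eyes share the same y value, so the line is level
print(new_right, new_left)
```

The rotation center itself does not move, which is why the midpoint between the eyes is a convenient choice: the face stays where it was in the frame and only turns.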

What we have done in this section so far is called a “geometric transformation”.


The Final Touch — Code

I will show you how to do image alignment in TWO LINES! I will assume you have cloned the git repo. So far we have the co-ordinates of the eyes and nose and the angle of the line connecting the eyes. That is enough information to do the rest.

center, angle = get_angle(frame, points, type=5)
warped_img = align_img(unaligned, center, angle)

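The final code snippet may look a bit large, but at its core it is still the same two lines of code. Here “unaligned” is assumed to be a clean copy of the frame, taken before any landmarks are drawn on it.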


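import numpy as np

# BGR colors used by the drawing calls below (values assumed)
COLOR = {"white": (255, 255, 255), "blue": (255, 0, 0)}

# helper function to compute the angle and the center of the line connecting the eyes
def get_angle(image, shape, type=5):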
    if type == 5:
        shapes = np.array(shape.parts())
        left_eye = ((shapes[0].x + shapes[1].x) // 2, (shapes[0].y + shapes[1].y) // 2)
        right_eye = ((shapes[2].x + shapes[3].x) // 2, (shapes[2].y + shapes[3].y) // 2)
        nose = (shapes[4].x, shapes[4].y)
        eye_center = ((left_eye[0] + right_eye[0]) // 2, (left_eye[1] + right_eye[1]) // 2)
        cv.circle(image, left_eye, 2, COLOR["white"], -1)
        cv.circle(image, right_eye, 2, COLOR["white"], -1)
        cv.circle(image, nose, 2, COLOR["white"], -1)
        cv.circle(image, eye_center, 5, COLOR["white"], -1)
        
        cv.line(image, left_eye, right_eye, COLOR["blue"], 2)
        cv.line(image, left_eye, nose, COLOR["blue"], 2)
        cv.line(image, right_eye, nose, COLOR["blue"], 2)

        # computing angle
        dx = left_eye[0] - right_eye[0]
        dy = left_eye[1] - right_eye[1]
        angle = np.degrees(np.arctan2(dy, dx))
        
        cv.putText(image, str(int(abs(angle))) + " deg", (eye_center[0], eye_center[1] - 20), cv.FONT_HERSHEY_SIMPLEX, 0.6, COLOR["white"], 1)
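    else:
        # only the 5 point predictor is handled here; fail clearly instead of hitting an undefined variable
        raise ValueError("unsupported landmark type: {}".format(type))
    return eye_center, angle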

def align_img(image, center, angle):
    height, width = image.shape[0], image.shape[1]
    M = cv.getRotationMatrix2D(center, angle, 1)
    return cv.warpAffine(image, M, (width, height))


# for clarity we will copy & paste the whole code from shape_predictor.py
# keep a clean copy of the frame, because get_angle() draws on the image it is given (assumed setup for "unaligned")
unaligned = frame.copy()
for face in faces:
    draw_rect(frame, face)
    # the 5 point shape predictor spits out four coordinates for the eyes and one for the nose
    points = shape_predictor(frame, face)
    center, angle = get_angle(frame, points, type=5)
    # rotating the clean copy by the angle gives us the final, warped image
    warped_img = align_img(unaligned, center, angle)

Gists will be found here

The get_angle function returns the center (x, y) between the eyes and the angle by which we need to rotate the image. We then feed this information to the align_img function, which returns the warped image. You can use it for whatever preprocessing or data engineering you need to do.