Feature Article

Sparse Coding for Flexible, Robust 3D Facial-Expression Synthesis

Yuxu Lin and Mingli Song, Zhejiang University
Dao Thi Phuong Quynh and Ying He, Nanyang Technological University
Chun Chen, Zhejiang University

A proposed modeling framework applies sparse coding to synthesize 3D expressive faces, using specified coefficients or expression examples. It also robustly recovers facial expressions from noisy and incomplete data. This approach can synthesize higher-quality expressions in less time than the state-of-the-art techniques.

The last decade has witnessed rapid development of 3D facial-expression synthesis owing to the extensive requirements for highly realistic animation in movies and video games. Although 3D scanners enable the scanning of real human faces, capturing facial-expression sequences for large numbers of people is expensive and difficult. On the other hand, synthesizing realistic facial expressions without high-quality 3D face models is challenging and time-consuming. So, a flexible, robust framework to synthesize realistic 3D facial expressions from captured data would be highly desirable.

Existing 3D facial-expression modeling approaches involve complicated representation and manipulation. (For more on some of these approaches, see the sidebar.) In contrast, the human brain can represent and reconstruct facial expressions in a sparse way and figure out intact faces from incomplete or vague faces. So, it's natural to follow the human manner of representing and recovering facial expressions to overcome the conventional approaches' limitations.

Sparse coding is the representation of items by the strong activation of a relatively small set of neurons. It has proven effective in mimicking receptive fields of neurons in the visual cortex, and its sparse representation is close to how human brains represent objects.1 Moreover, sparse coding can be used to train redundant (overcomplete) dictionaries and sparse coefficients, which enable stable recovery of signals from noisy data with very low residual noise.2

We've developed an approach that flexibly applies sparse coding to obtain a redundant dictionary of subject faces and basic expressions. It can then synthesize 3D facial expressions from newly specified coefficients based on the redundant dictionary. Unlike existing approaches, ours can flexibly synthesize facial expressions and generate expressions based on noisy or incomplete 3D faces.

The Basic Problem

A practical system for 3D facial animation should have three features:

■ Expression generation. It can accurately generate realistic expressions for an arbitrary neutral face given by coefficients.
■ Expression retargeting or cloning. It can reproduce an example 3D face's expressions on any other neutral face.
■ Robustness. It can generate realistic facial expressions robustly from noisy or even incomplete 3D faces.

The key issue in designing such a system is finding an effective representation of facial expressions that's robust to noise.



Sidebar: Related Work in Facial-Expression Synthesis

There has been considerable research on 3D facial-expression synthesis since the 1970s.1,2 The existing approaches fall into three categories: parameter-driven, example-based, and learning-based synthesis.

Parameter-driven synthesis parameterizes 3D faces and controls their shape and action by a parameter set. This technique was common in computer graphics' early days.3,4 Unfortunately, it usually uses a low-resolution face model to reduce computational cost and thus can't mimic subtle expression details (wrinkles, furrows, and so on) on the target face owing to the sparse vertex distribution. Although this technique adds textures to the 3D faces to enhance the realism, assessing the synthesized deformations' quality is difficult because the textures mask them.

To overcome parameter-driven methods' limitations, researchers developed example-based synthesis. Jun-Yong Noh and Ulrich Neumann used motion vectors to represent the vertex deformation that expressions cause in the source face.5 They cloned facial expressions by applying the motion vectors to the target face. Mingli Song and his colleagues introduced the vertex-tent coordinate to model local deformations in the source face.6 They transferred these local deformations to the target face under a consistency constraint. Example-based synthesis is popular but is computationally expensive and sensitive to noise, which might lead to a singular deformation and produce flaws and artifacts on the target face. In addition, it can't produce an expressive face without example faces.

To make expression synthesis more adaptable, some researchers have proposed learning-based methods that gain knowledge from training faces. Daniel Vlasic and his colleagues presented a multilinear model for face modeling.7 They organized the training faces to construct a three-mode tensor (expression, subject, and vertex) and synthesized facial expressions by applying different coefficients obtained by tensor decomposition. Later, Dacheng Tao and his colleagues presented Bayesian tensor analysis to explain the multilinear model from a probabilistic view.2 Both methods work well for clean and well-tessellated data but can't deal with 3D faces with missing data, which structured-light scanners often produce owing to highlights or occlusions.

References
1. F.I. Parke and K. Waters, Computer Facial Animation, A K Peters, 1996.
2. D. Tao et al., "Bayesian Tensor Approach for 3-D Face Modeling," IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 10, 2008, pp. 1397–1410.
3. F.I. Parke, "Parameterized Models for Facial Animation," IEEE Computer Graphics and Applications, vol. 2, no. 9, 1982, pp. 61–68.
4. K. Waters, "A Muscle Model for Animating Three-Dimensional Facial Expression," Proc. Siggraph, ACM, 1987, pp. 17–24.
5. J.-Y. Noh and U. Neumann, "Expression Cloning," Proc. Siggraph, ACM, 2001, pp. 277–288.
6. M. Song et al., "A Generic Framework for Efficient 2-D and 3-D Facial Expression Analogy," IEEE Trans. Multimedia, vol. 9, no. 7, 2007, pp. 1384–1395.
7. D. Vlasic et al., "Face Transfer with Multilinear Models," ACM Trans. Graphics, vol. 24, no. 3, 2005, pp. 426–433.

Strongly inspired by the abstracting nature of sparse coding, we adopt sparse coding that minimizes this function:

$$\arg\min_{C,D} \left\| X - D \cdot C \right\|^2 + \gamma \cdot \|C\|_0, \qquad (1)$$

where D is a dictionary consisting of several basis vectors of the linear space spanned by the training set, and C is the set of coefficient vectors corresponding to the set of training faces X. Each face is represented by its vertex coordinates in a column vector $[v_{1x}, v_{1y}, v_{1z}, \ldots, v_{ix}, v_{iy}, v_{iz}, \ldots, v_{kx}, v_{ky}, v_{kz}]^T$, where k represents the number of vertices and $v_{ix}$, $v_{iy}$, and $v_{iz}$ are the x, y, and z coordinates of vertex i. The arrangement of faces in X depends on the application—that is, coefficient-based facial-expression synthesis or facial-expression retargeting. We discuss this in more detail later.

The second term of Equation 1, $\gamma \cdot \|C\|_0$, measures the sparseness of C corresponding to the training faces. That is, it constrains most elements of C to be zero.


So, it provides a compact representation of the training set.

Although solving Equation 1 is NP-hard, David Donoho proved that replacing $\gamma \cdot \|C\|_0$ with the L1 norm $\gamma \cdot \|C\|_1$ preserves the sparsest solution in most situations, which leads to a much simpler optimization:3

$$\arg\min_{C,D} \left\| X - D \cdot C \right\|^2 + \gamma \cdot \|C\|_1 \qquad (2)$$

such that

$$\sum_i D_{ij}^2 = 1, \quad \forall j, \qquad (3)$$

where Equation 3 normalizes the basis vectors in D, preventing them from being zero, which would be the trivial solution. We omit Equation 3 in the rest of this explanation for convenience.

To efficiently solve Equation 2, we use Honglak Lee and his colleagues' approach.4 They formulated the equation as a combination of two convex optimization problems, then employed feature-sign search to solve the L1-regularized least-squares problem to learn C. To learn D, they proposed a Lagrange-dual method for the L2-constrained least-squares problem.
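To make the structure of this alternating scheme concrete, here is a minimal Python sketch. It is not the authors' implementation: it substitutes scikit-learn's Lasso for feature-sign search and a simple renormalized least-squares update for the Lagrange-dual step, and the names sparse_code, gamma, and n_atoms are ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code(X, n_atoms, gamma, n_iter=20, seed=0):
    """Alternately solve Equation 2: fix D and update C, then fix C and update D.

    X is (d, m): training faces stored as columns. Returns D (d, n_atoms) with
    unit-norm columns (Equation 3) and C (n_atoms, m).
    """
    rng = np.random.default_rng(seed)
    d, m = X.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    C = np.zeros((n_atoms, m))
    for _ in range(n_iter):
        # C-step: L1-regularized least squares (the article uses feature-sign search).
        # Lasso minimizes (1/(2d))||y - Dw||^2 + alpha*||w||_1 per face, so
        # alpha = gamma / (2d) matches ||X - DC||^2 + gamma*||C||_1 up to scaling.
        lasso = Lasso(alpha=gamma / (2 * d), fit_intercept=False, max_iter=10_000)
        C = lasso.fit(D, X).coef_.T                      # coef_ is (m, n_atoms)
        # D-step: least squares in D, then renormalize the columns (Equation 3);
        # the article solves this constrained problem with a Lagrange-dual method.
        D = X @ np.linalg.pinv(C)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, C
```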


[Figure 1 histogram: x-axis, Error (mm); y-axis, Frequency.]

Figure 1. Data acquisition. (a) Two expressions of the same subject. We marked 37 salient features on two 3D faces of the same subject and computed the geodesics between every pair of features. Then, we computed the difference of the pairwise geodesic between two expressions. (b) More than 75 percent of the geodesic differences are within ±3 mm (or 1.5 percent), and 90 percent are within ±5 mm (or 2.5 percent). The results demonstrate that human facial expressions are approximate isometric transformations.

Our sparse-coding facial-expression-synthesis framework uses three operations. The train operation solves Equation 2, abstracting the linear space spanned by the training set. The project operation computes the coefficient vector set (C) for X on the basis of D, which solves this subproblem of Equation 2:

$$\arg\min_{C} \left\| X - D \cdot C \right\|^2 + \gamma \cdot \|C\|_1.$$

The recover operation recovers a face from C: $X_i = D \cdot C_i$, where $C_i$ is the coefficient vector and $X_i$ is its corresponding recovered face.
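For readers who want to experiment, here is a rough sketch of the three operations using scikit-learn's dictionary-learning tools rather than the feature-sign and Lagrange-dual solvers the article describes. Faces are column vectors in the article, whereas scikit-learn expects samples as rows, so the sketch transposes; n_atoms and gamma are placeholder names for the dictionary size and sparseness weight.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def train(X_cols, n_atoms, gamma):
    """Learn the dictionary D and sparse codes C for the training faces (Equation 2)."""
    learner = DictionaryLearning(n_components=n_atoms, alpha=gamma,
                                 transform_algorithm="lasso_lars")
    C = learner.fit_transform(X_cols.T)   # codes, one row per face
    D = learner.components_               # atoms as rows, unit norm (Equation 3)
    return D, C

def project(X_cols, D, gamma):
    """Sparse-code faces against a fixed dictionary (the C-only subproblem)."""
    return sparse_encode(X_cols.T, D, algorithm="lasso_lars", alpha=gamma)

def recover(C, D):
    """Recover faces from codes, back in the article's column convention."""
    return (C @ D).T
```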

Facial-Expression Synthesis

Our framework starts with data acquisition and model training to learn the dictionary. Using the dictionary, it then carries out facial-expression synthesis, facial-expression retargeting, and incomplete-face recovery.

Data Acquisition

To capture expressions, we employ the video-based 3D-shape acquisition system that Song Zhang and Peisen Huang developed.5 The system can obtain the geometry and textures with 512 × 512 resolution at 30 fps. This lets us accurately measure all subtle facial expressions.

We hypothesized that facial expressions are approximate isometric transformations. If that's the case, intrinsic properties such as geodesics and Gaussian curvature will be expression invariant.

To verify our hypothesis, we marked feature points on two expressions of the same subject face (Figure 1a shows one example) and computed the pairwise geodesics—that is, the geodesic between every pair of feature points. We removed the eyes and mouth to eliminate topological ambiguity. Then we computed the corresponding geodesic difference between the two expressions. Figure 1b shows the geodesic differences for different expressions of the same subject. More than 75 percent of the geodesic differences are within ±3 mm (or 1.5 percent), and 90 percent are within ±5 mm (or 2.5 percent). These results verify that facial expressions are approximate isometric transformations. As Figure 2 shows, the geodesic patterns are highly consistent among the expressions. After computing the geodesic difference for a face, we crop the face using a user-specified threshold.

Given the captured data (see Figure 3a), we first specify the salient feature points, such as the eyes and mouth. Then we compute a multisource geodesic mask using the detected feature points (see Figure 3b). Because the geodesic is invariant in facial expressions, we can accurately and robustly segment the expression data (see Figure 3c).
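As a rough illustration of the isometry check, the sketch below approximates geodesics by shortest paths along mesh edges, which is a coarser approximation than an exact geodesic or fast-marching solver; vertices, faces, and feature_idx are placeholder inputs, and both expressions are assumed to share the same mesh connectivity.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra

def edge_graph(vertices, faces):
    """Sparse graph whose edge weights are the Euclidean lengths of mesh edges."""
    i = np.concatenate([faces[:, 0], faces[:, 1], faces[:, 2]])
    j = np.concatenate([faces[:, 1], faces[:, 2], faces[:, 0]])
    w = np.linalg.norm(vertices[i] - vertices[j], axis=1)
    n = len(vertices)
    return coo_matrix((np.r_[w, w], (np.r_[i, j], np.r_[j, i])), shape=(n, n)).tocsr()

def pairwise_geodesics(vertices, faces, feature_idx):
    """Approximate geodesic distances between every pair of marked feature vertices."""
    dist = dijkstra(edge_graph(vertices, faces), indices=feature_idx)
    return dist[:, feature_idx]

# Isometry check between two expressions of the same subject:
# diff = pairwise_geodesics(v_smile, faces, marks) - pairwise_geodesics(v_neutral, faces, marks)
```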


Model Training

We collect the training dataset of 3D faces from the subjects with the 3D scanner, as we just discussed. For each person, we obtain one neutral 3D face and a number of basic expressions (our experiments used nine). We denote the 3D face dataset as Tr = {X_ij}, where X_ij is the ith subject with the jth basic facial expression. X_ij is a column vector containing vertex coordinates. Using the training dataset, the train operation abstracts the linear space of each basic facial expression:

Figure 2. Given (a) four expressions of the same subject, we compute the geodesic distance from the nose tip and (b) visualize them using isolines. These geodesic patterns are highly consistent except for some small differences near the mouth and eyes.

$$\arg\min_{C,D} \left\| X - D \cdot C \right\|^2 + \gamma \cdot \|C\|_1, \qquad (4)$$

where $X = \left[ X_{:,1}^T, X_{:,2}^T, \ldots, X_{:,n}^T \right]^T$, $D = \left[ D_1^T, D_2^T, \ldots, D_n^T \right]^T$, and $X_{:,j} = \left[ X_{1j}, X_{2j}, \ldots, X_{mj} \right]$ denotes the 3D faces with the jth basic facial expression. In other words, C represents each type of basic facial expression X_:,j in terms of the corresponding dictionary D_j. Also, C measures only the variance of face shapes in a single basic facial expression instead of the variance among the different basic facial expressions. That is, we expect that different facial expressions of the same subject will share the same coefficient vector in terms of D, which we use as the constraint for facial-expression synthesis.

We can also use the training dataset to train a dictionary D_A that abstracts the linear space spanned by faces from different subjects:

$$\arg\min_{D_A, C_A} \left\| Y - D_A \cdot C_A \right\|^2 + \gamma \cdot \|C_A\|_1,$$

where Y = [X_11, X_12, ..., X_21, X_22, ..., X_mn] is a matrix consisting of all the training faces. We use D_A to recover facial expressions from noisy or incomplete faces.

Coefficient-Based Synthesis

Our approach requires only one neutral face to synthesize any other expressions of the same subject. By treating that face as a new subject, we synthesize expressions in the following three steps (see Figure 4).

Basic-expression recovery. As we described earlier, the learned dictionary D_i reflects only the variance of facial shapes in the ith basic expression, and different basic expressions from the same subject should share the same coefficient vector. So, we can synthesize all basic expressions of the new subject if we can compute the coefficient vector corresponding to each basic expression. Given a neutral face F of a new subject and the subdictionary D_1, we estimate the corresponding coefficient vector by projecting the neutral face onto D_1. The recover operation then synthesizes the basic expressions:

$$\arg\min_{C_F} \left\| F - D_1 \cdot C_F \right\|^2 + \gamma \cdot \|C_F\|_1,$$
$$\left[ Y_{F_1}^T, Y_{F_2}^T, \ldots, Y_{F_m}^T \right]^T = D \cdot C_F,$$

where C_F is the estimated corresponding coefficient vector, and Y_F = [Y_{F_1}, Y_{F_2}, ..., Y_{F_m}] denotes the m synthesized basic expressions of the given neutral face.
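The block structure of Equation 4 and the recovery step could be sketched as follows, again with scikit-learn in its row convention; X_exp, n_atoms, and gamma are placeholder names, and the neutral expression is assumed to be block 0.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def train_stacked_dictionary(X_exp, n_atoms, gamma):
    """X_exp[j] is (3k, m): the m training subjects' faces (columns) with expression j.

    Stacking the expression blocks vertically (Equation 4) enforces one shared
    code per subject across all of its basic expressions.
    """
    X = np.vstack(X_exp)                                # (n*3k, m)
    learner = DictionaryLearning(n_components=n_atoms, alpha=gamma,
                                 transform_algorithm="lasso_lars")
    C = learner.fit_transform(X.T)                      # (m, n_atoms): one code per subject
    D = learner.components_                             # (n_atoms, n*3k)
    return np.split(D, len(X_exp), axis=1), C           # per-expression blocks D_j

def recover_basic_expressions(F_neutral, D_blocks, gamma):
    """Project a new neutral face onto the neutral block D_1, then recover all blocks."""
    C_F = sparse_encode(F_neutral.reshape(1, -1), D_blocks[0],
                        algorithm="lasso_lars", alpha=gamma)
    return [(C_F @ Dj).ravel() for Dj in D_blocks]      # synthesized basic expressions
```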




Figure 3. Creating and using geodesic masks. (a) Captured facial data. (b) Geodesic masks from the detected features on the mouth and eyes, which are invariant to the expressions because expressions are approximate isometric transformations. (c) Segmented facial expressions using the geodesic masks. Because the geodesic is invariant in facial expressions, we can accurately and robustly segment the expression data.

Expression space learning. Using the synthesized basic expressions, the train operation learns the new subject's (linear) expression space D_F:

$$\arg\min_{D_F} \left\| Y_F - D_F \cdot C_F \right\|^2 + \gamma \cdot \|C_F\|_1.$$

Similar to Equation 4, this problem's solution also employs the feature-sign search and the Lagrange-dual method.

Expression synthesizing. After reconstructing D_F, we can synthesize any expression F_e with the recover operation—that is, as a linear combination of the basis vectors in D_F: $F_e = D_F \cdot C_e$, where C_e is the expression's coefficient vector. We can specify C_e manually or automatically (for example, through expression retargeting). Figure 5 illustrates the results. Although most of the synthesized expressions in Figure 5 are realistic, the last two seem unnatural owing to infeasible coefficient vectors, which exceed the scope of natural faces.
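These last two steps could be sketched as follows, building on the basic expressions recovered above; the coefficient vector in the usage comment is purely hypothetical.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def learn_expression_space(Y_F, n_atoms, gamma):
    """Learn the new subject's expression space D_F from its synthesized basic expressions."""
    learner = DictionaryLearning(n_components=n_atoms, alpha=gamma,
                                 transform_algorithm="lasso_lars")
    learner.fit(np.stack(Y_F))          # basic expressions as rows
    return learner.components_          # D_F, atoms as rows

def synthesize(D_F, C_e):
    """Synthesize an expression as a linear combination of basis vectors: F_e = D_F . C_e."""
    return np.asarray(C_e) @ D_F

# Example with a hypothetical coefficient vector mixing two basis vectors:
# F_e = synthesize(D_F, [0.7, 0.0, 0.3] + [0.0] * (D_F.shape[0] - 3))
```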


Facial-Expression Retargeting

Expression retargeting usually makes an expression analogy between the performer and avatar. So, we can state a typical expression-retargeting problem as this: given a source neutral face S, a target neutral face T, and a source expression face S′, construct a target face T′ whose expression is the same as S′. Figure 6 depicts the expression-retargeting workflow, which comprises the following three steps.

Basic-expression recovery. To recover the basic expressions for the source and target faces, we employ the method we used for expression synthesis. We denote the source face's basic expressions as Y_S = [Y_{S_1}, Y_{S_2}, ..., Y_{S_n}] and the target face's expressions as Y_T = [Y_{T_1}, Y_{T_2}, ..., Y_{T_n}].

Figure 4. The algorithmic pipeline for expression synthesis by setting coefficients. Our approach requires only one neutral face to synthesize any other expressions of the same subject.

Figure 5. Expression synthesizing. (a) The neutral face of a subject. (b) Basic expressions recovered from the neutral face. (c) Facial expressions synthesized by random coefficient vectors. Although most of the synthesized expressions are realistic, the last two seem unnatural owing to infeasible coefficient vectors, which usually exceed the natural faces' scope.

Expression space colearning. Using the recovered basic expressions, the train operation colearns the expression spaces of the source and target:

$$\arg\min_{D_{ST}, C_{ST}} \left\| \left[ Y_S^T, Y_T^T \right]^T - D_{ST} \cdot C_{ST} \right\|^2 + \gamma \cdot \|C_{ST}\|_1,$$

where $D_{ST} = \left[ D_S^T, D_T^T \right]^T$. We constrain the source and target faces to share the same coefficient vector set C_ST, so the same expressions of the source and target faces share the same corresponding coefficient vector in terms of D_ST.

Coefficient transfer. Because T′ and S′ have the same corresponding coefficient vector in terms of D_ST, we can estimate the coefficient vector by projecting S′ onto D_S. Then, the recover operation synthesizes T′ with the same coefficient vector and D_T:

$$\arg\min_{C'} \left\| S' - D_S \cdot C' \right\|^2 + \gamma \cdot \|C'\|_1,$$
$$T' = D_T \cdot C',$$

where C′ is the coefficient vector of S′ projected onto D_S, and T′ shares the same coefficient vector in terms of D_T.
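The colearning and coefficient-transfer steps might be sketched like this, again in scikit-learn's row convention; n_atoms and gamma are placeholders, and the codictionary is simply split column-wise into its source and target halves.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def colearn(Y_S, Y_T, n_atoms, gamma):
    """Y_S, Y_T are (3k, n): recovered basic expressions of source and target as columns.

    Stacking them vertically and learning one codictionary forces a shared code set
    C_ST for corresponding source and target expressions.
    """
    Y_ST = np.vstack([Y_S, Y_T])                        # (6k, n)
    learner = DictionaryLearning(n_components=n_atoms, alpha=gamma,
                                 transform_algorithm="lasso_lars")
    learner.fit(Y_ST.T)
    D_ST = learner.components_                          # (n_atoms, 6k)
    k3 = Y_S.shape[0]
    return D_ST[:, :k3], D_ST[:, k3:]                   # D_S, D_T

def retarget(S_prime, D_S, D_T, gamma):
    """Project the source expression S' onto D_S and recover T' with the same code."""
    c = sparse_encode(S_prime.reshape(1, -1), D_S, algorithm="lasso_lars", alpha=gamma)
    return (c @ D_T).ravel()
```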

Incomplete-Face Recovery

Owing to unavoidable scanning errors, some input faces often suffer from noise or unwanted holes on the surface. Such incomplete faces can't be used directly for facial-expression synthesis, retargeting, and so on. To synthesize facial expressions robustly, we employ partial projection. This strategy has two steps. First, the project operation estimates the coefficient vector. Then, the recover operation synthesizes the intact face.



Figure 6. The algorithmic pipeline for expression retargeting. Given a source neutral face (S), a target neutral face (T), and a source expression face (S′), we construct a target face (T′) whose expression is the same as the source expression face.

Let F_inp denote the incomplete face and I_F the face's valid vertex indices. We compute the partial projection as

$$\arg\min_{C_{inp}} \left\| F_{inp} - \hat{D}_A \cdot C_{inp} \right\|^2 + \gamma \cdot \|C_{inp}\|_1,$$
$$F_{comp} = D_A \cdot C_{inp}, \qquad (5)$$

where $\hat{D}_A$ is the subdictionary of D_A (learned during model training) that contains only those rows with indices in I_F. Although the partial projection doesn't take the missing vertices into account, the optimization in Equation 5 recovers every vertex, including the missing ones. This is because $\hat{D}_A$ contains information that implies the dependency between the known and missing vertices. Figure 7 compares the faces recovered from incomplete or noisy data (Gaussian noise with a 0 mean and 0.2 percent standard deviation) with the ground truth F_truth.
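A sketch of partial projection under the same row-convention assumptions as before; valid_vertices, gamma, and D_A are placeholders, with D_A taken to be the (n_atoms, 3k) dictionary learned during model training.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def partial_projection(F_inp, valid_vertices, D_A, gamma):
    """Recover a complete face from an incomplete one (Equation 5).

    F_inp is the length-3k face vector; only the coordinate components of the
    valid vertices are used for coding (the restriction D_A_hat), while recovery
    uses the full dictionary.
    """
    idx = np.asarray(valid_vertices)
    comp = np.concatenate([3 * idx, 3 * idx + 1, 3 * idx + 2])   # x, y, z components
    D_A_hat = D_A[:, comp]              # these columns correspond to the article's rows I_F
    c_inp = sparse_encode(F_inp[comp].reshape(1, -1), D_A_hat,
                          algorithm="lasso_lars", alpha=gamma)
    return (c_inp @ D_A).ravel()        # F_comp: every vertex, including the missing ones
```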

We compute the mean-square error of each recovered expressive face F_comp as

$$E = \frac{\left\| F_{comp} - F_{truth} \right\|^2}{n_e \cdot \mathrm{DiagLength}\left( F_{truth} \right)},$$

where DiagLength(·) computes the diagonal length of the bounding box of the 3D face and n_e is the number of components of F_truth. The small mean-square errors demonstrate that our approach can robustly handle incomplete or noisy data.


Experimental Results

We performed our experiments on a 64-bit Windows system with an Intel T9300 processor and 2 Gbytes of RAM. The training faces consisted of 30 subjects with 10 basic facial expressions (the first one was neutral); we set the sparseness at 0.3.

Face Correspondence

To carry out the evaluation, we carefully aligned all the faces to find correspondences among them in advance. For higher accuracy, we employed a supervised-correspondence method;6 the algorithm employs the following two steps (see Figure 8a).

Cylindrical projection. For the face selected as the template (the top-left face in Figure 8a), we manually marked the predefined feature points on the template. Then, we obtained a template mask mesh by Delaunay triangulation. To maintain consistency, we had each input face's mask mesh adopt the template mask mesh's topology, rather than recompute the triangulation. Finally, we performed cylindrical projection on these 3D faces and their masks to obtain their corresponding 2D coordinates. We developed a user-friendly interface that lets users easily mark the feature points on 3D faces.

Mapping barycentric coefficients. As Figure 8b shows, given a vertex V = (x, y, z) in the template face, we obtained its corresponding 2D coordinate V_2d = (x_2d, y_2d) by cylindrical projection.

[Figure 7 per-face errors: 0.61%, 0.57%, 0.76%, 0.72%, 0.81%, 0.60%.]

Figure 7. Facial-expression recovery from incomplete or noisy data. (a) The ground truth. (b) The incomplete or noisy faces. (c) The expressive faces recovered by partial projection. The small mean-square errors (shown below each recovered face) demonstrate that our approach can robustly handle incomplete or noisy data.

Table 1. The capabilities of three approaches to facial-expression synthesis.

Approach | Coefficient-based synthesis | Expression retargeting | Recovery with incomplete data
Bayesian tensor analysis | ✓ | ✓ | —
Facial-expression analogy | — | ✓ | —
Our approach | ✓ | ✓ | ✓

Moreover, we computed its barycentric coefficients according to its surrounding mask triangle T_2d. Assuming the corresponding mask triangle of T_2d is T′_2d in the input face, we easily found V's corresponding vertex V′ with the same barycentric coefficients in T′_2d. By carrying out this barycentric-coefficient mapping vertex by vertex, we achieved correspondence among all the 3D faces.
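A minimal sketch of the two mapping steps, assuming the cylinder axis is aligned with the y axis and that the containing mask triangle has already been located; triangle lookup and the marking interface are omitted.

```python
import numpy as np

def cylindrical_projection(V):
    """Map 3D vertices (x, y, z) to 2D coordinates (theta, y) around the y axis."""
    return np.stack([np.arctan2(V[:, 2], V[:, 0]), V[:, 1]], axis=1)

def barycentric(p, a, b, c):
    """Barycentric coefficients of 2D point p with respect to triangle (a, b, c)."""
    v, w = np.linalg.solve(np.column_stack([b - a, c - a]), p - a)
    return np.array([1.0 - v - w, v, w])

def map_vertex(v2d, tri2d_template, tri3d_input):
    """Transfer a template vertex to the input face using the same barycentric coefficients."""
    u = barycentric(v2d, *tri2d_template)      # weights in the template mask triangle T_2d
    return u @ tri3d_input                     # corresponding vertex V' in the input triangle
```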

Comparing Approaches to Facial-Expression Retargeting

We compared our sparse-coding approach with two representative approaches to facial-expression synthesis: Bayesian tensor analysis (BTA) and facial-expression analogy (FEA).6 (For more on BTA, see the sidebar.) Table 1 lists the three approaches' capabilities. BTA uses different coefficients than our approach, and FEA doesn't support coefficient-based facial-expression synthesis. So, it's more feasible for us to evaluate these approaches on the basis of facial-expression retargeting instead of coefficient-based synthesis.

We retargeted the 3D facial expressions back to the source neutral face and compared the synthesized results with the ground truth (see Figure 9).


[Figure 8a pipeline labels: template, input faces, cylindrical projection, barycentric-coefficient mapping, correspondence.]

Figure 8. Face correspondence. (a) The pipeline. (b) Mapping barycentric coefficients. V indicates a vertex; T indicates a mask triangle. This method provides higher accuracy.

Given F_truth, we computed the mean-square error of each synthesized result T′ as

$$E = \frac{\left\| T' - F_{truth} \right\|^2}{n_e \cdot \mathrm{DiagLength}\left( F_{truth} \right)}.$$

As the mean-square errors in Figure 9 show, our approach outperforms BTA and is comparable to FEA.

We also compared the three approaches with clean data (see Figure 10). BTA produced artifacts on the chin due to unstable solutions (see Figure 10b). FEA produced a twist near the mouth (see Figure 10c). Our approach produced artifact-free, realistic results (see Figure 10d). In addition, we added Gaussian noise (with a 0 mean and 0.2 percent standard deviation) to the test 3D faces (see Figure 11). The learning-based approaches (BTA and our approach) were robust


because the training faces provided knowledge of faces' shapes that constrained the recovered face during synthesis. FEA failed to produce satisfactory results, owing to the singular solution caused by the noise.

Our approach's execution time has two parts: facial-expression modeling and facial-expression synthesis. The former includes basic-expression recovery and expression space colearning. The latter only needs to run once for each newly input 3D face. In our experiments (see Figures 10 and 11), model training took an average of 15.04 sec. Synthesis took only 0.067 sec., which was much more efficient than BTA (0.64 sec.) and FEA (2.32 sec.).

Results with Incomplete Data or Noisy Faces

[Figure 9 per-face errors. BTA: 0.67%, 0.95%, 0.72%, 0.80%; FEA: 0.11%, 0.07%, 0.086%, 0.26%; ours: 0.44%, 0.58%, 0.59%, 0.47%.]

Figure 9. Comparing three approaches to facial-expression retargeting: (a) the ground truth, (b) Bayesian tensor analysis (BTA), (c) facial-expression analogy (FEA), and (d) our approach. The source and target faces are the same. The mean-square error appears below each synthesized result. Our approach outperforms BTA and is comparable to FEA.

To further evaluate our approach's robustness and flexibility, we conducted experiments with incomplete and noisy data. (This experiment didn't include BTA and FEA because neither can deal with incomplete data.) We generated incomplete faces by removing some vertices manually (see Figure 12). We generated the noisy faces with missing data by adding Gaussian noise with a 0 mean and 0.2 percent standard deviation to the incomplete faces. The results show that our approach successfully recovers expressions for the target faces.

Our approach has two limitations that deserve further attention. First, we had to carefully align the 3D faces before training and synthesis. We'd like to investigate an automatic algorithm for 3D face correspondence. Furthermore, conformal parameterization could replace cylindrical projection to provide more accurate feature correspondence by preserving more local shape information. Second, our approach doesn't provide the range of coefficients for natural facial-expression synthesis.


Figure 10. Comparing the three approaches, using clean data: (a) neutral faces for the source and target, (b) BTA, (c) FEA, and (d) our approach. The red arrows in the BTA results highlight artifacts. Also, the FEA results in column 5 are inconsistent with the source. Our approach produced artifact-free, realistic results.

To find this range, we could use some statistical techniques.

Acknowledgments

We thank the editor and all reviewers for their careful review and constructive suggestions. This article was supported by the National Natural Science Foundation of China under grant 60873124, the Natural Science Foundation of Zhejiang Province under grant Y1090516, the Fundamental Research Funds for the Central Universities under grant 2009QNA5015, and Singapore National Research Foundation grant NRF2008IDM-IDM004-006.

References
1. B. Olshausen and D. Field, "Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images," Nature, vol. 381, no. 6583, 1996, pp. 607–609.
2. E. Candès, J. Romberg, and T. Tao, "Stable Signal Recovery from Incomplete and Inaccurate Measurements," Comm. Pure and Applied Mathematics, vol. 59, no. 8, 2006, pp. 1207–1223.
3. D. Donoho, "For Most Large Underdetermined Systems of Linear Equations the Minimal ℓ1-Norm Solution Is Also the Sparsest Solution," Comm. Pure and Applied Mathematics, vol. 59, no. 6, 2006, pp. 797–829.
4. H. Lee et al., "Efficient Sparse Coding Algorithms," Proc. 20th Conf. Neural Information Processing Systems, MIT Press, 2007, pp. 801–808.
5. S. Zhang and P. Huang, "High-Resolution, Real-Time 3D Shape Acquisition," Proc. 2004 Computer Vision and Pattern Recognition Workshop (CVPRW 04), IEEE CS Press, 2004, p. 28.
6. M. Song et al., "A Generic Framework for Efficient 2-D and 3-D Facial Expression Analogy," IEEE Trans. Multimedia, vol. 9, no. 7, 2007, pp. 1384–1395.

Figure 11. Comparing the three approaches with noisy data (Gaussian noise with a 0 mean and 0.2 percent standard deviation): (a) neutral faces, (b) BTA, (c) FEA, and (d) our approach. Unlike our approach, BTA and FEA produced artifacts and flaws.

Yuxu Lin is a PhD candidate in Zhejiang University's College of Computer Science. His research interests mainly include 3D face modeling and deformation. Lin has a bachelor's degree in computer science and technology from Zhejiang University. He's a member of IEEE. Contact him at [email protected].

Mingli Song is an associate professor in Zhejiang University's College of Computer Science. His research interests include face modeling and facial-expression analysis. Song has a PhD in computer science from Zhejiang University. He's a member of IEEE. He's the corresponding author. Contact him at [email protected].

Dao Thi Phuong Quynh is a PhD student in Nanyang Technological University's School of Computer Engineering. Her research interests include computational geometry and machine learning. Quynh has a BS in applied mathematics and computer science from Moscow State University. Contact her at [email protected].


Figure 12. 3D facial-expression retargeting from incomplete or noisy faces. The (a) source faces and recovered results for (b) target face 1, (c) target face 2, and (d) target face 3. Our approach successfully recovered expressions for the targets.

Ying He is an assistant professor in Nanyang Technological University's School of Computer Engineering. His research interests are computer graphics, computer-aided design, and scientific visualization. He has a PhD in computer science from Stony Brook University. Contact him at [email protected].

Chun Chen is a professor in Zhejiang University's College of Computer Science. His research interests include computer vision, computer graphics, and embedded technology. Chen has a PhD in computer science from Zhejiang University. Contact him at [email protected].



