sensors Article
Human Detection Based on the Generation of a Background Image and Fuzzy System by Using a Thermal Camera Eun Som Jeon, Jong Hyun Kim, Hyung Gil Hong, Ganbayar Batchuluun and Kang Ryoung Park * Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Korea;
[email protected] (E.S.J.);
[email protected] (J.H.K.);
[email protected] (H.G.H.);
[email protected] (G.B.) * Correspondence:
[email protected]; Tel.: +82-10-3111-7022; Fax: +82-2-2277-8735 Academic Editors: Vincenzo Spagnolo and Dragan Indjin Received: 3 January 2016; Accepted: 24 March 2016; Published: 30 March 2016
Abstract: Recently, human detection has been used in various applications. Although visible light cameras are usually employed for this purpose, human detection based on visible light cameras is limited by darkness, shadows, sunlight, etc. An approach using a thermal (far infrared light) camera has been studied as an alternative for human detection; however, the performance of human detection by thermal cameras is degraded when the temperature difference between humans and the background is low. To overcome these drawbacks, we propose a new method for human detection using thermal camera images. The main contribution of our research is that the thresholds for creating the binarized difference image between the input and background (reference) images are adaptively determined by fuzzy systems, using the information derived from the background image and the difference values between the background and input images. With our method, human areas can be correctly detected irrespective of the various conditions of the input and background (reference) images. For the performance evaluation of the proposed method, experiments were performed with 15 datasets captured under different weather and light conditions. In addition, experiments with an open database were also performed. The experimental results confirm that the proposed method can robustly detect human shapes in various environments.
Keywords: human detection; thermal camera image; generation of background image; fuzzy system
Sensors 2016, 16, 453; doi:10.3390/s16040453
www.mdpi.com/journal/sensors
1. Introduction
With the recent development of computer vision and pattern recognition technologies, human detection has been used in various applications, including intelligent surveillance systems [1–9]. Because thermal cameras are less affected by poor conditions such as illumination changes, low light, and fog, human detection by thermal cameras has attracted attention. The gray level value of an object in a thermal image is determined by the temperature of the object. Generally, humans in an image are warmer than the background, so the gray level of a human area is usually higher than that of the surrounding environment. However, the properties of these areas can be affected by the temperature or environmental conditions. The condition that a human area is brighter than the surrounding areas in a thermal image is typically satisfied at night and in winter, but in summer or on a hot day the condition is reversed and the human area appears darker than the background. These factors can affect the accuracy of detecting human areas in thermal images [9–19] and make it difficult to distinguish human areas from the background in the image. In order to overcome these drawbacks and to extend the applications of human tracking and behavioral recognition, various studies have recently focused on detecting human areas. There are several previous studies related to human detection in thermal camera images. These can be divided
into two categories: those without background models [4,5,10–20], and those with background models [21–36]. In the former category, some methods have employed features based on the histogram of the oriented gradient (HOG) [4,5,10–15] with a support vector machine (SVM) [13–15], the soft-label [16], and the edge features with an adaptive boosting method (Adaboost) [17]. Fuzzy systems are used to classify human areas without using background information [18–20]. The advantage of these methods is that procedures for constructing backgrounds are not required, but they require training procedures for extracting or obtaining a pre-defined template, as well as various types of scales for the detection of different sizes of humans. In addition, various conditions from images captured at different times and from different views can affect the accuracy of the results. They also require a significant amount of processing time to detect humans because of the need to scan the entire region of the image. Because of these drawbacks, human detection methods with background models have been employed as an alternative. The Gaussian approach [21–23], expectation maximization [24,25], texture change [26], and statistical methods [22,27–32] are used to create a background image. In addition, image averaging [32–35] or running average methods [35,36] can be used for background modeling. After extracting the background information, contour saliency maps (CSM) [21–23], template matching with CSM [32], and shape- and appearance-based features obtained using principal component analysis (PCA) [24,25] are used for human detection. Spatiotemporal texture vectors [26] can also be applied. In previous studies [34,35], fuzzy-based methods for background subtraction have been employed. The advantage of these methods is that they can be applied to multiple conditions of images that have various object sizes.
These methods can also be applied to various environmental conditions such as snow, rain, sunlight, night, and daytime. However, their performance is influenced by the similarity between the background and the object because they are based on background subtraction. That is, they did not consider the cases where the background is similar to the object in the image. In addition, if there are motionless people located in the same positions in all the frames, these people can cause erroneous backgrounds to be generated, which in turn degrades human detection performance. To overcome these drawbacks, we present herein a new approach to detect human areas in a thermal image under varying environmental conditions. The proposed research is novel in the following four respects:
- First, the threshold for background subtraction is adaptively determined based on a fuzzy system. This system uses the information derived from the background image and difference values between the background and input image.
- Second, the problem of two or more people being located close together with occlusion is solved by our method. Based on four conditions (the width, height, size, and ratio of height to width), the candidate region is separated into two parts. In addition, if the width or height of the detected box is larger than a threshold, our algorithm also checks whether there exist two or more histogram values which are lower than the threshold. If so, the candidate region is horizontally or vertically divided into three or more regions at the positions of the histogram values.
- Third, for human confirmation, the separated regions are verified based on the size and the distance between two or more regions in close proximity to one another. If a region is small and there is another small region nearby, these two regions are merged as an exact human region.
- Fourth, our method is confirmed to robustly detect human areas in various environments through intensive experiments with 15 sets of data (captured under different weather and light conditions) and an open database.
The main contribution and advantage of our method are that the thresholds for creating the binarized difference image between the input and background images are determined adaptively based on fuzzy systems by using the information derived from the background image and difference
values between background and input image. By using our method, human areas can be correctly detected, irrespective of the various conditions of input and background images, which can be a crucial requirement for intelligent surveillance system applications. Our work is suitable only for the case of intelligent surveillance using static cameras; therefore, we do not consider cases with dynamic background such as advanced driver assistance systems in our research. In previous research [37], the authors proposed a method for detecting human areas, but the above four novel points of our research are different from the previous research [37].
The remainder of this article is organized as follows: we provide an overview of the proposed system and an algorithm for human detection in Section 2. We present the experimental results and analysis in Section 3. Finally, the conclusions are discussed in Section 4.
2. Proposed Method
2.1. Overall Procedure of Proposed Method
An overview of the proposed method is presented in Figure 1. We propose a three-step system for detecting humans in a thermal image: generation of a background image (model); obtaining a difference image based on a fuzzy system with the background and input image; and detection of humans in the difference image. In this paper, the image obtained in the sub-bands of medium-wave IR (MWIR, 3–8 µm) and long-wave IR (LWIR, 8–15 µm) is called a thermal image [38].
Figure 1. Overall procedure of the proposed method.
First, a background image is generated. An image is created by filtering, and non-background areas are removed. In addition, a correct background image is created (see the details in Section 2.2).
Then, a (pixel) difference image is obtained from the background and input images. The threshold for extracting a candidate area is adaptively determined using a fuzzy system which uses the brightness feature of the generated background and input image (see the details in Section 2.3). The third step is to detect human areas. Incorrect human areas are removed by size filtering and morphological operations. Based on a vertical and horizontal histogram and the size and ratio of the area, the candidate area is separated. After removing incorrect human areas based on the size and ratio of the candidate area, further procedures for detecting correct human areas are performed. The remaining areas are merged adaptively based on the distance between the objects and the camera viewing direction (see the details in Section 2.4). Finally, the correct human areas are obtained.
2.2. Generating a Background Image
Previous research used background subtraction methods with background modeling to detect human areas [21–36]. In order to detect human areas by the background subtraction method, generating a correct background image is required. Statistical methods [22,27–32], temporal averaging [31–35], and running average-based methods [36,38] are used for generating background images. However, temporal averaging methods leave ghost shadows in the generated background image. Moreover, the existing methods for making background images do not include further procedures for handling motionless people in an image. If a motionless human is present in all frames used for creating a background image, an erroneous background image can be generated. In order to overcome this drawback, Dai et al. proposed a method to create a background image by using multiple images obtained from two other sequences [24].
However, the intensity of the image created by this background-generation procedure is quite different from that of the input image. For instance, if a sequence obtained at daytime and another sequence obtained at night are used for creating a background by an averaging method, the generated background image has the average brightness of these two sequences, the intensity of which is much different from that of the input image. Therefore, incorrect background images can be created and detection errors can occur.
Therefore, we propose a method for creating a correct background image to overcome these problems. Our method of creating a correct background image draws on previous research [37]. A flow chart of the proposed method is presented in Figure 2. To begin, a background image is generated by using training images. To solve the problem of ghost shadows in a background image, the median values of pixels from successive multiple frames (from 10 to 70 frames) in a sequence are used [22,27–32]. By using the median values, a median image, which corresponds to a preliminary background image, is created, as illustrated in Figure 3a. However, humans who remain motionless in all frames can cause non-background areas to be incorrectly included in the created image. Therefore, further procedures are performed to generate a correct background image. A 3 × 3 pixel max filter is applied to the created median image to enhance the human area compared to the background area. In general, the gray level of a person in a thermal image is higher than that of the background. Therefore, by max filtering, the human area becomes more evident than the background. Based on the average and standard deviation of the background image, a binary image which shows the candidate non-background area (human area) is created by Equation (3) [37,39]:

$$\mu = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N} I_{med}(i,j)}{M \times N} \tag{1}$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{M}\sum_{j=1}^{N} \left(I_{med}(i,j) - \mu\right)^{2}}{M \times N - 1}} \tag{2}$$

$$B(i,j) = \begin{cases} 0 & \text{if } \left(I_{med}(i,j) < \mu - P \times \sigma \text{ and } \mu > Q\right) \text{ or } \left(I_{med}(i,j) > \mu + P \times \sigma \text{ and } \mu \le Q\right) \\ 1 & \text{otherwise} \end{cases} \tag{3}$$
where: Imed(i, j) is the gray level value at the position (i, j) of the created median image; M and N are the width and height of the image, respectively; µ is the average and σ is the standard deviation of the median image; B(i, j) is a binary image, which presents the candidate non-background area (human area); and P and Q are the optimal parameters, which were determined experimentally with images (which were not used for any of the performance measurement experiments shown in Section 3).
Figure 2. Flow chart of generating a background image (model).
Figure 3. Examples of generating the background image from database I: (a) preliminary background image obtained by median value from the sequence of images; (b) extracted candidate human area by binarization; (c) extracted human areas by labeling, size filtering and a morphological operation; and (d) the final background image.
With these images, the ground-truth human areas were manually depicted. In addition, for various values of P and Q, the human areas could be automatically detected based on Equation (3). With the ground-truth and automatically detected areas, we can calculate the PPV, Sensitivity, and F1-score of Equations (17)–(19). The optimal P and Q were determined as those with which the highest PPV, Sensitivity, and F1-score of human detection were obtained. The selected P and Q are 1.5 and 120.4, respectively. In addition, the same values of P and Q were used for all the experiments in Section 3. In general, human areas are much smaller than the background area. Therefore, the average value of the median image (µ of Equation (1)) determines which condition of Equation (3) is applied for binarization. After binarization, the candidates of human areas are detected as shown in Figure 3b. In order to extract the exact human areas to be erased, component labeling and a morphological operation are applied to the binarized image. Through component labeling, the pixel positions of each isolated candidate area can be located [40]. Then, morphological operations including dilation and erosion on the candidate area can reduce small-sized noise and combine incorrectly separated regions [40]. In addition, component labeling and size filtering are performed to remove the many areas that are too small or too large to be regarded as human areas. Because the pixel positions of isolated candidate areas can be located through component labeling, the number of pixels in each candidate area can be counted [40]. Based on the pixel count, small or large areas (which are unlikely to be human areas) can be removed (size filtering). The extracted candidates of human areas are shown in Figure 3c. These areas should be erased to generate a correct background image. In order to erase these areas, we propose an erasing method, which is explained in detail below.
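Before turning to the erasing step, the preliminary-background and binarization stage described above (median image, 3 × 3 max filter, Equations (1)–(3)) can be illustrated with a minimal sketch. It is written in plain Python on nested lists for readability and is not the authors' implementation. The function names are hypothetical, the border handling of the max filter is an assumption, and applying Equation (3) to the max-filtered image is also an assumption, since the text does not state which image the statistics are taken from.

```python
from statistics import median


def median_image(frames):
    """Pixel-wise median over a list of equally sized gray-level frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[i][j] for f in frames) for j in range(cols)]
            for i in range(rows)]


def max_filter_3x3(img):
    """3 x 3 max filter. Assumption: at the border, only the part of the
    window that lies inside the image is used."""
    rows, cols = len(img), len(img[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [img[a][b]
                      for a in range(max(0, i - 1), min(rows, i + 2))
                      for b in range(max(0, j - 1), min(cols, j + 2))]
            out_row.append(max(window))
        out.append(out_row)
    return out


def binarize(img, p=1.5, q=120.4):
    """Equation (3): 0 marks candidate non-background (human) pixels, 1 the rest.
    The defaults for p and q are the values reported in the text."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    mu = sum(pixels) / n                                           # Equation (1)
    sigma = (sum((v - mu) ** 2 for v in pixels) / (n - 1)) ** 0.5  # Equation (2)
    if mu > q:
        # Bright background: candidates are the markedly darker pixels.
        def is_candidate(v):
            return v < mu - p * sigma
    else:
        # Dark background: candidates are the markedly brighter pixels.
        def is_candidate(v):
            return v > mu + p * sigma
    return [[0 if is_candidate(v) else 1 for v in row] for row in img]
```

In a toy sequence where a warm pixel stays at the same position in every frame, the median image keeps that pixel, which is exactly the motionless-person case the text describes, and `binarize(max_filter_3x3(median_image(frames)))` marks its 3 × 3 neighborhood as candidate area. A pixel that is warm in only one frame is already removed by the median.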
Linear interpolation is the main idea of this method [37]. If there is a candidate human area, the leftmost and rightmost positions of the candidate area in every row are determined as the X1 and X2 positions, respectively. Positions smaller than X1 and larger than X2 are determined as the pixel positions (Xa and Xb) of the background region, that is, of nonhuman regions. After extracting the Xa and Xb positions, the candidate human area is erased by linear interpolation between the pixel values at these positions based on Equation (4). This procedure is performed iteratively for the entire image:

$$Y = \frac{Y_b - Y_a}{X_b - X_a}\,(X - X_a) + Y_a \tag{4}$$
where Xa and Xb are the x positions of the background region on either side of the candidate area; Ya and Yb are the pixel values at Xa and Xb, respectively; X is an intermediate x position between Xa and Xb; and Y is the pixel value at X calculated by linear interpolation. After performing this erasing method, the human candidate area and its surrounding area are replaced with nearby pixel values. That is, the human area that causes erroneous background images is removed. Finally, a correct background image is created. Even if motionless humans are located at the same position in all of the frames, all non-background areas (human areas) are erased and a correct background image is generated. In this study, we used fifteen databases. More detailed experimental results with these databases are presented in Section 3.1. One example of the generated background image is provided in Figure 3d, where there are no human areas in the final background image. Two more examples of the generated background images are presented in Figures 4 and 5. As illustrated in Figures 4 and 5, a correct background image, which does not include human areas, is generated. That is, the proposed method can create correct background images for the procedure of human detection.
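The erasing step of Equation (4) can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' code: the function names are hypothetical, Xa and Xb are assumed to be the pixels immediately to the left of X1 and to the right of X2 (the text only says they lie outside the candidate area), and all candidate pixels in a row are treated as one span from the leftmost to the rightmost.

```python
def erase_row_segment(row, x1, x2):
    """Replace row[x1..x2] with values interpolated by Equation (4).

    Assumption: Xa = x1 - 1 and Xb = x2 + 1, the nearest background pixels.
    """
    xa, xb = x1 - 1, x2 + 1
    if xa < 0 or xb >= len(row):
        raise ValueError("candidate area touches the image border")
    ya, yb = row[xa], row[xb]
    out = list(row)
    for x in range(x1, x2 + 1):
        out[x] = (yb - ya) / (xb - xa) * (x - xa) + ya  # Equation (4)
    return out


def erase_candidate_area(img, mask):
    """Erase candidate pixels row by row.

    mask[i][j] == 0 marks a candidate human pixel (convention of Equation (3)).
    Rows without candidate pixels are copied unchanged.
    """
    out = []
    for row, mask_row in zip(img, mask):
        cols = [j for j, m in enumerate(mask_row) if m == 0]
        if cols:
            out.append(erase_row_segment(row, cols[0], cols[-1]))
        else:
            out.append(list(row))
    return out
```

For a row `[10, 200, 210, 205, 50]` with candidate positions 1 to 3, the three warm pixels are replaced by a ramp between the background values 10 and 50. A candidate area that touches the image border has no background pixel on one side; the sketch raises an error there, and how the original method handles that case is not stated in the text.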
Figure 4. The first example for generating a background image from database III: (a) preliminary background image obtained by median value from the sequence of images; (b) extracted candidate human area by binarization; (c) extracted human areas by labeling, size filtering and a morphological operation; and (d) the final background image.
Figure 5. The second example for obtaining a background image from database VIII: (a) preliminary background image obtained by median value from the sequence of images; (b) extracted candidate human area by binarization; (c) extracted human areas by labeling, size filtering and a morphological operation; and (d) the final background image.
2.3. Generating a Difference Image Based on the Fuzzy System Given Background and Input Image
In the proposed method, a background subtraction method is used to extract human candidate areas. First, to reduce the noise usually present in thermal images, 3 × 3 median filtering is applied to the generated background image before subtraction. Further, 3 × 3 median and average filtering is applied to the input image to reduce noise. By using the difference between the background and input images, a binary image which presents candidate human areas is created. To create a difference image, an optimal threshold is required to cover the brightness variation of an image caused by various environmental conditions. In order to determine the threshold adaptively, a fuzzy system is used in the proposed method (see the details in Section 2.3.1).
2.3.1. Definition of the Membership Function
A fuzzy system for the proposed method is illustrated in Figure 6. The characteristics of human areas in thermal images change because of temperature and environmental factors. For example, in general, the intensity of humans in thermal images captured at night or during winter is much higher than that of the background. However, if the image is captured at daytime or under very high temperature conditions, these intensity conditions are reversed. To consider these conditions, we use two kinds of features for the fuzzy system to extract the correct threshold for obtaining a human candidate area by background subtraction. The average value of the generated background image (F1 of Equation (5)) and the sum of the difference values between the background and input images (F2 of Equation (7)) are used as the two input features.
First, the average value of the generated background image is simply obtained using Equation (5): řM řN i “1 j“1 B pi, jq F1 “ (5) MXN where B(i, j) is the gray level value at the position (i, j) of a generated background image; M and N are the width and height of the image, respectively; and F1 is the average value of the generated Sensors 2016, 16,image. 453 9 of 31 background
Figure 6. Fuzzy system for the proposed method to extract adaptive threshold threshold for for ROI ROI extraction. extraction.
Fordetermine determining effective threshold for obtaining regions which represent a To the an condition stating whether intensity of humanofisinterest higher (ROI) than that of background human candidate area, the brightness of a background image is the main concern for the or not, Equations (6) and (7) are employed. Based on Equations (6) and (7), the sum of the difference background subtraction technique. If the brightness difference between a human and the values between the background and input images is obtained: background is too small and the threshold for subtraction is too large, it is hard to extract a human # area. On the other hand, if the brightness large and the threshold for subtraction is It pi, jq difference ´ B pi, jq i fis|Itoo t pi, jq ´ B pi, jq| ą Tq pi, jq “ D t too small, it is also difficult to define a human0 otherwise area because other neighboring areas can (6) be determined as human areas. Therefore, in order to represent membership functions for the M ÿ N brightness of background images, we use three membership functions for low (L), medium (M), ÿ (7) F “ Dt pi, jq 2 and high (H), as shown in Figure 7a. i “1 j “1
Sensors 2016, 16, 453
9 of 31
Sensors 2016, 16, 453
9 of 31
where: B(i, j) and It (i, j) are the gray level value at the position (i, j) of a generated background and input image, respectively; and T is a threshold, which was experimentally determined with the images (which were not used for all the experiments of performance measurements shown in Section 3). With these images, according to various T, the pixels of candidate human areas could be distinguished in the image (Dt (i, j)) based on Equation (6). By human observation on these generated images according to various T, optimal T (80) was determined, with which the candidate human areas could be most distinctive in the image. In addition, the same value of T was used for all the experiments in Section 3. Dt (i, j) is determined by the absolute difference value between the background and input images. If the absolute difference value is larger than T, the position (i, j) is determined as the pixel of candidate area. In the Equation (7), M and N are the width and height of the image, respectively. F2 is the sum of the difference values between the background and input images. If F2 is higher than 0, Figure system for the than proposed extract adaptive thresholdthe for ROI extraction. the intensity of6.aFuzzy human is higher thatmethod of the to background; otherwise, intensity of a human is lower than that of background. The term t indicates the frame number of the input image in the sequence. For determining an effective threshold for obtaining regions of interest (ROI) which represent a For determining an effective threshold for obtaining regions of interest (ROI) which represent human candidate area, the brightness of a background image is the main concern for the a human candidate area, the brightness of a background image is the main concern for the background background subtraction technique. If the brightness difference between a human and the subtraction technique. 
If the brightness difference between a human and the background is too small background is too small and the threshold for subtraction is too large, it is hard to extract a human and the threshold for subtraction is too large, it is hard to extract a human area. On the other hand, if area. On the other hand, if the brightness difference is too large and the threshold for subtraction is thetoo brightness is too large and the for subtraction is too small, it is areas also difficult small, itdifference is also difficult to define a threshold human area because other neighboring can be to define a humanasarea because other neighboring areas to canrepresent be determined as human areas. Therefore, determined human areas. Therefore, in order membership functions for the in order to represent membership functions for the brightness of background images, we use(M), three brightness of background images, we use three membership functions for low (L), medium membership functions forinlow (L), 7a. medium (M), and high (H), as shown in Figure 7a. and high (H), as shown Figure
Figure 7. Membership functions for fuzzy system to extract adaptive threshold for ROI extraction: (a) average value of the background image; (b) sum of difference values between background and input image; and (c) obtaining the output optimal threshold.
Sensors 2016, 16, 453
To describe the intensity conditions of humans compared to the background, the sum of the difference values is used with two membership functions for low (L) and high (H), as shown in Figure 7b. For the output optimal threshold, which is used for ROI extraction, five membership functions are used. There are five linguistic values: very low (VL), low (L), medium (M), high (H), and very high (VH), as illustrated in Figure 7c. An adaptive threshold to be used in the ROI extraction procedure is calculated with the output optimal threshold of the fuzzy system (see the details in Section 2.3.3). That is, the output threshold of the fuzzy system determines the human candidate area (see the details in Section 2.3.4). The linear (triangular) function is used because it has been widely adopted in fuzzy systems considering the problem complexity and its fast processing speed [41–43]. As in conventional research using fuzzy systems [41–46], the gradient and y-intercept of each linear function are manually determined based on the experience of the human developer. Equations (8)–(10) show the mathematical definitions of Figure 7a–c, respectively:

$$
y = \begin{cases}
-2.5x + 1 & (0 \le x \le 0.4): \text{Low} \\
5x - 1.5 & (0.3 \le x \le 0.5): \text{Medium} \\
-5x + 3.5 & (0.5 \le x \le 0.7): \text{Medium} \\
2.5x - 1.5 & (0.6 \le x \le 1): \text{High}
\end{cases}
\tag{8}
$$

$$
y = \begin{cases}
-\left(\dfrac{20}{13}\right)x + 1 & (0 \le x \le 0.65): \text{Low} \\
\left(\dfrac{20}{13}\right)x - \dfrac{7}{13} & (0.35 \le x \le 1): \text{High}
\end{cases}
\tag{9}
$$

$$
y = \begin{cases}
1 & (0 \le x \le 0.1): \text{Very Low} \\
-10x + 2 & (0.1 \le x \le 0.2): \text{Very Low} \\
10x - 1.5 & (0.15 \le x \le 0.25): \text{Low} \\
-10x + 3.5 & (0.25 \le x \le 0.35): \text{Low} \\
10x - 3 & (0.3 \le x \le 0.4): \text{Medium} \\
-10x + 5 & (0.4 \le x \le 0.5): \text{Medium} \\
10x - 4.5 & (0.45 \le x \le 0.55): \text{High} \\
-10x + 6.5 & (0.55 \le x \le 0.65): \text{High} \\
10x - 6 & (0.6 \le x \le 0.7): \text{Very High} \\
1 & (0.7 \le x \le 1): \text{Very High}
\end{cases}
\tag{10}
$$

2.3.2. Fuzzy Rules with Considering the Characteristics of Background and Input Images
As described in Table 1, if the average value of the background image (F1) is low (L) and the sum of the difference values between the background and input images (F2) is high (H), the possibility of difference between the background and input images is very high (VH). Therefore, the output threshold (p) is determined with a large value. For high F1 and high F2, the possibility of difference between the background and input image is very low (VL). That is, in this case, the intensity of a human is very similar to that of the background, and it has a high pixel value. Based on these fuzzy rules, the output threshold (p) is determined.

Table 1. Fuzzy rules based on the characteristics of the background and input images.

| Input 1 (F1) | Input 2 (F2) | Output (p) |
|---|---|---|
| L | L | L |
| L | H | VH |
| M | L | M |
| M | H | M |
| H | L | H |
| H | H | VL |
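The input membership functions of Equations (8) and (9) and the rule table of Table 1 can be sketched in a few lines. This is only an illustrative sketch: the function and variable names (`f1_memberships`, `f2_memberships`, `RULES`) are assumptions of this example and do not come from the paper, and both inputs are assumed to be already normalized to the range 0 to 1.

```python
# Illustrative sketch of Equations (8) and (9) and Table 1.
# All names here are hypothetical; inputs are assumed normalized to [0, 1].

def clamp01(value):
    """Keep a membership degree inside [0, 1]."""
    return max(0.0, min(1.0, value))

def f1_memberships(x):
    """Degrees of Low/Medium/High for the background average F1 (Equation (8))."""
    low = clamp01(-2.5 * x + 1.0) if 0.0 <= x <= 0.4 else 0.0
    if 0.3 <= x <= 0.5:
        medium = clamp01(5.0 * x - 1.5)      # rising edge of Medium
    elif 0.5 < x <= 0.7:
        medium = clamp01(-5.0 * x + 3.5)     # falling edge of Medium
    else:
        medium = 0.0
    high = clamp01(2.5 * x - 1.5) if 0.6 <= x <= 1.0 else 0.0
    return {"L": low, "M": medium, "H": high}

def f2_memberships(x):
    """Degrees of Low/High for the sum of difference values F2 (Equation (9))."""
    low = clamp01(-(20.0 / 13.0) * x + 1.0) if 0.0 <= x <= 0.65 else 0.0
    high = clamp01((20.0 / 13.0) * x - 7.0 / 13.0) if 0.35 <= x <= 1.0 else 0.0
    return {"L": low, "H": high}

# Table 1: (F1 label, F2 label) -> output label.
RULES = {
    ("L", "L"): "L", ("L", "H"): "VH",
    ("M", "L"): "M", ("M", "H"): "M",
    ("H", "L"): "H", ("H", "H"): "VL",
}

if __name__ == "__main__":
    # F1 = 0.32 gives roughly 0.2 (L) and 0.1 (M).
    print(f1_memberships(0.32))
    print(RULES[("L", "H")])
```

Storing the rules as a dictionary keyed by label pairs keeps the lookup explicit and makes it easy to compare against Table 1 row by row.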
2.3.3. Decision of the Optimal Threshold Using Defuzzification
As illustrated in Figure 8, based on F1 and F2, four output values (f1(L) and f1(M) for F1, and f2(L) and f2(H) for F2) are calculated. For example, in the first instance, we assume that f1(L), f1(M), f2(L), and f2(H) obtained by output values for F1 (0.32) and F2 (0.57) are 0.2 (L), 0.1 (M), 0.136 (L), and 0.358 (H), respectively, as presented in Figure 8 and Table 2. With these four values (0.2 (L), 0.1 (M), 0.136 (L), and 0.358 (H)), we can obtain four combinations of ((0.2 (L), 0.136 (L)), (0.2 (L), 0.358 (H)), (0.1 (M), 0.136 (L)), (0.1 (M), 0.358 (H))) as shown in Table 2.

Figure 8. The first example for output of membership functions for fuzzy system: outputs by (a) F1 and (b) F2.

Table 2. The first example for fuzzy rules and min rule based on the characteristics of the background and input images.

| f1(·) | f2(·) | Value |
|---|---|---|
| 0.2 (L) | 0.136 (L) | 0.136 (L) |
| 0.2 (L) | 0.358 (H) | 0.2 (VH) |
| 0.1 (M) | 0.136 (L) | 0.1 (M) |
| 0.1 (M) | 0.358 (H) | 0.1 (M) |
Based on the fuzzy rules of Table 1 and assuming that we use the min rule, we can obtain the four values as shown in Table 2. For example, 0.2 (VH) can be obtained with the second combination of (0.2 (L), 0.358 (H)). 0.1 (M) can be obtained with the fourth combination of (0.1 (M), 0.358 (H)). With these four values of (0.136 (L), 0.2 (VH), 0.1 (M), 0.1 (M)), we can define the region (R depicted by bold black lines of Figure 9) for obtaining the fuzzy output value. As shown in Figure 9, in the proposed method, center of gravity (COG) is used for the defuzzification method [44–46]. From the output membership function, which is illustrated in Figure 7 (as presented in Section 2.3.2), an output value called the output optimal threshold (p of Equation (11)) is calculated as the gravity position of the region (R).

Figure 9. The first example for output optimal threshold based on the COG defuzzification method.

As an example in the second instance, we assume that f1(M), f1(H), f2(L), and f2(H) obtained by output values for F1 (0.65) and F2 (0.4) are 0.25 (M), 0.125 (H), 0.394 (L), and 0.104 (H), respectively, as presented in Figure 10 and Table 3. With these four values (0.25 (M), 0.125 (H), 0.394 (L), and 0.104 (H)), we can obtain four combinations of ((0.25 (M), 0.394 (L)), (0.25 (M), 0.104 (H)), (0.125 (H), 0.394 (L)), (0.125 (H), 0.104 (H))) as shown in Table 3.

Based on the fuzzy rules of Table 1 and assuming that we use the min rule, we can obtain the four values as shown in Table 3. For example, 0.25 (M) can be obtained with the first combination of (0.25 (M), 0.394 (L)). 0.125 (H) can be obtained with the third combination of (0.125 (H), 0.394 (L)).
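The min rule and the COG defuzzification described in this section can be sketched as follows. The sketch takes the membership degrees of the first example as given inputs. How the region R is built is an assumption of this example: each output membership function of Equation (10) is clipped at its firing strength and the clipped shapes are combined by taking the maximum, which is the usual Mamdani construction but is not confirmed by the text. The centroid is approximated by midpoint sampling, and all function names are hypothetical.

```python
# Illustrative sketch: min rule (Table 1) and COG defuzzification (Equation (10)).
# Assumption: region R = max over rules of min(firing strength, output membership).

RULES = {
    ("L", "L"): "L", ("L", "H"): "VH",
    ("M", "L"): "M", ("M", "H"): "M",
    ("H", "L"): "H", ("H", "H"): "VL",
}

def fire_rules(f1_degrees, f2_degrees):
    """Pair every F1 label with every F2 label; strength is the smaller degree."""
    fired = []
    for label1, degree1 in f1_degrees.items():
        for label2, degree2 in f2_degrees.items():
            fired.append((min(degree1, degree2), RULES[(label1, label2)]))
    return fired

def output_membership(label, x):
    """Output membership functions of Equation (10) on [0, 1]."""
    if label == "VL":
        value = 1.0 if x <= 0.1 else -10.0 * x + 2.0
    elif label == "VH":
        value = 1.0 if x >= 0.7 else 10.0 * x - 6.0
    else:
        # Triangles with slope 10 and peaks at 0.25 (L), 0.4 (M), 0.55 (H).
        peak = {"L": 0.25, "M": 0.4, "H": 0.55}[label]
        value = 1.0 - 10.0 * abs(x - peak)
    return max(0.0, min(1.0, value))

def center_of_gravity(fired, samples=1000):
    """Approximate the centroid of the aggregated region by midpoint sampling."""
    if not fired:
        raise ValueError("no rule fired")
    numerator = 0.0
    denominator = 0.0
    for i in range(samples):
        x = (i + 0.5) / samples
        mu = max(min(strength, output_membership(label, x))
                 for strength, label in fired)
        numerator += x * mu
        denominator += mu
    if denominator == 0.0:
        raise ValueError("aggregated region has zero area")
    return numerator / denominator

if __name__ == "__main__":
    # Membership degrees of the first example (Table 2), used as given inputs.
    fired = fire_rules({"L": 0.2, "M": 0.1}, {"L": 0.136, "H": 0.358})
    print(fired)
    print(center_of_gravity(fired))
```

Sampling is used instead of a closed-form centroid because it stays correct when clipped shapes overlap, at the cost of a small discretization error.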
Figure 10. The second example for output of membership functions for fuzzy system: outputs by (a) F1 and (b) F2.
Table 3. The second example for fuzzy rules and min rule based on the characteristics of the background and input images.

| f1(·) | f2(·) | Value |
|---|---|---|
| 0.25 (M) | 0.394 (L) | 0.25 (M) |
| 0.25 (M) | 0.104 (H) | 0.104 (M) |
| 0.125 (H) | 0.394 (L) | 0.125 (H) |
| 0.125 (H) | 0.104 (H) | 0.104 (VL) |
Figure 11. The second example for output optimal threshold based on the COG defuzzification method.
With these four values of (0.25 (M), 0.104 (M), 0.125 (H), 0.104 (VL)), we can define the region (R depicted by bold black lines of Figure 11) for obtaining the fuzzy output value. Based on the COG defuzzification method, the output optimal threshold (p of Equation (11)) is calculated as the gravity position of the region (R), as illustrated in Figure 11.

2.3.4. Generating a Difference Image
After extracting the optimal threshold by defuzzification, the threshold for human detection is calculated based on Equation (11):

$$
\Theta_{th} = p \cdot \alpha + \beta \tag{11}
$$

where: p is the optimal threshold from the fuzzy system and it has the range from 0 to 1; α and β are constants determined experimentally; and Θth is a threshold used to create the difference image presenting candidate human areas. The range of Θth is from β to α + β. The operation for generating a difference image is presented in Equation (12):

$$
D_t(i, j) = \begin{cases}
1 & \text{if } |I_k(i, j) - B(i, j)| > \Theta_{th} \\
0 & \text{otherwise}
\end{cases}
\tag{12}
$$

where: B(i, j) and Ik(i, j) are the gray level values at the position (i, j) of a generated background and input image, respectively; Dt(i, j) is a binarized image called a difference image in the proposed method; and t indicates the frame number of the input image in the sequence.

As illustrated in Figures 12 and 13, the difference images presenting candidate human areas are correctly created by using the adaptive threshold. Even though the intensity of a human is darker than that of the background, the candidate area can be presented as shown in Figure 13.
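Equations (11) and (12) can be illustrated with a small sketch that works on images stored as nested lists of gray levels. The values used for α and β below are purely hypothetical placeholders, since the text only states that these constants were determined experimentally; the function names are likewise assumptions of this example.

```python
# Illustrative sketch of Equations (11) and (12).
# alpha and beta values used in the demo are hypothetical placeholders.

def adaptive_threshold(p, alpha, beta):
    """Equation (11): map the fuzzy output p in [0, 1] to [beta, alpha + beta]."""
    return p * alpha + beta

def difference_image(frame, background, theta):
    """Equation (12): 1 where |input - background| > theta, else 0."""
    return [
        [1 if abs(pixel - reference) > theta else 0
         for pixel, reference in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]

if __name__ == "__main__":
    theta = adaptive_threshold(0.5, alpha=40.0, beta=10.0)
    background = [[60, 60], [60, 60]]
    frame = [[100, 50], [10, 200]]   # brighter and darker pixels than background
    print(theta)
    print(difference_image(frame, background, theta))
```

Because the absolute difference is used, pixels that are darker than the background (such as the value 10 in the demo frame) are marked in the same way as brighter ones, which matches the case shown in Figure 13.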
Figure 12. Example of a difference image: (a) input image; (b) background image; (c) difference image.
Figure 13. Example of a difference image: (a) input image; (b) background image; (c) difference image.
2.4. Confirmation of Human Region
In the process of confirming a human area from a candidate area, several methods are used. First, morphological operations (dilation and erosion) and labeling are applied to the difference image to reduce incorrect human areas, because morphological operations including dilation and erosion on the candidate area can reduce small-sized noise and combine incorrectly separated regions [40]. Through component labeling, the pixel positions of each isolated candidate area can be located. Then, the number of pixels of the candidate area can be counted [40]. Based on the number of pixels, small or large areas (which are difficult to regard as human areas) can be removed.
Then, separated small areas can be connected, and information concerning the candidate area can be more distinctive. However, when two or more people are connected, they are defined as one candidate region.
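The labeling and size filtering steps can be sketched as below. This is a minimal illustration under stated assumptions: 8-connectivity is assumed (the text does not specify the connectivity), the morphological operations are omitted, the image is assumed to be a non-empty nested list of 0/1 values, and the names `label_components` and `filter_by_size` as well as the size limits are hypothetical.

```python
# Illustrative sketch: component labeling and size filtering on a binary image.
# Assumptions: 8-connectivity, non-empty image, hypothetical size limits.
from collections import deque

def label_components(binary):
    """Return a list of components, each a list of (row, col) pixel positions."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Visit the 8 neighbors of the current pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def filter_by_size(components, min_pixels, max_pixels):
    """Keep only components whose pixel count lies inside the given limits."""
    return [comp for comp in components if min_pixels <= len(comp) <= max_pixels]

if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    components = label_components(image)
    kept = filter_by_size(components, min_pixels=2, max_pixels=10)
    print(len(components), len(kept))
```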
Therefore, a histogram is used to separate the regions which include two or more humans (see the details in Section 2.4.1).

2.4.1. Vertical and Horizontal Separation of Candidate Region
If the condition (width, height, size, and ratio of height to width) of the candidate region does not satisfy the thresholds, the separation procedure is performed on the region. The position where the separation should be performed is determined by information in the histogram, as shown in Figures 14b and 15b. If the minimum value of the histogram is lower than a parameter, separation is performed at that position and the candidate region is divided into two regions. Using Equations (13) and (14), the horizontal and vertical histograms are respectively presented [37,47,48]:

$$
H_x[x] = \sum_{y=0}^{D_{ty}-1} F\left(D_t(x, y)\right) \tag{13}
$$

$$
H_y[y] = \sum_{x=0}^{D_{tx}-1} F\left(D_t(x, y)\right) \tag{14}
$$

where: Dt(x, y) is the pixel value at a location (x, y) of the candidate region, such that if Dt(x, y) is true, F(·) is assigned to one and otherwise to zero; Dty and Dtx are respectively the height and width of the candidate region, as in Figures 14b and 15b, where Cx and Cy are the x and y locations of the candidate region in the image, respectively; and t indicates the frame number of the input image in the sequence.
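The histograms of Equations (13) and (14) and the split decision can be sketched as follows. The candidate region is assumed to be a non-empty nested list of 0/1 values indexed as `region[y][x]`. Restricting the search for the minimum to interior bins is an assumption of this example (so that a split never falls on the border of the box), and the function names and the threshold value are hypothetical.

```python
# Illustrative sketch of Equations (13) and (14) and the split at the minimum.
# Assumptions: region[y][x] holds 0/1; only interior bins are considered.

def horizontal_histogram(region):
    """Hx[x]: number of foreground pixels in column x (Equation (13))."""
    height, width = len(region), len(region[0])
    return [sum(1 for y in range(height) if region[y][x]) for x in range(width)]

def vertical_histogram(region):
    """Hy[y]: number of foreground pixels in row y (Equation (14))."""
    return [sum(1 for value in row if value) for row in region]

def split_position(histogram, threshold):
    """Index of the smallest interior bin if it is below threshold, else None."""
    if len(histogram) < 3:
        return None
    interior = range(1, len(histogram) - 1)
    index = min(interior, key=histogram.__getitem__)
    return index if histogram[index] < threshold else None

if __name__ == "__main__":
    region = [
        [1, 1, 0, 1, 1],
        [1, 1, 0, 1, 1],
        [1, 1, 1, 1, 1],
    ]
    hx = horizontal_histogram(region)
    hy = vertical_histogram(region)
    print(hx, split_position(hx, threshold=2))
    print(hy, split_position(hy, threshold=2))
```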
Figure 14. Separation of the candidate region within an input image based on the horizontal histogram: (a) input image and detected candidate region; (b) detected candidate region and its horizontal histogram; and (c) the division result of the candidate region.
If the minimum value of Hx[x] or Hy[y], which is illustrated in Figure 14b or Figure 15b, is lower than the threshold, the candidate region is divided into two regions at the position, as illustrated in Figure 14c or Figure 15c. However, if two people are located closely in the diagonal direction or overlapped up and down, they are detected as one candidate region, which may not be separated by horizontal or vertical histogram information. As shown in Figure 16b, the minimum value of Hy[y] is higher than a threshold, and Cy, which is the position of the minimum value of Hy[y], is not located near the middle position of the candidate region, even though the region includes two people. In this case, if the conditions of Equation (15) are satisfied, the candidate region is separated into two parts horizontally at the middle position of the candidate region as illustrated in Figure 16c; otherwise, the candidate region is not separated:

$$
(D_{ty} > thr_1) \text{ and } (D_{tx} < thr_2) \text{ and } (D_{ty} \times D_{tx} > thr_3) \text{ and } (D_{ty} / D_{tx} > thr_4) \tag{15}
$$

where: Dty and Dtx are respectively the height and width of the candidate region.
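The test of Equation (15) can be written as a single predicate. In the sketch below, the threshold values passed in the demo are hypothetical, because the text does not report thr1 to thr4. The helper `split_box_at_middle` reflects one reading of "separated into two parts horizontally at the middle position", namely a cut at half the box height that yields an upper and a lower part; that reading is an assumption of this example.

```python
# Illustrative sketch of Equation (15); threshold values are hypothetical.

def should_split_in_middle(d_ty, d_tx, thr1, thr2, thr3, thr4):
    """Equation (15): tall, narrow, large box with a high height-to-width ratio."""
    if d_tx <= 0:
        return False
    return (d_ty > thr1 and d_tx < thr2
            and d_ty * d_tx > thr3 and d_ty / d_tx > thr4)

def split_box_at_middle(top, height):
    """Assumed reading: cut at half height, returning (top, height) of each part."""
    middle = top + height // 2
    return (top, middle - top), (middle, top + height - middle)

if __name__ == "__main__":
    if should_split_in_middle(200, 60, thr1=150, thr2=80, thr3=10000, thr4=2.5):
        print(split_box_at_middle(top=10, height=200))
```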
Figure 15. Separation of the candidate region within an input image based on the vertical histogram: (a) input image and detected candidate region; (b) detected candidate region and its vertical histogram; and (c) the division result of the candidate region.
Figure 16. Separation of the candidate region within an input image based on the width, height, size and ratio: (a) input image and detected candidate region; (b) detected candidate region and its vertical histogram; and (c) the division result of the candidate region.

In order to consider the cases of three or more people in a similar place with occlusion, if the horizontal width of the detected box is larger than a threshold, our algorithm checks whether there exist two or more values of Hx[x] of Equation (13) which are lower than the threshold. If so, the candidate region is horizontally divided into three or more regions at the positions of these values of Hx[x]. The same method is applied based on Hy[y] of Equation (14) for the vertical division of the detected box. Figure 17 shows examples of the separation of one detected box into three or four boxes by our method.
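The multi-way division can be sketched by collecting every valley of the histogram instead of only the global minimum. Treating a run of consecutive bins below the threshold as a single valley (cut at the lowest bin of the run) is an assumption of this example, as are the function names and the threshold values used in the demo.

```python
# Illustrative sketch: dividing a wide box at several histogram valleys.
# Assumption: consecutive low bins form one valley, cut at its lowest bin.

def valley_positions(histogram, threshold):
    """Return one cut index per run of consecutive bins below threshold."""
    positions = []
    run = []
    for index, value in enumerate(histogram):
        if value < threshold:
            run.append(index)
        elif run:
            positions.append(min(run, key=histogram.__getitem__))
            run = []
    if run:
        positions.append(min(run, key=histogram.__getitem__))
    return positions

def split_wide_box(histogram, width_threshold, valley_threshold):
    """Return (start, end) bin ranges, end exclusive, for the divided box."""
    width = len(histogram)
    whole = [(0, width)]
    if width <= width_threshold:
        return whole
    positions = valley_positions(histogram, valley_threshold)
    if len(positions) < 2:
        return whole   # fewer than two valleys: not the three-or-more case
    edges = [0] + positions + [width]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]

if __name__ == "__main__":
    hx = [5, 6, 1, 0, 1, 7, 8, 2, 6, 5]
    print(split_wide_box(hx, width_threshold=8, valley_threshold=3))
```

The same functions can be applied to Hy[y] for the vertical division, since they only depend on the histogram values.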
Figure 17. The examples of separation of one detected box into three (a,c) or four (b,d) ones by our method.
Our novel ideas in this Section are to segment two people with occlusion in the diagonal direction (Figure 16) and to handle the cases of three or more people in a similar place with occlusion (Figure 17). All the parameters used in the method of dividing the detected box were experimentally determined in advance with images that were not used in any of the performance measurements shown in Section 3. With these images, the ground-truth areas of humans were manually depicted. In addition, according to various parameters, the human areas could be automatically detected. With the ground-truth and automatically detected areas, we can calculate the PPV, Sensitivity, and F1-score of Equations (17)–(19). Optimal parameters were determined, with which the highest PPV, Sensitivity, and F-score of human detection were obtained. In addition, the same parameters were used for all the experiments in Section 3.

2.4.2. Confirmation of Human Area Based on Camera Viewing Direction
To remove the incorrect human areas, component labeling [40] and size filtering are applied to the binarized image. If the size of the candidate region is too small or too large, the region is determined to be an incorrect human area and removed. Then, candidates for human areas remain, but some parts are separated and located closely as shown in Figure 18a (blue ellipse). To define the regions as one object, a procedure, which connects the regions, is applied to the binarized image based on the size, the horizontal and diagonal distances between the center positions of the two objects, and the camera viewing direction.
Figure 18. Separation of the candidate region within an input image: (a) input image and candidate human regions; (b) result of connecting separated regions.
If the size of a region and the distances satisfy the conditions, these two regions are connected and are defined as one region, as shown in Figure 18b. In general, the size of a human captured in the upper area of the image is smaller than that of a human located in the bottom area, as shown in Figure 19, due to the Z distance between the object and camera. Therefore, if there are small parts located in the upper area of the image, the procedure for connection is not performed. On the other hand, if there are small parts located in the lower area of the image, the procedure is performed. This procedure is implemented iteratively for all detected candidate regions.
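A pairwise version of the connection test can be sketched as below. Boxes are represented as `(x, y, width, height)` tuples with y increasing downward. Every limit in the sketch (`max_area`, `max_dx`, `max_dist`, `min_center_y`) is a hypothetical parameter, and the exact form of each condition is an assumption of this example, since the text names the criteria (size, horizontal and diagonal center distances, position in the image) without giving formulas or values.

```python
# Illustrative sketch: connecting two nearby fragments into one region.
# Boxes are (x, y, width, height); all limits below are hypothetical.
import math

def center(box):
    """Center position (cx, cy) of a box."""
    x, y, width, height = box
    return (x + width / 2.0, y + height / 2.0)

def can_connect(a, b, max_area, max_dx, max_dist, min_center_y):
    """True if a small fragment lies close to the other box in the lower image area."""
    (ax, ay), (bx, by) = center(a), center(b)
    has_small_part = min(a[2] * a[3], b[2] * b[3]) <= max_area
    close = (abs(ax - bx) <= max_dx
             and math.hypot(ax - bx, ay - by) <= max_dist)
    in_lower_area = min(ay, by) >= min_center_y
    return has_small_part and close and in_lower_area

def merge_boxes(a, b):
    """Smallest box that contains both input boxes."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)

if __name__ == "__main__":
    upper_body = (100, 300, 30, 40)
    legs = (102, 345, 28, 50)
    if can_connect(upper_body, legs, max_area=2000, max_dx=10,
                   max_dist=60, min_center_y=240):
        print(merge_boxes(upper_body, legs))
```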
Figure 19. Example of different sizes of human areas because of camera viewing direction.
Figure 20 shows an example of human detection by this procedure. After the blobs are merged, the method is not applied again to the resulting blobs, and the final detected human area is obtained as shown in Figure 20d. Our algorithm can handle cases in which more than two (multiple) blobs should be merged.
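The connection procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: boxes are given as (x, y, width, height), all thresholds (`max_area`, `max_dx`, `max_diag`, `upper_ratio`) are hypothetical values, and a union-find structure is used so that more than two parts can end up in one merged region while merged results are not processed again.

```python
import math


def center(box):
    """Center (cx, cy) of a box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def should_connect(a, b, img_h, max_area, max_dx, max_diag, upper_ratio):
    """Decide whether two small parts belong to the same person."""
    for box in (a, b):
        # Parts in the upper image area are left alone (distant, small humans).
        if center(box)[1] < upper_ratio * img_h:
            return False
        # Only small parts are candidates for connection.
        if box[2] * box[3] > max_area:
            return False
    (ax, ay), (bx, by) = center(a), center(b)
    horizontal = abs(ax - bx)
    diagonal = math.hypot(ax - bx, ay - by)
    return horizontal <= max_dx and diagonal <= max_diag


def merge_boxes(boxes, img_h, max_area=1500, max_dx=15, max_diag=60,
                upper_ratio=0.3):
    """Group connectable boxes in one pass and return their bounding boxes."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if should_connect(boxes[i], boxes[j], img_h, max_area,
                              max_dx, max_diag, upper_ratio):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    merged = []
    for members in groups.values():
        x0 = min(b[0] for b in members)
        y0 = min(b[1] for b in members)
        x1 = max(b[0] + b[2] for b in members)
        y1 = max(b[1] + b[3] for b in members)
        merged.append((x0, y0, x1 - x0, y1 - y0))
    return sorted(merged)


parts = [
    (100, 300, 20, 20),  # head, lower image area
    (98, 325, 24, 40),   # torso
    (100, 370, 20, 30),  # legs
    (300, 50, 10, 10),   # small part in the upper image area
    (300, 65, 10, 10),   # another small part in the upper image area
]
result = merge_boxes(parts, img_h=480)
```

Here the three parts in the lower area are joined into one region, while the two small parts in the upper area are kept separate, which mirrors the rule that connection is skipped in the upper part of the image.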
Figure 20. Example of procedures for detecting human regions: (a) input image; (b) binarized image by background subtraction; (c) result of connecting separated candidate regions; and (d) final result of detected human area.
3. Experimental Results

3.1. Dataset Description

For the experiments, 15 thermal databases collected by us were utilized. The databases were captured by a FLIR Tau 2 thermal camera (wavelength range of 7.5–13.5 µm) [49] equipped with a 19 mm lens. In our research, we assume that the camera position is fixed. Our camera is tilted and set at a height of about 6~8 m from the ground. The distance between the camera and the object is approximately 21~27 m. The fields of view of the camera in the horizontal and vertical directions are 32° and 26°, respectively. These specifications of height and distance have been widely used in conventional surveillance camera systems, and we collected our databases based on these specifications. The size of the images is 640 × 480 pixels of 8 bits. Each database contains between 905 and 5599 frames. The total number of images for all databases is 45,546. The sizes of humans in width and height range from 28 to 117 and from 57 to 265 pixels, respectively. In our research, all the parameters of our system were set with a dataset of 985 images. This dataset is different from the 15 databases of 45,546 images (Table 4), which are used for testing our system. To validate the applicability of the proposed method to various databases, we captured databases at different temperatures and conditions, such as different times, weather conditions, views, and places. The frames in the databases include winter and summer temperatures between −6 °C and 33 °C.

In general, the human region is brighter than the background in frames captured by a thermal camera. However, if the temperature of the ground is too high, the human region is darker than the background in the frames. The reason for this is that the thermal camera automatically creates an 8-bit image, whose pixel values are presented in the range of 0 to 255.

Databases I–VI, VIII–XI, and XIII are obtained by a thermal camera placed 6 m above the ground level, with the Z distance from the object to the camera being about 25 m in most cases. Database VII only includes frames of an indoor environment. This database is captured from a camera placed 2 m above the ground level, with the Z distance from the object to the camera being 12 m. Databases XII, XIV, and XV are obtained by a camera placed 4.5 m above the ground level, with the Z distance from the object to the camera being approximately 18 to 35 m. There are various behaviors of people in the frames, such as walking, running, standing, sitting, waving, kicking, and punching. Motionless people, including people standing or sitting, are presented in databases I–VIII and XIV. A detailed description of each database is presented in Table 4, and examples of the databases are shown in Figure 21.
Figure 21. Examples of databases: (a) database I; (b) database II; (c) database III; (d) database IV; (e) database V; (f) database VI; (g) database VII; (h) database VIII; (i) database IX; (j) database X; (k) database XI; (l) database XII; (m) database XIII; (n) database XIV; and (o) database XV.

Table 4. Descriptions of fifteen databases.

| Database | Condition | Detail Description |
|---|---|---|
| I (see Figure 21a) | 2 °C, morning, average −1 °C during the day, snowy, wind 3.6 mph | The behaviors of the humans include walking, running, standing, and sitting. The sequence was captured during a little snowfall. The intensity of the human is influenced by the material of the clothes. |
| II (see Figure 21b) | −2 °C, night, average −3 °C during the day, wind 2.4 mph | The behaviors of the humans include walking, running, standing, and sitting. Three or four people appear together in several frames. An example of this database is presented in Figure 21b. |
| III (see Figure 21c) | −1 °C, morning, average 3 °C during the day, sunny after rainy at dawn time, wind 4.0 mph | The behaviors of the humans include walking, running, standing, and sitting. The brightness of the human is very different compared to that of the background. The pixel value of the human is much higher than that of the background. |
| IV (see Figure 21d) | −6 °C, night, average −3 °C during the day, sunny after rainy at dawn time, wind 4.0 mph | The behaviors of the humans include walking, running, standing, and sitting. The intensity of the human is variously affected by temperature. If a person has just come out of a building (indoors), the brightness of the person is much greater than that of other objects. The day when the database was captured was too cold. |
| V (see Figure 21e) | −2 °C, night, average −2 °C during the day, sunny, wind 4.9 mph | The behaviors of the humans include walking, running, standing, and sitting. There is a person wearing thick clothes. Therefore, the brightness of the human is similar to that of the background, because the intensity of an image captured by an infrared camera depends on the emission of heat. |
Table 4. Cont.

| Database | Condition | Detail Description |
|---|---|---|
| VI (see Figure 21f) | −1 °C, morning, average 2 °C during the day, sunny, wind 2.5 mph | The behaviors of the humans include walking, running, standing, and sitting. The halo effect is shown below the regions of humans. It appears distinct from the background. The brightness of the humans is much higher than that of the background. |
| VII (see Figure 21g) | 22 °C, indoor, average −12 °C during the day outside, no wind | The behaviors of the humans include walking, running, standing, and sitting. An image captured indoors is brighter than an image captured outside. The reflected region is located under the human region. The size of the region is the same as that of the human. It is influenced by the material of the floor. |
| VIII (see Figure 21h) | 26 °C, afternoon, average 21 °C during the day, sunny, wind 1 mph | The behaviors of the humans include walking, sitting, and waving. The intensity of the humans is much lower than that of the background. The intensity of some background regions is also similar to that of the human. |
| IX (see Figure 21i) | 14 °C, morning, average −18 °C during the day, sunny, wind 2.4 mph | The behavior of the humans is waving. There are two or four people in the sequence. Their sizes are very different. The intensity of the humans is higher than that of the background. There is also a watering ground. |
| X (see Figure 21j) | 28 °C, afternoon, average −23 °C during the day, sunny, wind 5 mph | The behavior of the humans is walking. The sequence is captured during a hot day. The intensity of the image is influenced by the camera module system. Therefore, the brightness of the humans is much darker than that of the background. There are some occluded people, which can make detection by the proposed system difficult. |
| XI (see Figure 21k) | 18 °C, night, average 19 °C during the day, sunny after rainfall during the daytime, wind 2 mph | The behaviors of the humans include walking, waving, and punching. The intensity of the human is similar to that of the background. The detection result of the proposed method is affected by the low contrast between the humans and the background. |
| XII (see Figure 21l) | 27 °C, afternoon, average 23 °C during the day, sunny, wind 4.3 mph | The behavior of the humans is walking. There is a region whose brightness is very similar to that of humans. The intensity of humans is reflected because of the fences. Not only the size but also the intensity of the reflection is very similar to those of humans. The sequence is captured during a hot day. The intensity of the image is influenced by the camera module system. There is a slight brightness change during recording because of a large vehicle. |
| XIII (see Figure 21m) | 27 °C, night, average 29 °C during the day, sunny after rainfall during morning, wind 2.4 mph | The behaviors of the human include kicking and punching. The person who appears in the sequence is wearing short sleeves. The intensity of the human is a little higher than that of the background. |
| XIV (see Figure 21n) | 30 °C, night, average 29 °C during the day, sunny, wind 2.5 mph | The behaviors of the humans include walking, running, standing, punching, and kicking. The sequence is captured during a heat wave. The humans who appear in the sequence are wearing short sleeves. The brightness of the humans is darker than that of the background. There is a region whose brightness is very similar to the background. There are two crosswalks whose intensity is a little darker than the surrounding region. There is a slight brightness change during the recording because of a large vehicle. |
| XV (see Figure 21o) | 33 °C, afternoon, average 29 °C during the day, sunny, wind 3.5 mph | The behaviors of the human include walking, waving, kicking, and punching. The intensity of the human is much darker than the background. A human is shown relevant to the background region. There is a round iron piece in the middle of the images. There is a region whose brightness is very similar to the background. There are two crosswalks whose intensity is a little darker than the surrounding region. |
3.2. Results of Generating Background Model

As the first experiment, a background image from the proposed method is compared to those from other methods, as shown in Figures 22–24. Most previous research created a background image by using a simple averaging operation with multiple frames. However, some ghost shadows can exist, as illustrated in Figure 22. Those ghost shadows come from the high intensity of the humans included in the frames. By using the median pixel values, more correct background images can be created by our method.
Figure 22. Comparisons of preliminary background images with database I. The left figure (a) is by [33–35] and the right figure (b) is by the proposed method.
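The difference between averaging and the median can be seen on a toy sequence. The sketch below is a simplified illustration only, not the complete background generation procedure of the proposed method: frames are plain nested lists of gray levels, and the pixel values (50 for background, 200 for a warm human) are made up for the example.

```python
from statistics import mean, median


def background_by(frames, reducer):
    """Combine frames pixel by pixel with the given reducer (mean or median)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [reducer([frame[y][x] for frame in frames]) for x in range(width)]
        for y in range(height)
    ]


# Five 1 x 3 frames; a bright human (200) covers the middle pixel in two of them.
frames = [
    [[50, 200, 50]],
    [[50, 200, 50]],
    [[50, 50, 50]],
    [[50, 50, 50]],
    [[50, 50, 50]],
]

mean_bg = background_by(frames, mean)      # middle pixel is pulled up to 110
median_bg = background_by(frames, median)  # middle pixel stays at 50
```

The averaged background keeps a brighter trace (a ghost shadow) where the human was present, whereas the median ignores the human as long as the pixel shows background in more than half of the frames.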
As shown in Figure 23, if there are motionless people in all frames, human areas are shown in the background images created by averaging methods [33–35]. To overcome this drawback, previous research [24] utilized the averaging of two different sequences to create the correct background image. However, if there is a tree or vehicle in a sequence, there is a brightness change in the created image compared to the input images. This brightness change can cause erroneous detection results by background subtraction. Therefore, keeping the brightness of a generated background image consistent with that of the input image is important when using the background subtraction technique.
Figure 23. Comparisons of created background images with database I. The left-upper, right-upper, left-lower and right-lower figures are by [24], [27–31], [33–35] and the proposed method, respectively.
Figure 24. Comparisons of created background images with database III. The left-upper, right-upper, left-lower and right-lower figures are by [24], [27–31], [33–35] and the proposed method, respectively.
In other research [27–31], statistical modeling was used by calculating weighted means and variances of the sampled values to create a background image. All of these methods have problems concerned with the inclusion of humans in the generated background image, as shown in Figure 23. On the other hand, humans are completely removed in the background images generated by our method. Additional examples of comparison are presented in Figure 24.

3.3. Detection Results

In Figure 25, the detection results of the proposed method are presented. The square box indicates the detected region of a human. Despite the fact that there are humans located closely (and with a little overlap) (Figure 25a,b,j,k,m), various types of human areas, such as human areas darker than the background (Figure 25h,j,l–o), vehicles (Figure 25l), similar intensities between humans and the background (Figure 25d,k,m), and various types of human behavior, such as walking, running, sitting, standing, waving, punching, and kicking (Figure 25a–o), the human areas are detected correctly. As shown in Figure 25, a complex scene does not affect the human detection, because the image from a thermal camera changes according to the temperature of the scene rather than its complexity.

Next, for quantitative evaluation of the detection accuracies of the proposed method, we manually set square boxes surrounding human areas as ground truth regions. The detection results were evaluated as true or false positives by measuring the overlap of a ground truth box and a detected bounding box based on the PASCAL measure [50–52]. If the overlap area $O_{dg}$ of a detected bounding box $B_{db}$ and a ground truth box $B_{gt}$ exceeded a threshold, we counted the result as a true positive case, which means a correct detection. The overlap is calculated using Equation (16):

$$O_{dg} = \frac{\mathrm{area}(B_{db} \cap B_{gt})}{\mathrm{area}(B_{db} \cup B_{gt})} \tag{16}$$

where $B_{db} \cap B_{gt}$ is the intersection of the detected and the ground truth bounding boxes, and $B_{db} \cup B_{gt}$ is their union [50–52]. Based on Equation (16), the numbers of true positives (TP) and false positives (FP) are counted. The positive and negative samples represent the human and background areas, respectively. Therefore, TPs are the correct detection results and FPs are the incorrect cases. False negatives (FN) are the humans not detected by the proposed method. That is, the sum of the numbers of TP and FN is the total number of human regions in all the images.
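The overlap measure of Equation (16) and the counting of TP, FP, and FN can be sketched as follows. This is an illustrative sketch under assumptions that the text does not specify: boxes are given as (x1, y1, x2, y2) corner coordinates, the threshold value of 0.5 is assumed here because the text only states that a threshold is used, and each ground truth box is matched greedily to at most one detection.

```python
def area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def overlap(b_db, b_gt):
    """Intersection over union of a detected box and a ground truth box."""
    ix1, iy1 = max(b_db[0], b_gt[0]), max(b_db[1], b_gt[1])
    ix2, iy2 = min(b_db[2], b_gt[2]), min(b_db[3], b_gt[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area(b_db) + area(b_gt) - intersection
    return intersection / union if union > 0 else 0.0


def count_detections(detections, ground_truths, threshold=0.5):
    """Return (#TP, #FP, #FN) for one image.

    Assumptions: threshold=0.5 and greedy one-to-one matching.
    """
    unmatched = list(ground_truths)
    n_tp = n_fp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: overlap(det, gt), default=None)
        if best is not None and overlap(det, best) > threshold:
            n_tp += 1
            unmatched.remove(best)  # each human can be matched only once
        else:
            n_fp += 1
    return n_tp, n_fp, len(unmatched)  # unmatched humans are false negatives


detections = [(0, 0, 10, 10), (100, 100, 110, 110)]
ground_truths = [(1, 0, 11, 10), (50, 50, 60, 60)]
counts = count_detections(detections, ground_truths)
```

In this example the first detection overlaps a ground truth box well enough to count as a TP, the second detection matches nothing and counts as an FP, and the remaining ground truth box counts as an FN.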
Figure 25. Detection results with databases (I–XV). Results of images in: (a) Database I; (b) Database II; (c) Database III; (d) Database IV; (e) Database V; (f) Database VI; (g) Database VII; (h) Database VIII; (i) Database IX; (j) Database X; (k) Database XI; (l) Database XII; (m) Database XIII; (n) Database XIV; and (o) Database XV.
Based on this, the positive predictive value (PPV) (precision) and sensitivity (recall) are obtained, as indicated in Equations (17) and (18) [15,53]. In these equations, the numbers of TP, FP, and FN cases are represented as #TP, #FP, and #FN, respectively. To present a single value for accuracy, the F1-score is obtained from PPV and sensitivity [54]. Therefore, a higher value of the F1-score means a higher accuracy of human detection. The operation for obtaining the F1-score is presented in Equation (19):

$$\mathrm{PPV} = \frac{\#TP}{\#TP + \#FP} \tag{17}$$

$$\mathrm{Sensitivity} = \frac{\#TP}{\#TP + \#FN} \tag{18}$$

$$\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Sensitivity} \times \mathrm{PPV}}{\mathrm{Sensitivity} + \mathrm{PPV}} \tag{19}$$
Table 5 presents the detection accuracy of the proposed method with the fifteen databases. The sensitivity, PPV, and F1-score are 95.01%, 96.93%, and 95.96%, respectively. Database III, captured in the early morning at 0 °C, shows the best results. The contrast between humans and backgrounds in its frames is very clear. Therefore, the detection accuracies obtained with database III are higher than the other results. On the other hand, database XII, captured on a hot summer day, shows the worst results. The temperature of the scene rises above 27 °C, and the average temperature of that day was 23 °C. Because of the temperature (around 25 °C), humans appear much darker than the background. In addition, some background areas are much darker than other regions and therefore resemble human regions. This is due to the temperature of buildings and roads that have absorbed heat. Moreover, there are some occluded humans in the frames of the database. Because of these factors, the F1-score of the database is 80.33%, which is lower than the other results but still satisfactory. If the temperature at the time of capture is above 27 °C and the average temperature is above 25 °C, the human area is shown as being much darker than other areas. Therefore, the results from databases XIII–XV are higher than the results from database XII.

Table 5. Results of human detection by the proposed method with our database.

| Database No. | #Frames | #People | #TP | #FP | Sensitivity | PPV | F1-Score |
|---|---|---|---|---|---|---|---|
| I | 2609 | 3928 | 3905 | 48 | 0.9941 | 0.9879 | 0.9910 |
| II | 2747 | 4543 | 4536 | 135 | 0.9985 | 0.9711 | 0.9846 |
| III | 3151 | 5434 | 5433 | 60 | 0.9998 | 0.9891 | 0.9944 |
| IV | 3099 | 4461 | 4368 | 101 | 0.9792 | 0.9774 | 0.9783 |
| V | 4630 | 5891 | 5705 | 113 | 0.9684 | 0.9806 | 0.9745 |
| VI | 3427 | 3820 | 3820 | 70 | 1 | 0.9820 | 0.9909 |
| VII | 3330 | 3098 | 3046 | 14 | 0.9832 | 0.9954 | 0.9893 |
| VIII | 1316 | 1611 | 1505 | 58 | 0.9342 | 0.9629 | 0.9483 |
| IX | 905 | 2230 | 1818 | 0 | 0.8152 | 1 | 0.8982 |
| X | 1846 | 3400 | 3056 | 112 | 0.8988 | 0.9646 | 0.9306 |
| XI | 5599 | 6046 | 5963 | 162 | 0.9863 | 0.9736 | 0.9799 |
| XII | 2913 | 4399 | 3407 | 676 | 0.7745 | 0.8344 | 0.8033 |
| XIII | 3588 | 4666 | 4047 | 33 | 0.8673 | 0.9919 | 0.9255 |
| XIV | 5104 | 7232 | 7036 | 158 | 0.9729 | 0.9780 | 0.9755 |
| XV | 1283 | 1924 | 1913 | 148 | 0.9942 | 0.9282 | 0.9601 |
| Total | 45,546 | 62,683 | 59,558 | 1888 | 0.9501 | 0.9693 | 0.9596 |
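As a worked check of Equations (17)–(19), the short sketch below recomputes the overall scores from the totals in Table 5. The function name is hypothetical, and #FN is derived as #People − #TP, which follows from the statement that the sum of TP and FN equals the total number of human regions.

```python
def detection_scores(n_tp, n_fp, n_fn):
    """Return (PPV, sensitivity, F1-score) following Equations (17)-(19)."""
    ppv = n_tp / (n_tp + n_fp)
    sensitivity = n_tp / (n_tp + n_fn)
    f1 = 2 * sensitivity * ppv / (sensitivity + ppv)
    return ppv, sensitivity, f1


# Totals from the last row of Table 5.
n_people, n_tp, n_fp = 62683, 59558, 1888
n_fn = n_people - n_tp  # humans that were not detected

ppv, sensitivity, f1 = detection_scores(n_tp, n_fp, n_fn)
print(f"Sensitivity={sensitivity:.4f}, PPV={ppv:.4f}, F1={f1:.4f}")
```

The computed values agree with the Total row of Table 5 to four decimal places, with 0.9501 for sensitivity, 0.9693 for PPV, and 0.9596 for the F1-score.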
In Table 6, the detection accuracy categorized by human behaviors is presented. The sitting case shows the best results. This means that the generated background image is created correctly for the background subtraction method. The walking case presents comparatively worse results. This is because there are occlusions in several frames. The sensitivity, PPV, and F1-score of the walking case are 90.46%, 93.78%, and 92.09%, respectively, which are lower than the other results but remain acceptable. Based on Tables 5 and 6, we can conclude that the proposed method can detect humans correctly under various environmental conditions and human behaviors.
Table 6. Results of human detection categorized by human behaviors with our database.

Behavior   #Frames   #People   #TP      #FP    Sensitivity   PPV      F1-Score
Walking    17,380    22,315    20,186   1340   0.9046        0.9378   0.9209
Running    6274      3864      3776     153    0.9772        0.9611   0.9536
Standing   5498      10,430    10,356   67     0.9929        0.9936   0.9932
Sitting    6179      11,417    11,364   3      0.9954        0.9997   0.9975
Waving     1975      3611      3181     0      0.8809        1        0.9367
Punching   3932      5434      5117     96     0.9417        0.9816   0.9612
Kicking    4308      5612      5578     229    0.9939        0.9606   0.9770
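The accuracy columns of Table 6 can be recomputed from its count columns. The sketch below uses the standard definitions of sensitivity, PPV, and F1-score and assumes that #People equals the number of ground-truth humans (TP + FN); the function name `detection_metrics` and the dictionary are illustrative, not part of the proposed method.

```python
def detection_metrics(n_people: int, n_tp: int, n_fp: int):
    """Return (sensitivity, PPV, F1-score) from detection counts.

    Assumes n_people is the number of ground-truth humans, i.e. TP + FN.
    """
    if n_people <= 0 or n_tp + n_fp <= 0:
        raise ValueError("counts must allow both ratios to be defined")
    sensitivity = n_tp / n_people          # TP / (TP + FN)
    ppv = n_tp / (n_tp + n_fp)             # TP / (TP + FP)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    return sensitivity, ppv, f1


# (#People, #TP, #FP) taken from three rows of Table 6.
table6_counts = {
    "Walking": (22315, 20186, 1340),
    "Sitting": (11417, 11364, 3),
    "Kicking": (5612, 5578, 229),
}

for behavior, counts in table6_counts.items():
    sens, ppv, f1 = detection_metrics(*counts)
    print(f"{behavior:8s} sensitivity={sens:.4f} PPV={ppv:.4f} F1={f1:.4f}")
```

For the three rows used here, the recomputed values agree with the tabulated ones to within rounding of the last digit.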
In Figure 26, we show the detection error cases by the proposed method. As shown in Figure 26a, there are two people in the middle-left area of the image. However, because of the occlusion of the two humans, error cases occur, indicated by a drawn yellow square box. Further, as shown in Figure 26b, there are two people in the upper-right area of the image. However, one candidate region is detected, with a green square box.
Figure 26. Detection error cases: (a) result of the proposed method with database X; (b) result of the proposed method with database XIII.

In the next experiments, we compared the proposed method with existing methods [24,32,37]. The same databases used in Tables 5 and 6 were used for the comparisons. The comparative results of human detection are shown in Tables 7 and 8.

Table 7. Comparative results of human detection by the proposed method and previous ones [24,32,37] with our database.
          Sensitivity                       PPV                               F1-Score
DB No.    Ours    [24]    [32]    [37]      Ours    [24]    [32]    [37]      Ours    [24]    [32]    [37]
I         0.9941  0.9514  0.9351  0.9832    0.9879  0.9544  0.8713  0.9621    0.9910  0.9529  0.9021  0.9725
II        0.9985  0.9595  0.9406  0.9885    0.9711  0.9462  0.8623  0.9539    0.9846  0.9528  0.8998  0.9709
III       0.9998  0.9522  0.9366  0.9763    0.9891  0.9515  0.8711  0.9597    0.9944  0.9519  0.9027  0.9679
IV        0.9792  0.9386  0.9219  0.9698    0.9774  0.9497  0.8698  0.9678    0.9783  0.9441  0.8951  0.9688
V         0.9684  0.9257  0.9085  0.9559    0.9806  0.9605  0.8792  0.9681    0.9745  0.9428  0.8936  0.9620
VI        1       0.9601  0.9441  0.9913    0.9820  0.9525  0.8712  0.9696    0.9909  0.9563  0.9062  0.9803
VII       0.9832  0.9432  0.9231  0.9714    0.9954  0.9644  0.8823  0.9713    0.9893  0.9537  0.9022  0.9714
VIII      0.9342  0.9001  0.8792  0.9278    0.9629  0.9399  0.8581  0.9473    0.9483  0.9196  0.8685  0.9374
IX        0.8152  0.7653  0.7554  0.8049    1       0.9731  0.8923  0.9815    0.8982  0.8568  0.8182  0.8845
X         0.8988  0.8509  0.8325  0.8811    0.9646  0.9327  0.8498  0.9409    0.9306  0.8899  0.8411  0.9100
XI        0.9863  0.9414  0.9225  0.9709    0.9736  0.9497  0.8612  0.9573    0.9799  0.9455  0.8908  0.9641
XII       0.7745  0.7278  0.7105  0.7592    0.8344  0.8121  0.7193  0.8199    0.8033  0.7676  0.7149  0.7884
XIII      0.8673  0.8198  0.8019  0.8509    0.9919  0.9623  0.8802  0.9793    0.9255  0.8854  0.8392  0.9106
XIV       0.9729  0.9309  0.9113  0.9599    0.9780  0.9431  0.8621  0.9518    0.9755  0.9370  0.8860  0.9558
XV        0.9942  0.9502  0.9351  0.9825    0.9282  0.8976  0.8064  0.9056    0.9601  0.9232  0.8660  0.9423
Avg       0.9501  0.9064  0.8896  0.9376    0.9693  0.9409  0.8573  0.9505    0.9596  0.9234  0.8731  0.9437
Table 8. Comparative results of human detection categorized by human behaviors by the proposed method and previous ones [24,32,37] with our database. (Behav.: Behavior, W: Walking, R: Running, St: Standing, Si: Sitting, Wav: Waving, P: Punching, K: Kicking).

          Sensitivity                       PPV                               F1-Score
Behav.    Ours    [24]    [32]    [37]      Ours    [24]    [32]    [37]      Ours    [24]    [32]    [37]
W         0.9046  0.8612  0.8434  0.8923    0.9378  0.9084  0.8269  0.9175    0.9209  0.8842  0.8351  0.9047
R         0.9772  0.9331  0.9193  0.9629    0.9611  0.9034  0.8203  0.9103    0.9536  0.9180  0.8670  0.9359
St        0.9929  0.9474  0.9295  0.9735    0.9936  0.9652  0.8812  0.9713    0.9932  0.9562  0.9047  0.9724
Si        0.9954  0.9523  0.9378  0.9821    0.9997  0.9703  0.8903  0.9785    0.9975  0.9612  0.9134  0.9803
Wav       0.8809  0.8371  0.8198  0.8656    1       0.9702  0.8913  0.9798    0.9367  0.8987  0.8541  0.9192
P         0.9417  0.9005  0.8837  0.9334    0.9816  0.9527  0.8702  0.9605    0.9612  0.9259  0.8769  0.9468
K         0.9939  0.9492  0.9302  0.9793    0.9606  0.9311  0.8525  0.9556    0.9770  0.9401  0.8897  0.9673
In addition, we compared the proposed method with the existing methods [24,32,37] on another database, the OSU thermal pedestrian database of the object tracking and classification beyond visible spectrum (OTCBVS) benchmark dataset [32,55]. This database has been widely used as an open database for measuring the performance of object detection in thermal camera images. It includes ten categorized sequences of thermal images that were collected under different weather conditions and at different times. The comparative results of human detection are shown in Table 9. As shown in Tables 7–9, our method outperforms the previous methods [24,32,37] on both our database and the OTCBVS database.

Table 9. Comparative results of human detection by the proposed method and previous ones [24,32,37] with OTCBVS database. (Seq. No.: Sequence Number).
          Sensitivity                       PPV                               F1-Score
Seq. No.  Ours    [24]    [32]    [37]      Ours    [24]    [32]    [37]      Ours    [24]    [32]    [37]
1         1       1       0.97    1         1       1       1       1         1       1       0.9848  1
2         1       0.99    0.94    1         1       1       1       1         1       0.9949  0.9691  1
3         0.99    0.99    1       0.98      0.99    0.98    0.99    0.99      0.99    0.9850  0.9950  0.9850
4         1       1       0.98    1         0.99    1       0.99    0.97      0.9950  1       0.9850  0.9848
5         1       1       0.89    1         1       1       1       1         1       1       0.9418  1
6         0.99    1       0.96    0.98      1       1       1       1         0.9950  1       0.9796  0.9899
7         1       1       0.98    1         1       1       1       1         1       1       0.9899  1
8         1       1       0.76    1         1       0.99    0.99    1         1       0.9950  0.8599  1
9         1       1       1       1         1       1       1       1         1       1       1       1
10        1       0.97    0.98    1         1       0.97    0.97    1         1       0.97    0.9750  1
Avg       0.9980  0.9949  0.9459  0.9959    0.9980  0.9939  0.9936  0.9959    0.9980  0.9945  0.9680  0.9960
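Section 2.4.2 removes incorrect human areas by using the size and the height-to-width ratio of each detected box. A minimal Python sketch of such a filter is given here for illustration. The `Box` type, the function names, and all threshold values are hypothetical assumptions chosen for the example; they are not the values used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned detected box in pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical thresholds, chosen only to make the example concrete.
MIN_AREA = 600      # minimum box area in pixels
MIN_RATIO = 1.0     # minimum height-to-width ratio
MAX_RATIO = 4.0     # maximum height-to-width ratio


def is_human_candidate(box: Box) -> bool:
    """Keep a box only if its size and height-to-width ratio look human-like."""
    if box.width <= 0 or box.height <= 0:
        return False
    area = box.width * box.height
    ratio = box.height / box.width
    return area >= MIN_AREA and MIN_RATIO <= ratio <= MAX_RATIO


def filter_candidates(boxes: List[Box]) -> List[Box]:
    """Remove boxes that are too small or have an implausible ratio."""
    return [b for b in boxes if is_human_candidate(b)]


# A standing person (tall box) and a dog (small, wide box).
person = Box(x=10, y=10, width=20, height=50)
dog = Box(x=80, y=40, width=30, height=15)
kept = filter_candidates([person, dog])
print(kept)
```

With these illustrative thresholds the wide, small box is discarded while the tall box is kept. An object whose size and ratio fall inside the accepted ranges would pass this kind of filter regardless of what it is.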
As explained in Figure 1 and Section 2.4.2, our method removes incorrect human areas based on the size and the ratio (of height to width) of the detected box. Because the size and the height-to-width ratio of the detected dog area are both comparatively smaller than those of a human area, the detected box of the dog can be removed from the candidate human regions by our method. However, other animals whose size, ratio, and temperature are similar to those of humans can be incorrectly detected as human areas.

4. Conclusions

In this research, we presented a new method of detecting humans in thermal images based on the generation of a background image and a fuzzy system under various environmental conditions. A correct background image was generated by using a median image and by erasing the human areas. A difference image was obtained by using a fuzzy system, which determines the thresholds adaptively. Human candidate regions were divided based on histogram information. The regions were redefined based on the size and the ratio of humans, with the camera view taken into consideration. Based on the redefined candidate regions, human areas were detected. Through experiments in various environments, we demonstrated the effectiveness of the proposed system. In future work, we will study solutions to the problems caused by occlusion. In addition, we will extend the research to human behavior classification.

Acknowledgments: This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01056761).

Author Contributions: Eun Som Jeon and Kang Ryoung Park implemented the overall system and wrote this paper. Jong Hyun Kim, Hyung Gil Hong, and Ganbayar Batchuluun helped with the experiments.

Conflicts of Interest: The authors declare no conflict of interest.
References

1. Ge, J.; Luo, Y.; Tei, G. Real-Time Pedestrian Detection and Tracking at Nighttime for Driver-Assistance Systems. IEEE Trans. Intell. Transp. Syst. 2009, 10, 283–298.
2. Prioletti, A.; Mogelmose, A.; Grisleri, P.; Trivedi, M.M.; Broggi, A.; Moeslund, T.B. Part-Based Pedestrian Detection and Feature-Based Tracking for Driver Assistance: Real-Time, Robust Algorithms, and Evaluation. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1346–1359. [CrossRef]
3. Källhammer, J.-E. Night Vision: Requirements and Possible Roadmap for FIR and NIR Systems. In Proceedings of the SPIE—The International Society for Optical Engineering, Strasbourg, France, 6 April 2006; p. 61980F.
4. Mehralian, S.; Palhang, M. Pedestrian Detection Using Principal Components Analysis of Gradient Distribution. In Proceedings of the Iranian Conference on Machine Vision and Image Processing, Zanjan, Iran, 10–12 September 2013; pp. 58–63.
5. Jiang, Y.; Ma, J. Combination Features and Models for Human Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 240–248.
6. Martin, R.; Arandjelović, O. Multiple-Object Tracking in Cluttered and Crowded Public Spaces. Lect. Notes Comput. Sci. 2010, 6455, 89–98.
7. Khatoon, R.; Saqlain, S.M.; Bibi, S. A Robust and Enhanced Approach for Human Detection in Crowd. In Proceedings of the International Multitopic Conference, Islamabad, Pakistan, 13–15 December 2012; pp. 215–221.
8. Fotiadis, E.P.; Garzón, M.; Barrientos, A. Human Detection from a Mobile Robot Using Fusion of Laser and Vision Information. Sensors 2013, 13, 11603–11635. [CrossRef] [PubMed]
9. Besbes, B.; Rogozan, A.; Rus, A.-M.; Bensrhair, A.; Broggi, A. Pedestrian Detection in Far-Infrared Daytime Images Using a Hierarchical Codebook of SURF. Sensors 2015, 15, 8570–8594. [CrossRef] [PubMed]
10. Zhu, Q.; Yeh, M.-C.; Cheng, K.-T.; Avidan, S. Fast Human Detection Using a Cascade of Histograms of Oriented Gradients. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006; pp. 1491–1498.
11. Li, Z.; Zhang, J.; Wu, Q.; Geers, G. Feature Enhancement Using Gradient Salience on Thermal Image. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, Sydney, Australia, 1–3 December 2010; pp. 556–562.
12. Chang, S.L.; Yang, F.T.; Wu, W.P.; Cho, Y.A.; Chen, S.W. Nighttime Pedestrian Detection Using Thermal Imaging Based on HOG Feature. In Proceedings of the International Conference on System Science and Engineering, Macao, China, 8–10 June 2011; pp. 694–698.
13. Rajaei, A.; Shayegh, H.; Charkari, N.M. Human Detection in Semi-Dense Scenes Using HOG descriptor and Mixture of SVMs. In Proceedings of the International Conference on Computer and Knowledge Engineering, Mashhad, Iran, 31 October–1 November 2013; pp. 229–234.
14. Bertozzi, M.; Broggi, A.; Rose, M.D.; Felisa, M.; Rakotomamonjy, A.; Suard, F. A Pedestrian Detector Using Histograms of Oriented Gradients and a Support Vector Machine Classifier. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Seattle, WA, USA, 30 September–3 October 2007; pp. 143–148.
15. Li, W.; Zheng, D.; Zhao, T.; Yang, M. An Effective Approach to Pedestrian Detection in Thermal Imagery. In Proceedings of the International Conference on Natural Computation, Chongqing, China, 29–31 May 2012; pp. 325–329.
16. Wang, W.; Wang, Y.; Chen, F.; Sowmya, A. A Weakly Supervised Approach for Object Detection Based on Soft-Label Boosting. In Proceedings of the IEEE Workshop on Applications of Computer Vision, Tempa, FL, USA, 15–17 January 2013; pp. 331–338.
17. Wang, W.; Zhang, J.; Shen, C. Improved Human Detection and Classification in Thermal Images. In Proceedings of the IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 2313–2316.
18. Takeda, T.; Kuramoto, K.; Kobashi, S.; Haya, Y. A Fuzzy Human Detection for Security System Using Infrared Laser Camera. In Proceedings of the IEEE International Symposium on Multiple-Valued Logic, Toyama, Japan, 22–24 May 2013; pp. 53–58.
19. Sokolova, M.V.; Serrano-Cuerda, J.; Castillo, J.C.; Fernández-Caballero, A. A Fuzzy Model for Human Fall Detection in Infrared Video. J. Intell. Fuzzy Syst. 2013, 24, 215–228.
20. Nie, F.; Li, J.; Rong, Q.; Pan, M.; Zhang, F. Human Object Extraction Using Nonextensive Fuzzy Entropy and Chaos Differential Evolution. Int. J. Signal Process. Image Process. Pattern Recognit. 2013, 6, 43–54.
21. Davis, J.W.; Sharma, V. Fusion-Based Background-Subtraction Using Contour Saliency. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Workshops, San Diego, CA, USA, 25 June 2005; pp. 1–9.
22. Davis, J.W.; Sharma, V. Background-Subtraction Using Contour-Based Fusion of Thermal and Visible Imagery. Comput. Vis. Image Underst. 2007, 106, 162–182. [CrossRef]
23. Davis, J.W.; Sharma, V. Robust Detection of People in Thermal Imagery. In Proceedings of the International Conference on Pattern Recognition, Cambridge, UK, 23–26 August 2004; pp. 713–716.
24. Dai, C.; Zheng, Y.; Li, X. Layered Representation for Pedestrian Detection and Tracking in Infrared Imagery. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Workshops, San Diego, CA, USA, 25 June 2005; pp. 1–8.
25. Dai, C.; Zheng, Y.; Li, X. Pedestrian Detection and Tracking in Infrared Imagery Using Shape and Appearance. Comput. Vis. Image Underst. 2007, 106, 288–299. [CrossRef]
26. Latecki, L.J.; Miezianko, R.; Pokrajac, D. Tracking Motion Objects in Infrared Videos. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, Como, Italy, 15–16 September 2005; pp. 99–104.
27. Mahapatra, A.; Mishra, T.K.; Sa, P.K.; Majhi, B. Background Subtraction and Human Detection in Outdoor Videos Using Fuzzy Logic. In Proceedings of the IEEE International Conference on Fuzzy Systems, Hyderabad, India, 7–10 July 2013; pp. 1–7.
28. Cucchiara, R.; Grana, C.; Piccardi, M.; Prati, A. Statistic and Knowledge-Based Moving Object Detection in Traffic Scenes. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Dearborn, MI, USA, 1–3 October 2000; pp. 27–32.
29. Tan, Y.; Guo, Y.; Gao, C. Background Subtraction Based Level Sets for Human Segmentation in Thermal Infrared Surveillance Systems. Infrared Phys. Technol. 2013, 61, 230–240. [CrossRef]
30. Cucchiara, R.; Grana, X.; Piccardi, M.; Prati, A. Detecting Moving Objects, Ghosts, and Shadows in Video Streams. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 25, 1337–1342. [CrossRef]
31. Zheng, J.; Wang, Y.; Nihan, N.L.; Hallenbeck, M.E. Extracting Roadway Background Image: A Mode-Based Approach. J. Transp. Res. Rep. 2006, 1944, 82–88. [CrossRef]
32. Davis, J.W.; Keck, M.A. A Two-Stage Template Approach to Person Detection in Thermal Imagery. In Proceedings of the IEEE Workshop on Applications of Computer Vision, Breckenridge, CO, USA, 5–7 January 2005; pp. 364–369.
33. Dagless, E.L.; Ali, A.T.; Cruz, J.B. Visual Road Traffic Monitoring and Data Collection. In Proceedings of the IEEE Vehicle Navigation and Information Systems Conference, Ottawa, ON, Canada, 12–15 October 1993; pp. 146–149.
34. Baf, F.E.; Bouwmans, T.; Vachon, B. Fuzzy Foreground Detection for Infrared Videos. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Anchorage, AK, USA, 23–28 June 2008; pp. 1–6.
35. Shakeri, M.; Deldari, H.; Foroughi, H.; Saberi, A.; Naseri, A. A Novel Fuzzy Background Subtraction Method Based on Cellular Automata for Urban Traffic Applications. In Proceedings of the International Conference on Signal Processing, Beijing, China, 26–29 October 2008; pp. 899–902.
36. Zheng, Y.; Fan, L. Moving Object Detection Based on Running Average Background and Temporal Difference. In Proceedings of the International Conference on Intelligent Systems and Knowledge Engineering, Hangzhou, China, 15–16 November 2010; pp. 270–272.
37. Jeon, E.S.; Choi, J.-S.; Lee, J.H.; Shin, K.Y.; Kim, Y.G.; Le, T.T.; Park, K.R. Human Detection Based on the Generation of a Background Image by Using a Far-Infrared Light Camera. Sensors 2015, 15, 6763–6788. [CrossRef] [PubMed]
38. Infrared. Available online: https://en.wikipedia.org/wiki/Infrared (accessed on 23 December 2015).
39. Niblack, W. An Introduction to Digital Image Processing, 1st ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 1986.
40. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 1st ed.; Addison-Wesley: Boston, MA, USA, 1992.
41. Bayu, B.S.; Miura, J. Fuzzy-based Illumination Normalization for Face Recognition. In Proceedings of the IEEE Workshop on Advanced Robotics and Its Social Impacts, Tokyo, Japan, 7–9 November 2013; pp. 131–136.
42. Barua, A.; Mudunuri, L.S.; Kosheleva, O. Why trapezoidal and triangular membership functions work so well: Towards a theoretical explanation. J. Uncertain Syst. 2014, 8, 164–168.
43. Zhao, J.; Bose, B.K. Evaluation of Membership Functions for Fuzzy Logic Controlled Induction Motor Drive. In Proceedings of the IEEE Annual Conference of the Industrial Electronics Society, Sevilla, Spain, 5–8 November 2002; pp. 229–234.
44. Naaz, S.; Alam, A.; Biswas, R. Effect of Different Defuzzification Methods in a Fuzzy Based Load Balancing Application. Int. J. Comput. Sci. Issues 2011, 8, 261–267.
45. Nam, G.P.; Park, K.R. New Fuzzy-Based Retinex Method for the Illumination Normalization of Face Recognition. Int. J. Adv. Rob. Syst. 2012, 9, 1–9.
46. Leekwijck, W.V.; Kerre, E.E. Defuzzification: Criteria and Classification. Fuzzy Sets Syst. 1999, 108, 159–178. [CrossRef]
47. Lee, J.H.; Choi, J.-S.; Jeon, E.S.; Kim, Y.G.; Le, T.T.; Shin, K.Y.; Lee, H.C.; Park, K.R. Robust Pedestrian Detection by Combining Visible and Thermal Infrared Cameras. Sensors 2015, 15, 10580–10615. [CrossRef] [PubMed]
48. Vezzani, R.; Baltieri, D.; Cucchiara, R. HMM Based Action Recognition with Projection Histogram Features. Lect. Notes Comput. Sci. 2010, 6388, 286–293.
49. Tau 2 Uncooled Cores. Available online: http://www.flir.com/cores/display/?id=54717 (accessed on 23 December 2015).
50. Olmeda, D.; Premebida, C.; Nunes, U.; Armingol, J.M.; Escalera, A.D.L. Pedestrian Detection in Far Infrared Images. Integr. Comput. Aided Eng. 2013, 20, 347–360.
51. Dollar, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian Detection: An Evaluation of the State of the Art. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 743–761. [CrossRef] [PubMed]
52. Everingham, M.; Gool, L.V.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [CrossRef]
53. Sensitivity and Specificity. Available online: http://en.wikipedia.org/wiki/Sensitivity_and_specificity (accessed on 23 December 2015).
54. F1-Score. Available online: https://en.wikipedia.org/wiki/Precision_and_recall (accessed on 23 December 2015).
55. OTCBVS Benchmark Dataset Collection. Available online: http://vcipl-okstate.org/pbvs/bench/ (accessed on 28 January 2016).

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).