JOURNAL OF SOUTHWEST JIAOTONG UNIVERSITY
Vol. 55 No. 1
Feb. 2020
ISSN: 0258-2724 DOI:10.35741/issn.0258-2724.55.1.32
Research article
Computer and Information Sciences
AUTOMATIC LEFTOVER WEIGHT PREDICTION IN TRAY BOX USING IMPROVED IMAGE SEGMENTATION COLOR LIGHTING COMPONENT
Yuita Arum Sari a, Ratih Kartika Dewi b, Jaya Mahar Maligan c, Luthfi Maulana d, Sigit Adinugroho a
a Computer Vision Research Group, Faculty of Computer Sciences, University of Brawijaya, Veteran St., 65145, Malang, Indonesia, yuita@ub.ac.id, sigit.adinu@ub.ac.id
b Mobile, Game and Multimedia Research Group, Faculty of Computer Sciences, University of Brawijaya, Veteran St., 65145, Malang, Indonesia, ratihkartikad@ub.ac.id
c Food Nutrition Program, Faculty of Agricultural Technology, University of Brawijaya, Veteran St., 65145, Malang, Indonesia, maharjay@gmail.com
d Faculty of Computer Sciences, University of Brawijaya, Veteran St., 65145, Malang, Indonesia, luthfi_m@student.ub.ac.id
Abstract
The problem of food waste is experienced by many countries, including Indonesia. In the conventional Comstock model, estimating food scraps requires the expertise of the estimator, but this method has drawbacks because of the subjective perspective of even skilled observers. Another weakness occurs when observers are exhausted, which in turn negatively affects the leftover estimation. Therefore, in this paper, we propose an approach for automatic weight prediction using image processing in order to minimize forecasting errors caused by human observers. An improved lighting component in image segmentation is also utilized. We apply this framework to tray box images and estimate each compartment. Two types of tray box backgrounds are tested: gray and black. The first part of the proposed method takes a lighting component from each of the color spaces LAB, HSV, YCbCr, YUV, and LUV. Contrast limited adaptive histogram equalization is then applied to each of those color channels to adjust the contrast of each image. After that, Otsu segmentation is applied, and formulas to calculate the leftover automatically are also presented. The method shows remarkable results when applied to the black background of the tray box, with a root mean square error of around 6.67 using the L lighting component of LAB as well as the Y lighting components of YCbCr and YUV. The proposed method is good for leftover forecasting since its estimation is not significantly different from one done by human observers.
Keywords: Leftover Estimation, Image Segmentation, Food Image Segmentation, Color Channel, Histogram Equalization
I. INTRODUCTION
Food is an important aspect of human health because it contains nutrients and serves as a source of energy. It is needed by humans to survive and promotes the development and growth of the body. Nutritional intake is influenced by lifestyle changes in the modern era. Many societies require long working hours, which indirectly changes eating patterns and nutritional intake, because consumption activities are not only a biological need but have become a lifestyle, which can be related to identity, class, or social group. This can lead to eating behavior that is more likely to leave food waste, which translates to a loss of nutritional intake.
Food waste is a big problem worldwide, including in Indonesia. According to the Food Sustainability Index from The Economist Intelligence Unit [31], Indonesia produces 300 kg of food waste per person per year. In a report from the media outlet Liputan 6 dated February 19, 2019, the Minister of National Planning and Development, Bambang Brodjonegoro, stated that Indonesia is the second-largest producer of food waste in the world, after Saudi Arabia, even though Indonesia also has many people who are malnourished.
Regarding food nutrition, if many nutrients are lost, the quality of human resources is also reduced. According to the Food and Agriculture Organization of the United Nations [1], 30% of food is wasted, amounting to around 1.3 billion tons per year. Global food loss per year is about 30% for cereals, 40-50% for plants, fruits, and vegetables, 20% for oil, meat, and milk, and 35% for fish. The percentage of food that is not consumed and is disposed of as waste is referred to as food waste [2]. In general, food waste data are used as an evaluation tool in nutrition counseling, in the administration and service of food, and in assessing the adequacy of food consumption in groups or individuals [3]. The amount of leftovers can be considered a loss of nutrients that should have been consumed.
In general, a common method of analyzing food waste is the Comstock method, which visually estimates the amount of leftovers of each type of food. The advantage of this method is that it can be easily employed by involving an expert observer and does not require much time. However, a trained, skilled, and conscientious observer is needed, because both over- and underestimation often occur, especially when observers experience fatigue [4]. For this reason, this study modifies the Comstock method with digital image processing. A computational approach can be used to develop the concept of visualization by using camera sensors.
In digital image processing, preprocessing is required to extract information from images captured by particular devices. In general, preprocessing greatly improves the result [5]. For primary data, where images are captured under uncontrolled lighting conditions, the preprocessing phase is an important prerequisite for the segmentation process to work properly, which in turn has an impact on further processing.
The image segmentation process involves a variety of techniques to separate the object being observed from its background, so that only the core part of the image is processed. In previous studies, color channel-based segmentation with V-Otsu produced image clustering results with an error of less than 1% [6]. Color channel-based segmentation has also been applied with a combination of two color spaces, YUV and YCbCr, producing approximately 3% error for all images taken from three different types of smartphone cameras [7]. Other research processed food images taken with a smartphone camera using color segmentation based on the LAB color space; the color channel was segmented using Otsu thresholding and produced an accuracy of 95% for the classification of food images using feature selection [8]. In this study, since the lighting conditions were undefined and uncontrolled, the color channels that represent lighting were observed.
Moreover, the state of the color channel is affected by contrast and can reveal noise, which can affect the results of segmentation. One method of dealing with this condition is histogram equalization, which can enhance the quality of the image [9]. However, plain histogram equalization cannot handle the intensity of an image adaptively. To avoid excessive amplification of brightness, the contrast should be limited. Contrast Limited Adaptive Histogram Equalization (CLAHE) is an improved histogram equalization method that provides good recognition results [10].
Based on the background discussed above, a color channel based on lighting conditions for the segmentation method is the focus of this research, which was designed to make predictions regarding the nutritional value of food in images before and after consumption. CLAHE was applied to enhance the image before the segmentation process. The result of the nutritional loss calculation is also influenced by the result of image segmentation.
II. RELATED WORKS
A. Leftover Food and Conventional Method
Leftover food is food that is not consumed. The food is usually disposed of as garbage and is usually referred to as food waste. According to Zhao, Georganas, and Petriu [10], food waste is the amount of food that is not eaten, and is divided into two types, namely waste and plate waste. Waste is the loss or damage of food ingredients during the preparation process or food processing. Plate waste is food that is wasted because consumers do not consume it after it is served.
The visual or observational method measures leftovers based on human visual judgment. The estimated food scraps can be expressed as food weight in grams or as scores if a measurement scale is used [11]. The method of visual estimation uses a measurement scale developed by Comstock with the following criteria [12]: Scale 0: consumed entirely; Scale 1: ¼ portion remaining; Scale 2: ½ portion remaining; Scale 3: ¾ portion remaining; Scale 4: only a little consumed (1/9 portion); Scale 5: whole portion not consumed.
The advantages of this method are that it is easy to perform, does not require much time, and is inexpensive [13]. However, a trained, skilled, and meticulous estimator is needed, and over- or underestimation is often involved [3].
B. Image Segmentation Approach for Leftover Estimation
The image processing method that will be carried out is food image segmentation. This segmentation process involves the Otsu thresholding method which is implemented on the color channel in the LAB, HSV, YCbCr, YUV, and LUV color spaces for subsequent analysis.
In image segmentation, a predetermined range of parameters is normally used, producing results ranging from under-segmented to over-segmented [14]. Otsu is a segmentation algorithm that automatically separates object and background without requiring any input parameters. Otsu analyzes the image histogram adaptively, separating the variance of object pixels from the variance of background pixels. The combination of color channels and Otsu improves image segmentation, making it better suited for analyzing the extent of food in images.
After the food is segmented and identified, the weight of the food is estimated to determine the nutritional content. In general, the amount of food can be determined by knowing the solids in the food [15]. One method that can be used to calculate the extent of an image is to count every pixel detected as a food object. Assuming that the white area is the food area, this area is referred to as the nutrition area of interest. Equations for estimating leftover food nutrition are analyzed later through numerical formulas, which are tested first.
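The pixel-counting idea described above can be sketched as follows; the function name and the toy mask are illustrative, not from the paper.

```python
import numpy as np

def food_area_pixels(mask):
    """Count pixels detected as food in a binary segmentation mask.

    The mask is assumed to be 0 for background and 255 (white) for
    food, as produced by Otsu thresholding.
    """
    return int(np.count_nonzero(mask))

# Toy 4x4 mask with 5 white (food) pixels
mask = np.array([[0, 255, 255, 0],
                 [0, 255,   0, 0],
                 [0, 255, 255, 0],
                 [0,   0,   0, 0]], dtype=np.uint8)
print(food_area_pixels(mask))  # 5
```

The count is later converted to grams by relating it to the known compartment area and the manually weighed reference portion.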
III. RESEARCH METHODOLOGY
We built a prototype called a Smart Nutrition Box to apply this framework and algorithm. The prototype is equipped with a camera and other components, such as a Raspberry Pi and an LCD screen, to present the results comparing the original measurement and the estimation of food in a tray box before and after being eaten.
Figure 1. General phase of proposed method
The general phases of the methodology are presented in Figure 1. It starts by capturing an image inside a Smart Nutrition Box using a camera. The captured images of food before and after being eaten are then saved to storage. Manual cropping is applied to clean the dataset before it is processed at a later stage; a detailed explanation of the clean dataset appears in the following subsection. Each tray box image is also cropped and separated into four compartments. Automatic estimation is calculated after the segmentation process of each compartment is completed. The last step is evaluation, comparing the estimate delivered by an observer with the one produced by the system based on the weight of the leftovers. In this paper, the evaluation is measured by computing the Root Mean Square Error (RMSE) between the original value and the estimate produced by the proposed algorithm.
A. Dataset
Primary data are taken after the prototype, or at least part of it, has been built, particularly the Raspberry Pi and the camera used to photograph the food later placed in the tray box. Web cameras are installed perpendicular to the food tray so that capturing the food is easier and covers the whole area of the tray box. All the compartments of the tray box are then measured, and these measurements are used to analyze the remaining nutritional food before and after consumption. Manual weighing is also carried out to ensure that the predicted results of the system output are close to the predicted results of the estimator.
There are 24 images of tray boxes, containing 96 images of compartments in total. Two background colors are used for the tray box: black and gray. Figure 2 depicts two types of menus for the different background colors of the tray box. The first row is titled Menu 1 and the following group is Menu 2.
Figure 2. Dataset. Menu 1 (first row), Menu 2 (last row)
B. Pre-Processing
Pre-processing is a necessary part of image processing which determines how the captured image appears under different lighting conditions. Figure 3 depicts this general step of image pre-processing. The original image is converted to other color spaces. In this paper, we use color spaces that have a lighting component: LAB, HSV, YCbCr, LUV, and YUV. In each of these color spaces, the lighting component is the L of LAB, the V of HSV, the Y of YCbCr, the L of LUV, and the Y of YUV.
Figure 3. Image pre-processing
Histogram equalization is applied to enhance the quality of the primary dataset images. Noise reduction methods are also utilized, since the raw dataset contains unpredictable noise; we therefore use double filtering, applying Median and Gaussian filtering consecutively.
Image segmentation is applied to distinguish the main food compartment from the background. Otsu thresholding is used, since it is relatively effective at separating object and background. The result of segmentation may still contain some noise; therefore, erosion and dilation are applied to strengthen the area of the detected food in each compartment.
C. Color Spaces
A color model, also known as a color space or color system, is a specification of colors according to some standard. The visual form of a color model is a coordinate system in which a color is represented as a point [16]. Color models can be divided into two general categories: psychological color models and mathematical/geometrical models. The former represents the color perception of an object to a normal observer under certain parameters, while the latter uses numerical measurement data to define the psychological space [17]. Nowadays, there are plenty of color models available, each with its own informative properties.
1) RGB Color Model
In the RGB color model, each color consists of a proportion of the three primary color components red, green, and blue. This model is depicted in a three-dimensional Cartesian coordinate system, where each primary color acts as an axis, and each color is portrayed as a vector extending from the origin [16]. RGB is the default color model for many computer applications, especially for visualization, since it does not require any transformation to be displayed on a screen [18]. However, RGB is not suitable for segmentation because of the high correlation between its color channels [19].
2) LAB Color Model
The LAB color model is also called the CIELAB color model because it was originally developed by the International Commission on Illumination (CIE). The color model is composed of three main components: L*, a*, and b*. The first component, L*, expresses the lightness of the color: L* = 0 means total darkness, while L* = 100 represents the brightest white. a* is the green-red component, where negative a* indicates green and positive a* indicates red. The last component, b*, is the blue-yellow component: negative b* means blue, while positive b* means yellow [20].
The LAB color space is popular due to its ability to represent all colors visible to the human eye; by contrast, the RGB color space covers only 90% of all visible colors [21]. The L* component also mimics human vision behavior. In addition, color balance correction can be performed precisely by modifying the a* and b* components, and brightness can be altered by changing the L* component alone [22]. In this paper, L, the lighting component of LAB, is used.
3) YCbCr Color Model
The YCbCr color space is often used in digital photos and videos. The Y component encodes the luminance of an object, while the other two, Cb and Cr, encode the blue difference and red difference [14]. Because this color model separates luminance from chrominance, it is suitable for color processing, including segmentation [22]. Since the Y component is the luminance of an image, in this paper we take the Y of YCbCr for segmentation.
4) LUV Color Model
LUV is an independent color space whose L component corresponds well to perceived lightness [23]. The CIE 1976 color space CIELUV is a color space with a uniform scale, defined by three geometric coordinates L, U, and V obtained from CIEXYZ via a conversion formula. LUV is given meaning according to human perception: L represents lightness or color brightness, U is the color strength on the red-green axis, and V is the color strength on the yellow-blue axis [19, 24]. In this paper, we observe the L of LUV as the lighting component.
5) YUV Color Model
YUV is derived from the RGB color space and contains three channels: luminance (Y) and the two chrominance channels U and V [24]. As stated before, the Y of YUV combined with histogram equalization can enhance detection performance [6]. Therefore, in this paper, we use Y as the lighting component of YUV.
Figure 4. Grayscale image (a) before and (b) after adding CLAHE equalization
Figure 5. Image segmentation
D. CLAHE Equalization
Adaptive Histogram Equalization (AHE) is a method for improving the contrast of an image. It adjusts the gray levels of an image using the probability distribution. However, AHE can still reveal noise in particular regions of an image. Therefore, CLAHE, a variant of histogram equalization, is adopted to avoid increasing the contrast excessively over the image. If an image has excessive brightness or contrast, the information of the original image is affected; by redistributing the histogram toward a uniform probability before equalization is applied, the histogram respects a contrast limit [25]. Figure 4 shows the original gray-level image before and after CLAHE equalization, while Figure 5 depicts the differences in the cumulative distribution function (cdf) before and after adding CLAHE equalization. In Figure 5, the image before CLAHE equalization presents excessive contrast in the middle values of the histogram, while after CLAHE equalization the histogram tends to be evenly distributed.
Algorithm 1. CLAHE equalization in the lighting component of color spaces
Input: Original image
Output: Image after CLAHE equalization
Steps:
1) Transform the RGB image into a given color space
2) Take the color channel that represents the lighting component of that color space
3) Call the CLAHE function of OpenCV with clipLimit = 0.5 and a tile size of 2 x 2
4) Apply the result of CLAHE to a variable
5) Present the resulting CLAHE image
Algorithm 1 states the algorithm of CLAHE. In this paper, we use a contrast limit (clipLimit) of 0.5, since tracing the food item in each compartment requires less image contrast. For the tile size we use the smallest value, 2 x 2 blocks, each of which is equalized independently. If noise appears within a small block, the local, limited equalization prevents it from being over-amplified, which removes some of the noise in that part.
E. Filtering
Filtering smooths an image and removes noise so that the main part of an object can be extracted. Median filtering reduces noise by scanning the entire image with a particular kernel and replacing each pixel with the median value within the kernel. The median filter has also been proposed to enhance the quality of image segmentation [26]. In this paper, we experiment with a 25 x 25 kernel.
In addition, applying a second filtering method reduces noise further and improves results, since the second pass smooths the noise remaining after the first [27]. To smooth homogeneous regions containing noise, a Gaussian filter is also applied to strengthen the main object of an image [28]. The kernel size applied in this paper is likewise 25 x 25.
F. Image Segmentation
Otsu is considered an adaptive thresholding method, requiring no input parameter, and it performs global thresholding. However, Otsu does not work well on food images with multiple food items in a single frame [29]. Thus, in this paper, we use a single food item in each compartment of the tray box. Figure 6 shows the result of segmenting a food image before and after being eaten.
Figure 6. Image segmentation before and after being eaten
G. Erosion and Dilation
Erosion and dilation are applied to strengthen the result of image segmentation. After filtering, an image may still have parts, called noise, appearing in the segmented image. Erosion and dilation are morphological operations, in which erosion is generally followed by dilation. Erosion removes unimportant boundaries of the main object, while dilation adds pixels around the boundaries of the object. Both are widely applied in image processing [30]. We apply a 3 x 3 kernel with three iterations for each operation.
H. Proposed Method of Automatic Leftover Estimation
In a tray box, an image has four compartments; there are two menus, Menu 1 and Menu 2, in both tray box background colors, gray and black. Figure 7 depicts the information related to the original sizes used to calculate the leftover estimation. In this research, we use the area of each compartment without considering the volume of the tray box. Equations 1 to 9 yield an estimated leftover measurement for each compartment.
The area of each compartment is calculated using Equation 1.
Figure 7. Original size in each compartment from a tray box
A_i = h_i × w_i (1)
where A_i is the area of compartment i in Figure 7, h_i is the height of the compartment, and w_i is the width of the compartment.
The ratio between one compartment and another is utilized to calculate the differences among compartment areas. In this observation, we register the area of one compartment first as a reference for the other compartments. A constant, here called cd, is set manually by the user; in this case, we iterate it from 1 to 9 to understand which value serves best as the constant. The constant of each other compartment can then be calculated as stated in Equation 2,
(2)
where c_i is the constant of compartment i in the tray box and A_i is the area of compartment i.
(3)
The term in Equation 3 can be calculated by the formula in Equation 4, where n is the number of compartments in each tray box.
(4)
When the constant of each compartment equals 1, the formula is changed into the new ratio described in Equation 5.
(5)
After that, the leftover ratio is found based on the formula stated in Equation 6,
(6)
where P_before is the pixel area of the segmented food image before being eaten, and P_after is the pixel area of the segmented food image after being eaten. The estimation stage then calculates the ratio between the original measurement in a certain compartment i and the size of compartment i as stated in Equation 1. That ratio is specified in Equation 7,
(7)
where W_i is the original measurement of the food item's weight obtained from manual scales. The estimate is then found by multiplying these quantities, as stated in Equation 8.
(8)
The last part, computing the automatic prediction of the leftover, is calculated in Equation 9 by dividing the quantity representing the original measurement per pixel of the food image by the area of the compartment.
(9)
I. Evaluation
In this paper, we use the Root Mean Square Error (RMSE), the square root of the MSE. This evaluation method is used since the original and estimated weights are measured in the same unit, grams. Equation 10 depicts the evaluation measure:
RMSE = sqrt((1/n) Σ_{i=1..n} (w_i − ŵ_i)²) (10)
where w_i is the original weight of compartment i, ŵ_i is its estimated weight, and n is the number of compartments in the tray box; in this paper, n = 4.
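The RMSE over the four compartments can be computed as below; the weights used are hypothetical examples, not values from the paper's tables.

```python
import math

def rmse(original, estimated):
    """Root mean square error between original and estimated weights (grams)."""
    n = len(original)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(original, estimated)) / n)

# Hypothetical weights in grams for the four compartments of one tray box
orig = [7.0, 0.0, 6.0, 118.0]
est  = [6.7, 0.3, 6.0, 105.56]
print(round(rmse(orig, est), 2))  # ≈ 6.22
```

Because both series are in grams, the RMSE is directly interpretable as an average weight error per compartment.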
IV. RESULTS AND DISCUSSION
The experimental results consist of three parts. The first experiment measured the effect of the constant on each menu in each tray box; as stated before, the constant cd is iterated from 1 to 9. The second experiment measured the effect of the lighting component of each color space on the leftover estimation, using the optimal value of the constant. Both the first and second experiments were applied to each compartment of each tray box with a different menu. The last experiment compared the RMSE evaluation of the whole tray box, including its compartments. In addition, RMSE was calculated for different lighting components in order to understand the effect of a lighting component on the leftover measurement.
A. The Effect of Constants on Each Menu in the Tray Box
Based on Figure 8, which shows the constant cd ranging between one and nine, leftover estimation in the first compartment of Menu 1 was fairly good for every lighting color component using cd = 6 and 7 on the black background, and much better with cd above six for the gray background. The leftover estimation yielded unsatisfactory predictions when the segmentation result was bad, so the model, or formula, depends on the segmentation result. The second compartment had good measurements based on the formula in the previous section, which states that if the original measurement of a blank compartment is 0 grams, the estimation is 0 grams as well. In the third compartment, the optimal constant values were cd = 5, 6, 7, and 8; the best was cd = 5 for the black tray box background and cd = 8 for the gray background. In the fourth compartment, the estimation for each cd was far from the original measurement, but it was still slightly closer when applied to the black background of a compartment rather than to the gray background.
The same occurred in Menu 2, as shown in Figure 9: the first compartment was good when using cd = 3 to 7, but yielded the best results with cd = 3 and 4 for the black tray box background and above cd = 6 for the gray background. A promising result occurred in the second compartment, which had food before being eaten but was blank afterwards, meaning the whole food item in that compartment was eaten. In the second compartment the best constants were cd = 3 to 7; however, in both black and gray tray box images, the best cd was 7, which gave the closest leftover estimation. The third compartment was blank, so the prediction of leftovers was the same as before being eaten. Finally, the last compartment had the same estimation for all constants, and the black background was closer to the estimation from the original scales than the gray background.
Figure 8. Leftover estimation in Menu 1 for each constant cd. The first to last rows represent the 1st, 2nd, 3rd, and 4th compartments. The black background of the tray box is shown in (a), (b), (c), (d), while the gray background is shown in (e), (f), (g), (h)
Figure 9. Leftover estimation in Menu 2 for each constant cd. The first to last rows represent the 1st, 2nd, 3rd, and 4th compartments. The black background of the tray box is shown in (a), (b), (c), (d), while the gray background is shown in (e), (f), (g), (h)
B. Leftover Estimation Results Based on Optimal Value of the Constant Variable in Different Lighting Components
In this experiment, we observed the effect of five lighting components, from five color spaces, on both Menu 1 and 2 using different tray box background colors: black and gray.
Tables 1, 2, 3, and 4 show the leftover estimation for each compartment of Menu 1, in both black and gray background colors. In the first compartment of Menu 1, L of LUV produces the result nearest the original measurement, with 6.71 grams out of 7 grams; L of LAB is only slightly different at 6.70 grams. Overall, the estimation works better on the black background than on the gray background. From visual inspection, the segmentation result on the black background is good, but many parts of the compartment with the gray background are not covered well. The same holds in the third compartment with cd = 5: the estimation is exactly the same as the original measurement, 6.00 grams, using the L lighting component of LAB. A prediction with only a small difference from the original measurement, 5.98 grams, occurred on the black background of the tray box using the L component of the LUV color space. However, the estimation does not achieve good results on the gray background of the tray box, since the segmentation result affects the automatic leftover estimation. In the fourth compartment, the estimation is good in Y of YUV and Y of YCbCr, and it differs only slightly from the other color lighting components, so the result is not significantly affected when using other lighting components among those five color channels. In the fourth compartment, the black background of the tray box still produces better segmentation than the gray background.
Compartment 2 of Menu 1 shows that the segmentation result does not differ significantly between color lighting components. Since the compartment is blank before being eaten, the measurement is exactly 0 grams, even though there are still a number of pixels counted by the automatic leftover algorithm. Visual inspection still shows that compartments on the black background yield better segmentation than on the gray background, but this does not affect the estimation.
The leftover estimation in Menu 2 is shown in Tables 5 to 8 for the 1st, 2nd, 3rd, and 4th compartments consecutively. In the first compartment, the measurement nearest to the original is obtained using the V value of HSV, with an estimation of about 10.05 grams out of 10 grams. The other lighting components have slightly different values, at or above 10.50 grams out of 10 grams. The difference is less than 0.5 grams, which indicates that the estimation algorithm runs well in this part.
In the second compartment of Menu 2, the black background of the tray box still performs better than the gray background. All lighting components produce the same estimate of 0.30 grams out of 0 grams in the black background; in this case, the lighting component does not affect the estimation significantly. The gray background actually comes closer to the correct measurement, at 0.26 grams compared to the 0-gram ground truth. This also shows that the automatic leftover estimation algorithm can reach good results even when the segmentation is not very good; together, the segmentation and the estimation algorithm achieve a good fit of prediction. The third compartment is blank, so the estimation is exactly 0 grams. In the fourth compartment, the L lighting component of LAB gives the nearest measurement, 105.56 grams out of 118 grams. The error is fairly large, but the black background is still preferable since its segmentation result is better than that of the gray background.
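The estimation behavior discussed here, where a blank compartment must map to 0 grams even though segmentation noise leaves some pixels, can be sketched as follows. This is a hypothetical reconstruction, assuming the leftover weight is taken as proportional to the ratio of segmented food pixels after eating to those before eating; the function name and signature are illustrative, not the authors' actual code.

```python
import numpy as np

def estimate_leftover(before_mask, after_mask, served_weight):
    """Estimate the leftover weight (grams) of one compartment.

    Assumption (not from the paper's code): weight is proportional to
    the number of segmented food pixels, so
    leftover = served_weight * pixels_after / pixels_before.
    """
    before_pixels = np.count_nonzero(before_mask)
    after_pixels = np.count_nonzero(after_mask)
    if before_pixels == 0:
        # Blank compartment: nothing was served, so the leftover is
        # 0 grams regardless of stray pixels from segmentation noise.
        return 0.0
    return served_weight * after_pixels / before_pixels
```

For example, if 100 food pixels are segmented before eating and 50 after, a 10-gram serving yields a 5-gram leftover estimate.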
Table 1.
Compartment 1 of Menu 1; original measurement is 7 grams (segmented before/after images from the original table are omitted)

Color component | Estimation in black background (gr) | Estimation in gray background (gr)
L-LAB | 6.70 (cd = 5), 6.43 (cd = 6), 6.43 (cd = 7), 5.36 (cd = 8) | 9.67 (cd = 5), 7.74 (cd = 6), 7.74 (cd = 7), 7.39 (cd = 8)
V-HSV | 7.45 (cd = 5), 6.10 (cd = 6), 6.10 (cd = 7), 5.09 (cd = 8) | 9.04 (cd = 5), 7.23 (cd = 6), 7.23 (cd = 7), 7.34 (cd = 8)
Y-YCbCr | 6.56 (cd = 5), 6.28 (cd = 6), 6.28 (cd = 7), 5.23 (cd = 8) | 10.07 (cd = 5), 8.06 (cd = 6), 8.06 (cd = 7), 6.72 (cd = 8)
L-LUV | 6.71 (cd = 5), 6.43 (cd = 6), 6.43 (cd = 7), 5.36 (cd = 8) | 9.96 (cd = 5), 7.96 (cd = 6), 7.96 (cd = 7), 6.64 (cd = 8)
Y-YUV | 6.56 (cd = 5), 6.28 (cd = 6), 6.28 (cd = 7), 5.23 (cd = 8) | 6.71 (cd = 5), 6.43 (cd = 6), 6.43 (cd = 7), 5.36 (cd = 8)

Table 2.
Compartment 2 of Menu 1 (segmented images in black and gray backgrounds for L-LAB, V-HSV, Y-YCbCr, L-LUV, and Y-YUV; images omitted)
Table 3.
Compartment 3 of Menu 1; original measurement is 6 grams

Color component | Estimation in black background (gr) | Estimation in gray background (gr)
L-LAB | 6.00 (cd = 5), 5.52 (cd = 6), 5.52 (cd = 7) | 9.95 (cd = 6), 9.95 (cd = 7), 7.46 (cd = 8)
V-HSV | 5.88 (cd = 5), 5.47 (cd = 6), 5.47 (cd = 7) | 9.04 (cd = 6), 9.04 (cd = 7), 6.79 (cd = 8)
Y-YCbCr | 5.94 (cd = 5), 5.64 (cd = 6), 5.64 (cd = 7) | 9.80 (cd = 6), 9.80 (cd = 7), 7.35 (cd = 8)
L-LUV | 5.98 (cd = 5), 5.50 (cd = 6), 5.50 (cd = 7) | 10.01 (cd = 6), 10.01 (cd = 7), 7.51 (cd = 8)
Y-YUV | 5.94 (cd = 5), 5.64 (cd = 6), 5.64 (cd = 7) | 9.80 (cd = 6), 9.80 (cd = 7), 7.35 (cd = 8)

Table 4.
Compartment 4 of Menu 1; original measurement is 65 grams

Color component | Estimation in black background (gr) | Estimation in gray background (gr)
L-LAB | 77.71 (cd = 5, 6, 7, 8) | 157.09 (cd = 5, 6, 7, 8)
V-HSV | 77.40 (cd = 5, 6, 7, 8) | 155.16 (cd = 5, 6, 7, 8)
Y-YCbCr | 77.33 (cd = 5, 6, 7, 8) | 155.61 (cd = 5, 6, 7, 8)
L-LUV | 77.49 (cd = 5, 6, 7, 8) | 155.17 (cd = 5, 6, 7, 8)
Y-YUV | 77.33 (cd = 5, 6, 7, 8) | 155.61 (cd = 5, 6, 7, 8)

Table 5.
Compartment 1 of Menu 2; original measurement is 10 grams

Color component | Estimation in black background (gr) | Estimation in gray background (gr)
L-LAB | 10.49 (cd = 3), 10.49 (cd = 4), 8.54 (cd = 5), 6.53 (cd = 6) | 16.10 (cd = 4), 12.08 (cd = 5), 9.77 (cd = 6), 9.77 (cd = 7)
V-HSV | 10.05 (cd = 3), 10.05 (cd = 4), 8.25 (cd = 5), 6.60 (cd = 6) | 14.72 (cd = 4), 11.04 (cd = 5), 9.72 (cd = 6), 9.72 (cd = 7)
Y-YCbCr | 10.48 (cd = 3), 10.48 (cd = 4), 8.60 (cd = 5), 6.88 (cd = 6) | 15.59 (cd = 4), 11.69 (cd = 5), 9.97 (cd = 6), 9.97 (cd = 7)
L-LUV | 10.50 (cd = 3), 10.50 (cd = 4), 8.54 (cd = 5), 6.84 (cd = 6) | 16.05 (cd = 4), 12.03 (cd = 5), 9.98 (cd = 6), 9.98 (cd = 7)
Y-YUV | 10.50 (cd = 3), 10.50 (cd = 4), 8.60 (cd = 5), 6.88 (cd = 6) | 15.59 (cd = 4), 11.69 (cd = 5), 9.97 (cd = 6), 9.97 (cd = 7)

Table 6.
Compartment 2 of Menu 2; original measurement is 0 grams

Color component | Estimation in black background (gr) | Estimation in gray background (gr)
L-LAB | 0.42 (cd = 5), 0.35 (cd = 6), 0.30 (cd = 7) | 0.36 (cd = 5), 0.30 (cd = 6), 0.26 (cd = 7)
V-HSV | 0.44 (cd = 5), 0.37 (cd = 6), 0.32 (cd = 7) | 0.35 (cd = 5), 0.30 (cd = 6), 0.25 (cd = 7)
Y-YCbCr | 0.41 (cd = 5), 0.34 (cd = 6), 0.30 (cd = 7) | 0.36 (cd = 5), 0.30 (cd = 6), 0.25 (cd = 7)
L-LUV | 0.41 (cd = 5), 0.35 (cd = 6), 0.30 (cd = 7) | 0.36 (cd = 5), 0.30 (cd = 6), 0.25 (cd = 7)
Y-YUV | 0.41 (cd = 5), 0.34 (cd = 6), 0.30 (cd = 7) | 0.36 (cd = 5), 0.30 (cd = 6), 0.26 (cd = 7)
C. RMSE Value in Each Tray Box in Different Lighting Components
For each tray box, we compare the estimation against the original measurement. As explained in the previous subsection, the lighting component certainly affects the segmentation result and the measured value, since the best lighting component can differ between compartments, even though some of them differ only by a small measurement value. To evaluate each tray box as a whole, we compute the RMSE for each Menu in each tray box background color.
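The per-tray evaluation uses root mean square error over the compartment estimates. A minimal sketch, assuming the per-Menu RMSE is formed by aggregating over the compartments of that tray box (the aggregation detail is our assumption):

```python
import math

def rmse(estimates, ground_truth):
    """Root mean square error between estimated and measured weights (grams)."""
    if len(estimates) != len(ground_truth):
        raise ValueError("estimates and ground truth must have the same length")
    # Mean of squared per-compartment errors, then square root.
    squared = [(e - g) ** 2 for e, g in zip(estimates, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower RMSE means the estimated weights across the tray box are closer to the scale measurements, which is how the background colors and lighting components are ranked below.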
Table 7.
Compartment 3 of Menu 2 (segmented images in black and gray backgrounds for L-LAB, V-HSV, Y-YCbCr, L-LUV, and Y-YUV; images omitted)

Table 8.
Compartment 4 of Menu 2; original measurement is 118 grams

Color component | Estimation in black background (gr) | Estimation in gray background (gr)
L-LAB | 105.56 (cd = 3, 4, 5, 6, 7, 8) | 164.11 (cd = 3, 4, 5, 6, 7, 8)
V-HSV | 104.35 (cd = 3, 4, 5, 6, 7, 8) | 262.28 (cd = 3, 4, 5, 6, 7, 8)
Y-YCbCr | 104.04 (cd = 3, 4, 5, 6, 7, 8) | 162.87 (cd = 3, 4, 5, 6, 7, 8)
L-LUV | 105.27 (cd = 3, 4, 5, 6, 7, 8) | 167.02 (cd = 3, 4, 5, 6, 7, 8)
Y-YUV | 104.90 (cd = 3, 4, 5, 6, 7, 8) | 162.86 (cd = 3, 4, 5, 6, 7, 8)
In the black background of Menu 1, the lighting components Y of YCbCr and YUV produced the same RMSE for each constant cd, reaching as low as approximately 6.67. Compared to the gray background, there was a significant difference, with RMSE above 46.00 for the V lighting component of HSV. This indicates that the black background gives remarkable results for Menu 1, which is also confirmed by the segmentation results: the black background segments much better than the gray one. The RMSE of Menu 1 in the gray background is presented in Table 11.
The RMSE in Menu 2 was best with the lighting component L of LAB, at 6.85 for cd = 3 and 4 in the black background; the other lighting components were not significantly different. Table 10 shows the RMSE for each lighting component in Menu 2 with the black background. The RMSE nearest to that of L-LAB was L-LUV. L-LAB outperformed the other lighting components because, in the fourth compartment of Menu 2, it contributed the estimate nearest to the original measurement. The gray background in Menu 2, as shown in Table 12, had a huge RMSE, the lowest being reached by the lighting component Y of YCbCr and YUV with 156.36 for all constants cd. In general, the RMSE of each tray box and lighting component depends on the segmentation result in each compartment. L-LAB is good for Menu 2 in the black background since the fried rice is covered very well by segmentation using L-LAB rather than the other color spaces. V-HSV is helpful when segmentation is applied to lighter objects in the gray background. Y-YCbCr and Y-YUV are almost equally good in both the black background for Menu 1 and the gray background for Menu 2. L-LUV is not the best but is still adequate for estimating leftovers.
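After CLAHE contrast adjustment, each lighting channel is thresholded with Otsu's method to separate food pixels from the compartment background. A pure-NumPy sketch of the Otsu step (maximizing between-class variance over all 256 candidate thresholds) is given below; this is our illustration of the standard algorithm, not the paper's implementation, which in practice would likely call a library routine.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image.

    Pixels above the returned threshold would be treated as one class
    (e.g. food) and the rest as the other (e.g. tray background).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # class-0 probability per threshold
    mu = np.cumsum(prob * np.arange(256))    # class-0 mean times omega
    mu_t = mu[-1]                            # global mean intensity
    # Between-class variance for every candidate threshold; endpoints where
    # one class is empty produce 0/0 and are zeroed out below.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))
```

On a strongly bimodal compartment image, the threshold lands between the two intensity modes, which is why the contrast adjustment by CLAHE before this step matters: it sharpens the separation between food and tray background.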
Table 9.
RMSE of Menu 1 in the black background

cd | L-LAB | V-HSV | Y-YCbCr | L-LUV | Y-YUV
5 | 7.01 | 6.89 | 6.73 | 6.96 | 6.73
6 | 6.94 | 6.86 | 6.67 | 6.89 | 6.67
7 | 6.94 | 6.86 | 6.67 | 6.89 | 6.67
8 | 7.06 | 6.98 | 6.79 | 7.00 | 6.79
Table 10.
RMSE of Menu 2 in the black background

cd | L-LAB | V-HSV | Y-YCbCr | L-LUV | Y-YUV
3 | 6.85 | 7.24 | 7.03 | 6.89 | 7.03
4 | 6.85 | 7.23 | 7.03 | 6.89 | 7.03
5 | 6.89 | 7.29 | 7.07 | 6.93 | 7.07
6 | 7.04 | 7.45 | 7.23 | 7.08 | 7.23
7 | 7.04 | 7.45 | 7.23 | 7.08 | 7.23

Table 11.
RMSE of Menu 1 in the gray background

cd | L-LAB | V-HSV | Y-YCbCr | L-LUV | Y-YUV
5 | 48.37 | 46.90 | 47.19 | 47.66 | 47.19
6 | 48.13 | 46.70 | 46.96 | 47.42 | 46.96
7 | 48.13 | 46.70 | 46.96 | 47.42 | 46.96
8 | 48.08 | 46.66 | 46.91 | 47.37 | 46.91
Table 12.
RMSE of Menu 2 in the gray background

cd | L-LAB | V-HSV | Y-YCbCr | L-LUV | Y-YUV
3 | 159.27 | 407.93 | 156.36 | 166.51 | 156.36
4 | 159.27 | 407.93 | 156.36 | 166.51 | 156.36
5 | 159.24 | 407.92 | 156.33 | 166.48 | 156.33
6 | 159.23 | 407.92 | 156.33 | 166.47 | 156.33
7 | 159.23 | 407.92 | 156.33 | 166.47 | 156.33
V. CONCLUSION
In this study, an automatic leftover estimation algorithm is proposed for different lighting components of image segmentation in the tray box, using CLAHE equalization. All datasets were taken using the Smart Nutrition Box as our prototype. The optimal constant for Menu 1 is cd = 5 with the black background, giving an RMSE of 6.73 using the Y lighting component of YCbCr and YUV, while for Menu 2 it is the black background with constants 3 and 4, giving an RMSE of 6.85 via L-LAB. In both cases, the black background of the tray box yields good segmentation with a lower RMSE than the gray background.
For future research, the estimation algorithm needs to be improved for other color channels and color spaces with the white background, since the default color of the tray box is white. The choice of color space is also one indicator of the success or failure of the segmentation process, because segmentation is the approach used here to locate food zones. Automatic identification of the food itself could also be addressed in future studies.
ACKNOWLEDGMENT
The authors would like to thank those who contributed to this research, including the research department of the University of Brawijaya (LPPM) and PPIKID, University of Brawijaya, for their full support. This research is a collaboration between the Faculty of Computer Science and the Nutrition Laboratory at the Faculty of Agricultural Technology, University of Brawijaya.