Measuring and Analyzing Food

(1)

Measuring and Analyzing Food

Intakes

Presented to NTU

2010.10.20

Hsing-Yi Chang

(2)

You are What you Eat

 But do you who you are?

(3)

OUTLINE



Food and Nutrients



Measuring diet and their associated problems

 Patterns vs. daily intakes

 _Examples



Validation of dietary measurements

 Assess and cope in analysis with the errors which are likely to arise in dietary measurement



Nature of variation in diet

 Estimation problems and corrections

References

Margetts, B.M. and Nelson, M. 2000, Design concepts in Nutri

tional Epidemiology

Willett, W. 1998, Nutritional Epidemiology

(4)

(5)

Nutrients vs. Foods

 Representing diet as specific compounds or

groups of compounds



The information can be directly related to our

fundamental knowledge



The structure is known, ready to be synthesized

and used for supplementation



Total intakes is easy to use and powerful in

testing hypothesis

(6)

Nutrients vs. Foods

 Analysis based on foods



Directly relate to dietary recommendations



Foods are an extremely complex mixture of

chemicals that may compete with, antagonize, or

alter the bioavailability of any single nutrient

contained in that food, it is not possible to predict

with certainty the health effects.

(7)

What Nutrition Epidemiologists Normally

Do After Collecting Data?

 Compute the contribution of nutrient intake

from various food groups

 Pattern analysis via factor analysis or

cluster analysis. Aggregate empirically

individuals with similar diets based on their

reported intake of foods.

(8)

Measuring diets/Intakes

(9)

Food Consumption



Commonly used methods for measuring nutrient

intakes

Chemical

analysis _Food Composition Tables

Records Interview Short cut

Duplicate portio n

Aliquot samplin g

Equivalent com posite

Precise weighing Weighed inventory Intake record

Recall

Dietary history

Record Recall

(10)

Individual Surveys – 24 hr. Recall

 Method for assessing present or recent diet



24-hour-recalls

Commonly used in cross-sectional studies

Interviewed or written information about the previous day’s intakes. Food items and quantities/portion sizes are recorded, if possible.

Advantages: quick and easy (NHANES III)

Subjects to day-to-day variation, miss-classification

• Repeated measurements are required to estimate the variation

The validity of the method has been assess by

comparing results with weighed records, diet histories, and biological markers. The results are not consistent.

(11)

Individual Surveys –24 hr. Recall

 Dietary records



Subjects are taught to describe and give an

estimate of the weight of food immediately before

eating in writing or verbal records.

Normally, literate and highly motivated subjects would complete the task.

May alter eating behaviors while recording

 Sources of errors



Respondent and recorder errors

Social desirability (obese subject under-report food or recording process alter eating behavior), forget to

record foods not eaten at home, etc.

(12)

Individual Surveys- Food Frequency



Food frequency questionnaire (FFQ)

 Most frequently used. Assessing usual eating habits, over recent month or year.

 Comprise a list of foods most informative about the

nutrients or foods of interests. Instruments vary from very short to a long list of over 100 food items.

 Frequency and portion sizes should be closed rather than open. (This is to minimize coding errors). Always contain the options ‘never’.

 Every questionnaire should be rigorously pre-tested and validated.

(13)

Approaches for Evaluating Dietary QN



Comparison of means with data from other

sources. This is done in the same population



Proportion of total intake accounted for by foods

included on the QN

 Inclusion of 24-hr-recall for the important food

 Compare the results from FFQ with 24-hr-recall

 Reproducibility, repeated in few days or weeks

 Validity: compare individual values with an independent measure of diet, say gold standard.

 Compare with biochemical markers

(14)

Examples

 _NAHSIT

 _{2009 NHIS}

 _HALST

(15)

Food Composition Data Sources and

Computation

 Exiting database: 食工所食物組成表



Extensive and comprehensive database for nutr

ient values of all foods that might be reported b

y subjects (for open questions like 24-hr-recall).



Computer software

 Laboratory analysis

(16)

Nature of Variation in Diet



We are interested in long-term diet. An

understanding of day-to-day variation in dietary

intake is essential in choosing an appropriate

method to assess diet and to interpret data

collected by different approaches.



Source of variation

 _Day-to-day

 Day of the week

 Day of the season



Degree of variation differs according to nutrients

 Macro nutrients, which contribute largely to caloric intake, are relatively stable

 Micro nutrients, which tend to concentrate in certain foods, vary a lot.

(17)

(18)

Within Individual Variation

(19)

Between Individual Variation

Average of 194 women

(20)

Different Number of Days

(21)

Seasonal Variation

(22)

Estimating Within Individual Variation



Repeated Measures are needed

where ε represents day-to-day variation



The ratio of within to among individual variance (or

intra/inter variability) is

It can be estimated by where  is the average

correlation coefficient between repeated

measurements.

2 2 b w



 



  

 _subjec

_i

NutrientY



 1

( Liu et al, 1979)

(23)

ρ’s estimated by different studies

(24)

Effect of Within-individual-variation on

regression

Observed

True

2 2

)

|

var(

u x

x u

W

x

Y  



_



 







   





₀ _x

X cannot be observed, instead W is observed. W is the measurement with error, x is the true value.

W=X+U (Fuller 1987)

(25)

Effect of Within-individual Variation on

Association Studies

(26)

Correction for Attenuation



For the following situations

 X is a scalar

 The measurement error is additive with error variation estimated by

 The covariates (X,Z,W) are jointly normally distributed

 The response is affected by a linear combination of the p redictor, namely . This can be linear, log istic, probit of loglinear regression



For estimating the effect of X, namely β

x

, the regr

ession calibration estimator is formed in three ste

ps

2

u

_ _ˆ

_u²







 

_x



_z^t



₀

(27)

The Steps



Let be the naïve estimator formed by

ignoring measurement error



Let be the regression squared error for a

linear regression of W on Z. This is the sample

variance of the W’s if there are no other

covariates Z;



The regression calibration estimator is

 ^ˆ

x

2

ˆ

_w|_z



ˆ )

/( ˆ

) ˆ

Navive

ˆ (

² ²

| 2

|z w z u

w

x

^ ^ ^

 

R. Carroll, D. Ruppert, and L.A. Stefanski, 1995

(28)

Estimating Population Distribution

 Subject to within individual variations

 Nusser et. Al, 1997, JASA



Transform data to normally distributed via Box-Co

x transformation



Estimate the within individual variation



Remove the within individual variation from the tot

al variance



Back transform the data to original scale based o

n normal with the new variance

(29)

Another Approach

Can be rewritten as

(30)

(31)

Estimation Steps



Estimate the within individual variance and the

ratio of within to among individual variance



Sampling weights



Estimate the parameters of the observed

distribution accounting for sampling weights



Estimate the adjusted variance by applying the

results of 1 to the results of 3



Reconstruct the adjusted distribution using the

results of 4

SAS program

(32)

Variation in 24-hr-recall (nutrients)

(33)

(34)

Ratio of Within to Among Variation

(35)

(36)

(37)

Nutrient 1% 25% 50% 75% 99% Energy (KCAL) 1662.1

(552.9) ^1996.0(1602.2) 2368.1 (2164.8) 2975.4

(2857.2) 4004.1 (8502.7) Protein (g) 52.2 ( 21.9) 78.2 ( 60.2) 90.9 ( 81.6) 105.4 ( 108.3) 145.7 ( 411.0) Fat (g) 39.7 ( 5.7) 66.7 ( 31.6) 80.7 ( 58.0) 97.0 ( 105.6) 145.4 ( 636.8) Carbohydrate

(g) 151.0 ( 49.0) 252.0 ( 213.2) 304.5 ( 292.3) 360.4 ( 370.4) 548.3 ( 814.3) Fiber (g) 1.7 ( 0.2) 3.8 ( 2.0) 5.0 ( 3.2) 6.5 ( 5.8) 11.2 ( 46.9) Calcium (mg) 220.7 ( 23.2) 382.2 ( 209.3) 467.9 ( 369.5) 564.0 ( 600.9) 851.4 (1951.0) Phosphorus

(mg) 630.9 (198.7) 1007.4

( 764.4) 1192.6 (1039.7) 1423.6

(1475.2) 2022.4 (5830.9) Iron (mg) 8.2 ( 1.7) 13.4 ( 8.4) 16.0 ( 11.9) 18.9 ( 18.7) 28.3 ( 117.7) Vitamin A (I. U.) 3251.7

( 80.5)

5630.0 (1533.3)

6843.7 (3188.7) 8260.3 (6885.6)

12932.0 (56173.6)

Vitamin B (mg) 0.8 ( 0.2) 1.2 ( 0.8) 1.5 ( 1.2) 1.7 ( 1.7) 2.5 ( 7.9) Vitamin B2 (mg) 0.7 ( 0.2) 1.2 ( 0.8) 1.4 ( 1.1) 1.7 ( 1.6) 2.4 ( 6.7) Niacin (mg) 9.4 ( 4.0) 15.7 ( 10.8) 19.0 ( 15.9) 23.0 ( 22.0) 34.5 ( 95.5) Vitamin C (mg) 47.2 ( 0.5) 119.5 ( 42.6) 161.9 ( 92.3) 214.7 ( 175.0) 400.0 (1425.9) SFA (g) 11.8 ( 1.7) 21.9 ( 10.4) 27.1 ( 17.9) 33.3 ( 34.9) 52.6 ( 218.5) PFA (g) 11.4 ( 1.0) 17.5 ( 7.8) 21.2 ( 14.4) 24.2 ( 28.1) 34.0 ( 110.0) Oleic acid (g) 11.9 ( 1.9) 23.7 ( 11.4) 30.3 ( 19.6) 37.6 ( 39.6) 61.3 ( 281.6) Cholesterol

(mg) 169.2 ( 3.7) 330.8 ( 175.1) 417.1 ( 393.8) 521.5 ( 620.9) 837.8 (1240.0)

(38)

Problem With Large Number of 0’s



There are two kinds of zero’s in the 24 hr. recall

data.

 One is that the individual never eats the food.

 The other is the individual did not eat the food 24 hours before the interview

.



The Food Frequency Questionnaire provides

information on whether an individual consume

certain food.

 We used this information to identify which kind of zero is observed in 24-hr-recall

(39)

Method

We use the Zero-Inflated Model to model usual food

intakes with large number of zero’s. The distribution

function of the amount of usual food intakes is

following:

if _F _{  } _x _ ₁ _ _p _ _ _pF

_a

_{ } _x

x

 ₀

Otherwise _F _{ } ₀ _ ₁ _ _p

is a random variable that represents the amount of

usual intake for some kind of food

is the probability that an individual consume this food

x

p

(40)

Correction for the effects of measurement error

 In estimating the usual intakes, the first task is to estimate the ratio of within-to-amon g variation. We use the method proposed by Delcourt (1994) that estimate the correl ation coefficient of the amount of intakes measured at two time point firstly.

 Consider that the Pearson correlation coefficient can be expressed as:

 Suppose that there are two time point measures. and are the standard deviati on for the first time measure and the second time measure. is the estimation of Pe arson correlation coefficient between measures.

 Let is used to estimate . Then ,

is a modified estimate of .

 When there are more than two time points, the average coefficient is used.

2 2

2 e s

s



 

 

s

2



₁² ₂²



²

2 _s _s

s  

_ 

_s²

 

_e²

_

2



s

r

s

²

s

1

(41)

 

^x ^pf

 

^x

f



_a ^x

^ ⁰

f

a

The density function is ^when

If

follow a gamma distribution with parameter α, β

   

 





x

a

^x ^x ^e

f

^ ^

  ¹

¹

The overall mean is pαβ, and the variance is

p

²

αβ

²

.

(42)

Example

 估計米飯類的飲食量分配

 _{G1 米類：}



G1_1 米類：糙米、糯米、胚芽米、小米等。



G1_2 加工製品類：新竹米粉、米糕、米漿等。

(43)

Example (data)

 24-Hour Dietary Recall



已經整理好的資料檔，欄位包含有性別、地區層

別、問卷權數、年齡、 G1~G27(27 種食物類別 )

和個案編號。

(44)

Example (data)

 _{重覆測量資料}



欄位有性別、問卷權數、重覆個案編號、年齡、

G1~G27(27 種食物取用量 )

(45)

Example (data)

 Food-Frequency Questionnaires (FFQ)



_{欄位包含有}

(46)

Example (sas code)

 _資料整理

 _{截取米飯類食用量} (24-Hour Dietary Recall) data food_G1; /* 製作一個新的檔案 */

set Tmp1.Big;

if G1=0 then delete; /* 當食用量不為 0 的時後 , 選入檔案 */ keep id agecate order sex stra agen qwt G1;

run;

 _{截取米飯類食用量} _{( 重覆測量資料 )}

data food_R_G1; set Tmp1.Rv_Big; if G1=0 then delete;

keep id agecate order sex stra agen qwt G1; run;

 _合倂檔案

proc sort data=food_G1; by id;

proc sort data=food_R_G1 by id;

run; data G1;

merge Food_g1 Food_r_g1; by id ;

if G1 and H1; run;

(47)

Example (sas code)

 _{計算參數估計值}

proc corr data=G1; var G1 H1;

run; Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum G1 534 139.46474 115.96585 74474 0.06940 1127 H1 534 126.63431 97.76724 67623 1.55232 643.85882

Pearson Correlation Coefficients, N = 534 Prob > |r| under H0: Rho=0 G1 H1 G1 1.00000 0.31677 <.0001 H1 0.31677 1.00000 <.0001

S₁ S₂ γ α*β α*β^2 α β

115.96 97.77 0.32 133.05 ^3680.9₁ 4.81 27.67

(48)

Example (sas code)

 _{計算食用比例估計值} _(FFQ)

proc freq data=Tmp1.Dall1; table fda1;

run;

 96.01/(96.01+2.02)=0.98

The FREQ Procedure

米飯類

Cumulative Cumulative fda1 Frequency Percent Frequency Percent --- 0 94 2.02 94 2.02 1 4385 94.00 4479 96.01 9 186 3.99 4665 100.00

(49)

Example (estimation of the distribution)



_利用 Excel 函數 GAMMAINV(probability,alpha,b

eta) 畫出食用量的分配圖形

 _相關參數

(50)

Example (estimation of the quantiles)

 _利用 Excel 函數 GAMMAINV(probability,al

pha,beta) 計算分配 quantiles 的估計值

(51)

Results

Rice

(52)

Green Vegetable

(53)

Much to Be Learned

(54)

1

(55)

2

(56)

Measuring and Analyzing Food