The Hamlyn Centre
The Institute of Global Health Innovation
Benny Lo, PhD, The Hamlyn Centre, Department of Surgery and Cancer

Passive Dietary Monitoring - the use of wearable cameras and AI to quantify dietary intake

An innovative passive dietary monitoring system
The Bill and Melinda Gates Foundation funded project "An Innovative Passive Dietary Monitoring System" aims to develop a passive dietary monitoring system for people living in low- or middle-income countries (LMICs) that does not rely on individuals actively recording their own intake. The project covers both urban and rural areas in two African countries, Uganda and Ghana. To capture individual dietary intake, wearable cameras and fixed cameras are integrated into the system to record food preparation and eating activities in kitchens and dining areas. Extensive studies and field trials are being carried out in home settings in Uganda and Ghana.

Nutrition intake estimation
The pipeline runs from volume estimation (~200 g), to food recognition (here, jackfruit), to a nutrient lookup in the USDA National Nutrient Database:

Nutrient            Amount
Water (g)           146.92
Energy (kcal)       190
Protein (g)         3.44
Fat (g)             1.28
Carbohydrate (g)    46.5
Fiber (g)           3
Sugars (g)          38.16
Calcium (mg)        48
Iron (mg)           0.46
Magnesium (mg)      58
Phosphorus (mg)     42
Potassium (mg)      896
Sodium (mg)         4
Vitamin C (mg)      27.4

Food consumption
A wearable camera was mounted on the subject's shoulder, on the same side as the subject's dominant hand, to capture the entire eating episode.

Qiu, J., Lo, F.P.W., Jiang, S., Tsai, C., Sun, Y. and Lo, B., 2020. Counting Bites and Recognizing Consumed Food from Videos for Passive Dietary Monitoring. IEEE Journal of Biomedical and Health Informatics.

Egocentric Video Clips
Snapshots of captured egocentric videos.
• A new dataset was constructed, containing 1,022 egocentric video clips of dietary intake; 66 unique and visible food items were identified in the dataset.
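The lookup step above can be sketched as scaling per-100 g values from a USDA-style table by the estimated portion weight. This is a minimal illustration: the `PER_100G` table below is hypothetical (the jackfruit figures are simply the slide's ~200 g values halved), and the database interface is an assumption, not the project's actual implementation.

```python
# Hypothetical per-100 g nutrient table (the slide's ~200 g jackfruit
# values divided by two), standing in for a USDA database lookup.
PER_100G = {
    "Water (g)": 73.46,
    "Energy (kcal)": 95.0,
    "Protein (g)": 1.72,
    "Fat (g)": 0.64,
    "Carbohydrate (g)": 23.25,
    "Potassium (mg)": 448.0,
}

def estimate_nutrients(per_100g: dict[str, float], grams: float) -> dict[str, float]:
    """Scale per-100 g nutrient values to an estimated portion weight."""
    factor = grams / 100.0
    return {nutrient: round(value * factor, 2) for nutrient, value in per_100g.items()}

# A ~200 g estimated portion reproduces the slide's figures,
# e.g. Energy (kcal): 190.0, Water (g): 146.92.
print(estimate_nutrients(PER_100G, 200))
```

In the actual system the portion weight would come from the volume-estimation step and the food label from the recognition step; here both are supplied by hand.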
Dataset
Meal statistics:
- 8 meal classes
- 18 fine-grained meal classes
- 66 unique food items (food ingredients and drinks)

Results of Bite Counting
• 64.89% accuracy in counting bites directly from videos.

Results of General Food Recognition
• 97.55% accuracy in classifying a meal into 8 classes; 54.77% accuracy in classifying it into 18 fine-grained classes; 65% accuracy in recognizing visible food items.

Results of Consumed Food Recognition
• 40.5% accuracy in recognizing the food items actually consumed by the subjects.
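As a rough illustration of how a bite-counting accuracy like the one above could be scored, the sketch below counts a clip as correct when the predicted bite count exactly matches the ground truth. The exact-match criterion is an assumption for illustration, not necessarily the metric used in the paper.

```python
def bite_count_accuracy(true_counts: list[int], predicted_counts: list[int]) -> float:
    """Fraction of clips whose predicted bite count exactly matches the
    ground truth. (Exact-match scoring is an illustrative assumption.)"""
    correct = sum(t == p for t, p in zip(true_counts, predicted_counts))
    return correct / len(true_counts)

# Three of four hypothetical clips are counted correctly.
print(bite_count_accuracy([12, 8, 15, 10], [12, 9, 15, 10]))  # 0.75
```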
Results
[Figure: recognized food items (ingredients and drinks) per clip, comparing the ground truth against TSM and SlowFast predictions over sample frames. The top four rows are samples of recognizing visible food items (TSM F1 59.5%; SlowFast F1 65.0%); the bottom four rows are samples of recognizing consumed food items (TSM F1 37.8%; SlowFast Two-Head F1 40.5%). True positives are shown in green and false positives in red. For example, for a meal with ground truth {chicken, water, rice, takuan, celery, green_bean}, TSM predicts {chicken, rice, takuan, celery, green_bean} while SlowFast predicts {water, rice, takuan, celery, green_bean, pork_ribs}.]
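The F1 scores above can be illustrated by treating the predicted and ground-truth food items of each clip as sets and micro-averaging over clips. How the paper aggregates F1 is an assumption here; this sketch only shows the set-based idea.

```python
def food_item_f1(ground_truth: list[set[str]], predicted: list[set[str]]) -> float:
    """Micro-averaged F1 over clips, with each clip's food items as a set.
    (The aggregation scheme is an illustrative assumption.)"""
    tp = sum(len(gt & pred) for gt, pred in zip(ground_truth, predicted))
    fp = sum(len(pred - gt) for gt, pred in zip(ground_truth, predicted))
    fn = sum(len(gt - pred) for gt, pred in zip(ground_truth, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# One clip: the prediction misses "water" (5 true positives, 1 false negative).
gt = [{"chicken", "water", "rice", "takuan", "celery", "green_bean"}]
pred = [{"chicken", "rice", "takuan", "celery", "green_bean"}]
print(round(food_item_f1(gt, pred), 3))  # 0.909
```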
Studies
• Study 1: Laboratory validation of food intake estimation devices
• Study 2: Acceptability and feasibility in the field
  • Phase 1: Household food behaviour
  • Phase 2: Pre-field-test data gathering prior to the preliminary field test:
    • Acceptability of the devices
    • Preliminary field test for acceptability, reliability and performance of the recording devices
• Study 3: Field validation studies in Uganda and Ghana
  • Phase 1: Preliminary field data (~4 households at each site, ~16 in total, lasting one day)
  • Phase 2: System validation in target populations (~22 households at each site, ~88 in total, lasting three consecutive days)

Large datasets – Ghana study
• Study 1: 700k images
• Study 2: 2.9M images
• Study 3: ~7M images
Food images captured by eButton.

Clustering Egocentric Images in Passive Dietary Monitoring with Self-Supervised Learning
In passive dietary monitoring, wearable cameras continuously capture the subjects' activities, which yields a massive amount of data that must be cleaned and annotated before any analysis can be conducted.

Peng et al., Clustering Egocentric Images in Passive Dietary Monitoring with Self-Supervised Learning, in IEEE International Conference on Biomedical and Health Informatics (BHI22), Ioannina, Greece, Sep 27-30 2022.

Objective
We propose a novel self-supervised learning framework, named CM-Net, to:
• Cluster the large volume of egocentric images into separate events
• Ease the data post-processing and annotation tasks for annotators and dietitians

The proposed pipeline for clustering raw egocentric images into separate events.
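The core idea of grouping a continuous egocentric image stream into separate events can be sketched very simply: embed each image and start a new event whenever consecutive embeddings stop looking alike. The threshold-based segmentation below is an illustrative stand-in, not CM-Net's actual clustering method, and the toy vectors stand in for features from a self-supervised encoder.

```python
import numpy as np

def segment_events(embeddings: np.ndarray, threshold: float = 0.8) -> list[int]:
    """Assign an event id to each image embedding (one row per image),
    opening a new event when consecutive cosine similarity drops below
    `threshold`. (A simple stand-in for learned event clustering.)"""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels, event = [0], 0
    for prev, cur in zip(normed, normed[1:]):
        if float(prev @ cur) < threshold:  # appearance change -> new event
            event += 1
        labels.append(event)
    return labels

# Two toy "scenes": three near-identical frames of each.
a = np.zeros(8); a[0] = 1.0
b = np.zeros(8); b[1] = 1.0
frames = np.stack([a, a + 0.01, a - 0.01, b, b + 0.01, b - 0.01])
print(segment_events(frames))  # [0, 0, 0, 1, 1, 1]
```

In the real pipeline the embeddings would come from the self-supervised pre-trained network, and annotators would then label whole events rather than individual images.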
Datasets
Dataset-L (Large): 256,872 unprocessed, unlabelled egocentric images taken from various individuals, households and areas; used for self-supervised pre-training.
Dataset-S (Small): 4,954 images covering 199 different dietary events; each image is assigned a label indicating which event it belongs to; used for testing the performance of self-supervised learning frameworks.

[Figure 3: statistics of Dataset-L (a) and Dataset-S (b). Recoverable counts: Mother 55,206 (21%), Father 21,153 (8%), Child 36,297 (14%); Mother 84,402 (33%), Father 37,244 (15%), Child 22,570 (9%); Urban 112,656 (44%), Rural 144,216 (56%).]

Results - Clustering
[Figure: example clusters produced by CM-Net and MAE, including events 38, 44, 96 & 97 and their outliers.]
Our CM-Net is able to merge images from the same event into a single cluster where MAE fails, and better separates events whose images look similar. For example, event 44 resembles event 38, as both contain similar actions and objects: each depicts eating from a bowl (yellow and orange, respectively). CM-Net recognizes this difference and separates the two events, whereas MAE clusters them together.

The Hamlyn Centre
The Institute of Global Health Innovation
Benny Lo, PhD
benny.lo@imperial.ac.uk