Behind the Mask


For a group project at General Assembly, we were asked to find data and put it to use in an emergency response situation. Collectively, we chose to build a neural network that takes social media images as input and determines whether the people in them are wearing masks. We chose Twitter and used two hashtags, #maskup and #nomaskselfie, to represent the groups that would or would not be wearing masks. We got some interesting images from these feeds, so we filtered them down to photos of individual faces with or without masks.

The technical stuff...

I created a Twitter crawler using Tweepy, a Python library for the Twitter API, to scrape data from the two hashtags. I then borrowed a lot of code from Mirza Mjutaba’s Kaggle notebook to build a MobileNetV2 image classifier, which I trained on a Kaggle dataset. This dataset had images and associated XML files containing the bounding box coordinates of the people wearing masks. My model ended up with an F1 score of 82% after 2 runs of 20 epochs.
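To give a feel for the XML side of that dataset, here is a minimal sketch of pulling labels and bounding boxes out of one annotation. It assumes a Pascal VOC style layout (object, name, bndbox, xmin and so on); the sample XML, the label names, and the parse_annotation helper are made up for illustration and may not match the real files or the notebook I borrowed from.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC style; the real dataset files may differ.
SAMPLE_XML = """
<annotation>
    <filename>example.png</filename>
    <object>
        <name>with_mask</name>
        <bndbox><xmin>79</xmin><ymin>105</ymin><xmax>109</xmax><ymax>142</ymax></bndbox>
    </object>
    <object>
        <name>without_mask</name>
        <bndbox><xmin>185</xmin><ymin>100</ymin><xmax>226</xmax><ymax>144</ymax></bndbox>
    </object>
</annotation>
"""

def parse_annotation(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) tuples, one per face."""
    root = ET.fromstring(xml_text.strip())
    faces = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        faces.append((label, coords))
    return faces

faces = parse_annotation(SAMPLE_XML)
for label, coords in faces:
    print(label, coords)
```

Each box can then be used to crop a face out of its image before it goes to the classifier.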


What was interesting?

It was challenging to build an image classifier since we hadn’t covered neural networks at this point in our instruction at General Assembly. My biggest roadblocks came from preparing the Twitter images on the prediction side. In my code, I had issues with resizing after converting each image to an array, and I used np.expand_dims to overcome them.
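For anyone hitting the same wall, this is roughly what np.expand_dims does. A Keras-style model expects a batch of images, so a single image needs an extra leading dimension. The zero-filled array is a stand-in for a real photo, and the 224x224 size is an assumption (it is MobileNetV2's default input size in Keras, not necessarily what my notebook used).

```python
import numpy as np

# Stand-in for one face crop already resized to the model's input size.
# 224x224x3 is an assumed size, not taken from the project notebook.
image = np.zeros((224, 224, 3), dtype=np.float32)

# Add a batch dimension at the front: one image becomes a batch of one.
batch = np.expand_dims(image, axis=0)

print(image.shape)  # (224, 224, 3)
print(batch.shape)  # (1, 224, 224, 3)
```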

Ultimately, my model had much higher accuracy when predicting people not wearing masks (89%) than people wearing them (50%). We hypothesized that this was due to the variety of mask patterns. A mask with a flower pattern, or one that depicts a mouth, may confuse a neural network. To solve this, I would like to train the model on more elaborate mask patterns to introduce that variety.
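As a rough illustration of how a per-class split like that can be computed, here is a small sketch. The labels and predictions are toy values I made up for the example, not my actual results, and per_class_accuracy is a hypothetical helper rather than code from the notebook.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """For each true label, the fraction of its examples predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}

# Toy data for illustration only.
y_true = ["mask", "mask", "mask", "mask",
          "no_mask", "no_mask", "no_mask", "no_mask"]
y_pred = ["mask", "mask", "no_mask", "no_mask",
          "no_mask", "no_mask", "no_mask", "mask"]

scores = per_class_accuracy(y_true, y_pred)
print(scores)  # {'mask': 0.5, 'no_mask': 0.75}
```

Looking at each class separately is what exposed the gap; a single overall score would have hidden it.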

We also speculated about possible commercial applications of this model. My favorite was to use it to predict areas that would have higher infection rates by mapping locations with higher anti-mask rates. The numbers presented in the accompanying notebook and PDF show a national average based on all the Twitter images. However, I could scrape tweets (using various hashtags) per state or county, put the images through the model, and map the percentage of mask vs non-mask wearers.
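The aggregation step for that idea could look something like the sketch below. I haven't built this part, so everything here is an assumption: the region names are placeholders, the predictions are invented, and mask_rate_by_region is a hypothetical helper that takes (region, wearing_mask) pairs coming out of the model.

```python
from collections import defaultdict

def mask_rate_by_region(predictions):
    """Return the share of images predicted as mask-wearing for each region."""
    counts = defaultdict(lambda: [0, 0])  # region -> [masked, total]
    for region, wearing_mask in predictions:
        counts[region][0] += int(wearing_mask)
        counts[region][1] += 1
    return {region: masked / total for region, (masked, total) in counts.items()}

# Invented (region, model prediction) pairs for illustration only.
predictions = [
    ("State A", True), ("State A", False), ("State A", True), ("State A", True),
    ("State B", False), ("State B", True),
]

rates = mask_rate_by_region(predictions)
print(rates)  # {'State A': 0.75, 'State B': 0.5}
```

The resulting rates could then be joined to state or county shapes in whatever mapping tool is handy.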