Project 8: Kaggle Vision Competition Project - Part 3

October 11, 2022

Sign Language Image Classification part 3.

Part 3 of this Kaggle competition project explores the power of ensembling.

The idea behind ensembling is that you train several different model types on the same dataset and then average their predictions. This sounds rather simple. The problem is that most websites that provide access to free GPUs limit the amount of RAM (random access memory) a user can use at any given time, and the bigger and more powerful the pretrained model is, the more RAM it uses on the GPU.
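To make the averaging step concrete, here is a minimal sketch in plain Python. It assumes each model has already produced one row of class probabilities per image, with the same images and the same class order across models. The function names and the toy numbers are made up for illustration, and the plain lists stand in for the prediction arrays a real notebook would use.

```python
def average_predictions(per_model_probs):
    """Average class probabilities across models.

    per_model_probs holds one entry per model; each entry is a list of
    per-image probability rows (same images, same class order).
    """
    n_models = len(per_model_probs)
    n_images = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    averaged = []
    for i in range(n_images):
        # Mean probability of each class for image i over all models.
        row = [
            sum(model[i][c] for model in per_model_probs) / n_models
            for c in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predicted_classes(probs):
    """Return the index of the highest-probability class for each image."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical outputs of two models for two images and three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]

ensemble = average_predictions([model_a, model_b])
labels = predicted_classes(ensemble)
print(labels)
```

In this toy case the two models disagree on both images, and the averaged probabilities decide between them. Because only the saved predictions are needed at this stage, the models do not have to sit in GPU memory at the same time.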

`Introduction: Business Problem`

Description: The sign language database of hand gestures represents a multi-class problem with 20 classes of letters and numbers (excluding some classes which require motion).

Evaluation: The evaluation metric for this competition is accuracy.
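As a quick illustration of the metric, accuracy is the fraction of images whose predicted class matches the true class. The sketch below uses a hypothetical helper and made-up labels. It is not the competition's scoring code.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up example: three of the four predictions are correct.
score = accuracy([0, 2, 1, 1], [0, 2, 2, 1])
print(score)
```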

`Methodology`

The project will be executed by completing the following tasks:

Setup
Data manipulation
Memory and gradient accumulation

Clear GPU cache
Train function
Checking memory use
Mount Google Drive

Running models

convnext_large_in22k Model
vit_large_patch16_224 Model
swinv2_large_window_192_22k Model
swin_large_patch4_window7_224 Model

Ensembling

Loading TTA arrays
Loading specific models from Google Drive
Ensemble calculation

Submit

Link to GitHub Repository

Link to Blog post