Project 8: Kaggle Vision Competition Project - Part 3
Sign Language Image Classification, Part 3
Part 3 of this Kaggle competition project explores the power of ensembling.
The idea behind ensembling is simple: train several different model types on the same dataset, then average their predictions. The catch is memory. Most websites that offer free GPUs limit how much GPU memory (RAM) a user can use at any one time, and the bigger and more powerful the pretrained model, the more GPU memory it consumes.
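As a minimal sketch of the averaging step, assuming each model outputs one row of softmax probabilities per image, the snippet below averages the probabilities across models and then picks the most likely class for each image. Plain Python lists stand in for the prediction arrays a real notebook would hold; `ensemble_average`, `argmax`, and the toy numbers are hypothetical, not outputs from this competition.

```python
from typing import List

# probs[image][class] for a single model (hypothetical toy values below).
Probs = List[List[float]]


def ensemble_average(model_probs: List[Probs]) -> Probs:
    """Average per-class probabilities across models, image by image."""
    n_models = len(model_probs)
    n_images = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    avg = [[0.0] * n_classes for _ in range(n_images)]
    for probs in model_probs:
        for i in range(n_images):
            for c in range(n_classes):
                avg[i][c] += probs[i][c] / n_models
    return avg


def argmax(row: List[float]) -> int:
    """Index of the largest value in a row (the predicted class)."""
    return max(range(len(row)), key=lambda c: row[c])


# Two hypothetical models, two images, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

avg = ensemble_average([model_a, model_b])
preds = [argmax(row) for row in avg]
print(preds)  # [1, 2]
```

Because the average only needs each model's saved predictions, not the models themselves, one way to stay under a GPU memory limit is to run one model at a time, save its predictions, free the GPU, and average the saved arrays at the end.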
Introduction: Business Problem
Description: The sign language database of hand gestures represents a multi-class problem with 20 classes of letters and numbers (excluding some classes which require motion).
Evaluation: The evaluation metric for this competition is accuracy.
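Accuracy is the fraction of images whose predicted class matches the true class. A minimal sketch, assuming predictions and labels are integer class indices (0 to 19 for the 20 classes); the function name and sample values are hypothetical:

```python
from typing import Sequence


def accuracy(preds: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the true class index."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)


# Hypothetical predicted and true class indices for five images.
print(accuracy([3, 7, 7, 0, 12], [3, 7, 1, 0, 12]))  # 0.8
```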
Methodology
The project will be executed by completing the following tasks:
- Setup
- Data manipulation
- Memory and gradient accumulation
  - Clear GPU cache
  - Train function
  - Checking memory use
- Mount Google Drive
- Running models
  - convnext_large_in22k model
  - vit_large_patch16_224 model
  - swinv2_large_window_192_22k model
  - swin_large_patch4_window7_224 model
- Ensembling
  - Loading TTA arrays
  - Loading specific models from Google Drive
  - Ensemble calculation
- Submit
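The memory and gradient accumulation step in the outline targets the same GPU memory limit. A sketch of the idea, under stated assumptions: instead of pushing one large batch through the GPU at once, the batch is split into smaller micro-batches, their gradients are added up, and the weights are updated once. The toy model below (a one-parameter linear fit with mean squared error, in plain Python) and the helper names are hypothetical stand-ins for a real training loop; the example checks that an accumulated step matches a full-batch step.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (x, y) pairs


def mse_grad(w: float, batch: Data) -> float:
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def full_batch_step(w: float, data: Data, lr: float) -> float:
    """One plain gradient-descent step using the whole batch at once."""
    return w - lr * mse_grad(w, data)


def accumulated_step(w: float, data: Data, lr: float, micro_bs: int) -> float:
    """One update built from micro-batches of size `micro_bs`.

    Only one micro-batch is processed at a time. Each micro-batch's mean
    gradient is weighted by its share of the full batch, so the summed
    gradient equals the full-batch mean gradient.
    """
    grad = 0.0
    for start in range(0, len(data), micro_bs):
        micro = data[start:start + micro_bs]
        grad += mse_grad(w, micro) * len(micro) / len(data)
    return w - lr * grad  # single weight update after accumulation


# Hypothetical data following y = 3x + 1.
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
w_full = full_batch_step(0.0, data, lr=0.01)
w_accum = accumulated_step(0.0, data, lr=0.01, micro_bs=2)
print(abs(w_full - w_accum) < 1e-12)  # True
```

In PyTorch, gradients from successive `backward()` calls add into each parameter's `.grad` until they are zeroed, so accumulation amounts to calling `optimizer.step()` and `zero_grad()` only every few micro-batches, with the loss scaled to match. Peak memory then depends on the micro-batch size rather than the effective batch size. One caveat: layers such as batch normalization still compute their statistics per micro-batch, so results are not always identical to true large-batch training.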
