An application for controlling the keyboard and mouse using gestures. [GitHub](https://github.com/Samoed/MyFirstDatascienceProject)
```
git clone https://github.com/Samoed/MyFirstDataScienceProject
pip install -r requirements.txt
```
If the camera window doesn't appear when launching the application, run `ui_app.py` with the `-d` or `--device` flag to switch the camera:

```
python ui_app.py -d 1
```
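A device-selection flag like `-d`/`--device` is typically wired up with `argparse`; the sketch below shows one way it might look (the internals of `ui_app.py` are assumptions, not the actual source):

```python
import argparse

def parse_args(argv=None):
    """Parse the camera-selection flag (illustrative sketch of ui_app.py's CLI)."""
    parser = argparse.ArgumentParser(description="Gesture-controlled keyboard/mouse")
    parser.add_argument("-d", "--device", type=int, default=0,
                        help="index of the camera passed to cv2.VideoCapture")
    return parser.parse_args(argv)

args = parse_args(["-d", "1"])
print(args.device)  # → 1 (camera index selected on the command line)
```

The parsed index would then be handed to `cv2.VideoCapture(args.device)` to open the chosen camera.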
During development, the idea arose to package the application in Docker using PyInstaller. However, a problem emerged with the Pynput library, which interacts with the keyboard and mouse on Linux through X libs and evdev.
To make Pynput work inside a Docker container, linux-headers must be installed, and this has to be done separately for each type of host system. On Debian-like systems, installing the header files via apt is straightforward; on others, such as Arch-like systems, adding linux-headers is more difficult.
One possible workaround is to copy the header files into the container, but this does not seem optimal.
The video stream from the camera is obtained using `opencv` and passed to `MediaPipe`, which detects hand landmarks. These landmarks are then fed to a model that predicts the gesture, and based on the predicted gesture a specific action is taken (mouse movement or a mouse/keyboard button press).
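The final gesture-to-action step can be sketched as a simple dispatch table. The gesture names and stand-in handlers below are assumptions for illustration (in the real application the handlers would call Pynput's mouse/keyboard controllers):

```python
# Map predicted gesture labels to actions. The handlers here return strings
# instead of driving Pynput, so the dispatch logic is self-contained.
ACTIONS = {
    "fist": lambda: "mouse_left_click",
    "open_palm": lambda: "mouse_move",
    "two_fingers_near": lambda: "key_press_enter",
}

def dispatch(gesture: str):
    """Run the action bound to a predicted gesture; ignore unknown labels."""
    action = ACTIONS.get(gesture)
    return action() if action else None

print(dispatch("fist"))     # → mouse_left_click
print(dispatch("unknown"))  # → None
```

Keeping the mapping in one table makes it easy to rebind gestures without touching the recognition code.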
The data was taken from a Kaggle dataset, from which 11 hand-gesture classes were selected. Additionally, videos were recorded for 4 more gestures for training. In total, over 15,000 photos were collected, around 1,000 per class.
Gesture photos were processed using `Mediapipe` and saved as `numpy` arrays.
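MediaPipe's hand model returns 21 landmarks per hand, each with x, y, z coordinates, so one sample flattens naturally into a 63-value vector. A minimal sketch of that conversion (the helper name is an assumption, not the project's actual code):

```python
import numpy as np

N_LANDMARKS = 21  # MediaPipe Hands returns 21 landmarks per detected hand

def landmarks_to_features(landmarks):
    """Flatten a list of (x, y, z) landmark tuples into one feature vector."""
    if len(landmarks) != N_LANDMARKS:
        raise ValueError(f"expected {N_LANDMARKS} landmarks, got {len(landmarks)}")
    return np.asarray(landmarks, dtype=np.float32).reshape(-1)

# A fake hand: 21 landmarks with dummy coordinates.
sample = [(0.1 * i, 0.2 * i, 0.0) for i in range(N_LANDMARKS)]
features = landmarks_to_features(sample)
print(features.shape)  # → (63,)
```

Vectors like this can be stacked into one array per class and saved with `np.save` for training.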
Experiments were conducted with several models. The main evaluation metric was `accuracy`, since the classes are balanced, but `f1` was also tracked. The code for training the models is located in the `experiments` folder; `mlflow` was used to store the experiment results, and optimal hyperparameters for each model were tuned with the `optuna` library.
After training, SVM was chosen as the best model based on its scores:
| Model | Accuracy | F1 |
|---|---|---|
| Logistic Regression | 0.942 | 0.942 |
| Support Vector Machine | 0.991 | 0.990 |
| Random Forest | 0.969 | 0.969 |
| XGBoost | 0.989 | 0.989 |
| CatBoost | 0.987 | 0.987 |
| Neural Network | 0.978 | 0.978 |
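As a sketch of the kind of experiment behind the table, the snippet below trains an SVM on synthetic 63-dimensional "landmark" data and scores it with accuracy and macro F1. The data is generated, and the hyperparameters are fixed for brevity (in the project they were tuned with `optuna`), so nothing here reproduces the actual results:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the landmark features: 3 well-separated classes
# in a 63-dimensional space (21 landmarks x 3 coordinates).
n_per_class, n_features = 100, 63
X = np.vstack([rng.normal(loc=c * 3.0, scale=0.5, size=(n_per_class, n_features))
               for c in range(3)])
y = np.repeat(np.arange(3), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = SVC(C=10.0, kernel="rbf", gamma="scale")  # hyperparameters illustrative
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(accuracy_score(y_test, pred))
print(f1_score(y_test, pred, average="macro"))
```

The same fit/predict/score loop applies to each model in the table; only the estimator and its search space change.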
Confusion matrix for SVM: *(figure; classes include `two_fingers_near`)*

Test accuracy for the neural model: *(figure)*