An application for controlling the keyboard and mouse using gestures.
git clone https://github.com/Samoed/MyFirstDataScienceProject
pip install -r requirements.txt
If the camera window doesn't appear when launching the application, run ui_app.py with the --device flag to switch the camera:
python ui_app.py -d 1
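For illustration, a camera-selection flag like this is typically wired up with argparse. The following is only a minimal sketch: the function name parse_args, the default device index 0, and the help text are assumptions, not taken from ui_app.py itself.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a hypothetical camera app entry point."""
    parser = argparse.ArgumentParser(description="Gesture control (sketch)")
    # -d / --device: index of the camera to open.
    # The default of 0 (first camera) is an assumption for this sketch.
    parser.add_argument(
        "-d",
        "--device",
        type=int,
        default=0,
        help="camera index passed to the capture backend",
    )
    return parser.parse_args(argv)
```

With such a parser, `python ui_app.py -d 1` and `python ui_app.py --device 1` would both select the second camera.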
During development, the idea came up to build the application in Docker using PyInstaller. However, this ran into a problem with the Pynput library, which interacts with the keyboard and mouse on Linux through X libraries and evdev.
To work with Pynput inside a Docker container, linux-headers must be installed, and this has to be done separately for each type of system. On Debian-like systems, installing the header files via apt is straightforward. On other systems, such as Arch-like ones, adding linux-headers is more difficult.
One possible solution is to copy the header files inside the container, but this doesn't seem to be the optimal solution.
The video stream from the camera is captured with OpenCV and passed to MediaPipe, which detects hand landmarks. These landmarks are passed to a model that predicts the gesture. Based on the predicted gesture, a specific action is performed (mouse movement or a mouse/keyboard button press).
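The flow described above can be sketched roughly as follows. This is not the project's actual code: the names landmarks_to_features, run, model and on_gesture are hypothetical, the wrist-relative feature layout is an assumption, and the model is assumed to expose a scikit-learn style predict method. The MediaPipe calls use the legacy mp.solutions.hands interface.

```python
def landmarks_to_features(landmarks):
    """Flatten (x, y, z) landmarks into one list, relative to the first point.

    Using the first landmark (the wrist in MediaPipe Hands) as the origin
    is an assumption of this sketch, not the project's documented format.
    """
    x0, y0, z0 = landmarks[0]
    features = []
    for x, y, z in landmarks:
        features.extend([x - x0, y - y0, z - z0])
    return features


def run(model, on_gesture, device=0):
    """Read frames, detect one hand, predict a gesture, trigger an action.

    `model` (with a predict method) and `on_gesture` (a callback that moves
    the mouse or presses a key) are hypothetical placeholders.
    """
    # Imported lazily so the helper above works without a camera stack.
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(device)
    hands = mp.solutions.hands.Hands(max_num_hands=1)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # OpenCV delivers BGR frames; MediaPipe expects RGB.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if not results.multi_hand_landmarks:
                continue  # no hand in this frame
            hand = results.multi_hand_landmarks[0]
            points = [(p.x, p.y, p.z) for p in hand.landmark]
            gesture = model.predict([landmarks_to_features(points)])[0]
            on_gesture(gesture)
    finally:
        hands.close()
        cap.release()
```

Keeping the feature conversion in its own function means the same code can be reused when preparing training data from photos, so the model sees identical features at training and inference time.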
The data was taken from the Kaggle dataset. From this dataset, 11 hand gesture classes were selected. Additionally, videos were recorded for 4 gestures for training purposes. In total, over 15,000 photos were collected, around 1,000 for each class.
Gesture photos were processed using MediaPipe and saved as
Experiments were conducted with various models. The evaluation metric used was accuracy, since the classes were balanced, but F1 was also considered. The following models were used:
The code for training the models is located in the
MLflow was used to store the experiment results.
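As a reminder of what the two metrics mentioned above measure, here is a minimal pure-Python sketch. The function names are illustrative, and macro averaging for F1 is an assumption, since the averaging mode is not stated here. In practice, scikit-learn's accuracy_score and f1_score(average="macro") compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With balanced classes the two numbers tend to be close; a large gap between them usually points to one or two gestures that are confused much more often than the rest.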
Hyperparameters were tuned for each model using the
After training, SVM was chosen as the best model due to its performance.
|Support Vector Machine
Confusion matrix for SVM:
0. two_fingers_near
Test accuracy for the neural model: