An application for controlling the keyboard and mouse using gestures.



  1. Clone the repository
    git clone
  2. Install dependencies
    pip install -r requirements.txt
  3. Run the application
    If the camera window doesn't appear when launching the application, run with the -d or --device flag to switch the camera:
    python -d 1
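A minimal sketch of how a `-d`/`--device` flag like the one above could be parsed with `argparse`. The function name `build_parser` and the default index `0` are assumptions for illustration, not taken from the project's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the -d/--device camera flag."""
    parser = argparse.ArgumentParser(
        description="Control the keyboard and mouse using gestures."
    )
    parser.add_argument(
        "-d", "--device",
        type=int,
        default=0,  # assumption: index 0 is the default camera
        help="index of the camera to open",
    )
    return parser


# Parse an explicit argument list, as if the user had passed "-d 1".
args = build_parser().parse_args(["-d", "1"])
print(args.device)  # prints 1
```

The parsed index would then typically be passed to the camera-opening call, so switching cameras needs no code changes.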

Issue with Building the Application in Docker

During development, the idea arose to build the application in Docker using PyInstaller. However, this ran into a problem with the Pynput library, which interacts with the keyboard and mouse on Linux through Xlib and evdev.

To work with Pynput inside a Docker container, linux-headers must be installed, and this has to be done separately for each type of system. On Debian-like systems, installing the header files via apt is straightforward. On other systems, such as Arch-like ones, adding linux-headers is difficult.

One possible solution is to copy the header files inside the container, but this doesn't seem to be the optimal solution.

Working Principle

The video stream from the camera is captured with OpenCV and passed to MediaPipe, which detects hand landmarks. The landmarks are passed to a model that predicts the gesture. Based on the predicted gesture, a specific action is performed (mouse movement or a mouse/keyboard button press).
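The flow described above (landmarks, then gesture prediction, then action) can be sketched without a camera by injecting stand-ins for each stage. All names here (`flatten_landmarks`, `handle_frame`, the `"fist"` to "left click" mapping) are hypothetical and only illustrate the structure. In the real application the landmarks would come from MediaPipe and the action would drive the mouse or keyboard.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) for one hand keypoint


def flatten_landmarks(landmarks: Sequence[Landmark]) -> List[float]:
    """Turn a sequence of (x, y, z) points into one flat feature vector."""
    return [coord for point in landmarks for coord in point]


def handle_frame(
    landmarks: Optional[Sequence[Landmark]],
    classify: Callable[[List[float]], str],
    actions: Dict[str, Callable[[], None]],
) -> Optional[str]:
    """Run one pipeline step: landmarks -> gesture -> action.

    Returns the predicted gesture, or None when no hand was detected.
    """
    if not landmarks:
        return None  # no hand in this frame, so nothing to do
    gesture = classify(flatten_landmarks(landmarks))
    action = actions.get(gesture)
    if action is not None:
        action()  # e.g. move the mouse or press a key
    return gesture


# Stand-ins for the real detector, model and input controller (all hypothetical).
performed: List[str] = []
actions = {"fist": lambda: performed.append("left click")}
fake_hand = [(0.1, 0.2, 0.0)] * 21  # MediaPipe reports 21 landmarks per hand
gesture = handle_frame(fake_hand, classify=lambda features: "fist", actions=actions)
print(gesture, performed)  # prints: fist ['left click']
```

Keeping the gesture-to-action mapping in a dictionary means a gesture can be rebound to another action without touching the detection or classification code.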



The data was taken from a Kaggle dataset, from which 11 hand gesture classes were selected. Additionally, videos were recorded for 4 more gestures for training purposes. In total, over 15,000 photos were collected, around 1,000 for each class.

Gesture photos were processed using MediaPipe, and the results were saved as NumPy arrays.

Model Selection

Experiments were conducted with several models. Accuracy was used as the evaluation metric, since the classes were balanced, but F1 was also considered. The following models were used: Logistic Regression, Support Vector Machine (SVM), Random Forest, and Neural Network.

The code for training the models is located in the experiments folder. MLflow was used to store the experiment results.

Hyperparameters were tuned for each model using the Optuna library.


After training, SVM was chosen as the best model: it scored highest of the four models on both reported metrics (0.991 and 0.99).

Model                    Accuracy   F1
Logistic Regression      0.942      0.942
Support Vector Machine   0.991      0.99
Random Forest            0.969      0.969
Neural Network           0.978      0.978
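For reference, accuracy and F1, the two metrics used to compare the models, can be computed as in the sketch below. It uses only the standard library and made-up labels. Macro averaging for F1 is an assumption here, since the averaging mode used in the experiments is not stated.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    scores: List[float] = []
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up example labels, not results from the project.
y_true = ["fist", "fist", "palm", "palm", "ok", "ok"]
y_pred = ["fist", "palm", "palm", "palm", "ok", "ok"]
print(round(accuracy(y_true, y_pred), 3))  # prints 0.833
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.822
```

With balanced classes the two numbers tend to stay close, which is consistent with accuracy being treated as the main metric.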


Confusion matrix for SVM (image not included). Gesture classes, numbered from 0:

  0. two_fingers_near
  1. one ☝
  2. two ✌
  3. three
  4. four
  5. five
  6. ok 👌
  7. C
  8. heavy 🤟
  9. hang 🤙
  10. palm ✋
  11. L
  12. like 👍
  13. dislike 👎
  14. fist ✊

Test accuracy for the neural model: [figure: torch_train_acc]


  1. Gesture Set
  2. Interaction with Miro
