Optimum Transformers
Founded 21 months ago

Accelerated NLP pipelines for fast inference on CPU and GPU

Built with Transformers, Optimum and ONNX Runtime.


This project was inspired by Hugging Face Infinity, and it builds on a first step made by Suraj Patil.

@huggingface’s pipeline API is awesome, right? And onnxruntime is super fast! Wouldn’t it be great to combine these two?
– Tweet by Suraj Patil

It was under this slogan that I started this project!

The main goal was to prove myself and get onto the @huggingface team.


How to use

Quick start

Usage is exactly the same as the original pipelines, apart from a few small additions:

from optimum_transformers import pipeline

pipe = pipeline("text-classification", use_onnx=True, optimize=True)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
  • use_onnx - converts the default model to an ONNX graph
  • optimize - optimizes the converted ONNX graph with Optimum

Optimum config

Read the Optimum documentation for more details.

from optimum_transformers import pipeline
from optimum.onnxruntime import ORTConfig

ort_config = ORTConfig(quantization_approach="dynamic")
pipe = pipeline("text-classification", use_onnx=True, optimize=True, ort_config=ort_config)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]


Benchmark

With notebook

You can benchmark pipelines more easily with the benchmark_pipelines notebook.

With own script

from optimum_transformers import Benchmark

task = "sentiment-analysis"
model_name = "philschmid/MiniLM-L6-H384-uncased-sst2"
num_tests = 100

benchmark = Benchmark(task, model_name)
results = benchmark(num_tests, plot=True)
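
To make clear what a latency benchmark of this kind measures, here is a minimal, library-independent sketch. It is not the implementation of the Benchmark class above: time_pipeline and dummy_pipe are hypothetical names introduced only for illustration, and the stand-in callable would be replaced by a real pipeline object in practice.

```python
import math
import statistics
import time
from typing import Callable, Dict


def time_pipeline(
    pipe: Callable[[str], object],
    text: str,
    num_tests: int = 100,
    warmup: int = 5,
) -> Dict[str, float]:
    """Call pipe(text) repeatedly and return latency statistics in milliseconds.

    Hypothetical helper for illustration; not part of optimum_transformers.
    """
    # Warm-up calls are excluded from the measurements, so one-off costs
    # (lazy initialisation, caches) do not distort the results.
    for _ in range(warmup):
        pipe(text)

    latencies_ms = []
    for _ in range(num_tests):
        start = time.perf_counter()
        pipe(text)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank 95th percentile over the sorted latencies.
    p95_index = max(0, math.ceil(len(latencies_ms) * 95 / 100) - 1)
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
    }


def dummy_pipe(text: str):
    # Stand-in for a real pipeline; returns a fixed result immediately.
    return [{"label": "POSITIVE", "score": 1.0}]


stats = time_pipeline(dummy_pipe, "This restaurant is awesome", num_tests=100)
print(stats)
```

Running the same helper against a pipeline created with and without use_onnx=True would give a like-for-like comparison on the same machine, provided the input text and num_tests are kept identical.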


Note: These results were collected on my local machine. If you have a high-performance machine to benchmark on, please contact me.



Almost the same as in the Infinity launch video


AWS VM: g4dn.xlarge
128 tokens
2.6 ms

Resulting plot


With typeform/distilbert-base-uncased-mnli

Resulting plot



More results are available in the project repository on GitHub.

