Optimum Transformers
Founded 11 months ago

Accelerated NLP pipelines for fast inference on CPU and GPU

Accelerated NLP pipelines for fast inference :rocket: on CPU and GPU. Built with :hugs: Transformers, Optimum and ONNX Runtime.

Disclaimer

This project was inspired by Hugging Face Infinity and by a first step made by Suraj Patil.

@huggingface’s pipeline API is awesome! :star_struck:, right? And onnxruntime is super fast! :rocket:. Wouldn’t it be great to combine these two?
– Tweet by Suraj Patil

It was under this slogan that I started this project!

My main goal was to prove myself and get onto the @huggingface team :hugs:

How to use

Quick start

Usage is exactly the same as with the original pipelines, apart from a few minor additions:

from optimum_transformers import pipeline

pipe = pipeline("text-classification", use_onnx=True, optimize=True)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
  • use_onnx - converts the default model to an ONNX graph
  • optimize - optimizes the converted ONNX graph with Optimum
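Because the output has the same shape as that of the original pipelines (a list of dicts with `label` and `score` keys), any existing post-processing code should keep working. As a minimal sketch, the helper below picks the best label and applies a confidence cut-off. The name `top_label` and the `min_score` threshold are hypothetical and not part of optimum_transformers.

```python
from typing import Dict, List, Optional, Union

Prediction = Dict[str, Union[str, float]]


def top_label(results: List[Prediction], min_score: float = 0.9) -> Optional[str]:
    """Return the highest-scoring label, or None if it scores below `min_score`.

    Hypothetical helper for illustration; not part of optimum_transformers.
    """
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["label"] if best["score"] >= min_score else None


# Output copied from the quick-start example above.
output = [{"label": "POSITIVE", "score": 0.9998743534088135}]
print(top_label(output))  # POSITIVE
```

In real use, `output` would be the return value of `pipe("This restaurant is awesome")`.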

Optimum config

Read the Optimum documentation for more details.

from optimum_transformers import pipeline
from optimum.onnxruntime import ORTConfig

ort_config = ORTConfig(quantization_approach="dynamic")
pipe = pipeline("text-classification", use_onnx=True, optimize=True, ort_config=ort_config)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]

Benchmark

With notebook

You can benchmark pipelines more easily with the benchmark_pipelines notebook.

With your own script

from optimum_transformers import Benchmark

task = "sentiment-analysis"
model_name = "philschmid/MiniLM-L6-H384-uncased-sst2"
num_tests = 100

benchmark = Benchmark(task, model_name)
results = benchmark(num_tests, plot=True)
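To make clear what a latency benchmark of this kind measures, here is a rough, self-contained sketch of a timing loop over any pipeline-like callable. It is not the implementation of `Benchmark`; the names `time_pipeline`, `warmup` and `fake_pipe` are hypothetical, and the stand-in callable is there only so the sketch runs without downloading a model.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_pipeline(
    pipe: Callable[[str], object],
    text: str,
    num_tests: int = 100,
    warmup: int = 5,
) -> Dict[str, float]:
    """Call `pipe(text)` repeatedly and summarize the latency in milliseconds."""
    # Warm-up calls are not timed, so one-off setup costs do not skew the result.
    for _ in range(warmup):
        pipe(text)

    latencies_ms: List[float] = []
    for _ in range(num_tests):
        start = time.perf_counter()
        pipe(text)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def fake_pipe(text: str) -> list:
    # Hypothetical stand-in for a real pipeline (assumption, for illustration only).
    return [{"label": "POSITIVE", "score": 1.0}]


stats = time_pipeline(fake_pipe, "This restaurant is awesome", num_tests=20)
print(stats)
```

Passing a real pipeline object instead of `fake_pipe` would time actual inference; running the same loop with and without `use_onnx=True` gives a simple side-by-side comparison.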

Results

Note: These results were collected on my local machine, so if you have a high-performance machine to benchmark on, please contact me :hugs:

sentiment-analysis

Almost the same as in the Infinity launch video :hugs:

AWS VM: g4dn.xlarge
GPU: NVIDIA T4
128 tokens
2.6 ms

Resulting plot

zero-shot-classification

With typeform/distilbert-base-uncased-mnli

Resulting plot

token-classification

Resulting plot
Resulting plot

More results are available in the project repository on GitHub.
