Data Fest 2026 | Belgrade, May 24, Offline Day

Data Fest & Yandex — May 24, Belgrade

11:00 — Guest Registration & Welcome

12:00 — Opening — Kirill Vlasov (PO AI Studio, Yandex Cloud) and Daniil Tkachenko (Head of ML, Yandex Lavka)

Agentic LLM & Inference

Timing	Talk Title		Speaker	Company
12:10	How to safely roll out new versions of product AI agents using an automated metrics system	Writing an agent is not the problem. The problem is shipping v2 without breaking what worked in v1. Using the Yandex AI product agent for the Turkish market as an example, we will look at how we built an automated metrics system that lets us iterate on the agent quickly and safely: catching regressions before production, comparing versions, and making an informed release decision. A practical case for those who are already past "Hello, Agent".	Dmitry Korshunov, Team Lead ML Ecom	Yandex
12:40	Hacks and Defenses in Automatic Kernel Generation	Over the past year, automatic generation of GPU/TPU kernels has turned from an academic exercise into a race: KernelBench, Sakana AI CUDA Engineer, Kevin-32B, and METR all report impressive speedups, but the headline numbers often hide reward hacking and exploits of evaluation pipelines. The best-known case is the "100x speedup" from Sakana, which turned out to be a bypass of correctness checks through memory aliasing and result caching. In this talk I will walk through a taxonomy of the hacks that LLM agents find when generating CUDA/Pallas/Mojo kernels: from trivially bypassing numerical tolerance to sophisticated attacks on timing measurements and exploitation of holes in test harnesses. I will show which defenses actually work and which create a false sense of security. The talk will be useful for those who build agentic pipelines for code generation, work on ML infrastructure, or are responsible for evaluating code models.	Egor Konovalov, ML Engineer	Stealth
13:10	Better LLM pre-training in NVFP4	The NVFP4 lower-precision format, supported in hardware by NVIDIA Blackwell GPUs, promises to allow end-to-end fully quantized pre-training of massive models such as LLMs. Yet existing quantized training methods still sacrifice some of the representation capacity of this format in favor of more accurate unbiased quantized gradient estimation, losing noticeable accuracy relative to standard FP16 and FP8 training. In my talk, I will introduce the problem of low-precision training of LLMs, provide an overview of existing methods, and present Quartet II, a novel method for NVFP4 LLM pre-training that we designed. I'll show how Quartet II achieves SOTA accuracy recovery for NVFP4 and explain the extensive hardware support we provide in the form of CUDA kernels tuned for Blackwell GPUs. These kernels are readily available for people to try and integrate into their training pipelines to enable maximum compute efficiency on modern AI hardware.	Andrei Panferov	Institute of Science and Technology Austria

13:40-14:40 — Lunch Break 🍲

Agentic LLM & Practical ML

Timing	Talk Title		Speaker	Company
14:40	HGRPO: Hierarchical Grouped Reward Policy Optimization for Multi-Turn Conversational Agents	Training conversational agents for multi-turn dialogues with Reinforcement Learning presents a fundamental challenge: how to correctly assign credit to individual actions when the reward signal comes only at the end of a dialogue. We present HGRPO, a novel modification of GRPO that introduces hierarchical step grouping for multi-turn dialogue agents. We applied HGRPO to train a booking agent for restaurants and beauty salons, deployed in production in Yandex's smart assistant Alice. Results show an 8.0 percentage point improvement in agent truthfulness and a 10.7% reduction in dialogue length.	Karina Romanova, Senior LLM Research Engineer	Yandex
15:10	Borealis — how to train an audio LLM for the price of a MacBook	Practical story on training an audio LLM with a limited budget using the example of Borealis.	Aleksandr Nikolić	Independent Researcher
15:40	How we solve optimization problems at Yandex Lavka with uplift models	We will describe how Yandex Lavka uses uplift models not only to estimate the effect of individual treatments, but also to solve business optimization problems: from personalized discounts and delivery discounts to showing product collections. We will briefly cover how the uplift problem is formulated, which metrics help evaluate models, and how to assemble a full policy from predicted effects: who gets which treatment, and when. We will separately discuss practical techniques for constrained problems: a Lagrangian for balancing the target and constraining metrics, uplift trees with a splitting criterion based on the trade-off between metrics, and quality evaluation of both individual models and the resulting policies. Using examples from Lavka, we will show where these approaches work, which mistakes are most common, and how to avoid them.	Vyacheslav Kostrov, ML Engineer	Yandex

16:10-16:40 — Break ☕

CV & GenCV

Timing	Talk Title		Speaker	Company
16:40	Archive search: how we are moving toward meaning-aware text recognition	Archival document recognition has existed at Yandex for a long time: we transcribe difficult handwritten texts so that people can search for information about their relatives. The boom in large multimodal models has allowed us to move from plain transcription to extracting semantic information. I will talk about two important milestones in our work: a new text recognition architecture and structure extraction. Our innovations make archive search more human, letting you search not for words within text, but for a person among people!	Darya Vinogradova, Computer Vision Team Lead	Yandex
17:10	Real-time video generation: where we are and what comes next	Over the past year, video generation has made a leap from research prototypes to the first real product experiences. But how close are we, really, to real-time generation? In this talk, I’ll break down the current landscape of video generation: which architectures define the state of the art today, and how they fundamentally differ from one another. The main focus will be on the challenges that emerge when you try to push video generation toward real time, from compute constraints and memory bottlenecks to the architectural trade-offs that become unavoidable. I’ll also cover practical solutions used in production systems, both at the model level (distillation, caching, scheduling) and at the inference level. I’ll conclude with my perspective on where the field is heading over the next 2–3 years: what will become possible, what will remain unsolved, and which bottlenecks the industry is still largely ignoring.	Andrey Filatov, Member of Technical Staff	KREA AI

17:40 — Closing — Kirill Vlasov (PO AI Studio, Yandex Cloud) and Daniil Tkachenko (Head of ML, Yandex Lavka)

17:40-21:00 — Afterparty 🎉
