Data Fest 2026 | Belgrade, May 31, Offline Day

Ended 3 months ago

DataFest — Belgrade, May 31

FON, University of Belgrade

12:00 — Guest Registration & Welcome

12:45 — Opening Remarks · Salavat Gariffullin, ODS Serbia

12:50 — Welcome Address

Prof. Dragan Vukmirović, PhD (FON University)
From Big Data to AI-native: 7V, Synthetic Data and the New Role of Data Science in Industry

13:00 — Stage 1: Agents & LLMs · Stage 2: RecSys ML

14:40 — Coffee Break

15:10 — Stage 1: Robotics & CV · Stage 2: Banking Language Models

16:25 — Lunch Break

17:25 — Stage 1: Data Quality · Stage 2: Voice ML

19:05 — Closing Remarks · Salavat Gariffullin, ODS Serbia

19:15 — After-party 🎉

Networking Lounge · Library

Open for networking throughout the event, especially during breaks.

🏛 Stage 1 · Amphitheater 1

Agents & LLMs

Time	Speaker	Company	Talk	Description
13:00	Ivan Bushmarinov	Perplexity	User-Guided LLM Answer Quality Evaluation	Leveraging thread-style user feedback and small trained models to evaluate frontier LLM answers and enable scalable benchmarking.
13:25	Ksenija Blažević	Lemana Tech	How not to build agentic AI: 4 (very common) anti-patterns and what to do instead	Common anti-patterns in agentic AI and how to replace them with leaner, cost-efficient architectures.
13:50	Dmitrii Krasnov	Zencoder	Orchestrating Coding Models	Comparison of sequential, parallel, and OSS orchestration for coding models and their impact on SWE-bench-like benchmarks.
14:15	Michael Diskin	HSE University	When Models Should Stay Silent	Measuring model uncertainty, calibrating confidence, and implementing rejection mechanisms for more reliable LLM systems in production.

14:40–15:10 — Coffee Break

Robotics & CV

Time	Speaker	Company	Talk	Description
15:10	Fedor Kurdov	Yandex	RL for Real-World Robot Motion Planning	How RL (without imitation learning) was built from scratch and deployed for motion planning in Yandex's sidewalk delivery rovers.
15:35	Dmitrii Iunovidov	LogicYield	Making Industrial CV Fly on Edge CPUs: A Neuro-Symbolic Benchmark for Dense Instance Segmentation	Running industrial computer vision on edge CPUs in harsh factory conditions using inference optimization and neuro-symbolic methods.
16:00	Aleksey Postnikov	Sber Robotics Lab	Physical AI: Status and the Road Ahead	Broad overview of Physical AI: synthetic data, sim-to-real, RL over behavior cloning, and learning policies from human videos.

16:25–17:25 — Lunch Break

Data Quality

Time	Speaker	Company	Talk	Description
17:25	Oleg Sekachev	Yandex	Agent for Data Labelling. LLM with Hammer and Ruler	Quality data labelling — faster and cheaper than humans, simpler and more accurate than a bare LLM.
17:50	Anastasiia Margolina	Banco Plata	How we (didn't) build an AutoEval	A story about evaluating AI when the answers are about real money — and how the autoeval we thought would be a single prompt turned into a methodology.
18:15	Stefan Hačko	Foursquare	LLM-Powered Harmonization of 100M+ Places	How Foursquare uses LLMs and vector embeddings to clean, match, and unify massive third-party venue datasets at scale.
18:40	Alexey Korotkov & Timofey Garaev	MIPT AI Institute	SHARP: Span-level Hallucination Annotation for Reasoning Paths	New span-level dataset for hallucination detection in LLM reasoning paths and why it yields better downstream quality for PRM models.

🏛 Stage 2 · Amphitheater 2

RecSys ML

Time	Speaker	Company	Talk	Description
13:00	Alexander Eroshenko	Yandex	LLM-Powered Item-to-Item Recs in Lavka	Practical case of deploying a compact LLM (Gemma ~270M) for item-to-item recommendations of substitutes and complements.
13:25	Vladimir Kukushkin	Independent Researcher	Beyond Funnels: Advanced UX Analytics	How to study user behavior deeper than traditional funnels using advanced UX analytics tools and user journey analysis.
13:50	Alexey Vasilev	Sber AI Lab	SplitLight: RecSys Evaluation Toolkit	Open-source toolkit for analyzing datasets and split strategies in RecSys to make offline evaluation transparent and reproducible.
14:15	Nikita Severin	Independent Researcher	Knowledge Transfer from Pre-trained LLMs to Recommender Models	Efficient knowledge transfer from pre-trained LLMs to recommender models without costly serving-time inference or architectural changes.

14:40–15:10 — Coffee Break

Banking Language Models

Time	Speaker	Company	Talk	Description
15:10	Boris Tseitlin	Banco Plata	Learning from Unstructured Sequences in 2026	Overview of self-supervised and foundation-model approaches to embeddings from transactions, events, and other unstructured sequences.
15:35	Mikhail Sysoev	Banco Plata	PV Models in Retail Lending	PV models and approaches to optimizing product parameters in card-based fintech products.
16:00	Victor Barbarich	Banco Plata	Transformers Replace Feature Engineering in Scoring	Moving from manual feature engineering in credit scoring to transformers that learn directly from raw account and employment histories.

16:25–17:25 — Lunch Break

Voice ML

Time	Speaker	Company	Talk	Description
17:25	Ilya Shigabeev	Langswap.app	AI video dubbing with open source — How we’ve built speech-to-speech pipeline and what we’ve learned from it	Challenges in AI dubbing: different text lengths after translation, preserving pauses from the source video, accent issues, and dependency hell — with a walkthrough of the open-source pipeline.
17:50	Pavel Mazaev	Yandex	Device-Directed Speech Detection for Alice	Production system for detecting speech directed at a smart device (Yandex Alice) to enable natural dialogue without a constant wake word.
18:15	Fedor Konovalenko	MIL Team (MIPT)	From Model Compression to Local Inference Platform	How a model compression tool evolved into a local GenAI inference platform with OpenAI-compatible API, multi-engine support, and observability.
18:40	Pavel Guliaev	Independent Researcher	Video2Text: Industry State & Practical Choices	State of the Video2Text industry — what works, current limitations, and how to pick and adapt solutions for production load and budget.

Our website uses cookies, including web analytics services. By using the website, you consent to the processing of personal data using cookies. You can find out more about the processing of personal data in the Privacy policy

Learn More