
DataFest — Belgrade, May 31

FON, University of Belgrade


12:00 — Guest Registration & Welcome

12:45 — Opening Remarks · Salavat Gariffullin, ODS Serbia

12:50 — Welcome Address

Prof. Dragan Vukmirović, PhD (FON University)
From Big Data to AI-native: 7V, Synthetic Data and the New Role of Data Science in Industry

13:00 — Stage 1: Agents & LLMs · Stage 2: RecSys ML

14:40 — Coffee Break

15:10 — Stage 1: Robotics & CV · Stage 2: Banking Language Models

16:25 — Lunch Break

17:25 — Stage 1: Data Quality · Stage 2: Voice ML

19:05 — Closing Remarks · Salavat Gariffullin, ODS Serbia

19:15 — After-party 🎉



Networking Lounge · Library

Open for networking throughout the event, especially during breaks.



🏛 Stage 1 · Amphitheater 1

Agents & LLMs

Time | Speaker | Company | Talk | Description
13:00 | Ivan Bushmarinov | Perplexity | User-Guided LLM Answer Quality Evaluation | Leveraging thread-style user feedback and small trained models to evaluate frontier LLM answers and enable scalable benchmarking.
13:25 | Ksenija Blažević | Lemana Tech | How not to build agentic AI: 4 (very common) anti-patterns and what to do instead | Common anti-patterns in agentic AI and how to replace them with leaner, cost-efficient architectures.
13:50 | Dmitrii Krasnov | Zencoder | Orchestrating Coding Models | Comparison of sequential, parallel, and OSS orchestration for coding models and their impact on SWE-bench-like benchmarks.
14:15 | Michael Diskin | HSE University | When Models Should Stay Silent | Measuring model uncertainty, calibrating confidence, and implementing rejection mechanisms for more reliable LLM systems in production.

14:40–15:10 — Coffee Break

Robotics & CV

Time | Speaker | Company | Talk | Description
15:10 | Fedor Kurdov | Yandex | RL for Real-World Robot Motion Planning | How RL (without imitation learning) was built from scratch and deployed for motion planning in Yandex's sidewalk delivery rovers.
15:35 | Dmitrii Iunovidov | LogicYield | Making Industrial CV Fly on Edge CPUs: A Neuro-Symbolic Benchmark for Dense Instance Segmentation | Running industrial computer vision on edge CPUs in harsh factory conditions using inference optimization and neuro-symbolic methods.
16:00 | Aleksey Postnikov | Sber Robotics Lab | Physical AI: Status and the Road Ahead | Broad overview of Physical AI: synthetic data, sim-to-real, RL over behavior cloning, and learning policies from human videos.

16:25–17:25 — Lunch Break

Data Quality

Time | Speaker | Company | Talk | Description
17:25 | Oleg Sekachev | Yandex | Agent for Data Labelling. LLM with Hammer and Ruler | Quality data labelling: faster and cheaper than humans, simpler and more accurate than a bare LLM.
17:50 | Anastasiia Margolina | Banco Plata | How we (didn't) build an AutoEval | A story about evaluating AI when the answers are about real money, and how the autoeval we thought would be a single prompt turned into a methodology.
18:15 | Stefan Hačko | Foursquare | LLM-Powered Harmonization of 100M+ Places | How Foursquare uses LLMs and vector embeddings to clean, match, and unify massive third-party venue datasets at scale.
18:40 | Alexey Korotkov & Timofey Garaev | MIPT AI Institute | SHARP: Span-level Hallucination Annotation for Reasoning Paths | New span-level dataset for hallucination detection in LLM reasoning paths and why it yields better downstream quality for PRM models.


🏛 Stage 2 · Amphitheater 2

RecSys ML

Time | Speaker | Company | Talk | Description
13:00 | Alexander Eroshenko | Yandex | LLM-Powered Item-to-Item Recs in Lavka | Practical case of deploying a compact LLM (Gemma ~270M) for item-to-item recommendations of substitutes and complements.
13:25 | Vladimir Kukushkin | Independent Researcher | Beyond Funnels: Advanced UX Analytics | How to study user behavior deeper than traditional funnels using advanced UX analytics tools and user journey analysis.
13:50 | Alexey Vasilev | Sber AI Lab | SplitLight: RecSys Evaluation Toolkit | Open-source toolkit for analyzing datasets and split strategies in RecSys to make offline evaluation transparent and reproducible.
14:15 | Nikita Severin | Independent Researcher | Knowledge Transfer from Pre-trained LLMs to Recommender Models | Efficient knowledge transfer from pre-trained LLMs to recommender models without costly serving-time inference or architectural changes.

14:40–15:10 — Coffee Break

Banking Language Models

Time | Speaker | Company | Talk | Description
15:10 | Boris Tseitlin | Banco Plata | Learning from Unstructured Sequences in 2026 | Overview of self-supervised and foundation-model approaches to embeddings from transactions, events, and other unstructured sequences.
15:35 | Mikhail Sysoev | Banco Plata | PV Models in Retail Lending | PV models and approaches to optimizing product parameters in card-based fintech products.
16:00 | Victor Barbarich | Banco Plata | Transformers Replace Feature Engineering in Scoring | Moving from manual feature engineering in credit scoring to transformers that learn directly from raw account and employment histories.

16:25–17:25 — Lunch Break

Voice ML

Time | Speaker | Company | Talk | Description
17:25 | Ilya Shigabeev | Langswap.app | AI video dubbing with open source: How we’ve built speech-to-speech pipeline and what we’ve learned from it | Challenges in AI dubbing: different text lengths after translation, preserving pauses from the source video, accent issues, and dependency hell, with a walkthrough of the open-source pipeline.
17:50 | Pavel Mazaev | Yandex | Device-Directed Speech Detection for Alice | Production system for detecting speech directed at a smart device (Yandex Alice) to enable natural dialogue without a constant wake word.
18:15 | Fedor Konovalenko | MIL Team (MIPT) | From Model Compression to Local Inference Platform | How a model compression tool evolved into a local GenAI inference platform with OpenAI-compatible API, multi-engine support, and observability.
18:40 | Pavel Guliaev | Independent Researcher | Video2Text: Industry State & Practical Choices | State of the Video2Text industry: what works, current limitations, and how to pick and adapt solutions for production load and budget.
