These are, as of today (22 March 2026), the confirmed talks and workshops at PyCon Austria 2026.
The conference schedule will follow soon.
Linuxwochen Eisenstadt will take place on 18 and 19 April 2026 at the University of Applied Sciences Eisenstadt, our conference venue.
Changes may occur.
| name | title / description | bio |
|---|---|---|
| Mingxuan Zhao | **Workshop: Learn to Unlock Document Intelligence with Open-Source AI.** Most organizational knowledge is still locked inside complex documents, making it difficult to extract and use the information effectively. Traditional tools often fail when working with real-world document formats, particularly PDFs: tables lose their structure, figures get separated from captions, and multi-column layouts become unreadable text. These failures make it difficult to bring AI to document-heavy workflows. This workshop gives you hands-on experience with Docling, an open-source Python library that takes a different approach, using deep learning models to parse documents the way humans read them. It preserves hierarchy, extracts structured data through a consistent API, and supports 15+ file formats out of the box. All of Docling is MIT-licensed and supports fully local execution, allowing you to keep sensitive data on-premise while delivering low-latency processing and ingestion. You'll build a complete document intelligence pipeline from the ground up, working through three progressive modules: first, converting documents and exploring Docling's enrichment features such as table detection and image classification; second, chunking strategies that preserve document semantics for retrieval; and finally, building on those components, a multimodal RAG pipeline with visual grounding, creating an application that can cite the exact page and location where it found an answer. No prior experience with Docling is required. Colab notebooks with hosted model endpoints will be provided, so you can follow along with just a browser. Attendees who prefer local execution should have Jupyter Notebook installed and the ability to download models from Hugging Face. Bring your own documents to experiment with, or use the samples provided. | Ming Zhao is an open source developer and Developer Advocate at IBM Research, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers have the tools they need to succeed in the rapidly developing AI space. Ming now leads community efforts around Docling, IBM’s fastest-growing open source project, recently welcomed into the LF AI & Data Foundation. |
| Shahar Shporer | **Let Developers Write Python, Not YAML.** GitOps promises simplicity, but for many Python developers, infrastructure still lives in a different world: full of YAML, cloud jargon, and tools they never signed up to learn. At scale, this disconnect slows teams down and turns platform engineers into full-time translators. In this talk, I’ll show how building internal Python SDKs transformed that relationship. Instead of asking developers to write YAML or understand Terraform, we exposed infrastructure and GitOps workflows as idiomatic Python libraries, with clear APIs, semantic versioning, type hints, and CI/CD integration. Developers could provision resources and deploy services using tools and patterns they already knew, while platform teams kept control, safety, and GitOps guarantees. You’ll learn how internal SDKs can abstract infrastructure complexity without hiding important behavior; feel like “normal Python code,” not infrastructure tooling; integrate cleanly into GitHub Actions and existing developer workflows; and improve developer confidence, velocity, and overall experience. This talk is for Python developers who want better ways to interact with infrastructure, and for platform engineers who want GitOps adoption without forcing everyone to “learn YAML first.” If you’ve ever heard “I don’t speak YAML,” this talk will show how Python SDKs can become the missing bridge between GitOps and real-world developer experience. | Shahar Shporer is a platform architect with experience building secure, scalable infrastructure and developer-friendly CI/CD workflows across multi-cloud environments. She brings expertise in Python, Infrastructure as Code, test-driven DevOps practices, and automated solutions. Shahar thrives on translating complex engineering challenges into elegant, developer-friendly DevOps solutions that foster productivity and organizational growth. Outside the world of pipelines and platforms, Shahar is also a certified paramedical tattoo artist, creating 3D areola reconstruction tattoos and helping breast cancer survivors reclaim their bodies with dignity and art. |
| Marcel-Jan Krijgsman | **Building a data lakehouse in the European cloud.** It's 2026, and all of a sudden your regular solution for building a data pipeline in Azure or AWS seems to have gone out of favour pretty fast. We want to store our data outside the reach of the next autocrat. But what are the alternatives? In this session we'll discuss how you can build a data lakehouse in the European cloud. And we want more: we want PySpark, notebooks, and data visualization. Is that all possible? For this data lakehouse solution we start with Kubernetes and object storage; you'll be surprised how many European cloud providers offer these products. We'll use Nessie as the catalog and Trino as the query engine. With Iceberg, our open table format, we can already create our first table. Next we'll run JupyterHub for shared notebooks, so we can cooperate on writing PySpark code. Great, but can we still use (local) Power BI for data visualisation? Yes, but that actually turns out to be a bit harder and, for some reason, expensive. We'll look at the alternatives for that. This presentation is also suitable for visitors who are not data engineers or who have little knowledge of Kubernetes. | Marcel-Jan Krijgsman is a senior data engineer with 25 years of experience in data. He learned Python when switching his career to data engineering, and has used it to plot locations from cycling videos on a map, to land rockets in a computer game, and to automatically categorise space and astronomy news. |
| Matúš Ferech | **Front-end for Pythonistas.** In this talk, I will introduce htmx, a tiny library that makes server-rendered websites feel dynamic without building a JavaScript-heavy front end. It plays nicely with Python frameworks like Flask, Django, and others that use server-side rendering. If you want to add interactive elements to your website without rewriting your app in a front-end framework, this library might be just for you! | Software engineer interested in systems programming, security, and privacy. |
| Konrad Gawda | **Logging module adventures.** The logging module seems a little odd. If you would like to understand its logic, join me on my journey into the depths of Python logging. I will share learnings from my own adventure, driven by curiosity and the need to add context to the logs of long-running tasks. | Cloud evangelist, Python developer, and trainer. Host of the "Porozmawiajmy o chmurze" ("Let's talk about the cloud") videocast. Author of patents (at Orange R&D), experienced with Telco Cloud deployment and public IaaS cloud automation. Linux and open source believer. |
| Adrián Raso | **Reproducing Experiments in Python: What Went Wrong?** Python is one of the most prominent languages in the modern scientific computing stack. The main reason for its relevance in scientific workflows is its accessibility: Python has an extremely gentle learning curve and often presents syntax and semantics that closely resemble the formulation of algorithms, making code easier to read and reason about. In addition, Python offers one of the largest ecosystems of scientific resources available today, ranging from numerical libraries such as NumPy and SciPy, to data formats and processing tools like PyArrow and Polars, and, more recently, full deep learning ecosystems such as PyTorch and TensorFlow. However, despite its central role in scientific practice, the experience of using Python in research settings is not particularly friendly to the scientific method. Many factors in a Python project can alter results, undermine established methods, or even render them unusable, either over time (sometimes a short time), as new versions of libraries and dependencies are released, or due to inherently fragile design choices in the surrounding ecosystem. This talk presents a short historical tour of Python, focusing on a series of episodes that have led the language to require such careful supervision by scientists. We will revisit its origins and its attempts to replace earlier scientific languages, examine how its reliance on external numerical libraries shaped its behavior, and explore how operating system policies, parallelism, and hardware acceleration introduced additional sources of variability. By looking at concrete examples and real-world anecdotes, the talk shows that many reproducibility issues in Python do not arise from user mistakes, but from historical trade-offs and system-level decisions that are often invisible to practitioners. Stories around numerical nondeterminism, versioning, BLAS compatibility issues, and dependency resolution problems will be narrated and presented as a hands-on experience. The goal is to understand the nature of these difficulties and to provide context and mental models that help scientists get a better grasp of the limits and workarounds of reproducibility in Python workflows. | I'm a Data Science MSc student at TU Wien with a background in Mathematics and Natural Language Processing. Currently, my main interests are information theory, statistics, ML, and open-source development. I'm a contributor to the SciPy library, particularly to the statistics module, and I interact with the research community through venues like the International Conference on Statistics and Data Science (ICSDS). |
| Artem Sentsov | **The AI Blind Spot: Why Your Vector Search Needs Classic Python Algorithms.** In 2026, it is tempting to think Large Language Models and vector databases have completely solved text analytics. Just embed your text, run a cosine similarity check, and you’re done, right? Not quite. While vectors are incredible at understanding meaning, they have a massive blind spot: they are terrible at exactness. If you need to match specific product SKUs, catch slight typos in usernames, or prevent an LLM from hallucinating an ID number, modern AI will often fail where a 60-year-old math algorithm succeeds. In this beginner-friendly talk, we will explore this "Vector Blind Spot." You will learn why classic string metrics like Levenshtein and Jaro-Winkler are more critical than ever, and how to implement them using blazing-fast, modern Python libraries like RapidFuzz. | Data Science professional with 8+ years of experience applying graph databases to solve complex business challenges. As Co-Founder and CTO of ClearPic.ai, I develop tools that map business relationships in Central Asia and the Caspian region, helping clients uncover hidden connections and compliance risks. My background combines data science with practical financial intelligence, from Deloitte and PwC to leading R&D at Urus Advisory, where I specialized in high-risk market analysis. My passion is making complex data speak through network visualization. |
| Stefanie Molin | **(Pre-)Commit to Better Code.** **Abstract:** Maintaining code quality can be challenging, no matter the size of your project or number of contributors. Different team members may have different opinions on code styling and preferences for code structure, while solo contributors might find themselves spending a considerable amount of time making sure the code conforms to accepted conventions. However, manually inspecting and fixing issues in files is both tedious and error-prone; computers are much better suited to this task than humans. Pre-commit hooks are a great way to have a computer handle this for you. Pre-commit hooks are code checks that run whenever you attempt to commit your changes with Git. They can detect and, in some cases, automatically correct code-quality issues *before* they make it to your codebase. In this tutorial, you will learn how to install and configure pre-commit hooks for your repository to ensure that only code that passes your checks makes it into your codebase. We will also explore how to build custom pre-commit hooks for novel use cases. **Section 1 (Setting Up Pre-Commit Hooks):** After laying the foundation with an overview of Git hooks, we will discuss the use cases for hooks at the pre-commit stage (called pre-commit hooks), as well as a high-level explanation of how to set them up without any external tools. We will then introduce the `pre-commit` tool and disambiguate it from pre-commit hooks, before commencing a detailed walkthrough of the pre-commit hooks setup process when using `pre-commit`. **Section 2 (Creating a Pre-Commit Hook):** While there are a lot of pre-made hooks in existence, sometimes they aren't sufficient for the task at hand. In this section, we will walk step by step through the process of creating and distributing a custom hook. After wiring everything up, we will discuss best practices for sharing, documenting, testing, and maintaining the codebase. **Audience and prerequisites:** This tutorial is for anyone with intermediate knowledge of Python and basic knowledge of Git. You must be comfortable writing Python code and working with Git on the command line using basic commands (`git clone`, `git add`, `git status`, `git commit`, `git push`). Attendees should have Python and Git installed on their computers, as well as a text editor for writing code (e.g., Visual Studio Code). | [Stefanie Molin](https://stefaniemolin.com) is a software engineer at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also a core developer of [numpydoc](https://github.com/numpy/numpydoc) and the author of “[Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization](https://www.amazon.com/Hands-Data-Analysis-Pandas-visualization/dp/1800563450),” which is currently in its second edition and has been translated into Korean and Chinese. She holds a bachelor’s of science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers. |
| Haim Michael | **Python Decorators: From Syntax Sugar to Production-Grade Design Tool.** Decorators are one of Python’s most powerful and often misunderstood features. Beyond simple logging examples, decorators enable clean separation of concerns, cross-cutting behavior injection, and framework-level extensibility. In this talk, we will build a precise mental model of how decorators work at runtime. We will move from basic function decorators to parameterized decorators, class decorators, and decorator factories. Real production use cases will be demonstrated. We will conclude with an overview of best practices to consider when developing new decorators. | Haim Michael is a software development trainer, entrepreneur, and lecturer with nearly 30 years of experience. He founded life michael (lifemichael.com), delivering professional training in Java, Python, JavaScript, Scala, Kotlin, and more. Haim has lectured at leading universities, including Bar-Ilan, HIT, Shenkar, and Technion, and has trained developers at top tech companies. |
| Haim Michael | **The Combinator Pattern: Elegant Composition for Modern Python Developers.** Functional programming encourages us to think in terms of composition: building complex logic from small, pure building blocks. In this talk, we’ll explore how the Combinator Design Pattern helps us achieve exactly that. Using Python, we’ll go beyond theoretical definitions and implement combinators step by step — starting from simple primitives and evolving them into elegant, reusable, and type-safe abstractions. Along the way, we’ll analyze how combinators enhance readability, maintainability, and alignment with the Single Responsibility Principle, while offering a modern alternative to conditional logic and inheritance-based designs. By the end, participants will understand not only how to implement combinators but also how to think compositionally in Python. The key takeaways are: understand the Combinator Pattern (its conceptual foundation and its role in functional programming and software design); practical implementation in Python (building combinators step by step, leveraging lambdas, higher-order functions, and immutability); achieving clean and reusable code (how combinators promote clarity, separation of concerns, and compliance with SOLID principles); and adopting a compositional mindset (concrete insights on writing expressive, declarative Python code that scales elegantly across real-world projects). | Haim Michael is a software development trainer, entrepreneur, and lecturer with nearly 30 years of experience. He founded life michael (lifemichael.com), delivering professional training in Java, Python, JavaScript, Scala, Kotlin, and more. Haim has lectured at leading universities, including Bar-Ilan, HIT, Shenkar, and Technion, and has trained developers at top tech companies. |
| Manuel Alejandro Ledezma Falcon | **How to Automate Tests for LLMs That Never Answer the Same Way Twice.** Large Language Models rarely give the same answer twice, even when the meaning is exactly the same. In our case, this became a real production problem: automated tests kept failing, not because the model was wrong, but because it expressed the same idea using different words, synonyms, or sentence structures. Traditional assertions are built for deterministic systems. LLMs are not. The session focuses on practical lessons learned from real-world usage, how to test LLMs that paraphrase by design, and how semantic evaluation can turn flaky tests into reliable signals. | Manuel Ledezma, known in the tech community as Tester Testarudo, is a software testing and automation specialist with a strong commitment to delivering high-quality and reliable software. Over the past years, he has focused on mastering QA practices and automation strategies, working in agile and fast-paced environments. He has contributed to leading companies such as Mediktor, AXA, Telecom Argentina, Newfold, and Mojo Marketplace, where he implemented scalable testing solutions that improved product stability and user experience. Manuel currently serves as the QA Automation Lead at Mediktor in Barcelona, Spain, where he leads automation initiatives to ensure robust and impactful digital products. Beyond his professional work, Manuel empowers the QA community through Tester Testarudo, his educational project dedicated to helping newcomers learn testing in a clear, practical, and accessible way. |
| John Rooney | **How Python Powers Data Extraction: Scrapy in Production.** Everyone's written a scraper, but far fewer people have kept one running reliably for months. This talk bridges that gap, taking you from "it works on my machine" to a production data extraction system built with Scrapy, Python's most battle-tested scraping framework. We'll cover four parts of running Scrapy in production: scheduling your spiders reliably, monitoring for failures before your data pipeline goes silent, scaling up, and wiring your output into a real data pipeline. If you've used Python and are curious how serious data extraction systems are built, this talk is for you. Attendees will take away: a mental model for thinking about scrapers as production services; an overview of scheduling options (cron, Scrapyd, cloud schedulers); how to detect silent failures and slow spiders before they become a problem; and where scraped data goes next — storage, pipelines, and downstream use, including powering RAG and AI systems. | John is a self-taught Python developer and web scraping professional who has been sharing data extraction content and help for the last 6 years via his own YouTube channel and now at Zyte. |
| Albert Dorador | **The "Flicker Effect": Why Your Model Audits Are Lying to You.** Have you ever estimated feature importance in scikit-learn, changed the `random_state`, and watched your "Top 5" features swap places? This is the "Flicker Effect." For most Python developers, "shuffling" data (permutation importance) is the industry standard for explaining models. But in high-stakes environments like banking or healthcare, stochastic results are a liability. If you can’t get the same answer twice, can you really trust the audit? In this talk, we move beyond "random shuffling" toward deterministic model auditing. We will explore: a beginner-friendly introduction to the math of stability (how a "single optimal permutation" makes model explanations 100% reproducible and 30x faster); the proxy problem (how models "sneak in" biased data, like gender or race, through proxy variables, and how to detect this "signal leakage" using Systemic Variable Importance, SVI); and forensic-grade AI (how to move from "black-box" guesses to audits that hold up under regulatory scrutiny). Whether you are a data scientist building models or a developer curious about AI fairness, you will leave with a new framework for making your Python models truly accountable. | Albert Dorador is an Adjunct Professor of Statistics (BarcelonaTech) and Mathematics (Pompeu Fabra). He holds a PhD in Statistics from the University of Wisconsin–Madison and previously served at the European Central Bank, specializing in financial risk management and algorithmic auditing. Albert is the creator of the TRUST and Renet algorithms, among others, focusing on the intersection of high-performance optimization and auditable, "human-scale" machine learning. His work centers on solving the "Interpretability Gap" in high-stakes regulatory environments, moving the industry toward deterministic and forensic-grade AI transparency. |
| Gabor Szarnyas | **Why would you "import duckdb" in your Python project?** DuckDB is an in-process database that can be imported as a library in all popular programming languages. Of course, that includes Python too – about 3/4 of DuckDB's user base imports its Python client. But why would you use a database inside your Python process? First, DuckDB brings all the benefits of databases, including persistent storage, query optimization, and transaction handling, without the hassle of setting up a database server. Second, DuckDB's Python client can seamlessly interact with other libraries such as Pandas, Polars, and NumPy, as well as with notebooks. Its Arrow-based deep integrations and Pythonic API allow you to adopt DuckDB gradually in a Python project, from eliminating performance choke points to running your entire workload in DuckDB. In this talk, I give a brief overview of DuckDB and demonstrate how you can use it to modernize your Python codebase. | Gábor Szárnyas is a Developer Relations Advocate at DuckDB Labs. He obtained his PhD in software engineering in 2019 and spent 3 years as a post-doctoral researcher at CWI in Amsterdam, working on graph data management techniques. |
| Tabish Mazhari | **Stop Guessing: Finding and Fixing Python Performance Bottlenecks.** Python applications often become slow for reasons that are not immediately obvious. Developers frequently rely on intuition or trial-and-error when optimizing performance, which can lead to wasted effort and ineffective solutions. In this talk, we will explore a practical workflow for identifying and fixing real performance bottlenecks in Python services. Using a live demo of a deliberately inefficient recommendation API built with Python and FastAPI, we will investigate common performance problems such as inefficient algorithms, excessive database queries, blocking I/O, and missing caching strategies. Through profiling tools such as cProfile and py-spy, we will identify the true bottlenecks and apply targeted optimizations including algorithm improvements, query batching, caching with Redis, and asynchronous concurrency. By the end of the session, attendees will learn a systematic approach to diagnosing and improving the performance of Python applications, moving from guesswork to data-driven optimization. | Tabish Mazhari is a Senior Software Engineer at Red Hat with nine years of experience building backend systems and scalable developer platforms. His work focuses on designing workflows, building intelligent agents, and improving system reliability. His interests include performance optimization, automation, and practical engineering approaches that help teams build faster and more reliable Python services. |
| Stefan Trenkwalder | **AI-driven Software Engineering: TDD Guardrails for the Age of Vibe Programming.** AI coding tools are transforming how engineering and data science teams work — but speed without structure creates technical debt, regressions, and code that's hard to trust. This workshop presents an alternative: a TDD-based guardrail framework for AI-assisted development that gives teams a practical workflow for maintaining code quality and reliability. Participants will learn to write tests that define intent before prompting, use the red-green-refactor loop to guide and validate AI output, and catch errors early. You'll leave with a repeatable approach that makes AI a reliable collaborator — whether you're building data pipelines, APIs, or analytical models. | Dr Stefan Trenkwalder is a Senior Software Engineer and advocate for software craftsmanship, with 15+ years of experience shipping production Python across fintech, automotive, and embedded systems. Stefan has introduced TDD, Extreme Programming, and trunk-based development into teams that had none, turning ad-hoc codebases into systems that are measurably more reliable and easier to maintain. He believes that good engineering practices — not just good intentions — are what separate software that lasts from software that doesn't. He holds a PhD in Robotics from the University of Sheffield and has taught software development at university level. He now brings that same rigour to hands-on workshops for working developers. |
| Florian Haas | **Open edX: the "other" open source LMS.** In Europe, most people think of Moodle when they think about open-source learning management systems. However, there's another! Open edX has been a solid, Python-based LMS for more than a decade, and it has an interesting past, present, and future. Having worked with Open edX since 2015, I help run multiple Open edX platforms, develop courseware on Open edX, and am an active contributing community member. This talk explains Open edX, its architecture, and its community. | I run Education and Documentation at Cleura, a Swedish cloud service provider. I am an active member of the Open edX, Ceph, and OpenStack communities. |
| Linda Kolb | **Ready, set, publish - Write your first Python Package.** This session walks through the full journey of creating your first Python package using Poetry, from project setup to publishing. We will explore how to write and structure tests, use pre-commit hooks to automatically format and lint your code, and run everything inside reproducible Dev Containers. | Linda has several years of experience using Python across automation, data science, and modern tooling. If she is not busy building data flows, she is flowing on the yoga mat. |
| Michael Seifert | **Making sense of concurrency in Python 3.14.** Async/await, threads, subinterpreters, and multiprocessing — do you know when to use which? Python 3.14 made subinterpreters available in the standard library and marked free-threaded Python as officially supported. This gives us a wider choice of concurrency mechanisms — but also more tradeoffs to consider. This talk develops a mental model in which the differences between Python’s concurrency mechanisms become apparent. Attendees will learn how to reason about async/await, threads, subinterpreters, and multiprocessing. They will be able to assess which approach to pick for a given problem, and how to combine concurrency models effectively in Python applications. | Michael is a trainer and consulting software engineer who helps product teams develop Python software in the cloud. He enjoys deleting code more than writing it and is constantly looking for new ways to improve developer experience and the maintainability of software. Michael has been enthusiastic about free and open-source software since his teenage years and published his first project in 2006. Nowadays, he maintains the pytest-asyncio library. In his free time, Michael dances Shuffle or struggles with a hardware project. |
| Michael Seifert | **Code organization for non-engineers.** Have you ever opened a piece of code that seems to break just by looking at it—and noticed that your coworker wrote it? You don’t want to be *that* person. While tangled, hard-to-maintain code can emerge for many reasons, it should never be by accident. In this hands-on workshop, you will learn how to make code easier to maintain and evolve. We will gradually refactor a messy Python application into well-organized, testable software. You will develop a mental model for organizing code effectively and understand how its structure impacts code quality. Ultimately, this will inform future decisions on design and code organization. This workshop is specifically designed for people who don't identify as software engineers or don't perform typical software engineering tasks as part of their daily work. Participants should be familiar with basic Python programming and the concept of automated (unit) tests. | Michael is a trainer and consulting software engineer who helps product teams develop Python software in the cloud. He enjoys deleting code more than writing it and is constantly looking for new ways to improve developer experience and the maintainability of software. Michael has been enthusiastic about free and open-source software since his teenage years and published his first project in 2006. Nowadays, he maintains the pytest-asyncio library. In his free time, Michael dances Shuffle or struggles with a hardware project. |
Johannes Werner |
Data validation with pointblankclick to see descriptionData validation is a crucial step in any data-centric project. It helps the data scientist understand which aspects to focus on during data cleaning. Furthermore, building and executing machine learning models, as well as performing subsequent data analysis, on clean data substantially improves the results. This workshop guides participants through data validation in Python with pointblank. Essential validation patterns are demonstrated, with best practices for coding, configuration, environments, and versioning in mind. Data validation is demonstrated for both notebook environments and CLI applications, for instance when workflow management tools are used for production-ready code. Participants should gain experience with data validation and feel confident integrating this step into their daily data analysis. This workshop is addressed to intermediate Python developers and data scientists. A general understanding of Python fundamentals is expected. A general understanding of data science workflows, e.g. as outlined by Hadley Wickham, would be helpful. Version control, virtual environments, workflow management, and a general understanding of data frame manipulation are beneficial as well. |
click here to see bioI received my PhD in bioinformatics in 2014 from the Max Planck Institute of Microbiology, Bremen, and spent around 10 years at universities and research institutes in Germany focusing on microbiome analysis, cancer research, research data management, and cloud infrastructure. For the past four years, I have focused on consulting and on bioinformatic and biostatistical data analysis in early clinical trials. |
![]() Fabian Schindler |
Agents — What Do They Do?click to see descriptionAgents — what do they do? No, really, what do they do? Your agent fails mid-tool-call, returns garbage, burns your token budget — and your APM dashboard says everything's fine. Our users kept hitting this wall, so at Sentry we decided to solve it. I'll walk through the engineering decisions behind Sentry's open-source Agent Monitoring: why we landed on three span types — intent, reasoning, action — and how they plug into your existing tracing stack. We'll dig into the surprisingly tricky parts (token cost tracking that goes literally negative if you get it wrong), conversation tracking across agent invocations, and what it takes to instrument a Python agent. You'll walk out knowing how agents break in prod and how to catch it. |
click here to see bioBuilding observability tools |
![]() Gábor Mészáros |
From Imports to Innovation: The Dynamics Behind Python’s Evolutionclick to see descriptionWhat can millions of real Python code snippets tell us about how the language evolves? And why do the patterns we observe in Python look uncannily similar to patterns found in patents and scientific research — systems that seem to have nothing to do with software? This talk begins with a practical challenge: extracting structured signals from the chaotic world of Stack Overflow. We built a pipeline that scanned posts for Python code blocks, identified import statements, normalised package names, filtered noise, and reconstructed a time-ordered stream of collections, each composed of the packages used in that snippet. From this, we derived two simple indicators of innovation: • new packages appearing for the first time, and • new package pairs appearing together for the first time. Once these signals are extracted, a surprisingly coherent picture emerges. The Python ecosystem introduces brand-new packages less and less frequently over time, yet continues to generate new combinations of packages at a remarkably steady pace. Developers reuse familiar tools, but they also explore the space of possible pairings with a precision that looks — statistically — almost mechanical. To understand just how surprising this is, we compare Python’s behavior with two very different worlds. The first is the US patent system, where technology codes assigned to inventions can be analyzed the same way we analyze Python imports. A classic 2015 study by Youn et al. showed that while new technology codes appear at a slowing rate, pairs of codes accumulate almost linearly over two centuries of innovation. The second is a corpus of physics publications, which behaves in much the same way when one treats subject classification codes as ingredients. Across all three domains — software, science, and invention — the same pattern holds. 
Distinct components grow sublinearly (Heaps’ law), while distinct combinations grow close to linearly. This parallel is not only unexpected; it suggests that these systems share a deeper underlying mechanism, bound not by domain-specific details but by the very foundational patterns of human innovation. In the second half of the talk, we introduce the concept of the "adjacent possible" and demonstrate its modelling via a simple stochastic model: a Pólya urn extended with the adjacent possible. The model assumes only two forces: reinforcement of frequently used components and occasional introduction of new ones. Despite its simplicity, it reproduces the empirical behavior of all three systems without requiring domain-specific rules. It shows how a stable exploration–exploitation balance can arise naturally, leading to predictable rates of combinatorial novelty even in rapidly changing ecosystems. The framework offers a new way to think about the ecosystem: not as a chaotic swarm of libraries, but as an innovation system governed by universal constraints. It sheds light on why certain libraries become dominant, why the combination space grows the way it does, and how the community collectively expands the “adjacent possible” of the language. Attendees of the talk will learn: • how to extract meaningful innovation signals from real Python code at scale, • how to measure novelty and combinatorial creativity in software ecosystems, • why Python’s long-term evolution aligns with empirical laws from patents and science, • and how simple generative models can help reason about complex developer behavior. The talk connects engineering, data analysis, and innovation theory to reveal an unexpected insight: Python grows the way many creative systems grow — slowly at the edges, rapidly in combinations, and always under the quiet guidance of reinforcement and the adjacent possible. |
click here to see bioMathematician turned software engineer turned network scientist. I explore how structure and behavior emerge in complex systems, from code ecosystems to large graphs. Passionate about Python, powered mostly by coffee, and firmly in the tabs-over-spaces camp. |
![]() Thomas Aglassinger |
Multi-lingual advanced search in Django without Cloudclick to see descriptionDjango offers built-in support for classic full-text search (FTS), but sometimes it can be hard to understand the results. This approach also has limitations, some of which can be overcome with semantic search. This talk first explains the basics of full-text search, how the rank is computed, and why it is so fast compared to field lookups like `icontains`. Next, we take a look at performing a semantic search, where search terms are matched by similar meaning. For example, a cat is closer to a dog than to a house. For that, we use Ollama to vectorize texts with an embedding model, store the vectors in the database using pgvector, and retrieve them sorted by similarity. Finally, full-text and semantic search are combined into a hybrid search, which gives the best of both worlds. The talk also covers how to search languages other than English. All this can be done on a laptop without the need for cloud services or your data leaving your premises. This allows for data sovereignty and keeps your operational costs predictable. |
click here to see bioThomas Aglassinger is a software developer and founder of Siisurit. He has worked in multiple sectors such as finance, e-commerce, and health. He has designed and developed multiple applications with a search feature using technologies like Django and PostgreSQL, but also Java and Solr. He is a casual open-source developer and maintains a couple of PyPI packages such as pygount (count source code) and ebcdic (codecs for mainframes). In his free time, he likes to go on bike trips or play video games. |
![]() Vinayak Mehta |
Running Every Street in Paris with Python and PostGISclick to see descriptionIn 2006, Tom Murphy started a project to run every street in Pittsburgh (over 1,500 miles in total). He finished the project in 2022, covering 3,661 miles in 269 runs. In this talk, we'll look at how we can do the same in our cities and track our progress, with Paris as an example. We'll explore how to extract street networks from OpenStreetMap, process GPS tracking data from running activities, and build a system to track progress toward covering every street in a city. We'll dive into challenges like handling GPS inaccuracies, matching runs to streets, and maintaining a database of covered streets. This talk is aimed at Python developers interested in working with geospatial data using Python libraries like `osmnx`, `shapely`, and `geopandas`, and storing it for efficient querying in Postgres with PostGIS. |
click here to see bioWorking on open-source tools http://fleurmcp.com, camelot, present, excalibur, and many more. F20 @recursecenter. |
![]() Robina Mirbahar |
Hands-On: Building AI Applications in Pythonclick to see descriptionAI features are becoming common in Python applications, but many developers struggle to move from demos to real, maintainable code. In this hands-on workshop, participants will build a small AI-powered Python application step-by-step. The focus is on understanding how AI fits into a Python system: structuring inputs and outputs, integrating a language model, adding simple logic, and handling failure cases responsibly. Rather than relying on heavy frameworks or hidden abstractions, the workshop uses clear, minimal Python code to demonstrate patterns that participants can reuse in their own projects. A modern language model (such as Gemini) is used as an example, but the concepts apply to any AI-backed Python application. No prior AI or machine learning experience is required. |
click here to see bio💻 Multi-Cloud Architect | AWS | Azure | Google Cloud 🤖 Generative AI Specialist | LLMs | AI Content Generation | AI for Business 🎤 Tech Speaker & Mentor | Google Cloud Innovator Champion | Women Techmakers Ambassador 👩💻 Founder of SheCloud | Empowering women in tech through education & mentorship 🎨 AI Art & Design Enthusiast | AI-powered creative solutions 📚 Lifelong Learner | Cloud Native | Kubernetes | AI-Powered Automation |
![]() Charlie Lin |
Tying Up Loose Threads: Making your Project No-GIL Readyclick to see descriptionIf you have messed around with Python's command line options or read the official documentation, you might wonder what the -Xgil option or the PYTHON_GIL environment variable does to your scripts, and whether setting either affects performance. The hubbub on popular wheels such as pyo3, python-zstandard, numpy, uv, cffi, and cython supporting the free-threaded interpreter is no passing fad either. For Pythonistas who don't read PEPs in their spare time or contribute to the cpython project itself, an adventure that delves into a lesser-known, yet jaw-dropping aspect of Python awaits! Python's Global Interpreter Lock, which determines which single thread can execute Python bytecode and call C API functions, simplifies writing multithreaded code. However, sticking with this execution model leaves out extra performance afforded by modern multicore CPUs with hyperthreading, as automatic locking and unlocking of the GIL does not scale well with thread counts, especially in performance-sensitive workloads. The newfangled free-threaded interpreter promises salvation when running either pure Python code or compiled extensions. General multithreading rules apply (prefer thread-local variables, use locks to prevent simultaneous access to shared data), but when dealing with projects containing compiled extensions that directly or indirectly interface with Python's C API, more porting rules also apply. Key porting tips, which also apply to projects using the Limited API, include: porting native code away from C API functions that return borrowed references, because borrowed references aren't thread-safe; modifying unit tests to catch concurrency bugs arising from assuming the presence of the GIL; and extending CI coverage of Python interpreters both for testing and to build free-threaded-compatible wheels. Outline: * Introduction (2-3 min.) * What is the -Xgil option? * What is the GIL? * What is the free-threaded interpreter? 
(6-8 min.) * Global Interpreter Lock: downsides of automatic serialization of parallel workloads * How to try out the free-threaded interpreter * Increased parallelism with the no-GIL interpreter with multi-core CPUs * Porting tips (15-18 min) * Adding a trove classifier in pyproject.toml * Marking your extension module as supporting no-GIL * Limited API (and PEP 803) * Bumping key dependencies, including FFI wheels * Using locks, mutexes, and atomics in native code to prevent concurrency bugs * Including pytest-run-parallel to catch threading bugs * Closing Remarks (2 min.) * Q&A (2 min.) |
click here to see bioI am a freelance OSS contributor, and a graduate of a little-known private arts college known as Rollins College. I am rather fond of subjecting myself to testing bleeding-edge unstable software in the following ways: * Dual-booting Windows Insider Canary and Fedora Rawhide (this is where I do most of my development) * Booting Fedora with the latest unstable kernel snapshots * Compiling software from source using unstable toolchains, such as GCC snapshots for C/C++, CPython prereleases, and Rust nightly * Using non-ASCII usernames so that I have to test software for full Unicode support (and many bug reports were filed and fixed on GH) Most of my family has worked in restaurants throughout their entire lives, and I (with high certainty) am the very first family member interested in deep-dives on software in general, and software development. Beyond programming, I attend board game nights at a pizzeria (preferring strategy games), am a casual Trekkie, and consider myself somewhat astute in attaining as much online privacy as possible. In particular, I use privacy.sexy for debloating my Windows install and route all DNS requests to servers that support encrypted DNS over HTTPS. |
![]() Ishan Jain |
What Did My Agent Do? Observability and Accountability for AI Agentsclick to see descriptionGenerative AI systems and AI agents behave very differently from traditional software. Their non-deterministic nature and ability to act across multiple steps make debugging and accountability harder, which increases the need for better observability. Beyond latency and error rates, teams need insight into prompts, responses, and agent actions to understand what an agent did and why. In this session, I will show how to instrument AI agents using OpenTelemetry and the GenAI Semantic Conventions, with OpenLIT as the native SDK. Through a live demo, I will demonstrate how to capture agent interactions alongside performance telemetry using Prometheus and Jaeger, while keeping sensitive data separate to reduce risk and cost. I will also show how telemetry can support ongoing evaluations, helping teams reason about agent behavior over time without logging everything. This talk is for engineers building AI agents who want to improve trust and accountability without oversharing data. |
click here to see bioI’m a Developer Experience Engineer at Grafana Labs, where I spend most of my time helping people make observability easier and more practical. I maintain the Grafana Ansible Collection and the Grafana Operator, projects that have grown to over 5 million downloads, and I love working with open source. I started my career as an SRE, and that hands-on background still shapes how I think about systems today. I enjoy sharing what I learn, which has led me to speak at conferences about eBPF, Kubernetes, and AI, and to write guides that help engineers run real-world infrastructure more confidently. |
![]() Cherno Basiru Jallow |
Building Deep Learning Systems with Python: For Problems in Health, Agriculture, and Climateclick to see descriptionDeep learning is often introduced through idealized examples: clean datasets, powerful hardware, and benchmark-driven results. In practice, most real-world problems are messier, constrained, and driven by decisions rather than scores. This talk focuses on how to build deep learning systems with Python that work under real-world constraints, using examples from health, agriculture, and climate data. The session presents a practical, system-oriented approach to deep learning. Instead of focusing on complex architectures, it emphasizes how to frame problems correctly, work with imperfect data, build reliable baselines, and evaluate models in ways that support real-world use. 1. From Problem to System (Why framing matters) Moving from “can we train a model?” to “what decision should this system support?” Defining success metrics based on context (e.g., sensitivity vs. accuracy) Why many deep learning projects fail before modeling even begins 2. Data Reality Check Working with limited, noisy, and imbalanced datasets Common data issues in health, agriculture, and environmental data Practical strategies for inspection, validation, and preprocessing using Python 3. Building the Baseline in Python Why simple models sometimes matter more than complex ones early on Establishing strong baselines before scaling complexity A reusable Python workflow: data loading, training loop, evaluation 4. Model Design Under Constraints Choosing architectures that match the problem and resources Training on CPUs or limited hardware When transfer learning helps and when it does not 5. Evaluation Beyond Accuracy Selecting metrics that reflect costs and risks Understanding failure modes through error analysis Using interpretability tools to inspect model behavior 6. 
Case Studies Across Domains Health: image based disease detection and triage support Agriculture: crop disease detection from visual data Climate: pattern detection in environmental and geospatial data What stayed the same across domains, and what changed 7. From Experiment to Deployment Thinking Reproducibility and documentation What makes a model usable outside a notebook Common pitfalls when moving toward real world use 8. Key Takeaways A practical framework for building deep learning systems with Python How to apply the same workflow across different domains How to think critically about data, models, and evaluation in real settings This talk is aimed at developers and data practitioners who already know Python and basic machine learning concepts and want to move beyond demos toward building systems that actually support decisions. |
click here to see bioCherno Basiru Jallow is a Gambian machine learning engineer and computer science student. He is a former data scientist intern at the Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine, a former lead AI/ML intern at Obentas Global Technology Company, a scientific researcher (published preprints), a public speaker (2x Google DevFest, PyCon Senegambia, and other tech events), a hackathon winner (cash prizes), and a tech content creator (YouTube, TikTok, LinkedIn, Instagram). He started coding early in life, and today he designs and builds AI systems that tackle problems in healthcare, education, and communities across Africa, with a focus on computer vision, deep learning, and research you can deploy in the real world. He regularly gives tech talks across the Senegambia region and loves creating videos to share knowledge and inspire others. |
![]() Claudia Ng |
How I Built a RAG-Powered AI Assistant With Pythonclick to see descriptionMost of my readers ask similar questions about data science careers, AI, and breaking into the industry without a CS degree, but the answers are usually buried across 50+ of my Substack blog posts (https://aiweekender.substack.com). So I built a Python-based AI assistant (https://assistant.ds-claudia.com) that can answer their questions directly, using my past writing as the knowledge base. In this talk, I’ll walk through how I used Supabase, OpenAI, and Streamlit to build a lightweight retrieval-augmented generation (RAG) system that: 1. Retrieves relevant posts, 2. Generates personalized responses, 3. Helps readers discover content they would have missed. This talk is a practical, end-to-end walkthrough of building a RAG AI assistant on top of existing content. I’ll cover: - Parsing real-world text from RSS feeds and HTML - Converting posts into embeddings for semantic search - Storing and querying embeddings with Supabase and pgvector - Generating personalized answers with OpenAI LLMs - Streaming responses in Streamlit, including source citations - Logging queries to understand reader needs and improve the system Attendees will leave with a clear picture of how to build a RAG-based assistant for their own blogs, documentation, or newsletters. It shows how to turn messy real-world text into a useful assistant that people actually rely on, all without fine-tuning or heavy infrastructure. |
click here to see bioClaudia Ng is a machine learning engineer with six years of experience building scalable ML systems in Silicon Valley fintech startups. She has deep expertise in credit modeling, fraud detection, and AI product design, and now focuses on building and writing about AI projects and tools. She holds a Master’s in Public Policy from Harvard University and a Bachelor’s in International Business from Northeastern University. Fun fact: She is a polyglot who speaks 9 languages. |
![]() Cheuk Ting Ho |
Debug smarter, not harder - all you need to know about debugging in Pythonclick to see descriptionThis talk tackles the common, yet ultimately limiting, practice of using print statements for debugging in Python. We will explore why relying on print statements often becomes inefficient and cumbersome as applications grow in complexity. The presentation will guide attendees through a transition to professional-grade debugging tools, beginning with a detailed look at the built-in Python debugger, pdb, including essential commands and workflows. Next, I will demystify the powerful debugging capabilities integrated into modern Integrated Development Environments (IDEs), specifically demonstrating debugpy and its seamless application within popular tools like VS Code and PyCharm. Finally, the talk will introduce debug logging as a robust, scalable alternative to temporary print statements, covering best practices for when and how to implement a logging framework to manage application state effectively. By contrasting these strategies, this session aims to empower developers to choose the right tool for any challenge, be it a dedicated debugger, an IDE feature, or a logging framework, enabling smarter, faster, and more effective code remediation in their daily work. Outline: - Why not use “print”? - Situations where you can use “print” to debug - Situations where using “print” is not helpful - Debuggers used in Python - pdb: why use pdb instead of “print” - Debugging in IDEs - Debugpy: what is it? - Debugpy used in VS Code and PyCharm - Debug logging - When should you use debug logging? - Logging vs “print” - How to manage debug logs? - Conclusion and summary - Use the right tools and strategies for the situation - Summarise the debugging strategies that have been introduced |
click here to see bioAfter having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community. Currently, she is working as a developer advocate for JetBrains. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. Cheuk also started and hosted a Python podcast, PyPodCats, which highlights the achievements of underrepresented members in the community. She has served the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation. |
![]() Cheuk Ting Ho |
Are we free-threaded ready? Looking at where free-threaded Python failsclick to see descriptionFree-threaded Python aims to significantly improve performance, allowing multiple native threads to execute Python bytecode concurrently. In this talk, we will explore the current state of Python's free-threading initiative and assess its practical readiness for widespread adoption. We begin by exploring the background of free-threaded Python, summarising its origins, current status, and the technical differences distinguishing it from standard Python implementations. A key focus will be examining the compatibility landscape, specifically investigating how many popular third-party libraries are currently prepared for free-threading. We will distinguish between generic pure Python wheels and explicitly free-threaded wheels, and I’ll explain how the community can contribute to compatibility verification. We then critically discuss free-threaded Python's necessity, weighing the disadvantage of increased thread safety concerns (and verification methods) against the promised advantage of speed (including multithreaded profiling). Will free-threaded Python become a critical future direction for the language? How can you contribute? And can specific projects benefit from it immediately? Let’s find out together! Outline: - Background about free-threaded Python - Where it started and where we are now - What are the differences between standard and free-threaded Python - Looking at how many popular libraries are free-threaded ready - free-threaded wheels vs generic pure Python wheels - Testing libraries for free-threading compatibility and how you can help - Do we really need free-threaded Python? 
- Disadvantage: thread safety - how to verify thread safety of a library - Advantage: speed up - how to perform multithreaded profiling - Conclusion - Free-threaded Python is the future, and how you can contribute - Check if you can take advantage of it and equip yourself Join the discourse and voice your thoughts |
click here to see bioAfter having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community. Currently, she is working as a developer advocate for JetBrains. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. Cheuk also started and hosted a Python podcast, PyPodCats, which highlights the achievements of underrepresented members in the community. She has served the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation. |
![]() Cheuk Ting Ho |
(Workshop) Do you know how well your model is doing? Evaluate your LLMsclick to see descriptionPrerequisites: - Have experience coding in Python (with Python installed on the local machine) - Basic understanding of machine learning and LLMs - Experience with Hugging Face Transformers preferred but not necessary - A Hugging Face Hub account (sign up for free) - A modern computer that can fine-tune small LLMs locally Description: Large Language Models (LLMs) are becoming central to modern applications, yet effectively evaluating their performance remains a significant challenge. How do you objectively compare different models, benchmark the impact of fine-tuning, or ensure your LLM responses adhere to safety guidelines (guard-railing)? This hands-on workshop addresses these critical questions. We will begin with an essential revision of the Hugging Face Transformers library, covering basic LLM inference and fine-tuning. The core of the workshop will introduce and provide deep practice with Lighteval, an efficient and powerful LLM evaluation framework. Participants will learn how to leverage Lighteval to compare various LLMs available on the Hugging Face Hub using a range of pre-built tasks and metrics. Finally, we will delve into advanced evaluation techniques, focusing on creating custom tasks and metrics tailored to unique, real-world application requirements. Participants will learn how to prepare custom datasets on the Hugging Face Hub and integrate them into Lighteval for precise, domain-specific evaluation. By the end of this workshop, you will possess the practical skills to rigorously evaluate, benchmark, and fine-tune your LLMs with confidence. 
Outline: Part 1 - Presentation: The importance of evaluation of LLMs - Compare performance of LLMs for specific tasks - Benchmark the fine-tuning performance - Guard-rail the LLM responses - Coding exercise: Introduction and revision of Hugging Face Transformers - Revision of using Transformers for LLM inference - Fine-tuning an LLM with Transformers Part 2 - Presentation: Introduction of Lighteval - What is Lighteval and what can it do - Different tasks and metrics available in Lighteval - Coding exercise: Using Lighteval to compare LLMs - Getting familiar with the use of Lighteval - Compare two LLMs on Hugging Face Hub - Experiment with different tasks and metrics Part 3 - Presentation: Advanced use of Lighteval - Introduction of custom tasks and metrics - What is needed for creating custom tasks and metrics - How to put custom tasks and metrics together - Coding exercise: Practice with custom tasks and metrics - Uploading datasets to Hugging Face Hub - Creating custom tasks and metrics - Using custom tasks and metrics to compare LLMs |
click here to see bioAfter having a career as a Data Scientist and Developer Advocate, Cheuk dedicated her work to the open-source community. Currently, she is working as a developer advocate for JetBrains. She has co-founded Humble Data, a beginner Python workshop that has been happening around the world. Cheuk also started and hosted a Python podcast, PyPodCats, which highlights the achievements of underrepresented members in the community. She has served the EuroPython Society board for two years and is now a fellow and director of the Python Software Foundation. |
![]() Gábor Mészáros |
From Imports to Innovation: The Dynamics Behind Python’s Evolutionclick to see descriptionWhat can millions of real Python code snippets tell us about how the language evolves? And why do the patterns we observe in Python look uncannily similar to patterns found in patents and scientific research — systems that seem to have nothing to do with software? This talk begins with a practical challenge: extracting structured signals from the chaotic world of Stack Overflow. We built a pipeline that scanned posts for Python code blocks, identified import statements, normalised package names, filtered noise, and reconstructed a time-ordered stream of collections, each composed of the packages used in that snippet. From this, we derived two simple indicators of innovation: • new packages appearing for the first time, and • new package pairs appearing together for the first time. Once these signals are extracted, a surprisingly coherent picture emerges. The Python ecosystem introduces brand-new packages less and less frequently over time, yet continues to generate new combinations of packages at a remarkably steady pace. Developers reuse familiar tools, but they also explore the space of possible pairings with a precision that looks — statistically — almost mechanical. To understand just how surprising this is, we compare Python’s behavior with two very different worlds. The first is the US patent system, where technology codes assigned to inventions can be analyzed the same way we analyze Python imports. A classic 2015 study by Youn et al. showed that while new technology codes appear at a slowing rate, pairs of codes accumulate almost linearly over two centuries of innovation. The second is a corpus of physics publications, which behaves in much the same way when one treats subject classification codes as ingredients. Across all three domains — software, science, and invention — the same pattern holds. 
Distinct components grow sublinearly (Heaps’ law), while distinct combinations grow close to linearly. This parallel is not only unexpected; it suggests that these systems share a deeper underlying mechanism, bound not by domain-specific details but by the very foundational patterns of human innovation. In the second half of the talk, we introduce the concept of the "adjacent possible" and show how to model it with a simple stochastic process: a Pólya urn extended with the adjacent possible. The model assumes only two forces: reinforcement of frequently used components and occasional introduction of new ones. Despite its simplicity, it reproduces the empirical behavior of all three systems without requiring domain-specific rules. It shows how a stable exploration–exploitation balance can arise naturally, leading to predictable rates of combinatorial novelty even in rapidly changing ecosystems. The framework offers a new way to think about the ecosystem: not as a chaotic swarm of libraries, but as an innovation system governed by universal constraints. It sheds light on why certain libraries become dominant, why the combination space grows the way it does, and how the community collectively expands the “adjacent possible” of the language. Attendees of the talk will learn: • how to extract meaningful innovation signals from real Python code at scale, • how to measure novelty and combinatorial creativity in software ecosystems, • why Python’s long-term evolution aligns with empirical laws from patents and science, • and how simple generative models can help reason about complex developer behavior. The talk connects engineering, data analysis, and innovation theory to reveal an unexpected insight: Python grows the way many creative systems grow — slowly at the edges, rapidly in combinations, and always under the quiet guidance of reinforcement and the adjacent possible. |
click here to see bioMathematician turned software engineer turned network scientist. I explore how structure and behavior emerge in complex systems — from code ecosystems to large graphs. Passionate about Python, powered mostly by coffee, and firmly in the tabs-over-spaces camp. |
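The urn model described in the talk above can be simulated in a few lines of Python. This is an illustrative sketch, not the speaker's code: each drawn colour is reinforced with `rho` extra copies, and drawing a never-seen colour adds `nu + 1` brand-new colours to the urn (the expanding adjacent possible), following the urn-with-triggering scheme of Tria et al. (2014).

```python
import random

def urn_with_triggering(steps, rho=4, nu=3, seed=0):
    """Polya urn with triggering: reinforcement plus an expanding
    adjacent possible (after Tria et al., 2014)."""
    rng = random.Random(seed)
    urn = [0]              # colours are integers; start with a single colour
    next_colour = 1
    seen = set()
    sequence = []
    for _ in range(steps):
        ball = rng.choice(urn)
        sequence.append(ball)
        urn.extend([ball] * rho)      # reinforcement: the rich get richer
        if ball not in seen:          # a novelty opens up the adjacent possible
            seen.add(ball)
            urn.extend(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1
    return sequence

seq = urn_with_triggering(20_000)
distinct = len(set(seq))              # grows sublinearly in len(seq)
```

With `nu < rho`, the number of distinct colours in the generated sequence grows sublinearly (Heaps-like), matching the behaviour the talk reports for Python packages, patent codes, and physics classifications.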
![]() Vyom Gupta |
Schema-First AI: Type-Safe LLM Outputs with PydanticAI (Tested in CI, Offline with Ollama)click to see descriptionOn Friday, your team ships an “LLM-powered” feature. On Monday, the dashboard looks… haunted. One row says years_experience = “5–7”, another says “about five-ish”, a user’s email is “john at gmail dot com”, and your database is now storing what can only be described as creative writing. Nothing is “down” — but your data is drifting, your analytics are lying, and your code is becoming a scrapbook of quick fixes. If you store “5–7” in an integer column, your pipeline breaks — or worse, silently coerces and corrupts your metrics. This is the quiet failure mode of AI features: LLMs return plausible text, not guaranteed data. And many projects accidentally return to the old “stringly-typed” era: prompt → paragraph → regex → json.loads() → try/except → more try/except. It works in demos… until real-world inputs arrive: ranges (“5–7”), multiple numbers (“worked on 3 projects in 2 teams”), mixed languages, inconsistent keys, wrong types, invalid formats, and outputs that change shape between runs. Python teams spent years building trust through schemas, validation, and tests. We define request/response contracts for every API. We validate before writing to the database. We ship with test suites and Continuous Integration (CI). But when the LLM enters the stack, many teams throw that discipline away and “hope the output behaves.” This talk brings that discipline back with a practical approach: Schema-First AI — treat LLM output like an API response with a non-negotiable contract. What you’ll see (and copy into your own projects): we’ll build a workflow that turns messy text into reliable, typed objects. • Define the contract first using a Pydantic model (types + constraints): e.g. years_experience is an integer (0–50), email must be a valid email, and skills must be a list, not prose. • Enforce structured output using PydanticAI: the model must return output that matches your schema; if it doesn’t, validation fails (and we handle it cleanly). • Validate before storage: bad-but-plausible outputs get caught before they corrupt your database and metrics. • Test without calling the model (fast, $0, CI-friendly): unit-test business logic using typed Pydantic objects (no model calls), add contract tests for schema guarantees, and use Hypothesis (property-based testing) to generate hundreds or thousands of edge cases in seconds. • Live demo: unstructured output → typed result → validation catching errors → a test suite running in under a second → a small production-style pipeline + dashboard that makes the reliability visible. The full demo runs offline using Ollama (no API keys), and the same architecture works with hosted providers too. Why this is useful (beyond the demo): you’ll dramatically reduce brittle parsing code, prevent silent data corruption (often worse than a crash), get a repeatable pattern for extraction, classification, routing, and “AI-as-a-service” internal tools, and gain a testing strategy that lets AI code ship with confidence. Takeaways: a repeatable Schema-First blueprint for extraction/classification workflows, a CI-ready testing strategy for AI features (including $0 model-call tests), practical patterns for validation, retries, and schema evolution, and a reference repo + checklist you can apply to your next LLM feature. Audience: Beginner → Intermediate Python developers. No AI background required. |
click here to see bioHi, I’m Vyom, an SDE II at Cisco Systems (India) working on data center networking/back-end systems. I enjoy turning messy, real-world problems into reliable engineering workflows (observability, testing, automation, and developer experience). Outside work, I’m into adventure travel, puzzles/brain-teasers, and building small projects that turn “this should be simpler” into a usable tool. |
![]() Manuel Alejandro Ledezma Falcon |
Fair AI from QA: How Testers Can Prevent Algorithmic Biasclick to see descriptionIn this interactive talk, we will explore how artificial intelligence systems can be affected by hidden biases in data and models, impacting critical decisions such as hiring, loan approvals, or medical diagnoses. Attendees will learn what algorithmic bias is, how to detect it from a QA perspective, and practical strategies to mitigate it. The session includes a hands-on activity analyzing real datasets, helping participants identify discriminatory patterns and reflect on the ethical role of testers in the AI era. This talk is essential for QA professionals and developers who want to ensure technology is not only functional but also fair, safe, and transparent. |
click here to see bioManuel Ledezma, known in the tech community as Tester Testarudo, is a software testing and automation specialist with a strong commitment to delivering high-quality and reliable software. Over the past years, he has focused on mastering QA practices and automation strategies, working in agile and fast-paced environments. He has contributed to leading companies such as Mediktor, AXA, Telecom Argentina, Newfold, and Mojo Marketplace, where he implemented scalable testing solutions that improved product stability and user experience. Manuel currently serves as the QA Automation Lead at Mediktor in Barcelona, Spain, where he leads automation initiatives to ensure robust and impactful digital products. Beyond his professional work, Manuel empowers the QA community through Tester Testarudo, his educational project dedicated to helping newcomers learn testing in a clear, practical, and accessible way. |
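One concrete check a tester can run on a model's decisions, in the spirit of the session above, is the "four-fifths rule" for disparate impact. A minimal sketch with made-up audit data (the group names and counts are invented for illustration, not taken from the talk):

```python
from collections import Counter

def selection_rates(decisions):
    """decisions: iterable of (group, selected) pairs from a model's output."""
    totals, picked = Counter(), Counter()
    for group, selected in decisions:
        totals[group] += 1
        if selected:
            picked[group] += 1
    return {g: picked[g] / totals[g] for g in totals}

def disparate_impact(decisions, protected, reference):
    """Ratio of selection rates; values below ~0.8 (the 'four-fifths
    rule') are a common red flag worth a closer QA investigation."""
    rates = selection_rates(decisions)
    return rates[protected] / rates[reference]

# Made-up audit log: (group, model said "hire")
audit = ([("A", True)] * 60 + [("A", False)] * 40
         + [("B", True)] * 30 + [("B", False)] * 70)
ratio = disparate_impact(audit, protected="B", reference="A")
```

Here group B is selected at half the rate of group A (ratio 0.5), well under the 0.8 threshold: a pattern a QA engineer can surface before the system reaches production.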
![]() Fiona Ebner |
Free Software Is All About Freedomclick to see descriptionFree software (often also called open-source software) plays an essential role in almost all modern digital infrastructure. The Python language and most of the Python ecosystem are also free software. But what does that actually mean? Learn about the four essential freedoms, why they matter, and how they lead to the success of free software. Learn about the motivation and values behind free software. Software is everywhere in modern society, and we all rely on it every day. Big tech companies often abuse our dependence and force unwanted features on us. We should care that people are in control of their own devices and digital lives. Free software communities work hard to provide alternatives and the possibility to escape from vendor lock-in. Let's get political! Free software is ever more important for democracy and digital sovereignty. Learn about the initiatives of the FSFE (Free Software Foundation Europe), fighting to benefit users and society for 25 years now. |
click here to see bioI work as a software engineer and maintainer at Proxmox. I mostly work with Perl and C, so I don't know much Python, but somehow managed to get some tests written in Python for QEMU accepted. I studied mathematical logic, but I've long been interested in the intersection and interactions of technology and society. I've been using Linux and free software for most of my life. I'm a member of the Team Austria of the FSFE (Free Software Foundation Europe). I'm a member of the C3W (Chaos Computer Club Wien), not to be confused with the W3C. |
Gregor Horvath |
click to see description |
click here to see bioFreelance Software Developer / Consultant using Python for ~25 years https://gregor-horvath.com/ |
![]() James Donahue |
Feature Selection: What your model can't tell youclick to see descriptionThere are several algorithms for selecting features. However, they rely on the data and the correlations between features. Is there a place here for the saying "correlation is not causation"? This talk focuses on exactly that. Including what economists call "bad controls" can mask effects and create spurious correlations, resulting in reduced model performance. Using an entertaining mix of anecdotes and simulated data, I explain how feature selection can benefit from the causal inference literature. The colorful cast of characters includes selection bias, confounders, and Simpson's Paradox. While widely applicable, this talk is not overwhelmingly technical. The target audience is anyone who uses data, but the math involved is limited to linear regression. In fact, I hope that anyone, regardless of background, can enjoy the nuances of data interactions with me. |
click here to see bioRaised in Nashville, Tennessee, settled in Hamburg, Germany, with a few other continents in between. My academic background is in macroeconomics, with a focus on Bayesian methods and labor market dynamics. I am currently working with AWS Cloud applications, trying to find a good compost system, and still drinking too much coffee. |
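The "bad control" problem from the talk above can be reproduced with a few lines of simulated data. A hedged sketch, not the speaker's example: x and y are generated independently, yet adding a collider c (a variable caused by both) as a "control" makes x look predictive of y.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
x = rng.standard_normal(n)              # candidate feature
y = rng.standard_normal(n)              # outcome, truly independent of x
c = x + y + rng.standard_normal(n)      # a collider: caused by BOTH x and y

def coef_on_x(columns, target):
    """OLS with intercept; returns the fitted coefficient on the first column."""
    X = np.column_stack([np.ones(len(target)), *columns])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta[1]

naive = coef_on_x([x], y)           # close to 0: x has no real effect on y
bad_control = coef_on_x([x, c], y)  # clearly negative: the "control" invented one
```

Solving the normal equations with Var(x) = Var(y) = Var(noise) = 1 gives an expected coefficient of -0.5 once c is included: a purely spurious effect that a naive feature-selection step could easily latch onto.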