Marcus Chen
5 hours ago

5 min read

Unlock LLM Observability: A Complete Langfuse Pipeline Tutorial

Tracing, prompt management, scoring, experiments: the LLM engineer's toolkit. This tutorial shows you how to wield it using Langfuse, the open-source observability platform.

We'll walk through building a complete Langfuse pipeline that covers tracing, prompt management, scoring, and experimentation. The best part? It works with either a real OpenAI key or a deterministic mock LLM, so you can explore every major Langfuse feature without racking up a huge bill. You'll learn to set up credentials, trace function calls, instrument a RAG pipeline, manage prompts, attach evaluation scores, and run dataset-based experiments. Langfuse promises to let you observe, evaluate, and improve your LLM applications in a structured, production-ready manner. But does it deliver? Observability tends to matter more as an LLM application scales, and that is the gap Langfuse aims to fill; the steps below let you judge the result for yourself.

Build a Complete Langfuse Observability and Evaluation Pipeline for Tracing, Prompt Management, Scoring, and Experiments

Setting Up Langfuse and OpenAI Credentials

First, let's get the environment prepped. We install the Langfuse and OpenAI packages in a Colab notebook, so you have everything you need to follow along.

import subprocess, sys

def pip_install(*pkgs):
    subprocess.run([sys.executable, "-m", "pip", "install", "-qU", *pkgs], check=True)

pip_install("langfuse", "openai")

import os
from getpass import getpass

def _ask(var, prompt, secret=True, default=None):
    # Reuse a value that is already in the environment; otherwise prompt for it.
    if os.environ.get(var):
        return os.environ[var]
    val = (getpass(prompt) if secret else input(prompt)).strip()
    if not val and default is not None:
        val = default
    os.environ[var] = val
    return val

print("Enter your Langfuse credentials (input is hidden):")
_ask("LANGFUSE_PUBLIC_KEY", " Langfuse PUBLIC key (pk-lf-...): ")
_ask("LANGFUSE_SECRET_KEY", " Langfuse SECRET key (sk-lf-...): ")

region = (input(" Region — EU (default) / US / or paste a self-hosted URL: ")
          .strip().lower())
if region.startswith("http"):
    HOST = region
elif region in ("2", "us"):
    HOST = "https://us.cloud.langfuse.com"
else:
    HOST = "https://cloud.langfuse.com"
os.environ["LANGFUSE_HOST"] = HOST

OPENAI_API_KEY = (getpass(" OpenAI key (optional, press Enter to skip): ").strip())
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
USE_OPENAI = bool(OPENAI_API_KEY)
DEFAULT_MODEL = "gpt-4o-mini" if USE_OPENAI else "mock-llm-v1"

from langfuse import get_client, observe, propagate_attributes, Evaluation

langfuse = get_client()
assert langfuse.auth_check(), "Auth failed — double-check keys/region."
print(f"\n✅ Connected to Langfuse at {HOST}")
print(f" LLM backend: {'OpenAI (' + DEFAULT_MODEL + ')' if USE_OPENAI else 'built-in mock'}\n")

The script above collects your Langfuse credentials, asks for the Langfuse region (or a self-hosted URL), and optionally takes an OpenAI API key. It then initializes the Langfuse client, verifies authentication with auth_check(), and reports whether you're using OpenAI or the built-in mock LLM. Getting this step right matters: with incorrect keys or the wrong region, the assertion fails and nothing after it runs.
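The region prompt is the easiest part of this setup to get wrong, so it can help to pull the mapping out into a small function you can test on its own. The sketch below is an illustration, not part of the original tutorial: resolve_langfuse_host is a hypothetical helper name, and unlike the listing above it keeps the case of a pasted self-hosted URL and trims a trailing slash, which is an assumption about what you would want.

```python
def resolve_langfuse_host(region: str) -> str:
    """Map a free-form region answer to a Langfuse base URL (illustrative helper)."""
    choice = region.strip()
    # A pasted URL (self-hosted deployment) is used as given, minus a trailing slash.
    if choice.lower().startswith(("http://", "https://")):
        return choice.rstrip("/")
    # "2" or "us" (any case) selects the US cloud region, as in the tutorial's prompt.
    if choice.lower() in ("2", "us"):
        return "https://us.cloud.langfuse.com"
    # Anything else, including an empty answer, falls back to the EU cloud.
    return "https://cloud.langfuse.com"


eu_host = resolve_langfuse_host("")
us_host = resolve_langfuse_host(" US ")
self_hosted = resolve_langfuse_host("https://Langfuse.internal.example/")
```

Keeping the mapping in a pure function means a wrong region shows up in a unit test rather than as a failed auth_check() later.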

Tracing with Decorators and Mock LLMs

Next up: tracing. We're defining an LLM helper that supports both real OpenAI generations and deterministic mock responses. The clever part? Even the mock path generates a proper Langfuse generation observation, ensuring full traceability without an OpenAI key. Then, we demonstrate decorator-based tracing by wrapping a simple story-generation pipeline with @observe.

if USE_OPENAI:
    # Langfuse's drop-in wrapper traces OpenAI calls automatically.
    from langfuse.openai import openai

_MOCK_FACTS = {
    "france": "Paris", "germany": "Berlin", "japan": "Tokyo",
    "italy": "Rome", "spain": "Madrid", "india": "New Delhi",
}

def _mock_answer(user_text: str) -> str:
    t = user_text.lower()
    for country, capital in _MOCK_FACTS.items():
        if country in t:
            return capital
    if "langfuse" in t:
        return ("Langfuse is an open-source LLM engineering platform for "
                "observability, prompt management, evaluation and datasets.")
    return "This is a mock response. Provide an OpenAI key for real generations."

def llm_chat(messages, *, model=DEFAULT_MODEL, temperature=0.3,
             name=None, langfuse_prompt=None) -> str:
    """Return assistant text; the call is traced as a Langfuse generation."""
    if USE_OPENAI:
        kwargs = dict(model=model, messages=messages, temperature=temperature)
        if name:
            kwargs["name"] = name
        if langfuse_prompt:
            kwargs["langfuse_prompt"] = langfuse_prompt
        resp = openai.chat.completions.create(**kwargs)
        return resp.choices[0].message.content

    # Mock path: answer from the most recent user message.
    last_user = next((m["content"] for m in reversed(messages)
                      if m["role"] == "user"), "")
    answer = _mock_answer(last_user)
    gen_kwargs = dict(as_type="generation", name=name or "mock-llm",
                      model=model, input=messages)
    if langfuse_prompt is not None:
        gen_kwargs["prompt"] = langfuse_prompt
    # Record the mock call as a generation observation so it appears in the trace.
    with langfuse.start_as_current_observation(**gen_kwargs) as gen:
        gen.update(output=answer,
                   usage_details={"input_tokens": 24, "output_tokens": 12})
    return answer

print("PART 1 ── Decorator tracing -------------------------------------------")

@observe()
def write_story(topic: str) -> str:
    return llm_chat(
        [{"role": "user", "content": f"Write a one-sentence story about {topic}."}],
        name="story-generation",
    )

@observe()
def story_pipeline(topic: str) -> str:
    return write_story(topic)

print(" →", story_pipeline("a debugging robot"))

Building a Manual RAG Pipeline with Langfuse Tracing

Now, for something a bit more complex. Let's construct a small manual RAG pipeline using a simple in-memory knowledge base covering refunds, shipping, and warranty information. We'll trace the retrieval step independently and utilize propagate_attributes to attach user ID, session ID, and tags across the entire trace. Then, we'll pose a refund-related question and capture the trace ID for later score attachment.

print("\nPART 2 ── Manual RAG trace --------------------------------------------")

_KB = {
    "refund": "Refunds are processed within 5–7 business days to the original method.",
    "warranty": "All products carry a 1-year limited manufacturer warranty.",
}

@observe(name="retrieve")
def retrieve(question: str):
    q = question.lower()
    hits = [v for k, v in _KB.items() if k in q] or list(_KB.values())
    return hits[:2]

@observe(name="rag-pipeline")
def rag_pipeline(question: str, user_id="user-42", sessi) -> str:
    with propagate_attributes(user_id=user_id, session_id=session_id,
                              tags=["rag", "support-bot", "tutorial"]):
        context = "\n".join(retrieve(question))
        return llm_chat(
            [{"role": "
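The retrieval step is simple enough to try without any tracing attached. The sketch below reuses the two knowledge-base entries shown in the listing and the same keyword-match-with-fallback logic. Because the listing is cut off before the chat messages, build_messages and its system-prompt wording are assumptions made for illustration, as are the kb and top_k parameters.

```python
KB = {
    "refund": "Refunds are processed within 5–7 business days to the original method.",
    "warranty": "All products carry a 1-year limited manufacturer warranty.",
}


def retrieve(question: str, kb: dict = KB, top_k: int = 2) -> list:
    """Return entries whose key appears in the question, else fall back to all entries."""
    q = question.lower()
    hits = [text for key, text in kb.items() if key in q]
    return (hits or list(kb.values()))[:top_k]


def build_messages(question: str, context_docs: list) -> list:
    # Hypothetical prompt layout; the tutorial's own messages are not shown in full.
    context = "\n".join(context_docs)
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]


question = "How long do refunds take?"
docs = retrieve(question)
messages = build_messages(question, docs)
fallback_docs = retrieve("Where is my parcel?")
```

A refund question matches only the refund entry, while a question with no matching keyword falls back to the whole knowledge base. That fallback keeps the demo from returning empty context, though a production retriever would probably want to signal "no match" instead.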
Marcus Chen

Senior Technology Analyst

Former software engineer turned tech journalist. 15 years covering Silicon Valley. Known for cutting through hype to find the real story.

Source

marktechpost
