Mastering LLM Observability: A Hands-On Guide to Langfuse and a Comparison with OpenTelemetry

Oleh Dubetcky
Mar 1, 2025 · 5 min read

Large language models (LLMs) are revolutionizing applications, from chatbots to content generation, but their complexity demands robust monitoring to track performance, costs, and behavior. Two tools stand out in this space: Langfuse, an open-source platform designed for LLM observability, and OpenTelemetry, a versatile, industry-standard framework for general-purpose monitoring. In this article, we’ll walk you through setting up Langfuse locally with Docker, then compare it to OpenTelemetry to help you decide which fits your LLM project — or how to leverage both.

What is Langfuse?

Langfuse is an open-source tool built specifically for LLM applications. It offers tracing, metrics, and analytics to monitor model interactions, including latency, token usage, API costs, and user feedback. With native integrations for frameworks like LangChain and OpenAI, plus a built-in dashboard, it’s ideal for developers seeking a tailored, lightweight solution. You can self-host it for full control or use the cloud version for quick testing.

Key Features:

  • Traces the lifecycle of LLM requests and responses.
  • Tracks token counts, latency, and costs in real-time.
  • Provides a UI for debugging and insights.
  • Supports self-hosting via Docker.
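
For a taste of those native integrations, here is a minimal sketch of the Langfuse drop-in OpenAI wrapper (Python SDK v2). It assumes OPENAI_API_KEY and the Langfuse environment variables (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST) are already set; the model name and prompt are placeholders.

from langfuse.openai import openai  # drop-in replacement for the OpenAI client

# Each call made through this wrapper is traced automatically:
# model, latency, token usage, and cost appear in the Langfuse UI.
response = openai.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)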

What is OpenTelemetry?

OpenTelemetry, a CNCF-backed open-source framework, is a general-purpose observability tool. It collects traces, metrics, and logs from any application, making it perfect for distributed systems — not just LLMs. With broad language support and flexible exporters (e.g., to Prometheus or Jaeger), it’s highly customizable but requires more setup than Langfuse.

Key Features:

  • Distributed tracing across services.
  • Metrics like latency and error rates.
  • Logging for detailed debugging.
  • Integrates with a vast ecosystem of tools.
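
To make the contrast concrete, here is a minimal OpenTelemetry tracing sketch in Python (assuming pip install opentelemetry-sdk) that prints spans to the console; in a real deployment you would swap the console exporter for an OTLP exporter pointing at your backend. The attribute names are illustrative, not a standard.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire up a tracer provider that exports finished spans to stdout
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Wrap a (hypothetical) LLM call in a span and attach custom attributes
with tracer.start_as_current_span("llm-request") as span:
    span.set_attribute("llm.model", "llama3")      # illustrative attribute names
    span.set_attribute("llm.prompt_tokens", 42)
    # ... call your model here ...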

Getting Started with Langfuse Locally Using Docker

Let’s set up Langfuse on your local machine with Docker — a simple, reproducible way to monitor an LLM app. We’ll include a Python example that demonstrates tracing calls to a local Ollama model.

Step 1: Prerequisites

  • Docker: Installed and running (includes Docker Compose).
  • Python: For the example app (version 3.8+ recommended).
  • Ollama: Installed and serving locally (ollama serve), with the llama3 model pulled (ollama pull llama3); a quick connectivity check is sketched below.
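
Before wiring in Langfuse, a quick sanity check that the Ollama server is reachable from Python can save debugging time. This is a throwaway sketch that assumes the ollama package is installed and ollama serve is running:

import ollama

# List locally available models; this call fails if the Ollama server is down
try:
    print(ollama.list())
except Exception as exc:
    print(f"Ollama is not reachable: {exc}")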

Step 2: Deploy Langfuse Locally

Clone the Repository:

git clone https://github.com/langfuse/langfuse.git
cd langfuse

Start Langfuse:

docker compose up -d

This launches Langfuse at http://localhost:3000. Wait a minute for it to initialize.

Access the UI:

Open http://localhost:3000 in your browser and sign up to create your first local user account.

Step 3: Integrate Langfuse into Your Python App

Install Dependencies:

pip install langfuse ollama

Create a Project and Get Keys in the Langfuse UI:

  • Create an organization.
  • Create a new project (e.g., “Local LLM Test”).
  • Go to Settings > API Keys and generate a public key (e.g., pk-lf-…) and a secret key (e.g., sk-lf-…); a sketch for loading these from environment variables follows this list.
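
The script below hardcodes these keys for clarity, but the Langfuse SDK can also pick them up from environment variables, which keeps secrets out of your code. A minimal sketch, assuming the standard LANGFUSE_* variable names:

import os
from langfuse import Langfuse

# The SDK reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST by default
os.environ.setdefault("LANGFUSE_PUBLIC_KEY", "pk-lf-...")   # replace with your key
os.environ.setdefault("LANGFUSE_SECRET_KEY", "sk-lf-...")   # replace with your key
os.environ.setdefault("LANGFUSE_HOST", "http://localhost:3000")

langfuse = Langfuse()  # picks up the variables above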

Python Code for Ollama + Langfuse:

import ollama
from langfuse import Langfuse
import time
import re

# Initialize Langfuse (local Docker instance)
langfuse = Langfuse(
    public_key="pk-lf-c96f9e80-bafb-42e4-9bdb-2353a79fe0e7",  # Replace with your key from the Langfuse UI
    secret_key="sk-lf-a13af08d-7501-4749-a240-876357ea705e",  # Replace with your key
    host="http://localhost:3000",  # Local Langfuse server
)

# Define constants
MODEL = "llama3"

# Function to predict ISCO code with tracing and printing
def predict_isco_code(title):
    # Start a trace
    trace = langfuse.trace(name="ISCO Code Prediction")

    # Print initial trace details
    print(f"Initial Trace ID: {trace.id}")

    # Prepare the prompt
    prompt = f"Given a dataset with ISCO titles and codes, predict the ISCO code for the title: '{title}'. Provide the code and a brief explanation."
    messages = [{"role": "user", "content": prompt}]

    # Call Ollama
    try:
        response = ollama.chat(model=MODEL, messages=messages)
        result = response["message"]["content"]
    except Exception as e:
        result = f"Error: {str(e)}"
        error_data = {"error": result}
        print(f"Error Data to Log: {error_data}")
        error_event = trace.event(name="Ollama Error", output=error_data)
        print(f"Error Event Added - ID: {error_event.id}")
        langfuse.flush()
        raise

    # Prepare and log response event
    prediction_inputs = {"title": title, "prompt": prompt}
    print(f"Prediction Inputs to Log: {prediction_inputs}")

    # Log input and output
    trace.generation(
        name="Model Interaction",
        input=prediction_inputs,
        output=result,
        model=MODEL,
    )

    # Add a basic score
    isco_code_match = re.search(r"\b\d{4}\b", result)
    score = 1.0 if isco_code_match else 0.5
    trace.score(
        name="Prediction Confidence",
        value=score,
        comment="Score based on presence of a 4-digit ISCO code",
    )
    print(f"Score Added - Name: Prediction Confidence, Value: {score}")

    # Flush to ensure data is sent
    langfuse.flush()
    print("Trace flushed to Langfuse server.")

    return result

# Test it
if __name__ == "__main__":
    job_title = "Software Developer"
    try:
        print("Calling Ollama with Langfuse tracing...")
        prediction = predict_isco_code(job_title)
        print(f"Predicted ISCO Code for '{job_title}':\n{prediction}")
        print("Check Langfuse at http://localhost:3000 for traces and scores.")
        time.sleep(2)  # Give the background exporter a moment to finish
    except Exception as e:
        print(f"Error: {e}")

Run the Script

  1. Ensure Langfuse (docker compose up -d) and Ollama (ollama serve) are running.
  2. Save the script as isco_predict.py.
  3. Run it: python isco_predict.py.
  4. Check http://localhost:3000 for the “ISCO Code Prediction” trace, showing the title, prompt, and response.

Step 4: Explore the Dashboard

  • Refresh http://localhost:3000 and open the “ISCO Code Prediction” trace to inspect the prompt, response, latency, and the confidence score.

Tips for Local Use

  • Verify Docker: If localhost:3000 fails, check the container logs with docker compose logs; you can also confirm SDK connectivity with the snippet after this list.
  • Persistence: The bundled docker-compose.yml already runs a PostgreSQL container; see the Langfuse docs for persisting its volume or pointing at an external database.
  • Debugging: Use docker compose down to stop the stack and docker compose up --build -d to rebuild and restart it fresh.
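
To confirm that your keys and host line up with the local server, the Python client exposes an auth check. A small sketch, assuming the same keys generated in Step 3:

from langfuse import Langfuse

langfuse = Langfuse(
    public_key="pk-lf-...",   # replace with your key
    secret_key="sk-lf-...",   # replace with your key
    host="http://localhost:3000",
)
print("Connected:", langfuse.auth_check())  # True if keys and host are correct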

OpenTelemetry vs Langfuse

Now that Langfuse is running, how does it stack up against OpenTelemetry?

Key Differences

  • Scope: OpenTelemetry is general-purpose for all apps; Langfuse is LLM-specific observability.
  • Focus: OpenTelemetry covers traces, metrics, and logs across systems; Langfuse covers LLM traces, tokens, costs, and quality.
  • Setup: OpenTelemetry needs instrumentation plus a backend; Langfuse is a quick Docker setup with a built-in UI.
  • Customization: OpenTelemetry is highly flexible and language-agnostic; Langfuse is LLM-focused and less general-purpose.
  • Integrations: OpenTelemetry is broad (e.g., Grafana, Jaeger); Langfuse targets LLM frameworks (e.g., OpenAI, LangChain).
  • Visualization: OpenTelemetry requires external tools; Langfuse has a built-in dashboard.

Complementary Use

Langfuse supports OpenTelemetry via its OTLP endpoint (/api/public/otel). You can instrument your app with OpenTelemetry SDKs and send traces to Langfuse for LLM-specific analysis — like prompt/response pairs — while monitoring broader system metrics elsewhere.

  • Example: Use OpenTelemetry to trace a microservices app, export LLM-related spans to Langfuse, and view token costs in its dashboard (a minimal exporter configuration is sketched below).
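
As a rough sketch of that setup, the snippet below configures an OTLP/HTTP exporter pointed at the local Langfuse instance. It assumes pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http, and the endpoint path and Basic-auth header (base64 of public_key:secret_key) should be double-checked against the Langfuse OpenTelemetry docs for your version.

import base64
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Langfuse authenticates OTLP requests with Basic auth built from your API keys
auth = base64.b64encode(b"pk-lf-...:sk-lf-...").decode()  # replace with your keys

exporter = OTLPSpanExporter(
    endpoint="http://localhost:3000/api/public/otel/v1/traces",  # assumed trace path
    headers={"Authorization": f"Basic {auth}"},
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("llm-service")
with tracer.start_as_current_span("llm-call") as span:
    span.set_attribute("llm.model", "llama3")  # this span appears as a trace in Langfuse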

When to Choose

  • OpenTelemetry: For complex, distributed systems needing universal observability.
  • Langfuse: For fast, LLM-focused monitoring with minimal setup.
  • Both: For standardized instrumentation (OpenTelemetry) with LLM insights (Langfuse).

Why It Matters

Without monitoring, LLM apps can bleed costs (e.g., excessive token usage), suffer performance lags, or deliver poor outputs. Langfuse simplifies this for AI developers, while OpenTelemetry offers a scalable foundation. Together, they bridge niche and enterprise needs.

Langfuse, with its Docker-friendly local setup, is a fantastic entry point for LLM observability — quick to deploy and purpose-built for AI. OpenTelemetry, meanwhile, shines in broader contexts, with the flexibility to grow alongside your system. By starting with our Langfuse example and understanding its place next to OpenTelemetry, you’re equipped to master LLM monitoring. Spin up Docker, trace your first LLM call, and explore what works for you!

If you found this article insightful and want to explore how these technologies can benefit your specific case, don’t hesitate to seek expert advice. Whether you need consultation or hands-on solutions, taking the right approach can make all the difference. Thanks for reading!

Oleh Dubetcky | LinkedIn
