Picture this: You’re tasked with implementing distributed tracing across your microservices. “Easy,” you think, “OpenTelemetry has auto-instrumentation!” Six hours later, you’re staring at empty trace dashboards wondering why your Python FastAPI service refuses to send a single span. This is that story.
The Setup
I started with a local minikube cluster to test OpenTelemetry auto-instrumentation.
- Deploy Jaeger for trace storage/UI
- Deploy OpenTelemetry Collector as a gateway
- Use the OpenTelemetry Operator to auto-instrument Python apps
- Watch traces flow without touching application code
# Start fresh
minikube start --memory=8192 --cpus=4
kubectl create namespace observability
# Install Jaeger
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.52.0/jaeger-operator.yaml
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: AllInOne
EOF
# Install OpenTelemetry Operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
Act 1: The Silent Treatment
Created the collector and instrumentation:
# collector.yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
  namespace: observability
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      jaeger:
        endpoint: jaeger-collector.observability:14250
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [jaeger]
---
# instrumentation.yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: python-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://otel-gateway-collector.observability:4318
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_PYTHON_LOG_CORRELATION
        value: "true"
Deployed a test FastAPI app with the magic annotation:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-app
spec:
  selector:
    matchLabels:
      app: fastapi-app
  template:
    metadata:
      labels:
        app: fastapi-app
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 8000
Port-forwarded to Jaeger UI:
kubectl port-forward -n observability svc/jaeger-query 16686:16686
Result? Nothing. Zero traces.
Act 2: The REPL Detective Work
Time to get our hands dirty. Exec’d into the pod:
kubectl exec -it deployment/fastapi-app -- bash
Test 1: Is auto-instrumentation even loaded?
$ python
>>> import sys
>>> 'sitecustomize' in sys.modules
True
>>> import sitecustomize
>>> print(sitecustomize.__file__)
/otel-auto-instrumentation-python/sitecustomize.py
Good! The operator injected its magic.
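That confirms the mechanism the operator relies on: any directory placed on `PYTHONPATH` that contains a `sitecustomize.py` is imported automatically at interpreter startup. A minimal stdlib-only sketch of that mechanism (the hook’s print is illustrative, not what the operator actually injects):

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Drop a startup hook into a scratch directory...
    with open(os.path.join(d, "sitecustomize.py"), "w") as f:
        f.write('print("hook loaded at startup")\n')

    # ...and point PYTHONPATH at it, as the operator does inside the pod.
    env = dict(os.environ, PYTHONPATH=d)
    out = subprocess.run(
        [sys.executable, "-c", "print('app code runs')"],
        env=env, capture_output=True, text=True,
    ).stdout
    print(out)  # the hook line appears before the app's own print
```

The hook runs before a single line of application code, which is exactly how zero-code instrumentation can bootstrap an SDK behind your back.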
Test 2: Can we reach the collector?
>>> import requests
>>> endpoint = "http://otel-gateway-collector.observability:4318"
>>> r = requests.post(f"{endpoint}/v1/traces",
... data=b"junk",
... headers={"content-type": "application/x-protobuf"})
>>> r.status_code, r.text
(400, 'proto: illegal wireType 6')
Perfect! The collector is reachable and trying to parse our junk.
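Here `requests` happened to be in the image; in a slimmer container the same connectivity probe works with only the stdlib. A sketch, with the endpoint taken from the Instrumentation resource above (outside the cluster it simply reports the connection failure):

```python
import urllib.error
import urllib.request

# Endpoint from the Instrumentation CR; only resolvable inside the cluster.
endpoint = "http://otel-gateway-collector.observability:4318"
req = urllib.request.Request(
    f"{endpoint}/v1/traces",
    data=b"junk",
    headers={"Content-Type": "application/x-protobuf"},
)
try:
    urllib.request.urlopen(req, timeout=5)
    result = "accepted junk (unexpected)"
except urllib.error.HTTPError as e:
    # Any HTTP status means the collector answered; 400 is the goal here.
    result = f"reachable, HTTP {e.code}: {e.read().decode(errors='replace')}"
except OSError as e:
    # DNS or connection failure: nothing is listening at that name.
    result = f"unreachable: {e}"
print(result)
```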
Test 3: Can we manually send a span?
>>> from opentelemetry import trace
>>> tracer = trace.get_tracer("manual-test")
>>> with tracer.start_as_current_span("test-span"):
... print("Hello from manual span")
...
>>> # Force flush pending spans
>>> trace.get_tracer_provider().force_flush()
Checked Jaeger… The manual span appeared!
So REPL can send traces, but the FastAPI server can’t? 🤔
Act 3: The Split-Brain Mystery
Let’s check what the actual server process sees:
# Check PID 1 environment
$ tr '\0' '\n' < /proc/1/environ | grep -E 'PYTHONPATH|OTEL_'
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-gateway-collector.observability:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_SERVICE_NAME=fastapi-app
_PEX_PYTHONPATH=/otel-auto-instrumentation-python
Hold on… _PEX_PYTHONPATH? Not PYTHONPATH?
# Check the actual process
$ ps aux | head -2
USER PID COMMAND
root 1 /usr/bin/python3.11 -sE /app/.bootstrap/pex/pex.py --python /usr/bin/python3.11 /app/service.pex
There’s the smoking gun! The app is packaged as a PEX, and its launcher starts Python with the -sE flags:
- `-E`: ignore all `PYTHON*` environment variables (including `PYTHONPATH`)
- `-s`: skip the user site-packages directory
Act 4: Understanding the PEX Problem
PEX (Python EXecutable) creates hermetic Python environments. Here’s what’s happening:
- The OpenTelemetry Operator sets `PYTHONPATH=/otel-auto-instrumentation-python`
- The PEX launcher starts Python with the `-E` flag, which ignores `PYTHONPATH`
- Python never loads `/otel-auto-instrumentation-python/sitecustomize.py`
- No auto-instrumentation happens
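The failure chain is easy to reproduce locally with nothing but the stdlib: put a directory on `PYTHONPATH` and watch it vanish from `sys.path` the moment the child interpreter is started with `-s -E`:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    env = dict(os.environ, PYTHONPATH=d)
    # Ask the child whether the PYTHONPATH entry made it onto sys.path.
    probe = f"import sys; print({d!r} in sys.path)"

    results = {}
    for flags in ((), ("-s", "-E")):
        out = subprocess.run(
            [sys.executable, *flags, "-c", probe],
            env=env, capture_output=True, text=True,
        ).stdout.strip()
        results[" ".join(flags) or "no flags"] = out
        print(flags, out)
```

With no flags the directory is on `sys.path`; with `-s -E` it is gone, and with it any chance of `sitecustomize.py` loading.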
But why does the REPL work? Because running `python` directly starts a fresh interpreter that bypasses the PEX launcher and its -sE flags!
Let’s verify this theory:
# In REPL (works)
>>> import sys
>>> '/otel-auto-instrumentation-python' in sys.path
True
# Check server's sys.path
$ cat > check_path.py << EOF
import sys
import json
with open('/tmp/syspath.json', 'w') as f:
json.dump(sys.path, f)
EOF
$ PEX_INTERPRETER=1 python /app/service.pex check_path.py
$ cat /tmp/syspath.json | jq
# Result: No /otel-auto-instrumentation-python!
The Solutions
Solution A: Rebuild PEX with Non-Hermetic Scripts (Clean)
# Original PEX build
pex -r requirements.txt -c gunicorn -o service.pex .
# Fixed PEX build
pex -r requirements.txt \
    -c gunicorn \
    --venv \
    --non-hermetic-venv-scripts \
    -o service.pex .
The --non-hermetic-venv-scripts flag creates venv scripts that respect environment variables.
Solution B: Runtime Wrapper (Quick & Dirty)
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: app
          command: ["/bin/sh"]
          args:
            - -c
            - |
              export PYTHONPATH="/otel-auto-instrumentation-python"
              exec python /app/service.pex
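Why this works: a plain `python` launch carries no -sE, so `PYTHONPATH` is honored and the hook fires before the bundled entry point ever runs. A stdlib sketch of the principle, using a `zipapp` archive as a stand-in for the PEX file:

```python
import os
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as d:
    # A tiny "service" packaged as a zipapp (stand-in for service.pex).
    src = pathlib.Path(d, "src")
    src.mkdir()
    (src / "__main__.py").write_text('print("service starting")\n')
    pyz = pathlib.Path(d, "service.pyz")
    zipapp.create_archive(src, pyz)

    # The operator-style hook directory, exported via PYTHONPATH as the
    # wrapper script does.
    hook = pathlib.Path(d, "hook")
    hook.mkdir()
    (hook / "sitecustomize.py").write_text('print("instrumented")\n')

    env = dict(os.environ, PYTHONPATH=str(hook))
    out = subprocess.run(
        [sys.executable, str(pyz)], env=env,
        capture_output=True, text=True,
    ).stdout
    print(out)  # the hook fires before the service's own output
```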
Solution C: Why Other Approaches Fail
These don’t work:
- `PEX_EXTRA_SYS_PATH`: appends to `sys.path` after startup (too late for `sitecustomize.py`)
- `PEX_INHERIT_PATH`: still blocked by the `-E` flag
- Manual instrumentation: defeats the whole “zero-code” purpose
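The “too late” point is worth demonstrating: `sitecustomize` is only consulted during interpreter startup, so appending its directory to `sys.path` afterwards (the net effect of `PEX_EXTRA_SYS_PATH`) never fires the hook. A stdlib sketch:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    # The hook that would announce itself if it were ever imported.
    with open(os.path.join(d, "sitecustomize.py"), "w") as f:
        f.write('print("INSTRUMENTED")\n')

    # Append the directory after startup, as PEX_EXTRA_SYS_PATH effectively does.
    code = f"import sys; sys.path.append({d!r}); print('appended, hook never ran')"
    out = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
    ).stdout
    print(out)
```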
Key Takeaways
1. The Quick Diagnostic
When OpenTelemetry auto-instrumentation seems broken:
# 1. Check if instrumentation is loaded in REPL
kubectl exec <pod> -- python -c "import sitecustomize; print('Loaded from:', sitecustomize.__file__)"
# 2. Check PID 1 environment
kubectl exec <pod> -- sh -c 'tr "\0" "\n" < /proc/1/environ | grep -E "^(PYTHONPATH|_PEX_)"'
# 3. Check the actual process command
kubectl exec <pod> -- ps aux | grep python
# 4. Test manual spans
kubectl exec -it <pod> -- python
>>> from opentelemetry import trace
>>> with trace.get_tracer("test").start_as_current_span("test"): pass
>>> trace.get_tracer_provider().force_flush()
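Those four checks can be folded into a hypothetical one-file probe (the file name and variable names here are illustrative) that you copy into the pod and run both ways, plain `python probe.py` versus through the PEX entry point, then diff the output:

```python
# probe.py: report everything relevant to auto-instrumentation in one shot.
import os
import sys

otel_dir = "/otel-auto-instrumentation-python"  # path the operator mounts
report = {
    "executable": sys.executable,
    "ignore_environment (-E)": bool(sys.flags.ignore_environment),
    "no_user_site (-s)": bool(sys.flags.no_user_site),
    "PYTHONPATH": os.environ.get("PYTHONPATH"),
    "otel dir on sys.path": otel_dir in sys.path,
    "sitecustomize loaded": "sitecustomize" in sys.modules,
}
for key, value in report.items():
    print(f"{key}: {value}")
```

If the two runs disagree on the last two lines, you have the split-brain described above.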
2. Python Packaging Compatibility
| Packaging Method | Auto-instrumentation Works? | Why? |
|---|---|---|
| Plain Python/pip | ✅ Yes | Respects PYTHONPATH |
| Virtualenv | ✅ Yes | Normal Python startup |
| PEX (default) | ❌ No | Hermetic mode ignores PYTHONPATH |
| PEX (non-hermetic) | ✅ Yes | Respects environment |
| Zipapp | ⚠️ Depends | Plain python app.pyz honors PYTHONPATH; -sE launchers don't |
3. The Hidden Assumption
OpenTelemetry’s Python auto-instrumentation relies on a clever but fragile trick:
- Set `PYTHONPATH` to include the instrumentation directory
- Python imports `sitecustomize.py` from that directory at startup
- `sitecustomize.py` hooks into and instruments your code
Any packaging that breaks Python’s normal startup breaks this mechanism.