Microservices architecture has revolutionized how we build scalable applications, but it’s also introduced new challenges in monitoring and debugging. When your application is split across dozens or hundreds of services, understanding what’s happening becomes exponentially more complex. This is where centralized logging with LogFlux becomes invaluable.
The Microservices Monitoring Challenge
In a microservices architecture, a single user request might pass through multiple services:
- API Gateway receives the request
- Authentication service validates the user
- Order service processes the order
- Payment service handles the transaction
- Notification service sends confirmations
- Audit service logs the activity
When something goes wrong, you need to trace the request across all these services quickly. Without proper logging, this becomes a nightmare.
Correlation IDs: Your Best Friend
The foundation of microservices monitoring is the correlation ID – a unique identifier that follows a request through all services:
// API Gateway
const uuid = require('uuid');

app.use((req, res, next) => {
  // Reuse the incoming correlation ID if present, otherwise generate one
  req.correlationId = req.headers['x-correlation-id'] || uuid.v4();
  req.logger = logger.child({
    correlationId: req.correlationId,
    service: 'api-gateway'
  });
  // Echo the ID back to the caller; outbound calls forward it explicitly (see below)
  res.setHeader('x-correlation-id', req.correlationId);
  next();
});
// Downstream service
const axios = require('axios');

async function callPaymentService(data, correlationId) {
  logger.info('Calling payment service', {
    correlationId: correlationId,
    service: 'order-service',
    action: 'payment-request'
  });

  const response = await axios.post('http://payment-service/process', data, {
    headers: {
      'x-correlation-id': correlationId
    }
  });

  logger.info('Payment service response', {
    correlationId: correlationId,
    service: 'order-service',
    status: response.status
  });

  return response.data;
}
Service-Level Logging Strategy
1. Entry and Exit Logging
Log every service entry and exit point:
func ProcessOrderHandler(w http.ResponseWriter, r *http.Request) {
    correlationID := r.Header.Get("X-Correlation-Id")

    // Entry log
    logger.Info("Order processing started", map[string]interface{}{
        "correlationId": correlationID,
        "method":        r.Method,
        "path":          r.URL.Path,
        "service":       "order-service",
    })

    defer func(start time.Time) {
        // Exit log with duration
        logger.Info("Order processing completed", map[string]interface{}{
            "correlationId": correlationID,
            "duration":      time.Since(start).Milliseconds(),
            "service":       "order-service",
        })
    }(time.Now())

    // Process order...
}
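The same entry/exit pattern translates directly to a Node.js service. Here is a minimal Express middleware sketch; it assumes the correlation-ID middleware shown earlier has already attached req.logger, so adapt the field names to your own logger:

// Entry/exit logging middleware for an Express service (sketch)
// Assumes req.logger was attached by the correlation-ID middleware above
function requestLogging(serviceName) {
  return (req, res, next) => {
    const start = Date.now();

    // Entry log
    req.logger.info('Request started', {
      service: serviceName,
      method: req.method,
      path: req.path
    });

    // Exit log, emitted once the response has been sent
    res.on('finish', () => {
      req.logger.info('Request completed', {
        service: serviceName,
        statusCode: res.statusCode,
        duration: Date.now() - start
      });
    });

    next();
  };
}

app.use(requestLogging('order-service'));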
2. Error Boundary Logging
Implement comprehensive error logging at service boundaries:
from functools import wraps
import traceback

def log_service_errors(service_name):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as e:
                correlation_id = kwargs.get('correlation_id', 'unknown')
                logger.error(f"Service error in {service_name}", {
                    'correlationId': correlation_id,
                    'service': service_name,
                    'error': str(e),
                    'stackTrace': traceback.format_exc(),
                    'function': func.__name__
                })
                raise
        return wrapper
    return decorator

@log_service_errors('inventory-service')
def check_inventory(product_id, quantity, correlation_id=None):
    # Business logic here
    pass
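For Node.js services, the equivalent boundary is Express's error-handling middleware, which Express recognizes by its four-argument signature. A minimal sketch, again assuming the correlation-ID middleware from earlier:

// Error-boundary logging middleware (sketch); register it after all routes
function logServiceErrors(serviceName) {
  return (err, req, res, next) => {
    logger.error('Service error', {
      correlationId: req.correlationId || 'unknown',
      service: serviceName,
      error: err.message,
      stackTrace: err.stack
    });
    next(err); // let the next error handler build the response
  };
}

app.use(logServiceErrors('inventory-service'));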
3. Performance Metrics
Track service performance across your architecture:
class PerformanceLogger {
  constructor(logger, serviceName) {
    this.logger = logger;
    this.serviceName = serviceName;
  }

  async trackOperation(operationName, operation, correlationId) {
    const start = Date.now();
    const memBefore = process.memoryUsage().heapUsed;

    try {
      const result = await operation();
      this.logger.info('Operation completed', {
        correlationId: correlationId,
        service: this.serviceName,
        operation: operationName,
        duration: Date.now() - start,
        memoryDelta: process.memoryUsage().heapUsed - memBefore,
        status: 'success'
      });
      return result;
    } catch (error) {
      this.logger.error('Operation failed', {
        correlationId: correlationId,
        service: this.serviceName,
        operation: operationName,
        duration: Date.now() - start,
        error: error.message,
        status: 'failure'
      });
      throw error;
    }
  }
}

// Usage
const perfLogger = new PerformanceLogger(logger, 'user-service');
await perfLogger.trackOperation(
  'fetchUserProfile',
  () => getUserProfile(userId),
  correlationId
);
Distributed Tracing with LogFlux
LogFlux makes it easy to trace requests across services:
Setting Up Service Context
// Configure LogFlux with service metadata
logger := logflux.New(logflux.Config{
    APIKey: os.Getenv("LOGFLUX_API_KEY"),
    DefaultFields: map[string]interface{}{
        "service":     "payment-service",
        "version":     os.Getenv("SERVICE_VERSION"),
        "environment": os.Getenv("ENVIRONMENT"),
        "pod":         os.Getenv("HOSTNAME"),
    },
})
Implementing Request Tracing
import os
import time

from logflux import Logflux

class RequestTracer:
    def __init__(self, service_name):
        self.service_name = service_name
        self.logger = Logflux(
            api_key=os.environ['LOGFLUX_API_KEY'],
            service=service_name
        )

    def trace_request(self, correlation_id, operation):
        """Trace a request through the service"""
        trace = {
            'correlationId': correlation_id,
            'service': self.service_name,
            'operation': operation,
            'timestamp': time.time(),
            'spans': []
        }
        return RequestContext(self.logger, trace)

class RequestContext:
    def __init__(self, logger, trace):
        self.logger = logger
        self.trace = trace

    def start_span(self, name):
        span = {
            'name': name,
            'start': time.time()
        }
        self.trace['spans'].append(span)
        return span

    def end_span(self, span, status='success', metadata=None):
        span['end'] = time.time()
        span['duration'] = span['end'] - span['start']
        span['status'] = status
        if metadata:
            span['metadata'] = metadata
        self.logger.info(f"Span completed: {span['name']}", {
            'correlationId': self.trace['correlationId'],
            'span': span
        })
Health Check Monitoring
Monitor service health across your microservices:
// Health check endpoint with logging
app.get('/health', async (req, res) => {
  const healthChecks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    downstream: await checkDownstreamServices()
  };

  const overallHealth = Object.values(healthChecks).every(h => h.status === 'healthy');

  logger.info('Health check performed', {
    service: 'order-service',
    health: overallHealth ? 'healthy' : 'unhealthy',
    checks: healthChecks,
    timestamp: new Date().toISOString()
  });

  res.status(overallHealth ? 200 : 503).json({
    status: overallHealth ? 'healthy' : 'unhealthy',
    checks: healthChecks
  });
});
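The individual check functions can stay small. Here is a minimal sketch of what checkDatabase might look like; the db.query call is a placeholder for whichever client your service actually uses:

// Example health check helper (sketch): returns a status object instead of throwing,
// so the endpoint can report partial degradation
async function checkDatabase() {
  const start = Date.now();
  try {
    await db.query('SELECT 1'); // placeholder: any cheap round-trip to the database
    return { status: 'healthy', latency: Date.now() - start };
  } catch (error) {
    return { status: 'unhealthy', latency: Date.now() - start, error: error.message };
  }
}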
Circuit Breaker Logging
Log circuit breaker state changes for better visibility:
type CircuitBreaker struct {
    logger    *logflux.Logger
    service   string
    state     string
    failures  int
    threshold int
}

func (cb *CircuitBreaker) Call(operation func() error, correlationID string) error {
    if cb.state == "open" {
        cb.logger.Warn("Circuit breaker is open", map[string]interface{}{
            "correlationId": correlationID,
            "service":       cb.service,
            "state":         cb.state,
        })
        return ErrCircuitOpen // sentinel error declared elsewhere in the package
    }

    err := operation()
    if err != nil {
        cb.failures++
        if cb.failures >= cb.threshold {
            cb.state = "open"
            cb.logger.Error("Circuit breaker opened", map[string]interface{}{
                "correlationId": correlationID,
                "service":       cb.service,
                "failures":      cb.failures,
                "threshold":     cb.threshold,
            })
        }
        return err
    }

    if cb.failures > 0 {
        cb.failures = 0
        cb.logger.Info("Circuit breaker reset", map[string]interface{}{
            "correlationId": correlationID,
            "service":       cb.service,
        })
    }
    return nil
}
Querying Across Services
Use LogFlux Inspector to trace requests across all services:
# Find all logs for a specific correlation ID
logflux query \
--filter "correlationId:abc-123-def" \
--sort timestamp
# Find slow requests across all services
logflux query \
--filter "duration>1000" \
--group-by service \
--stats
# Track error rate by service
logflux stats \
--filter "level:error" \
--group-by service,hour \
--interval 24h
# Find requests that touched specific services
logflux query \
--filter "correlationId IN (SELECT correlationId FROM logs WHERE service='payment-service' AND status='failed')" \
--sort timestamp
Dashboard for Microservices
Create a comprehensive dashboard in LogFlux:
# LogFlux Dashboard Configuration
name: Microservices Overview
widgets:
  - type: line_chart
    title: Request Rate by Service
    query: |
      SELECT service, time_bucket('1 minute', timestamp) AS minute, COUNT(*) AS requests
      FROM logs
      WHERE timestamp > NOW() - INTERVAL '1 hour'
      GROUP BY service, time_bucket('1 minute', timestamp)

  - type: heatmap
    title: Service Latency Heatmap
    query: |
      SELECT service,
             percentile_disc(0.5) WITHIN GROUP (ORDER BY duration) AS p50,
             percentile_disc(0.95) WITHIN GROUP (ORDER BY duration) AS p95,
             percentile_disc(0.99) WITHIN GROUP (ORDER BY duration) AS p99
      FROM logs
      WHERE duration IS NOT NULL
      GROUP BY service

  - type: table
    title: Recent Errors
    query: |
      SELECT timestamp, service, correlationId, error
      FROM logs
      WHERE level = 'error'
      ORDER BY timestamp DESC
      LIMIT 20

  - type: graph
    title: Service Dependencies
    query: |
      SELECT source_service, target_service, COUNT(*) AS calls
      FROM service_calls
      GROUP BY source_service, target_service
Best Practices Summary
- Always use correlation IDs - They’re essential for tracing distributed requests
- Log at service boundaries - Entry, exit, and error points
- Include service metadata - Service name, version, instance ID
- Monitor health endpoints - Regular health checks with logging
- Track performance metrics - Duration, memory, CPU usage
- Implement circuit breakers - With proper logging of state changes
- Use structured logging - Makes querying and analysis much easier (a short sketch follows this list)
- Set up alerting - For error rates, latency spikes, and service failures
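To make the service-metadata and structured-logging points concrete, here is a minimal sketch assuming a winston-style logger that supports child loggers; the field names are illustrative:

// Bake service metadata into every log entry via a child logger (sketch)
const serviceLogger = logger.child({
  service: 'order-service',
  version: process.env.SERVICE_VERSION,
  instanceId: process.env.HOSTNAME
});

// Structured: every field can be filtered and aggregated on its own
serviceLogger.info('Order created', {
  correlationId: req.correlationId,
  orderId: order.id,
  amount: order.total
});

// Avoid unstructured strings; the values have to be parsed back out later:
// serviceLogger.info(`Order ${order.id} created for ${order.total}`);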
Conclusion
Monitoring microservices doesn’t have to be overwhelming. With proper logging strategies and LogFlux’s powerful querying capabilities, you can maintain complete visibility into your distributed system. The key is consistency – ensure all services follow the same logging patterns and always include correlation IDs.
Start implementing these patterns in your microservices today, and transform your monitoring from reactive firefighting to proactive system management. Your future self (and your on-call team) will thank you!