Periskop: Exception Monitoring at Scale
A pull-based exception monitoring service inspired by Prometheus
- Track: Monitoring and Observability devroom
- Room: D.monitoring
- Day: Sunday
- Start: 13:40
- End: 14:20
This talk is aimed at engineers who operate distributed systems (or microservices) and are interested in monitoring exceptions at scale. We introduce the open source project "Periskop", a pull-based exception monitoring service built at SoundCloud and inspired by Prometheus. We will cover:
- The problems we encountered with the traditional push-based model for exception monitoring
  - Thundering herd issues caused by bad deployments
  - Difficulty navigating large volumes of logs to identify exceptions
- An alternative pull-based model that scales well with the number of exceptions and instances
  - Aggregation plus sampling of concrete occurrences
  - Limitations and trade-offs (short-lived processes and fork-based application servers)
- An implementation of this model in the open source project "Periskop"
  - Initial development
  - Server and client libraries
  - Newly added features and roadmap (push gateway, federation, time series visualization, integrations)
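
To make the "aggregation plus sampling" idea from the outline more concrete, here is a minimal Python sketch of an in-process collector that a pull-based server could scrape. It is an illustration under stated assumptions, not Periskop's client API: the `ErrorCollector` class, its `report` and `export` methods, the `max_samples` parameter, and the choice of the exception type name as the aggregation key are all hypothetical.

```python
from collections import deque


class ErrorCollector:
    """Hypothetical in-memory collector (not the Periskop client API)."""

    def __init__(self, max_samples=3):
        # Upper bound on stored occurrences per aggregation key
        self.max_samples = max_samples
        self.aggregates = {}

    def report(self, exc):
        # Assumption: aggregate by exception type name only
        key = type(exc).__name__
        agg = self.aggregates.setdefault(
            key, {"total_count": 0, "latest": deque(maxlen=self.max_samples)}
        )
        # Aggregation: every occurrence increments the counter
        agg["total_count"] += 1
        # Sampling: the bounded deque keeps only the most recent occurrences
        agg["latest"].append(str(exc))

    def export(self):
        # Snapshot that a scraper could pull, e.g. over an HTTP endpoint
        return {
            key: {"total_count": agg["total_count"], "latest": list(agg["latest"])}
            for key, agg in self.aggregates.items()
        }


collector = ErrorCollector(max_samples=2)
for i in range(1000):
    try:
        raise ValueError(f"bad input {i}")
    except ValueError as exc:
        collector.report(exc)

snapshot = collector.export()
```

In this sketch, memory use and the size of the scraped payload depend on the number of distinct keys and on `max_samples`, not on how many exceptions are raised. That is the property that would let a pull-based model absorb a burst of errors from a bad deployment. State held only in memory is also why short-lived processes are listed as a limitation: the process can exit before anything scrapes it.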
Speakers
Jorge Creixell
Marc Tuduri