Everyone is welcome, everyone can contribute, everyone is unique and these are your strengths too!

25. #everyonecancontribute cafe: Observability with Opstrace


Sebastien Pahl introduced Opstrace and its principles and ideas, followed by a live demo and an open discussion at the end. We extended the planned 60 minutes to 95 minutes :-)

Open Source observability is moving fast, and it is hard to keep up. We want to make things easy to deploy and use.

Enjoy the session! 🦊

After the session, we discovered the same ideas on Twitter:

Unpopular opinion: The biggest problem in monitoring/observability is the tool and feature fatigue.

Insights

  • Quickstart installation in AWS.
  • Opstrace deploys Loki, Cortex, Prometheus, an ingress controller, APIs, a UI, and Grafana into a Kubernetes cluster in AWS.
  • Authentication uses Auth0; Dex is planned for the future to provide SAML etc. for SSO.
  • Grafana comes with default dashboards.
  • You can send data to Opstrace from a local demo environment with docker-compose.
    • Metrics are generated by Avalanche and scraped with Prometheus. Log messages are collected with Fluentd. Grafana combines Loki (logs) and Prometheus (metrics) as data sources.
  • Easy-to-use Prometheus Alertmanager: configure it via an API for automated rule creation, or via a UI. Opstrace proxies the Cortex functionality behind an authentication token and an API.
    • Roadmap ideas: SLOs and error budgets - generate rules and provide templates out of the box.
  • Monitoring cloud vendor metrics without provisioning Prometheus yourself: send the configuration over the API, and a new cloudwatch_exporter container is deployed to the Opstrace tenant.
  • Open discussion with ideas and questions:
    • High availability out of the box: Cortex runs 3 nodes by default, and the cloud/Kubernetes takes care of failover.
  • Which problems are not yet solved with monitoring/observability?
    • The current focus is onboarding: make it easy to get started with Open Source, with an experience similar to Datadog.
    • Improve the usability of Grafana; as a UI it should be much more collaborative. Turn it into a debugging session: instead of using Google Docs / Notion, add text, graphs, etc. directly, and keep these documents there, even a year later.
    • How to answer any question: links between logs, metrics, and traces. Exemplars for linking metrics and traces were released in Prometheus 2.26. More in this Grafana blog post about Tempo, and in our 6. Cafe on Tempo, held when it was announced in October 2020.
  • Integrating Opstrace, e.g. embedding a graph from a staging deployment into Merge Requests.
  • Have you thought of integrating Vector for logs?
  • What was the motivation for creating Opstrace?
    • We wanted to ask questions about our infrastructure and needed to collect data for that. We love Prometheus, but there is still so much to build.
    • The first idea was more closed: something like Datadog, but running in your own SaaS.
    • We continued to iterate: we are standing on the shoulders of giants, so we made it an open source project. That is harder.
    • Don’t re-implement everything; work together.
  • Reporting dashboards & customization - make it easy to use.
  • Incident management integrated with GitLab and similar tools.
  • As a developer, I don’t care about the configuration or the service being run in Kubernetes. I want to see metrics from a staging deployment, and focus on the fun stuff.
  • Security comes out of the box, e.g. for communication between monitoring nodes. Further topics: GDPR for logs, compliance levels, and which data is stored in the backend.
  • We’ll revisit Opstrace in the future and see how things are going. And of course try it ourselves, maybe in a future #everyonecancontribute cafe.


Date published: April 14, 2021

Tags: Gitlab, Opstrace, Cloud, Monitoring, Ansible, Kubernetes, Security, Kiosk, Multi tenancy