Overview:
SOFTSWISS continues to expand its team and is looking for a Monitoring System Engineer. We need an experienced, accomplished professional who shares our culture and values.
Key responsibilities:
The two main pillars of our workflow are:
Responding to Events/Monitoring Alerts (L1/L2 tasks for certain parts of the system):
- Providing on-duty service coverage, including day and night on-call shifts.
- Providing timely and effective solutions to technical problems reported by users.
- Communicating clearly with users to understand their issues and keep them updated on resolution status.
- Handling incidents by troubleshooting and resolving issues, including seeking assistance from third-party or vendor support when necessary.
- Directing issues or queries to the relevant department as needed.
- Keeping detailed records and documentation of current infrastructure challenges and Root Cause Analyses (RCAs).
- Creating detailed reports for all technical support incidents, including descriptions, resolutions, and timelines.
Maintaining and Enhancing the Monitoring Systems:
- Collaborating with other teams to understand and define their monitoring needs, then implementing the right solutions.
- Setting up and adjusting the monitoring/observability systems for various teams.
- Designing and tuning alerts and dashboards to meet specific needs.
- Refining alerts to reduce irrelevant notifications and make each remaining alert more meaningful.
- Improving dashboards for greater clarity and a more comprehensive view.
- Building and maintaining integrations between the monitoring systems and other platforms, such as Jira and Opsgenie, when required.
- Creating and updating a Knowledge Base covering system configurations, alert processes, troubleshooting guidelines, and user manuals.
- Keeping up with the latest trends and best practices to continuously improve our organization’s monitoring capabilities.
Required Experience:
- Minimum of 3 years’ experience as a Systems Engineer, SRE, DevOps, or Monitoring Support Engineer.
- Good understanding of Linux operating systems (Debian-based distributions).
- Experience with containerization, virtualization, and orchestration (LXC/LXD, Docker, Kubernetes).
- Development experience in a scripting or programming language (Bash, Python, Go, etc.) and familiarity with REST APIs.
- Knowledge of basic database concepts, including transactions and write-ahead logging (WAL); experience with PostgreSQL is preferred.
- English proficiency at Intermediate (B1) level or higher. You must be able to understand technical terminology related to our tech stack and to read technical documentation.
Skills & Experience:
Monitoring/observability tools (experience with at least two of the following):
- Zabbix (familiarity with concepts such as LLD, prototypes, dependencies, and preprocessing)
- Grafana (knowledge of data sources, dashboard creation, and query usage)
- Prometheus/VictoriaMetrics/etc. (understanding of metrics collection and alerting)
- ELK/Splunk/etc. (ability to use queries and filters for log analysis)
- Site24x7/Pingdom/etc. (experience with web monitoring and performance metrics)
Linux operating systems
Strong understanding of key concepts, including:
- File systems
- Process management
- Built-in monitoring tools
- Networks
- Scripting
- Troubleshooting
Familiarity with:
- Kafka
- RabbitMQ
- GitLab
- Nginx/Puma
- ClickHouse
- PostgreSQL
- MongoDB
- HashiCorp Vault
- Microservices and orchestration (Kubernetes)
- Any IaC/infrastructure automation tools:
  - Provisioning tools (Terraform)
  - Configuration management (Ansible, Salt, Puppet)