Overview:
SOFTSWISS continues to expand its team and is looking for a Monitoring System Engineer.
If you’re passionate about delivering top-notch service and consider yourself a proactive, positive thinker, we’d love to hear from you! We’re eager for you to contribute to our team’s success. If you’re looking for a challenging and rewarding career opportunity, this could be the perfect fit.
Key responsibilities:
The two main pillars of our workflow are:
Responding to events and monitoring alerts (L1/L2 tasks for certain parts of the system):
- Providing on-duty service coverage across day and night shifts.
- Addressing incidents by troubleshooting and resolving issues, including seeking assistance from third-party or vendor support when necessary.
- Directing issues or queries to the relevant department as needed.
- Keeping detailed records and documentation of current infrastructure challenges and Root Cause Analyses (RCAs).
- Contributing to safe and effective internal practices for AI usage in monitoring and incident response workflows.
Maintaining and Enhancing the Monitoring Systems:
- Collaborating with other teams to understand and define their monitoring needs, then implementing the right solutions.
- Setting up and adjusting the monitoring/observability systems for various teams.
- Designing and tweaking alerts and dashboards to suit specific needs.
- Refining alerts to reduce noise and make each notification more relevant.
- Improving dashboards for better clarity, readability, and a more comprehensive view.
- Building and maintaining integrations between the monitoring systems and other platforms, such as Jira and Opsgenie, when required.
- Establishing and maintaining a knowledge base covering system configurations, alert processes, troubleshooting guidelines, and user manuals.
- Keeping up with the latest trends and best practices to continuously improve our organization's monitoring capabilities.
- Identifying opportunities to automate repetitive monitoring and support tasks, including with AI-assisted approaches where suitable.
Required Experience:
- At least 3 years' experience as a Systems Engineer, SRE, DevOps Engineer, or Monitoring Support Engineer (L2+).
- Good understanding of Linux operating systems (Debian-based distributions).
- Experience with containerization, virtualization, and orchestration (LXC/LXD, Docker, Kubernetes).
- Development experience in a scripting or programming language (Bash, Python, Go, etc.) and familiarity with REST APIs.
- Knowledge of basic database concepts, including transactions and WAL (write-ahead logging); experience with PostgreSQL is preferred.
- English proficiency at Intermediate (B1) level or higher. You need to understand technical terminology related to our tech stack and be able to interpret technical documentation.
- Practical interest in using AI-assisted tools for troubleshooting, automation, documentation, and operational efficiency:
– Ability to critically evaluate AI-generated output and validate it before using it in production environments.
– Understanding of the risks and limitations of AI usage in infrastructure and production operations.
Skills & Experience
Monitoring/observability tools (experience with at least two of the following)
- Zabbix (familiarity with concepts such as LLD, prototypes, dependencies, and preprocessing)
- Grafana (knowledge of data sources, dashboard creation, and query usage)
- Prometheus/VictoriaMetrics/etc. (understanding of metrics collection and alerting)
- ELK/Splunk/etc. (ability to use queries and filters for log analysis)
- Site24x7/Pingdom/etc. (experience with web monitoring and performance metrics)
Linux operating systems
- Strong understanding of key concepts, including:
– File systems
– Process management
– Built-in monitoring tools
– Networking
– Scripting
– Troubleshooting
Familiarity with
- Kafka
- RabbitMQ
- GitLab
- Nginx/Puma
- ClickHouse
- PostgreSQL
- MongoDB
- HashiCorp Vault
- Microservices and orchestration (Kubernetes)
- Any IaC or infrastructure automation tooling: provisioning (Terraform) and configuration management (Ansible, Salt, Puppet)
Main Advantages
- Private insurance (depending on contract type)
- Paid gym membership
- Comprehensive Mental Health Program
- Free English lessons (online)
- Local language courses
- Paid time off (PTO)
- Maternity leave support
- Referral program rewards
- Upskilling, internal workshops, and participation in professional conferences and corporate events