Hongzhi Wang
Development
Beijing, China
Skills
DevOps
About
Hongzhi Wang's skills align with Consultants and Specialists (Information and Communication Technology). Hongzhi also has skills associated with System Developers and Analysts (Information and Communication Technology). Hongzhi Wang has more than 12 years of work experience.
Work Experience
SRE
May 2022 - December 2023
- I belonged to one department within the company. The department had several dozen people, and I was the SRE team leader responsible for reliability. We managed several hundred servers, with fewer than 1 billion page views (PV) per day. What I did:
  - Optimized alerts, reducing them from dozens per day to single digits and improving the signal-to-noise ratio of the alerting system.
  - Migrated self-hosted services from two regions to Alibaba Cloud managed services, including Kubernetes, Ceph, and HBase, reducing infrastructure management costs and improving reliability.
  - Decommissioned Argo CD and custom monitoring components, consolidating the SRE team's technology stack.
  - Simplified the directory structure of the Terraform and Kubernetes configuration files in the source code repository, improving readability and maintainability.
  - Continuously optimized costs and allocated them to the respective business teams.
SRE
June 2021 - February 2022
- The infrastructure was hosted on AWS, and AWS resources were managed with Terraform Enterprise following a GitOps workflow. In this role, I was responsible for the reliability of a small business unit and for cost optimization across the whole company.
SRE
Spark Thinking
March 2021 - June 2021
- Spark Thinking's technical team had more than 1,000 people, including 10 SREs. My main focus was redesigning the entire container cluster and simplifying the container architecture:
  - Replaced the per-business resource pools with one large shared pool for all applications.
  - Changed the cluster hosting mode from Independent Master Deployment Mode to Managed Master Deployment Mode, on the principle that our core competence is using Kubernetes well rather than managing Kubernetes well.
  - Standardized on a single 32c128g (32 CPU cores, 128 GB memory) node specification, eliminating the mix of different node specifications.
SRE
Moji Weather
February 2017 - March 2021
- Moji Weather serves an average of several billion page views (PV) per day, and we managed about 2,000 servers. When I joined, most applications were still deployed manually by logging in to each server. Through the efforts of another teammate and me, the situation improved a lot. My responsibilities:
  - Worked with the development teams to define specifications for compiling, packaging, deploying, and running applications. On that basis, I designed and implemented GitOps-based application deployment tools.
  - Built the monitoring system with Zabbix, Prometheus, and other tools.
  - Wrote and maintained documentation for operating and maintaining online applications.
  - Migrated applications to k8s.
  - Deployed and maintained the ELK stack.
  - Designed and implemented automated deployment processes and tools for virtual machines and containers.
  - Gave internal training on DevOps and service mesh.
- Project Experience:
  - Migrating services from Zenlayer to Alibaba Cloud (June 2023 - August 2023). Project Description: The services we ran in the Zenlayer data center included k8s, Ceph, MongoDB, and a self-built virtual machine platform. They had high management costs and were difficult to troubleshoot, so we decided to migrate them to Alibaba Cloud. I was the project owner, and I also personally handled the deployment of networking, the container cluster, and monitoring. The migration reduced management costs and improved service stability.
  - Migrating applications to k8s (December 2019 - March 2021). Project Description: Our company had several k8s clusters with fewer than 50 nodes. A teammate and I were responsible for planning, deployment, component selection, and related work. Daily traffic through the cluster ingress exceeded 800 million, with peak QPS above 30,000. I designed the end-to-end application deployment process, from source code to Docker image to production, using tools such as GitLab, Jenkins, and Helm. Moving to k8s greatly improved the self-healing ability and deployment efficiency of online applications. I also shared our container experience in the DockerOne community (http://dockone.io/article/10643).
System Engineer
Cisco
April 2016 - February 2017
- Responsible for on-site maintenance of CA equipment and systems at a department of the National Radio and Television Administration.
System Engineer
June 2014 - February 2016
- I was assigned to the J&J project and worked on site at the customer's office. I belonged to the Asia-Pacific data center team and was in charge of data center infrastructure management. The role involved limited technical work and was mostly about coordination and communication. My main gains were improved English and the ability to handle many different kinds of tasks.
System Engineer
October 2013 - February 2014
- Responsible for the operation and maintenance of the AIX system at a government department.
System Engineer
February 2013 - October 2013
- Responsible for the operation and maintenance of the AIX system of the Bank of China Credit Card Center.
System Engineer
July 2011 - February 2013
- For the first half year, my work was maintaining IBM storage and minicomputer (midrange server) hardware. At the end of February 2012, I was stationed on site for the CMBC project. I joined the system team there and was in charge of maintaining the test environment.
Education