Lisa Liu
Development
CA, United States
Skills
Cloud Computing
About
Lisa Liu's skills align with Programmers (Information and Communication Technology). Lisa also has skills associated with System Developers and Analysts (Information and Communication Technology). Lisa Liu has 17 years of work experience, with 2 years of management experience, including a high-level position.
Work Experience
Platform Engineer
Royal Bank of Canada
January 2024 - Present
- Project: Set up Airflow and JupyterHub on OpenShift
Responsibilities: Evaluated the existing services running on AKS/GCP/EKS/OpenShift and identified their dependencies, configurations, and resource requirements. Deployed and configured Apache Airflow and JupyterHub on OpenShift through CI/CD pipelines, ensuring high availability and scalability of data workflows and analytical environments. Collaborated with cross-functional teams to define requirements and design solutions tailored to specific business needs. Implemented best practices for containerization, orchestration, and version control, optimizing the development and deployment processes. Provided technical guidance and support to team members, fostering a culture of continuous learning and innovation. Monitored system performance and troubleshot issues to minimize downtime and keep critical services running smoothly. Documented configurations, procedures, and troubleshooting steps, facilitating knowledge sharing and onboarding of new team members.
Cloud Engineer
FlashFood Inc
January 2022 - December 2023
- Project: Service deployment on Kubernetes across Azure, GCP, and OpenShift
Designed and implemented the services, dependencies, and data migration to AKS/GCP and OpenShift.
Responsibilities: Evaluated the existing services running on AKS/GCP/OpenShift and identified their dependencies, configurations, and resource requirements. Determined the desired architecture and deployment model in Azure AKS. Assessed the compatibility of the applications and associated components with AKS. Containerized the applications by packaging the services into Docker containers, ensuring they were properly configured and included all dependencies. Set up the AKS environment: provisioned an AKS cluster in Azure and configured it to match the requirements, including networking, security, and storage components. Developed the deployment manifests and Helm charts used in OpenShift to deploy the applications in AKS. Ensured that any platform-specific configurations or dependencies were updated to work with AKS. Migrated the persistent data associated with the services. This involved migrating databases, file systems, and other storage solutions to Azure-compatible services such as Azure Database for PostgreSQL and Azure Files. Conducted thorough testing to verify that the services functioned correctly in AKS. This included functional testing, performance testing, and validating integration points and dependencies. Designed and planned a cutover strategy to minimize downtime during the migration. This involved either gradually migrating services or performing a cut-and-switch approach. Monitored the cutover process closely and validated that all services were operational in AKS. Optimized the AKS environment for performance, scalability, and cost efficiency. This included fine-tuning resource allocation, implementing scaling strategies, and using AKS-specific features such as Azure Monitor and Azure Container Insights.
Ensured that proper backup and rollback plans were in place to mitigate any unexpected issues. Addressed security and compliance requirements specific to Azure AKS, such as configuring network policies, implementing Azure Active Directory integration, and enabling encryption at rest and in transit.
- Project: Vault Distributed Key Management System
HashiCorp Vault is a tool designed to manage secrets and protect sensitive data, including the management of cryptographic keys.
Responsibilities: Installed and configured HashiCorp Vault on Kubernetes clusters on GCP, AWS, and Azure, including setting up Vault servers, storage backends, and other necessary configurations. Implemented high-availability configurations, such as a cluster of Vault servers, to keep Vault available and resilient to failures. Configured the storage backend for persistent data storage (options include Consul, etcd, and Amazon DynamoDB, among others). Implemented authentication methods, such as LDAP, GitHub, and role-based authentication, to control access to Vault. Defined and implemented authorization policies to govern access to different secrets and functionalities within Vault. Configured secrets engines to manage various types of secrets, such as encryption keys, API tokens, and database credentials. Established key rotation policies to ensure the regular rotation of encryption keys and other sensitive credentials. Set up monitoring tools and logging mechanisms to track Vault's performance, detect anomalies, and maintain an audit trail of activities. Implemented backup and disaster recovery procedures to ensure data integrity and availability in case of unexpected incidents. Integrated Vault with other systems and applications that require access to secrets or encryption keys. Documented the entire setup process, configurations, and operational procedures to support maintenance, troubleshooting, and future expansion.
Provided training for relevant team members and stakeholders on how to use and interact with Vault, and ensured that team members were aware of best practices and security considerations. Ensured that the Vault setup complied with relevant security standards and regulatory requirements, and implemented auditing features to facilitate compliance checks. Regularly reviewed and updated configurations, performed software updates, and addressed security vulnerabilities and issues.
- Project: Service and Data Security Enhancement pipeline (CI/CD)
Set up CI/CD pipelines that integrate security scanning tools (Aqua, etc.) to enhance the security of services and data on Azure Kubernetes and OpenShift within the organization.
Responsibilities: Containerized applications on the OpenShift/Kubernetes environment, reducing cloud costs. Deployed a Dremio cluster on the OpenShift/Kubernetes environment. Automated data ingestion pipelines using Jenkins, Python, Bash, Azure Data Factory, and Dremio. Conducted a comprehensive security assessment to identify vulnerabilities, risks, and weaknesses in the existing services and data infrastructure. Performed penetration testing, vulnerability scanning, and code reviews to uncover security flaws. Identified and evaluated potential threats and risks specific to the services and data using Aqua, Snyk, etc. Analyzed the identified risks and prioritized them based on their potential impact and likelihood. Developed a risk management strategy that includes mitigation plans for each identified risk. Implemented controls and safeguards to minimize the impact of potential security incidents. Defined and established security policies, standards, and best practices that align with industry regulations and compliance requirements. Documented security policies related to access control, authentication, encryption, data privacy, and incident response.
Regularly reviewed and updated these policies to address emerging threats and evolving security standards. Implemented secure coding practices and conducted regular security training for developers to raise awareness of secure coding principles. Conducted code reviews to identify and address potential security vulnerabilities. Integrated security testing, such as static code analysis and dynamic application security testing, into the software development lifecycle.
DevOps Engineer
ThinkData Works Inc
January 2020 - December 2021
- Project: Cert-manager Certificate Management System
Cert-manager is an open-source certificate management system designed to streamline and automate the management of TLS certificates within Kubernetes environments.
Responsibilities: Installed and configured Cert-manager within the Kubernetes cluster on GCP and verified compatibility with the Kubernetes version in use. Set up and configured issuers and certificate authorities (CAs) based on the organization's policies and requirements, choosing among issuer types such as Let's Encrypt, self-signed, and custom CAs. Ensured that DNS records for the domains were properly configured to validate certificate requests (especially for the DNS-01 challenge type). Verified that the DNS providers were supported by Cert-manager. Configured storage backends for Cert-manager to store and manage certificates securely, choosing a storage solution based on the organization's needs. Defined and implemented certificate issuance policies, including parameters such as key types, key sizes, and expiration periods. Set up policies for certificate revocation where needed. Implemented monitoring for certificate expiration and renewals. Configured alerts to notify administrators of upcoming certificate expirations. Integrated Cert-manager with the chosen Ingress controller to automatically provision and manage certificates for services. Configured Ingress resources with annotations for Cert-manager integration. Set up automatic certificate renewal processes. Configured renewal hooks or scripts where additional actions were required during the renewal process. Established backup procedures for the Cert-manager configuration and stored certificates. Documented the process for restoring Cert-manager in case of failures or disasters. Implemented RBAC policies to control access to Cert-manager resources. Defined roles and role bindings for users and service accounts requiring access.
Created comprehensive documentation for the Cert-manager setup, including installation steps, configuration details, and best practices. Documented troubleshooting steps and common issues. Regularly reviewed the security configurations of Cert-manager, stayed informed about updates and security patches, and applied them promptly. Provided training for administrators and other relevant personnel on using and maintaining Cert-manager. Promoted awareness of security best practices and policies related to certificate management. Ensured that Cert-manager configurations aligned with industry regulations and organizational governance policies. Periodically reviewed and updated configurations to address changes in compliance requirements.
- Project: Upgrade Helm 2 to Helm 3 to improve the security of the cloud platform
Helm 3 introduces improved security features, including the removal of Tiller, the server-side component in Helm 2 that had cluster-wide access, and relies on Role-Based Access Control (RBAC) to enhance security. The upgrade from Helm 2 to Helm 3 was carried out to improve the security of the Kubernetes cluster on GCP.
Responsibilities: Analyzed potential risks associated with the upgrade process, considering factors such as data loss, application downtime, and system instability. Performed a comprehensive backup of all data and applications to mitigate the risk of data loss during the upgrade. Developed a failover plan to ensure continuity in case of unexpected issues during the upgrade. This involved routing traffic or operations to alternative systems or environments. Executed the upgrade plan, following the defined steps and procedures for migrating from the current system to the upgraded version. Conducted post-upgrade health checks to verify the stability and functionality of the upgraded environment, including data integrity, application performance, and system reliability.
Senior DevOps Engineer
NetBrain Canada Inc
January 2018 - December 2020
- Project: Multiple Operating System DevSecOps + Cloud Platform
NetBrain is an application that offers network visibility and automation services, using several technologies for data collection, storage, and analysis, including Elasticsearch, MongoDB, RabbitMQ, and Redis. The NetBrain multiple operating system DevSecOps + cloud platform is a cloud-based infrastructure that supports continuous integration and continuous deployment processes with security enhancement across different operating systems. The platform provides scalable, reliable, and secure services for developers and testers to automate the build, test, and deployment of applications.
Responsibilities: Designed and implemented a Continuous Integration (CI) pipeline built on GitHub and Jenkins: an automated workflow that integrates code changes, performs tests, and delivers reliable builds. Improved the DevSecOps pipelines by integrating security scanning. Designed and implemented the Continuous Deployment (CD) platform for Windows and Linux services built on Kubernetes, Docker, Flannel, and etcd: an automated workflow to deploy applications efficiently. Designed and implemented the DevOps tools, including rolling update, rollback, A/B testing, grey release, blue-green deployment, and red/black deployment. Designed and implemented the log system built on Elasticsearch, Logstash, and Kibana (commonly referred to as the ELK stack) for log management and analysis. Designed and implemented a monitoring system, alert system, and disaster recovery system to ensure the availability, performance, and resilience of the platform. Built a Kubernetes operator system: a custom controller that manages the lifecycle of Kubernetes applications and automates their packaging, deployment, and management.
- Project: Application Performance Monitor Cloud Platform
The NetBrain APM system is an application performance monitoring system built on Elasticsearch/ELK. It allows you to monitor software services and applications in real time and collects detailed performance information on response time for incoming requests, database queries, calls to caches, external HTTP requests, and more. This makes it easy to pinpoint and fix performance problems quickly. The system also automatically collects unhandled errors and exceptions. Errors are grouped primarily by stack trace, so you can identify new errors as they appear and keep an eye on how many times specific errors happen. Metrics are another important source of information when debugging production systems. NetBrain Elastic APM agents automatically pick up basic host-level metrics and agent-specific metrics, such as JVM metrics in the Java agent and Go runtime metrics in the Go agent.
Responsibilities: Designed and implemented IaaS/PaaS/SaaS on Azure, AWS, and GCP. Designed and implemented the APM cloud platform built on Kubernetes, Docker, Helm charts, and Terraform. Designed and implemented snapshot and restore functions built on CronJob. Set up multi-region disaster recovery (DR) environments. Implemented index lifecycle management. Implemented a security information and event management (SIEM) system. Designed and implemented DevOps tools: rolling update, rollback, scaling out, data migration, etc. Implemented auto-scaling in a hybrid cloud environment (AWS, Azure, GCP, OpenShift) to dynamically adjust the allocation of computing resources based on workload demand. Upgraded Helm 2 to Helm 3 in the production environment without downtime. Set up monitoring and logging systems: Datadog, LogDNA, Elasticsearch, Jaeger, etc. Designed and implemented a disaster recovery environment. Designed and implemented a certificate management system: SSL certificates using cert-manager and Let's Encrypt.
Designed and implemented a secret management system: HashiCorp Vault. Designed and implemented CI/CD workflows: GitHub Actions and Argo CD. Implemented cross-cloud deployment: Anthos Discovery. Maintained Vertica, HDFS, Presto, Cloud SQL, BigQuery, Snowflake, etc. Implemented big data ingestion, integration, and data automation tools and processes. Automated the ETL process for cloud data warehouses: BigQuery, Snowflake, Vertica. Designed and implemented a next-generation CI/CD pipeline based on CircleCI, Argo CD, and Crossplane. Led the SRE team and played a crucial role in bridging the gap between development and operations, applying a software engineering mindset to system administration topics.
- Project: DevOps CI/CD Cloud Platform
Built on Jenkins, GitLab, Docker, Kubernetes, Spinnaker, and various AWS services (EC2, ECR, S3, Load Balancer, RDS), the DevOps platform offers a comprehensive set of tools and technologies to support the development, deployment, and management of applications. The platform enables Continuous Integration (CI) and Continuous Deployment (CD) pipelines, which let developers and testers verify their code changes in isolated environments and deliver production-ready releases efficiently. The DevOps tools provide various deployment strategies and capabilities that streamline the release process and enhance productivity (rolling update, rollback, canary deployment, blue-green deployment, etc.).
Responsibilities: Designed and implemented a Continuous Integration (CI) pipeline that integrates with GitHub within 2 weeks. Designed and implemented CD for production and the DevOps tools (rolling update, rollback) in 3 months. Designed and implemented the resource computing module and strategy on Kubernetes to make effective use of available resources such as CPU, memory, disk, and network throughput.
Designed and implemented the resource isolation function in a hybrid cloud platform to ensure the stability and reliability of the production environment. Implemented, deployed, and maintained highly available, fault-tolerant, and scalable services to ensure reliable and efficient operations. Implemented a data locality function built on AWS, RDS, and PostgreSQL to ensure that data is stored and accessed in a manner that maximizes performance and minimizes latency. Implemented configuration management automation tools (Ansible, Puppet, Chef) to accelerate DevOps platform initiatives and operations. Participated in code reviews and led scrums (Python/Golang). Set up Prometheus, Heapster, Grafana, and InfluxDB for querying and visualizing metrics. Migrated the log centre from ELK to Pandora for better log collection and data analysis.
- Project: Data infrastructure on Cloud platform
The data pipelines use Airflow DAGs to process and query over a billion rows of event-driven Snowflake data, and are deployed on Kubernetes on Azure, GCP, or AWS according to customer requirements.
Responsibilities: Led data engineering and data integration efforts for a team of 5-7 members. Designed data ingestion patterns for the various types of data sources used throughout the company. This defined how the organization managed and maintained data within the Hadoop data lake. Developed data pipelines using Scala/Python, Apache Spark, and Elasticsearch.
Senior Software Engineer
HULU Inc
January 2017 - December 2018
- Project: DevOps Cloud Platform
Hulu, a popular video streaming service, uses a microservices architecture hosted primarily on a PaaS system called DevOps Cloud Platform. This platform is built on Kubernetes, Mesos, and Docker, providing isolated environments for developers and testers to deploy and test their services. The platform consists of schedulers and executors that handle user requirements and container deployment.
Responsibilities: Designed and implemented the scheduler, executor, and resource computing functions built on Kubernetes, Mesos, and Docker to manage and allocate computing resources efficiently. Designed and implemented service discovery, metric collector, and log centre functions. Designed and implemented a network bandwidth limitation function for isolating hybrid cloud environments. Implemented a Redis-cluster-as-a-service platform built on Mesos and Golang. Set up the distributed file system using Ceph and GlusterFS for log storage. Designed and implemented the DAG Flow module, a system that executes and manages Directed Acyclic Graphs (DAGs). Designed and implemented an AI platform that offers TensorFlow as a service, built on Marathon, Mesos, Kubernetes, Python, Django, and Celery. Maintained the hybrid cloud environment, managing and optimizing resources across multiple cloud providers such as AWS, Azure, and GCP.
Senior Software Engineer
Byte Dance TikTok Ltd
January 2016 - December 2016
- Project: AI Empowered Cloud Platform
AI Empowered Cloud is a cloud computing platform that combines container technology and artificial intelligence (AI) to provide intelligent solutions to enterprises, with a focus on changing the way organizations use cloud services and AI capabilities.
Responsibilities: Designed and implemented the Cloud Infrastructure Platform on Docker and Kubernetes. Implemented the CI/CD DevOps pipeline built on Bash, Ansible, Flannel, Calico, Kubernetes, Docker, etc. Reduced latency in container-to-container communication to optimize the performance of containerized applications. Implemented features and configurations that help minimize latency using Calico, an open-source networking and network security solution.
Founder, CTO
BI Cloud Ltd
January 2014 - December 2015
- Project: Intelligent Cloud Platform
The Intelligent Cloud Platform provides a comprehensive range of infrastructure services that can be accessed on demand under a pay-as-you-go pricing model. These services include computing power, storage options, networking, and databases, among others. The platform offers over 20 services, including data warehousing, deployment tools, directories, and content delivery. One of the key features of the Intelligent Cloud Platform is its support for container management. It is built on industry-standard technologies such as Kubernetes, Flannel/Calico, etcd, Ansible, and Docker. This allows you to easily deploy and manage Docker containers on a managed cluster, which can be hosted on Amazon EC2 instances. By using the Intelligent Cloud Platform, you can eliminate the need to install, operate, and scale your own cluster management infrastructure. Instead, you can use the platform's capabilities to launch and stop Docker-enabled applications, monitor the cluster's state, and access familiar features like security groups and load balancing through simple API calls. Furthermore, the platform offers container scheduling capabilities, allowing you to optimize the placement of containers across the cluster based on resource requirements and availability constraints. This enables efficient utilization of resources and enhances scalability and performance.
Responsibilities: Designed and implemented the intelligent cloud platform, creating a comprehensive architecture and implementing the components needed to enable intelligent services and capabilities. Implemented configuration management tools such as Ansible and Puppet to automate and streamline the configuration of systems and infrastructure. Set up the distributed file system using Ceph and GlusterFS. Implemented a DevOps platform built on Mesos and Kubernetes.
Optimized the performance of the connection between container instances to enhance overall system efficiency. Implemented a disaster recovery system, logging system, and alert system. Developed DevOps tools, e.g., rolling update, scale up, scale down. Set up the monitoring system using Heapster, Prometheus, and Grafana. Led a team of developers to deliver a cloud platform within a tight timeline of 5 months. Worked closely with designers and project managers, collaborating on planning and troubleshooting activities to ensure smooth project execution.
Senior Software Engineer
Wind River System Inc
January 2010 - December 2014
- Project Name: Platform for Gateway
As an IoT (Internet of Things) product, Wind River Platform for Gateways provides original equipment manufacturers (OEMs) and original design manufacturers (ODMs) a pre-integrated, fully supported reference platform to create products that aggregate and manage devices and services at the network edge. Implemented networking features: Firewall, Layer 2/3 network packet filtering, NAT, DDoS, IPSec, L2TP, GRE, IPv6, IGMP, QoS, VLAN.
Operating System: Wind River Linux 4.3; ARM
- Project Name: IoT Home Gateway
The Wind River IoT Home Gateway is a commercial-grade Linux development platform for original equipment manufacturers (OEMs) to launch smart services that take advantage of cloud computing.
Responsibilities: Implemented Connection Daemon: It monitors all connection types (Ethernet/3G/Wi-Fi) for network traffic control and automatic switching. Implemented Security: It allows the system administrator to define a least-privilege policy for the system, in which every process and user has only the lowest privileges needed to function. Implemented mobile app features: 3G Connection: SIM card control, phone book management, SMS/MMS management, IPv6 over IPv4 tunneling.
Operating System: Wind River Linux 5.0; x86
Software Engineer
Huawei Inc
January 2008 - December 2009
- Project Name: IoT Connected Home Gateway
The IoT Connected Home system consists of two main components: the IoT Connected Home Gateway and the Remote Access clients for mobile phones and PCs. The primary purpose of the IoT Connected Home Gateway is to serve as a central hub for controlling all the UPnP (Universal Plug and Play) devices within the home network.
Responsibilities: Implemented the remote access module: A Home-to-Home (H2H) connection is bidirectional, which means that all UPnP devices in both networks are replicated to the other network. The VPN between the two Connected Home Gateways uses IPsec in tunnel mode. Maintained the TCP/IP stack in the kernel space of the Broadcom platform, working with the network drivers and firmware specific to Broadcom's hardware. Implemented a firewall module between the Local Area Network (LAN) and the Wide Area Network (WAN) to protect the network from unauthorized access and potential threats.
Operating System: Linux (Ubuntu 8.04, Kernel version: 2.6.21.5)
- Project Name: Eudemon 8000E Firewall
The E8000E adopts an architecture of independent control modules, interface modules, and service processing modules. Built on the dual NP, the interface module ensures high-speed forwarding of interface traffic. Built on a multi-core, multi-thread architecture, the service processing module ensures high-speed concurrent processing of multiple services, such as Network Address Translation (NAT), Application Specific Packet Filter (ASPF), Anti-DDoS, and VPN. The E8000E adopts a distributed concurrent processing mechanism, which greatly enhances product performance and lets users expand capacity at low cost.
Responsibilities: Implemented a TCP/IP stack for embedded security modules to handle Denial of Service (DoS) defence: developed software components that handle the TCP/IP protocol suite and incorporated specific mechanisms to mitigate DoS attacks, such as TCP flood, UDP flood, ICMP flood, and TCP proxy.
Implemented NAT (Network Address Translation) and NAT server modules to protect private IPs and enable connectivity to public networks. Designed and implemented a blacklist module that drops network packets from source IP addresses listed in the blacklist to enhance network security.
Operating System: VxWorks 7.1; NP/RMI
Software Engineer (Internship)
Huawei Inc
January 2006 - December 2006
- Project Name: Packet Filter & Monitor Tool
A network packet filtering tool built on the Netfilter architecture (Linux Kernel Version: 2.4.31), it intercepts network packets and provides visibility into their contents, allowing you to analyze and manage them. It can capture network traffic, display packet information (such as source and destination IP addresses, ports, and protocols), and provide filtering capabilities to focus on specific packets of interest.
Responsibilities: Modified kernel modules of the network protocol stack in Linux. (Linux Kernel Version: 2.4.31)
- Project Name: Network Analysis & Monitor System
The product I designed and implemented includes advanced network traffic analysis capabilities. It enables the identification of network applications being used by clients (e.g., MSN, QQ, and other STUN protocol applications), providing insights into the types of applications and protocols in use on the network. By analyzing network traffic, it helps organizations gain a better understanding of their network usage patterns, identify potential security risks, and optimize network performance.
Responsibilities: Implemented a network packet monitoring system for the applications to enhance network security and performance.
Operating System: FreeBSD 4.3