
Harshavardhan Pothula

Product Management
Illinois, United States

Skills

Python

About

Harshavardhan Pothula's skills align with System Developers and Analysts (Information and Communication Technology), and he also has skills associated with Programmers (Information and Communication Technology). He has 6 years of work experience.

Work Experience

Azure Data Engineer

State Farm
January 2023 - Present
  • Description: State Farm is an insurance company that offers insurance and financial services and is the largest property, casualty, and auto insurance provider in the United States.
  • Responsibilities:
    • Conducted performance tuning and optimization of the Snowflake data warehouse, resulting in improved query execution times and reduced operational costs.
    • Designed and deployed a Kubernetes-based containerized infrastructure for data processing and analytics, leading to a 20% increase in data processing capacity.
    • Wrote and executed MySQL database queries from Python using the Python MySQL connector and MySQLdb packages.
    • Monitored and scheduled pipelines using triggers in Azure Data Factory.
    • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool.
    • Built and deployed code artifacts into the respective environments in the Azure cloud.
    • Stored configuration data in the NoSQL database MongoDB and manipulated it using PyMongo (see the sketch after this section).
    • Used Kafka features such as partitioning, replication, and the distributed commit-log design to maintain messaging feeds, and loaded data from REST endpoints into Kafka.
    • Used a continuous-delivery pipeline to deploy microservices, including provisioning Azure environments, and developed modules using Python and shell scripting.
    • Participated in all phases of the software development lifecycle (SDLC), including requirements gathering, design, development, deployment, and analysis.
    • Led requirement gathering, business analysis, and technical design for Hadoop and Big Data projects.
    • Spearheaded the HBase setup and used Spark and Spark SQL to develop faster data pipelines, resulting in a 60% reduction in processing time and improved data accuracy.
    • Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications, working with tools such as Git, Terraform, and Ansible.
    • Created datasets from S3 using AWS Athena and built visual insights using AWS QuickSight.
    • Monitored data quality and integrity through end-to-end testing and reverse engineering, and documented existing programs and code.
    • Made extensive use of the Cloud Shell SDK in GCP to configure and deploy services with BigQuery.
    • Integrated Kubernetes with cloud-native services, such as AWS EKS and GCP GKE, to leverage additional scalability and managed services.
    • Migrated data to the cloud (Snowflake and AWS) from legacy data warehouses and developed the supporting infrastructure.
    • Imported data from various sources, performed transformations using Pandas and Spark, and loaded the results into Hive as external tables on HDFS.
    • Added Python XML/SOAP request and response handlers to add accounts, modify trades, and apply security updates.
    • Used Pig as an ETL tool for transformations with joins and pre-aggregations before storing data on HDFS, and assisted the manager with automation strategies, Selenium/Cucumber automation, and JIRA reports.
    • Ensured data quality and accuracy with custom SQL and Hive scripts, and created data visualizations using Python and Tableau for improved insights and decision-making.
    • Implemented data transformations and enrichment using Apache Spark Streaming to clean and structure the data for analysis.
  • Environment: Python, Django, JavaScript, MySQL, NumPy, SciPy, Pandas, PEP, pip, Jenkins, JSON, Git, AJAX, RESTful web services, PyUnit.
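Illustrative only: a minimal sketch of the PyMongo config-storage pattern mentioned in the responsibilities above. The connection string, database name (pipeline_meta), collection name (configs), and field names are hypothetical placeholders, not details from the actual project.

```python
# Minimal sketch: storing and updating pipeline configs in MongoDB via PyMongo.
# Connection string, database, collection, and field names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local instance
configs = client["pipeline_meta"]["configs"]

def upsert_config(name, settings):
    """Create or update a named config document."""
    configs.update_one(
        {"name": name},
        {"$set": {"settings": settings}},
        upsert=True,
    )

def load_config(name):
    """Fetch a config by name; returns None if it does not exist."""
    doc = configs.find_one({"name": name}, {"_id": 0, "settings": 1})
    return doc["settings"] if doc else None

if __name__ == "__main__":
    upsert_config("adf_copy_job", {"batch_size": 500, "retry": 3})
    print(load_config("adf_copy_job"))
```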

AWS Data Engineer

OSF Saint Francis Medical Center
April 2022 - December 2022
  • Description: OSF Saint Francis Medical Center is a not-for-profit Catholic health care organization that operates a medical group, a hospital system, and other health care facilities.
  • Responsibilities:
    • Worked with AWS Terraform templates to maintain the infrastructure as code.
    • Developed metrics based on SAS scripts on the legacy system and migrated them to Snowflake (AWS).
    • Wrote AWS Lambda functions in Spark with cross-functional dependencies that generated custom libraries for delivering the Lambda functions in the cloud.
    • Performed raw data ingestion that triggered a Lambda function and placed refined data into ADLS.
    • Reviewed existing Java/Scala Spark processing and maintained and enhanced the jobs.
    • Analyzed and developed a modern data solution with Azure PaaS services to enable data visualization.
    • Assessed the application's current production state and the impact of new installations on existing business processes.
    • Built database models, APIs, and views using Python to create interactive web-based solutions.
    • Developed Python Spark modules for data ingestion and analytics, loading from Parquet, Avro, and JSON data and from database tables (see the sketch after this section).
    • Applied the Spark DataFrame API to complete data manipulation within a SparkSession.
    • Created MapReduce programs to parse the data for claim report generation and ran the JARs in Hadoop; coordinated with the Java team in creating MapReduce programs.
    • Staged API and Kafka data (in JSON format) into Snowflake by flattening it for different functional services.
    • Created Kubernetes replication controllers, clusters, and label services to deploy microservices in Docker.
    • Estimated cluster size and monitored and troubleshot the Spark Databricks cluster.
    • Used Python to write data into JSON files for testing Django websites, and created scripts for data modeling and data import/export.
    • Consulted with leadership and stakeholders to share design recommendations, identify product and technical requirements, resolve technical problems, and suggest Big Data analytical solutions.
    • Created and maintained CI/CD (continuous integration and deployment) pipelines and applied automation to environments and applications.
    • Collected and aggregated large amounts of web log data from sources such as web servers and mobile and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Environment: Spark, Python, AWS, S3, Glue, Redshift, DynamoDB, Hive, Spark SQL, Docker, Kubernetes, Airflow, GCP, ETL workflows.
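Illustrative only: a minimal PySpark sketch of the ingestion pattern described above, reading Parquet and JSON sources into DataFrames and applying a simple transformation. The S3 paths and column names (claim_id, amount) are assumptions for the example, not details from the engagement.

```python
# Minimal sketch: ingesting Parquet/JSON data with the Spark DataFrame API.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-ingestion").getOrCreate()

claims = spark.read.parquet("s3a://example-bucket/raw/claims/")   # assumed path
events = spark.read.json("s3a://example-bucket/raw/events/")      # second source, not used further here

# Basic cleanup: drop null keys, cast amounts, aggregate per claim.
summary = (
    claims.dropna(subset=["claim_id"])
          .withColumn("amount", F.col("amount").cast("double"))
          .groupBy("claim_id")
          .agg(F.sum("amount").alias("total_amount"))
)

summary.write.mode("overwrite").parquet("s3a://example-bucket/curated/claim_totals/")
```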

Data Engineer

Equitas Small Finance Bank
November 2019 - December 2021
  • Description: Equitas Small Finance Bank is a small finance bank operating through three segments: Treasury, Wholesale Banking, and Retail Banking. Its services include retail banking with a focus on microfinance and commercial vehicle finance for individuals and for micro and small enterprises.
  • Responsibilities:
    • Used stage types such as Transformer, Aggregator, Merge, Join, Lookup, Sort, Remove Duplicates, Funnel, Filter, and Pivot to develop jobs.
    • Created pipelines to load data using ADF.
    • Built and maintained Docker container clusters managed by Kubernetes, using Linux, Bash, Git, and Docker.
    • Built data pipelines using Python and Apache Airflow for ETL jobs inserting data into Oracle (see the sketch after this section).
    • Created job flows using Airflow in Python and automated the jobs; Airflow ran on a separate stack for developing DAGs and executed jobs on EMR or EC2 clusters.
    • Wrote queries in MySQL and native SQL.
    • Created pipelines in Azure Data Factory utilizing Linked Services to extract, transform, and load data between sources such as Azure SQL Data Warehouse and the write-back tool.
    • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in DBFS.
    • Deployed models as a Python package, as an API for backend integration, and as services in a microservices architecture with a Kubernetes orchestration layer for the Docker containers.
    • Participated in the entire project lifecycle, including design, development, deployment, testing, implementation, and support.
    • Analyzed the SQL scripts and redesigned them using Spark SQL for faster performance.
    • Worked on Big Data integration and analytics based on Hadoop, Solr, PySpark, Kafka, Storm, and webMethods.
    • Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
    • Processed image data through the Hadoop distributed system using MapReduce, then stored it in HDFS.
    • Used AWS to create storage resources and define resource attributes, such as disk type or redundancy type, at the service level.
    • Created data tables using PyQt to display customer and policy information and to add, delete, and update customer records.
    • Worked on data management disciplines including data integration, modeling, and other areas directly relevant to business intelligence and business analytics development.
    • Developed tools using Python, shell scripting, and XML to automate tasks.
    • Worked on database migration methodologies and integration conversion solutions to convert legacy ETL processes into an Azure Synapse-compatible architecture.
    • Created clusters to classify control and test groups.
    • Developed multiple notebooks using PySpark and Spark SQL in Databricks for data extraction and for analyzing and transforming the data according to business requirements.
    • Built Jenkins jobs for CI/CD infrastructure for GitHub repositories.
    • Automated and monitored AWS infrastructure with Terraform for high availability and reliability, reducing infrastructure management time by 90% and improving system uptime.
    • Integrated Azure Data Factory with Blob Storage to move data through Databricks for processing and then to Azure Data Lake Storage and Azure SQL Data Warehouse.
  • Environment: ER/Studio, Teradata, SSIS, SAS, Excel, T-SQL, SSRS, Tableau, SQL Server, Cognos, pivot tables, graphs, MDM, PL/SQL, ETL, DB2, Oracle, SQL, Informatica PowerCenter, etc.
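Illustrative only: a minimal Apache Airflow DAG sketch of the ETL pattern described above, with extract, transform, and load steps wired through PythonOperator. The DAG id, schedule, and the Oracle load step (stubbed here with a print) are assumptions for the example, not details from the project.

```python
# Minimal sketch: an Airflow 2.x ETL DAG with extract -> transform -> load.
# DAG id, schedule, and the load target are hypothetical; the Oracle insert
# is stubbed out because connection details are not part of this example.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**_):
    # Placeholder: pull rows from a source system (API, file, staging table).
    return [{"id": 1, "value": 42}]

def transform(ti, **_):
    rows = ti.xcom_pull(task_ids="extract")
    return [{**r, "value": r["value"] * 2} for r in rows]

def load(ti, **_):
    rows = ti.xcom_pull(task_ids="transform")
    # In the real job this step would insert into Oracle via a DB connection.
    print(f"Would insert {len(rows)} rows into the target table")

with DAG(
    dag_id="example_etl_to_oracle",      # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```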

Data Analyst

TTK Group
July 2017 - October 2019
  • Description: TTK Group manufactures home appliances and health care products. It produces and markets kitchenware, pharmaceuticals, medical devices, animal products, and food products. Designed, developed, and maintained scalable web applications.
  • Responsibilities:
    • Created logical and physical data models with star and snowflake schema techniques using Erwin in the data warehouse as well as in the data mart.
    • Migrated packages from SQL Server to SSIS and created mappings for the initial load from MS SQL Server 2005 to Netezza while performing data cleansing.
    • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle, Netezza, and Teradata (see the sketch after this section).
    • Performed match/merge and ran match rules to check the effectiveness of the MDM process on data.
    • Involved in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
    • Wrote and executed customized SQL code for ad-hoc reporting duties and used other tools for routine reporting.
    • Extensively used SQL to develop reports from the existing relational data warehouse tables (queries, joins, filters, etc.).
    • Converted SQL Server packages to Informatica PowerCenter mappings for use with Netezza and worked on Informatica MDM processes, including batch-based and real-time processing.
    • Performed ad-hoc queries using SQL, PL/SQL, MS Access, MS Excel, and UNIX to meet business analysts' needs.
    • Reviewed the data model, database physical design, ETL design, and presentation-layer design.
    • Conducted meetings with business and development teams for data validation and end-to-end data mapping.
    • Developed logging for ETL loads at the package and task levels to record the number of records processed by each package and each task using SSIS.
    • Involved in Oracle, SQL, PL/SQL, and T-SQL query programming and in creating objects such as stored procedures, packages, functions, triggers, tables, and views.
  • Environment: Erwin, MDM, ETL, Teradata, MS SQL Server 2005, PL/SQL, Netezza, DB2, Oracle, SSIS, IBM, etc.
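Illustrative only: a minimal sketch of the kind of ad-hoc data-profiling query described above (row counts, null counts, duplicate checks). The real work ran against Oracle, Netezza, and Teradata; sqlite3 is used here purely so the example is self-contained, and the table and column names are hypothetical.

```python
# Minimal sketch: profiling a table for row counts, nulls, and duplicate keys.
# sqlite3 stands in for the actual Oracle/Netezza/Teradata sources; the
# "customers" table and its columns are made up for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, email TEXT);
    INSERT INTO customers VALUES (1, 'a@x.com'), (2, NULL), (2, 'b@x.com');
""")

profile_sql = """
    SELECT COUNT(*)                                        AS total_rows,
           SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)  AS null_emails,
           COUNT(*) - COUNT(DISTINCT customer_id)          AS duplicate_ids
    FROM customers
"""
row = conn.execute(profile_sql).fetchone()
print(dict(zip(["total_rows", "null_emails", "duplicate_ids"], row)))
```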

Education

Governors State University

Master's in Computer Science