Sree Raga Lahari Guttikonda
Development
Pennsylvania, United States
Skills
Data Engineering
About
Sree Raga Lahari's skills align with System Developers and Analysts (Information and Communication Technology). Sree also has skills associated with Database Specialists (Information and Communication Technology). Sree Raga Lahari has 9 years of work experience.
Work Experience
AWS Data Engineer
Anthem
December 2021 - Present
- Responsibilities:
  - Designed and developed highly distributed, fault-tolerant ETL platforms using Spark to meet a variety of business use cases.
  - Conducted thorough analyses of large-scale, semi-structured and structured datasets to inform business analytics and shape ETL strategy.
  - Developed and implemented cloud-based ETL solutions with AWS Glue and Lambda to optimize data transformation procedures.
  - Implemented the regular ETL workloads for eligibility, claims, and pharmacy data as cron-scheduled jobs in AWS Glue and Lambda, delivering outputs as CSV and Parquet files for external stakeholders.
  - Migrated data and code from Teradata Vantage to AWS Redshift.
  - Gained hands-on expertise with Glue, Athena, S3, Lambda, and CloudFormation, among other AWS services, to support diverse data engineering assignments.
  - Used AWS CloudWatch extensively to monitor data pipelines and set up custom metrics, alarms, and dashboards.
  - Managed large-scale data storage with Amazon S3, ensuring data security and adherence to lifecycle management and access control policies.
  - Used Apache Spark for data processing and analytics, combining Spark Streaming and Spark SQL for batch and real-time processing requirements.
  - Deployed and managed Spark-based applications on AWS using the managed data processing environments provided by AWS Glue ETL and Amazon EMR.
  - Designed and optimized data schemas across relational and NoSQL databases, ensuring efficient data architecture for high-performance querying and scalability.
  - Optimized SQL queries to enhance database performance and support complex data modeling requirements, streamlining data operations for diverse applications.
  - Managed source code and version control with GitLab, ensuring effective team collaboration and code integrity.
- Environment: PySpark, Python, AWS services (Glue, Lambda, Secrets Manager, S3, CloudWatch, Redshift), Teradata Vantage, Terraform, SQL, Jenkins, Ansible, GitLab.
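A minimal sketch of the kind of Glue/PySpark batch job described above, assuming hypothetical bucket paths and column names (the real eligibility, claims, and pharmacy schemas are not reproduced here):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap: resolve the job name and wrap the Spark context.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw claims data landed in S3 as CSV (hypothetical bucket and layout).
claims = spark.read.option("header", "true").csv("s3://example-raw-bucket/claims/")

# Basic cleansing before handing the data to downstream consumers.
cleaned = (
    claims.dropDuplicates(["claim_id"])                        # hypothetical key column
    .withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
    .filter(F.col("claim_status").isNotNull())                 # hypothetical status column
)

# Write curated output as partitioned Parquet for Athena/Redshift consumers.
(
    cleaned.write.mode("overwrite")
    .partitionBy("service_date")
    .parquet("s3://example-curated-bucket/claims/")            # hypothetical bucket
)

job.commit()
```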
Azure Data Engineer
September 2019 - November 2021
- Responsibilities:
  - Developed and implemented distributed, fault-tolerant ETL platforms on Azure that handle various business use cases and guarantee high availability of data processing services.
  - Analyzed large-scale semi-structured and structured datasets to support business analytics and create Azure-based ETL plans.
  - Implemented cloud-native ETL solutions with Azure Data Factory and Azure Functions to optimize data integration and transformation procedures.
  - Automated data pipeline workflows, including scheduling jobs in Azure Data Factory and Functions and exporting data in CSV and Parquet formats for business stakeholders.
  - Handled the complex migration of code and data from on-premises databases to Azure Synapse Analytics (formerly SQL Data Warehouse).
  - Gained expertise in Azure data services, including Data Factory, Data Lake Storage, Databricks, and Synapse Analytics, to support diverse data engineering assignments.
  - Monitored data pipelines with Azure Monitor, building custom metrics, alerts, and dashboards to preserve data integrity and pipeline effectiveness.
  - Administered massive data storage on Azure Blob Storage and Data Lake Storage, enforcing data governance guidelines and security best practices.
  - Used Azure Databricks for data processing and analytics, leveraging batch and real-time processing capabilities along with Databricks SQL Analytics connectivity.
  - Deployed and managed Spark-based applications and big data processing environments on Azure with Databricks and HDInsight.
  - Designed and fine-tuned data schemas for Cosmos DB and Azure SQL Database while maximizing scalability and performance.
  - Improved database performance by optimizing intricate data models and SQL queries, ensuring effective data handling for a range of applications.
  - Managed source code and version control with Azure Repos, encouraging teamwork and preserving code consistency and quality.
- Environment: Azure services (Data Factory, Functions, Azure Databricks, Synapse Analytics, Data Lake Storage, Blob Storage, Azure Monitor, Azure SQL Database, Cosmos DB, Azure Repos), Python, PySpark, SQL, Terraform, Jenkins, Azure DevOps.
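A minimal sketch of a Databricks-style PySpark job on Azure in the spirit of the pipelines described above; the storage account, containers, and column names are hypothetical placeholders:

```python
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession is provided; getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Read semi-structured JSON landed in Azure Data Lake Storage Gen2 (hypothetical path).
orders = spark.read.json(
    "abfss://raw@examplestorageacct.dfs.core.windows.net/orders/"
)

# Standardize types and drop incomplete records before loading the analytics layer.
curated = (
    orders.withColumn("order_date", F.to_date("order_ts"))     # hypothetical columns
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropna(subset=["order_id"])
)

# Persist as partitioned Parquet for downstream Synapse / reporting consumption.
(
    curated.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("abfss://curated@examplestorageacct.dfs.core.windows.net/orders/")
)
```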
Big Data Engineer
Bank of America
June 2017 - August 2019
- Responsibilities:
  - Implemented data ingestion procedures with Apache Spark to load various datasets from RDBMS, XML, and CSV sources.
  - Used Spark/PySpark to load datasets into Cassandra and Hive while maintaining data accessibility and integrity.
  - Performed data transformation and cleansing with Apache Spark, incorporating Hive for improved data processing and storage capabilities.
  - Migrated computational code from HQL to PySpark scripts, optimizing for scalability and efficiency in data processing.
  - Built high-performance Spark applications in Scala and Python, working with DataFrames, RDDs, and Datasets.
  - Conducted data consolidation tasks with Spark and Hive, applying ETL processes for data repair, auditing, and filtering.
  - Built a robust real-time data streaming infrastructure by integrating Spark and Apache Kafka to manage the ingestion and processing of data streams.
  - Configured and maintained Kafka clusters, managing consumer groups, partitions, and topics to guarantee smooth data flow.
  - Produced informative data visualizations for stakeholders using tools such as Tableau.
  - Managed HDFS for distributed storage, handling data replication and ensuring robust fault tolerance for data operations.
  - Identified and resolved issues within Spark, Kafka, and Hive environments, focusing on performance tuning and resource optimization.
  - Leveraged PySpark for data extraction, aggregation, and analysis, storing the processed data in Hive for further use.
  - Converted Informatica ETL logic into Spark processes, using Spark SQL and the DataFrames API to perform data transformations in accordance with BI and reporting requirements.
- Environment: Apache Spark, PySpark, HDFS, Apache Hive, Apache Cassandra, Spark Streaming, Apache Kafka, Scala, Python, Tableau, RDBMS (e.g., MySQL, PostgreSQL), NoSQL (e.g., MongoDB, HBase), Linux, Unix, Windows.
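A minimal sketch of the kind of Spark-plus-Kafka streaming flow described above, written with Structured Streaming; it assumes the Spark-Kafka connector is available on the cluster, and the broker addresses, topic name, event schema, and HDFS paths are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Hypothetical schema for the JSON payloads on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Subscribe to a Kafka topic (hypothetical brokers and topic name).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "transactions")
    .load()
)

# Kafka delivers the value as bytes; cast to string and parse the JSON payload.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", event_schema).alias("e"))
    .select("e.*")
)

# Append micro-batches as Parquet on HDFS, feeding external Hive tables.
query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/streams/transactions")            # hypothetical path
    .option("checkpointLocation", "hdfs:///checkpoints/transactions")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```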
Hadoop Developer
Shineteck Software Solutions
May 2014 - June 2016
- Responsibilities:
  - Collaborated with data architects and IT teams to understand enterprise data requirements and translated them into Hadoop development tasks.
  - Designed and built robust Hadoop ecosystems to support scalable data processing and analytics capabilities.
  - Created Hadoop applications using MapReduce, Hive, and Pig to perform data transformation, mapping, and aggregation tasks.
  - Ensured data accuracy and integrity by using Sqoop, Flume, and custom scripts to import data from several sources into HDFS.
  - Managed Hadoop clusters, enhancing HDFS throughput and processing-layer performance through effective resource allocation.
  - Improved Hive query efficiency with data partitioning and bucketing techniques, enabling quicker data retrieval and analysis.
  - Produced HiveQL scripts for ETL, data cleansing, and data analytics; created and maintained Hive tables.
  - Orchestrated complex data workflows with Oozie, ensuring smooth data pipelines and scheduled job automation.
  - Developed Pig Latin scripts for extraction, transformation, and loading (ETL) processes to support data science and machine learning initiatives.
  - Maintained and enhanced Hadoop systems to increase system resilience, data quality, and processing performance.
  - Tracked Hadoop cluster performance and tuned MapReduce jobs to maximize memory and processing power.
  - Collaborated with the DevOps team to facilitate continuous integration and continuous deployment (CI/CD) practices for Hadoop applications.
- Environment: Hadoop MapReduce, HDFS, Hive, Pig, Oozie, Sqoop, Flume, YARN, Apache Ambari, Java, Python, Shell Scripts, Apache Ranger, Apache Knox, MySQL, PostgreSQL, Apache Zeppelin or Hue, Jenkins, Bamboo.
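A minimal sketch of a Hadoop Streaming mapper/reducer pair in Python illustrating the style of MapReduce aggregation described above (the original jobs may equally have been Java MapReduce); the field layout and key are hypothetical:

```python
#!/usr/bin/env python
"""Single script used for both phases; pass "map" or "reduce" on the command line."""
import sys


def mapper():
    # Emit (account_id, amount) pairs from tab-delimited input records.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue                                   # skip malformed records
        account_id, amount = fields[0], fields[2]      # hypothetical field positions
        print(f"{account_id}\t{amount}")


def reducer():
    # Sum amounts per account_id; Hadoop delivers reducer input sorted by key.
    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```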