
Praneeth Poreddy

Development
Texas, United States

Skills

Data Engineering

About

Praneeth Poreddy's skills align with System Developers and Analysts (Information and Communication Technology), and also with Database Specialists (Information and Communication Technology). Praneeth appears to be a low-to-mid-level candidate with 5 years of experience.

Work Experience

Data Engineer

Acko
August 2020 - November 2021
  • Developed batch-processing applications requiring functional pipelining using Spark APIs.
  • Built data pipelines and performed analytics using the AWS stack (EMR, EC2, S3, RDS, Lambda, Glue, Redshift).
  • Collaborated with the client team to transform data and integrate algorithms and models into automated processes.
  • Used Spark's in-memory capabilities to handle large datasets on the S3 data lake; loaded data into S3 buckets, then filtered it and loaded it into Hive external tables.
  • Created and modified SQL stored procedures, functions, views, indexes, and triggers.
  • Performed ETL operations on terabytes of data using Python, Spark SQL, S3, and Redshift to obtain customer insights (a PySpark sketch of this pattern appears below).
  • Built robust data pipelines and dynamic systems in Python.
  • Worked with AWS services including S3, EC2, IAM, and RDS, and with orchestration tools such as AWS Step Functions, Data Pipeline, and Glue.
  • Integrated data from a variety of sources, ensuring adherence to data-quality and accessibility standards.
  • Built data transformation and processing solutions, large-scale search applications, and high-volume data pipelines.
  • Wrote ETL (extract/transform/load) processes, designed database systems, and developed tools for real-time and offline analytic processing.
  • Applied knowledge of Hadoop architecture and HDFS commands, designing and optimizing queries to build data pipelines.
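
As context for the ETL work above, here is a minimal PySpark sketch of the S3-to-Redshift pattern described; the bucket names, paths, and column names are hypothetical placeholders, not details from the original role.

    # Minimal PySpark ETL sketch: read raw events from S3, filter and aggregate
    # in memory, then write curated output back to S3 (from which a Redshift
    # COPY would load it). All names below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("customer-insights-etl").getOrCreate()

    # Read raw order events from the S3 data lake (Parquet assumed).
    orders = spark.read.parquet("s3a://example-data-lake/raw/orders/")

    # Filter out cancelled orders and aggregate spend per customer.
    insights = (
        orders
        .filter(F.col("status") != "CANCELLED")
        .groupBy("customer_id")
        .agg(
            F.sum("order_total").alias("total_spend"),
            F.countDistinct("order_id").alias("order_count"),
        )
    )

    # Write the curated dataset; Redshift can ingest this prefix via COPY.
    insights.write.mode("overwrite").parquet(
        "s3a://example-data-lake/curated/customer_insights/")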

Data Engineer

HTC Global
March 2018 - July 2020
  • Used Spark's in-memory capabilities to handle large datasets on the S3 data lake; loaded data into S3 buckets, then filtered it and loaded it into Hive external tables.
  • Created and modified SQL stored procedures, functions, views, indexes, and triggers.
  • Performed ETL operations on terabytes of data using Python, Spark SQL, S3, and Redshift to obtain customer insights.
  • Heavily involved in setting up the CI/CD pipeline using Jenkins, Terraform, and AWS.
  • Performed end-to-end architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
  • Worked with AWS services including S3, EC2, IAM, and RDS, and with orchestration tools such as AWS Step Functions, Data Pipeline, and Glue.
  • Transformed data using AWS Glue dynamic frames with PySpark; cataloged the transformed data with crawlers and scheduled the job and crawler using the Glue workflow feature.
  • Designed and managed public/private cloud infrastructure on Amazon Web Services (AWS), including EC2, S3, CloudFront, Elastic File System, and IAM, enabling automated operations; deployed CloudFront to deliver content and reduce load on the servers.
  • Created IAM policies for delegated administration within AWS and configured IAM users and roles.

Projects

Sentiment Analysis for Product Reviews
  • Developed Python code for sentiment analysis of product reviews on an e-commerce website, recommending products to users with similar interests based on ratings (a sketch of this approach follows below).
  • Cleaned and analyzed the data using the Pandas, Matplotlib, NumPy, Math, and Seaborn libraries.

Online Social Networking Platform - DBMS
  • Created a backend for an online social networking portal, sending data from CSV files to a database using a Python script.
  • Designed a relational database for the portal in SQL Server, incorporating strong and weak entities with non-key attributes and surrogate keys.

Data Integration Using Azure Services
  • Used the Twitter API to extract data such as tweets, user profiles, and hashtags, retrieving it in JSON format.
  • Implemented data transformation tasks in Azure Data Factory (ADF) to process the raw Twitter data, using ADF's data flow capabilities to cleanse, filter, and enrich the extracted tweets and pull out essential information such as tweet text, user location, timestamps, and user mentions.
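
As a rough illustration of the ratings-based recommendation project above, the following Pandas/NumPy sketch cleans a reviews table and scores user similarity with cosine similarity; the file name and column names are hypothetical assumptions, not artifacts from the actual project.

    # Sketch: clean product reviews with Pandas, pivot to a user x product
    # ratings matrix, and find each user's most similar peer via cosine
    # similarity. "reviews.csv" and its columns are assumed placeholders.
    import numpy as np
    import pandas as pd

    reviews = pd.read_csv("reviews.csv")  # columns: user_id, product_id, rating

    # Basic cleaning: drop missing ratings and clip to the 1-5 scale.
    reviews = reviews.dropna(subset=["rating"])
    reviews["rating"] = reviews["rating"].clip(1, 5)

    # User x product matrix; unrated products become 0.
    matrix = reviews.pivot_table(index="user_id", columns="product_id",
                                 values="rating", fill_value=0)

    # Cosine similarity between users.
    vals = matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(vals, axis=1, keepdims=True)
    sims = (vals @ vals.T) / (norms @ norms.T)

    # Most similar other user for each user (self-similarity excluded).
    np.fill_diagonal(sims, -1.0)
    nearest_peer = pd.Series(matrix.index[sims.argmax(axis=1)], index=matrix.index)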

Data Engineer

HCA
December 2022 - Present
  • Worked with Azure cloud platforms (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, SQL DWH).
  • Performed data cleansing and applied transformations using Databricks and Spark data analysis.
  • Designed and automated custom-built input adapters using Spark, Sqoop, and Oozie to ingest and analyze data from RDBMSs into Azure Data Lake.
  • Developed automated workflows for daily incremental loads, moving data from traditional RDBMSs to data lakes (a Databricks-style PySpark sketch of this pattern appears below).
  • Worked on Azure Synapse Analytics, a service that brings together enterprise data warehousing and big-data analytics.
  • Created database objects such as tables, views, stored procedures, triggers, packages, and functions using T-SQL to provide efficient data management and structure.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Created pipelines in Azure Data Factory to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
  • Developed JSON scripts for deploying Azure Data Factory (ADF) pipelines that process data using SQL activities.
  • Developed SQL scripts for automation purposes.
  • Worked with data ingestion, Airflow operators for data orchestration, and related Python libraries.
  • Analyzed SQL scripts and designed solutions to implement them using PySpark; developed ETL processes with Spark.
  • Used Databricks notebooks extensively for interactive analytics with Spark APIs.
  • Built an enterprise data lake using Data Factory and Blob Storage, enabling other teams to work on more complex scenarios and ML solutions.
  • Used Azure Data Factory with the SQL and Mongo APIs to integrate data from MongoDB, MS SQL, and the cloud (Blob Storage, Azure SQL DB).
  • Prepared data for interactive Power BI dashboards and reporting.
  • Handled continuous integration and continuous deployment (CI/CD) of applications into the Azure cloud; created Jenkins jobs including Maven, freestyle, external, pipeline, and multi-configuration jobs.
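
To make the incremental-load pattern above concrete, here is a minimal Databricks-style PySpark sketch; the table name, storage path, watermark value, and column names are hypothetical assumptions rather than details from the role.

    # Sketch of a daily incremental load: select only rows changed since the
    # last successful run and append them to the curated zone of the data
    # lake. The table, columns, and ABFS path are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-incremental-load").getOrCreate()

    # High-water mark recorded by the previous successful run.
    last_loaded = "2024-01-01 00:00:00"

    # Equivalent of: SELECT * FROM staging.orders WHERE updated_at > :last_loaded
    incremental = (
        spark.read.table("staging.orders")
        .filter(F.col("updated_at") > F.lit(last_loaded))
        .withColumn("updated_date", F.to_date("updated_at"))
    )

    # Append the new slice, partitioned by date, to Azure Data Lake Storage.
    (incremental.write
        .mode("append")
        .partitionBy("updated_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/orders/"))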

Education

University of North Texas

Master's in Computer Science
January 2022 - May 2023