
Sohail Anjum

Development
Punjab, Pakistan

Skills

Data Engineering
Apache Spark
AWS (Amazon Web Services)
Azure Databricks
PostgreSQL
MySQL
Microsoft SQL Server
Hadoop

About

Sohail Anjum's skills align with Programmers (Information and Communication Technology). He also has skills associated with Database Specialists (Information and Communication Technology). Sohail Anjum has 12 years of work experience.

Accomplishments

Professional Summary
• Overall, 11 years of IT experience across Big Data, ETL/ELT, SQL, Python, Scala, C#, and Java. Interested and passionate about working in Big Data environments and data analytics.
• Expertise in AWS services such as AWS Glue, AWS Lambda, Amazon RDS, Amazon Redshift, Amazon DynamoDB, Amazon S3, AWS Data Pipeline, AWS CodeCommit, Amazon CloudWatch, AWS IAM, AWS IoT Analytics, etc.
• Designed and developed ETL/ELT data pipelines with AWS Glue and Lambda.
• Utilized the AWS Glue Data Catalog to maintain a centralized metadata repository, enabling efficient data discovery and cataloging.
• Implemented and optimized complex data transformations on unstructured, semi-structured, and structured data using AWS Glue and PySpark data frames.
• Developed crawlers in AWS Glue to automate the discovery and cataloging of metadata from diverse data sources.
• Integrated AWS Glue with other AWS services, including Amazon S3, RDS, and Redshift, to create scalable and flexible data architectures.
• Strong analytical and data aggregation skills using PySpark, advanced SQL, DuckDB, and Pandas.
• Developed data pipelines using Databricks Delta, a data lakehouse platform built on Apache Spark that provides ACID transactions and scalable analytics.
• Performed data ingestion from different data sources, such as S3 bulk files and databases, using PySpark/JDBC.
• Hands-on experience with the Glue Catalog, Athena, Hadoop, and Hive tables.
• Configured automatic workload management in Amazon Redshift.
• Implemented PySpark streaming to pick up data from Kafka topics and feed it into the data pipeline.
• Designed and implemented databases, tables, relationships, and stored procedures.
• Developed custom scripts and applications using the AWS Data Wrangler API to automate data ingestion, cleaning, and transformation tasks, resulting in faster data processing times and improved data quality.
• Designed interactive dashboards and reports in Microsoft Power BI, enhancing data-driven insights.
• Implemented DAX measures and relationships for custom KPIs to track key business metrics.
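A minimal sketch of the Glue/PySpark ETL pattern described above; the database, table, and bucket names are hypothetical placeholders, not actual project assets:

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    # Standard Glue job boilerplate
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read a table registered in the Glue Data Catalog (names are illustrative)
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="raw_orders")

    # Convert to a Spark DataFrame and apply transformations
    df = dyf.toDF()
    clean = (df.dropDuplicates(["order_id"])
               .withColumn("order_date", F.to_date("order_date"))
               .filter(F.col("amount") > 0))

    # Write the result back to S3 as partitioned Parquet
    (clean.write.mode("overwrite")
          .partitionBy("order_date")
          .parquet("s3://example-bucket/curated/orders/"))

    job.commit()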

Work Experience

Software Engineer

Trees Technologies
April 2013 - March 2024
  Key Responsibilities:
  • Coordinated with the Technical Lead on current programming tasks.
  • Collaborated with other programmers to design and implement features.
  • Quickly produced well-organized, optimized, and documented source code.
  • Created and documented software tools required by artists or other developers.
  • Debugged existing source code and polished feature sets.
  • Contributed to technical design documentation.
  • Worked independently when required.
  Tools / Environment: C#.NET, ASP.NET, Web

Senior Data Engineer

S&P Global Market Intelligence
February 2016 - December 2021
  Key Responsibilities:
  • Experienced in designing and deploying data pipelines and analytics solutions using Databricks on AWS.
  • Developed ETL workflows using Databricks and AWS services such as Glue and Lambda.
  • Expertise in performance tuning and optimization of PySpark-based applications and clusters on AWS using Databricks, the Spark UI, and CloudWatch metrics.
  • Implemented complex business transformations using PySpark and stored the results in Delta/Parquet format, performed
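A small sketch of the Delta/Parquet write pattern referenced above, assuming a Spark environment with Delta Lake available; bucket paths and column names are illustrative:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("delta-transform").getOrCreate()

    # Read raw records from S3 (bucket and layout are illustrative)
    raw = spark.read.parquet("s3://example-bucket/raw/trades/")

    # Apply a business transformation
    curated = (raw.filter(F.col("status") == "settled")
                  .withColumn("trade_ts", F.to_timestamp("trade_ts")))

    # Persist the result as a Delta table (use format("parquet") for plain Parquet)
    (curated.write.format("delta")
            .mode("overwrite")
            .save("s3://example-bucket/curated/trades/"))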

Software Engineer

F3 Technologies/Health Care Technologies
March 2014 - January 2016
  Key Responsibilities:
  • Analyzed and implemented best coding practices in the project code.
  • Full knowledge of how to design, program, implement, and maintain.
  • Identified and developed areas for revision in current projects.
  • Executed and implemented software tests.
  • Developed quality assurance procedures for software projects.
  • Coordinated efforts and cooperated with other developers, designers, system and business analysts, etc.
  • Broke down bigger tasks into smaller, easily

Senior Data Engineer

DPL
December 2021 - Present
  Key Responsibilities:
  • Built and programmed scalable data pipelines using AWS Glue and Lambda functions to process large volumes of data from various sources.
  • Utilized AWS Glue for seamless data integration, transformation, and loading into data lakes, databases, and warehouses.
  • Designed and implemented multiple databases using Amazon RDS to support the company's web and mobile applications and IoT data.
  • Implemented data processing pipelines using PySpark, a Python library for Apache Spark, to handle large-scale data transformations efficiently.
  • Utilized AWS Glue for ETL processes, allowing seamless data integration, transformation, and loading into data lakes and warehouses.
  • Designed and optimized data workflows to ensure scalability and performance in a cloud environment.
  • Collaborated with cross-functional teams to understand data requirements and implemented solutions for complex data processing tasks.
  • Improved database performance by implementing optimization techniques such as indexing, query optimization, and database tuning.
  • Hands-on experience with NoSQL databases using Amazon DynamoDB, utilizing features such as partitions, global secondary indexes, and streams.
  • Designed and maintained serverless ETL workflows using AWS Step Functions and AWS Lambda to transform data in the data lake into consumable formats for analytics and reporting.
  • Designed and implemented real-time data streaming solutions using Amazon Kinesis and AWS Lambda to process and analyze streaming data.
  • Designed interactive dashboards and reports in Microsoft Power BI, enhancing data-driven insights.
  • Implemented DAX measures and relationships for custom KPIs to track key business metrics.
  • Implemented security best practices for AWS services using AWS Identity and Access Management (IAM) and VPC security groups.
  • Experienced in using the Nexus Scrum framework to scale Agile Scrum practices across multiple teams, resulting in improved alignment, coordination, and delivery of complex software products.
  • Expertise in Jira for project management, with experience in creating and managing tasks, user stories, bugs, and epics, resulting in increased visibility and efficiency in software development workflows.
  Tools / Environment: AWS services (Glue, Lambda, Step Functions, Data Pipeline, IoT Core, IoT Analytics, RDS for MySQL, CodeCommit), Microsoft Power BI, PySpark, Pandas, NumPy, VPC, subnets, Nexus Scrum, Jira, etc.
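A minimal sketch of an AWS Lambda handler consuming records from an Amazon Kinesis stream, as in the real-time streaming solution described above; treating the payload as JSON is an assumption:

    import base64
    import json

    def lambda_handler(event, context):
        """Decode Kinesis records and hand the parsed payloads to downstream logic."""
        processed = 0
        for record in event["Records"]:
            # Kinesis delivers each payload base64-encoded
            payload = base64.b64decode(record["kinesis"]["data"])
            message = json.loads(payload)
            # Downstream processing (e.g., enrichment, forwarding) would go here
            print(message)
            processed += 1
        return {"processed": processed}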

Education

University of Arid Agriculture

BS(CS)
January 2008 - January 2012