Hi, I'm Yash. I like making sense of data to communicate meaningful, actionable insights that solve real challenges. I do that by building a thorough understanding of business requirements, asking "what, how, and why" questions, performing exploratory analysis, building predictive models, and presenting visually pleasing, insightful answers that everyone can understand. Making sense of data also means constantly studying, understanding, and applying the languages, databases, and tools needed to deliver the best results.

Throughout my experience, I have demonstrated expertise in data analysis, data visualization, and machine learning. As a Research Assistant at Syracuse University, I played a key role in developing an ETL pipeline to collect and analyze millions of tweets, providing valuable insight into changes in people's beliefs. Collaborating with fellow researchers, I provided statistical and visual insights through Tableau dashboards and the NLTK library, supporting efforts to alleviate the struggles of female refugees and their children.

During my tenure as a Data Science Intern at RSG Media, I developed a high-accuracy matching pipeline to unify data from multiple movie tables into a comprehensive database. Leveraging tools such as Databricks and PySpark, I engineered an efficient pipeline for transforming and loading massive amounts of metadata, and I automated processes by restructuring workflows and converting SQL queries to PySpark, enabling seamless integration with machine learning models.

As a Data Analyst Intern at Syvylyze Analytics, I profiled store information using Python and the Google Maps API, gathering additional metadata crucial for analyzing retail performance. Working closely with cross-functional teams, I enhanced the retail store Data Master and communicated data reporting effectively to non-technical stakeholders.
Yash Shimpi
315-603-8308
yshimpi@syr.edu
I'm currently developing over 10 Python scripts that automate daily validation and API-related tasks, reducing manual workload by more than 75%. I'm also streamlining the process of storing over 1,000 rows of data and running financial coverage checks against an existing dataset of over 100,000 records. Lastly, I'm supporting 2 teams by writing ad hoc queries for quick insights and by integrating a new API to fuel system expansion and growth.
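A minimal sketch of the kind of coverage check described above, using pandas; the file names and the key column are illustrative assumptions, not the actual schema:

```python
import pandas as pd

# Hypothetical file names and key column -- illustrative only.
existing = pd.read_csv("existing_records.csv")   # ~100k reference records
incoming = pd.read_csv("daily_batch.csv")        # ~1k new rows to validate

# Flag incoming rows whose key is missing from the reference dataset.
missing = incoming[~incoming["record_id"].isin(existing["record_id"])]

coverage = 1 - len(missing) / len(incoming)
print(f"Coverage: {coverage:.1%}; {len(missing)} rows lack a match")
```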
I built an ETL pipeline to collect 2.5M+ tweets using the Twitter API to understand changes in people's beliefs. I extracted 800+ content items using web scraping tools (Selenium and Python) and transformed the data for storage in MongoDB to support social network analysis between users. I designed a process for classifying events using the Wikipedia API, the NLTK library, and a Support Vector Machine (SVM) model, reaching an accuracy score of 90%. I assisted 5 Fellows by providing statistical and 20+ visual and text insights using Tableau dashboards and the NLTK library, supporting research on reducing the struggles of female refugees and their children.
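A minimal sketch of the event-classification step, as a scikit-learn pipeline over TF-IDF features; the training texts and labels here are toy assumptions, and the real process also drew on the Wikipedia API and NLTK:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data -- the real labels came from annotated tweet content.
texts = ["earthquake hits coastal city", "team wins championship final",
         "new vaccine trial announced", "striker transfers to rival club"]
labels = ["disaster", "sports", "health", "sports"]

# TF-IDF features feeding a linear SVM classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

print(model.predict(["flood warning issued downtown"]))  # e.g. ['disaster']
```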
I developed an 8-layered matching pipeline with a match accuracy of 97% to join 2 movie tables into a unified data source carrying metadata from both. I engineered a production-ready pipeline using Databricks and PySpark to transform and load 20M+ metadata records across 15 tables from the TMDB API to an AWS S3 bucket. I restructured a workflow by converting 10+ SQL queries to PySpark, automating the process of sending metadata and metrics as input to machine learning models.
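A minimal sketch of the SQL-to-PySpark conversion pattern; the table, column, and bucket names are illustrative assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tmdb-metadata").getOrCreate()

# Original-style SQL:
#   SELECT genre, AVG(rating) AS avg_rating
#   FROM movies WHERE release_year >= 2000
#   GROUP BY genre
movies = spark.read.parquet("s3://example-bucket/tmdb/movies/")  # assumed path

avg_by_genre = (movies
                .filter(F.col("release_year") >= 2000)
                .groupBy("genre")
                .agg(F.avg("rating").alias("avg_rating")))

avg_by_genre.write.mode("overwrite").parquet("s3://example-bucket/tmdb/metrics/")
```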
I profiled 900+ stores using Python and the Google Maps API, gathering 8 additional metadata fields to better analyze retail transactional performance. I collaborated closely with 2 delivery teams to augment the existing Data Master for retail stores within the enterprise data warehouses and to communicate data reporting to non-technical audiences. I wrote and presented clear documentation for 3 projects that were integral to providing geolocation data for a real-time retail analytics client project. I conducted log analysis on more than 1,000 log files for a Data Quality Dashboard using the ELK stack.
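A minimal sketch of the store-profiling step against the Google Maps Geocoding API; the API key and address are placeholders, and the real project collected 8 metadata fields per store:

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_API_KEY"  # placeholder

def profile_store(address: str) -> dict:
    """Return basic geolocation metadata for one store address."""
    resp = requests.get(GEOCODE_URL, params={"address": address, "key": API_KEY})
    resp.raise_for_status()
    result = resp.json()["results"][0]
    loc = result["geometry"]["location"]
    return {
        "formatted_address": result["formatted_address"],
        "lat": loc["lat"],
        "lng": loc["lng"],
        "place_id": result["place_id"],
    }

print(profile_store("900 S Crouse Ave, Syracuse, NY"))
```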
Coursework: Data Science, Database Management, Data Analysis and Decision Making
Relevant Coursework: Communication Skills, Machine Learning, Data Warehousing
Loving data means knowing how to manipulate it, how to store it, how to visualize it, and how to make sense of it (no tool for that last one!).
Collected 700+ news titles and links using the Reddit API to understand which types of news and which news sources get the most attention, and why. Created visualizations to support the findings. Implemented sentiment analysis using the NLTK library to find the sentiment of top news.
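A minimal sketch of the collection and sentiment steps, using PRAW for the Reddit API and NLTK's VADER analyzer; the credentials are placeholders and the subreddit choice is an assumption:

```python
import praw
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

# Placeholder credentials -- register an app at reddit.com/prefs/apps.
reddit = praw.Reddit(client_id="CLIENT_ID", client_secret="CLIENT_SECRET",
                     user_agent="news-sentiment-demo")

sia = SentimentIntensityAnalyzer()
for post in reddit.subreddit("news").top(limit=10):
    score = sia.polarity_scores(post.title)["compound"]
    print(f"{score:+.2f}  {post.title[:60]}  ({post.url})")
```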
Collected a 1.2M-song dataset from Kaggle and performed data cleaning. Integrated the Spotify API to retrieve songs and key metrics that were not present in the dataset. Implemented cosine similarity from the sklearn module to compare and find songs similar to the user's input.
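A minimal sketch of the similarity lookup with scikit-learn's cosine_similarity; the feature matrix is an illustrative stand-in for the Spotify audio metrics:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy feature matrix: one row per song (e.g. danceability, energy, tempo).
songs = ["Song A", "Song B", "Song C"]
features = np.array([[0.8, 0.6, 0.55],
                     [0.7, 0.5, 0.50],
                     [0.2, 0.9, 0.95]])

query = features[0].reshape(1, -1)            # the user's chosen song
scores = cosine_similarity(query, features)[0]

# Rank the remaining songs by similarity to the query.
for idx in scores.argsort()[::-1][1:]:
    print(f"{songs[idx]}: {scores[idx]:.3f}")
```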
Performed exploratory data analysis on 4.5k rows of football transfer data collected from Kaggle. The analysis used the matplotlib and seaborn visualization modules. Made over 20 visualizations using 8 different chart types.
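A minimal sketch of one such chart in seaborn; the DataFrame columns are illustrative assumptions rather than the actual Kaggle schema:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Toy transfer data -- the real dataset had ~4.5k rows.
df = pd.DataFrame({
    "position": ["FW", "MF", "DF", "FW", "GK", "MF"],
    "fee_millions": [80.0, 45.5, 30.0, 120.0, 12.0, 60.0],
})

# Bar chart of the mean transfer fee per position.
sns.barplot(data=df, x="position", y="fee_millions")
plt.title("Average transfer fee by position")
plt.ylabel("Fee (million EUR)")
plt.show()
```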
Analyzed a dataset containing 40k rows to understand the key drivers behind hotel reservation cancellations. Used descriptive and exploratory analysis in R, then compared machine learning algorithms such as linear regression, SVM, and decision trees to find an appropriate model for answering the question "When will a user cancel a booking?". Based on the analysis, recommendations were suggested.
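A minimal sketch of the model-comparison idea, shown in Python with scikit-learn for consistency with the other examples (the original analysis was in R); the features are synthetic stand-ins, and logistic regression substitutes for linear regression on the binary label:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in features: lead_time (days), previous_cancellations.
rng = np.random.default_rng(0)
X = rng.random((200, 2)) * [365, 5]
y = (X[:, 0] > 180).astype(int)  # assumed rule: long lead times cancel more

# Compare candidate models by 5-fold cross-validated accuracy.
for name, model in [("logistic", LogisticRegression()),
                    ("svm", SVC()),
                    ("tree", DecisionTreeClassifier(max_depth=3))]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f} mean CV accuracy")
```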
Designed and implemented a database management system for handling a company's meeting room booking system. Used MSSQL to create the tables, views, stored procedures, and triggers for working with the database. Used Microsoft Azure to host the database and Microsoft PowerApps to showcase the application's different layouts.
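A minimal sketch of connecting to such an Azure-hosted MSSQL database from Python via pyodbc and creating one of the booking tables; the server, credentials, and table layout are all hypothetical placeholders:

```python
import pyodbc

# Placeholder Azure SQL connection details -- illustrative only.
conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};"
                      "SERVER=example.database.windows.net;DATABASE=bookings;"
                      "UID=user;PWD=password")
cur = conn.cursor()

# One assumed table from the booking schema.
cur.execute("""
    CREATE TABLE MeetingRoomBooking (
        booking_id INT IDENTITY PRIMARY KEY,
        room_id    INT NOT NULL,
        booked_by  NVARCHAR(100) NOT NULL,
        starts_at  DATETIME2 NOT NULL,
        ends_at    DATETIME2 NOT NULL
    )
""")
conn.commit()
```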