Yash Shimpi

I'm a graduate student studying Information Management at Syracuse University, NY. Let's start scrolling to learn more about me.


About Me

Hi, I'm Yash. I like making sense of data and communicating meaningful, actionable insights that help solve a challenge. I do that by thoroughly understanding business requirements, asking "what, how, and why" questions, performing exploratory analysis, building predictive models, and presenting visually pleasing, insightful answers that are comprehensible to everyone! Making sense of data requires me to constantly study, understand, and apply the languages, databases, and tools needed to give the best results.

Throughout my experience, I have demonstrated expertise in data analysis, data visualization, and machine learning. As a Research Assistant at Syracuse University, I played a key role in developing an ETL pipeline to collect and analyze millions of tweets, providing valuable insights into changes in people's beliefs. Working with fellow researchers, I provided statistical and visual insights through Tableau dashboards and the NLTK library, supporting efforts to alleviate the struggles of female refugees and children.

As a Data Science Intern at RSG Media, I developed a high-accuracy Matching Pipeline to unify data from multiple movie tables into a comprehensive database. Using Databricks and PySpark, I engineered an efficient pipeline for transforming and loading massive amounts of metadata. I also automated processes by restructuring workflows and converting SQL queries to PySpark, enabling seamless integration with machine learning models.

As a Data Analyst Intern at Syvylyze Analytics, I profiled store information using Python and the Google Maps API, gathering additional metadata crucial for analyzing retail performance. Working closely with cross-functional teams, I enhanced the retail store Data Master and communicated data reporting effectively to non-technical stakeholders.


Contact Details

Yash Shimpi
315-603-8308
yshimpi@syr.edu

Career

Data Analyst

Reorg Research Inc. Jul 2023 - Present

I'm currently developing over 10 Python scripts to automate daily validation and API-related tasks, reducing manual workload by more than 75%. I'm also streamlining the procedure for storing over 1,000 rows of data and running financial coverage checks by comparing them against an existing dataset of over 100,000 records. Lastly, I support 2 teams by creating ad hoc queries for quick insights and by integrating a new API to fuel system expansion and growth.

Research Assistant

Syracuse University Sep 2021 - Jul 2023

I built an ETL pipeline to collect 2.5M+ tweets using the Twitter API to understand changes in people's beliefs. I extracted 800+ content items using Selenium and Python for web scraping, and transformed them for storage in MongoDB to support social network analysis between users. I designed a process for classifying an Event using the Wikipedia API, the NLTK library, and a Support Vector Machine (SVM) model, achieving an accuracy score of 90%. I assisted 5 Fellows by providing statistical insights and 20+ visual and text insights using Tableau dashboards and the NLTK library, supporting research on reducing the struggles of female refugees and their children.

Data Science Intern

RSG Media May 2022 - Aug 2022

I developed an 8-layered Matching Pipeline with a match accuracy of 97% to match 2 movie tables and form a unified data source with metadata from both tables. I engineered a production-ready pipeline for creating a database using Databricks and PySpark, transforming and loading 20M+ metadata records across 15 tables from the TMDB API into an AWS S3 bucket. I restructured a workflow by converting 10+ SQL queries to PySpark to automate sending metadata and metrics as input to machine learning models.

Data Analyst Intern

Syvylyze Analytics Sep 2020 - May 2020

I profiled 900+ stores using Python and the Google Maps API to obtain 8 additional metadata fields for better analysis of retail transactional performance. I collaborated closely with 2 delivery teams to augment the existing Data Master for retail stores within the enterprise data warehouses and communicated the data reporting to non-technical people. I executed and clearly documented 3 projects that were integral to providing geolocation data for a real-time retail analytics client project. I conducted log analysis on more than 1,000 log files for a Data Quality Dashboard using the ELK stack.

Education

Syracuse University

Master of Science in Information Management (M.S.I.M.) Aug 2021 - May 2023 GPA: 4.0

Coursework: Data Science, Database Management, Data Analysis and Decision Making

University of Mumbai

Bachelor of Engineering in Computer Engineering Jul 2016 - Oct 2020 GPA: 8.9

Relevant Coursework: Communication Skills, Machine Learning, Data Warehousing

Skills

Loving data means you have to know how to manipulate it, how to store it, how to visualize it, and how to make sense of it (no tool for this!).

Programming Languages
  • Python
  • R programming
  • PySpark

Databases
  • SQL (MySQL, PostgreSQL, MSSQL)
  • NoSQL (MongoDB)
  • Microsoft Azure

Tools
  • Tableau
  • Power BI
  • Excel
  • A/B Testing and Hypothesis Testing
  • Databricks

Projects

Say Hello

Have a new project in mind? Let's collaborate and build something awesome. Let's turn that idea into an even greater product :)