Hi, I'm Yash. I like making sense of data to communicate meaningful, actionable insights that solve real challenges. I do that by building a thorough understanding of business requirements, asking "what, how, and why" questions, performing exploratory analysis, building predictive models, and presenting visually pleasing, insightful answers that everyone can understand. Making sense of data also means constantly studying, understanding, and applying the languages, databases, and tools needed to deliver the best results.

Throughout my experience, I have demonstrated expertise in data analysis, data visualization, and machine learning. As a Research Assistant at Syracuse University, I played a key role in developing an ETL pipeline to collect and analyze millions of tweets, providing valuable insight into changes in people's beliefs. Collaborating with fellow researchers, I provided statistical and visual insights through Tableau dashboards and the NLTK library, supporting efforts to alleviate the struggles of female refugees and their children.

During my tenure as a Data Science Intern at RSG Media, I developed a high-accuracy matching pipeline to unify data from multiple movie tables into a comprehensive database. Leveraging tools such as Databricks and PySpark, I engineered an efficient pipeline for transforming and loading massive amounts of metadata, and I automated processes by restructuring workflows and converting SQL queries to PySpark, enabling seamless integration with machine learning models.

As a Data Analyst Intern at Syvylyze Analytics, I profiled store information using Python and the Google Maps API, gathering additional metadata crucial for analyzing retail performance. Working closely with cross-functional teams, I enhanced the retail store Data Master and communicated data reporting effectively to non-technical stakeholders.
Yash Shimpi
315-603-8308
yshimpi@syr.edu
I'm currently developing over 10 Python scripts that automate daily validation and API-related tasks, reducing manual workload by more than 75%. I'm also streamlining the process of storing over 1,000 rows of data and running financial coverage checks against an existing dataset of over 100,000 records. Lastly, I'm supporting 2 teams by writing ad hoc queries for quick insights and by integrating a new API to fuel system expansion and growth.
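A minimal sketch of the kind of coverage check described above, using pandas; the file names and the key column are illustrative assumptions, not the actual schema:

```python
import pandas as pd

# Hypothetical file names and key column -- illustrative only.
existing = pd.read_csv("existing_records.csv")   # ~100k reference records
incoming = pd.read_csv("daily_batch.csv")        # ~1k new rows to validate

# Flag incoming rows whose key is missing from the reference dataset.
missing = incoming[~incoming["record_id"].isin(existing["record_id"])]

coverage = 1 - len(missing) / len(incoming)
print(f"Coverage: {coverage:.1%}; {len(missing)} rows lack a match")
```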
I built an ETL pipeline to collect 2.5M+ tweets using the Twitter API to understand changes in people's beliefs. I extracted 800+ content items using web scraping tools (Selenium and Python) and transformed the data for storage in MongoDB to support social network analysis between users. I designed a process for classifying events using the Wikipedia API, the NLTK library, and a Support Vector Machine (SVM) model, reaching an accuracy score of 90%. I assisted 5 Fellows by providing statistical and 20+ visual and text insights using Tableau dashboards and the NLTK library, supporting research on reducing the struggles of female refugees and their children.
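A minimal sketch of the event-classification step, as a scikit-learn pipeline over TF-IDF features; the training texts and labels here are toy assumptions, and the real process also drew on the Wikipedia API and NLTK:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data -- the real labels came from annotated tweet content.
texts = ["earthquake hits coastal city", "team wins championship final",
         "new vaccine trial announced", "striker transfers to rival club"]
labels = ["disaster", "sports", "health", "sports"]

# TF-IDF features feeding a linear SVM classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

print(model.predict(["flood warning issued downtown"]))  # e.g. ['disaster']
```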
I developed an 8-layered matching pipeline with a match accuracy of 97% to join 2 movie tables into a unified data source carrying metadata from both. I engineered a production-ready pipeline using Databricks and PySpark to transform and load 20M+ metadata records across 15 tables from the TMDB API to an AWS S3 bucket. I restructured a workflow by converting 10+ SQL queries to PySpark, automating the process of sending metadata and metrics as input to machine learning models.
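A minimal sketch of the SQL-to-PySpark conversion pattern; the table, column, and bucket names are illustrative assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tmdb-metadata").getOrCreate()

# Original-style SQL:
#   SELECT genre, AVG(rating) AS avg_rating
#   FROM movies WHERE release_year >= 2000
#   GROUP BY genre
movies = spark.read.parquet("s3://example-bucket/tmdb/movies/")  # assumed path

avg_by_genre = (movies
                .filter(F.col("release_year") >= 2000)
                .groupBy("genre")
                .agg(F.avg("rating").alias("avg_rating")))

avg_by_genre.write.mode("overwrite").parquet("s3://example-bucket/tmdb/metrics/")
```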
I profiled 900+ stores using Python and the Google Maps API, gathering 8 additional metadata fields to better analyze retail transactional performance. I collaborated closely with 2 delivery teams to augment the existing Data Master for retail stores within the enterprise data warehouses and to communicate data reporting to non-technical audiences. I wrote and presented clear documentation for 3 projects that were integral to providing geolocation data for a real-time retail analytics client project. I conducted log analysis on more than 1,000 log files for a Data Quality Dashboard using the ELK stack.
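A minimal sketch of the store-profiling step against the Google Maps Geocoding API; the API key and address are placeholders, and the real project collected 8 metadata fields per store:

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_API_KEY"  # placeholder

def profile_store(address: str) -> dict:
    """Return basic geolocation metadata for one store address."""
    resp = requests.get(GEOCODE_URL, params={"address": address, "key": API_KEY})
    resp.raise_for_status()
    result = resp.json()["results"][0]
    loc = result["geometry"]["location"]
    return {
        "formatted_address": result["formatted_address"],
        "lat": loc["lat"],
        "lng": loc["lng"],
        "place_id": result["place_id"],
    }

print(profile_store("900 S Crouse Ave, Syracuse, NY"))
```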
Coursework: Data Science, Database Management, Data Analysis and Decision Making
Relevant Coursework: Communication Skills, Machine Learning, Data Warehousing
Loving data means knowing how to manipulate it, how to store it, how to visualize it, and how to make sense of it (no tool for that last one!).
Collected 700+ news titles and links using the Reddit API to understand which types of news and which news sources get the most attention, and why. Created visualizations to support the findings. Implemented sentiment analysis using the NLTK library to find the sentiment of top news.
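A minimal sketch of the collection and sentiment steps, using PRAW for the Reddit API and NLTK's VADER analyzer; the credentials are placeholders and the subreddit choice is an assumption:

```python
import praw
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

# Placeholder credentials -- register an app at reddit.com/prefs/apps.
reddit = praw.Reddit(client_id="CLIENT_ID", client_secret="CLIENT_SECRET",
                     user_agent="news-sentiment-demo")

sia = SentimentIntensityAnalyzer()
for post in reddit.subreddit("news").top(limit=10):
    score = sia.polarity_scores(post.title)["compound"]
    print(f"{score:+.2f}  {post.title[:60]}  ({post.url})")
```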
Collected a 1.2M-song dataset from Kaggle and performed data cleaning. Integrated the Spotify API to retrieve songs and key metrics that were not present in the dataset. Implemented cosine similarity from the sklearn module to compare and find songs similar to the user's input.
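A minimal sketch of the similarity lookup with scikit-learn's cosine_similarity; the feature matrix is an illustrative stand-in for the Spotify audio metrics:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy feature matrix: one row per song (e.g. danceability, energy, tempo).
songs = ["Song A", "Song B", "Song C"]
features = np.array([[0.8, 0.6, 0.55],
                     [0.7, 0.5, 0.50],
                     [0.2, 0.9, 0.95]])

query = features[0].reshape(1, -1)            # the user's chosen song
scores = cosine_similarity(query, features)[0]

# Rank the remaining songs by similarity to the query.
for idx in scores.argsort()[::-1][1:]:
    print(f"{songs[idx]}: {scores[idx]:.3f}")
```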
Performed exploratory data analysis on 4.5k rows of football transfer data collected from Kaggle. The analysis used the matplotlib and seaborn visualization modules. Made over 20 visualizations using 8 different chart types.
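A minimal sketch of one such chart in seaborn; the DataFrame columns are illustrative assumptions rather than the actual Kaggle schema:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Toy transfer data -- the real dataset had ~4.5k rows.
df = pd.DataFrame({
    "position": ["FW", "MF", "DF", "FW", "GK", "MF"],
    "fee_millions": [80.0, 45.5, 30.0, 120.0, 12.0, 60.0],
})

# Bar chart of the mean transfer fee per position.
sns.barplot(data=df, x="position", y="fee_millions")
plt.title("Average transfer fee by position")
plt.ylabel("Fee (million EUR)")
plt.show()
```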
Analyzed a dataset containing 40k rows to understand the key drivers behind hotel reservation cancellations. Used descriptive and exploratory analysis in R, then compared machine learning algorithms such as linear regression, SVM, and decision trees to find an appropriate model for answering the question "When will a user cancel a booking?". Based on the analysis, recommendations were suggested.
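A minimal sketch of the model-comparison idea, shown in Python with scikit-learn for consistency with the other examples (the original analysis was in R); the features are synthetic stand-ins, and logistic regression substitutes for linear regression on the binary label:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in features: lead_time (days), previous_cancellations.
rng = np.random.default_rng(0)
X = rng.random((200, 2)) * [365, 5]
y = (X[:, 0] > 180).astype(int)  # assumed rule: long lead times cancel more

# Compare candidate models by 5-fold cross-validated accuracy.
for name, model in [("logistic", LogisticRegression()),
                    ("svm", SVC()),
                    ("tree", DecisionTreeClassifier(max_depth=3))]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f} mean CV accuracy")
```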
Designed and implemented a database management system for handling a company's meeting room booking system. Used MSSQL to create the tables, views, stored procedures, and triggers for working with the database. Used Microsoft Azure to host the database and Microsoft PowerApps to showcase the application's different layouts.
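A minimal sketch of connecting to such an Azure-hosted MSSQL database from Python via pyodbc and creating one of the booking tables; the server, credentials, and table layout are all hypothetical placeholders:

```python
import pyodbc

# Placeholder Azure SQL connection details -- illustrative only.
conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};"
                      "SERVER=example.database.windows.net;DATABASE=bookings;"
                      "UID=user;PWD=password")
cur = conn.cursor()

# One assumed table from the booking schema.
cur.execute("""
    CREATE TABLE MeetingRoomBooking (
        booking_id INT IDENTITY PRIMARY KEY,
        room_id    INT NOT NULL,
        booked_by  NVARCHAR(100) NOT NULL,
        starts_at  DATETIME2 NOT NULL,
        ends_at    DATETIME2 NOT NULL
    )
""")
conn.commit()
```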