Projects
Anything British
Applied Bayesian methods to explore the relationship between musical talent and national origin using Spotify’s Daily Streaming dataset of 8.5M records. Features were engineered by combining GCP’s Custom Search Engine with spaCy’s Named Entity Recognition (NER).
View on GitHub | Tableau Visualizations
Visualizing Markets with Natural Language Processing
Deployed a Flask app on Heroku for visually exploring the AI & Machine Learning startup market with Natural Language Processing. NLP and dimensionality reduction were applied to Crunchbase company descriptions and categories, and the results were plotted with Bokeh in an interactive app for exploring competitor proximity. Text preparation and processing used spaCy, NLTK, CountVectorizer, TfidfVectorizer, LDA, and t-distributed Stochastic Neighbor Embedding (t-SNE).
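As a rough illustration of how text descriptions can be turned into a proximity measure, the sketch below computes TF-IDF weights and cosine similarity using only the standard library. It is a simplified stand-in, not the project's code: the three descriptions are made up, whitespace tokenization replaces the spaCy / NLTK preparation, and the helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: naive lowercase/whitespace tokenization, for illustration only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term present in every document keeps a non-zero weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up company descriptions (not Crunchbase data).
descriptions = [
    "computer vision platform for retail analytics",
    "retail analytics with computer vision",
    "speech recognition for call centers",
]
vecs = tfidf_vectors(descriptions)
sim_close = cosine(vecs[0], vecs[1])  # two overlapping descriptions
sim_far = cosine(vecs[0], vecs[2])    # descriptions sharing only "for"
print(f"similar pair: {sim_close:.3f}, dissimilar pair: {sim_far:.3f}")
```

In a pipeline like the one described, vectors of this kind would typically be passed to a dimensionality reduction step such as t-SNE to obtain 2-D coordinates for plotting.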
View on GitHub | Try on Heroku | Medium Article
Species Identification with Convolutional Neural Networks
Applied convolutional neural networks with Keras / TensorFlow to classify regional species using imagery from Texas wildlife cameras and the Microsoft Cognitive Services API. Data were stored in S3, verified with AWS Rekognition, processed on EC2 GPU instances, and presented with Tableau. The deployed model demonstrated sending text notifications to mobile devices through AWS SNS when a species was positively identified.
Net Promoter Scores, Version 2.0
Modernized the Net Promoter Score metric for online reviews using NLP (spaCy, Gensim, CountVectorizer, TF-IDF), topic modeling (LDA/NMF/CorEx), sentiment analysis (VADER), Naive Bayes, and Logistic Regression. The Amazon Customer Reviews Dataset of 130M records was stored in MongoDB on AWS EC2.
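For context, the classic Net Promoter Score that this project builds on can be sketched in a few lines: on a 0-10 survey scale, it is the percentage of promoters (9-10) minus the percentage of detractors (0-6). How the project maps review text and sentiment onto this scale is not described here, so the sample ratings and the function name `net_promoter_score` are illustrative assumptions only.

```python
def net_promoter_score(ratings):
    """Classic NPS on 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses, not taken from the Amazon reviews dataset.
sample = [10, 9, 9, 8, 7, 6, 3, 10]
score = net_promoter_score(sample)
print(score)  # 4 promoters, 2 detractors, 8 responses -> 25.0
```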
Rural Land Valuation
Developed a Multivariate Regression model for farm & ranch land valuation using data scraped from online property listings with Python / BeautifulSoup and features engineered with Natural Language Processing and Google Cloud Platform’s Maps API. The project uses Lasso, Ridge, ElasticNet, XGBoost, and Multilayer Perceptron regressors, with parameter optimization via LassoCV, RidgeCV, Yellowbrick, and GridSearchCV.
You Are What You Eat! Predicting Diets from Instacart Orders
Engineered features from the Kaggle Instacart dataset to classify users practicing a paleolithic diet. Built a classification model using Logistic Regression, evaluated with a weighted F1 score. Data were stored in PostgreSQL (AWS).
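One common reading of a weighted F1 score is the per-class F1 averaged by class support, which keeps a rare positive class (such as a small group of paleo dieters) from being hidden by overall accuracy. The sketch below shows that calculation in plain Python under that assumption; the labels are invented and `weighted_f1` is a hypothetical helper, not code from the project.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in sorted(support):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support[label] / total
    return score


# Invented labels: 1 = follows the diet, 0 = does not.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
result = weighted_f1(y_true, y_pred)
print(round(result, 4))  # class 0 F1 = 5/6 (support 6), class 1 F1 = 0.5 (support 2) -> 0.75
```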