Projects

Anything British

Applied Bayesian methods to explore the relationship between musical talent and national origin using Spotify’s Daily Streaming dataset of 8.5M records. Features were engineered by combining GCP’s Custom Search Engine with spaCy’s Named Entity Recognition (NER).

View on GitHub | Tableau Visualizations


Visualizing Markets with Natural Language Processing

Deployed a Flask app on Heroku for visually exploring the AI & Machine Learning startup market with Natural Language Processing. NLP and dimensionality reduction were applied to Crunchbase company descriptions and categories to build an interactive Bokeh app for exploring competitor proximity. Text preparation and processing used spaCy, NLTK, CountVectorizer, TfidfVectorizer, LDA, and t-distributed Stochastic Neighbor Embedding (t-SNE).
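The idea of "competitor proximity" can be illustrated with a minimal, standard-library sketch: weight each description's terms with TF-IDF, then compare companies by cosine similarity. This is an assumption-laden illustration, not the project's code. The sample descriptions are invented, the IDF is the plain unsmoothed form (scikit-learn's TfidfVectorizer uses a smoothed variant), and the t-SNE projection step is left out.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (simplified, unsmoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical company descriptions, invented for illustration only.
descriptions = [
    "machine learning platform for fraud detection",
    "fraud detection using machine learning",
    "robotics hardware for warehouse automation",
]
vectors = tfidf_vectors(descriptions)
sim_close = cosine_similarity(vectors[0], vectors[1])  # two fraud-detection firms
sim_far = cosine_similarity(vectors[0], vectors[2])    # fraud detection vs. robotics
```

With these toy inputs, the two fraud-detection descriptions score as closer to each other than either does to the robotics description, which is the kind of relationship an interactive proximity plot would surface.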

View on GitHub | Try on Heroku | Medium Article


Species Identification with Convolutional Neural Networks

Applied convolutional neural networks with Keras / TensorFlow to classify regional species using imagery from Texas wildlife cameras and the Microsoft Cognitive Services API. Data were stored in S3, verified with AWS Rekognition, processed on EC2 GPU instances, and presented with Tableau. The deployed model demonstrated sending text notifications to mobile devices through AWS SNS when a species was positively identified.

View on GitHub


Net Promoter Scores, Version 2.0

Modernized the Net Promoter Score metric for online reviews using NLP (spaCy, Gensim, CountVectorizer, TF-IDF), topic modeling (LDA/NMF/CorEx), sentiment analysis (VADER), Naive Bayes, and Logistic Regression. The Amazon Customer Reviews Dataset of 130M records was stored in MongoDB on AWS EC2.
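For context, the classic Net Promoter Score is the percentage of promoters minus the percentage of detractors. The sketch below shows one way that formula might be carried over to review sentiment, assuming each review already has a compound sentiment score in the range -1 to 1 (the range VADER reports). The function name and both cutoffs are hypothetical choices for illustration, not the project's actual thresholds.

```python
def net_promoter_score(sentiment_scores, promoter_min=0.5, detractor_max=-0.05):
    """NPS-style score in [-100, 100] from per-review sentiment scores.

    promoter_min and detractor_max are hypothetical, illustrative cutoffs.
    """
    if not sentiment_scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in sentiment_scores if s >= promoter_min)
    detractors = sum(1 for s in sentiment_scores if s <= detractor_max)
    # Percentage of promoters minus percentage of detractors.
    return 100.0 * (promoters - detractors) / len(sentiment_scores)


# Invented sentiment scores: three promoters, one passive, one detractor.
scores = [0.9, 0.7, 0.1, -0.4, 0.6]
nps = net_promoter_score(scores)  # (3 - 1) / 5 * 100 = 40.0
```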

View on GitHub


Rural Land Valuation

Developed a multivariate regression model for farm & ranch land valuation using data scraped from online property listings with Python / BeautifulSoup, with features engineered using Natural Language Processing and Google Cloud Platform’s Maps API. The project uses Lasso, Ridge, ElasticNet, XGBoost, and Multilayer Perceptron regressors, with parameter optimization via LassoCV, RidgeCV, Yellowbrick, and GridSearchCV.

View on GitHub


You Are What You Eat! Predicting Diets from Instacart Orders

Engineered features from the Kaggle Instacart dataset to classify users practicing a paleolithic diet. Built a classification model using Logistic Regression and evaluated it with a weighted F1 score. Data were stored in PostgreSQL on AWS.
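One common reading of a weighted F1 is the per-class F1 averaged by class support, which matters when one class (here, paleo users) is likely much rarer than the other. The sketch below computes that quantity with the standard library only. It is an assumption about the metric rather than the project's evaluation code, and the labels are invented.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = true_pos / n_pred if n_pred else 0.0
        recall = true_pos / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class's F1 by how many true examples it has.
        total += f1 * n_true
    return total / len(y_true)


# Invented labels: 1 = follows a paleo diet, 0 = does not.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
score = weighted_f1(y_true, y_pred)  # (0.5 * 2 + 0.75 * 4) / 6
```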

Coming Soon!