A collection of my various data science projects
-
- Web Crawling for data: Capturing data with a web spider built on Scrapy, an open-source framework for web scraping. The spider crawls a retailer's site to build an inventory list with prices. The spider-generated inventory table contains over 135,000 entries. Because the spider can run on a cloud service such as Scrapy Cloud, the method is extensible and scalable.
Keywords: Web Spider, Web Crawler, Scrapy, Pandas, Data visualization
- Split Test Analysis with Bayesian Statistics: A product split test analysis starting from a table of conversion rates.
Keywords: A/B Test, Bayesian inference, Pandas, Data visualization
- Geographic sales data: A sample of geographic sales data for California. Geospatial data (latitude and longitude) is loaded from two CSV files and merged into one table by order ID. The coordinates are then used to derive ZIP code, city, and average income.
Keywords: Econometrics, Geographic data, Pandas, Google Maps, Heatmap, Data analytics, Table merge
- Online dating stats: An analysis, with posterior distributions, of dating data for a Latino test account compared with similar demographics.
Keywords: A/B Test, Bayesian inference, Pandas, Data visualization
- Micro-hydro power generation: Due diligence on the viability of using micro-hydro power generators in the irrigation canals of California's San Joaquin Valley. This is a work in progress!
Keywords: Entrepreneur ventures, Business Development, Return on investment, Net present value, Lists of cash flows, Levelized cost of electricity, Returns over time
- Fitting a sigmoid function to Silicon Emissivity Data: Reworking recorded intrinsic silicon emissivity data by fitting a sigmoid function with PyMC3. A work in progress.
Keywords: PyMC3, Bayesian inference, Pandas
- Data Wrangling: A data munging exercise working with a JSON file of 150,000 entries.
Keywords: Data munging, JSON, Large data, Pandas, String manipulation
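
For readers unfamiliar with the approach named in the split test entry, here is a minimal sketch of a Bayesian comparison of two conversion rates. It is not the project's code: the function name `prob_b_beats_a`, the uniform Beta(1, 1) prior, and the visitor and conversion counts are all assumptions made for illustration, and it uses only the Python standard library rather than Pandas.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors.

    conv_* are conversion counts and n_* are visitor counts (hypothetical inputs).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior, each posterior is
        # Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Made-up counts, not taken from the project.
p = prob_b_beats_a(conv_a=120, n_a=2400, conv_b=150, n_b=2400)
print(f"P(B > A) is roughly {p:.3f}")
```

Sampling from the two posteriors and counting how often B wins avoids any closed-form integral, at the cost of a small Monte Carlo error that shrinks as `draws` grows.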
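
The micro-hydro entry lists net present value and levelized cost of electricity among its keywords. As a rough sketch of how those two quantities are commonly computed from lists of yearly cash flows, assuming a constant discount rate and entirely hypothetical figures (the helper names `npv` and `lcoe` are illustrative, not the project's):

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now (year 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def lcoe(rate, costs, energy_kwh):
    """Levelized cost of electricity: discounted costs over discounted energy."""
    return npv(rate, costs) / npv(rate, energy_kwh)


# Hypothetical figures: 50,000 up-front cost, then 10 years of operation.
cash_flows = [-50_000] + [8_000] * 10   # net cash flow per year
costs = [50_000] + [1_000] * 10         # capital cost, then yearly upkeep
energy = [0] + [60_000] * 10            # kWh generated per year

print(f"NPV at 6%: {npv(0.06, cash_flows):,.0f}")
print(f"LCOE at 6%: {lcoe(0.06, costs, energy):.4f} per kWh")
```

Discounting the energy output as well as the costs is the usual convention for LCOE; it makes the result the constant price per kWh at which the project breaks even.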