didactic-autogiggle

Useful code for working with structured data

Basic data prep steps for structured data before regressions:

'dataprep_plots_code_pp' performs the following tasks:

Outlier treatment (cap values at the 1st and 99th percentiles)
Missing value treatment (substitute with 0 or with the median/mean)
Create visualizations (13 different types), all customized (fonts etc.) so that they can be inserted directly into Word reports
Factor analysis template that uses Cronbach's alpha to check whether the responses to survey questions measuring each construct are consistent
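
The capping, imputation and reliability steps above can be sketched in plain Python. This is a minimal illustration, not the code in 'dataprep_plots_code_pp': the function names are hypothetical, missing values are assumed to be represented as None, and the percentile rule (linear interpolation between closest ranks) is an assumption, since the script's exact rule is not stated here.

```python
import math
import statistics


def percentile(sorted_values, q):
    """q-th percentile (0-100) by linear interpolation between closest ranks."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def cap_outliers(values, lower_q=1, upper_q=99):
    """Clamp each non-missing value to the [lower_q, upper_q] percentile range."""
    present = sorted(v for v in values if v is not None)
    low, high = percentile(present, lower_q), percentile(present, upper_q)
    return [v if v is None else min(max(v, low), high) for v in values]


def fill_missing(values, strategy="median"):
    """Replace None with 0, the median, or the mean of the observed values."""
    present = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mean":
        fill = statistics.fmean(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])
    # Sample variance of each item (column) and of the per-respondent totals.
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Capping before imputing keeps the imputed mean from being pulled by extreme values; whether the original script uses that order is not stated, so treat it as a design choice of this sketch.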

Code for applying LDA for text classification.

'LDAR_pradeep_28June.ipynb' contains R code. Once the data is ready as 'documents', it performs the following operations:

Makes text lowercase, removes stopwords, numbers and extra whitespace, and creates a document-term matrix (DTM)
Creates vocabulary from the DTMs
Creates n-grams
Applies LDA
Identifies top terms for each topic
Scores the documents on topics
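
The cleaning, n-gram and DTM steps above can be sketched as follows. The notebook itself is in R; this is a Python illustration with hypothetical function names and a deliberately tiny stop-word list, and it builds bigrams before counting, which may not match the order used in the notebook.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def preprocess(text):
    """Lowercase, drop digits and punctuation, collapse whitespace, remove stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def add_bigrams(tokens):
    """Append adjacent-word bigrams (joined with '_') to the unigram tokens.

    Bigrams are formed after stop-word removal, so "cats and dogs" yields "cats_dogs".
    """
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def build_dtm(documents):
    """Return (vocabulary, dtm) where dtm[i][j] counts term j in document i."""
    counts = [Counter(add_bigrams(preprocess(doc))) for doc in documents]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]
```

Each row of the returned matrix corresponds to one document and each column to one vocabulary term, which is the shape an LDA implementation typically expects.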

'LDApy_pradeep_28June.ipynb' applies LDA for text classification using Python. Still a work in progress (WIP):

Remove stop words, tokenize, lemmatize, create DTM
Create vocabulary from the DTM
Apply LDA
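
To make the "apply LDA" step concrete, here is a small standard-library sketch that fits LDA by collapsed Gibbs sampling, then lists top terms per topic and returns per-document topic scores. It is not the notebook's code, which presumably relies on a library; the function names, the choice of Gibbs sampling and the default hyperparameters (alpha, beta, n_iter) are all assumptions made for illustration. Lemmatization is not covered.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (vocab, topic_word, doc_topic),
    where the last two are row-normalised probability tables.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic_counts = [[0] * n_topics for _ in docs]
    topic_word_counts = [[0] * n_words for _ in range(n_topics)]
    topic_totals = [0] * n_topics
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            t = rng.randrange(n_topics)
            topics.append(t)
            doc_topic_counts[d][t] += 1
            topic_word_counts[t][word_id[w]] += 1
            topic_totals[t] += 1
        assignments.append(topics)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, old = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][old] -= 1
                topic_word_counts[old][wid] -= 1
                topic_totals[old] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][wid] + beta)
                    / (topic_totals[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                new = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = new
                doc_topic_counts[d][new] += 1
                topic_word_counts[new][wid] += 1
                topic_totals[new] += 1

    topic_word = [
        [(c + beta) / (topic_totals[t] + n_words * beta) for c in topic_word_counts[t]]
        for t in range(n_topics)
    ]
    doc_topic = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic_counts[d]]
        for d, doc in enumerate(docs)
    ]
    return vocab, topic_word, doc_topic


def top_terms(vocab, topic_word, n=3):
    """The n highest-probability terms for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]
```

Each row of doc_topic is a document's score on every topic and sums to 1. The sampler is seeded for repeatability, but topic numbering is arbitrary and results on small corpora can vary with the seed.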

Code from 'crowdfunding as an alternative to VC funding' project.

'SIC_assignment.R' assigns Standard Industrial Classification (SIC) codes to Angel.co companies based on the textual description of each company:

Tokenize words, create a vocabulary and a DTM from each text document
Calculate TF-IDF for each DTM
Calculate the cosine similarity of each document without an SIC code with each document that has an SIC code
Assign an SIC code to each document without one, based on the closest company by cosine similarity
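
The nearest-neighbour assignment described above can be sketched as follows. The original script is in R; this Python version is only an illustration with hypothetical function names. It tokenizes on whitespace and uses a smoothed IDF, both of which are assumptions rather than the script's documented behaviour.

```python
import math
from collections import Counter


def tfidf_vectors(token_docs):
    """Map each tokenised document to a sparse {term: tf * idf} dict."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    vectors = []
    for doc in token_docs:
        counts = Counter(doc)
        vectors.append({
            # Smoothed idf so terms present in every document keep a non-zero weight.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_sic(labelled, unlabelled):
    """Give each unlabelled description the SIC code of its most similar labelled one.

    labelled is a list of (description, sic_code) pairs; unlabelled is a list
    of descriptions. Returns one SIC code per unlabelled description.
    """
    texts = [description for description, _ in labelled] + list(unlabelled)
    vectors = tfidf_vectors([text.lower().split() for text in texts])
    known, unknown = vectors[:len(labelled)], vectors[len(labelled):]
    assigned = []
    for vec in unknown:
        best = max(range(len(known)), key=lambda i: cosine(vec, known[i]))
        assigned.append(labelled[best][1])
    return assigned
```

A description that shares no terms with any labelled company has similarity 0 to all of them and simply receives the first labelled code, so a minimum-similarity threshold may be worth adding in practice.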

Other data prep code for the project:

'crowdfunding.py'
'CF_dataprep_code_5June.R'
'Crowdfunding_2_5_master.egp'

Code from 'Women on company boards and the impact of diversity on innovation outcomes and risk taking ability of firms'

'womboards_4Jun.py' dataprep
'womboards_28May.do' regressions

Success of micro-entrepreneurs in rural settings

'CSC.py' dataprep
'cscreport_regs.do' analysis and regressions

Dataprep for networks

'networks.py'

Dataprep for 'entrepreneurial clusters' project

'cluster.py'

Extract data (tweets) from Elasticsearch and store in CSV format

'Elasticsearch_to_csv.ipynb'
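
As a rough sketch of the CSV step, the function below writes the documents in an Elasticsearch search response (the standard hits.hits[]._source layout) to CSV. It is not the notebook's code: the function name is hypothetical, the field names are supplied by the caller, and fetching the response from the cluster (including paging through large result sets) is left out.

```python
import csv


def hits_to_csv(response, fieldnames, fileobj):
    """Write the _source of each hit in a search response to CSV.

    Returns the number of rows written.
    """
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    rows = 0
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        # Missing fields become empty cells; fields not listed are skipped.
        writer.writerow({name: source.get(name, "") for name in fieldnames})
        rows += 1
    return rows
```

When writing to a real file, open it with newline="" so the csv module controls line endings; tweet text containing commas or line breaks is then quoted automatically.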

