This project creates a graph of New York Times articles (nodes), users (nodes), and users' comments (relationships). The data was taken from Kaggle: see New York Times Comments.
The data contains information about the comments made on articles published in the New York Times from January to May 2017 and from January to April 2018. The data is split by month, with two CSV files per month: one for the articles that received comments and one for the comments themselves. Across all months, the comment files contain over 2 million comments with 34 features, and the article files contain 16 features for more than 9,000 articles.
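To make the mapping from CSV rows to the graph concrete, here is a minimal shell sketch. It assumes the comment CSVs have `userID` and `articleID` columns and, for illustration only, that no field contains a quoted comma. `count_graph_elements` is a hypothetical helper, not part of this repo. Each distinct user and article corresponds to a node, and each comment row to one relationship between a user and an article.

```bash
#!/usr/bin/env bash

# count_graph_elements FILE
# Hypothetical helper. Assumes FILE is a comma-separated comments file whose
# header names userID and articleID columns, and whose fields contain no quoted
# commas (the real 34-column files contain free text, so use a CSV parser there).
count_graph_elements() {
  awk -F, '
    # Header row: remember which column index holds each field name.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Every data row is one comment, i.e. one User -> Article relationship.
    {
      users[$(col["userID"])] = 1
      articles[$(col["articleID"])] = 1
      comments++
    }
    # Distinct users and articles are the nodes.
    END {
      nu = 0; for (u in users) nu++
      na = 0; for (a in articles) na++
      printf "users=%d articles=%d comments=%d\n", nu, na, comments
    }
  ' "$1"
}
```

On a three-row sample in which user `u1` comments on articles `a1` and `a2` and user `u2` comments on `a1`, it prints `users=2 articles=2 comments=3`: two user nodes, two article nodes, and three comment relationships.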
Running this project will start 2 services:
| Neo4j Graph | Toy Recommender |
|---|---|
| http://localhost:7474 | http://localhost:5000 |
Requirements:
- git
- docker
- docker-compose
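A quick pre-flight check for the requirements above might look like this minimal sketch. `check_requirements` is a hypothetical helper, not something `setup.sh` provides. Newer Docker installs may ship Compose as the `docker compose` plugin rather than a standalone `docker-compose` binary; this check only looks for the command names you pass it.

```bash
#!/usr/bin/env bash

# check_requirements CMD...
# Hypothetical helper: reports each command as found or missing on PATH and
# returns non-zero if any of them is missing.
check_requirements() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found   $cmd"
    else
      echo "missing $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as `check_requirements git docker docker-compose`; it exits non-zero if any tool is missing.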
Note: Downloading our dataset (via `setup.sh`) uses Kaggle API credentials stored in `~/.kaggle/kaggle.json`. See API Credentials in the kaggle-api docs.
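If you have your Kaggle username and API key but not the file itself, a minimal sketch like the following writes `kaggle.json` in the `{"username": ..., "key": ...}` format the kaggle CLI reads and restricts it to your user, as the kaggle-api docs recommend with `chmod 600`. `write_kaggle_creds` is a hypothetical helper. It honors `KAGGLE_CONFIG_DIR`, which the kaggle CLI also uses to override `~/.kaggle`, and it assumes the username and key contain no quotes or backslashes. Since `setup.sh` expects `~/.kaggle/kaggle.json`, leave the directory argument and `KAGGLE_CONFIG_DIR` unset for this project.

```bash
#!/usr/bin/env bash

# write_kaggle_creds USERNAME KEY [DIR]
# Hypothetical helper: writes kaggle.json into DIR (default: $KAGGLE_CONFIG_DIR,
# then ~/.kaggle). Assumes USERNAME and KEY need no JSON escaping.
write_kaggle_creds() {
  local username="$1" key="$2"
  local dir="${3:-${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}}"
  mkdir -p "$dir" || return
  # Same JSON shape as the kaggle.json downloaded from your Kaggle account page.
  printf '{"username":"%s","key":"%s"}\n' "$username" "$key" > "$dir/kaggle.json" || return
  # Owner read/write only; the kaggle CLI warns when the file is readable by others.
  chmod 600 "$dir/kaggle.json"
}
```

For example: `write_kaggle_creds "your-username" "your-api-key"`.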
```bash
git clone https://github.com/jvani/nyt-comments.git && \
cd nyt-comments && \
./setup.sh
```
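Neo4j in particular may take a while to accept connections after its container comes up. A minimal polling sketch, assuming `curl` is installed; `wait_for_url` is a hypothetical helper, and because it does not pass `-f` to `curl`, any HTTP response, even an error status, counts as the service being up.

```bash
#!/usr/bin/env bash

# wait_for_url URL [ATTEMPTS]
# Hypothetical helper: polls URL once per second until it answers any HTTP
# request, or gives up after ATTEMPTS tries (default 30).
wait_for_url() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # Without -f, curl exits 0 for any HTTP response, including 4xx/5xx.
    if curl -s -o /dev/null --max-time 2 "$url" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

For example: `wait_for_url http://localhost:7474 && wait_for_url http://localhost:5000`.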
The `setup.sh` script will do the following:
- Pull the Docker images the project uses.
- Download Neo4j plugins (written to `$PWD/plugins`).
- Download our dataset (written to `$PWD/kaggle/`).
- Create our graph data (written to `$PWD/import`).
- Import our data into Neo4j (database files will be written to `$PWD/data`).
- Start our docker-compose services: Neo4j and the toy recommender. NOTE: a Jupyter server is also included, but its image is HUGE, so it is off by default; uncomment it in `docker-compose.yaml` if desired.
- Run graph statistics on our database.
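After `setup.sh` finishes, a minimal sketch like this can confirm that each output directory listed above exists and is non-empty. `check_setup_outputs` is a hypothetical helper; it only checks for files on disk, not whether the download or import actually succeeded.

```bash
#!/usr/bin/env bash

# check_setup_outputs [ROOT]
# Hypothetical helper: verifies that the directories setup.sh writes to exist
# under ROOT (default: current directory) and contain at least one entry.
check_setup_outputs() {
  local root="${1:-$PWD}" missing=0 dir
  for dir in plugins kaggle import data; do
    if [ -d "$root/$dir" ] && [ -n "$(ls -A "$root/$dir")" ]; then
      echo "ok               $root/$dir"
    else
      echo "missing or empty $root/$dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it from the repository root as `check_setup_outputs`, or pass the root directory as the first argument.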