- server.js serves the whole news dataset as an object when GET /news is requested.
- crawler.js scrapes the news of the day and saves it both as a <date>.json file and to MySQL.
- MySQL: the 'news' table in the 'MOSAIQ' database.
- columns of the 'news' table:

| Field     | Type        | Null | Key | Default | Extra          |
|-----------|-------------|------|-----|---------|----------------|
| news_id   | INT(11)     | NO   | PRI | NULL    | AUTO_INCREMENT |
| date      | DATE        | NO   |     | NULL    |                |
| publisher | VARCHAR(20) | NO   |     | NULL    |                |
| headline  | TINYTEXT    | YES  |     | NULL    |                |
| body      | MEDIUMTEXT  | YES  |     | NULL    |                |
| img       | TEXT        | YES  |     | NULL    |                |
| link      | TINYTEXT    | NO   |     | NULL    |                |
| type      | VARCHAR(10) | YES  |     | NULL    |                |
| isFirst   | VARCHAR(1)  | NO   |     | NULL    |                |
- autoCrawler.js acts just like crawler.js, except that it automatically scrapes articles at 5 a.m. every day.
- plainText.js removes tags from articles and counts their words.
- clientModel.js shows how to request data from the server. The server returns an array of objects.
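The daily 5 a.m. schedule in autoCrawler.js can be sketched with a plain setTimeout loop. The msUntilHour helper is introduced here for illustration, and crawl() stands in for the actual crawler entry point:

```javascript
// Milliseconds from 'now' until the next occurrence of hour:00 local time.
function msUntilHour(hour, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, 0, 0, 0);
  if (next <= now) next.setDate(next.getDate() + 1); // already past today
  return next - now;
}

// Run task at hour:00 every day, re-arming the timer after each run.
function scheduleDaily(hour, task) {
  setTimeout(() => {
    task();
    scheduleDaily(hour, task); // re-arm for the next day
  }, msUntilHour(hour));
}

// scheduleDaily(5, crawl);  // crawl() would run crawler.js's scraping logic
```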
- data form:

  [
    {
      date: <date>,
      publisher: <publisher>,
      type: <type of article>,
      headline: <article headline>,
      body: <article body>,
      img: <list of images>,
      length: <length of the body>,
      isFirst: <'Y' or 'N'>
    },
    ...
  ]
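A clientModel.js-style request for the array above could look like this sketch (the server URL is a placeholder, Node 18+'s global fetch is assumed, and the summarize helper is illustrative):

```javascript
// Summarize the array of article objects returned by GET /news.
function summarize(articles) {
  return {
    count: articles.length,
    totalLength: articles.reduce((sum, a) => sum + a.length, 0),
    firsts: articles.filter((a) => a.isFirst === 'Y').length,
  };
}

// Sketch of the request itself (URL is a placeholder):
// fetch('http://<server>/news')
//   .then((res) => res.json())
//   .then((articles) => console.log(summarize(articles)));
```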
- plainText.js reads rows that have a type from the MySQL MOSAIQ database, then writes the data to file.csv for machine learning.
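What plainText.js does to each article can be sketched as follows. The tag-stripping regex and the CSV column layout are illustrative assumptions, not the actual implementation:

```javascript
// Remove HTML tags from an article body, collapsing leftover whitespace.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Count the words of a tag-free text.
function countWords(text) {
  return text === '' ? 0 : text.split(/\s+/).length;
}

// Build one CSV row per article. The quoting here is simplistic (assumes
// no embedded quotes) and the column order is an assumption.
function toCsvLine(article) {
  const body = stripTags(article.body);
  return [article.type, countWords(body), `"${body}"`].join(',');
}
```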
- ssh login to Amazon AWS.
- run git pull in ~/workspace/MOSAIQ.
- run npm run build in ~/workspace/MOSAIQ.
- when the build is done, run sudo cp -rf ~/workspace/MOSAIQ/build/* /var/www/html/
- now we have to restart server.js: run forever list to see the pid of server.js (it runs as a background process), forever stop <pid of server.js> to stop it, and forever start server.js <MySQL password> to start it again.