IDEAS RESTful API

For visual learners, I've created a YouTube Tutorial that runs through the steps below.

Vanilla Installation Guide

The installation steps have been tested on Ubuntu 12.04, 12.10 and 13.04. I have not tested the installation on other flavors of Linux. That said, the software dependencies are very basic, so it should be workable on other Linux platforms.

The following is my "development stack":

  • Ubuntu 12.10
  • MySQL 5.5
  • Python 2.7
  • Flask 0.9
  • Apache2

I am making some basic assumptions:

  • The user has installed Ubuntu and is comfortable operating a terminal.
  • The user has a stable internet connection, which is needed to download the Ubuntu packages and the IDEAS RESTful API software.
  • The installation should go smoothly and take less than 20 minutes. Customizing the application will take longer.
  • Memory and disk space requirements depend on the complexity of the queries and the size of the underlying data, so users should size the machine to their needs. (I have used machines with 3.7GB of RAM and 160GB of disk, both 32-bit and 64-bit.)

Getting started

From this point forward, we will be using our Terminal to begin our installation. To get started, you'll need to have a few tools installed on your server.

sudo apt-get install git
sudo apt-get install python-pip
sudo apt-get install python-mysqldb

Installing MySQL

For this project, I use MySQL. If you already have a MySQL server on a separate machine, a cloud service, etc., feel free to use that instead. Note: I realize there are other databases. A later goal is to support them through an ORM such as SQLAlchemy.

If you would like to install MySQL on Ubuntu:

sudo apt-get install mysql-server

Follow the installation prompts and set the MySQL root password. In my examples, I use the password "toor". Wherever you see "toor", substitute your own password; I recommend choosing a different one.

Create a MySQL database to store the API data. First, log into MySQL using your credentials. For those new to MySQL, -u indicates the username and -p indicates the password. Notice that there is a space between -u and the username but no space between -p and the password.

mysql -u root -ptoor
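
The spacing rule is easy to get wrong when the login command is assembled by a script. The helper below is only an illustrative sketch (build_mysql_argv is a hypothetical name, not part of IDEAS-Pub) that builds the argument list for the mysql client. Note that a password passed on the command line can be visible to other users of the machine.

```python
def build_mysql_argv(user, password=None, database=None, host=None):
    """Return the argument list for the mysql command-line client."""
    argv = ["mysql", "-u", user]      # -u and the username are separate arguments
    if password is not None:
        argv.append("-p" + password)  # -p and the password are joined, no space
    if host is not None:
        argv.extend(["-h", host])     # only needed for a MySQL server on another host
    if database is not None:
        argv.append(database)         # optional database to select on login
    return argv


print(" ".join(build_mysql_argv("root", "toor")))
# prints: mysql -u root -ptoor
```

The resulting list can be handed to subprocess.call from the Python standard library if you want to launch the client from a script.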

Once you've logged into MySQL, create the database that will store the API data. The example below creates a database called "goideas". This matches the database name in the IDEAS-Pub configuration file (described further below).

create database goideas;
exit

Clone the GitHub repository

In a directory that you find convenient (I use my home directory), clone the GitHub repository.

git clone https://github.com/laironald/IDEAS-Pub.git

There are several directories, many of which are libraries that support the RESTful interface. The one that needs to be configured is the config directory, specifically the file ~/IDEAS-Pub/IDEAS/config/init.py

On your local machine, edit this file so that it matches the settings needed to connect to MySQL. The account used requires both read and write access to the database. I realize this is a security concern, and it is an item to address in future updates to the software. I do not recommend using your root username and password, even though this is how it is currently set up.
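
Before re-installing, it can help to sanity-check the connection settings you plan to use. The sketch below is not the layout of the IDEAS-Pub configuration file. The dictionary is a hypothetical stand-in whose keys mirror the keyword arguments accepted by MySQLdb.connect (host, user, passwd, db), and the account name ideas_api is made up for illustration.

```python
REQUIRED_KEYS = ("host", "user", "passwd", "db")


def check_settings(settings):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append("missing value for '%s'" % key)
    if settings.get("user") == "root":
        # The guide advises against connecting as root.
        problems.append("connecting as root; prefer a dedicated account")
    return problems


# Hypothetical settings: a dedicated account with access to goideas.
example = {"host": "localhost", "user": "ideas_api",
           "passwd": "change-me", "db": "goideas"}
print(check_settings(example))
# prints: []
```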

Installing the Python Library

Make sure to change your directory to IDEAS-Pub. Assuming you cloned the repository into your home directory, that would be:

cd ~/IDEAS-Pub

From here, installing the IDEAS RESTful API involves two steps:

sudo pip install -r requirements.txt
sudo python setup.py install

Note that if you modify the source code or the configuration files, you must re-install the application. You may also need to restart Apache2. Details on installing Apache2 are below. Restarting Apache2 is as simple as:

sudo service apache2 restart

Configuring Flask for Apache2

Below is an abbreviated guide to deploying Flask on Apache2 so that the Python scripts can be served as a web service. Please read the official Flask documentation to gain a more thorough understanding of how to deploy Flask.

The first step is to install the Apache HTTP Server:

sudo apt-get install apache2

Next, Flask requires the Apache2 WSGI module (mod_wsgi), so let's install that as well:

sudo apt-get install libapache2-mod-wsgi

Then, modify the Apache default configuration file. On Ubuntu, the file is located at /etc/apache2/sites-available/default. You will need sudo to edit it. Add the configuration options shown between the two ellipses (...):

<VirtualHost *:80>
    ServerAdmin webmaster@localhost

    ...

    WSGIDaemonProcess ideas user=www-data group=www-data threads=5
    WSGIScriptAlias /v1 /var/www/ideasapi/start.wsgi

    <Directory /var/www/ideasapi>
        WSGIProcessGroup ideas
        WSGIApplicationGroup %{GLOBAL}
        Order deny,allow
        Allow from all
    </Directory>

    ...

</VirtualHost>
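
To make the WSGIScriptAlias line less opaque: mod_wsgi loads the script file and looks for a callable named application. The sketch below is not the start.wsgi shipped in the repository. It is a minimal stand-alone WSGI callable using only the standard library, with a hypothetical call_app helper that invokes it the way a server would. With the alias above, a request to /v1/pop/structure reaches the application with PATH_INFO set to /pop/structure.

```python
import json
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI app; mod_wsgi looks for a callable with this name."""
    body = json.dumps({"path": environ.get("PATH_INFO", "")}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke the app once with a fake request, without running a server."""
    environ = {}
    setup_testing_defaults(environ)   # fills in the required WSGI keys
    environ["PATH_INFO"] = path       # the part of the URL after /v1
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = application(environ, start_response)
    return captured["status"], b"".join(chunks)


status, body = call_app("/pop/structure")
print(status, body.decode("utf-8"))
```

In a Flask deployment, the script file usually just imports the Flask app object under the name application instead of defining the callable by hand.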

The repository contains a subdirectory called ideasapi (~/IDEAS-Pub/ideasapi), which is required to set up Apache2 for a production environment. Copy this folder to /var/www. For the time being, please follow the directions above, although it is understandable if you prefer a different directory structure.

sudo cp -r ~/IDEAS-Pub/ideasapi /var/www/

To conclude, restart Apache2:

sudo service apache2 restart

For reference, log files for Apache2 are stored in the following directory: /var/log/apache2. These can be helpful for spotting errors in how requests reach the web service. Other web servers such as Nginx can be used, but do consult the appropriate Flask documentation.
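
When something fails, the last few lines of the Apache2 error log (usually a file named error.log in the log directory) are the most useful. The helper below is a small sketch, written for Python 3, that returns the tail of any text file. The function name and the default of 20 lines are illustrative choices, and reading the log may require elevated permissions.

```python
from collections import deque


def tail(path, count=20):
    """Return the last `count` lines of a text file, without newlines."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines in memory
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

Call it with the path to your error log, for example tail(log_path, 50), and print the returned lines.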

Loading data

For convenience, we have generated sample API data (thanks, Sonya). The sample data is located in ~/IDEAS-Pub/sample

There are multiple ways to import data into the API. The first uses the file output.sql.tar.gz, and the second uses the csv files and ingest.py.

Approach 1: MySQL

For our purposes, we have a compressed tar file of the MySQL SQL dump file, which can be loaded to generate the necessary schema. We will want to extract the contents from this tar file.

tar -xzf output.sql.tar.gz
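
If you prefer to script this step, the same extraction can be done with the tarfile module from the Python standard library. This is a sketch rather than part of IDEAS-Pub. The function name extract_dump is made up, and the safety check simply refuses archives whose members would land outside the destination directory.

```python
import os
import tarfile


def extract_dump(archive_path, dest_dir="."):
    """Extract a .tar.gz archive and return the member names it contained."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        for name in names:
            # Refuse absolute paths and parent-directory references.
            if os.path.isabs(name) or ".." in name.split("/"):
                raise ValueError("unsafe path in archive: %s" % name)
        archive.extractall(dest_dir)
    return names
```

Calling extract_dump("output.sql.tar.gz") from the directory that holds the archive mirrors the tar command above.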

An output.sql file is generated, which is a plaintext file that can be loaded into MySQL. First, log in to your MySQL server. The goideas argument below is the database that we created in a step above. As noted, if your MySQL server runs on another host, log in to that host using the -h option.

mysql -u root -ptoor goideas

Once you have logged into your MySQL server, load the data with the source command. You must run this from the directory that contains the file; otherwise, supply the path to the file as the argument to source.

source output.sql
exit

Approach 2: CSV files and ingest.py

This method is a bit more involved, but it is not meant to be complicated. In the directory, you will find two files, ingest.py and schema.csv. Together with the csv files, they are used to build the MySQL database, which ends up with the same information as output.sql.tar.gz in Approach 1.

schema.csv is purely optional and looks like the following:

...

POP_publication_object,variable,title,
POP_publication_object,type,text,

...

This means that, for the POP_publication_object file, the variable named "title" should be stored with the data type "text". The ingest script does its best to guess the most appropriate data type, but it can be wrong. An entry in schema.csv overrides the default behavior. The available data types are described in the MySQL data types documentation (http://dev.mysql.com/doc/refman/5.5/en/data-types.html)
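
The override idea can be sketched in a few lines. The code below is not the logic of ingest.py, which is not reproduced here. It assumes the two-row layout shown above (a "variable" row followed by a "type" row for the same file), and the names read_overrides, guess_type, and column_type, as well as the guessing rules, are hypothetical.

```python
import csv
import io


def read_overrides(text):
    """Parse schema.csv-style rows into {table: {column: mysql_type}}."""
    overrides = {}
    pending = {}  # table -> column named by the most recent "variable" row
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # skip blank lines and anything that is not a full row
        table, key, value = (cell.strip() for cell in row[:3])
        if key == "variable":
            pending[table] = value
        elif key == "type" and table in pending:
            overrides.setdefault(table, {})[pending.pop(table)] = value
    return overrides


def guess_type(values):
    """Rough guess: int, then double, otherwise a short string column."""
    if not values:
        return "varchar(255)"
    for caster, mysql_type in ((int, "int"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return mysql_type
        except ValueError:
            continue
    return "varchar(255)"


def column_type(table, column, values, overrides):
    """An explicit override wins; otherwise fall back to the guess."""
    return overrides.get(table, {}).get(column) or guess_type(values)


sample = ("POP_publication_object,variable,title,\n"
          "POP_publication_object,type,text,\n")
print(read_overrides(sample))
# prints: {'POP_publication_object': {'title': 'text'}}
```

Without the override, a title column would fall back to the short string guess, which is the kind of mistake the schema file exists to correct.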

Once the schema file is set up, running the ingest.py script is the final step. There are three options:

  1. ingest -- ingest all csv files in the current directory. Files that do not need to be reloaded can simply be left out of this directory.
  2. backup -- create an output.sql.tar.gz file to share with others. Loading this file, as in Approach 1, may be a quicker way to load the data.
  3. drop -- remove the data. This is useful because a cache is generated; dropping the data clears both the cache and all the data.

A natural data flow for updating the data would be the following.

  1. alter existing csv files or generate new ones
  2. ingest the data

If the cache needs to be cleared, continue with the steps below; otherwise, they are not necessary. Generally speaking, clearing the cache is required for the API to reflect the updated data.

  1. backup the data
  2. drop the data
  3. using Approach 1, re-load the data

The command to ingest the files is as follows. The second ingest is the option, so modify this to backup or drop if those are the options you would like to pursue.

python ingest.py ingest

Final stuff

Moving forward, updating the software is easy. When the software is updated on GitHub, enter the directory (~/IDEAS-Pub) and pull. Then re-install the Python files and restart Apache2. There is no fixed schedule for software updates.

git pull
sudo pip install -r requirements.txt
sudo python setup.py install
sudo service apache2 restart

If you make some awesome modifications to the software, please let me know. Fork it. Collaborate with me. Etc. For now, I have my own private repository of this software, and this repository is merely a more polished clone. My goal is to stabilize a few more items so I can feel confident about making it a truly collaborative and open system!

Access the sample data in the API by going to http://[yourdomain.com]/v1/pop/structure. Ours is http://api.goideas.org/v1/pop/structure

If the output matches the output below, then voilà! It works!

{
    "object": [
        "organization", 
        "person", 
        "publication", 
        "grant", 
        "patent"
    ], 
    "legend": [
        "topic"
    ]
}
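
The check can also be scripted. The sketch below, written for Python 3, tests only the shape of the response (an "object" list and a "legend" list, as in the output above) rather than its exact contents. The function names are illustrative, and fetch needs a reachable server.

```python
import json
from urllib.request import urlopen

EXPECTED_KEYS = ("object", "legend")


def looks_like_structure(payload):
    """Return True if the JSON text has the list-valued keys shown above."""
    try:
        data = json.loads(payload)
    except ValueError:
        return False  # not JSON at all, e.g. an Apache error page
    if not isinstance(data, dict):
        return False
    return all(isinstance(data.get(key), list) for key in EXPECTED_KEYS)


def fetch(url):
    """Download the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")
```

For example, looks_like_structure(fetch(url)) with url set to your own /v1/pop/structure address should return True on a working installation.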
