Cognos Analytics on Cloud and Watson Studio on Cloud now work better together. Cognos Analytics users can now connect to the more powerful data science capabilities in Watson Studio: AutoAI, Jupyter Notebooks, and GPUs. With this integration, both data science and business intelligence teams can share a single ecosystem to make the most of their organization's data.
The integration between the two offerings serves as a bridge to empower data scientists and business analysts to collaborate on the cloud. Data scientists can easily script against governed Cognos data in Watson Studio and share results back into their Cognos ecosystem.
This code pattern showcases the integration by guiding you through an examination of credit risk data. You will refine the data and build a model using Watson Studio and Watson Machine Learning. The model is then used to score new credit applications to determine whether they are a risk. The results are then fed into Cognos Analytics, where you can create visualizations that give greater insight into the factors that most influence the credit-worthiness of the applicants.
- Credit risk data is loaded into Cognos Analytics.
- Data Scientist runs Jupyter notebook in Watson Studio.
- Data from Cognos Analytics is loaded into Jupyter notebook where it is prepared/refined for modeling.
- Jupyter notebook uses Watson Machine Learning to create a credit risk model.
- New credit applications are scored against the model and the results are pushed back into Cognos Analytics.
- Business Analyst runs Cognos Analytics to visualize the results.
- Cognos Analytics: A business intelligence solution that empowers users with AI-infused self-service capabilities that accelerate data preparation, analysis, and report creation.
- IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
- pandas: A Python library providing high-performance, easy-to-use data structures.
- Clone the repo
- Upload data file into Cognos Analytics
- Create a new Watson Studio project
- Create a Cognos Analytics connection in Watson Studio
- Create data access token
- Create the notebook in Watson Studio
- Add data to the notebook
- Run the notebook
- Refine the data and create a data model
- Write out data using Cognos Analytics connection
- Visualize the data in Cognos Analytics
git clone https://github.com/IBM/credit-risk-analysis-with-cognos-analytics-and-watson-studio.git
- Log into IBM's Cognos Analytics.
- From the Cognos Analytics main dashboard, select the + icon in the lower left corner and select Upload files.
- From the file selection dialog, select the two CSV files located in your local data folder. In this example, the files have been uploaded into the cognos-studio-data folder in Cognos Analytics.
- Log into IBM's Watson Studio. Once in, you'll land on the main dashboard.
- Create a new project by clicking New project + and then click on Create an empty project.
- Enter a project name.
- Choose an existing Object Storage instance or create a new one.
- Click the Create button.
Upon successful project creation, you are taken to the project Overview tab. Take note of the Assets and Settings tabs; we'll use them to associate the project with external assets (datasets and notebooks) and IBM Cloud services.
- From your Watson Studio project dashboard, click Add to project + and select the Connection option.
- Select Cognos Analytics as the data source.
- Configure the connection to point to your Cognos Analytics instance. Provide a name for your connection, plus the namespace, username/password, and URL.
- Click the Create button.
- From your Watson Studio project dashboard, select the Settings tab.
- Scroll down to the Access tokens list, and click on New token +.
- Provide a name for your token and set the access level to Editor.
- Click the Create button.
- To view the token value, select the action menu located on the right-hand side of the token row, and click on View.
- From the new project Overview tab, click + Add to project on the top right and choose the Notebook asset type.
- Fill in the following information:
  - Select the From URL tab. [1]
  - Enter a Name for the notebook and optionally a description. [2]
  - For Select runtime, select the Default Python 3.6 option. [3]
  - Under Notebook URL, provide the following URL [4]:
    https://raw.githubusercontent.com/IBM/credit-risk-analysis-with-cognos-analytics-and-watson-studio/master/notebooks/german-credit-risk.ipynb
- Click the Create button.

TIP: Your notebook will appear in the Notebooks section of the Assets tab.
- After creation, the notebook will automatically be loaded into the notebook runtime environment. You can re-run the notebook at any time by clicking on the pencil edit icon displayed in the right-hand column of the notebook row.
- The second cell of the notebook should be labeled as a @hidden_cell. This is where we will load in our Cognos Analytics CSV file.

  Note: This cell is marked as a @hidden_cell because it will contain sensitive credentials.
- Use Find and Add Data (look for the 01/00 icon) and its Connection tab.
- Place your cursor at the bottom of the @hidden_cell (indicated by the red arrow), then select insert to code, and click on the pandas DataFrame option.
- The @hidden_cell should now contain the access token and data connector that will allow you to load your Cognos Analytics CSV file.
- In the next cell, change the path and file name to point to the German credit model data stored in Cognos Analytics.
TIP: To search the data files that your connector points to, run the following command in the cell:
CADataConnector.search_data("keyword")
When a notebook is executed, each code cell in the notebook is run in order, from top to bottom.
Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:
- A blank, which indicates that the cell has never been executed.
- A number, which represents the relative order in which this code step was executed.
- A *, which indicates that the cell is currently executing.
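As a rough illustration of the tag states above, the following sketch simulates how an execution counter could drive the In [x]: tag. The Cell class and the run_all helper are hypothetical names invented for this example; they are not part of Jupyter or Watson Studio.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical stand-in for a notebook code cell."""
    source: str
    execution_count: Optional[int] = None  # None means never executed
    running: bool = False

    def tag(self) -> str:
        """Return the left-margin tag for the cell's current state."""
        if self.running:
            return "In [*]:"
        if self.execution_count is None:
            return "In [ ]:"
        return f"In [{self.execution_count}]:"


def run_all(cells: List[Cell], namespace: dict) -> None:
    """Execute cells top to bottom, numbering them in execution order."""
    counter = max((c.execution_count or 0 for c in cells), default=0)
    for cell in cells:
        cell.running = True  # the tag would read In [*]: during this step
        exec(cell.source, namespace)
        cell.running = False
        counter += 1
        cell.execution_count = counter


cells = [Cell("x = 2"), Cell("y = x * 21")]
tags_before = [c.tag() for c in cells]
shared = {}
run_all(cells, shared)
tags_after = [c.tag() for c in cells]
print(tags_before, tags_after, shared["y"])
```

Because all cells share one namespace, a later cell can use variables defined by an earlier one, which is why execution order matters.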
There are several ways to execute the code cells in your notebook:
- One cell at a time.
  - Select the cell, and then press the Play button in the toolbar.
- Batch mode, in sequential order.
  - From the Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and then continues executing all cells that follow.
- At a scheduled time.
  - Press the Schedule button located in the top right section of your notebook panel. Here you can schedule your notebook to be executed once at some future time, or repeatedly at your specified interval.
Initially, the German credit risk dataset is very large and contains irrelevant data. Through a series of explorations, the DataFrame is reduced by eliminating unrelated variables and data where the values are 0 or undefined.
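The reduction described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the notebook works with a pandas DataFrame, while this version uses only the standard library so it stays self-contained. The refine helper, the Notes column, and the sample rows are assumptions made up for the example.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]


def _is_empty(value: Optional[str]) -> bool:
    """Treat None, blank strings, and numeric zero as undefined."""
    if value is None or value.strip() == "":
        return True
    try:
        return float(value) == 0.0
    except ValueError:
        return False  # non-numeric text is a real value


def refine(rows: Iterable[Row], keep: List[str]) -> List[Row]:
    """Keep only the columns in `keep`, dropping rows with empty or zero values."""
    refined = []
    for row in rows:
        slim = {col: row.get(col) for col in keep}
        if any(_is_empty(value) for value in slim.values()):
            continue
        refined.append(slim)
    return refined


# Made-up sample rows; "Notes" plays the role of an unrelated variable.
raw = [
    {"CustomerID": "1", "LoanAmount": "2500", "LoanDuration": "12", "Notes": "n/a"},
    {"CustomerID": "2", "LoanAmount": "0", "LoanDuration": "24", "Notes": "n/a"},
    {"CustomerID": "3", "LoanAmount": "", "LoanDuration": "6", "Notes": "n/a"},
    {"CustomerID": "4", "LoanAmount": "900", "LoanDuration": "36", "Notes": "n/a"},
]
clean = refine(raw, keep=["CustomerID", "LoanAmount", "LoanDuration"])
print(clean)
```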
The notebook also performs some data visualizations to find patterns, detect outliers, understand distribution and more. It uses graphs, such as:
- Histograms, boxplots, etc. to find distribution / spread of our continuous variables.
- Bar charts to show frequency in categorical values.
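The counts behind such charts can be sketched without a plotting library. The example below is only an illustration under stated assumptions: the histogram and frequencies helpers, the bin width, the sample values, and the Risk / No Risk labels are all invented here, and the notebook itself renders real graphs.

```python
from collections import Counter
from typing import Dict, Sequence


def histogram(values: Sequence[int], bin_width: int) -> Dict[int, int]:
    """Count values per bin; each key is the lower edge of a bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def frequencies(labels: Sequence[str]) -> Dict[str, int]:
    """Count how often each categorical value occurs."""
    return dict(Counter(labels))


# Made-up loan amounts; the lone value in the 7000 bin stands out as an outlier.
loan_amounts = [250, 900, 1200, 1800, 2500, 2600, 7000]
loan_hist = histogram(loan_amounts, bin_width=1000)
risk_freq = frequencies(["Risk", "No Risk", "No Risk"])

for edge, count in loan_hist.items():
    print(f"{edge:>6} | {'#' * count}")
print(risk_freq)
```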
The notebook concludes by creating a machine learning model. It uses the insights and intuition gained from the data visualizations to determine what kind of model to create and which features to use. The end result will be a simple classification model.
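To make the idea of a simple classification model concrete, here is a minimal sketch of a one-feature threshold classifier (a decision stump). This is a stand-in chosen for brevity, not the algorithm the notebook uses, which the text above does not specify. The fit_stump and predict helpers and the training values are hypothetical.

```python
from typing import List


def fit_stump(values: List[float], labels: List[int]) -> float:
    """Return the threshold with the fewest errors for the rule 'value > t means 1'."""
    distinct = sorted(set(values))
    # Candidate thresholds: below the minimum, between neighbours, above the maximum.
    thresholds = (
        [distinct[0] - 1.0]
        + [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
        + [distinct[-1] + 1.0]
    )

    def errors(threshold: float) -> int:
        return sum((v > threshold) != bool(y) for v, y in zip(values, labels))

    return min(thresholds, key=errors)


def predict(value: float, threshold: float) -> int:
    """Classify a single value: 1 means risk, 0 means no risk."""
    return int(value > threshold)


# Made-up training data: loan amounts with a 0/1 risk label.
amounts = [500, 1200, 1500, 4000, 5200, 8000]
risk = [0, 0, 0, 1, 1, 1]
threshold = fit_stump(amounts, risk)
print(threshold, predict(3000, threshold), predict(1000, threshold))
```

A real model would combine several features, but the workflow is the same: fit on labeled historical data, then predict on new records.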
NOTE: An example version of the notebook (shown with output cells) can be found in the notebooks/with-output folder of this repo.
Once the model is created, we will use it to score a new set of credit applications.
First we need to load the new application data from Cognos Analytics using our data connector.
NOTE: You will need to change the path and file name to point to the new German credit applications data in Cognos Analytics.
The new applications will be scored using our model, and a new DataFrame will be created that contains the result - whether the application is considered a credit risk or not.
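The scoring step can be sketched as a function that pairs each application's CustomerID with a predicted label. This is an illustration only: score_applications and toy_rule are hypothetical, the Risk / No Risk label values are assumed, and the toy rule stands in for the trained model.

```python
from typing import Callable, Dict, List


def score_applications(
    applications: List[Dict[str, float]],
    predict_risk: Callable[[Dict[str, float]], bool],
) -> List[Dict[str, object]]:
    """Build one result row per application: its CustomerID and PredictedRisk."""
    return [
        {
            "CustomerID": app["CustomerID"],
            "PredictedRisk": "Risk" if predict_risk(app) else "No Risk",
        }
        for app in applications
    ]


def toy_rule(app: Dict[str, float]) -> bool:
    """Placeholder for a trained model: flag large, long-running loans."""
    return app["LoanAmount"] > 3000 and app["LoanDuration"] > 24


# Made-up new applications.
new_apps = [
    {"CustomerID": 101, "LoanAmount": 2500, "LoanDuration": 12},
    {"CustomerID": 102, "LoanAmount": 6000, "LoanDuration": 36},
]
scored = score_applications(new_apps, toy_rule)
print(scored)
```

Keeping CustomerID in the result is what later allows the scores to be linked back to the full application records.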
Once the new scored DataFrame is created, we can use the data connector to write the data back out to Cognos Analytics for further investigation and data visualization.
In this example, we are writing it out to a file named german_credit_new_apps_scored.csv.
Once complete, you should see the file in the data folder of your Cognos Analytics instance.
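For reference, the scored file is plain CSV. The sketch below shows what its contents might look like, assuming it holds just the CustomerID and PredictedRisk columns. It serializes to an in-memory string with the standard csv module and does not use the Cognos Analytics data connector; to_csv_text and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List


def to_csv_text(rows: List[Dict[str, object]], fieldnames: List[str]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up scored rows.
scored_rows = [
    {"CustomerID": 101, "PredictedRisk": "No Risk"},
    {"CustomerID": 102, "PredictedRisk": "Risk"},
]
csv_lines = to_csv_text(scored_rows, ["CustomerID", "PredictedRisk"]).splitlines()
print(csv_lines)
```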
Log into your Cognos Analytics instance and navigate to your data directory that contains the German credit risk data, including the scored data we generated in the previous step.
From the Cognos Analytics main dashboard, select the + icon in the lower left corner and select Data module.
From the file selection dialog, select the original German credit risk data we used to create the machine learning model, and the two files related to new credit applications. In this example, the file names are:
- german_credit_model_data.csv - the original data set used to create our scoring model.
- german_credit_new_apps_data.csv - new credit applications that are to be scored.
- german_credit_new_apps_scored.csv - new credit application scores.
Click OK.
From the Data module panel, select the Relationship tab. We need to create a link between the german_credit_new_apps_data.csv and german_credit_new_apps_scored.csv files. The new applications file contains all of the data used to determine risk, and the scored file just contains the result.
Right click on either file icon and select the Relationship... option.
From the Relationship dialog, select the other file as Table 2. Then select the field CustomerID in both tables.
Click Match Selected Columns, then click OK.
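Conceptually, matching the two files on CustomerID lets each application row be combined with its score. The sketch below illustrates that idea as a simple inner join in plain Python. It is an assumption-laden illustration, not a description of how Cognos Analytics implements relationships. The join_on helper and sample rows are hypothetical.

```python
from typing import Dict, List


def join_on(left: List[Dict], right: List[Dict], key: str) -> List[Dict]:
    """Combine rows from both tables that share the same value for `key`."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


# Made-up rows standing in for the two CSV files.
applications = [
    {"CustomerID": 101, "LoanAmount": 2500, "LoanDuration": 12},
    {"CustomerID": 102, "LoanAmount": 6000, "LoanDuration": 36},
    {"CustomerID": 103, "LoanAmount": 800, "LoanDuration": 6},
]
scores = [
    {"CustomerID": 101, "PredictedRisk": "No Risk"},
    {"CustomerID": 102, "PredictedRisk": "Risk"},
]
joined = join_on(applications, scores, "CustomerID")
print(joined)
```

With the link in place, a visualization can use fields such as LoanAmount from one file alongside PredictedRisk from the other.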
Click the Save icon in the top menu to name and save the Data module.
From the Cognos Analytics main dashboard, select the + icon in the lower left corner and select Dashboard. Accept the default template and click OK.
Click Select a source to bring up the selection dialog. Select the Data module you just created in the previous step, and click OK.
You should then see a blank canvas to create your dashboard.
From the image above:
- [1] The data module currently associated with the dashboard.
- [2] The resources included in the data module.
- [3] The dashboard canvas.
- [4] The tabs defined for the dashboard.
To create your dashboard, you will need to become familiar with the numerous tools available from icons and pop-up menus.
From the image above:
- [1] Toggles you between edit and preview mode.
- [2] Toggles display of the resources (data objects) in the data module.
- [3] An example of one of many drop-down menus associated with data objects.
- [4] Displays the relationship between all of the visual objects on your dashboard. Objects with the same number are related.
- [5] Toggles full-screen mode on and off.
- [6] Toggles display of the filter panels.
- [7] Displays the fields associated with the selected visual object.
- [8] Displays the properties associated with the selected visual object.
- [9] Filters that can be applied to dashboard visual objects. The filter can be set for all dashboard tabs (left side), or for the current tab (right side).
The types of visualizations available include the following:
Our first visualization will be a spiral, which ranks how important each of the drivers is in determining credit risk.
Select the PredictedRisk field in the german_credit_risk_new_apps_scored file in the data source list and drag it onto the canvas.
The toolbar at the top of the window is active for the currently selected visualization. For convenience, you can click on Undock toolbar to have the toolbar snap next to the selected visualization.
Click on the anchor icon to bring up the toolbar for the visualization. Then click on the Change visualization tool. In this particular case, the default visualization chosen for the data type is a table. We need to change this to a spiral.
From the pop-up menu, click All visualizations to open up the list of available visualizations. Select spiral.
From the visualization toolbar, click on the Edit the title icon, and set the title to Owns Property and Existing Savings is the best predictor of credit risk.
Use the box sizing tools to position the box in the upper left-hand corner of the dashboard.
Use the Expand/Collapse button in the upper right-hand corner of your visualization to view it in expanded mode or to collapse the view in your dashboard canvas.
As you can see, the spiral visualization ranks the drivers that influence the target field - PredictedRisk.
NOTE: Use the Fields tab to change what the visualization is based on, and use the Properties tab to modify the look and feel of the visualization.
Next we will show what effect Loan Amount and Loan Duration have on credit risk.
Click the Visualizations icon in the left navigation bar, select the Line graph icon, and drag it onto the canvas.
With the new visualization selected, click on the Fields tab.
Going back to our resource list, from the german_credit_risk_new_apps_data file, drag LoanAmount to the x-axis field, and LoanDuration to the y-axis field.
From the german_credit_risk_new_apps_scored file, drag PredictedRisk onto the visualization, which should assign it to the Color field.
If desired, you can use the Change color palette option under the Properties tab to change the line colors.
As the graph indicates, the higher the loan amount and the longer the length of the loan, the higher the risk.
Change the title of the visualization to Loan amount and duration vs predicted risk.
Our last visualization will be to show the associated risk factors related to the original German credit dataset that had an assigned risk value for each applicant.
Select the Risk field in the german_credit_risk_model_data file in the data source list and drag it onto the canvas.
Change the visualization to a Decision tree.
As you can see, all possible Employment values are expanded to show risk when combined with other factors.
Change the title of the visualization to Historic risk drivers.
Here is what the final version of the dashboard should look like:
Now that you understand Data Modules and Dashboards, feel free to explore new visualizations using the German credit risk data.
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.