This example processes Lending Club data with VoltDB and H2O. The goal is to show how Machine Learning models can be operationalized in VoltDB.
The primary entities here are Loans and Funds.
Loan - Each new loan (generated by the client or ingested from Kafka) has more than 100 attributes, including (a) the loan amount, (b) the credit rating of the borrower, and (c) the percentage of the loan amount that has already been funded. Machine Learning is used to predict the risk associated with each loan.
Fund - A Fund is an amount of money that is available to be invested in loans according to a risk profile. Each fund records its total amount and how that amount should be divided among the different risk levels.
The goal of the application is to match each loan with the best fund to invest in it, based on the loan's calculated risk and the fund's tolerance for risk.
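The matching idea can be sketched in plain Java. This is a minimal illustration, not the application's procedure code: the `Fund` class, its fields, the percentage-per-risk-level layout, and the rule that "best" means the fund with the most money still available at the loan's risk level are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory model of a fund; not the application's actual schema.
final class Fund {
    final String name;
    final long totalAmount;
    // Assumed layout: percentByRisk[r] is the share of totalAmount earmarked for risk level r.
    private final int[] percentByRisk;
    // Amount already invested at each risk level.
    private final long[] investedByRisk;

    Fund(String name, long totalAmount, int[] percentByRisk) {
        this.name = name;
        this.totalAmount = totalAmount;
        this.percentByRisk = percentByRisk.clone();
        this.investedByRisk = new long[percentByRisk.length];
    }

    // Money still available for loans at the given risk level (0 if the level is unknown).
    long available(int riskLevel) {
        if (riskLevel < 0 || riskLevel >= percentByRisk.length) {
            return 0L;
        }
        long budget = totalAmount * percentByRisk[riskLevel] / 100;
        return budget - investedByRisk[riskLevel];
    }

    void invest(int riskLevel, long amount) {
        investedByRisk[riskLevel] += amount;
    }
}

final class FundMatchingSketch {
    // Assumed rule: among funds that can cover the loan at its risk level,
    // pick the one with the most money still available at that level.
    static Optional<Fund> bestFund(List<Fund> funds, int riskLevel, long loanAmount) {
        Fund best = null;
        for (Fund fund : funds) {
            long available = fund.available(riskLevel);
            if (available >= loanAmount
                    && (best == null || available > best.available(riskLevel))) {
                best = fund;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<Fund> funds = Arrays.asList(
                new Fund("conservative", 100_000L, new int[] {50, 30, 20}),
                new Fund("aggressive", 200_000L, new int[] {10, 40, 50}));
        Optional<Fund> match = bestFund(funds, 2, 30_000L);
        System.out.println(match.map(f -> f.name).orElse("no fund available"));
    }
}
```

A real matching rule could weigh other factors; the sketch only shows how a fund's per-risk-level allocation constrains which loans it can take.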
The primary performance metric of this application is the number of new loans accepted per second. To provide maximum scalability for ingestion, the insert workload is partitioned on the unique id of the loan. Since the funds are not updated frequently and do not need a lot of memory, the Funds table is replicated across the database. Since the risk factor is what brings the loans and the funds together, the relationship table that maps the funds to the loans that they are invested in is partitioned by the risk factor.
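To see why the choice of partitioning column matters, the following sketch routes one loan by two different keys. The `partitionFor` function is a simplified stand-in (a plain hash modulo the partition count) and is not VoltDB's actual hashing scheme; the partition count, loan id, and risk value are made up for the example.

```java
final class PartitionRoutingSketch {
    // Simplified stand-in for hash partitioning; VoltDB's real scheme differs.
    static int partitionFor(long key, int partitionCount) {
        return Math.floorMod(Long.hashCode(key), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 4;      // hypothetical cluster size
        long loanId = 1077501L;  // hypothetical loan id
        int risk = 2;            // hypothetical calculated risk level

        // The insert of the loan is routed by its unique id ...
        System.out.println("loan insert  -> partition " + partitionFor(loanId, partitions));
        // ... while the fund-to-loan mapping is routed by the risk factor.
        System.out.println("fund mapping -> partition " + partitionFor(risk, partitions));
    }
}
```

Because the two keys generally land on different partitions, work keyed by loan id and work keyed by risk cannot both be done in the same single-partition transaction.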
Since the relevant tables are partitioned by different columns, the application divides the processing of each new loan into two transactions. One transaction accepts the new loan, calculates the risk, and inserts it into the NEW_LOAN table. The other transaction matches the loan with the appropriate fund based on the calculated risk. The application uses VoltDB's loopback exporter for the first transaction to call the second transaction.
- Each new loan is passed to the NewLoan procedure, which calculates the risk and saves the loan to the NEW_LOANS table.
- Some of the new loan's data is also written to the LOAN_BY_RISK stream, which is exported back into VoltDB through the loopback exporter.
- The AllocateLoan procedure is called for each new loan written to the LOAN_BY_RISK stream, and this is where the funding decision for the loan is made.
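The flow in the steps above can be simulated in plain Java. This sketch uses no VoltDB or H2O APIs: an in-memory queue stands in for the LOAN_BY_RISK stream, `scoreRisk` is a placeholder for the Machine Learning model, and the `MAX_FUNDED_RISK` threshold and the "FUNDED"/"DECLINED" outcomes are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

final class LoanPipelineSketch {
    // Hypothetical threshold: loans riskier than this are declined in this sketch.
    static final int MAX_FUNDED_RISK = 3;

    // Stand-in for the table of new loans: loan id -> calculated risk.
    final Map<Long, Integer> riskByLoanId = new HashMap<>();
    // Stand-in for the LOAN_BY_RISK stream: each record is {loanId, risk}.
    final Queue<long[]> loanByRiskStream = new ArrayDeque<>();
    // Decisions made by the second step: loan id -> outcome.
    final Map<Long, String> decisionByLoanId = new HashMap<>();

    // Placeholder for the model: maps a credit grade 'A'..'G' to a risk level 0..6.
    static int scoreRisk(char creditGrade) {
        return Math.max(0, Math.min(6, creditGrade - 'A'));
    }

    // First transaction: score the loan, store it, and emit a record to the stream.
    void newLoan(long loanId, char creditGrade) {
        int risk = scoreRisk(creditGrade);
        riskByLoanId.put(loanId, risk);
        loanByRiskStream.add(new long[] {loanId, risk});
    }

    // Second transaction: decide on the loan using only the data carried by the stream record.
    void allocateLoan(long loanId, int risk) {
        decisionByLoanId.put(loanId, risk <= MAX_FUNDED_RISK ? "FUNDED" : "DECLINED");
    }

    // Stand-in for the loopback export: feed every stream record to the second step.
    // In the real system this happens asynchronously, outside the first transaction.
    void drainStream() {
        long[] record;
        while ((record = loanByRiskStream.poll()) != null) {
            allocateLoan(record[0], (int) record[1]);
        }
    }

    public static void main(String[] args) {
        LoanPipelineSketch pipeline = new LoanPipelineSketch();
        pipeline.newLoan(1L, 'B');
        pipeline.newLoan(2L, 'F');
        pipeline.drainStream();
        System.out.println(pipeline.decisionByLoanId);
    }
}
```

The point of the split is that the first step only needs the loan id as its key, while the second step only needs the risk value carried in the stream record.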