-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathAnnotatedBibliography.Rmd
135 lines (82 loc) · 8.1 KB
/
AnnotatedBibliography.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
---
title: 'Improving P2P Lending Returns: An Annotated Bibliography'
author:
- affiliation: Independent
email: kuhnrl30@gmail.com
location: 'Basking Ridge, NJ 07920'
name: Ryan Kuhn
date: "`r format(Sys.Date(),'%B %d, %Y')`"
output:
pdf_document: default
pdf_docuement:
includes:
in_header: latexOps.tex
number_sections: yes
html_document:
number_sections: yes
keywords:
- P2P Lending
- Marketplace lending
- Lending Club
csl: journal-of-finance-annotated-bib.csl
spacing: double
bibliography: bibliographies.bib
---
## Introduction
Peer-to-peer lending connects borrowers with investors through an online marketplace. Loans offered through the marketplace are made possible by investors, who invest in exchange for solid returns.
Current models suggest that investor profitability can be improved with the application of high quality data analysis. The existing body of research generally focuses on assigning a probability of default to minimize adverse selection rather than optimizing for investor profitability. To encourage the consolidation of knowledge, this bibliography is a source of research and articles from around the web that could be used to understand the current state of analysis.
This bibliography classifies articles into one of three categories based on their source and intended audience: academic research, industry analysis, and models 'in the wild'. Industry analysis are generally published by a thought leadership organization or service provider to explain a narrowly focused insight by reviewing relationships between specific variables. Models use by individual investors may methodically review loan attributes to propose further testing approaches or they may develop an investment strategy by applying an algorithm.
## Academic Research
[@litAlaraj]
[@litCipoeru]
[@litEmekter]
[@litGe]
[@litGuo]
[@litNandi]
[@litReddy]
[@litSerrano]
This article proposes a decision support system for choosing loans optimized for profitability. This is in contrast to previous literature which focused on modeling default probabilities without considering the increased interest rates to compensate the investor's risk. The authors proposed using internal rate of return, or IRR, to measure of a loan's profitability. IRR was chosen because it is able to handle uneven payments at non-uniform intervals. They created several models to predict a loan's IRR.
[@litZhou]
The article explores the distribution of Loss Given Default (LGD) and uses multi-linear regression to understand the drivers. The paper stops short of suggesting an application of the knowledge to pricing and valuation decisions.
## Industry Analysis
Orchard
Mon Ja
Lending Times
Lend Academy
Lending Robot
NSR/ Peer Cube
Peer Lending Server
[@mdlSocialLender]
[@indLendAcad1]
[@indWu1]
## Modeling "In the Wild"
[@mdlCashorali]
This geographical analysis of loans prior to Q3 2011 showed loans in California and Florida defaulted more often whereas Texas, Pennsylvania, and New Jersey had comparably low default rates.
[@mdlDarre]
An analysis exploring the relationship between many of the loan attributes and the historical default experience. The author puts forth the analysis as exploratory rather than seeking to answer specific questions. Hence, their are only observations and no conclusions. The observations include: Default rates decrease with credit history age; higher credit utilization leads to greater risk of default; loan grades are progressively less correlated with the FICO score, presumably, as LC's rating algorithm improved.
[@mdlDavenport]
An analysis to understand which attributes are driving Lending Club's interest rates. It includes a brief summary of the components of the FICO score, such as length of credit history and payment history, and how that may be a starting point for identifying confounding variables. The discussion suggests finding local optima of credit risk to interest rate given that LC uses flat interest rates for loan sub grades. The article does not reflect the subsequent changes in LC's scoring model since 2013.
[@mdlDavenport2]
An expansion on the author's earlier analysis but with discussion around challenges in improving the model. One such challenge is overcoming the numerous unique employment titles to create meaningful groupings.
[@mdlDavis]
This analysis is proposes a simple valuation method by weighting the interest rate by the probability of defaulting to arrive at the expected interest rate. The author used this valuation method along with a logistic regression model built to predict the default risk to show that returns can be significantly improved for portfolios with less than 400 loans.
[@mdlKaggle]
[@litKeough]
An article exploring the concept that fraudulently reported income increases default risk since the borrower was granted more credit than their ability to repay warranted. The authors hypothesize that borrowers with verified income will default less than non-verified borrowers but their limited testing leads to the opposite conclusion. The superficial analysis did not identify nor control for confounding variables leaving the conclusion unreliable.
[@mdlMistry]
An insightful analysis using 3 unique methods: modeling profitability with survival analysis, measuring significance with concordance instead of p-values, and measuring profit as the ratio of cash in(out) flows. Survival analysis can measure the probability that the cash inflows, or payment received, will reach a given payback threshold. Concordance analysis is more robust to big data and prevents statistically significant variables that don't improve the model.
[@mdlMoy]
This blog post introduces a simple method for quantifying the expected investment return using the results of the loan classification algorithm. The model assumes you will only invest in loans predicted not to default. Of those loans, use the probability of the loan still defaulting (Type II) to weight the gain or loss for each state.
$[P(Pred= "No Default" | No Default) * Gain]- [P(Pred= "No Default" | Default ) * Loss] * P(Pred = "No Default")$
[@mdlOrourke]
A blog post which quickly moves through a cursory exploratory analysis and on to applying machine learning algorithms. O'Rourke applies two algorithms to Lending Club's historical loan data from 2007-2011: Decision Tree and Logistic Regression. The author did not develop a hypothesis to test nor contributed new insights to the body of knowledge. The article is a common form of analysis meant to demonstrate the author's ability to apply a model to a dataset (a necessary step in the learning process).
[@mdlPatierno]
Using a genetic algorithm, the author proposes a rules based approach for selecting a mix of loans which optimizes the cash inflows.
[@mdlPolena]
An article demonstrating how to use Watson Analytics on Lending Club data. The cursory analysis showed that employment length was negatively correlated with default probability. Meaning that over 6 or more years of employment defaulted more than less than 5 years.
[@mdlSummers]
This article takes the first steps in building a model to optimize profitability but falls short by not including lost principle in returns calculation. The 'loss' of a defaulted loan is the opportunity cost of lost interest and understates the downside risk for a given loan.
[@mdlToth]
An exploratory analysis of loan attributes and their connection to the loan's outcome. Toth provides a step-by-step write up of how he analyzed data from 2012-2013. The article does not include a summary nor a discussion of the results. An interesting method was to group the loan purposes into three categories: 1) debt, 2) major purposes such as home improvement, and 3) luxury purchase such as vacations and weddings.
[@mdlTsai]
This master's thesis starts from the goal of not losing money and optimizes for precision, or identifying all defaulted loans at the expense of misidentifying performing loans. This is accomplished by introducing a penalty function which decreases overall accuracy but increases the precision. The authors apply their model to identify loans with interest rates set higher then the risk profile warrants.