An archive of historical mutual fund pricing information in India. This repository contains the raw data as well as the scripts used to generate it.
The data in the data/
directory is in the same format as AMFI exports it, without any changes. This notably includes a few errors:
- Invalid ISINs (such as
NOTAPP
orNA
for "Not Applied"), and a few with invalid prefixes (IINF
instead ofINF
) or lowercase ISINs. - Invalid NAV. (
"#N/A",'#DIV/0!','N.A.', 'NA', 'B.C.', 'B. C.'
)
You can get the latest dataset at https://github.com/captn3m0/historical-mf-data/releases/latest/funds.db.zst. Each dataset includes all historical NAVs at all known times for a given mutual fund. See below for the data format.
The dataset does not include search indexes to reduce the download size. Please run the following commands to setup:
wget https://github.com/captn3m0/historical-mf-data/releases/latest/download/funds.db.zst
unzstd funds.db.zst
# Create search indexes
# Main Index to get NAV by date and scheme_code
echo 'CREATE INDEX "nav-main" ON "nav" ("date","scheme_code")' | sqlite3 funds.db
# Index by scheme code separately to get NAV for all dates
echo 'CREATE INDEX "nav-scheme" ON "nav" ("scheme_code")' | sqlite3 funds.db
# Index all securities by scheme_code for joins with NAV table
echo 'CREATE INDEX "securities-scheme" ON "securities" ("scheme_code")' | sqlite3 funds.db
# Index all securities by isin for metadata information
echo 'CREATE INDEX "securities-isin" ON "securities" ("isin")' | sqlite3 funds.db
The versioning scheme follows SemVer, with the date being used for the minor and patch version in a MAJOR.MINOR.YYYYMMDD
format. This results in the date being clearly provided in the version number.
- The Major number is currently 0, to denote alpha release status. It will be bumped to 1 once the database schema is stable.
- Minor releases will be bumped on non-breaking changes to the schema - such as new fields, or indexes being added, or deprecation announcements.
- Major version will be bumped only on breaking changes.
- In departure from SemVer, minor and patch versions might not get reset to
0
on major and minor upgrades respectively. - There's no guarantee about Pricing Information from release date being present in the dataset.
The output dataset is a SQLite Database, with the following schema:
scheme_code INTEGER PRIMARY_KEY
scheme_name TEXT
date
scheme_code INTEGER
nav FLOAT
FOREIGN KEY (scheme_code) REFERENCES schemes(scheme_code));
isin TEXT UNIQUE
--- 0=Growth/Divident Payout
--- 1=Divident Reinvestment
type INTEGER
scheme_code INTEGER
FOREIGN KEY (scheme_code) REFERENCES schemes(scheme_code));
Helper view to directly query NAV against the ISIN.
isin TEXT
date
nav FLOAT
Since Data is not always available on all dates, you need to get the latest value before or on that date:
SELECT date,nav from nav_by_isin
WHERE isin='INF277K01741'
AND date<='2023-03-23'
ORDER BY date DESC
LIMIT 0,1
SELECT nav from nav_by_isin
WHERE isin='INF277K01741'
ORDER BY date DESC
LIMIT 0,1
SELECT date,nav from nav_by_isin
WHERE isin='INF277K01741'
ORDER BY date DESC
LIMIT 0,90
SELECT isin,type,S1.scheme_code,scheme_name FROM securities S1
LEFT JOIN schemes S2 ON S1.scheme_code = S2.scheme_code
SELECT isin,type,S1.scheme_code,scheme_name FROM securities S1
LEFT JOIN schemes S2 ON S1.scheme_code = S2.scheme_code
WHERE isin='INF277K01741'
Licensed under the MIT License. See LICENSE file for details.