diff --git a/README.md b/README.md
index 22056d1..3951338 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,18 @@
 # Introduction to the company_dns
 To enable a more automated approach to gathering information about companies `company_dns` was created. This release enables the synthesis of data from the [SEC EDGAR repository](https://www.sec.gov/edgar/searchedgar/companysearch.html) and [Wikipedia](https://wikipedia.org). A [Medium](https://medium.com) article entitled "[A case for API based open company firmographics](https://medium.com/@michaelhay_90395/a-case-for-api-based-open-company-firmographics-145e4baf121b)" is available discussing the process and motivation behind the creation of this service.
+# Introducing V3.0.0
+The V3.0.0 release of the `company_dns` is a significant update to the service. The primary changes are:
+1. Shift from Flask to Starlette with Uvicorn
+2. Automated monthly container builds, from the main branch of the repository, using GitHub Actions
+3. Simplification of all aspects of the service including code structure, shift towards simpler Docker, and a more streamlined service control script
+4. Vastly improved embedded help with a query console to test queries
+We were motivated to make these changes to the service making it easier to improve, maintain and use.
+
+
 # Installation & Setup
 The follwing basic steps are provided for the purposes of getting the tool running.
-## Get the code
+## For developers Get the code
 Assming you have setup access to GitHub, you'll need to clone the repository. Here we assume you're on a Linux box of some kind and will follow the steps below.
 1. If you're performing development create a directory that will contain the code: `mkdir ~/dev`
@@ -15,9 +24,6 @@ Before you get started it is important to install all prequisites and then creat
 1. Enter the directory with the service bits (assuming you're using ~/dev): `cd ~/dev/company_dns/company_dns`
 2. Install all prerequsites: `pip3 install -r ./requirements.txt`
-3. Change the `USER_AGENT` setting in `~/dev/company_dns/company_dns/app/pyedgar.conf` to your own user agent definition. If you don't the SEC downloads will fail.
-
-The utility `dbcontrol.py` will download EDGAR data, process it, and then create a database for the `company_dns`. Note that you do not need to directly run this utility as the service control script will handle it for you. For more information on the database control utility please checkout the [readme](company_dns/app/README.md) for it.
 ## Service Control Script
 A service control script, `svc_ctl.sh` is provided to wrap build, run, and log tailing functions as of V2.3.0. Compared to past versions this script significantly simplifies working with the `company_dns` removing many manual steps to getting it running. As a result there is only one step needed to get the service running `cd ~dev/company_dns;svc_ctl.sh up`. This script will:
@@ -36,15 +42,11 @@ DESCRIPTION:
 Control functions to run the company_dns
 COMMANDS:
- help up down start stop create_db build delete_db foreground tail
+ help start stop build foreground tail
 help - call up this help text
- up - bring up the service including building and pulling the docker image
- down - bring down the service and remove the docker image
 start - start the service using docker-compose
 stop - stop the docker service
- create_db - create a new database cache for the company_dns
- delete_db - delete the database cache for the company_dns
 build - build the docker images for the server
 foreground - run the server in the foreground to watch for output
 tail - tail the logs for a server running in the background
@@ -53,7 +55,6 @@ COMMANDS:
 ## Verify that the service is working
 Regardless of the approach you've taken to run the `company_dns` checking to see if it is operating is important. Therefore you can point a browser to the server running the service. If you're running on localhost then the following link should work [http://localhost:6868/V2.0/help](http://localhost:6868/V2.0/help) however if you're on another server then you'll need to change the server name to the one you're using. If this is successful you will be able to see the embedded help which describes the available set of endpoints, and provides and example query to the service. A screenshot of the help screen can be found below.
-![Screen Shot 2022-10-16 at 8 18 57 PM](https://user-images.githubusercontent.com/10818650/196084425-6fd9d724-1f59-4eed-9548-c553168bf387.png)
 ## Checkout a live system
 We're hosting an instance of the `company_dns` on our website for our usage and your exploration. Below are several example queries and access to embedded help to get you a better view of the system.
@@ -71,20 +72,14 @@ We try to keep high level Todos and Improvements in a list contained in a sectio
 ### Future work/Todos
 Here are the things that are likely to be worked but without any strict deadline:
-1. ~~Create a simple wrapping script to operationalize service behaviors~~ [see issue #4](https://github.com/miha42-github/company_dns/issues/4)
-2. ~~Incrementally refactor the repository and the code~~
-3. ~~Enable TLS on nginx or provide instructions to do so~~, [see issue #10](https://github.com/miha42-github/company_dns/issues/10)
 4. Determine if feasible to talk to the companies house API for gathering data from the UK
 5. Research other pools of public data which can serve to enrich
 6. Evaluate if financial data can be added from EDGAR, Wikipedia and Companies House
-7. ~~Clean up stale EDGAR URLs~~
 8. Provide instructions/details for running on a Pi or Arm based system, see Lagniappe below
-9. ~~Update README.md with the appropriate language, etc.~~, [see issue #9](https://github.com/miha42-github/company_dns/issues/9)
-10. ~~Add additional URLs for news, stock, patents, etc. to the merged response~~, [see issue #11](https://github.com/miha42-github/company_dns/issues/11)
-11. ~~Add ticker information from Wikipedia into the response~~, [see issue #7](https://github.com/miha42-github/company_dns/issues/7)
+
 ### The Lagniappe
-If you would like to run this on a RasberryPi I'll be adding a couple of configuration files and appropriate instructions later, but until then I suggest you check out [Matt's](https://www.raspberrypi-spy.co.uk/author/matt/) guide to [getting Nginx, UWSGI and Flask running on a Pi](https://www.raspberrypi-spy.co.uk/2018/12/running-flask-under-nginx-raspberry-pi/). At some point if someone would like to create a docker image for these elements running on the Pi that would be great.
+Run on a RasberryPi: To be reauthored
 # License
@@ -93,8 +88,7 @@ Since this code falls under a liberal Apache-V2 license it is provided as is, wi
 # Key Dependencies
 - [PyEdgar](https://github.com/gaulinmp/pyedgar) - used to interface with the SEC's EDGAR repository
 - [SQLite](https://www.sqlite.org/index.html) - helps all utilities and the RESTful service quickly and expressively respond to interactions with the other elements to find appropriate company data
-- [Flask](https://www.palletsprojects.com/p/flask/) and associated utilities - used to realize the RESTful service
-- [nginx](http://nginx.org) - enables hosting of the RESTful service
-- Docker & Docker Compose - Container and server framework
+- [Starlette](https://www.starlette.io) - used to create the RESTful service
+- [Uvicorn](https://www.uvicorn.org) - used to run the RESTful service
 - [GeoPy with ArcGIS](https://github.com/geopy/geopy) - Enables proper address formatting and reporting of lat-long pairs for companies
 - [wptools](https://github.com/siznax/wptools/) - provides access to MediaWiki data for company search
diff --git a/pyedgar.conf b/pyedgar.conf
index 6566874..e231744 100644
--- a/pyedgar.conf
+++ b/pyedgar.conf
@@ -56,7 +56,7 @@ INDEX_CACHE_PATH_FORMAT=full_index_{year}_Q{quarter}.gz
 KEEP_ALL=True
 KEEP_REGEX=
 ; User Agent for downloading, to keep the SEC happy
-USER_AGENT=Mediumroast, Inc. hello@mediumroast.io
+USER_AGENT=company_dns hello@mediumroast.io
 [Index]
 ; Index file settings
 INDEX_DELIMITER=\t
diff --git a/svc_ctl.sh b/svc_ctl.sh
index 997871a..a21ed3a 100755
--- a/svc_ctl.sh
+++ b/svc_ctl.sh
@@ -82,26 +82,6 @@ function bring_down_server () {
 print_footer $FUNC
 }
-function bring_up_server () {
- FUNC="Bring up service"
- STEP="bring_up_server"
- print_header $FUNC
-
- print_step "Create cache db"
- create_db
-
- print_step "Build docker images"
- docker-compose build
-
- print_step "Pull docker images"
- docker-compose pull
-
- print_step "Bring up ${SERVICE}"
- docker-compose up -d
-
- print_footer $FUNC
-}
-
 function stop_server () {
 FUNC="Stop ${SERVICE}"
 STEP="stop_server"
@@ -152,10 +132,6 @@ function tail_backend () {
 ###################################
-function create_db () {
- python3 ./makedb.py
-}
-
 function print_help () {
 clear
 echo "NAME:"
@@ -165,14 +141,11 @@ function print_help () {
 echo " Control functions to run the ${SERVICE}"
 echo ""
 echo "COMMANDS:"
- echo " help up down start stop create_db build delete_db foreground tail"
+ echo " help start stop build foreground tail"
 echo ""
 echo " help - call up this help text"
- echo " up - bring up the service including building and pulling the docker image"
- echo " down - bring down the service and remove the docker image"
 echo " start - start the service using docker-compose "
 echo " stop - stop the docker service"
- echo " create_db - create a new database cache for the ${SERVICE}"
 echo " build - build the docker images for the server"
 echo " foreground - run the server in the foreground to watch for output"
 echo " tail - tail the logs for a server running in the background"
@@ -190,13 +163,6 @@ function print_help () {
 if [ ! $1 ] || [ $1 == "help" ]; then
 print_help
-elif [ $1 == "up" ]; then
- create_db
- bring_up_server
-
-elif [ $1 == "down" ]; then
- bring_down_server
-
 elif [ $1 == "start" ]; then
 start_server
@@ -204,7 +170,6 @@ elif [ $1 == "stop" ]; then
 stop_server
 elif [ $1 == "build" ]; then
- create_db
 build_server
 elif [ $1 == "foreground" ]; then
@@ -213,9 +178,6 @@ elif [ $1 == "foreground" ]; then
 elif [ $1 == "tail" ]; then
 tail_backend
-elif [ $1 == "create_db" ]; then
- create_db
-
 fi
 exit 0
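
After this change the dispatcher at the bottom of `svc_ctl.sh` routes six commands: `help`, `start`, `stop`, `build`, `foreground`, and `tail`. The sketch below is not part of the patch; it shows one way the same routing might be written as a `case` statement with `$1` quoted. The handler bodies are stubs that only echo their names, `dispatch` and `run_foreground` are hypothetical names (the diff does not show the foreground handler), and returning a non-zero status for an unknown command is an assumption, since the script shown ends with an unconditional `exit 0`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: stub handlers stand in for the real svc_ctl.sh functions.
start_server()   { echo "start_server"; }
stop_server()    { echo "stop_server"; }
build_server()   { echo "build_server"; }
run_foreground() { echo "run_foreground"; }   # assumed name, not shown in the diff
tail_backend()   { echo "tail_backend"; }
print_help()     { echo "COMMANDS: help start stop build foreground tail"; }

# Route one command name to its handler.
# An empty or missing argument falls back to help, as in the original if/elif chain.
dispatch() {
    case "${1:-help}" in
        help)       print_help ;;
        start)      start_server ;;
        stop)       stop_server ;;
        build)      build_server ;;
        foreground) run_foreground ;;
        tail)       tail_backend ;;
        *)          # Assumption: report unknown commands instead of exiting silently.
                    echo "unknown command: $1" >&2
                    print_help
                    return 1 ;;
    esac
}
```

A script using this sketch would end with `dispatch "$@"`, so that `svc_ctl.sh start` reaches `start_server` and a removed command such as `up` prints the help text and returns a failure status.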