Show HN: Database for analyzing US companies, visualized using Apache Superset
9 by tessbi | 1 comment on Hacker News.
My main motivation was that I wanted to drill down and filter across all the available stocks, look at the data for myself, narrow the list down to the stocks I am interested in based on my own criteria, and make data-driven decisions for my personal investment strategies.

I used PostgreSQL as the backend database for the ELT data pipelines, and Citus Data's cstore_fdw for columnar compression of the final dataset. All financial data comes from SEC EDGAR, https://ift.tt/Kc5gUDG . I used Python to download most of the data. I also do the data load development locally on a home Ubuntu server that I built 5 years ago. I bought 4TB of M.2 disks for the best database performance, with a PRIME B360M-A motherboard and an Intel Coffee Lake-S chip.

I built the website with WordPress, and I run Apache Superset on gunicorn behind an Apache web server reverse proxy. I had to build the registration form myself with PHP and some JavaScript, because it needed to create a Superset user automatically upon registration; otherwise I would have to add every user manually. I used Python again for the data integration.

Please don't use the database directly as an investment tool: it's in beta, and the data still needs to undergo heavy data quality checks. Please confirm all the numbers yourself; I provide a link to the SEC filings for every company.
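For readers curious about the download step mentioned above, here is a minimal Python sketch of pulling one company's financial facts from SEC EDGAR and flattening them into rows that could be bulk-loaded into a PostgreSQL staging table. The post does not say which EDGAR endpoints or files were actually used, so the public "company facts" JSON endpoint is an assumption here, and the function names (`company_facts_url`, `fetch_company_facts`, `flatten_facts`) and the row layout are hypothetical, chosen only for illustration.

```python
import json
import urllib.request

# Assumption: EDGAR's public "company facts" JSON endpoint. The original post
# does not state which EDGAR endpoints or files were downloaded.
EDGAR_FACTS_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def company_facts_url(cik: int) -> str:
    """Build the URL for one company; the CIK is zero-padded to 10 digits."""
    return EDGAR_FACTS_URL.format(cik=cik)


def fetch_company_facts(cik: int, user_agent: str) -> dict:
    """Download and parse the facts JSON for one company.

    The SEC asks automated clients to identify themselves, so a descriptive
    User-Agent (for example a name and contact email) is sent with the request.
    """
    request = urllib.request.Request(
        company_facts_url(cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def flatten_facts(facts: dict, taxonomy: str = "us-gaap") -> list:
    """Turn the nested facts JSON into flat tuples for a staging table.

    Each tuple is (concept, unit, period_end, value, form). This row layout is
    a hypothetical choice, not the schema used by the original project.
    """
    rows = []
    concepts = facts.get("facts", {}).get(taxonomy, {})
    for concept, body in concepts.items():
        for unit, points in body.get("units", {}).items():
            for point in points:
                rows.append(
                    (concept, unit, point.get("end"), point.get("val"), point.get("form"))
                )
    return rows
```

Under these assumptions, `flatten_facts(fetch_company_facts(cik, "Your Name you@example.com"))` would return a list of tuples for one company, which could then be written to PostgreSQL and transformed there, in keeping with the ELT approach described in the post.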