Figure out where to get 'fake data' #15
Comments
Not sure whether this raw data exists, but I think you could use faker to generate all the data you need in a predictable way (seeding).
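The seeding idea can be illustrated without any third-party dependency. The sketch below uses Python's standard `random.Random` rather than faker itself, and the `fake_entries` helper and its field names are made up for illustration; the point is only that a fixed seed gives the same fake data on every run.

```python
import random
from datetime import datetime, timedelta
from typing import Dict, List


def fake_entries(seed: int, count: int = 5) -> List[Dict[str, object]]:
    """Generate `count` fake activity entries; the same seed yields the same entries."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    moment = datetime(2020, 1, 1)
    activities = ["browsing", "coding", "email", "reading"]  # hypothetical categories
    entries: List[Dict[str, object]] = []
    for _ in range(count):
        # Always move forward in time so the timestamps are strictly increasing.
        moment += timedelta(minutes=rng.randint(1, 120))
        entries.append({
            "timestamp": moment.isoformat(),
            "activity": rng.choice(activities),
            "duration_s": rng.randint(30, 3600),
        })
    return entries
```

Calling `fake_entries(42)` twice returns identical lists, which is what makes tests and demos built on top of it reproducible.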
@felubra very nice, thanks! Ideally it would be good to get my hands on real data (I might just make some of mine public), but that's super helpful too.
Briefly tried it. There is also mimesis, which claims to be faster. I guess the general problem is that random data doesn't quite work for the demos, because real data has some sort of 'narrative' and causal structure. But it's certainly useful to generate lots of it and then filter out datapoints until what remains starts making some causal sense.
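One way to read "generate lots, then filter" is sketched below, using only the standard library. Everything here is an assumption for illustration: the `Entry` type, the activity names, and the particular rule (a person cannot be doing two things at once, so overlapping entries are dropped).

```python
import random
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Entry(NamedTuple):
    start: datetime
    duration: timedelta
    activity: str

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def generate(seed: int, count: int) -> List[Entry]:
    """Generate `count` entries at random times within a single day (overlaps are likely)."""
    rng = random.Random(seed)
    day = datetime(2020, 1, 1)
    return [
        Entry(
            start=day + timedelta(minutes=rng.randrange(24 * 60)),
            duration=timedelta(minutes=rng.randint(1, 90)),
            activity=rng.choice(["browsing", "coding", "email"]),
        )
        for _ in range(count)
    ]


def drop_overlaps(entries: List[Entry]) -> List[Entry]:
    """Walk entries chronologically and skip any that start before the last kept one ended."""
    kept: List[Entry] = []
    for entry in sorted(entries, key=lambda e: e.start):
        if not kept or entry.start >= kept[-1].end:
            kept.append(entry)
    return kept
```

A real filter would need more rules than this (sleep versus activity, location versus time zone, and so on), but the shape stays the same: over-generate, then keep only what is mutually consistent.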
In terms of organizing the code: the data generation seems to belong in the data access layers (DAL). The idea is that the code that parses raw data and the code that generates fake raw data live close together, so they don't go out of sync. That also gives you CI for data parsing for free: just run the parser against the fake data. Then the corresponding HPI module uses the DAL to generate fake data and set it as its inputs (lines 78 to 84 in 28fcc1d).
It works as a decorator; here's an example: https://github.com/karlicoss/dashboard/blob/623555e09647cce20bcc60f8ba6e9f5e932d32a2/src/dashboard/tabs.py#L103-L116

And the end result: a Rescuetime data heatmap generated against completely fake data, with everything running on CI! https://karlicoss.github.io/dashboard/rescuetime.html

The snippets are a bit awkward at the moment, but I'll fix a couple of minor caveats, and I feel like this could work really well!
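For readers without the linked code at hand, the arrangement described above might look roughly like the following. This is a minimal standalone sketch, not the actual HPI or DAL code: `fake_data_generator`, `parse`, `config`, `entries`, `fake_data` and `demo` are hypothetical names, and the data format is invented. It relies on the fact that context managers created with `contextlib.contextmanager` can also be applied as decorators.

```python
import json
import random
from contextlib import contextmanager
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Dict, Iterator, List, Optional


# "Data access layer": the parser and the fake generator sit side by side.
def fake_data_generator(rows: int, seed: int = 0) -> List[Dict[str, object]]:
    rng = random.Random(seed)
    return [
        {"activity": rng.choice(["coding", "email"]), "duration_s": rng.randint(1, 3600)}
        for _ in range(rows)
    ]


def parse(path: Path) -> List[Dict[str, object]]:
    rows = json.loads(path.read_text())
    return [r for r in rows if r["duration_s"] > 0]


# "Module": reads whatever input path is currently configured.
class config:
    export_path: Optional[Path] = None


def entries() -> List[Dict[str, object]]:
    assert config.export_path is not None, "no input configured"
    return parse(config.export_path)


@contextmanager
def fake_data(rows: int = 100) -> Iterator[None]:
    """Temporarily point the module at a freshly generated fake export."""
    previous = config.export_path
    with TemporaryDirectory() as tmp:
        fake_file = Path(tmp) / "fake.json"
        fake_file.write_text(json.dumps(fake_data_generator(rows)))
        config.export_path = fake_file
        try:
            yield
        finally:
            config.export_path = previous  # restore the real input afterwards


# Usable with `with fake_data(): ...` or, as here, as a decorator.
@fake_data(rows=10)
def demo() -> int:
    return len(entries())
```

Because the generator and the parser share a module, any test or demo that calls `demo()` exercises the real parsing path without touching personal data.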
Some test data I uploaded myself
It would be nice to have a public repository of raw data from different services, so that it would be easy to test and demonstrate HPI without having to give up your own data. Does such a thing exist?
P.S. maybe this issue rather belongs here, and I'll transfer it.