{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":23742071,"defaultBranch":"master","name":"oldnyc","ownerLogin":"danvk","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-09-06T18:38:04.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/98301?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1726328484.0","currentOid":""},"activityList":{"items":[{"before":"3759ad64a7cb30fcde03ccc6b842f24885e0d953","after":"2d796eb0710c5408adb6bfaf702cfcc48d584142","ref":"refs/heads/master","pushedAt":"2024-09-17T19:22:33.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"prefer non-rotated images, start scraping all DC smallbacks","shortMessageHtmlLink":"prefer non-rotated images, start scraping all DC smallbacks"}},{"before":"9faa6e26cef3bec865d713a9623b93ba0470d673","after":"3759ad64a7cb30fcde03ccc6b842f24885e0d953","ref":"refs/heads/master","pushedAt":"2024-09-15T17:12:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"format modifications","shortMessageHtmlLink":"format modifications"}},{"before":"463f1c3f7c86ad7582aaebc98e6583543dccfe2f","after":"9faa6e26cef3bec865d713a9623b93ba0470d673","ref":"refs/heads/master","pushedAt":"2024-09-15T17:10:04.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"Drop pickle format, add GPT tools (#122)\n\n* generate records.json instead of records.pickle\r\n\r\n* start getting off of pickle\r\n\r\n* debugging geocoding; wire up API key\r\n\r\n* ignore 
unlocated.csv\r\n\r\n* store alt title\r\n\r\n* download un-OCRd backing images\r\n\r\n* set up dotenv\r\n\r\n* extract data from NYPL HTML pages\r\n\r\n* slow down fetch\r\n\r\n* update crop_morphology\r\n\r\n* py3 for crop_morphology, add argparser\r\n\r\n* front -> back mapping JSON file\r\n\r\n* add back_ids to records.json\r\n\r\n* checkpoint backing_image.py; needs an update if still useful\r\n\r\n* Generate and review GPT batches; good but not great\r\n\r\n* checkpoint geogpt\r\n\r\n* use system instructions, add review tool\r\n\r\n* try a second example\r\n\r\n* remove second example\r\n\r\n* working towards GPT-based geocoding\r\n\r\n* Include some fields inferred from \"source\" in records.json\r\n\r\n* remove dead code\r\n\r\n* patch GPT queries\r\n\r\n* GPT geocoder\r\n\r\n* pickle -> JSON for photos, static site\r\n\r\n* drop pickle\r\n\r\n* script updates to regenerate site\r\n\r\n* data update for site update\r\n\r\n* Source images directly from NYPL site\r\n\r\n* Add GeoJSON output format\r\n\r\n* fetch backing images from new bucket\r\n\r\n* more flags/stroking in crop_morphology\r\n\r\n* add --border_only mode to crop_morphology\r\n\r\n* generate truth data for OCR analysis\r\n\r\n* working on OCR eval pipeline; mean score on golden-90 is 0.737\r\n\r\n* add rotation, start building out nougat pipeline\r\n\r\n* GPT-4o-mini is very good at high resolution OCR: 0.926\r\n\r\n* be more lax about cookies IPs in feedback\r\n\r\n* Descriptive JSON is not helpful; score=0.969\r\n\r\n* Ask GPT about rotations\r\n\r\n* checkpoint a few improvements, lots of debug code for crop_morphology\r\n\r\n* conservative stamp analysis, border crop bug fix\r\n\r\n* initial tesseract run; score=0.929\r\n\r\n* search feature in OCR review\r\n\r\n* try transpositions to improve score; a little too complex\r\n\r\n* transpose short lines in truth data\r\n\r\n* less aggressive permutations, more logging\r\n\r\n* simplify interface\r\n\r\n* add instructions about (1), (2), 
etc.\r\n\r\n* temperate zero\r\n\r\n* reject likely mismatches\r\n\r\n* running GPT over 350 images\r\n\r\n* parameterize generate_gpt_batch.py\r\n\r\n* Initial take on GPT batch management tools\r\n\r\n* splitter + batch manager seem to be working!\r\n\r\n* batch manager is working!\r\n\r\n* revert html crawler\r\n\r\n* revert geocoding data changes\r\n\r\n* blacken new files\r\n\r\n* blacken a few modified files","shortMessageHtmlLink":"Drop pickle format, add GPT tools (#122)"}},{"before":"7cf7a3e957030533cf1290d06872f21af51639dd","after":"837c18491b419d2f53cb63366e53ff21f9813d31","ref":"refs/heads/drop-pickle","pushedAt":"2024-09-15T16:54:08.000Z","pushType":"push","commitsCount":58,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"blacken a few modified files","shortMessageHtmlLink":"blacken a few modified files"}},{"before":null,"after":"7cf7a3e957030533cf1290d06872f21af51639dd","ref":"refs/heads/drop-pickle","pushedAt":"2024-08-27T19:22:12.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"start getting off of pickle","shortMessageHtmlLink":"start getting off of pickle"}},{"before":"9a9d901e76d16d6fe46971b84d45f820bd6a5c3a","after":null,"ref":"refs/heads/redo-geocode-2024","pushedAt":"2024-08-27T18:27:33.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"}},{"before":"6bae712e0bd550d8ada5ed42dd4118aedb103200","after":"463f1c3f7c86ad7582aaebc98e6583543dccfe2f","ref":"refs/heads/master","pushedAt":"2024-08-27T18:27:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"danvk","name":"Dan 
Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"Re-run geocoder in 2024 (#121)\n\n* port coder / test to Python 3 and pytest\r\n\r\n* fix some more tests\r\n\r\n* commit intersections.json\r\n\r\n* port coders to python 3\r\n\r\n* Mostly able to regenerate geocodes!\r\n\r\n* restore bug for perfect repro\r\n\r\n* notes\r\n\r\n* commit records.pickle\r\n\r\n* unzip geocache\r\n\r\n* drop records.pickle","shortMessageHtmlLink":"Re-run geocoder in 2024 (#121)"}},{"before":"6aed87b27c9755946fb22def66aeec91deb78336","after":"9a9d901e76d16d6fe46971b84d45f820bd6a5c3a","ref":"refs/heads/redo-geocode-2024","pushedAt":"2024-08-27T18:26:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"drop records.pickle","shortMessageHtmlLink":"drop records.pickle"}},{"before":"9e9239c3091ddaadaa1793dc2c4fedeef0a5107f","after":"6aed87b27c9755946fb22def66aeec91deb78336","ref":"refs/heads/redo-geocode-2024","pushedAt":"2024-08-27T18:21:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"unzip geocache","shortMessageHtmlLink":"unzip geocache"}},{"before":null,"after":"9e9239c3091ddaadaa1793dc2c4fedeef0a5107f","ref":"refs/heads/redo-geocode-2024","pushedAt":"2024-08-27T18:17:37.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"commit records.pickle","shortMessageHtmlLink":"commit 
records.pickle"}},{"before":"90fc0ccc8d7320467b142f8197832db414f69f37","after":"6bae712e0bd550d8ada5ed42dd4118aedb103200","ref":"refs/heads/master","pushedAt":"2024-08-27T15:07:27.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"Crawl NYPL API (#120)\n\n* crawl roots\r\n\r\n* Find all collections in the Milstein division\r\n\r\n* recursive crawler\r\n\r\n* recover, store all items\r\n\r\n* completed crawl? (45801 items)\r\n\r\n* complete crawl via /items\r\n\r\n* crawl \"captures\" as well; results are a bit disappointing\r\n\r\n* Commit original Milstein CSV file\r\n\r\n* Build nyc-records.extended.json\r\n\r\n* attach OCR text to extended records\r\n\r\n* README","shortMessageHtmlLink":"Crawl NYPL API (#120)"}},{"before":"52ad3939032df7407880d9c23fe4a1c3a579bedd","after":"22a23c416b5272dc0f6b1c75b7a59b1e6a4a3c15","ref":"refs/heads/api-crawl","pushedAt":"2024-08-27T15:05:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"README","shortMessageHtmlLink":"README"}},{"before":null,"after":"52ad3939032df7407880d9c23fe4a1c3a579bedd","ref":"refs/heads/api-crawl","pushedAt":"2024-08-27T14:58:47.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"attach OCR text to extended records","shortMessageHtmlLink":"attach OCR text to extended records"}},{"before":"84d2d1748b1ebd12c753abe708998c08c648db90","after":"90fc0ccc8d7320467b142f8197832db414f69f37","ref":"refs/heads/master","pushedAt":"2024-08-25T16:10:03.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"danvk","name":"Dan 
Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"scrub implausible dates like \"13905\"","shortMessageHtmlLink":"scrub implausible dates like \"13905\""}},{"before":"d936df3ca4289549d7e7cdbbb0428f2b8ee19441","after":"84d2d1748b1ebd12c753abe708998c08c648db90","ref":"refs/heads/master","pushedAt":"2024-08-25T14:23:10.000Z","pushType":"push","commitsCount":14,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"simplify date display","shortMessageHtmlLink":"simplify date display"}},{"before":"21d020fb87b58fa42e768b774a99447abd5bb3eb","after":"d936df3ca4289549d7e7cdbbb0428f2b8ee19441","ref":"refs/heads/master","pushedAt":"2024-08-21T19:54:08.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"allow cookie to be missing (maybe a bug?)","shortMessageHtmlLink":"allow cookie to be missing (maybe a bug?)"}},{"before":"6a8d6a6fb64b27fb9448345d7efb88c6f4b6b098","after":"21d020fb87b58fa42e768b774a99447abd5bb3eb","ref":"refs/heads/master","pushedAt":"2024-08-21T19:22:43.000Z","pushType":"push","commitsCount":5,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"http -> https","shortMessageHtmlLink":"http -> https"}},{"before":"a09ea03e8b0f96ffe235755a1120569a61a0dd79","after":"6a8d6a6fb64b27fb9448345d7efb88c6f4b6b098","ref":"refs/heads/master","pushedAt":"2024-08-16T14:37:48.000Z","pushType":"push","commitsCount":11,"pusher":{"login":"danvk","name":"Dan Vanderkam","path":"/danvk","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/98301?s=80&v=4"},"commit":{"message":"preserve sort order, pull in new NYPL 
URLs","shortMessageHtmlLink":"preserve sort order, pull in new NYPL URLs"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0xN1QxOToyMjozMy4wMDAwMDBazwAAAAS46OX1","startCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOS0xN1QxOToyMjozMy4wMDAwMDBazwAAAAS46OX1","endCursor":"Y3Vyc29yOnYyOpK7MjAyNC0wOC0xNlQxNDozNzo0OC4wMDAwMDBazwAAAAScO2A1"}},"title":"Activity · danvk/oldnyc"}