Spider for Campo Mourão/PR #438
Hi @giuliocc
Hey @allisonsampaio and @rodps! You don't actually need to create another PR hehehehe. The "soon" status is not mandatory (and is rarely used), precisely to avoid this overhead.
You can add the city to CITIES.md directly as "done" in this PR. That said, I'll close that other PR.
I requested some changes and gave some tips. Thanks for the contribution!
Nice work on those corrections! ❤️
The gazettes with edition numbers 1511-1516 (the first ones) weren't extracted. Could you look into that?
Since the PR is almost ready to be merged I'll make a suggestion for further work if you are interested :)
Looking into data from the census, I found that atende.net is a system used by many cities. Searching for "diario oficial atende.net" in a search engine turns up many of them (I don't know if we can get the full list somewhere). If you are interested in contributing further to the project, a nice addition would be making another PR generalizing this spider to a base spider and add the other cities :)
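To make the "base spider" idea concrete, here is a rough sketch of the shape it could take. Everything in it is an assumption for illustration: the class names, the `city_subdomain` attribute, and the URL layout are hypothetical, and plain classes are used so the sketch runs without Scrapy. In the project, the base class would extend the project's own spider base class instead.

```python
class BaseAtendeSpider:
    """Hypothetical base class holding what is identical for every atende.net city."""

    # Assumption: each city is served from its own subdomain of atende.net.
    city_subdomain = ""

    def listing_url(self, page=1):
        # Assumed URL layout, for illustration only; copy the real one from the site.
        return f"https://{self.city_subdomain}.atende.net/?pagina={page}"


class PrCampoMouraoSpider(BaseAtendeSpider):
    """A city spider then only declares what differs from the base."""

    city_subdomain = "campomourao"  # hypothetical value


spider = PrCampoMouraoSpider()
first_page = spider.listing_url()
second_page = spider.listing_url(page=2)
```

With this layout, adding another city would be a new subclass of a few lines, while the parsing logic lives in one place.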
Hi @giuliocc. I would like to do "another PR generalizing this spider to a base spider and add the other cities". Should I use this branch (rodps:main) or okfn-brasil:main as the base?
code = gazette.xpath("//button[@data-acao='download']/@data-codigo").get()
id = gazette.xpath("//button[@data-acao='download']/@data-id").get()

is_extra = True if edition_type == "Extraordinária" else False
- is_extra = True if edition_type == "Extraordinária" else False
+ is_extra = edition_type == "Extraordinária"
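A quick way to check that the shorter form behaves the same: the `==` comparison already evaluates to `True` or `False`, so the conditional expression adds nothing. The sample values below are made up for illustration ("Ordinária" is an assumed label for regular editions).

```python
# Made-up sample values; only "Extraordinária" comes from the spider code.
samples = ["Extraordinária", "Ordinária", "", None]

# Original form: conditional expression wrapped around a comparison.
verbose = [True if edition_type == "Extraordinária" else False for edition_type in samples]

# Suggested form: the comparison itself is already a bool.
concise = [edition_type == "Extraordinária" for edition_type in samples]

same_result = verbose == concise
```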
def parse(self, response, page=1):

    gazettes = response.xpath("//div[@class='nova_listagem ']/div[@class='linha']")
    follow_next_page = False if not gazettes else True
- follow_next_page = False if not gazettes else True
+ follow_next_page = bool(gazettes)
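For context on why `bool(gazettes)` is enough: the result of `response.xpath(...)` is a `SelectorList`, which is a `list` subclass, so an empty result is falsy. Below is a small sketch of the resulting "stop when a page is empty" loop, using plain lists and a made-up `fake_site` dict in place of real responses (both are assumptions for the example, not the spider's actual code).

```python
def crawl(fetch_page):
    """Collect rows page by page until a page comes back empty."""
    collected = []
    page = 1
    while True:
        gazettes = fetch_page(page)
        # An empty list is falsy, so this is False on the first empty page.
        follow_next_page = bool(gazettes)
        if not follow_next_page:
            break
        collected.extend(gazettes)
        page += 1
    return collected


# Stand-in for the paginated listing: page number -> rows found on that page.
fake_site = {1: ["edition 1511", "edition 1512"], 2: ["edition 1513"]}
rows = crawl(lambda page: fake_site.get(page, []))
```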
Hello @TZorawski, I'd advise waiting for @rodps's PR to be ready. I'm afraid that future changes in this PR would slow you down. But if you want to start anyway, you should fork rodps:main so you get the changes from this PR. @rodps, do you need any help on this one? I left two comments with minor suggestions, but the priority would be fixing Giulio's comment: #438 (review)
Hello everybody. Sorry for the delay. The earlier problem mentioned by @giuliocc came down to the position of the month and the day in the 'start_date' variable, which needed to be swapped. Since then, the gazette website has changed the way it renders precisely the part where the gazette is downloaded: the content is now generated via JavaScript. I haven't found a way to handle this that fits with the project, and I'm also running out of time at the moment. If @TZorawski or anyone else wants to continue with this issue, please feel free.
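To illustrate how a month/day swap in 'start_date' can silently drop the earliest editions, here is a small sketch. The date and the day-first format are assumptions picked for the example; the real spider may build the value differently.

```python
from datetime import date, datetime

# Arbitrary example date: 1 February 2012.
start_date = date(2012, 2, 1)

# Assumption: the site expects day-first dates (dd/mm/yyyy).
day_first = start_date.strftime("%d/%m/%Y")
# The swapped variant puts the month first.
month_first = start_date.strftime("%m/%d/%Y")

# What a day-first site would understand when given the swapped string.
misread = datetime.strptime(month_first, "%d/%m/%Y").date()
```

Here the swapped string is still a valid date, so no error is raised; the query simply covers a different period than intended.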
Thanks @rodps, I'll continue with it then.
Ah, one more thing: could you update CITIES.md, please?
Closed as stale.
… in creating the base spider for the replicable Atende system.
to work with the base spider for the replicable 'Atende' system. Resolves okfn-brasil#430. Adds spider for Campo Mourão - PR.
Issue #430