-
Notifications
You must be signed in to change notification settings - Fork 3
Conversation
… conventions (http:// issue)
…nd for data mocking
requests==2.18.4 | ||
simplejson==3.13.2 | ||
six==1.11.0 | ||
spavro==1.1.10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curiously fastavro
is still there 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh, if you don't wipe out the previous requirements.txt things can sneak in despite no longer being in primary requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or not. This is more mysterious than I though.
pbr==3.1.1 | ||
mock==2.0.0 | ||
pbr==4.0.2 | ||
psycopg2==2.7.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
psycopg2==2.7.4
could you remove this line? 😳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure it's redundant? That's what pip_freeze stuck in there for the requirement: psycopg2-binary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take a look at the console logs you'll see that psycopg2
is the old version and psycopg2-binary
is the new one that will replace the other one in the next releases.
Someone with the same question as you:
cursor.execute(count_str); | ||
for row in cursor: | ||
return row.get("new_rows") | ||
return sum([1 for row in cursor]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't SELECT COUNT(*) FROM ...
work for you? 🤔
or maybe
SELECT COUNT(DISTINCT
e.id,
e.modified,
ps.name,
ps.id,
s.name,
s.id)
FROM kernel_entity e
...
Bring to the server all the data when you only need the number seems very expensive for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Select count stopped working, which was a total mystery. So I moved to this. I was afraid it would be slow so I bench-marked it. The operation is on the MS timescale so I stopped with the solution since it's "efficient enough" for now since this only happens once every 10 seconds or so and this and the DB are both on the same machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mysteries of life!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised because it's kind of a key piece of functionality and all of the sudden nothing worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you include a comment? I think this question is going to show up every time someone reads the code 🤓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Of course.
inner join kernel_schema s on ps.schema_id = s.id | ||
WHERE e.modified > '%s' | ||
ORDER BY e.modified ASC; | ||
''' % (offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I be 😈 with the SQL format? Why is ______________ from
so far away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More like this?
SELECT e.id,e.revision,
e.payload,
e.modified,
e.status,
ps.id AS project_schema_id,
ps.name AS project_schema_name,
s.name AS schema_name,
s.id AS schema_id,
s.revision AS schema_revision
FROM kernel_entity e
INNER JOIN kernel_projectschema ps
ON e.projectschema_id = ps.id
INNER JOIN kernel_schema s
ON ps.schema_id = s.id
WHERE e.modified > '%s'
ORDER BY e.modified ASC;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂 🙈
for x, row in enumerate(cursor): | ||
if x >= max_size - 1: | ||
raise StopIteration | ||
yield {key : row[key] for key in row.keys()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you pretend with this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking why it's not just a list comprehension instead of a generator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the point of using a generator, but I don't get the point of using a generator with enumerate(cursor)
(maybe I need to read again the python docs) that actually it isn't a generator itself. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think enumerate consumes the whole cursor on one go, but I could be wrong. Only reason for enumerate was to implement the limit functionality, but as you said that should go in the SQL query anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I found, https://stackoverflow.com/questions/23663231/does-enumerate-produce-a-generator-object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, interesting.
@@ -1,17 +0,0 @@ | |||
# Copyright (C) 2018 by eHealth Africa : http://www.eHealthAfrica.org |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you removing the file or the license header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the module's folder structure so the file was removed. Could be a merge error since Fredrik was adding the headers on dev at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks like the license headers haven't been applied to empty init.py files
inner join kernel_projectschema ps on e.projectschema_id = ps.id | ||
inner join kernel_schema s on ps.schema_id = s.id | ||
WHERE e.modified > '%s' | ||
ORDER BY e.modified ASC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to use the LIMIT
option with the max_size
value?
https://www.postgresql.org/docs/9.6/static/queries-limit.html
SELECT select_list
FROM table_expression
[LIMIT { number | ALL }] [OFFSET number]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, should have done that.
import aether.saladbar.saladbar.parsers as Parsers | ||
# from saladbar import parsers as Parsers | ||
# from saladbar import salad_handler as salad | ||
import aether.saladbar.saladbar.salad_handler as salad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you remove the commented imports or are there for a reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a little bit of cleaning but nothing that cannot be done later if this is very urgent
Changed naming convention of schemas.
For example;
http://demo.eha.org/Person -> org.eha.demo.Person
Which is more idiomatic for Avro. As Salad prefers the previous convention, we do this after a schema has been turned into Avro, if we're using salad-bar.
Consolidated avro libraries.
We've also tried to limit our usage of avro libraries that are not spavro. (removed fastavro completely). However, a bug in spavro means we've been forced to use avro-python3 in the aether-producer for now. It's been reported, and hopefully will be addressed, but their extensive use of Cython makes it hard to patch and submit a PR.
Removed Python2 Dependencies.
We were maintaining compatability of some utilities (saladbar, producer, etc) for dependency reasons. We've been able to remove those dependencies. Now all apps and utilities run python3. Only aether-client and the consumer-sdk will need to be compatible with both python2 and python3.