Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Add a level of record abstraction #380

Conversation

AndrewCiambrone
Copy link
Contributor

Summary of Changes

RFC: amundsen-io/rfcs#5

Tests

I modified all the model tests to incorporate the new serializable functionality.

Documentation

What documentation did you add or modify and why? Add any relevant links then remove this line

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

@AndrewCiambrone AndrewCiambrone changed the title [refactor] Databuilder record abstraction [refactor] Add a level of record abstraction Oct 13, 2020
@AndrewCiambrone AndrewCiambrone changed the title [refactor] Add a level of record abstraction refactor: Add a level of record abstraction Oct 13, 2020
@AndrewCiambrone AndrewCiambrone changed the title refactor: Add a level of record abstraction refactor: Add a level of record abstraction [WIP] Oct 13, 2020
@feng-tao
Copy link
Member

hey @AndrewCiambrone , thanks for the pr, has been a busy week..Do you think you could fix the conflict and CI? I would like to use the branch to do a quick lyft staging test. thanks a lot!

@feng-tao feng-tao added the keep fresh Disables stalebot from closing an issue label Oct 16, 2020
@feng-tao
Copy link
Member

ping @AndrewCiambrone , I think this generic is super useful if we would like to support neptune in databuilder. Let us know if you have the chance to update the pr and I would like to do a quick test in our staging as well. cc @allisonsuarez in case I don't have a chance to test it.

@AndrewCiambrone
Copy link
Contributor Author

Hey @feng-tao sorry its been a busy week. I will pull in the latest changes tomorrow.

Signed-off-by: Andrew <andrjc4@vt.edu>
@feng-tao
Copy link
Member

thanks for the rebase!

@feng-tao
Copy link
Member

just did a quick staging test with the pr with a few tasks and it runs fine so far :) I will look at the pr in more detail tomorrow and may ask questions as well, thanks for the great work!

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to go through every line, but overall, I think the change lgtm and it is well written! Once the conflict is fixed, we could merge it ! thanks.



class ColumnMetadata:
COLUMN_NODE_LABEL = 'Column'
COLUMN_KEY_FORMAT = '{db}://{cluster}.{schema}/{tbl}/{col}'
COLUMN_NAME = 'name'
COLUMN_TYPE = 'type'
COLUMN_ORDER = 'sort_order{}'.format(UNQUOTED_SUFFIX) # int value needs to be unquoted when publish to neo4j
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @AndrewCiambrone , I think we still need to keep this? or i miss the fix somewhere else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I see the latter part

@feng-tao
Copy link
Member

feng-tao commented Nov 4, 2020

I end up to fix the merge conflict

@feng-tao
Copy link
Member

feng-tao commented Nov 4, 2020

@AndrewCiambrone given the change is complete, could you remove the WIP? or let me know if there are any models that haven't done the refactors.

For amundsen-io/rfcs#5, do you plan to implement the Neptune serializer as part of the same RFC?

@AndrewCiambrone AndrewCiambrone changed the title refactor: Add a level of record abstraction [WIP] refactor: Add a level of record abstraction Nov 4, 2020
@AndrewCiambrone
Copy link
Contributor Author

If no new models were added in the last rebase I believe every model is accounted for.

As for the RFC I felt like they were two separate topics so I was planning on writing a second rfc for the Neptune databuilder.

@feng-tao
Copy link
Member

feng-tao commented Nov 4, 2020

thanks @AndrewCiambrone !

@feng-tao feng-tao merged commit 414e825 into amundsen-io:master Nov 5, 2020
@dorianj
Copy link
Contributor

dorianj commented Nov 10, 2020

When I run sample_data_loader.py with this changeset applied, I get this error:

Traceback (most recent call last):
  File "example/scripts/sample_data_loader.py", line 285, in <module>
    'databuilder.models.table_stats.TableColumnStats')
  File "example/scripts/sample_data_loader.py", line 115, in run_csv_job
    publisher=Neo4jCsvPublisher()).launch()
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/job/job.py", line 77, in launch
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/job/job.py", line 67, in launch
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/task/task.py", line 65, in run
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/loader/file_system_neo4j_csv_loader.py", line 119, in load
  File "/home/nada/anaconda3/lib/python3.7/csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/home/nada/anaconda3/lib/python3.7/csv.py", line 151, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'stat_val'

I haven't been able to debug it further, any ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep fresh Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants