Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added collect-to-arrow support #284

Merged
merged 28 commits into from
Nov 5, 2020

Conversation

behrica
Copy link
Contributor

@behrica behrica commented Nov 2, 2020

refactored

more data types and refactored

more data types and refactored

make use of type hints

added docstring

clean up

adapted to latest upstream

coomit auto-changing file

@codecov-io
Copy link

codecov-io commented Nov 2, 2020

Codecov Report

Merging #284 into develop will decrease coverage by 0.221%.
The diff coverage is 93.750%.

Impacted file tree graph

@@              Coverage Diff               @@
##            develop      #284       +/-   ##
==============================================
- Coverage   100.000%   99.778%   -0.222%     
==============================================
  Files            36        37        +1     
  Lines          3050      3162      +112     
  Branches          0         7        +7     
==============================================
+ Hits           3050      3155      +105     
- Partials          0         7        +7     
Impacted Files Coverage Δ
src/clojure/zero_one/geni/arrow.clj 93.693% <93.693%> (ø)
src/clojure/zero_one/geni/core.clj 100.000% <100.000%> (ø)

@anthony-khong
Copy link
Member

Thank you for the PR! This looks great! Let me review the code 😄

docker/project.clj Outdated Show resolved Hide resolved
:profiles
{:provided {:dependencies ~spark-deps}
:uberjar {:aot :all :dependencies ~spark-deps}
:dev {:dependencies [[enlive "1.1.6"]
[midje "1.9.9"]]
[midje "1.9.9"]
[techascent/tech.ml.dataset "5.00-alpha-19"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this one should go to provided dependencies too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it in profile "dev" is not enough ?
This makes sure, is does not gets pulled in using geni.

docker/project.clj Outdated Show resolved Hide resolved
src/clojure/zero_one/geni/arrow.clj Outdated Show resolved Hide resolved
src/clojure/zero_one/geni/arrow.clj Outdated Show resolved Hide resolved
test/zero_one/geni/arrow_test.clj Outdated Show resolved Hide resolved
src/clojure/zero_one/geni/arrow.clj Show resolved Hide resolved
src/clojure/zero_one/geni/arrow.clj Outdated Show resolved Hide resolved
src/clojure/zero_one/geni/arrow.clj Outdated Show resolved Hide resolved
src/clojure/zero_one/geni/arrow.clj Outdated Show resolved Hide resolved
@behrica
Copy link
Contributor Author

behrica commented Nov 3, 2020

I addressed most of your points, thanks for comments.

I left function typed-action as-is.
I will have a look, on how to address duplication while preserving type hints.

refactored

more data types and refactored

more data types and refactored

make use of type hints

added docstring

clean up

adapted to latest upstream

coomit auto-changing file
@behrica
Copy link
Contributor Author

behrica commented Nov 4, 2020

I merged manually all your cosmetic changes.
So this branch should have them all.

@behrica
Copy link
Contributor Author

behrica commented Nov 4, 2020

Please have a look and lt me know, if fine.
I can the rebase it as well to squash all commits

@anthony-khong
Copy link
Member

Hi @behrica, I believe this is good to merge! Just one final thing to pass the CI jobs:

lein cljfmt fix src test doc

Then it should be good! Thank you for the awesome PR, this is a great addition to the library!

@behrica
Copy link
Contributor Author

behrica commented Nov 5, 2020

Should I write some lines here:

https://github.com/zero-one-group/geni/blame/6c540e4b19a0e888faecb8da4b5f8c52fc437c74/docs/cookbook/part_01_reading_and_writing_datasets.md#L122

to explain the different options Geni gives to "collect" data to the driver ?
(or in an other place)

We could mention as well, that there is now tight integration with TMD, as an other form to "collect" data to work with it further.

@behrica
Copy link
Contributor Author

behrica commented Nov 5, 2020

I added some docu , please have a look

@anthony-khong
Copy link
Member

Great! It's looking good! I'll merge as soon as the pipeline passes.

As for the docs, I suppose it's good to have. When we add TMD support in Geni, we should be able to do TMD-Spark interop from within Geni (i.e. no extra requires). When that happens, we just change the doc!

Again, thank you so much for this PR!!

@anthony-khong anthony-khong merged commit 83b24ea into zero-one-group:develop Nov 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants