kedro.io.CSVS3DataSet does not use load_args #2

Closed
Lucianois opened this issue May 20, 2019 · 4 comments

Comments

@Lucianois
Contributor

Description

When loading a DataFrame from S3, the default arguments are used instead of the ones configured under load_args in catalog.yml.

Context

Trying to load a CSV file from S3 with custom load_args.

Steps to Reproduce

  1. Add the following entry to catalog.yml:

        XXX.csv.s3:
          type: CSVS3DataSet # https://kedro.readthedocs.io/en/latest/kedro.io.CSVS3DataSet.html
          load_args: # https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
            sep: '\t'
            encoding: "ISO-8859-1"

  2. Create a pipeline that displays the data in XXX.csv.s3, for example:

        node(
            read_display,
            ["XXX.csv.s3"],
            None
        )

  3. kedro run
  4. Check whether the data is displayed correctly.

Expected Result

Data should be split on tabs.

Actual Result

Data is loaded without the custom sep, i.e. with the default arguments.
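A minimal sketch of the expected versus the reported behaviour, assuming the dataset ultimately calls pandas.read_csv. InMemoryCSVDataSet, load and load_ignoring_args are hypothetical names, not Kedro's API, and the raw bytes are held in memory instead of being fetched from S3 so the load_args handling can be checked without credentials or a bucket.

```python
import io
from typing import Any, Dict, Optional

import pandas as pd


class InMemoryCSVDataSet:
    """Hypothetical stand-in for an S3-backed CSV dataset (not Kedro's class)."""

    def __init__(self, raw: bytes, load_args: Optional[Dict[str, Any]] = None):
        self._raw = raw
        # Copy so later changes to the caller's dict do not leak in.
        self._load_args = dict(load_args or {})

    def load_ignoring_args(self) -> pd.DataFrame:
        # Mirrors the reported bug: pandas defaults (comma separator) are used.
        return pd.read_csv(io.BytesIO(self._raw))

    def load(self) -> pd.DataFrame:
        # Expected behaviour: every configured load_arg is forwarded to read_csv.
        return pd.read_csv(io.BytesIO(self._raw), **self._load_args)


raw = "name\tcity\nAna\tLisbon\n".encode("ISO-8859-1")
ds = InMemoryCSVDataSet(raw, load_args={"sep": "\t", "encoding": "ISO-8859-1"})

print(list(ds.load().columns))        # ['name', 'city']
print(ds.load_ignoring_args().shape)  # (1, 1): each line lands in a single column
```

With the arguments ignored, the tab-separated header is read as one column named "name\tcity", which matches the "loaded without the custom sep" symptom.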


Your Environment


  • Kedro version used: v0.14
  • Python version used: 3.6.8
  • Operating system and version: macOS 10.14.5

This does not happen in Kedro v0.13.1.dev53.

Checklist


  • Add a "Component" label to the issue
  • Add a "Priority" label to the issue
@Pet3ris

Pet3ris commented May 21, 2019

Hey @Lucianois, you can format code snippets by wrapping them in triple backticks ("```").

From the example you submitted, it's not clear whether you applied the correct YAML formatting. For instance, sep should be nested under load_args.
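To illustrate the nesting point, here is a sketch of the dictionaries a YAML parser would produce in each case. read_csv_kwargs is a hypothetical helper, not part of Kedro; it assumes that only the keys under load_args are forwarded to pandas.read_csv.

```python
from typing import Any, Dict


def read_csv_kwargs(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return the kwargs a dataset would pass to read_csv.

    Only keys nested under load_args count; siblings of load_args (such as a
    mis-indented sep) are not treated as read_csv arguments.
    """
    return dict(entry.get("load_args") or {})


# sep/encoding indented under load_args: they become its children.
nested = {
    "type": "CSVS3DataSet",
    "load_args": {"sep": "\t", "encoding": "ISO-8859-1"},
}

# sep/encoding at the same indentation as load_args: `load_args:` with nothing
# under it parses as null (None), and sep/encoding become its siblings.
flat = {
    "type": "CSVS3DataSet",
    "load_args": None,
    "sep": "\t",
    "encoding": "ISO-8859-1",
}

print(read_csv_kwargs(nested))  # {'sep': '\t', 'encoding': 'ISO-8859-1'}
print(read_csv_kwargs(flat))    # {}
```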

@Lucianois
Contributor Author

Lucianois commented May 21, 2019

    XXX.csv.s3:
        type: CSVS3DataSet # https://kedro.readthedocs.io/en/latest/kedro.io.CSVS3DataSet.html
        load_args: # https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
            sep: '\t'
            encoding: "ISO-8859-1"
        credentials: prod_s3
        bucket_name: bucket1
        filepath: path-to-csv

@tsanikgr already identified the problem.

@yetudada
Contributor

Thanks for the comment, @Pet3ris! And @Lucianois, thank you so much for submitting this issue and for updating the comment. We have a fix for this that we're about to push through.

@tsanikgr
Contributor

Fixed
