Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in splita/splitax when field contains a single non-string value #1589

Closed
valxntine opened this issue Jun 15, 2024 · 4 comments · Fixed by #1629
Closed

Error in splita/splitax when field contains a single non-string value #1589

valxntine opened this issue Jun 15, 2024 · 4 comments · Fixed by #1629

Comments

@valxntine
Copy link

valxntine commented Jun 15, 2024

I have a CSV that has a field EAN, this is either a single value:

7622210953278

or a concatenated string of values:

"7622210953278#7622210953279#7622281623271"

When I split the field, it works when it's a case where there's multiple values, otherwise it ends up replacing the field with (error). I'm converting this CSV to JSON, for what it's worth.

The command I'm running is:

mlr --icsv --ojson --from products.csv put '$EAN = splita($EAN, "#")' then cut -f SKU,TITLE,LONG_DESC,EAN

Which outputs something like this (albeit longer):

{
  "SKU": 0000000,
  "LONG_DESC": "some description",
  "EAN": (error),
  "TITLE": "some title",
},
{
  "SKU": 0000000,
  "LONG_DESC": "some description",
  "EAN": [5000159372343, 5000159470490],
  "TITLE": "some title",
},
{
  "SKU": 0000000,
  "LONG_DESC": "some description",
  "EAN": (error),
  "TITLE": "some title",
},
{
  "SKU": 0000000,
  "LONG_DESC": "some description",
  "EAN": (error),
  "TITLE": "some title",
},

I've only just come across miller, so I may be missing something obvious, I've tried with both splita and splitax. Any advice or is this not something I can work around?

@valxntine
Copy link
Author

As soon as I finished typing this I realised a possible solution and turns it out it worked:
mlr --icsv --ojson --from products.csv put '$EAN = string($EAN)' then put '$EAN = splita($EAN, "#")' then cut -f SKU,TITLE,LONG_DESC,EAN | jq

Apologies for the short-lived issue, feel free to close!

@johnkerl
Copy link
Owner

johnkerl commented Jun 15, 2024

Yes casting to string works since

7622210953278

type-infers as int but

7622210953278#7622210953279#7622281623271

type-infers to string.

And, splita returns error when given a string.

[Typo found 2024-08-25]: And, splita returns error when given a string.

Maybe splita ought to do to-string for the user, even when not asked to do so ...

I've not been super-consistent with this TBH.

On the one hand I never ever wanted Miller to be as loosey-goosey with string vs int as, say, JavaScript or PHP -- I always wanted "2"+3 to be error not 5. So most of the time I prefer to return error rather than doing implicit type-casts.

On the other hand, I do some implicit type-casts -- the dot operator for example, since I felt there was nothing else meaningful for . to do so I might as well stringify all inputs. So "3" . 4 is "34". Also substr will stringify its first argument: substr1(123456789, 3, 5) gives "345".

The split functions should probably do the same ...

@valxntine
Copy link
Author

The splita and splitax docs both say that they split a string into array, I think it could be fair to assume as a user that the argument would be converted to a string.

I'm happy to take a stab and raise a PR for this in the next few days, if you'd like?

@johnkerl
Copy link
Owner

The splita and splitax docs both say that they split a string into array, I think it could be fair to assume as a user that the argument would be converted to a string

Cool!

Yes, happy to take a PR if you want! 😎

@johnkerl johnkerl changed the title splita/splitax error when field contains a single value Error in splita/splitax when field contains a single non-string value Aug 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants