to_json/to_jsonl does not have encoding options when exporting file? #70

chslink · 2016-05-09T07:05:04Z

when using to_json/to_jsonl to export files.I found the methon no encoding option.I read the code,and both functions using json.dump and utf-8 charset.
And thanks for your awesome work,I really love this functional toolkits.

EntilZha · 2016-05-09T15:39:45Z

Thanks @chslink for the bug report and thanks! I'll look into it, but if you have a test case that would be helpful to reproduce the bug with.

chslink · 2016-05-10T01:02:16Z

That's strange,maybe i found another bug...

My test environment:

Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32

My test code:

testlist = [
        {"id": 1, "key": u"some unicode 哈哈哈 1"},
        {"id": 2, "key": u"some unicode 哈哈哈 2"}
    ]

data = seq(testlist)
print data.find(lambda x: x['id'] == 1)
print data.drop_while(lambda x: ['id'] == 1)
data.to_json("r.json")
data.to_jsonl("r.jsonl")

Output:

{'id': 1, 'key': u'some unicode \u54c8\u54c8\u54c8 1'}
[{'id': 1, 'key': u'some unicode \u54c8\u54c8\u54c8 1'}, {'id': 2, 'key': u'some unicode \u54c8\u54c8\u54c8 2'}]

I thought two files will be encoded with utf-8. But r.jsonl file was encoded with my system default charset.
The other bug seems drop_while() not working?

EntilZha · 2016-05-11T03:10:58Z

In your code, you wrote lambda x: ['id'] instead of lambda x: x['id']. Since the array is non-empty, it would return true. After fixing that, things seem to work as you would expect.

I took a closer look at the code, and your bug report is correct. Currently, the default json writer is used on the to_json call which by default uses utf 8. The jsonl writer uses the builtin open and its default encoding which is the system default.

My proposal to fix this issue would be:

Change the default encoding of to_jsonl to utf8
Include an encoding parameter for both to_json and to_jsonl to let user change this if needed.

The outcome is that jsonl and json outputs are consistent, but it allows users to change these if needed just as with to_file.

chslink · 2016-05-11T07:47:40Z

Sorry,I was too careless...
My question is this:

data = seq(1,2,3,4)
print data.drop(1)
print data

Output:

[2, 3, 4]
[1, 2, 3, 4]

Obviously, element isn't really drop from the sequence...

EntilZha · 2016-05-11T09:42:25Z

The behavior you describe is actually exactly the intended behavior. Once a seq is initialized it is for most purposes an immutable stream.

It would be better to think about the function calls as a declarative sequence of transformations rather than procedural directives. For example, I would interpret your example as:

data.drop(1): Give me data after one element has been dropped
data: Give me data

This can be seen a little more clearly by doing this:

print(data.drop(1)._lineage)
# Lineage: sequence -> drop(1)
print(data._lineage)
# Lineage: sequence

In other words, any mutations to a sequence are not in place, but instead describe a pipeline of transformations to apply to the original sequence.

On the bug, I just pushed a fix in 4a3735b

UPDATE: I decided to remove the encoding parameter/option. In python 3 json gets loaded/dumped as unicode so it makes sense to keep consistency with this (python 2 has the encoding option, but that is removed/deprecated in python 3).

Here is the code output now:

$ python
Python 2.7.11 (default, Jan 19 2016, 14:37:25)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from functional import seq
>>> testlist = [
...         {"id": 1, "key": u"some unicode 哈哈哈 1"},
...         {"id": 2, "key": u"some unicode 哈哈哈 2"}
...     ]
>>>
>>> data = seq(testlist)
>>> data.to_json("r.json")
>>> data.to_jsonl("r.jsonl")
>>> exit()
$ cat r.json
[{"id": 1, "key": "some unicode \u54c8\u54c8\u54c8 1"}, {"id": 2, "key": "some unicode \u54c8\u54c8\u54c8 2"}]
$ cat r.jsonl
{"id": 1, "key": "some unicode \u54c8\u54c8\u54c8 1"}
{"id": 2, "key": "some unicode \u54c8\u54c8\u54c8 2"}

It looks like it works, but could you verify on your end to be sure? The best way to do this would be to do one of these inside a virtualenv:

pip install git+git://github.com/EntilZha/PyFunctional.git
clone the github repo and do python setup.py install

chslink · 2016-05-12T01:39:20Z

I'd known what you explaining about drop() behavior.
I'd already tested,nice work.

EntilZha · 2016-05-12T03:28:42Z

Sounds like everything is good so resolving this issue. Feel free to open another issue if you find another bug. Thanks for the bug report!

EntilZha added the bug label May 9, 2016

EntilZha added this to the 0.7.0 milestone May 9, 2016

EntilZha self-assigned this May 9, 2016

EntilZha added a commit that referenced this issue May 11, 2016

Work to fix bug found in issue #70

4a3735b

EntilZha closed this as completed May 12, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

to_json/to_jsonl does not have encoding options when exporting file? #70

to_json/to_jsonl does not have encoding options when exporting file? #70

chslink commented May 9, 2016 •

edited

Loading

EntilZha commented May 9, 2016

chslink commented May 10, 2016

EntilZha commented May 11, 2016

chslink commented May 11, 2016 •

edited

Loading

EntilZha commented May 11, 2016 •

edited

Loading

chslink commented May 12, 2016

EntilZha commented May 12, 2016

to_json/to_jsonl does not have encoding options when exporting file? #70

to_json/to_jsonl does not have encoding options when exporting file? #70

Comments

chslink commented May 9, 2016 • edited Loading

EntilZha commented May 9, 2016

chslink commented May 10, 2016

EntilZha commented May 11, 2016

chslink commented May 11, 2016 • edited Loading

EntilZha commented May 11, 2016 • edited Loading

chslink commented May 12, 2016

EntilZha commented May 12, 2016

chslink commented May 9, 2016 •

edited

Loading

chslink commented May 11, 2016 •

edited

Loading

EntilZha commented May 11, 2016 •

edited

Loading