-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Apache Commons CSV for parsing external secondary instances #644
Conversation
} | ||
} | ||
|
||
@Before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @Before
at the bottom really threw me off. Obviously still works here, but I think for readability it should move up to the top.
try (BufferedReader reader = new BufferedReader(new FileReader(path))) { | ||
String header = reader.readLine(); | ||
|
||
if (header.contains(";")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this means a CSV like hello; world, blah
would have columns hello
and world, blah
rather than hello; world
and blah
(which I might have meant). I think that's ok, as parsing these things with more than one delimiter and always getting it right is impossible, but just thinking out loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to call this out, sorry! It feels like having punctuation in column headers would be very unusual. I can kind of imagine it with commas maybe? Something cruel like a column header of "First, Last Name"
? But I really can't imagine using semicolons in a column header so that's why it felt safest to explicitly test for semicolon. If we really wanted to be fancy we could try to account for quotes in headers but that really felt like overkill. I'd like to go with the limitation you pointed out and adjust if someone has a legitimate usecase for a semicolon in the header (please, no!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which reminds me of something else I meant to mention -- I briefly considered also supporting tabs but again, I think we should wait until someone explicitly asks for it. The C in CSV is comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C in CSV is comma.
This.
@@ -184,14 +183,14 @@ public void realInstanceIsResolved_whenFormIsDeserialized_afterPlaceholderInstan | |||
|
|||
@Test | |||
// Clients would typically catch this exception and try parsing the form again which would succeed by using the placeholder. | |||
public void fileNotFoundException_whenFormIsDeserialized_afterPlaceholderInstanceUsed_andFileStillMissing() throws IOException, DeserializationException { | |||
public void ioException_whenFormIsDeserialized_afterPlaceholderInstanceUsed_andFileStillMissing() throws DeserializationException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might just be being slow today, but I'm missing the reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! In versions 1.6+ of the lib some other IOException
exception is thrown. It's not in Android 21 so that's why I had to downgrade the library further. Forgot to clean this up. I rebased -- hopefully you won't find it too hard to follow.
import org.junit.Before; | ||
import org.junit.Test; | ||
|
||
public class CsvExternalInstanceTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having multiple cases in two separate files made this a little hard to follow. I kept having to jump around to see what was really being tested. Not a big deal, but maybe a tweak would be to just have smaller CSVs inline and written to a file that gets parsed in the tests or just smaller more specific CSV test resources? Maybe I'm missing the point, and we need multiple cases in one file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't disagree with you. I used the existing tests to get to the coverage I wanted as quickly as possible. I split the window vertically to see the test and resource side by side. If you feel strongly about this, can you please push a commit? I doubt we'll need to look at these tests again any time soon so only if it's really important to you to spend the extra time!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went to spend 10 mins doing this but caught up with some caching problem in IntelliJ so will leave it for the moment. I did a quick rename though: I think part of my problems reading the test were probably due to the tabSeprated
field, which was the more obvious problem we both missed!
3f1060c
to
c2d206c
Compare
Excel export format for CSV is based on the document locale. For some locales such as French/France, semicolon is used as the delimiter.
c2d206c
to
d176628
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to merge if you're happy with my tweak
Oh dear. Thank you and I’m sorry. 🤪 |
Closes #616
Replaces #628
What has been done to verify that this works as intended?
Added tests for both comma-separated and semicolon-separated external secondary instances that have quotes, missing values, etc. I've also tried this in Collect with https://github.com/lognaturel/collect/tree/jr-616
Why is this the best possible solution? Were any other approaches considered?
Using an external library means we don't have to think through all the cases.
commons-csv
is broadly used. We had to use v1.4 or prior in order to support a minsdk of 21. If I'd thought of it sooner, I would have suggested we look intoopencsv
which Collect uses but I don't think it's a big deal that the libs aren't the same.Because Excel exports CSVs with semicolons for certain locales, I think it's good to support that case. The best way I could come up with to do that is to check whether the header contains a semicolon.
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
This should only expand the kind of external secondary instance CSVs that can be successfully parsed. I can't think of a regression risk. Risk would be isolated to parsing external secondary CSVs.
Do we need any specific form for testing your changes? If so, please attach one.
See issue or use any form that uses a secondary instance such as https://github.com/getodk/javarosa/blob/3f1060ce34a2155856d0f819ce42451897e428a4/src/test/resources/org/javarosa/core/model/instance/external-secondary-comma-complex.csv
Does this change require updates to documentation? If so, please file an issue here and include the link below.
No.