Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve specification conformance of unicode bidi library-Initial step #19

Merged
merged 2 commits into from
Nov 13, 2015
Merged

Improve specification conformance of unicode bidi library-Initial step #19

merged 2 commits into from
Nov 13, 2015

Conversation

moharnab123saikia
Copy link

Files changed: src/lib.rs and tools/generate.py

•added code to tools/generate.py to download the files: http://www.unicode.org/Public/UNIDATA/BidiTest.txt and http://www.unicode.org/Public/UNIDATA/BidiCharacterTest.txt
•wrote manual rust tests in lib.rs that can be run automatically

Review on Reviewable

@eefriedman
Copy link
Contributor

Please squash the commits.

Please remove the extra lib.rs~.

Stray tab in generate.py.

for class in rem_classes etc. in test_removed_by_x9.

In Rust code, space before function opening brace, and no blank lines after the opening brace or before the closing brace.

@metajack
Copy link

metajack commented Nov 3, 2015

@eefriedman can you re-review? Or should @mbrubeck review?

@eefriedman
Copy link
Contributor

@metajack Re-review what? My initial review comments still weren't addressed.

@moharnab123saikia We can't merge a PR with failing tests; change the test to reflect the current state of the code, and add a comment like // TODO(#999): This should equal "foo" (replace 999 with the issue number for the issue you filed).

@metajack
Copy link

metajack commented Nov 3, 2015

@eefriedman Sorry. I assumed he had addressed them since there were new commits.

@samruds1
Copy link

samruds1 commented Nov 8, 2015

@eefriedman - we have made the changes. Can you please review the changes?

@eefriedman
Copy link
Contributor

Review status: 0 of 5 files reviewed at latest revision, 8 unresolved discussions.


src/lib.rs, line 980 [r8] (raw file):
Needs some sort of note about why the test is commented out.


src/lib.rs, line 1003 [r8] (raw file):
Missing newline.


src/lib.rs, line 1009 [r8] (raw file):
for x in rem_classes.


src/lib.rs, line 1012 [r8] (raw file):
for x in not_classes


src/lib.rs, line 1019 [r8] (raw file):
Space after commas.


src/lib.rs, line 1020 [r8] (raw file):
for x in non_x9_classes.


src/lib.rs, line 1021 [r8] (raw file):
Space after comma.


tools/generate.py, line 199 [r8] (raw file):
Indentation, and please limit comments to 100 columns where possible.


Comments from the review on Reviewable.io

@samruds1
Copy link

samruds1 commented Nov 9, 2015

@eefriedman Please review.

@eefriedman
Copy link
Contributor

Review status: 0 of 5 files reviewed at latest revision, 5 unresolved discussions.


src/lib.rs, line 980 [r9] (raw file):
Indentation. (Tab?)


src/lib.rs, line 1013 [r9] (raw file):
Space before opening brace (same for other loops).


tools/generate.py, line 196 [r9] (raw file):
Are you sure reindenting the if statement is correct? It changes the semantics...


tools/generate.py, line 198 [r9] (raw file):
Trailing whitespace


Comments from the review on Reviewable.io

@eefriedman
Copy link
Contributor

@mbrubeck Do we want the testcase files in this repository?

@vicky-katara
Copy link
Contributor

@eefriedman, we have made the requested changes. Please let us know if you need any other changes. The only change that we have left out is the commented test case. This test case was failing and should get handled in the next steps. We have left it commented there for reference, and will remove the steps when the further steps have been completed.

@samruds1
Copy link

@eefriedman, @mbrubeck - Do we have to incorporate anymore changes? Can you please review.

@eefriedman
Copy link
Contributor

I'm waiting for an answer to #19 (comment) ; looks fine otherwise.

@mbrubeck
Copy link
Contributor

Thanks! The new tests look good.

Since the .txt files aren't used yet, let's leave them out of the repo (but keep the code to fetch them). Once we have some code to process the files, we can decide whether it makes sense to check them in.


Reviewed 3 of 4 files at r1, 1 of 1 files at r10, 1 of 1 files at r11.
Review status: all files reviewed at latest revision, 6 unresolved discussions.


src/BidiCharacterTest.txt, line 0 [r11] (raw file):
I think we should leave these files out of the repo at least for now.


Comments from the review on Reviewable.io

@samruds1
Copy link

@mbrubeck - I have commented out the code and deleted those two files from the repository. Please review the changes.

@eefriedman
Copy link
Contributor

Please fix the commit message (it currently contains a bunch of useless text from squashing).

@moharnab123saikia
Copy link
Author

@ https://github.com/eefriedmaneefriedman https://github.com/eefriedman -
we have deleted the garbage messages and only kept the details about the
changes we made. Please review.

Thanks and Regards,
Moharnab Saikia

Please fix the commit message (it currently contains a bunch of useless
text from squashing).


Reply to this email directly or view it on GitHub
#19 (comment).

added test is_ltr

Added test is_rtl

added test removed_by_X9

added test not_removed_by_x9()

Added test case to test_reorder_line()
@eefriedman
Copy link
Contributor

It looks like the new version isn't rebased correctly (GitHub shows two commits).

@vicky-katara
Copy link
Contributor

@eefriedman, we only see one commit from our side which is 'unicode-bidi/ 2a99bca'. Do you want us to squash our commit into your commit that is 'unicode-bidi/ 69e42af'

@eefriedman
Copy link
Contributor

You branch contains commit 69e42af, which doesn't exist on master (the equivalent commit on master is a202520).

@samruds1
Copy link

@eefriedman - We are planning to reset and rebase our commits again. Maybe that will solve the issue.

@samruds1
Copy link

@eefriedman - When I clone the unicode-bidi repository (with none of our changes added) and run the tests defined in lib.rs, a folder called target is getting created. After I add our changes, should I let the folder remain in the repository and push the changes?

@mbrubeck
Copy link
Contributor

target is where Cargo puts compiled code and other files it generates. Git is configured to ignore this directory (using the .gitignore file), so you can leave it in place as you commit and push changes.

@samruds1
Copy link

@mbrubeck - Thanks for the clarification. Can you please review the changes.

@vicky-katara
Copy link
Contributor

@mbrubeck @eefriedman , we wanted some clarification.

  • How are generate.py and Cargo linked?
  • Should 'cargo test' call generate.py somehow?
  • What should be the location of 'UnicodeData.txt' and 'ReadMe.txt'?

@eefriedman
Copy link
Contributor

Currently, generate.py and Cargo aren't linked; generate.py modifies the source code. We could change this so generate.py generates a build artifact instead, I guess... but that isn't obviously worthwhile.

Note that you wouldn't want the build process to fetch data from unicode.org: that would make the resulting binary unpredictable.

What should be the location of 'UnicodeData.txt' and 'ReadMe.txt'?

Currently, nowhere; the fact that the current generate.py saves the files into the current working directory is a bug.

If we wanted to add the files to the repository, we could add a unicode/ directory, I guess, and add a separate script to update those files. That said, I'm not sure that's necessary.

@mbrubeck
Copy link
Contributor

How are generate.py and Cargo linked?

Currently, generate.py is run manually, and its output (src/tables.rs) is checked in to the git repository. This way, we only need to run generate.py when upgrading to a new version of Unicode, which happens relatively infrequently.

Should 'cargo test' call generate.py somehow?

I suggest we do the same thing we do for the non-test code: Run generate.py manually, and check in all the files needed for running tests (either the .txt files, or generated Rust files that contain the same data). This has the disadvantage of making the repository larger, but it should make our test process more reproducible (since it doesn't depend on network resources outside our control).

What should be the location of 'UnicodeData.txt' and 'ReadMe.txt'?

src/UnicodeData.txt and src/ReadMe.txt are temporary files used only by generate.py and should not be checked in to the repository. They are listed in the .gitignore file.

@vicky-katara
Copy link
Contributor

@eefriedman and @mbrubeck, thanks for the detailed replies.

As far as we see, all are changes are in place. Do you see any more changes? Or are we good to go on the merge?

@mbrubeck
Copy link
Contributor

This looks good to me. Thanks!


Reviewed 3 of 4 files at r12.
Review status: all files reviewed at latest revision, 6 unresolved discussions.


Comments from the review on Reviewable.io

@mbrubeck
Copy link
Contributor

Oops, there's still a duplicate commit in there (accidentally created in a rebase?) but it doesn't seem to cause any issues so I'll just let this go through.

@mbrubeck
Copy link
Contributor

@bors-servo r+

@bors-servo
Copy link
Contributor

📌 Commit 2a99bca has been approved by mbrubeck

@bors-servo
Copy link
Contributor

⌛ Testing commit 2a99bca with merge 06dedd1...

bors-servo pushed a commit that referenced this pull request Nov 13, 2015
Improve specification conformance of unicode bidi library-Initial step

Files changed: src/lib.rs and tools/generate.py

•added code to  tools/generate.py  to download the files: http://www.unicode.org/Public/UNIDATA/BidiTest.txt and http://www.unicode.org/Public/UNIDATA/BidiCharacterTest.txt
•wrote manual rust tests in lib.rs that can be run automatically

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/unicode-bidi/19)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - travis

@bors-servo bors-servo merged commit 2a99bca into servo:master Nov 13, 2015
@vicky-katara
Copy link
Contributor

Hi @mbrubeck , @eefriedman

We have the following queries about the subsequent steps of our project as mentioned here: https://github.com/servo/servo/wiki/Improve-specification-conformance-of-unicode-bidi-library

  • How do we interpret test cases mentioned in BidiTest.txt?
    For example: @levels: x
    @reorder:
    LRE; 7
    LRO; 7
    RLE; 7
    RLO; 7
    PDF; 7
    BN; 7
            #Count: 6

What function from lib.rs should we call to create test cases present in BidiTest.txt.

  • What function from lib.rs should we call to create test cases present in BidiTest.txt? Is it reorder-line?
  • What do 'passes' specifically mean in context of "applies steps W1-W7 in a single pass". Are they character by character scans of the input line? Or are they procedure calls of resolve_weak?

@eefriedman
Copy link
Contributor

There's some basic documentation for the meaning of the bidi tests at the top of BidiTest.txt. Each test line consists of a list of bidi character classes. For each test line (lines without a starting # or @), two things should get checked: the level of each character (essentially the levels array returned from process_text()) and the visual order of the resulting text (essentially the result of reorder_line()). The relevant level and ordering are specified by the @levels and @reorder lines. (The very first test in the file is kind of confusing to look at because it's a test which doesn't actually contain any text.)


I'm pretty sure https://github.com/servo/unicode-bidi/blob/master/src/lib.rs#L687 is referring to the fact that the function only examines each character in the string once; it has nothing to do with the callers.

@mbrubeck mbrubeck mentioned this pull request Nov 30, 2015
@samruds1
Copy link

samruds1 commented Dec 1, 2015

@eefriedman Thanks for the explanation. I am unable to understand what the number 7 denotes in the test cases given below:
L LRE R R; 7
L; 7
L LRE; 7
Can you please help me out.

@eefriedman
Copy link
Contributor

# A data line has the following format:
# <input> ; <bitset>
#   <input>  =      An ordered list of BIDI property values
#   <bitset> =      A hex bitset for paragraph levels (P): 1 = auto-LTR, 2 = LTR, 4 = RTL
#                   Auto-LTR (standard BIDI) uses the first L/R/AL character, and is LTR if none is found.

It's a bitset, so 7 means 1|2|4, i.e. the test applies in contexts with an automatic paragraph level, a forced LTR paragraph level, and a forced RTL paragraph level; see http://unicode.org/reports/tr9/#The_Paragraph_Level .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants