-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dialect support #1060
Dialect support #1060
Conversation
edd3545
to
ec0c789
Compare
The tag field can be used as a general expression tag. In this branch only dialect component tag will be implemented. To keep 32-byte size on 64-bit machines, change the cost from double to float, and the type field to 8-bits.
Forced-push a rebase on #1061. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging.
and the dialect components are assigned costs as defined there. | ||
Names with values are dialect components and their values. A missing | ||
value after a ":" delimiter denotes a "very high" cost (to disable | ||
the related disjuncts). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you say ":" but in the example below you use an equals-sign bad-spelling=2.2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be fixed.
drinkin.#drinking-v: [[drinking.v]0.05]bad-spelling; | ||
drinkin'.#drinking-v: [[drinking.v]0.05]bad-spelling; | ||
runnin'.#running-v: [[running.v]0.05]bad-spelling; | ||
kidnappin'.#kidnapping-v: [[kidnapping.v]0.05]bad-spelling; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after the merge, I plan to change these and many others to "colloquial" instead of "bad-spelling"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also define a parent dialect that will include both, e.g.
[alt-spell]
bad-spelling
colloquial
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since bad-spelling
and colloquial
are the default, maybe instead:
[no-alt-spelling]
bad-spelling:
colloquial:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, don't get too fancy too quick. I'm already finding this combination confusing:
[default]
no-headline
[no-headline]
headline: 4
[headline]
headline: 0
I mean, I understand what it does, but I have to think about it first, and its confusing and for a while it seems broken or wrong somehow. it matches the spec, but ... it's confusing.
@@ -61,25 +62,27 @@ int size_of_expression(Exp * e) | |||
return size; | |||
} | |||
|
|||
Exp *copy_Exp(Exp *e, Pool_desc *Exp_pool) | |||
Exp *copy_Exp(Dictionary dict, Exp *e, Pool_desc *Exp_pool, Parse_Options opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that dict
isn't used anywhere; I assume that maybe an earlier version had the cost_table
in dict
instead of opts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed copy&paste from an earlier version. But I checked and that earlier version doesn't use this parameter too, and it is a copy&paste from even an earlier one...
The implemented API here is similar to my original proposal.
The code contains also an incomplete and undebugged implementation of the "many API functions" approach for user dialect definitions, but I abandoned this code path when I realized it would be a nightmare to actually use it in
link-parser
.The implemented
link-parser
UI is a string value assigned to the!dialect
variable, which is just a section in the dialect file4.0.dialect
, written as one line with commas instead of newlines.!help dialect
explains this format and provides examples (of course this explanation should be rewritten to use the correct terminology, and the examples should be corrected to real ones). Additional explanations are in the the dialect file. An additional major benefit of this "string" dialect interface is that if the dialect support of the library is enhanced, no changes are needed in client code (likelink-parser
) in order to use new features).Some variables names are left from my old code which implements "expression tags". I would like to rename them to the corresponding "dialect" terminology after we decide on it. The terminology in the dialect file and code comments may need changes too.
Dialect definitions in the dialect file may be hierarchical, when a dialect may contain sub-dialects. Dialect components (the expression tags that actually appear in the dictionary) are a kind of terminal sub-dialects in this hierarchy, and of course only them can be assigned costs. The dialect definition decoding code checks for loops.
The current default cost for dialect components is 10000 (it could be 10 too). For normal disjunct_cost values, it is just as infinity. The reason
FLT_MAX
is not used instead is that a value like 1000 is displayed nicely on debugging. It also has a "benefit" that all the dialects can be used at once by e.g. issuing!cost_max=10001
.I already have code improvement ideas, but I think it is better to wait to the final terminology decision and to a wider dialect support in the dict.
Edit: This PR implements the dialect idea from issue #402 and the bad-spelling dialect from issue #404.