Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tx] Replace UFO .glif file parsing with libxml2 #1556

Merged
merged 46 commits into from
Oct 12, 2022
Merged

Conversation

kaydeearts
Copy link
Collaborator

@kaydeearts kaydeearts commented Aug 31, 2022

Description

This PR replaces UFO .glif file parsing code with libxml2 #1522 as part of the tx parser replacement with libxml2 project. It replaces quite a few parsing functions as well as adds some.

I'll be adding tests soon as part of this PR.

Checklist:

  • I have followed the Contribution Guidelines
  • I have added test code and data to prove that my code functions correctly
  • I have verified that new and existing tests pass locally with my changes
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

@lgtm-com
Copy link

lgtm-com bot commented Aug 31, 2022

This pull request introduces 4 alerts when merging 5aa50b1 into adb9656 - view on LGTM.com

new alerts:

  • 3 for Missing return statement
  • 1 for Wrong type of arguments to formatting function

@kaydeearts
Copy link
Collaborator Author

kaydeearts commented Sep 1, 2022

@josh-hadley @skef this is marked as a draft while I add more tests to the PR, but if you'd like to, you can start reviewing whenever available!

Note 1: I'll fix the lgtm alerts soon!

Note 2: I think there will be some refactoring needed in the future, perhaps not necessarily in this PR— I don't like that I had to keep adding global state tracking variables defined at the top of uforead. I would like to make this cleaner. I will still need to track which part of the UFO is being parsed, but I'll manage this better. Along with this, I'm planning on splitting the setFontDictKey function into a function for each part of the UFO, rather than have all of those keys parsed in one function. I may do all of this refactoring in a different PR, but am open to feedback on this anytime!

Thanks 😄

c/shared/source/uforead/uforead.c Outdated Show resolved Hide resolved
c/shared/source/uforead/uforead.c Outdated Show resolved Hide resolved
c/shared/source/uforead/uforead.c Outdated Show resolved Hide resolved
c/shared/source/uforead/uforead.c Outdated Show resolved Hide resolved
c/shared/source/uforead/uforead.c Outdated Show resolved Hide resolved
c/shared/source/uforead/uforead.c Show resolved Hide resolved
Copy link
Collaborator

@josh-hadley josh-hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor twiddles

c/shared/source/uforead/uforead.c Outdated Show resolved Hide resolved
tests/makeotf_test.py Show resolved Hide resolved
@lgtm-com
Copy link

lgtm-com bot commented Sep 24, 2022

This pull request introduces 2 alerts when merging efa4bb7 into f0e7023 - view on LGTM.com

new alerts:

  • 2 for Declaration hides parameter

@kaydeearts kaydeearts force-pushed the kd-libxml2-glyphs branch 6 times, most recently from 3b7fe6f to c895138 Compare September 29, 2022 01:16
@kaydeearts kaydeearts marked this pull request as ready for review September 29, 2022 02:12
@@ -271,6 +282,14 @@ struct ufoCtx_ {
ctlStreamCallbacks stm;
} cb;
dnaCtx dna;
struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a request for a change, but generally with these state structs you can rely on the struct name being part of the string. So no need to include "parsing" in the field name when the struct name is "parseState".

const char* filetype = "glyph";
int char_begin = 0;
int char_end = 0;
unsigned long *unicode = memNew(h, sizeof(unsigned long));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see a need for allocating this as opposed to declaring unicode as unsigned long and passing &unicode where needed. And that's certainly better than leaking the data, which I think is what is currently happening.

return cur;
}

static bool strEqual(char* string1, char* string2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not declaring these string arguments as const char * will lead to many compiler warnings.

if ((!xmlStrEqual((cur)->name, (const xmlChar *) "dict"))) {
xmlFreeDoc(doc);
fatal(h, ufoErrSrcStream, "Error reading outermost <dict> in %s.\n", filename);
if (!strcmp("plist", filetype)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing strEqual() should we use it here too?

if (h->parseState.UFOFile == preParsingGLIF) { /* when called from preParseGLIF */
if (strEqual(keyName, "com.adobe.type.cid.CID")) {
keyValue = parseXMLKeyValue(h, cur);
h->parseState.GLIFInfo.currentCID = atoi(keyValue);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to pass NULL to atoi()? We advanced cur above so it could be NULL (?) and therefore the value passed back from parseXMLKeyValue() could be NULL. (Same question WRT next code section below.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I just created a test case for this and it crashes. Would you suggest I add the NULL check to right before setting h->parseState.GLIFInfo.currentCID, or should I return a value like -1 instead of NULL in parseXMLKeyValue so that it fixes this scenario for other callers?

I also realized that if the value of com.adobe.type.cid.CID is a blank value (like the example below) then com.adobe.type.cid.CID is set to 0. I feel like it also probably needs to remain -1 (which is currently what it starts with when unset).

      <key>com.adobe.type.cid.CID</key>
      <integer></integer>

I'll look for more places where NULL may be returned from parseXMLKeyValue and the other functions. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think being more explicit with NULL checks is probably better for this. Returning "fake" values can eventually result in those values being used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Although I'd have to look in more detail to give you a conclusive answer ...)

parseType1HintDataV2(h, cur->children);
}
}
cur = cur->next;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could cur be NULL here because we advanced it once already?

@lgtm-com
Copy link

lgtm-com bot commented Oct 6, 2022

This pull request introduces 4 alerts when merging de56e35 into 3920d69 - view on LGTM.com

new alerts:

  • 4 for Missing return statement

@@ -1682,32 +1681,39 @@ static char* getXmlAttrValue(xmlAttr *attr){
return (char*) attr->children->content;
}

static void parseXMLLib(ufoCtx h, xmlNodePtr cur) {
static bool setXMLLib(ufoCtx h, xmlNodePtr cur, char* keyName) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've declared this bool, presumably to track whether you ran into some problem parsing (given the check on the call), but you're not returning anything.

} else if (h->parseState.UFOFile == parsingGLIF) { /* when called from parseGLIF */
if (strEqual(keyName, "com.adobe.type.autohint.v2")) {
h->parseState.type1HintDataV2 = true;
if (cur != NULL)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cur is NULL, do we want to just ignore the fact there was a recognized key but no value here? Seems like that should be a warning at least.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, we've been aiming to reduce the amount of warnings until we add the verbosity flag option. However, I'm curious what @frankrolf thinks about this specifically. Frank, if within a UFO's .glif file's there is an empty com.adobe.type.autohint.v2, would it be helpful to the user to receive a warning?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experimental repo branch (not yet copied back from my machine) has already pulled all the C-code-level logging messages into a single framework with multiple "levels". So after we start working with that code it will be easier to unify the logging across all of our tools.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! So we can keep the warning messages for when we work with your changes

static bool setType1HintDataV2(ufoCtx h, char* keyName, HintMask* curHintMask, xmlNodePtr cur) {
char* keyValue;
if (strEqual(keyName, "hintSetList")) {
parseHintSetListV2(h, cur);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of the uses of cur in this method NULL-safe? Seems like since you parsed a key at the level above already, it should always have a value, it would be easier to check whether cur is NULL above, deal with that condition there, and always be sure cur is not NULL here.

@kaydeearts
Copy link
Collaborator Author

@skef @josh-hadley This PR is up to date with the latest changes you both requested, and I've added more tests covering edge cases & etc. Let me know if you have any more thoughts on this!

Copy link
Collaborator

@skef skef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we should nail down what the return value of stm_xml_read represents, as it's sort of being used as a weird boolean right now, but I don't see that as a reason to hold things up.

@kaydeearts kaydeearts merged commit 775c140 into develop Oct 12, 2022
@kaydeearts kaydeearts deleted the kd-libxml2-glyphs branch October 12, 2022 00:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants