VAL_ parsing error #61
Hello @seobilee, you are right. Unfortunately, at the moment the library relies on the file being formatted in a specific way: we expect a command to start at the beginning of a line. Multiline support is present, but only for some commands (e.g. comments).
Unfortunately, this would not be an improvement but a complete rewrite of the parsers. We contribute to this project in our spare time, so I cannot promise this rewrite will happen anytime soon. To be fair, it would be the right thing to do, but it would take a lot of time. In the vast majority of cases, line-by-line parsing works fine. Bottom line: thanks for spotting this. We'll keep you posted, but realistically nothing will happen in the short term. Regards |
Hello @Adhara3, I want to express my gratitude for your prompt response. In light of your guidance, I will make the necessary adjustments by writing the dbc file in one line to address the issue. Additionally, your information regarding I also wanted to inform you that, due to the circumstances mentioned, we are presently utilizing tag version 1.2.0. I am eager to contribute to a resolution code-wise and will explore opportunities to do so. Once again, thank you for your assistance. Warm regards |
Hello @seobilee, sorry, could you please elaborate on why you are using version 1.2.0? Is it related to the issue above? Regards |
Hello @Adhara3 Yes, the observed issue is indeed connected. There has been a modification in the parser's behavior, starting from version 1.3.0 or later. The change primarily involves parsing using regular expressions. In version 1.2.0, a line is parsed even if it does not end with ";".
To illustrate, consider the following example:
In version 1.2.0, only the first line is parsed as a value table, and the second line is omitted. Later versions, however, cannot parse the above example at all, and there are some additional minor issues, such as quotes (") being included in the values of the value table dictionary. Regards |
Hi @seobilee, you are right, since version Regards |
I did a quick check on whether general multiline support could be added to the parser (starting with the VAL_ parser) without completely changing it and with minimal effort, although probably not as an amazing solution either. I came up with the following (only a proof of concept):
private static readonly IReadOnlyCollection<string> keywords = new List<string> { "SG_", "CM_", "BO_", "BA_", "VAL_" }; // Not a complete list, just a mock
private static string TryparseNextLines(string currentLine, IParseFailureObserver observer, INextLineProvider nextLineProvider)
{
    var stringBuilder = new StringBuilder();
    stringBuilder.AppendLine(currentLine);
    while (nextLineProvider.TryGetLine(out var nextLine))
    {
        var cleanLine = nextLine.Trim(' ');
        // An empty line ends the current definition.
        if (cleanLine == string.Empty)
        {
            observer.CurrentLine++;
            break;
        }
        // A keyword marks the start of the next definition.
        if (keywords.Any(keyword => cleanLine.StartsWith(keyword)))
        {
            break;
        }
        observer.CurrentLine++;
        stringBuilder.Append(" " + cleanLine);
        // A trailing ";" closes the definition.
        if (cleanLine.EndsWith(";"))
        {
            break;
        }
    }
    // Note: AppendLine uses Environment.NewLine, so this assumes "\r\n" line endings.
    return stringBuilder.ToString().Replace("\r\n", string.Empty) + "\r\n";
}
This reads lines until it sees an empty line, a line ending with ";", or a line that starts the next definition (marked by a keyword). If you then use
if (!cleanLine.EndsWith(";"))
{
    cleanLine = TryparseNextLines(cleanLine, m_observer, nextLineProvider);
}
before the actual regex match in the ValueTableLineParser, it combines the two rows into one for parsing. This worked for me in some tests; however, I noticed that:
@Whitehouse112 could you check on 1 and 2 when you find time? |
So I looked into this further. There is actually a test case in ValueTableLineParserTests.cs that checks exactly that this does not get parsed: I don't know why this test case is there, as the syntax is valid and is only missing a final " ", which in my opinion is not required. This is the parsing string: You can fix that by simply replacing it with \s*, and that would only fail the one unit test listed above, which in my opinion is wrong. All other tests are successful. I would change the test case to something like this, which in my opinion is a bit clearer than relying on the strict mock behavior:
public void ValueTableSyntaxErrorIsObserved(string line)
{
    var observerMock = m_repository.Create<IParseFailureObserver>();
    var dbcBuilderMock = m_repository.Create<IDbcBuilder>();
    var nextLineProviderMock = m_repository.Create<INextLineProvider>();
    // Count how often the syntax error is reported instead of relying on strict mocks.
    int counter = 0;
    observerMock.Setup(o => o.ValueTableSyntaxError())
        .Callback(() => counter++);
    var lineParser = new ValueTableLineParser(observerMock.Object);
    lineParser.TryParse(line, dbcBuilderMock.Object, nextLineProviderMock.Object);
    Assert.That(counter, Is.EqualTo(1));
}
This part of the issue could be solved. The other part, the multiline support, is just a matter of whether the ";" is actually required: if it is, use the same code as in the comment multiline handling; if it isn't, use the code I posted before. @Adhara3, @Whitehouse112 or @EFeru, could one of you check what I just posted about the test case actually being valid? In that case I could create a fix for that and change the test case (move it to the valid test cases and use \s* instead). |
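A minimal sketch of the `\s*` idea, written as a C# top-level program. The pattern and the `IsValLine` helper are made up for illustration; this is a simplified stand-in, not the regex actually used by the library:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, simplified VAL_ pattern (NOT the regex used by DbcParserLib).
// The "\s*;" near the end accepts zero or more whitespace characters before ';'.
const string ValPattern =
    @"^VAL_\s+(?<id>\d+)\s+(?<signal>\w+)\s+(?<table>(?:-?\d+\s+""[^""]*""\s*)+)\s*;\s*$";

bool IsValLine(string line) => Regex.IsMatch(line, ValPattern);

Console.WriteLine(IsValLine("VAL_ 470 Gear 0 \"Park\" 1 \"Drive\" ;")); // True: space before ';'
Console.WriteLine(IsValLine("VAL_ 470 Gear 0 \"Park\" 1 \"Drive\";"));  // True: no space before ';'
Console.WriteLine(IsValLine("VAL_ 470 Gear 0 \"Park\" 1 \"Drive\""));   // False: ';' is missing
```

With a mandatory whitespace (`\s+` or a literal space) in front of the `;`, the second line would be rejected, which seems to be the behavior the existing unit test pins down.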
Ok, let's stop a second here.
VAL_ parsing without space before ";"
This is a bug in the regex due to the fact that
Multiline support
History
In the past we allowed multiline support in an awkward way, but for a very specific reason and in a very specific scenario:
This comment is somehow formatted: the spacing and the newlines have a meaning for the user, to help visualize.
Other examples
Current issue is about a TAG (
Note that
The first
NB: Also note that both the above examples
Even more
What about this, a single line containing 2 separate definitions?
We currently skip the second one; both CANdb++ and Kvaser parse both correctly.
Bottom line
DBCParserLib currently relies on some sort of line-based formatting that is respected in most of the examples we found on the net, including the ones we use for unit tests. Moreover, the spacing/newline formatting is also applied by CANdb++ when it saves a dbc file. We are basically assuming that there is value in the file formatting. The doc is not clear on this, and even CANdb++ and Kvaser behave differently. So the current issue spots a potential parsing issue that is nevertheless coherent with the current library approach, which requires a dbc file to be gracefully formatted. Good or bad, this is currently a requirement for us. So to me this issue is currently a won't fix. If we want to support unformatted dbc files, then we need to (hopefully partially) rewrite the parsing approach so that it no longer treats a line as something valuable, meaningful, or a complete/single item. Cheers |
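To make the "line is not a meaningful unit" idea concrete, here is a rough character-based sketch as a C# top-level program. `SplitStatements` is a hypothetical helper, not part of the library, and it rests on two assumptions: the statements of interest end with `;` (which holds for VAL_ and CM_ but not for BO_ or SG_), and quotes inside strings are not escaped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Splits raw DBC text into ';'-terminated statements, ignoring ';' and
// line breaks that appear inside double-quoted strings.
// Assumptions: statements end with ';' and quotes inside strings are not escaped.
List<string> SplitStatements(string text)
{
    var statements = new List<string>();
    var current = new StringBuilder();
    bool insideQuotes = false;

    foreach (char c in text)
    {
        if (c == '"')
            insideQuotes = !insideQuotes;

        // Outside a string, a line break is only formatting: treat it as a space.
        bool isLineBreak = c == '\r' || c == '\n';
        current.Append(isLineBreak && !insideQuotes ? ' ' : c);

        if (c == ';' && !insideQuotes)
        {
            statements.Add(current.ToString().Trim());
            current.Clear();
        }
    }

    // Keep any trailing text that was not terminated by ';'.
    string rest = current.ToString().Trim();
    if (rest.Length > 0)
        statements.Add(rest);

    return statements;
}

// One definition spread over two lines, followed by a second one on the same line.
string sample =
    "VAL_ 100 Mode 0 \"Off\"\n 1 \"On\" ; VAL_ 100 State 0 \"Idle; waiting\" ;";

foreach (string statement in SplitStatements(sample))
    Console.WriteLine(statement);
```

The sample yields two statements: the multiline definition is joined, the second definition on the same line is kept, and the `;` inside `"Idle; waiting"` does not split anything. Line breaks inside quoted strings are preserved, so formatted comments would keep their layout.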
…dded and updated some tests
Hi @Uight,
Regarding ";": it's a required final char as stated in vector dbc file format documentation |
#61 updated VAL_ parsing regex to correctly handle last chars. Added …
@Adhara3 I would think that Vector mostly scans for the keywords, a bit like what I tried in my comment from 2 weeks ago, but char based rather than line based. |
This is 13-year-old code lying around in one of the old test programs at my company. I adjusted it for .NET and to read the two dbc files that are in this repo.
string fileContent = File.ReadAllText(filePath);
string[] keywords = new[] {
"VERSION",
"FILTER",
"NS_DESC_",
"NS_",
"CM_",
"BA_DEF_DEF_REL_",
"BA_DEF_REL_",
"BA_REL_",
"BA_DEF_SGTYPE_",
"BA_SGTYPE_",
"BA_DEF_DEF_",
"BA_DEF_",
"BA_",
"CAT_DEF_",
"CAT_",
"SGTYPE_VAL_",
"SGTYPE_",
"SIGTYPE_VALTYPE_",
"VAL_TABLE_",
"VAL_",
"SIG_GROUP_",
"SIG_VALTYPE_",
"SIG_TYPE_REF_",
"EV_DATA_",
"ENVVAR_DATA_",
"BO_TX_BU_",
"BO_",
"BU_SG_REL_",
"BU_EV_REL_",
"BU_BO_REL_",
"BU_",
"SG_MUL_VAL_",
"SG_",
"BS_",
};
// Zero-width lookahead: split in front of every keyword without consuming it.
string pattern = $@"(?={string.Join("|", keywords.Select(Regex.Escape))})";
string[] parts = Regex.Split(fileContent, pattern, RegexOptions.IgnoreCase);
List<string> resultLines = new List<string>();
foreach (string part in parts)
{
    resultLines.Add(part.Trim());
}
I don't like that it reads the whole file in one go, but with the PCs that run modern .NET I would not expect an issue even with big files. In fact, I tried it with a 2.5 MB file that I'm not allowed to share, and it takes 1.8 seconds just to read the lines. It seems to take around 1 s per MB, but it's not fully linear. For the Tesla dbc it takes 50 ms. Another thing I don't really like is that you have to order the keywords. If this could be done with a file stream instead, I think it would be a good option. You can then parse the "lines" with the current parsers. Another thing is stuff like this: |
There are tons of subtle problems with this approach, like the Loading the whole file should not be a problem, but you would lose the ability to work over a network. We switched to A |
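One of the subtle problems with a keyword-lookahead split can be illustrated with a small sketch (C# top-level program; `SplitOnKeywords` is a hypothetical helper, not library code). A plain lookahead also fires in the middle of longer identifiers, e.g. on the `VAL_` inside `SG_MUL_VAL_`. The variant below only splits where the keyword starts a token and is followed by whitespace, which also removes the need to order the keywords:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Splits text in front of every keyword, but only where the keyword starts a
// token: (?<!\w) rejects positions in the middle of a longer identifier such
// as "SG_MUL_VAL_", and the trailing \s requires whitespace after the keyword.
// Limitation: keywords that appear inside quoted comment text are still split.
string[] SplitOnKeywords(string text, string[] keywords)
{
    string alternatives = string.Join("|", keywords.Select(Regex.Escape));
    string pattern = $@"(?<!\w)(?=(?:{alternatives})\s)";
    return Regex.Split(text, pattern)
        .Select(part => part.Trim())
        .Where(part => part.Length > 0)
        .ToArray();
}

// Deliberately unordered: short keywords come before the longer ones.
string[] sampleKeywords = { "BO_", "SG_", "VAL_", "SG_MUL_VAL_" };
string sampleText =
    "BO_ 100 Msg: 8 Node SG_MUL_VAL_ 100 A B 1-1; VAL_ 100 A 0 \"Off\" ;";

foreach (string piece in SplitOnKeywords(sampleText, sampleKeywords))
    Console.WriteLine(piece);
```

This is still not a real tokenizer: a comment whose text contains something like ` VAL_ ` would be cut in two, so a quote-aware scan would be needed for a general solution.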
Closing as won't fix now |
@Adhara3 I would still consider making some improvements to the code for this. I did some work in https://github.com/Uight/DbcParser/tree/TestBranchMultiLineSupport. There I moved the m_observer.CurrentLine++ logic to the NextLineProvider and generally added some things there. Moving the trim and the m_observer.CurrentLine++ logic to the NextLineProvider is a cleaner solution than what we have now, and I would still like to see whether we could adjust the NextLineProvider in a similar way to what I did in my branch (which is partly based on this: https://stackoverflow.com/questions/842465/reading-a-line-from-a-streamreader-without-consuming). If using virtual lines like in my code, this would actually be 1 line containing 2 virtual lines. This could be parsed by peeking at the next line and checking whether it starts with a definition identifier. This relies on the assumption that, in the EBNF, a definition identifier cannot appear at the start of a line unless it starts a definition (at least that's what I think). The case below shows two things I'm not sure about: CM_ EV_ envVarData "We would like to format this comment, with a sort of bulletlist:
Ok, done!"; |
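The peek-and-join idea could look roughly like the sketch below (C# top-level program). `JoinContinuationLines` and `StartsWithKeyword` are hypothetical helpers working on an in-memory list, not the INextLineProvider from the library or the code in the branch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Joins physical lines into logical lines: a new logical line starts whenever
// the next (peeked) line begins with a definition keyword; any other line is
// treated as a continuation of the current one.
List<string> JoinContinuationLines(IReadOnlyList<string> lines, string[] keywords)
{
    var logicalLines = new List<string>();
    int i = 0;
    while (i < lines.Count)
    {
        string current = lines[i].Trim();
        i++;

        // Peek at lines[i]; it is only consumed if it is a continuation line.
        while (i < lines.Count && !StartsWithKeyword(lines[i], keywords))
        {
            string next = lines[i].Trim();
            if (next.Length > 0)
                current += " " + next;
            i++;
        }

        if (current.Length > 0)
            logicalLines.Add(current);
    }
    return logicalLines;
}

// A line counts as a definition start if it begins with "<keyword> ".
bool StartsWithKeyword(string line, string[] keywords) =>
    keywords.Any(k => line.TrimStart().StartsWith(k + " ", StringComparison.Ordinal));

string[] input =
{
    "VAL_ 100 Mode 0 \"Off\"",
    "  1 \"On\" ;",
    "",
    "BO_ 200 Msg: 8 Node",
};
string[] sampleKeywords = { "VAL_", "BO_", "SG_", "CM_" };

foreach (string logicalLine in JoinContinuationLines(input, sampleKeywords))
    Console.WriteLine(logicalLine);
```

Two caveats: joining with a space throws away the line breaks of a formatted comment such as the CM_ example above, and a comment line that happens to start with a keyword would be mistaken for a new definition.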
I personally would not mess the code for a partial solution. A |
@Adhara3 I believe I could make it work as a general solution. At least I like a challenge ;) https://github.com/Uight/DbcParser/tree/TestBranchMultiLineSupport I did some work in this test branch, moving multiline support fully into the NextLineProvider. I removed it from the comment parser. It also supports multiple definitions per line. Some tests fail at the moment:
And then there's this: Overall it would be easy to fix these problems (for the 2nd I would change the test case). As I also moved the string.Trim() calls to the NextLineProvider, you could remove them from all LineParsers or, better, replace them with something that replaces "\r\n" with " "; that way all parsers would have multiline support. In the comments, if you want to keep "\r\n" in there, just don't remove them. @Whitehouse112 and @Adhara3, |
@Uight I have no problem with you fixing this. My concern is the timeline, milestones, small incremental releases and, above all, having the time to manage/review/organize changes. We now have: ext mux, immutability (with API changes), advanced reading, several fixes, multiline support and probably even more that I do not remember. That is huge. All of that is great stuff, but I can't handle all this burden alone, this being a spare-time collaboration. So please, slow down. Can we agree on this? |
@Adhara3 That's all fine with me. You don't have to review changes right away, and I would also not have a problem with having to merge my stuff a few times. My timeline for needing the message stuff is probably end of this year or early next year anyway. The other stuff I just do to get more into programming again, and it's always interesting to build things on codebases you're not used to. Anyway:
All of that gets parsed correctly. The string for the value table code is currently checked against: notice that the "\r\n" is still in there. But then there are some really strange test cases (here in MessageLineParserTests.cs):
[Test]
public void OnlyPrefixIsIgnored()
{
    var dbcBuilderMock = m_repository.Create<IDbcBuilder>();
    var messageLineParser = CreateParser();
    var nextLineProviderMock = m_repository.Create<INextLineProvider>();
    Assert.That(messageLineParser.TryParse("BO_ ", dbcBuilderMock.Object, nextLineProviderMock.Object), Is.False);
}
I have no idea why this is not failing at the moment. The line clearly starts with BO_, so why is it not detected? As for the comment parsing, which I did first, I already changed it to this, which would be right in my opinion:
[Test]
// Should parse as it is a comment but should be observed as error
// This however would be caught previously by the IgnoreLineParser
public void OnlyPrefixIsIgnored()
{
    var dbcBuilderMock = m_repository.Create<IDbcBuilder>();
    var counter = 0;
    var failureObserverMock = new Mock<IParseFailureObserver>();
    failureObserverMock
        .Setup(observer => observer.CommentSyntaxError())
        .Callback(() => counter++);
    var commentLineParser = new CommentLineParser(failureObserverMock.Object);
    var nextLineProviderMock = m_repository.Create<INextLineProvider>();
    Assert.That(commentLineParser.TryParse("CM_ ", dbcBuilderMock.Object, nextLineProviderMock.Object), Is.True);
    Assert.That(counter, Is.EqualTo(1));
}
Edit: there's a cheeky space at the end of MessageLineStarter = "BO_ "; which causes the trimmed string to mismatch, as the final " " is missing. However, I think the test is still a bit flawed. More test cases might also be appropriate, and if you find time you can do a code review whenever you like. |
Just a note: do not trust Cheers |
@Adhara3 I wanted to use ReplaceLineEndings(), but I can't, as it's not available in .NET Standard and .NET Framework 4.6.2.
Then I wanted to use System.Environment.NewLine, but that would only work if the file was created on the same system. Still searching at the moment, also for a better way to do the next-line parse check; there must be something faster than that. By the way, I have now changed all tests and code and checked against more dbc files from: Edit:
// The set of newline characters was taken from the internals of the "String.ReplaceLineEndings" method.
private const string NewLineChars = "\r\f\u0085\u2028\u2029\n";
private static readonly string pattern = $"[{Regex.Escape(NewLineChars)}]+";
// Replaces every run of consecutive newline characters with a single space.
public static string ReplaceNewlinesWithSpace(this string input)
{
    // Would like to use "String.ReplaceLineEndings", but it's unavailable because of the target frameworks.
    // Feel free to optimize.
    return Regex.Replace(input, pattern, " ");
}
and added a test using \n only, which works. |
Got the multiline parser running with the "read to next definition" method. (WIP) Is it valid for a comment to have a line ending with ";"?
This would fail with the current parser but should be possible according to @Adhara3's comment where he stated And then there are some additional cases I didn't bother to implement (yet):
I assume this would be possible but very unlikely. (At the moment, in multiline mode I don't check for multiple definitions too.)
These cases could be handled, but are they even allowed?
Reading the first ; in the line and seeing that it's not preceded by a keyword, I assume the whole line is a comment. Another quirk of this solution is visible in this case:
This would be a parsing failure due to duplicated nodes in line one. However, the observer would observe |
From the docs, this is comment definition
So comment text is defined as a
So any
In my opinion, as already stated before, "support multiline" isn't the right definition of what we need. The right one would be "parse treating newlines and indentation as a human-only thing", which would imply supporting both multiline and multiple definitions per line (which is the other side of the coin). And when I say worthless, I know that every improvement is welcome, but if the user's pattern is:
then a whole core parser rewrite to have half of the problem fixed has a poor cost-benefit ratio to me. The current implementation is lacking a feature in a coherent way, which is bad but also good (the coherent part), so it could be listed among the known issues, knowing that the cases where this happens are quite few. A |
@Adhara3 I have it running in https://github.com/Uight/DbcParser/tree/TestBranchMultiLineSupport. I have no idea what the current performance is compared to the old code; I didn't check that. (I would guess bad, given the number of string comparisons; maybe improvable?) You should maybe check out the code at some point, but my opinion would be to probably not use it. I think you'll get it when you see the code. I mean, it's readable, but the NextLineProvider is a complex mess (it provides all the lines already put together for the other parsers to use). If we decide not to use it, we could move it to a stale branch or something, or just delete it. Maybe keep the test cases around in case someone wants to try it at some point. And then at that point remove it from 1.8. Edit: It's a bit better now ;) I removed some stuff that wasn't needed anymore. Another edit:
to:
This would allow cleaning up the line parsing even more, as you could Trim() and remove newline chars depending on the parser that is going to parse the line, meaning that each line parser can fully focus on parsing the line. |
Dear all, I updated the home page README to specifically address this issue, and there are ongoing discussions about how to improve parsing. I am closing this specific issue to avoid noise. |
Hello,
I wanted to bring to your attention an issue I recently discovered with the signal value table in some dbc files.
It appears that the contents of the dbc file are not parsing properly, and they seem somewhat weird.
The syntax of the dbc file does not necessarily consist of just one line, which seems to be causing problems with line-level parsing with Regex.
Here's an example:
I noticed that this dbc file is written over two lines, and it's actually being used. Unfortunately, this syntax is also parsed correctly in Vector CANdb++.
Could you please consider including improvements in this area in your plans? If not, should we provide guidance to writers on writing the dbc file accurately, line by line?
Your assistance in addressing this matter would be greatly appreciated.
Thank you for your time and attention.