benchmarking spike 2: QA Spec for evaluation processors #123
Could you please give me some feedback regarding the JSON output suggested above?
Thank you for your example of the workflow JSON. I have questions about the combination of different aspects that we discussed on the basis of the Workflow Tab Mockup. There we have a matrix view of the workflows, grouped by benchmark types (in our case now "metrics"?) and models. As far as I understand it, you would return an array of workflow objects like the one above, right? How would we achieve that matrix view then? By which attributes should we group the workflows? How do the crucial properties that you describe above play together with the matrix view?
As far as I understood the draft, users should be able to select different benchmark metrics according to their needs. Therefore I chose a rather flat nesting for the relevant metrics. My understanding was that the front end takes care of sorting and displaying the workflows. Does that shift too much work to the web app?
I can add the relevant findings of Nextflow to the JSON output we produce instead of a URL if that is what you mean. If not, could you elaborate?
Yeah, your array structure is fine. We also have a list view of workflows where the array response is useful.
But here I'm not quite sure. What we had in mind was to provide some filters; is that what you mean?
Yes, that would be good.
Yes.
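To make the matrix view question a bit more concrete, here is a minimal Python sketch (purely illustrative, not part of the spec) of how the front end could pivot the flat workflow array into a model × work matrix for one selected metric. The field names follow the JSON format agreed on further down in this thread; the choice of grouping keys is an assumption.

from collections import defaultdict

def build_matrix(workflows, metric="cer"):
    # Group results by (workflow_model, gt_data) and keep one metric per cell.
    # The grouping keys are an assumption; any work/document identifier would do.
    matrix = defaultdict(dict)
    for wf in workflows:
        model = wf["metadata"]["workflow_model"]
        work = wf["metadata"]["gt_data"]
        matrix[model][work] = wf["evaluations"]["document_wide"].get(metric)
    return dict(matrix)

if __name__ == "__main__":
    sample = [{
        "workflow_id": "wf1-data345-eval1",
        "metadata": {"workflow_model": "Fraktur_GT4HistOCR",
                     "gt_data": "https://gt.ocr-d.de/workspace/789"},
        "evaluations": {"document_wide": {"cer": 0.57, "wall_time": 1234}},
    }]
    print(build_matrix(sample, metric="cer"))

A metric filter then simply means calling build_matrix with a different metric key (e.g. "wall_time").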
I think we have to add two properties, the …
[
{
"workflow-id": "1",
"workflow_steps":
{
"0": "Processor A",
"1": "Processor B"
},
"ocrd-workspace": "https://some-url-pointing-to.a/mets.xml",
"properties":
{
"font": "antiqua",
"date-of-creation": "19. century",
"no-of-pages": "100",
"layout": "simple"
},
"wall_time": 1234,
"cer_total": "5.7",
"cer_per_page": "0.92",
"time_per_page_in_seconds": 15
}
]
Is this what you mean? Or do you want several JSON files, one for each representation?
It might be an idea to have individual arrays for different things, like:
{
"models": [
{
"id": 1,
"name": "Model A"
}
],
"works": [
{
"id": 1,
"name": "Work A"
}
],
"workflows": [
{
"id": 1,
// ...
}
]
}
Everything would be referenced by id. The workflow steps are an additional topic, I think, but yes, we also need them. For me they are not that critical for creating the main functionality of the front end; they just add some extra info about the workflow. More important, I would consider (if our plans haven't changed) basic rendering of the list and matrix views.
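As an illustration of the id-based referencing (a Python sketch, not part of the spec; the foreign-key names model_id and work_id are assumptions, since the workflow fields are elided in the example above), the front end could resolve the references like this:

def resolve_workflows(payload):
    # Build lookup tables from the normalized arrays and attach the referenced
    # model and work to every workflow entry.
    models = {m["id"]: m for m in payload["models"]}
    works = {w["id"]: w for w in payload["works"]}
    return [
        {**wf, "model": models.get(wf.get("model_id")), "work": works.get(wf.get("work_id"))}
        for wf in payload["workflows"]
    ]

if __name__ == "__main__":
    payload = {
        "models": [{"id": 1, "name": "Model A"}],
        "works": [{"id": 1, "name": "Work A"}],
        "workflows": [{"id": 1, "model_id": 1, "work_id": 1}],
    }
    print(resolve_workflows(payload))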
After a discussion we came to the conclusion that we stick with the originally proposed format, but with some adjustments:
[
{
"workflow_id": "wf1-data345-eval1",
"label": "Workflow 1 on Data 345",
"metadata": {
"workflow": "https://example.org/workflow/1",
"workflow_steps": {
"0": "Processor A",
"1": "Processor B"
},
"workflow_model": "Fraktur_GT4HistOCR",
"eval_workflow": "https://example.org/workflow/eval1",
"eval_data": "https://example.org/workspace/345",
"eval_tool": "dinglehopper",
"gt_data": "https://gt.ocr-d.de/workspace/789",
"document": {
"fonts": ["antiqua", "fraktur"],
"publication_year": "19. century",
"number_of_pages": "100",
"layout": "simple"
}
},
"evaluations": {
"document_wide": {
"wall_time": 1234,
"cer": 0.57,
"cer_min_max": [0.2, 0.57]
},
"by_page": [
{
"page_id": "PHYS_0001",
"cer": 0.8,
"processing_time": 2.1
}
]
}
},
{
"workflow_id": "wf2-data345-eval1",
"label": "Workflow 2 on Data 345",
"metadata": {
"workflow": "https://example.org/workflow/2",
"workflow_steps": {
"0": "Processor A",
"1": "Processor B"
},
"workflow_model": "Fraktur_GT4HistOCR",
"eval_workflow": "https://example.org/workflow/eval1",
"eval_data": "https://example.org/workspace/345",
"eval_tool": "dinglehopper",
"gt_data": "https://gt.ocr-d.de/workspace/789",
"document": {
"fonts": ["antiqua", "fraktur"],
"publication_year": "19. century",
"number_of_pages": "100",
"layout": "simple"
}
},
"evaluations": {
"document_wide": {
"wall_time": 1234,
"cer": 0.88,
"cer_min_max": [0.2, 0.57]
},
"by_page": [
{
"page_id": "PHYS_0001",
"cer": 0.9,
"processing_time": 2.0
}
]
}
}
]
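For clarity, a small Python sketch (illustrative only) of how the document_wide aggregates could relate to the by_page entries; whether cer and cer_min_max are computed this way or delivered directly by the evaluation tool is an open assumption:

def document_wide_aggregates(by_page):
    # Derive document-wide values from the per-page entries of the format above.
    cers = [p["cer"] for p in by_page]
    times = [p["processing_time"] for p in by_page]
    return {
        "cer": sum(cers) / len(cers),  # naive mean; a character-weighted mean may be more accurate
        "cer_min_max": [min(cers), max(cers)],
        "wall_time": sum(times),
    }

if __name__ == "__main__":
    pages = [
        {"page_id": "PHYS_0001", "cer": 0.8, "processing_time": 2.1},
        {"page_id": "PHYS_0002", "cer": 0.2, "processing_time": 1.9},
    ]
    print(document_wide_aggregates(pages))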
@kba @paulpestov I tried to integrate Konstantin's proposal and also added some keys that might come in handy for the front-end app (e.g. …). EDIT: I added the second example to ocrd_eval.sample.yml.
What is the difference between … and …? I assume the former is the ID of the evaluation workflow and the latter of the data-generation workflow? Since both are to be Nextflow scripts, they should both be addressable as a URL via the Web API. Maybe we can rename them …
I assumed that what you proposed in OCR-D/spec@master...qa-spec#diff-3ca00602cf767fb4a01ea3267035a87437cc087ccf4aecb252942431e9e1411bR1 should be the ID of a discrete evaluation workflow and just adopted the rest. :D So yeah, maybe we should be a bit more verbose with our keys.
@kba What about this:
[
{
"eval_workflow_id": "wf1-data345-eval1",
"label": "Workflow 1 on Data 345", // for UI display
"metadata": {
"data_creation_workflow": "https://example.org/workflow/1",
"workflow_steps": {
"0": "Processor A",
"1": "Processor B"
},
"workflow_model": "Fraktur_GT4HistOCR", // for UI display
"eval_workflow_url": "https://example.org/workflow/eval1",
"eval_data": "https://example.org/workspace/345",
"eval_tool": "dinglehopper",
"gt_data": "https://gt.ocr-d.de/workspace/789",
"data_properties": {
"fonts": ["antiqua", "fraktur"],
"publication_year": "19. century",
"number_of_pages": "100",
"layout": "simple"
}
},
"evaluation_results": {
"document_wide": {
"wall_time": 1234,
"cer": 0.57,
"cer_min_max": [0.2, 0.57]
},
"by_page": [
{
"page_id": "PHYS_0001",
"cer": 0.8,
"processing_time": 2.1
}
]
}
},
{
"eval_workflow_id": "wf2-data345-eval1",
"label": "Workflow 2 on Data 345",
"metadata": {
"data_creation_workflow": "https://example.org/workflow/2",
"workflow_steps": {
"0": "Processor A",
"1": "Processor B"
},
"workflow_model": "Fraktur_GT4HistOCR",
"eval_workflow_url": "https://example.org/workflow/eval1",
"eval_data": "https://example.org/workspace/345",
"eval_tool": "dinglehopper",
"gt_data": "https://gt.ocr-d.de/workspace/789",
"data_properties": {
"fonts": ["antiqua", "fraktur"],
"publication_year": "19. century",
"number_of_pages": "100",
"layout": "simple"
}
},
"evaluation_results": {
"document_wide": {
"wall_time": 1234,
"cer": 0.88,
"cer_min_max": [0.2, 0.57]
},
"by_page": [
{
"page_id": "PHYS_0001",
"cer": 0.9,
"processing_time": 2.0
}
]
}
}
]
Changes: workflow_id → eval_workflow_id, workflow → data_creation_workflow, eval_workflow → eval_workflow_url, document → data_properties, evaluations → evaluation_results.
Concept for benchmarking / data for the workflow tab
Data
In a discussion we identified the following properties of a workspace as crucial: publication date, font, layout, and number of pages.
Metadata for data sets
Next steps for data
Ground Truths
❓ At this point I'm not sure whether we simply use an existing GT or create one ourselves.
Workflows
The main idea of the workflow tab is to enable OCR-D users to identify suitable workflows for their data (where suitability means CER/WER and/or performance of the workflow). Since we have a lot of processors, it is not feasible to run every permutation of processors on every data set. A good starting point might be to use the findings and recommendations from KIT in the second project phase, combined with examples obtained from people who use OCR-D on a daily basis (Maria?).
The first evaluation of the workflow results could be done with dinglehopper, which is suitable for simple text evaluation (see the sketch below).
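A rough Python sketch of how a single GT/OCR pair could be evaluated with dinglehopper from a script; the exact CLI arguments and the cer/wer fields in the JSON report are assumptions and should be checked against the current dinglehopper documentation:

import json
import subprocess
from pathlib import Path

def evaluate_pair(gt_file, ocr_file, report_prefix="report"):
    # Assumption: dinglehopper takes GT and OCR files plus a report prefix and
    # writes <prefix>.json containing "cer" and "wer" fields.
    subprocess.run(["dinglehopper", gt_file, ocr_file, report_prefix], check=True)
    report = json.loads(Path(f"{report_prefix}.json").read_text())
    return {"cer": report["cer"], "wer": report["wer"]}

if __name__ == "__main__":
    print(evaluate_pair("gt/PHYS_0001.xml", "ocr/PHYS_0001.xml"))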
Next steps for workflows
re-do the evaluation done by KIT with newer processor versions and check whether CER/WER and/or performance changed (this doesn't seem feasible)
Getting the data relevant for the front end
JSON Output
The dashboard should be fed with JSON containing all relevant information; a first draft of the data is given in the JSON examples above.
… and how to get it (maybe not in this spike)
In order to get a better understanding of how this is done, I will probably have to have a look at Nextflow and Mehmed's findings first.
Decide on metrics for text and layout and communicate the decision
Goal: measurability (is the OCR better or worse)
Acceptance criteria (AC) for this sprint (30.08.2022)