forked from markjeee/edi4r
= Tutorial
== Getting started
=== Installation
  sudo gem install edi4r
  sudo gem install edi4r-tdid
=== Require statements
  require "rubygems"
  require "edi4r"          # Try require_gem if this fails on your site
  require "edi4r/edifact"
  require "edi4r-tdid"     # Try require_gem if this fails on your site
== Creating an (outbound) UN/EDIFACT interchange
=== An empty interchange
  ic = EDI::E::Interchange.new
creates an empty interchange object with syntax version 3
and charset UNOB. You can make this a bit more explicit
by passing parameters as hash components:
  ic = EDI::E::Interchange.new( :version => 3, :charset => 'UNOB' )
See the source for more parameters.
=== An empty message
  msg = ic.new_message
creates an empty message in the context of the given interchange,
i.e. the syntax version, charset, UNA settings, interactive or batch EDI.
By default, the message type is <tt>ORDERS D.96A</tt>. Select any
message from any UN/TDID by passing the corresponding parameters
as hash components:
  msg = ic.new_message( :msg_type=>'ORDERS', :version=>'D', :release=>'96A',
                        :resp_agency=>'UN' )
Hash components which you do not specify are taken from a set of defaults.
=== Filling an interchange
You may add messages to the interchange any time by calling method add():
  ic.add( msg )
When adding new messages to an interchange, they get appended to the
current interchange content. There is no method to insert a message
at any other location. If you need to do that, hold your messages
in an array, sort them any way you like, and finally add them
to the interchange in the desired sequence.
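The workaround just described can be sketched as follows. <tt>StubInterchange</tt> and <tt>StubMessage</tt> are hypothetical stand-ins so that the snippet runs without edi4r; with the real library you would call <tt>ic.add( msg )</tt> on your interchange in the same loop. The sort key (a document number) is likewise only an assumption for illustration.

```ruby
# Hypothetical stand-ins, only here so the sketch runs without edi4r.
StubMessage = Struct.new(:doc_number)

class StubInterchange
  attr_reader :messages

  def initialize
    @messages = []
  end

  # Mimics the append-only behaviour described above.
  def add(msg)
    @messages << msg
    self
  end
end

# Messages are collected in whatever order they happen to be produced.
pending = [StubMessage.new('0003'), StubMessage.new('0001'), StubMessage.new('0002')]

ic = StubInterchange.new
# Decide on the sequence first, then append in exactly that order.
pending.sort_by { |msg| msg.doc_number }.each { |msg| ic.add(msg) }

puts ic.messages.map(&:doc_number).join(', ')   # prints "0001, 0002, 0003"
```

Sorting before adding keeps the interchange itself append-only, which matches the behaviour of add() described above.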
Note that each message gets validated by default when you add it to
the interchange. If your message will only be completed later,
you may disable validation by calling:
  ic.add( msg, false )
=== Filling a message
A freshly created message is empty, aside from its header and trailer
which we shall discuss later. Simply create the segments you want to add,
fill them, and add them to the message:
  seg = msg.new_segment( 'BGM' )
Here, we derived a BGM segment from the current context,
i.e. the UN/TDID (such as D.96A) that we specified when creating the message.
Note that <tt>new_segment()</tt> accepts all segment tags available in the
whole TDID's segment directory - not just those usable within
this message type.
Add content to the segment (see below) and add it to the message:
  msg.add( seg )
As with messages added to an interchange, it is your responsibility
to ensure the proper sequence of segments. You will need the UN/EDIFACT
message structure, a subset description, or a message implementation
guideline (MIG) handy in order to comply.
It is possible to add empty or partially filled segments to a message.
Just keep a reference to them and fill in their required data elements later.
== Accessing Composites and Data Elements
=== Background
While interchanges and messages are basically empty when created,
segments are not: They come equipped with the composites (CDE, composite
data elements) and data elements (DE) they comprise in their current
context. Likewise, CDEs come equipped with the sequence of DEs
which they contain according to the underlying TDID.
Segments and CDEs are basically sequences (arrays) of their component
elements. This sequence depends on the TDID of their context.
E.g., a BGM segment from D.96A looks different from a BGM in D.01B.
These sequences are fixed and cannot/should not be altered after
creation of a segment or CDE.
=== Getters for CDE
There is a getter method for each CDE of a segment.
Its name is simply the prefix 'c' followed by the CDE name.
In order to access, say, C002 in segment BGM, the getter
is named 'c' + 'C002':
  cde = seg.cC002
(See below for how to handle arrays.)
In most cases you'll need the CDE object only to access
its component data elements. Let's see how that works:
=== Getters for DE values
During mapping tasks, we normally do not deal with the
internal organization of DE objects. All we need is access
to their <em>values</em>.
Similarly to CDE getters, we build a DE getter by prepending
a 'd' to the DE name. The result is a method that returns
the current value of this DE (nil if unassigned), not the
DE object itself:
  order_number = seg.d1004 if cde.d1001 == 220
The example shows that this concept is very convenient and works
both with component DEs in a CDE and plain DEs in a segment.
=== Setters for DE values
We use the same approach for setters - a DE setter actually
changes its value, not the DE object itself:
  bgm = msg.new_segment( 'BGM' )
  bgm.d1004 = '123456ABC'
  bgm.cC002.d1001 = 220
Well, that's both easy and readable! But what about the
integer assigned to DE 1001? Don't worry - its string value
is what we will later see in the interchange file.
=== Setters for CDE and segments?
There is no such thing! A CDE should always be derived
from its proper context and must not be changed thereafter.
Likewise, a segment's content is nothing a user should
interfere with.
Does that sound like dictatorship? Well, there *are* ways
to manipulate CDEs and segments, but that's an advanced
and rarely used topic well out of scope of this tutorial ...
=== DE and CDE arrays
Sometimes, a DE occurs multiple times within a segment or CDE,
and a CDE may occur multiple times within a segment.
Before syntax version 4, EDIFACT does not really employ
the concept of an array. Instead, there are multiple
occurrences of a particular DE or CDE listed in a row.
In such a case, the corresponding getters and setters
won't work. Actually, they raise a TypeError to make sure
that you don't accidentally overlook that there is more
than one instance of the given (C)DE.
We obtain a whole array of *all* matching (C)DE instead
and indicate this with a prefix 'a':
  seg = msg.new_segment('PIA')
  cde_list = seg.aC212
  cde_list[0].d7140 = '54321'
  cde_list[0].d7143 = 'SA'
  cde_list[0].d3055 = 91
  cde_list[1].d7140 = ... # etc
  seg = msg.new_segment('NAD')
  seg.cC080.a3036[0].value = 'E. X. Ample'
  seg.cC080.a3036[1].value = 'Sales dept.'
Note that a3036 returns an array of DE objects, not their values!
We thus use DE setter <tt>value</tt> to actually assign
new values to those DE objects.
Sometimes, a CDE or segment contains the same DE more than once
even if both instances are separated by a different element,
like DE 3055 and 1131 in C088 of segment FII, which you may
find in invoices. In that case the same concept holds: cde.a1131
would fetch all instances, no matter if some other elements
occur in between or not.
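To make the naming convention more tangible, here is a minimal sketch of how prefix-based accessors of this kind could be built in plain Ruby with <tt>method_missing</tt>. It is not edi4r's implementation: <tt>DemoElement</tt> and <tt>DemoContainer</tt> are hypothetical classes, the container is a simplified composite with two occurrences of DE 3036, and only the 'd' and 'a' prefixes are modelled.

```ruby
# Hypothetical classes, NOT edi4r's implementation: they only mimic the
# 'd' (single value) and 'a' (array of elements) naming convention.
class DemoElement
  attr_reader :name
  attr_accessor :value

  def initialize(name)
    @name = name
    @value = nil
  end
end

class DemoContainer
  def initialize(*de_names)
    @elements = de_names.map { |name| DemoElement.new(name) }
  end

  def method_missing(meth, *args)
    case meth.to_s
    when /\Aa(\d{4})\z/     # a3036  -> array of all matching element objects
      matching($1)
    when /\Ad(\d{4})=\z/    # d3045= -> set the value of the only instance
      single($1).value = args.first
    when /\Ad(\d{4})\z/     # d3045  -> value of the only instance
      single($1).value
    else
      super
    end
  end

  def respond_to_missing?(meth, include_private = false)
    meth.to_s.match?(/\A[ad]\d{4}=?\z/) || super
  end

  private

  def matching(name)
    @elements.select { |element| element.name == name }
  end

  # Refuses to guess when a DE occurs more than once.
  def single(name)
    found = matching(name)
    raise NoMethodError, "no DE #{name} here" if found.empty?
    raise TypeError, "DE #{name} occurs #{found.size} times, use a#{name}" if found.size > 1
    found.first
  end
end

demo = DemoContainer.new('3036', '3036', '3045')
demo.d3045 = '1'
demo.a3036[0].value = 'E. X. Ample'
demo.a3036[1].value = 'Sales dept.'

begin
  demo.d3036            # ambiguous: two instances of DE 3036
rescue TypeError => e
  puts e.message
end
```

Raising an error for the ambiguous getter, instead of silently returning the first instance, is the design choice the text above motivates: it keeps you from overlooking repeated elements.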
== Building it all together
OK, so we keep generating segments, filling their data elements
with content, and adding them to the message that we are
about to build - fine.
Likewise, we add messages to the interchange. Fine - but how do we
eventually get the output, how do we make sure that we have
not forgotten anything, and how do we deal with the
service segments (e.g. to define sender and recipient IDs)?
=== Headers and trailers
Interchanges and messages are objects which may come with
a header and trailer segment. In UN/EDIFACT, these are
UNB/UNZ and UNH/UNT, respectively.
In order to let us focus on the content, edi4r keeps these
service segments away from us and tries its best to
treat them automatically. For example, we do not have to
count segments or messages - edi4r takes care of that
and updates the corresponding DE in UNT and UNZ, respectively.
If we really need to access data there,
that's possible anytime through getters <tt>header</tt> and
<tt>trailer</tt>. From then on, we just use the usual DE and
CDE getters and setters. E.g., setting the UNB sender ID
works like this:
  ic.header.cS002.d0004 = '1234567'
Setting the test indicator may look like this:
  ic.header.d0035 = 1
=== UNA handling
The pseudo segment UNA commonly introduces a UN/EDIFACT
interchange. It is shown there by default, which is a good
idea in most cases. If you really have to switch it off, use:
  ic.show_una = false
It can be easily viewed e.g. by:
  puts ic.una
The six special characters it contains are both readable
and modifiable. Please note that these characters are represented
as their ASCII integer codes, not as one-character strings.
Here is the list of corresponding getter methods:
  ce_sep()        # Component data element separator, default: ?:
  de_sep()        # Data element separator, default: ?+
  decimal_sign()  # Both ?. and ?, are eligible
  esc_char()      # Escape character, default: ??
  rep_sep()       # Repetition occurrence indicator, default:
                  # ' ' (space; SV1-3), ?* (syntax version 4)
  seg_term()      # Segment terminator, default: ?'
Corresponding setters allow you to change all of them.
Remember to pass ASCII values, not strings. Example:
  pri.cC509.d5118 = 30.1
  pri.to_s                  # => "PRI+AAA:30.1::LIU"
  ic.una.decimal_sign = ?,
  pri.to_s                  # => "PRI+AAA:30,1::LIU"
  ic.una.ce_sep = ?/
  pri.to_s                  # => "PRI+AAA/30,1//LIU"
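The <tt>?x</tt> notation used above is Ruby's character literal. Under Ruby 1.8, which the statements above evidently assume, it evaluates to the integer character code; from Ruby 1.9 on it evaluates to a one-character String, so the integer has to be obtained with <tt>ord</tt>. The following sketch uses core Ruby only and shows both directions. Whether a given edi4r release also accepts one-character strings in these setters is not covered here and would need to be checked against its source.

```ruby
# Service characters keyed by the getter names listed above
# ('.' is the decimal sign seen in the PRI example before it is changed).
defaults = {
  :ce_sep       => ':',
  :de_sep       => '+',
  :decimal_sign => '.',
  :esc_char     => '?',
  :seg_term     => "'"
}

# String#ord yields the integer code ...
codes = {}
defaults.each { |name, char| codes[name] = char.ord }

# ... and Integer#chr turns a code back into a printable character.
codes.each do |name, code|
  puts format('%-12s %3d  %s', name, code, code.chr)
end

# Portable way to obtain the code for a comma on any Ruby version:
comma_code = ','.ord
```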
=== Validation
Edi4r comes with a set of built-in validation rules. In order to
validate your interchange, just call
  ic.validate
This method will return the number of (recoverable) issues found.
Error messages and warnings are written to $stderr. There are
a few non-recoverable errors which raise exceptions.
Actually, you may apply method validate() to almost all of the
EDI objects mentioned so far. However, doing this once for the
whole interchange will validate everything.
=== Printing and saving
Once our data is validated, we want to send it to our business partner.
Actually, that's very simple: Output it e.g. to stdout or to a file
by just printing it! This works nicely, because our EDI objects are
all equipped with reasonable methods "to_s()".
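A minimal sketch of the "just print it" idea, using core Ruby only: <tt>PrintableStub</tt> is a hypothetical stand-in for a finished interchange (all that matters is that it implements <tt>to_s</tt>), and the segment data and the file name are made up for illustration.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a finished interchange object.
PrintableStub = Struct.new(:text) do
  def to_s
    text
  end
end

# Made-up sample content, not output of edi4r.
ic = PrintableStub.new("UNB+UNOB:3+SENDER+RECIPIENT+060101:1200+1'UNZ+0+1'")

path = File.join(Dir.tmpdir, 'outbound_demo.edi')

# IO#print calls to_s on its argument; the block form of File.open
# flushes and closes the file for us.
File.open(path, 'w') { |hnd| hnd.print ic }

written = File.read(path)
puts written.length
```

The same pattern works for <tt>$stdout.print ic</tt> when the data is to be piped into another program.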
== Processing (inbound) interchanges
=== Building the interchange object
We build a whole interchange with a single call by passing
its character representation as a String or IO object to class method
parse():
  ic = File.open("inbound_01.edi") {|hnd| EDI::E::Interchange.parse( hnd )}
That's it - from now on we may access all its contents in much the
same way as we did during generation.
=== Iterating over messages
Typically we'd like to loop through the messages contained:
  ic.each { |msg| map_message( msg ) }
with some suitably created mapping procedure map_message().
In this context, the interchange is treated as a container
of messages. We therefore use the standard iterator each().
Actually, index access is also available, as are the following
array-like methods:
  [](int), index(obj), each(&b), find_all(&b), size(),
  length(), first(), last().
Examples:
  second_msg = ic[1]
  last_msg = ic.last
=== Iterating over segments of a message
Similarly, we iterate through the segments of a message. The following
construction lets you select segments in their proper context
(which is the segment group):
  def map_message( msg )
    # do your initialization here, then
    msg.each do |seg|
      seg_name = seg.name
      seg_name += ' ' + seg.sg_name if seg.sg_name
      case seg_name
      when "BGM"
        # do this ...
      when "DTM"
        # do that ...
      when 'NAD SG2'
        # react only if NAD occurs in segment group 2
      # ... etc., finally:
      else
        raise "Segment #{seg_name}: Not accounted for!"
      end
    end
  end
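The dispatch pattern above can be tried out without any EDI data. In the following sketch <tt>StubSegment</tt> is a hypothetical stand-in that models only <tt>name</tt> and <tt>sg_name</tt>, a plain array takes the place of the message, and the result keys <tt>:header</tt> and <tt>:parties</tt> are arbitrary names chosen for illustration.

```ruby
# Hypothetical stand-in for a segment: only name and sg_name are modelled.
StubSegment = Struct.new(:name, :sg_name)

# Builds the "TAG" or "TAG SGn" key used for context-sensitive dispatch.
def context_key(seg)
  seg.sg_name ? "#{seg.name} #{seg.sg_name}" : seg.name
end

def map_message(segments)
  result = Hash.new { |hash, key| hash[key] = [] }
  segments.each do |seg|
    key = context_key(seg)
    case key
    when 'BGM', 'DTM'
      result[:header] << key
    when 'NAD SG2'
      result[:parties] << key
    else
      # Fail loudly instead of silently dropping unexpected segments.
      raise "Segment #{key}: Not accounted for!"
    end
  end
  result
end

msg = [
  StubSegment.new('BGM', nil),
  StubSegment.new('DTM', nil),
  StubSegment.new('NAD', 'SG2')
]
mapped = map_message(msg)
puts mapped[:header].join(', ')    # prints "BGM, DTM"
puts mapped[:parties].join(', ')   # prints "NAD SG2"
```

Raising in the <tt>else</tt> branch is what turns an incomplete mapping into an immediate error rather than into silently missing data.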
If you need to obtain all segments with a given tag, pass the
tag as a string to the index operator:
  d = msg['DTM']   # Array of all DTM segments, any segment group
Actually, a message behaves like a container just as an interchange does,
so the array-like methods listed in the previous section also apply here:
  d = msg.find_all {|seg| seg.name == 'DTM' && seg.sg_name == 'SG20'}
=== Selecting segment group instances
Iterating over all segments of a message sequentially tends to produce
cluttered code when messages grow complex. Wouldn't it be nice
if we could delegate e.g. the mapping of a whole NAD group to a
specialized procedure? Still better - could we delegate mapping
of a whole item group to a specialized routine?
Actually that's quite easy when we recall that segment groups
are side branches of the main trunk (or a higher-level branch)
of a message diagram, with their trigger segments being the
T-shaped "joints". Thus, segments of a segment group are mere
descendants of their trigger segment in the branching diagram.
Here are some helpful methods to select segments of a group:
  # ...
  when 'NAD SG2'
    map_nad_sg2( seg.children_and_self ) # skip segment COM...
  # ...
  when 'LIN SG28'
    map_item( seg.descendants_and_self )
  # ...
Methods
  descendants(), descendants_and_self(), children(), and
  children_and_self()
are inspired by XPath axes. They return an array of segments
which depend on the given (trigger) segment as their common
ancestor.
Using these selectors, writing modular mapping code
now becomes an easy task.
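The difference between the two axes can be illustrated on a hand-built tree. <tt>DemoNode</tt> below is a hypothetical class, not edi4r's segment class, and the sketch assumes that "children" means direct dependents only and that results come back in document order, which is consistent with the "skip segment COM" remark above.

```ruby
# Hypothetical tree node, NOT edi4r's segment class. It only illustrates
# the XPath-like axes on a small, hand-built branching diagram.
class DemoNode
  attr_reader :tag, :children

  def initialize(tag, *children)
    @tag = tag
    @children = children
  end

  # The node itself plus its direct dependents.
  def children_and_self
    [self] + children
  end

  # All dependents at any depth, in document order.
  def descendants
    children.flat_map { |child| [child] + child.descendants }
  end

  def descendants_and_self
    [self] + descendants
  end
end

# A party group whose contact sub-group (triggered by CTA) owns a COM segment.
nad = DemoNode.new('NAD',
                   DemoNode.new('LOC'),
                   DemoNode.new('CTA', DemoNode.new('COM')))

puts nad.children_and_self.map(&:tag).join(' ')      # prints "NAD LOC CTA"
puts nad.descendants_and_self.map(&:tag).join(' ')   # prints "NAD LOC CTA COM"
```

In this model COM is reachable only through the descendants axis, because it hangs below CTA rather than directly below NAD.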
== Peeking into interchanges
=== Background
Sometimes we only need to extract some header information
out of EDI files, e.g. in order to find out whether the content
is UN/EDIFACT, who sent it to whom, or just to see if the
UNB test indicator is set.
We could of course apply EDI::E::Interchange.parse() and access
the header of the resulting interchange object when we know that
a given file contains UN/EDIFACT data. However, that would be
a big waste of resources, especially for large interchanges.
=== Method "peek()" for UN/EDIFACT data
Edi4r instead offers method "peek()". It reads just enough bytes
from the file to determine its contents and to decode the header.
Like "parse()" it returns an Interchange object, but that one
is empty except for the header (and a dummy trailer) segment.
You can then extract any header element you need through the
usual getters. Example: Find out if the test indicator is set.
  def is_testdata?( hnd )
    ic = EDI::E::Interchange.peek( hnd )
    indicator = ic.header.d0035
    indicator == '1' || indicator == 1
  end
=== Auto-detection and implicit decompression
Regular EDI users need to archive their business data.
In simple cases, moving interchange files into proper folders
after successfully processing them already does the job.
You can save a lot of space though by compressing them.
Applying "gzip" to EDIFACT data can shrink them to
roughly 10 % of their original volume.
So far, so good. Later though, you may need to extract
a specific file from the archive, e.g. the interchange
with control reference "ref3456" from a customer
with sender id "xyz". Well, you do not need to maintain
a separate index or decompress all files in the archive
in order to find it. There is a generic class method
"Interchange.peek()" that does all this for you.
Consider the following code fragment that assumes
that ARGV contains the list of files to search:
  require 'zlib'
  found = ARGV.find do |fname|
    ic = EDI::Interchange.peek(File.open(fname))
    h = ic.header
    ic.syntax=='E' && h.cS002.d0004=='xyz' && h.d0020=='ref3456'
  end
  ic = EDI::Interchange.parse(File.open(found))
  # ...
Note that this code will work with both zipped and
unzipped data, and with UN/EDIFACT as well as other content.
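How such transparent handling of compressed input can work in principle is sketched below with Ruby's standard Zlib library. This is an illustration only, not edi4r's detection logic, which may differ: the helper names <tt>leading_bytes</tt> and <tt>looks_like_edifact?</tt> are hypothetical, the sample interchange is made up, and the check relies on the gzip magic number (bytes 0x1F 0x8B) and on UN/EDIFACT data starting with UNA or UNB.

```ruby
require 'zlib'
require 'stringio'

GZIP_MAGIC = [0x1F, 0x8B].freeze

# Returns the first +count+ bytes of the payload, inflating on the fly
# when the stream starts with the gzip magic number.
def leading_bytes(io, count = 64)
  magic = io.read(2).to_s.bytes
  io.rewind
  if magic == GZIP_MAGIC
    Zlib::GzipReader.new(io).read(count).to_s
  else
    io.read(count).to_s
  end
end

# UN/EDIFACT interchanges start with the UNA or the UNB segment.
def looks_like_edifact?(io)
  leading_bytes(io).start_with?('UNA', 'UNB')
end

# Made-up sample data; the same content once plain and once gzipped.
plain = "UNB+UNOB:3+xyz+abc+060101:1200+ref3456'UNZ+0+ref3456'"
streams = [StringIO.new(plain), StringIO.new(Zlib.gzip(plain))]

streams.each { |io| puts looks_like_edifact?(io) }   # prints "true" twice
```

Only a few dozen bytes are inflated, so the check stays cheap even for large archives, which is the point made about peek() above.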
== XML representation
=== Background
EDI interchanges may be regarded as abstract objects which need
some representation when stored or exchanged. E.g., UN/EDIFACT
interchanges may be expressed (represented) by syntax version 1-4
and a choice of separator and termination characters
without changing their identity. Likewise, interchanges
may be represented by some suitable XML document type.
There was a time when classical EDI representations were
considered outdated and to be replaced by XML documents.
We know by now that a mere change in representation does not help
to resolve the real issues of e-business. Nonetheless,
XML-based technology has become much more widespread than EDI,
so chances are high that one has to integrate classical EDI data
into an XML-driven architecture.
There are (too) many ways to do this, and attempts have been made
to standardize them. In particular, DIN 16557-4
(http://www.beuth.de/cmd?workflowname=CSVList&websource=&artid=43768898)
describes how to represent UN/EDIFACT interchanges as XML documents
and how to describe them with DTDs.
The DIN approach however focuses only on UN/EDIFACT and does not represent
the logical structure of documents. It merely encodes segments as
XML elements and data elements and composites as their attributes.
The value of DTDs is limited, as they need to be generated for
any particular interchange and lack information about mandatory
data elements and CDEs, let alone code lists.
This library therefore supports a generic approach that allows us
to represent any interchange object as an XML document, be it
a UN/EDIFACT interchange, a file of SAP IDocs, or an ANSI
X.12 or other interchange when such a module becomes available.
Native and XML representation are fully interchangeable,
and the XML representation reflects information from the
branching diagram, which considerably eases XPath-based
processing (e.g. you could easily select an instance of
a line item group). Formal validation through a single generic DTD
is available, while in-depth validation remains available
through this library at the abstract level.
=== Generating an XML representation of an interchange
The current implementation of XML features is built on
Ruby's REXML module (alternative implementations are conceivable).
Simply load the additional methods <em>after</em> loading all other
optional EDI4R modules, then use method "to_xml()" like this:
  # Other require statements, finally:
  require "edi4r/rexml"
  # Generate your interchange "ic", then:
  xdoc = REXML::Document.new        # Empty REXML document
  ic.to_xml( xdoc )                 # Fill it
  # The rest is standard REXML handling. Here, we write the xdoc to a file.
  # (See REXML::Document.write() for details on indenting)
  # The block form makes sure the file is flushed and closed.
  File.open( 'mydata.xml', 'w' ) { |hnd| xdoc.write( hnd, 0 ) }
=== Building an interchange from its XML representation
No matter what EDI standard the interchange represents,
its corresponding EDI4R object can be re-generated easily.
Just make sure that you have loaded the corresponding module(s):
  # Other require statements, finally:
  require "edi4r/rexml"
  ic = EDI::Interchange.parse( File.open('mydata.xml') )
Yes, that's right: It's the same statement that would
also load UN/EDIFACT data!
If you already know what to expect, you might bypass EDI4R's
auto-detection and directly call one of the parse_xml() methods:
  xdoc = REXML::Document.new( File.open('mydata.xml') )
  ic = EDI::E::Interchange.parse_xml( xdoc )
=== Utilities: edi2xml.rb, xml2edi.rb
These two scripts are included to further simplify your transition
between traditional EDI representation and XML representation.
They are command-line tools that simply wrap the library calls
mentioned above. Example:
  $ edi2xml.rb foo.edi > foo.xml
  $ xml2edi.rb foo.xml > bar.edi
  $ diff foo.edi bar.edi # There should be no differences
== Tools
=== editool.rb
Use this command-line tool e.g. when you want to
* <b>list</b> UN/EDIFACT data in a readable way (one segment per line,
  optionally indented according to segment level),
* <b>validate</b> your EDI data,
* <b>analyze</b> the data more thoroughly through method "inspect()",
* <b>report</b> header data quickly, one line per file.
Called without options, it just builds an internal memory model
of the passed file(s) and raises an exception upon parsing errors.
Thorough validation can be requested optionally. Example:
  $ editool.rb -l foo.edi bar.edi
  $ editool.rb -p *.edi *.xml
== Further reading
A pair of full-blown mappings (inhouse-to-EANCOM, EANCOM-to-inhouse)
shows in much more detail how to do outbound and inbound mapping.
See the source code in the "test" folder!
== Misc topics
To be supplied later; currently just a place to collect keywords to cover.
=== Debugging and viewing
:linebreak, :indented
inspect()
Exception classes
Segments: T-nodes, ordinal number, index, occurrence, max. occurrence
empty?, required?
=== More advanced features
Validation: warnings and exceptions (logging?)
Add-ons to class Time
Consistency of references common to headers and trailers
Inheriting settings from parent: UNH from UNG, UNG from UNB
Low-level access to Collection contents
Enjoy,
-- Heinz