forked from PrismJS/prism
-
Notifications
You must be signed in to change notification settings - Fork 0
/
extending.html
635 lines (485 loc) · 33.7 KB
/
extending.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<link rel="icon" href="assets/favicon.png" />
<title>Extending Prism ▲ Prism</title>
<link rel="stylesheet" href="assets/style.css" />
<link rel="stylesheet" href="themes/prism.css" data-noprefix />
<link rel="stylesheet" href="plugins/line-highlight/prism-line-highlight.css" data-noprefix />
<script src="assets/vendor/prefixfree.min.js"></script>
<style>
ol.indent {
margin: 1em 0;
padding-left: 2em;
}
</style>
<script>var _gaq = [['_setAccount', 'UA-33746269-1'], ['_trackPageview']];</script>
<script src="https://www.google-analytics.com/ga.js" async></script>
</head>
<body class="language-javascript">
<header>
<div class="intro" data-src="assets/templates/header-main.html" data-type="text/html"></div>
<h2>Extending Prism</h2>
<p>Prism is awesome out of the box, but it’s even awesomer when it’s customized to your own needs. This section will help you write new language definitions, plugins and all-around Prism hacking.</p>
</header>
<section id="language-definitions">
<h1>Language definitions</h1>
<p>Every language is defined as a set of tokens, which are expressed as <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions" target="_blank" rel="noopener noreferrer">regular expressions</a>. For example, this is the language definition for JSON:</p>
<pre data-src="components/prism-json.js"></pre>
<p>At its core, a language definition is just a JavaScript object, and a token is just an entry of the language definition. The simplest language definition is an empty object:</p>
<pre><code class="language-javascript">Prism.languages['some-language'] = { };</code></pre>
<p>Unfortunately, an empty language definition isn't very useful, so let's add a token. The simplest way to express a token is using a regular expression literal:</p>
<pre><code class="language-javascript">Prism.languages['some-language'] = {
'token-name': /regex/,
};</code></pre>
<p>Alternatively, an object literal can also be used. With this notation, the regular expression describing the token is the <code>pattern</code> property of the object:</p>
<pre><code class="language-javascript">Prism.languages['some-language'] = {
'token-name': {
pattern: /regex/
},
};</code></pre>
<p>So far, the functionality is exactly the same between the regex and object notations. However, the object notation allows for <a href="#object-notation">additional options</a>. More on that later.</p>
<p>The name of a token can theoretically be any string that is also a valid CSS class, but there are <a href="#token-names">some guidelines to follow</a>. More on that later.</p>
<p>Language definitions can have any number of tokens, but the name of each token must be unique:</p>
<pre><code class="language-javascript">Prism.languages['some-language'] = {
'token-1': /I love regexes!/,
'token-2': /regex/,
};</code></pre>
<p>Prism will match tokens against the input text one after the other, in order, and tokens cannot overlap with the matches of previous tokens. So in the above example, <code class="language-none">token-2</code> will not match the substring "regex" inside of matches of <code class="language-none">token-1</code>. More information about <a href="#matching-algorithm">Prism's matching algorithm </a> later.</p>
<p>Lastly, in many languages, there are multiple different ways of declaring the same constructs (e.g. comments, strings, ...) and sometimes it is difficult or unpractical to match all of them with one single regular expression. To add multiple regular expressions for one token name, an array can be used:</p>
<pre><code class="language-javascript">Prism.languages['some-language'] = {
'token-name': [
/regex 1/,
/regex 2/,
{ pattern: /regex 3/ }
],
};</code></pre>
<p>Note: An array <strong>cannot</strong> be used in the <code>pattern</code> property.</p>
<h2 id="object-notation">Object notation</h2>
<p>Instead of using just plain regular expressions, Prism also supports an object notation for tokens. This notation enables the following options:</p>
<dl>
<dt id="object-notation-pattern"><code>pattern: RegExp</code></dt>
<dd>
<p>This is the only required option. It holds the regular expression of the token.</p>
</dd>
<dt id="object-notation-lookbehind"><code>lookbehind: boolean</code></dt>
<dd>
<p>This option mitigates JavaScript's poor browser support for lookbehinds. When set to <code>true</code>, the first capturing group in the <code>pattern</code> regex is discarded when matching this token, so it effectively functions as a lookbehind.</p>
<p>For an example of this, check out how the C-like language definition finds <code class="language-none">class-name</code> tokens:</p>
<pre><code>Prism.languages.clike = {
// ...
'class-name': {
pattern: /(\b(?:class|extends|implements|instanceof|interface|new|trait)\s+)\w+/i,
lookbehind: true
}
};</code></pre>
</dd>
<dt id="object-notation-greedy"><code>greedy: boolean</code></dt>
<dd>
<p>This option enables greedy matching for the token. For more information, see the section about <a href="#matching-algorithm">the matching algorithm</a>.</p>
</dd>
<dt id="object-notation-alias"><code>alias: string | string[]</code></dt>
<dd>
<p>This option can be used to define one or more aliases for the token. The result will be that the styles of the token name and the alias(es) are combined. This can be useful to combine the styling of a <a href="/tokens.html">standard token</a>, which is already supported by most of the themes, with a more precise token name. For more information on this topic, see <a href="#granular-highlighting">granular highlighting</a>.</p>
<p>E.g. the token name <code class="language-none">latex-equation</code> is not supported by most themes, but it will be highlighted the same as a string in the following example:</p>
<pre><code class="language-javascript">Prism.languages.latex = {
// ...
'latex-equation': {
pattern: /\$.*?\$/,
alias: 'string'
}
};</code></pre>
</dd>
<dt id="object-notation-inside"><code>inside: Grammar</code></dt>
<dd>
<p>This option accepts another object literal, with tokens that are allowed to be nested in this token. All tokens in the <code>inside</code> grammar will be encapsulated by this token. This makes it easier to define certain languages.</p>
<p>For an example of nested tokens, check out the <code>url</code> token in the CSS language definition:</p>
<pre><code>Prism.languages.css = {
// ...
'url': {
// e.g. url(https://example.com)
pattern: /\burl\(.*?\)/i,
inside: {
'function': /^url/i,
'punctuation': /^\(|\)$/
}
}
};</code></pre>
<p>The <code>inside</code> option can also be used to create recursive languages. This is useful for languages where one token can contain arbitrary expressions, e.g. languages with a string interpolation syntax.</p>
<p>For example, here is how JavaScript implements template string interpolation:</p>
<pre><code>Prism.languages.javascript = {
// ...
'template-string': {
pattern: /`(?:\\.|\$\{[^{}]*\}|(?!\$\{)[^\\`])*`/,
inside: {
'interpolation': {
pattern: /\$\{[^{}]*\}/,
inside: {
'punctuation': /^\$\{|\}$/,
'expression': {
pattern: /[\s\S]+/,
inside: null // see below
}
}
}
}
}
};
Prism.languages.javascript['template-string'].inside['interpolation'].inside['expression'].inside = Prism.languages.javascript;</code></pre>
<p>Be careful when creating recursive grammars as they might lead to infinite recursion which will cause a stack overflow.</p>
</dd>
</dl>
<h2 id="token-names">Token names</h2>
<p>The name of a token determines the <em>semantic meaning</em> of matched text of the token. Tokens can capture anything from simple language constructs, like comments, to more complex ones, like template string interpolation expressions. Token names differentiate these language constructs.</p>
<p>A token name can theoretically be any string that is a valid CSS class name. However, in practice, it makes sense for token names to follow some rules. In Prism's code, we enforce that all token names use kebab case (<code class="language-none">foo-bar</code>) and contain only lower-case ASCII letters, digits, and hyphen characters. E.g. <code class="language-none">class-name</code> is allowed but <code class="language-none">Class_name</code> is not.</p>
<p>Prism also defines some <a href="tokens.html">standard tokens names</a> that should be used for most tokens.</p>
<h3 id="token-names-and-themes">Themes</h3>
<p>Prism's themes assign color (and other styles) to tokens based on their name (and aliases). This means that the language definition does not control the color of tokens, themes do.</p>
<p>However, themes only support a limited number of <strong>known token names</strong>. If a theme does not know a particular token name, no styles will be applied. While different themes may support different token names, all themes are guaranteed to support Prism's <a href="tokens.html">standard tokens</a>. Standard tokens as special token names with specific semantic meanings. They are the common ground all language definitions and themes agree on and must follow. Standard tokens should be preferred when choosing token names.</p>
<h3 id="granular-highlighting">Granular highlighting</h3>
<p>While standard tokens should be the preferred choice, they are also quite general. This is by design as they have to apply to a large number and variety of different languages, but sometimes more fine-grained tokenization (and subsequent highlighting) is desirable.</p>
<p>Granular highlighting is a method of choosing token names to enable fine control for themes, while also ensuring compatibility with all themes.</p>
<p>Let's look at an example. Say we had a language that supported both decimal and binary literals for numbers, and we wanted to give binary number special highlighting. We might implement it like this:</p>
<pre><code class="language-javascript">Prism.languages['my-language'] = {
// ...
'number': /\b\d+(?:\.\d+)?\b/,
'binary-number': /\b0b[01]+\b/,
};</code></pre>
<p>But this has a problem. <code class="language-none">binary-number</code> is not a standard token, so almost no theme is going to given binary numbers any color.</p>
<p>The solution to this problem is to use an <em>alias</em>:</p>
<pre><code class="language-javascript">Prism.languages['my-language'] = {
// ...
'number': /\b\d+(?:\.\d+)?\b/,
'binary-number': {
pattern: /\b0b[01]+\b/,
alias: 'number'
},
};</code></pre>
<p>Aliases allow themes to apply the styles of multiple names to one token. This means that themes that do support the <code class="language-none">binary-number</code> token name can assign a special color, and themes don't support it will fallback to their usual color for numbers.</p>
<p>This is granular highlighting: using a non-standard token name and a standard token as an alias.</p>
<h2 id="matching-algorithm">The matching algorithm</h2>
<p>The job of Prism's matching algorithm is to produce a token stream given a language definition and some text. A token stream is Prism's representation of (partially or fully) tokenized text and is implemented as a list of strings (representing literal text) and tokens (representing tokenized text).</p>
<p><em>Note:</em> The word "token" is ambiguous here. We use "token" to refer to both the entry of a language definition (as described in above sections) and a <a href="docs/Token.html">Token object</a> inside a token stream. Which type of "token" is meant can be inferred from context.</p>
<p>The simplified token stream notation will be used in this section. Briefly, the notation uses JSON to represent a token stream. E.g. <code>["foo ", ["keyword", "bar"], " baz"]</code> is the simplified token stream notation for the token stream that starts with the string <code class="language-none">foo </code>, is followed by a token of type <code class="language-none">keyword</code> and text <code class="language-none">bar</code>, and ends with the string <code class="language-none"> baz</code>.</p>
<p>Back to the matching algorithm: Prism's matching algorithm is a hybrid with two modes: first-come, first-served (FCFS) matching and greedy matching.</p>
<h3 id="fcfs-matching">FCFS matching</h3>
<p>This is Prism default matching mode. All tokens are matched one after the other, in order, tokens cannot overlap, and tokens cannot match text that is already matched by previous tokens.</p>
<p>The algorithm itself is quite simple. Let's say we wanted to tokenize the JS code <code>max(3, 5, exp2(7));</code> and that function tokens had already been processed. The current token stream would be:</p>
<pre><code>[
["function", "max"],
"(3, 5, ",
["function", "exp2"],
"(7));"
]</code></pre>
<p>Next, we would tokenize numbers with the token <code>'number': /[0-9]+/</code>.</p>
<p>FCFS matching will go through all strings in the current token stream to find matches for the number regex. The first string is <code>"(3, 5, "</code>, so the match <code>3</code> is found. A new token is created for <code>3</code> and inserted into the token stream to replace the matching text. The token stream is now:</p>
<pre><code>[
["function", "max"],
"(",
["number", "3"],
", 5, ",
["function", "exp2"],
"(7));"
]</code></pre>
<p>Now, the algorithm goes to the next string <code>", 5, "</code> and finds another match. A new token is created for <code>5</code> and the token stream is now:</p>
<pre><code>[
["function", "max"],
"(",
["number", "3"],
", ",
["number", "5"],
", ",
["function", "exp2"],
"(7));"
]</code></pre>
<p>The next string is <code>", "</code> and no matches are found. The string after that is <code>"(7));"</code> and a new token is create for <code>7</code>:
<pre><code>[
["function", "max"],
"(",
["number", "3"],
", ",
["number", "5"],
", ",
["function", "exp2"],
"(",
["number", "7"],
"));"
]</code></pre>
<p>The last string to check is <code>"));"</code> and no matches are found. The number token has now been processed and the algorithm will go process the next token in the language definition.</p>
<p>Notice how FCFS matching did not find the <code>2</code> in <code>exp2</code>. Since FCFS matching completely ignores existing tokens in the token stream, the number regex cannot see already-tokenized text. This is a very useful property. In the above example, <code>2</code> is a part of the function name <code>exp2</code>, so highlighting it as a number would be incorrect.</p>
<h3 id="greedy-matching">Greedy matching</h3>
<p>Greedy matching is very similar to FCFS matching. All tokens are matched in order and tokens cannot overlap. The defining difference is that greedy tokens <strong>can</strong> match the text of previous tokens.</p>
<p>Let's look at an example to see why greedy matching is useful and how it works <em>conceptually</em>. A very simplified version of JavaScript's comment and string syntax might be implemented like this:</p>
<pre><code>Prism.languages.javascript = {
'comment': /\/\/.*/,
'string': /'(?:\\.|[^\\\r\n])*'/
};</code></pre>
<p>To understand why greedy matching is useful, let's look at how FCFS matching would tokenize the text <code>'http://example.com'</code>:</p>
<p>FCFS matching starts with the token stream <code>["'http://example.com'"]</code> and tries to find matches for <code>'comment': /\/\/.*/</code>. The match <code class="language-none">//example.com'</code> is found and inserted into the token stream:</p>
<pre><code>[
"'http:",
["comment", "//example.com'"]
]</code></pre>
<p>Then FCFS matching will search for matches for <code>'string': /'(?:\\.|[^'\\\r\n])*'/</code>. The first string of the token stream, <code>"'http:"</code>, does not match the string regex, so the token stream remains unchanged. The string token has now been processed and the above token stream is the final result.</p>
<p>Obviously, this is bad. The code <code>'http://example.com'</code> is clearly just a string containing a URL, but FCFS matching doesn't understand this.</p>
<p>An obvious, but incorrect, fix might be to swap the order of <code>comment</code> and <code>string</code>. This would fix <code>'http://example.com'</code>. However, the problem was merely moved. Comments like <code>// it's my co-worker's code</code> (note the two single quotes) will now be tokenized incorrectly.</p>
<p>This is <strong>the problem greedy matching solves</strong>. Let's make the tokens greedy and then see how this affects the result:</p>
<pre><code>Prism.languages.javascript = {
'comment': {
pattern: /\/\/.*/,
greedy: true
},
'string': {
pattern: /'(?:\\.|[^'\\\r\n])*'/,
greedy: true
}
};</code></pre>
<p>While the actual greedy matching algorithm is quite complex and littered with subtle edge cases, its effect quite simple: a list of <strong>greedy tokens will behave as if they were matched by a single regex</strong>. This is how greedy matching works <em>conceptually</em> and how you should think about greedy tokens.</p>
<p>This means that the greedy comment and string tokens will behave like the following language definition, but the combined token will result in the correct token names of the original greedy tokens:</p>
<pre><code>Prism.languages.javascript = {
'comment-or-string': /\/\/.*|'(?:\\.|[^'\\\r\n])*'/
};</code></pre>
<p>In the above example, <code>'http://example.com'</code> will be matched by <code>/\/\/.*|'(?:\\.|[^'\\\r\n])*'/</code> completely. Since the <code class="language-none">'(?:\\.|[^'\\\r\n])*'</code> part of the regex caused the match, a token of type <code>string</code> will be created and the following token stream will be produced:</p>
<pre><code>[
["string", "'http://example.com'"]
]</code></pre>
<p>Similarly, the tokenization will also be correct for the <code>// it's my co-worker's code</code> example.</p>
<p>When deciding whether a token should be greedy, use the following guide lines:</p>
<ol class="indent">
<li>
<p>Most tokens are not greedy.</p>
<p>Most tokens in most languages are not greedy, because they don't need to be. Typically only the comment, string, and regex literal tokens need to be greedy. All other tokens can use FCFS matching.</p>
<p>Generally, a token should only be greedy if it can contain the start of another token.</p>
</li>
<li>
<p>All tokens before a greedy token should also be greedy.</p>
<p>Greedy matching works subtly differently if there are non-greedy tokens before a greedy token. This typically leads to subtle and hard-to-catch bugs that sometimes take years to uncover.</p>
<p>To make sure that greedy matching works as expected, the greedy tokens should be the first tokens of a language.</p>
</li>
<li>
<p>Greedy tokens come in groups.</p>
<p>If a language definition contains only a single greedy token, then the greedy token shouldn't be greedy. As explained above, greedy matching conceptually combines the regexes of all greedy tokens into one. If there is only one greedy token, greedy matching will behave like FCFS matching.</p>
</li>
</ol>
<h2 id="helper-functions">Helper functions</h2>
<p>Prism also provides some useful function for creating and modifying language definitions. <a href="docs/Prism.languages.html#.insertBefore"><code>Prism.languages.insertBefore</code></a> can be used to modify existing languages definitions. <a href="docs/Prism.languages.html#.extend"><code>Prism.languages.extend</code></a> is useful for when your language is very similar to another existing language.</p>
<h2 id="rest">The rest property</h2>
<p>The <code>rest</code> property in language definitions is special. Prism expects this property to be another language definition instead of a token. The tokens of the grammar in the <code>rest</code> property will be appended to the end of the language definition with the <code>rest</code> property. It can be thought of as a built-in object spread operator.</p>
<p>This is useful for referring to tokens defined elsewhere. However, the <code>rest</code> property should be used sparingly. When referencing another language, it is typically better to encapsulate the text of the language into a token and use <a href="#object-notation-inside">the <code>inside</code> property</a> instead.</p>
</section>
<section id="creating-a-new-language-definition" class="language-none">
<h1>Creating a new language definition</h1>
<p>This section will explain the usual workflow of creating a new language definition.</p>
<p>As an example, we will create the language definition of the fictional <em>Foo's Binary, Artistic Robots™</em> language or just Foo Bar for short.</p>
<ol class="indent">
<li>
<p>Create a new file <code>components/prism-foo-bar.js</code>.</p>
<p>In this example, we choose <code>foo-bar</code> as the id of the new language. The language id has to be unique and should work well with the <code>language-xxxx</code> CSS class name Prism uses to refer to language definitions. Your language id should ideally match the regular expression <code>/^[a-z][a-z\d]*(?:-[a-z][a-z\d]*)*$/</code>.</p>
</li>
<li>
<p>Edit <a href="https://github.com/PrismJS/prism/blob/master/components.json"><code>components.json</code></a> to register the new language by adding it to the <code>languages</code> object. (Please note that all language entries are sorted alphabetically by title.) <br>
Our new entry for this example will look like this:</p>
<pre><code class="language-json">"foo-bar": {
"title": "Foo Bar",
"owner": "Your GitHub name"
}</code></pre>
<p>If your language definition depends any other languages, you have to specify this here as well by adding a <code class="language-js">"require"</code> property. E.g. <code class="language-js">"require": "clike"</code>, or <code class="language-js">"require": ["markup", "css"]</code>. For more information on dependencies read the <a href="#declaring-dependencies">declaring dependencies</a> section.</p>
<p><em>Note:</em> Any changes made to <code>components.json</code> require a rebuild (see step 3).</p>
</li>
<li>
<p>Rebuild Prism by running <code class="language-bash">npm run build</code>.</p>
<p>This will make your language available to the <a href="test.html">test page</a>, or more precise: your local version of it. You can open your local <code>test.html</code> page in any browser, select your language, and see how your language definition highlights any code you input.</p>
<p><em>Note:</em> You have to reload the test page to apply changes made to <code>prism-foo-bar.js</code> but you don't have to rebuild Prism itself. However, if you change <code>components.json</code> (e.g. because you added a dependency) then these changes will not show up on the test page until you rebuild Prism.</p>
</li>
<li>
<p>Write the language definition.</p>
<p>The <a href="#language-definitions">above section</a> already explains the makeup of language definitions.</p>
</li>
<li>
<p>Adding aliases.</p>
<p>Aliases for are useful if your language is known under more than just one name or there are very common abbreviations for your language (e.g. JS for JavaScript). Keep in mind that aliases are very similar to language ids in that they also have to be unique (i.e. there cannot be an alias which is the same as another alias of language id) and work as CSS class names.</p>
<p>In this example, we will register the alias <code>foo</code> for <code>foo-bar</code> because Foo Bar code is stored in <code>.foo</code> files.</p>
<p>To add the alias, we add this line at the end of <code>prism-foo-bar.js</code>:</p>
<pre><code class="language-js">Prism.languages.foo = Prism.languages['foo-bar'];</code></pre>
<p>Aliases also have to be registered in <code>components.json</code> by adding the <code>alias</code> property to the language entry. In this example, the updated entry will look like this:</p>
<pre><code class="language-json">"foo-bar": {
"title": "Foo Bar",
"alias": "foo",
"owner": "Your GitHub name"
}</code></pre>
<p><em>Note:</em> <code>alias</code> can also be a string array if you need to register multiple aliases.</p>
<p>Using <code>aliasTitles</code>, it's also possible to give aliases specific titles. In this example, this won't be necessary but a good example as to where this is useful is the markup language:</p>
<pre><code class="language-json">"markup": {
"title": "Markup",
"alias": ["html", "xml", "svg", "mathml"],
"aliasTitles": {
"html": "HTML",
"xml": "XML",
"svg": "SVG",
"mathml": "MathML"
},
"option": "default"
}</code></pre>
</li>
<li>
<p>Adding tests.</p>
<p>Create a folder <code>tests/languages/foo-bar/</code>. This is where your test files will live. The test format and how to run tests is described <a href="test-suite.html">here</a>.</p>
<p>You should add a test for every major feature of your language. Test files should test the common case and certain edge cases (if any). Good examples are <a href="https://github.com/PrismJS/prism/tree/master/tests/languages/javascript">the tests of the JavaScript language</a>.</p>
<p>You can use this template for new <code>.test</code> files:</p>
<pre><code class="language-json">The code to test.
----------------------------------------------------
----------------------------------------------------
Brief description.</code></pre>
<p>For every test file:</p>
<ol class="indent">
<li>
<p>Add the code to test and a brief description.</p>
</li>
<li>
<p>Verify that your language definition correctly highlights the test code. This can be done using your local version of the test page. <br>
<em>Note:</em> Using the <em>Show tokens</em> options, you see the token stream your language definition created.</p>
</li>
<li>
<p>Once you <strong>carefully checked</strong> that the test case is handled correctly (i.e. by using the test page), run the following command:</p>
<code class="language-bash">npm run test:languages -- --language=foo-bar --accept</code>
<p>This command will take the token stream your language definition currently produces and inserted into the test file. The empty space between the two lines separating the code and the description of test case will be replaced with a <a href="test-suite.html#writing-tests-explaining-the-simplified-token-stream">simplified version of the token stream</a>.</p>
</li>
<li>
<p><strong>Carefully check</strong> that the inserted token stream JSON is what you expect.</p>
</li>
<li>Re-run <code class="language-bash">npm run test:languages -- --language=foo-bar</code> to verify that the test passes.</li>
</ol>
</li>
<li>
<p>Add an example page.</p>
<p>Create a new file <code>examples/prism-foo-bar.html</code>. This will be the template containing the example markup. Just look at other examples to see how these files are structured. <br>
We don't have any rules as to what counts as an example, so a single <em>Full example</em> section where you present the highlighting of the major features of the language is enough.</p>
</li>
<li>
<p>Run <code class="language-bash">npm test</code> to check that <em>all</em> tests pass, not just your language tests.<br>
This will usually pass without problems. If you can't get all the tests to pass, skip this step.</p>
</li>
<li>
<p>Run <code class="language-bash">npm run build</code> again.</p>
<p>Your language definition is now ready!</p>
</li>
</ol>
</section>
<section id="dependencies" class="language-none">
<h1>Dependencies</h1>
<p>Languages and plugins can depend on each other, so Prism has its own dependency system to declare and resolve dependencies.</p>
<h2 id="declaring-dependencies">Declaring dependencies</h2>
<p>You declare a dependency by adding a property to the entry of your language or plugin in the <a href="https://github.com/PrismJS/prism/blob/master/components.json"><code>components.json</code></a> file. The name of the property will be dependency kind and its value will be the component id of the dependee. If multiple languages or plugins are depended upon then you can also declare an array of component ids.</p>
<p>In the following example, we will use the <code>require</code> dependency kind to declare that a fictional language Foo depends on the JavaScript language and that another fictional language Bar depends on both JavaScript and CSS.</p>
<pre class="language-json" data-line="8,12"><code>{
"languages": {
"javascript": { "title": "JavaScript" },
"css": { "title": "CSS" },
...,
"foo": {
"title": "Foo",
"require": "javascript"
},
"bar": {
"title": "Bar",
"require": ["css", "javascript"]
}
}
}</code></pre>
<h3>Dependency kinds</h3>
<p>There are 3 kinds of dependencies:</p>
<dl>
<dt><code>require</code></dt>
<dd>
Prism will ensure that all dependees are loaded before the depender. <br>
You are <strong>not</strong> allowed to modify the dependees unless they are also declared as <code>modify</code>.
<p>This kind of dependency is most useful if you e.g. extend another language or dependee as an embedded language (e.g. like PHP is embedded in HTML).</p>
</dd>
<dt><code>optional</code></dt>
<dd>
Prism will ensure that an optional dependee is loaded before the depender if the dependee is loaded. Unlike <code>require</code> dependencies which also guarantee that the dependees are loaded, <code>optional</code> dependencies only guarantee the order of loaded components. <br>
You are <strong>not</strong> allowed to modify the dependees. If you need to modify the optional dependee, declare it as <code>modify</code> instead.
<p>This kind of dependency is useful if you have embedded languages but you want to give the users a choice as to whether they want to include the embedded language. By using <code>optional</code> dependencies, users can better control the bundle size of Prism by only including the languages they need.<br>
E.g. HTTP can highlight JSON and XML payloads but it doesn't force the user to include these languages.</p>
</dd>
<dt><code>modify</code></dt>
<dd>
This is an <code>optional</code> dependency which also declares that the depender might modify the dependees.
<p>This kind of dependency is useful if your language modifies another language (e.g. by adding tokens).<br>
E.g. CSS Extras adds new tokens to the CSS language.</p>
</dd>
</dl>
<p>To summarize the properties of the different dependency kinds:</p>
<table class="stylish">
<tr>
<th></th>
<th>Non-optional</th>
<th>Optional</th>
</tr>
<tr>
<th>Read only</th>
<td><code>require</code></td>
<td><code>optional</code></td>
</tr>
<tr>
<th>Modifiable</th>
<td></td>
<td><code>modify</code></td>
</tr>
</table>
<style>
table.stylish {
border-collapse: collapse;
}
table.stylish, table.stylish tr, table.stylish td, table.stylish th {
border: 1px solid #CCC;
}
table.stylish td, table.stylish th {
padding: .5em .75em;
}
table.stylish th {
background-color: #F8F8F8;
}
</style>
<p>Note: You can declare a component as both <code>require</code> and <code>modify</code></p>
<h2 id="resolving-dependencies">Resolving dependencies</h2>
<p>We consider the dependencies of components an implementation detail, so they may change from release to release. Prism will usually resolve dependencies for you automatically. So you won't have to worry about dependency loading if you <a href="download.html">download</a> a bundle or use the <code>loadLanguages</code> function in NodeJS, the <a href="plugins/autoloader/">AutoLoader</a>, or our Babel plugin.</p>
<p>If you have to resolve dependencies yourself, use the <code>getLoader</code> function exported by <a href="https://github.com/PrismJS/prism/blob/master/dependencies.js"><code>dependencies.js</code></a>. Example:</p>
<pre><code class="language-js">const getLoader = require('prismjs/dependencies');
const components = require('prismjs/components');
const componentsToLoad = ['markup', 'css', 'php'];
const loadedComponents = ['clike', 'javascript'];
const loader = getLoader(components, componentsToLoad, loadedComponents);
loader.load(id => {
require(`prismjs/components/prism-${id}.min.js`);
});</code></pre>
<p>For more details on the <code>getLoader</code> API, check out the <a href="https://github.com/PrismJS/prism/blob/master/dependencies.js">inline documentation</a>.</p>
</section>
<section id="writing-plugins">
<h1>Writing plugins</h1>
<p>Prism’s plugin architecture is fairly simple. To add a callback, you use <code class="language-javascript">Prism.hooks.add(hookname, callback)</code>.
<code>hookname</code> is a string with the hook id, that uniquely identifies the hook your code should run at.
<code>callback</code> is a function that accepts one parameter: an object with various variables that can be modified, since objects in JavaScript are passed by reference.
For example, here’s a plugin from the Markup language definition that adds a tooltip to entity tokens which shows the actual character encoded:
<pre><code class="language-javascript">Prism.hooks.add('wrap', function(env) {
if (env.token === 'entity') {
env.attributes['title'] = env.content.replace(/&amp;/, '&');
}
});</code></pre>
<p>Of course, to understand which hooks to use you would have to read Prism’s source. Imagine where you would add your code and then find the appropriate hook.
If there is no hook you can use, you may <a href="https://github.com/PrismJS/prism/issues">request one to be added</a>, detailing why you need it there.
</section>
<section id="api">
<h1>API documentation</h1>
<p>All public and stable parts of <a href="docs/">Prism's API are documented here</a>.</p>
</section>
<footer data-src="assets/templates/footer.html" data-type="text/html"></footer>
<script src="assets/vendor/utopia.js"></script>
<script src="prism.js"></script>
<script src="plugins/autoloader/prism-autoloader.js"></script>
<script src="plugins/line-highlight/prism-line-highlight.js"></script>
<script src="components.js"></script>
<script src="assets/code.js"></script>
</body>
</html>