Panini is a flexible toolkit that enables you to generate sentences from a context-free grammar, also known as a CFG.
Informally, a context-free grammar consists of a set of production rules, each of which rewrites a single nonterminal on the left-hand side into a string of terminals and nonterminals on the right-hand side. Like this:
S -> AB
A -> a
B -> b
In the above example, S, A, and B are all nonterminals, while a and b are terminals. Furthermore, the nonterminal S is the start symbol for this CFG. By applying the productions as follows:
S     (start symbol)
AB    (apply S -> AB)
aB    (apply A -> a)
ab    (apply B -> b)
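The derivation above can be reproduced mechanically. The following is a minimal plain-Ruby sketch, not part of Panini: it assumes a hypothetical hash representation in which every key of the hash is a nonterminal and every other string is a terminal, and it always rewrites the leftmost nonterminal with that nonterminal's first production.

```ruby
# Hypothetical representation (not Panini's API): hash keys are nonterminals,
# each mapped to a list of productions; any other string is a terminal.
GRAMMAR = {
  'S' => [%w[A B]],
  'A' => [%w[a]],
  'B' => [%w[b]]
}.freeze

# Repeatedly rewrite the leftmost nonterminal with its first production,
# recording every sentential form until only terminals remain.
def derive(grammar, start)
  form = [start]
  steps = [form.join]
  while (i = form.index { |sym| grammar.key?(sym) })
    form = form[0...i] + grammar[form[i]].first + form[(i + 1)..]
    steps << form.join
  end
  steps
end

p derive(GRAMMAR, 'S') # => ["S", "AB", "aB", "ab"]
```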
The sentence ab is generated. In fact, this is the only sentence this grammar can produce! By adding one additional production to the grammar:
S -> ASB
The grammar can now generate an infinite number of sentences. They all have the form a^i b^i (i copies of a followed by i copies of b) where i ≥ 1. Here is one more example derivation:
S        (start symbol)
ASB      (apply S -> ASB)
aSB      (apply A -> a)
aSb      (apply B -> b)
aASBb    (apply S -> ASB)
aaSBb    (apply A -> a)
aaSbb    (apply B -> b)
aaABbb   (apply S -> AB)
aaaBbb   (apply A -> a)
aaabbb   (apply B -> b)
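To see why every sentence has this form, note that applying S -> ASB (i - 1) times and then S -> AB leaves i A's followed by i B's, which then rewrite to i a's and i b's. The plain-Ruby sketch below (not Panini code; `sentence_for` and `in_language?` are hypothetical helpers) applies the productions in that order, which differs from the derivation above but yields the same sentence, and checks membership in the language a^i b^i.

```ruby
# Build the sentence for a given i by applying S -> ASB (i - 1) times,
# then S -> AB, then A -> a and B -> b everywhere.
def sentence_for(i)
  raise ArgumentError, 'i must be >= 1' if i < 1

  form = 'S'
  (i - 1).times { form = form.sub('S', 'ASB') }
  form = form.sub('S', 'AB')
  form.tr('AB', 'ab')
end

# True when the string is one or more a's followed by the same number of b's.
def in_language?(str)
  m = str.match(/\A(a+)(b+)\z/)
  !m.nil? && m[1].length == m[2].length
end

(1..3).each { |i| puts sentence_for(i) } # prints ab, aabb, aaabbb
```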
To learn more about CFGs, see the context-free grammar article on Wikipedia.
Defining a grammar is easy. Create a grammar object, add some nonterminals and then add the productions to those nonterminals.
Here’s how the grammar from above is defined:
require 'panini'

grammar = Panini::Grammar.new

nt_s = grammar.add_nonterminal
nt_a = grammar.add_nonterminal
nt_b = grammar.add_nonterminal

nt_s.add_production([nt_a, nt_b])        # S -> AB
nt_s.add_production([nt_a, nt_s, nt_b])  # S -> ASB
nt_a.add_production(['a'])               # A -> 'a'
nt_b.add_production(['b'])               # B -> 'b'
In order to derive sentences, the grammar needs a start symbol. Any nonterminal in the grammar can be used as the start symbol. If a start symbol is not explicitly set, then the first nonterminal added to the grammar is used.
grammar = Panini::Grammar.new
nt_0 = grammar.add_nonterminal
nt_1 = grammar.add_nonterminal
grammar.start = nt_1   # use the second nonterminal instead of the default nt_0
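To make the object model concrete, here is a hypothetical plain-Ruby stand-in that mirrors the calls above (`add_nonterminal`, `add_production`, `start=`). `MiniGrammar` and `MiniNonterminal` are illustrative names and this is not Panini's source; it only shows one way the default start symbol (the first nonterminal added) could behave.

```ruby
# Hypothetical stand-in for a nonterminal: a name plus its productions.
class MiniNonterminal
  attr_reader :name, :productions

  def initialize(name)
    @name = name
    @productions = []
  end

  # A production is an array of terminals and nonterminals.
  def add_production(symbols)
    @productions << symbols
    self
  end
end

# Hypothetical stand-in for a grammar with an optional start symbol.
class MiniGrammar
  attr_reader :nonterminals
  attr_writer :start

  def initialize
    @nonterminals = []
    @start = nil
  end

  def add_nonterminal(name = nil)
    nt = MiniNonterminal.new(name || "N#{@nonterminals.size}")
    @nonterminals << nt
    nt
  end

  # Fall back to the first nonterminal added when no start symbol was set.
  def start
    @start || @nonterminals.first
  end
end

grammar = MiniGrammar.new
nt_0 = grammar.add_nonterminal
nt_1 = grammar.add_nonterminal
puts grammar.start.equal?(nt_0) # true: defaults to the first nonterminal
grammar.start = nt_1
puts grammar.start.equal?(nt_1) # true
```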
Derivators are objects that take a Panini::Grammar and then apply the rules to generate a sentence. Creating the sentences from the grammar can be tricky, and certain derivation strategies may be better for some grammars. There are currently two main derivators.
The RandomDampened strategy creates random sentences from a grammar. It employs a dampening factor to keep the derivation of a sentence from growing without bound.
derivator = Panini::DerivationStrategy::RandomDampened.new(grammar)
Each call returns a freshly generated random sentence, so it may (and probably will) return the same sentence more than once.
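The exact dampening scheme Panini uses is not described here, so the following plain-Ruby sketch assumes one for illustration: at derivation depth d, a production containing n nonterminals gets weight `damping ** (d * n)`, so deeper expansions increasingly favor less recursive productions. The grammar hash and the helper names are hypothetical, not Panini's API.

```ruby
# Hypothetical representation: hash keys are nonterminals.
GRAMMAR = {
  'S' => [%w[A B], %w[A S B]],
  'A' => [%w[a]],
  'B' => [%w[b]]
}.freeze

def nonterminal_count(production)
  production.count { |sym| GRAMMAR.key?(sym) }
end

# Expand a symbol depth-first, choosing productions at random with weights
# that shrink (assumed rule: damping ** (depth * nonterminals)) as depth grows.
def expand(symbol, depth, damping, rng)
  return [symbol] unless GRAMMAR.key?(symbol) # terminals expand to themselves

  choices = GRAMMAR[symbol]
  weights = choices.map { |p| damping**(depth * nonterminal_count(p)) }
  # If the weights underflow to zero, fall back to the least recursive choice.
  fallback = choices.each_index.min_by { |i| nonterminal_count(choices[i]) }
  r = rng.rand * weights.sum
  index = weights.each_index.find { |i| (r -= weights[i]).negative? } || fallback
  choices[index].flat_map { |sym| expand(sym, depth + 1, damping, rng) }
end

def random_sentence(start = 'S', damping: 0.5, rng: Random.new)
  expand(start, 0, damping, rng)
end

rng = Random.new(42)
3.times { puts random_sentence(rng: rng).join }
```

Passing a seeded `Random` makes runs reproducible; repeated calls can still produce the same sentence, as noted above.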
This strategy exhaustively generates all of the sentences that a grammar can derive.
derivator = Panini::DerivationStrategy::RandomDampened.new(grammar, length_limit)
Each call returns a new sentence. Once no additional sentences remain, it returns nil. The same sentence may be returned multiple times if the grammar can derive it in multiple ways.
The length_limit argument is optional; it limits the size of the sentences to be generated.
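One way to enumerate every sentence up to a size limit is a breadth-first search over leftmost derivations. The sketch below is plain Ruby, not Panini's implementation; `ExhaustiveSketch` is a hypothetical name, and it assumes the grammar has no empty productions, so that sentential forms never shrink and any form longer than the limit can be pruned. Like the derivator described above, its `sentence` method returns nil once nothing remains.

```ruby
# Breadth-first enumeration of leftmost derivations (illustrative sketch).
class ExhaustiveSketch
  def initialize(grammar, start, length_limit)
    @grammar = grammar
    @limit = length_limit
    @queue = [[start]] # sentential forms waiting to be expanded
  end

  # Returns the next sentence (an array of terminals), or nil when none remain.
  def sentence
    until @queue.empty?
      form = @queue.shift
      i = form.index { |sym| @grammar.key?(sym) }
      return form if i.nil? # no nonterminals left: a complete sentence

      @grammar[form[i]].each do |production|
        candidate = form[0...i] + production + form[(i + 1)..]
        @queue << candidate if candidate.size <= @limit
      end
    end
    nil
  end
end

grammar = {
  'S' => [%w[A B], %w[A S B]],
  'A' => [%w[a]],
  'B' => [%w[b]]
}
derivator = ExhaustiveSketch.new(grammar, 'S', 6)
while (s = derivator.sentence)
  puts s.join # prints ab, aabb, aaabbb
end
```

Because each sentence is yielded once per leftmost derivation, an ambiguous grammar would produce duplicates, matching the behavior described above.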
To generate a sentence, call the derivator’s sentence method:
derivator.sentence   # => ['a', 'a', 'b', 'b']
You will get a new sentence (depending on the grammar) with every call:
derivator.sentence   # => ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
In this example, we create a grammar that generates mathematical expressions.
require 'panini'

grammar = Panini::Grammar.new

# ================
# = Nonterminals =
# ================
expression = grammar.add_nonterminal("EXPR")
term       = grammar.add_nonterminal("TERM")
factor     = grammar.add_nonterminal("FACT")
identifier = grammar.add_nonterminal("ID")
number     = grammar.add_nonterminal("NUM")

# ===============
# = Productions =
# ===============
expression.add_production([term, '+', term])
expression.add_production([term, '-', term])
expression.add_production([term])

term.add_production([factor, '*', term])
term.add_production([factor, '/', term])
term.add_production([factor])

factor.add_production([identifier])
factor.add_production([number])
factor.add_production(['(', expression, ')'])

('a'..'z').each do |v|
  identifier.add_production([v])
end

(0..100).each do |n|
  number.add_production([n])
end

# ===============================================
# = Choose a strategy and create some sentences =
# ===============================================
deriver = Panini::DerivationStrategy::RandomDampened.new(grammar)
10.times do
  puts deriver.sentence.join(' ')
end
Contributions to this gem are appreciated!
- Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet
- Check out the issue tracker to make sure someone hasn’t already requested it and/or contributed it
- Fork the project
- Start a feature/bugfix branch
- Commit and push until you are happy with your contribution
- Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.
- Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate the change in its own commit so I can cherry-pick around it.

To do:

- Detect invalid grammars
- Weighted productions
- Arbitrary start symbol
- Support Enumerator!
- DSL or string-based grammar definitions
- Purdom Derivator?
- Actions
- Natural language
- XML
- JSON
- Address
- Tree/Flower (PS?)
- Simulated user actions
Copyright © 2011 Matthew Bellantoni. See LICENSE.txt for further details.