Programming Reusable PIR: Learn how to write and test PIR subroutines

PIR is Parrot's native programming language. In this conclusion of his series, chromatic presents a variety of techniques to program and validate subroutines. Learn how to create libraries and test your code in this hands-on primer.

The first article in this series explained Parrot and its philosophy. The second article introduced PIR, Parrot’s native programming language. PIR is a line-oriented assembly language. Even though it has plenty of high level features, everything is either a compiler directive or an opcode.

To recap, a simple program is:

.sub 'greet_and_count' :main
  .local string hello
  hello     = "Hello, world!"
  .local int num_chars
  num_chars = length hello
  print "'"
  print hello
  print "' has "
  print num_chars
  print " chars\n"
.end

Each line starting with a period contains a compiler directive (.sub, .local), and every remaining line is an opcode (print, or =, which translates to the set opcode).

Parrot can do much more.

Powerful, Powerful Subroutines

Until now, all of the PIR subroutines shown have used positional arguments. The arguments you pass to a subroutine map directly to the parameter list inside the subroutine. Parrot also supports named arguments, where you can pass the arguments in any order as long as you give them appropriate names. It’s easier to show the example first:


.sub 'show_stats'
  .param string name  :named( 'name' )
  .param int    years :named( 'age' )
  print name
  print " is "
  print years
  print " years old.\n"
.end
.sub 'main' :main
  show_stats( 'age' => 4, 'name' => 'Jacob' )
.end

To mark a parameter as named, add :named() to its declaration. This adverb is parameterized, in that it takes a single argument itself — the name. At the point of call, use the key/value syntax with the fat arrow (familiar from Perl) to associate names with values. The code above displays:

Jacob is 4 years old.

If you don’t care for the key/value syntax at the point of call, use the :named adverbial modifier on each argument:


.sub 'main' :main
  show_stats( 'name' => 'Jacob', 'age' => 4 )
  show_stats( 'Chevelle' :named( 'name' ), 6 :named( 'age' ) )
.end

This is more verbose, but it may be easier if you’re generating PIR code. It works the same way as the pair syntax:

Jacob is 4 years old.
Chevelle is 6 years old.

Names are optional, however. You can still make the call positionally:


.sub 'main' :main
  show_stats( 'name' => 'Jacob', 'age' => 4 )
  show_stats( 'Chevelle' :named( 'name' ), 6 :named( 'age' ) )
  show_stats( 30, 'Uncle Dude' )
.end

… and the results are as you expect. If you looked closely, you may have noticed that the symbolic register name doesn’t need to match the argument name. That’s a feature. (Don’t leave off the parameter to :named(), though, or you’ll get an error.)

If you can name parameters, you ought to be able to mark parameters as optional, as well. Suppose you have either a name or an age, but you also have default values for both. You could write:


.sub 'main'
  show_stats_default( 'Jack' :named( 'name' ) )
  show_stats_default(  14    :named( 'age'  ) )
.end
.sub 'show_stats_default'
  .param string name :named( 'name' ) :optional
  .param int    have_name             :opt_flag
  .param int    age  :named( 'age' )  :optional
  .param int    have_age              :opt_flag
  if have_name goto check_age
  name = 'John Doe'
check_age:
  if have_age goto show_stats
  age  = 10
show_stats:
  print name
  print " is "
  print age
  print " years old.\n"
.end

The adverbial modifier for optional parameters is :optional. The presence of this modifier implies that the next parameter will be an integer flag that, when used as a boolean value, will be true if the caller passed in that argument and false otherwise. To distinguish these automatic flag variables from normal parameters, mark them with the :opt_flag adverb.

It’s not a requirement that you pair these flag variables with optional parameters, but it’s almost always the clearest and easiest way to write PIR.

You can mark positional parameters as optional too:


.sub 'main'
  .local int result
  result = increment_number( 10 )
  print "10 incremented is "
  print result
  print "\n"
  result = increment_number( 20, 2 )
  print "20 incremented by 2 is "
  print result
  print "\n"
.end
.sub 'increment_number'
  .param int value
  .param int step      :optional
  .param int have_step :opt_flag
  if have_step goto increment
  step = 1
increment:
  value += step
  .return( value )
.end

Of course, any optional positional parameters must come at the end of the parameter list, otherwise all calls to such subroutines would have ambiguous argument lists. In practice, optional parameters are most useful with named parameters, except in cases such as increment_number, where there’s a very clear default value that you might wish to override infrequently.

Occasionally, the default behavior of Perl 5 subroutines is useful, especially when dealing with variadic lists (where you may pass zero or more optional arguments). In Perl 5, all parameters are available through the @_ array. It’s common to shift off or assign the first few elements of that array to variables, while leaving the remainder of the array unmodified. You can do this in PIR as well, through the use of slurpy parameters.

A slurpy parameter slurps up all remaining positional arguments into a ResizablePMCArray, boxing all primitives appropriately in PMCs, such that a value in an integer register becomes an Integer PMC in a PMC register. (Remember the previous article? You can only store PMCs in aggregate PMCs. Fortunately, Parrot boxes any primitives for you into the corresponding PMCs at the point of insertion.) You only get one :slurpy parameter per subroutine (for the most part), and they look like:


.sub 'main'
  .local pmc my_array
  my_array = make_array( 'my_array', 4, 5, 2, 3, 1 )
.end
.sub 'make_array'
  .param string name
  .param pmc    array :slurpy
  $I0 = array
  print "Slurped "
  print $I0
  print " elements into array '"
  print name
  print "'\n"
  .return( array )
.end

To declare a slurpy parameter, first handle all of the positional arguments you want to bind to names and types. Then declare a PMC parameter with the :slurpy adverb. That’s all. Parrot does the rest for you. Though this example is very simple, it does show an interesting and useful Parrot idiom for converting a literal list into a PMC array.
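Once you have the slurped array, you can walk it like any other aggregate. Here is a minimal sketch of a variadic function; the name sum_all and the loop labels are my own inventions for illustration, not part of any Parrot library:

```pir
# Hypothetical example: 'sum_all' adds up however many integers it receives.
.sub 'main' :main
  .local int total
  total = sum_all( 4, 5, 2, 3, 1 )
  print "Total: "
  print total
  print "\n"
.end

.sub 'sum_all'
  .param pmc numbers :slurpy
  .local int num_items, i, total
  num_items = numbers     # an array in integer context gives its element count
  i         = 0
  total     = 0
next_item:
  if i >= num_items goto done
  $I0 = numbers[i]        # unbox the Integer PMC into an integer register
  total += $I0
  inc i
  goto next_item
done:
  .return( total )
.end
```

If everything behaves as described above, this should print "Total: 15".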

The other side of slurping parameters into an array is flattening an array into individual parameters. The modifier for this occurs on the calling side, not the callee. Suppose you want to print the first three elements of an array but you don’t want to extract them from the array explicitly before calling a function. You might write:


.sub 'print_three'
  .param int first
  .param int second
  .param int third
  print first
  print ", "
  print second
  print ", "
  print third
  print "\n"
.end
.sub 'main' :main
  .local pmc my_array
  my_array = make_array( 'my_array', 4, 5, 2, 3, 1 )
  print_three( my_array :flat )
.end

Unfortunately, this doesn’t work; Parrot has flattened the array into individual list elements. In this case, there are five, and print_three only takes three arguments. Adding…

.param pmc rest :slurpy

… to the end of the parameter list in print_three solves the problem. It’s not the only option, but it’s a decent option for the list-processing style that many flat and slurpy calls encourage.
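Put together, the repaired version might look like the sketch below. I have simplified make_array so that it only slurps (dropping the name parameter) to keep the listing short; that simplification is mine, not a requirement:

```pir
.sub 'main' :main
  .local pmc my_array
  my_array = make_array( 4, 5, 2, 3, 1 )
  print_three( my_array :flat )
.end

# Simplified from the earlier example: slurp everything into an array.
.sub 'make_array'
  .param pmc array :slurpy
  .return( array )
.end

.sub 'print_three'
  .param int first
  .param int second
  .param int third
  .param pmc rest :slurpy   # absorbs any leftover flattened elements
  print first
  print ", "
  print second
  print ", "
  print third
  print "\n"
.end
```

With the extra slurpy parameter in place, the two leftover elements land in rest, and the call should print "4, 5, 2".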

I’ve left the particulars of the interactions between named, optional, and slurpy parameters as an exercise for you. Feel free to experiment.

Multi-Dispatch

Sometimes the presence or absence of arguments or their names isn’t as important as the type of argument passed to a function. Take addition, for example. Adding two integers is obvious: you receive an integer as a result, right? What if there’s an overflow? What if one integer is only eight bits long while the other is sixteen? What happens when you add an integer to a floating point value?

One function doesn’t necessarily meet all of your needs. Sometimes you can perform complex data flow analysis and optimize to a specific “add two integers” function call during compilation. Sometimes you can’t.

You could put redispatching logic or if-else blocks in a single function to check types as the program runs, but that can get complex and messy.

Another option is to take advantage of multi-dispatch, if your language supports it. This is where multiple functions share the same name but differ on the types of arguments they support. If it helps, think of how different objects can have methods of the same name. When you call a method on an object, the compiler dispatches the call on the type of the object — it uses the object’s class to decide which method to call. Multi-dispatch can use far more than just the invocant to make that choice.

Consider the silly example of a function named double. When given an integer or number, it multiplies the value by two and returns the result. When given a string, it appends a copy of the string to itself and returns the result.


.sub 'double' :multi( string )
  .param string value
  value .= value
  .return( value )
.end
.sub 'double' :multi( int )
  .param int value
  value *= 2
  .return( value )
.end
.sub 'main' :main
  $S0 = double( 'some string' )
  say $S0
  $I0 = double( 44 )
  say $I0
.end

Mark a multi-dispatch variant with the :multi modifier. This is yet another parameterized modifier. It takes a list of positional types with which to perform the check when dispatching to the appropriate variant. (double could also handle Parrot nums; in this case, you would need to copy and paste the full int variant, changing both occurrences of int to num.)
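For the curious, the num variant described in the parenthetical might look like this sketch. It follows the copy-and-paste recipe, except that I have written the multiplier as 2.0 so that both operands are plainly numbers; that tweak is my own choice:

```pir
.sub 'main' :main
  $N0 = double( 2.5 )
  say $N0
.end

# The int variant with both occurrences of int changed to num.
.sub 'double' :multi( num )
  .param num value
  value *= 2.0
  .return( value )
.end
```

Drop this variant into the same file as the string and int versions, and Parrot should choose among the three based on the type of the argument.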

say() was added to print the value of the given register (calling get_string() on PMCs) with a trailing newline just to shorten these examples slightly.

You can dispatch on PMC types as well: not just the generic pmc register type, but the actual type of the PMC. For example, you could reimplement the typeof operator in pure PIR along the lines of:


.sub 'typeof' :multi( Array )
  .return( 'Array' )
.end
.sub 'typeof' :multi( Hash )
  .return( 'Hash' )
.end
...

That would be silly, but you could do it.

Multi-dispatch can also handle multiple dispatchable parameters. The PIR-based Test::More testing library uses this technique in the is function, which takes two values and an optional test description and compares the two values for equality:


.sub 'is' :multi( string, string )
  .param string left
  .param string right
  .param string description :optional
  ...
.end
.sub 'is' :multi( int, int )
  .param int    left
  .param int    right
  .param string description :optional
  ...
.end
.sub 'is' :multi( num, num )
  .param num    left
  .param num    right
  .param string description :optional
  ...
.end

There’s also a variant with no dispatchable parameters (:multi(), with an empty parameter list); there’s no requirement that all type lists have to have the same length, or take the same number of parameters at all. I chose to write these four variants explicitly because it was most appropriate for the kinds of calls made to this function, not because of any limitation in the multi-dispatch model.

Multi-dispatch is a good and useful tool, but it can get complex with more than a few variants. Sometimes it’s the simplest way to solve a thorny problem. If you find yourself facing a combinatorial explosion, consider rethinking your design.

Namespaces

Multi-dispatch is one way to share the same name with dissimilar functions, but it’s a poor mechanism to use if your function names are merely homonyms. Use multi-dispatch when you have one function which needs to do different things based on the types of its parameters, not to re-use the same name in different ways.

Instead, use namespaces to group functions into larger semantic units which give context to their names.

To declare a namespace, use the .namespace compiler directive. It works much like the package keyword in Perl 5; all functions declared after that point (and all other global symbols not otherwise qualified) get stored in that namespace. You can have multiple namespace declarations in a file, and there’s no requirement which ties the name of a file to any namespace. They’re independent.

To declare a single namespace of Main, write…

.namespace [ 'Main' ]

… before any subroutine declarations you want to store in the Main namespace. (There’s nothing special about the name Main, nor do you have to use any particular namespace for your own main function. To test this, prepend a namespace directive to any of the earlier examples and change Main to any other string.)

The syntax for identifying the namespace’s name may look odd, but it’s actually the same syntax as accessing an element within an aggregate PMC (such as an array or a hash). This is the syntax for keyed access, and it’s the clearest way to work with namespaces.

Namespaces in Parrot, unlike those in Perl 5, actually nest. That means that the Test namespace can actually contain other namespaces, such as Builder and More, as well as its own subroutines and variables.

To be even more specific, namespaces in Parrot are actually PMCs themselves–instances of the NameSpace PMC. If you’ve read between the lines, you may rightly assume that you can nest namespaces manually by creating them and storing them in parent namespaces, then storing the appropriate symbols in namespaces.

If you’re writing PIR, though, it’s usually easier just to use multi-dimensional keys:


.namespace [ 'Test'; 'More' ]
.sub 'ok'
...
.end
.sub 'is'
...
.end

The entire bracketed construct is a key (represented internally as a Key PMC), with individual levels of the key identified as strings, and separated by semicolons. This snippet tells Parrot to store the ok and is subroutines in the More namespace stored in the Test namespace.

When calling functions within the same namespace, you can use the function name alone. That doesn’t work for calling functions in another namespace. Unlike Perl, there’s no way to call a function by its fully-qualified name. You must look up the symbol from the namespace directly, and then call it. The find_global opcode does this.


.namespace [ 'Other'; 'NS' ]
.sub 'external_sub'
  print "From other NS\n"
.end
.namespace [ 'This'; 'NS' ]
.sub 'internal_sub'
  print "From my NS\n"
.end
.sub 'main' :main
 .local pmc sub_pmc
 sub_pmc = find_global 'internal_sub'
 sub_pmc()
 sub_pmc = find_global [ 'Other'; 'NS' ], 'external_sub'
 sub_pmc()
 sub_pmc = find_global [ 'This'; 'NS' ], 'internal_sub'
 sub_pmc()
.end

There are two forms of find_global. The first form takes a single string as the name of the symbol to find in the current namespace and returns a PMC. This PMC may or may not be a Sub PMC–that is, it may or may not be a callable function. It may be a variable. If there’s anything there, it will always be a PMC, however. (See the op’s documentation for more information about what happens if there’s an error.)

The second form of find_global takes as its first argument a key which indicates the namespace to search for the symbol named as the second argument.

The example code demonstrates fetching Sub PMCs (invokable subroutine objects) relative to the same namespace, explicitly from another namespace, and explicitly from the same namespace.

You can also use the store_global opcode to perform the converse operation. This can be useful to create an alias for a function:


.sub 'make_alias'
  .param pmc    sub_pmc
  .param string alias
  .param pmc    namespace :optional
  .param int    have_ns   :opt_flag
  if have_ns goto store_qualified_alias
  store_global alias, sub_pmc
  .return()
store_qualified_alias:
  store_global namespace, alias, sub_pmc
  .return()
.end

This function takes a Sub PMC and a new name for that PMC and stores the PMC under the new name, either in the current namespace or in the namespace represented by an optional Key PMC. The only information of note is that the key is the first argument to the opcode in the keyed version.

You can also perform a rough form of exporting (in the Perl 5 sense) to insert functions from an external namespace into the current namespace. Here’s a modification of main from the find_global example:


.sub 'main' :main
 .local pmc sub_pmc
 sub_pmc = find_global 'internal_sub'
 sub_pmc()
 sub_pmc = find_global [ 'Other'; 'NS' ], 'external_sub'
 store_global 'external', sub_pmc
 store_global [ 'This'; 'NS' ], 'more_external', sub_pmc
 external()
 more_external()
.end

(The binding of the calls does not happen at compile time.)

That code is okay, but it’s somewhat clunky and tedious. Fortunately, the NameSpace PMC has a method called export_to() which can ameliorate this work. To import one of the test functions (ok()) from Test::More, you could write:


.local pmc cur_ns, ext_ns
cur_ns = get_namespace
ext_ns = get_namespace [ 'Test'; 'More' ]
.local pmc export_list
export_list    = new 'Array'
export_list    = 1
export_list[0] = 'ok'
ext_ns.'export_to'( cur_ns, export_list )

Then you can call ok() as if you’d defined it yourself.

The get_namespace opcode works much like find_global and store_global. With a Key PMC as the first (and only) argument, it returns the NameSpace PMC of that name. With no arguments, it returns the NameSpace PMC representing the current namespace.

Call the export_to() method on the external namespace, passing the NameSpace to which to export the subroutines named in the second argument, an array.

PIR’s assembly roots really show if you want to import more than one element, but there’s a workaround. Here’s how to import all of the interesting functions from Test::More:


...
  .local pmc exports
  exports = split ' ', 'ok is diag like skip todo is_deeply isa_ok'
  ext_ns.'export_to'( cur_ns, exports )
...

The split opcode takes a delimiter string and splits its second argument on that string, returning an Array-like PMC. This is easier than creating your own Array and manipulating it manually, and much shorter once you know the idiom.
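If you want to see what split hands back, a short loop over the result will show you. This is only a sketch; the label names are mine, and I have assumed the returned PMC reports its element count in integer context, as the arrays earlier in this article do:

```pir
.sub 'main' :main
  .local pmc exports
  exports = split ' ', 'ok is diag like'
  .local int num_names, i
  num_names = exports     # element count of the array-like PMC
  i         = 0
next_name:
  if i >= num_names goto finished
  $S0 = exports[i]        # fetch each piece as a string
  print $S0
  print "\n"
  inc i
  goto next_name
finished:
  .return()
.end
```

Each name should appear on its own line, in the order given in the original string.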

Managing Code

So far these examples have all been short and contained in one file. Your programs aren’t likely to be that simple, especially if you write a compiler with Parrot. As well, there are several libraries included with Parrot (and plenty more underway), so you have plenty of opportunities to reuse code and write reusable code.

This is a PIR tutorial, so it’s concentrated on showing you how to work with PIR files. Parrot can also handle PASM (bare-bones Parrot assembly code) and PBC (Parrot bytecode) files, with no additional work. From within Parrot, you can also load and run code written in any language for which Parrot has a registered compiler, but that’s a discussion for another time.

The important opcode to know is load_bytecode. It takes one argument: the path to a file containing PIR, PASM, or PBC. (Filetype detection currently relies on the suffix of the file, so name your PIR files something.pir.)

The current set of reusable Parrot libraries is in runtime/parrot/ within the Parrot source tree. Thus to load the Test::More library, write:

load_bytecode 'library/Test/More.pir'

It’s okay to use Unix-style paths; Parrot will translate them to whatever’s most appropriate for your particular filesystem.

This opcode loads and compiles the named PIR file as appropriate. You don’t have to load a file out of runtime/parrot/; you can just as easily specify an absolute path to your PIR file. As well, you don’t have to load something written specifically as a library. You can load a normal program. Parrot will not run any :main-marked subroutine in the file, however. If the loaded program performs some initialization in that routine, you may have trouble.

If you do have initialization to perform in a library, run it from a subroutine marked with :load. This adverb tells Parrot to run the marked subroutine only when loading this code, whether directly from the command line or when loaded from another file.
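As a rough sketch, a small library with a :load subroutine might look like the following. The file name my_lib.pir, the My::Lib namespace, and both subroutine names are hypothetical, made up for this illustration:

```pir
# my_lib.pir (hypothetical file name)
.namespace [ 'My'; 'Lib' ]

# Per the description above, Parrot should run this when it loads the file.
.sub 'setup' :load
  print "My::Lib is ready\n"
.end

# An ordinary library function, available after loading.
.sub 'answer'
  .return( 42 )
.end
```

A program would then pull it in with load_bytecode 'my_lib.pir' and fetch answer from the My::Lib namespace with find_global, as shown earlier.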

Another adverbial marker, :init, tells Parrot to run the named function only when loading this file directly, not when loading it from another file.

There are a few important points about these modifiers. First, their design has a couple of outstanding questions related to their relationship to bytecode, so they may have subtle changes in the future. However, something will provide this behavior; it’s necessary for various types of initialization. Second, you can call any other function defined in the file from an :init or :load function. Unlike BEGIN blocks in Perl 5, Parrot does not run these functions immediately after compiling them. Parrot compiles the entire file first.

Also unlike BEGIN blocks and Perl 5’s use statement, load_bytecode is an opcode, not a compiler directive, so its effect takes place at runtime, as Parrot encounters the opcode.

Finally, :init subroutines get called before :load subroutines.

If these limitations aren’t appropriate for you, IMCC (the PIR compiler) provides another option in the form of the .include directive. When the compiler sees this directive, it loads the given file and inserts its contents directly into the file, then compiles the new code as if it had already been there.

For obvious reasons, the .included code must be straight PIR or PASM code; PBC will not work.

Why would you use .include over load_bytecode? It’s common when you want to create a single PBC file. For example, the Perl 6 implementation in Parrot uses separate source files to organize code into maintainable chunks. The actual perl6.pir file then includes all of these PIR files, and the Perl 6 Makefile compiles that file into perl6.pbc. This single file contains the entire Perl 6 compiler.

For libraries, this is less useful, especially where they may change separately from the files that use them.

When would you use PBC over PIR? Right now, there are often no strong advantages. PBC files bypass the PIR compilation and optimization step, so they offer a very slight speed improvement, but they are more difficult to modify as they require recompilation if you change their source code.

Practical PIR with Test::More

What’s all of this good for? By now you know enough to write your PIR code in the test-driven style… almost. Now you need to know how to use the Test::More library from PIR.

If you’ve ever used Test::More in Perl 5, you’re almost completely familiar with the interface (especially given the heritage of the PIR version). It’s time to draw everything in this tutorial together.

Start by loading the Test::More library and importing its testing functions into your namespace:


.sub 'main' :main
  load_bytecode 'library/Test/More.pir'
  .local pmc cur_ns, ext_ns
  cur_ns  = get_namespace
  ext_ns  = get_namespace [ 'Test'; 'More' ]
  .local pmc exports
  exports = split ' ', 'plan ok is like'
  ext_ns.'export_to'( cur_ns, exports )
  ...

With that accomplished, call plan() to tell the library how many tests you plan to run:

plan( 9 )

From there, you can group separate types of tests into their own functions:


test_ok()
test_is()
test_like()
.end

These test functions can use ok(), is(), and like() as much as they like. ok() has one required argument, a boolean value (passed as an integer), and one optional argument, a string describing the test:


.sub 'test_ok'
  ok( 1 )
  $I0 = 1
  ok( $I0, 'int register set to one should be true' )
  ok( 0, '... but this test will fail because zero is false' )
.end

Similarly, is() has two required arguments. It compares them, passing if they’re equal. The third argument is optional, and it’s a description again.


.sub 'test_is'
  $I0 = 2 + 2
  is( $I0, 4 )
  $S0  = 'hello'
  $S0 .= ' '
  $S0 .= 'world'
  is( $S0, 'hello world', 'the .= operator performs string concatenation' )
  $N0  = 1.4
  $N0 *= 3.0
  is( $N0, 4.2, '... watch out for floating point comparisons though' )
.end
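If that floating point caveat worries you, one common workaround is to compare against a tolerance and report the result through ok(). The sketch below is one way to do that; the tolerance of 0.000001 is an arbitrary value I picked for illustration, and the import boilerplate simply repeats the steps shown above:

```pir
.sub 'main' :main
  load_bytecode 'library/Test/More.pir'
  .local pmc cur_ns, ext_ns, exports
  cur_ns  = get_namespace
  ext_ns  = get_namespace [ 'Test'; 'More' ]
  exports = split ' ', 'plan ok'
  ext_ns.'export_to'( cur_ns, exports )
  plan( 1 )

  $N0  = 1.4
  $N0 *= 3.0
  $N1  = $N0 - 4.2
  $N1  = abs $N1          # absolute difference from the expected value
  $I0  = 0                # assume failure until proven otherwise
  if $N1 > 0.000001 goto report
  $I0  = 1
report:
  ok( $I0, '1.4 * 3.0 is within 0.000001 of 4.2' )
.end
```

The test passes whenever the difference is no larger than the tolerance, so it does not depend on the two values being bit-for-bit identical.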

like() is a bit more difficult to describe without going into detail about regular expressions and patterns and grammars; the gory details are in Perl 6’s Synopsis 5 (http://dev.perl.org/perl6/doc/design/syn/S05.html). Still, basic pattern matching is simple. The required second argument is a string containing the pattern to match:


.sub 'test_like'
  like( 'foo', 'bar',   '... here is a failure' )
  like( 'TEAM', '<[I]>', '... fails; no I in TEAM' )
  like( 'hello, detroit', 'oi', 'a proper British exclamation' )
.end

By now you should have the hang of the library. With that, you’re probably itching to run the tests and get your results. Either pass the name of the file to parrot…

$ parrot my_test_file.pir

… or add a hash-bang line to the top of the program with the path to parrot (#!/path/to/parrot) and run it with the prove utility found in Perl 5’s Test::Harness module:

$ prove my_test_file.pir

The benefit of the prove approach is that you’ll get only a nice summary of the output, if everything passes:


t/library/my_test_file....ok
All tests successful.
Files=1, Tests=9,  0 wallclock secs ( 0.02 cusr +  0.00 csys =  0.02 CPU)

There’s plenty more to learn; Parrot has a lot of remaining features. However, at this point you know enough to start writing your own tests and libraries in PIR and to contribute to Parrot development as a PIR programmer.

As was suggested in the previous tutorial, browsing the docs/ops/ and docs/pdds/ directories will help you absorb new concepts. Now you should know enough to read the test files in t/op/ as well.

As usual, we’d love to see you in #parrot on irc.perl.org. Happy hacking!
