PIR is Parrot's native programming language. In this conclusion of his series, chromatic presents a variety of techniques to program and validate subroutines. Learn how to create libraries and test your code in this hands-on primer.
The first article in this series explained Parrot and its philosophy. The second article introduced PIR, Parrot’s native programming language. PIR is a line-oriented assembly language. Even though it has plenty of high level features, everything is either a compiler directive or an opcode.
To recap, a simple program is:
.sub 'greet_and_count' :main
.local string hello
hello = "Hello, world!"
.local int num_chars
num_chars = length hello
print "'"
print hello
print "' has "
print num_chars
print " chars\n"
.end
Each line starting with a period contains a compiler directive (.sub, .local), and every remaining line is an opcode (print, or =, which translates to the set opcode).
Parrot can do much more.
Powerful, Powerful Subroutines
Until now, all of the PIR subroutines shown have used positional arguments. The arguments you pass to a subroutine map directly to the parameter list inside the subroutine. Parrot also supports named arguments, where you can pass the arguments in any order as long as you give them appropriate names. It’s easier to show the example first:
.sub 'show_stats'
.param string name :named( 'name' )
.param int years :named( 'age' )
print name
print " is "
print years
print " years old.\n"
.end
.sub 'main' :main
show_stats( 'age' => 4, 'name' => 'Jacob' )
.end
To mark a parameter as named, add :named() to its declaration. This adverb is parameterized: it takes a single argument itself, the name. At the point of call, use the key/value syntax with the fat arrow (familiar from Perl) to associate names with values. The code above displays:
Jacob is 4 years old.
If you don’t care for the key/value syntax at the point of call, use the :named adverbial modifier on each argument:
.sub 'main'
show_stats( 'name' => 'Jacob', 'age' => 4 )
show_stats( 'Chevelle' :named( 'name' ), 6 :named( 'age' ) )
.end
This is more verbose, but it may be easier if you’re generating PIR code. It works the same way as the pair syntax:
Jacob is 4 years old.
Chevelle is 6 years old.
Names are optional, however. You can still make the call positionally:
.sub 'main'
show_stats( 'name' => 'Jacob', 'age' => 4 )
show_stats( 'Chevelle' :named( 'name' ), 6 :named( 'age' ) )
show_stats( 30, 'Uncle Dude' )
.end
… and the results are as you expect. If you looked closely, you may have noticed that the symbolic register name doesn’t need to match the argument name. That’s a feature (but don’t leave off the parameter to :named(), or you’ll get an error).
If you can name parameters, you ought to be able to mark parameters as optional, as well. Suppose you have either a name or an age, but you also have default values for both. You could write:
.sub 'main'
show_stats_default( 'Jack' :named( 'name' ) )
show_stats_default( 14 :named( 'age' ) )
.end
.sub 'show_stats_default'
.param string name :named( 'name' ) :optional
.param int have_name :opt_flag
.param int age :named( 'age' ) :optional
.param int have_age :opt_flag
if have_name goto check_age
name = 'John Doe'
check_age:
if have_age goto show_stats
age = 10
show_stats:
print name
print " is "
print age
print " years old.\n"
.end
The adverbial modifier for optional parameters is :optional. The presence of this modifier implies that the next parameter will be an integer flag that, when used as a boolean value, will be true if the caller passed in that argument and false otherwise. To distinguish these automatic flag variables from normal parameters, mark them with the :opt_flag adverb.
It’s not a requirement that you pair these flag variables with optional parameters, but it’s almost always the clearest and easiest way to write PIR.
You can mark positional parameters as optional too:
.sub 'main'
.local int result
result = increment_number( 10 )
print "10 incremented is "
print result
print "\n"
result = increment_number( 20, 2 )
print "20 incremented by 2 is "
print result
print "\n"
.end
.sub 'increment_number'
.param int value
.param int step :optional
.param int have_step :opt_flag
if have_step goto increment
step = 1
increment:
value += step
.return( value )
.end
Of course, any optional positional parameters must come at the end of the parameter list; otherwise all calls to such subroutines would have ambiguous argument lists. In practice, optional parameters are most useful with named parameters, except in cases such as increment_number, where there’s a very clear default value that you might wish to override infrequently.
Occasionally, the default behavior of Perl 5 subroutines is useful, especially when dealing with variadic lists (where you may pass zero or more optional arguments). In Perl 5, all parameters are available through the @_ array. It’s common to shift off or assign the first few elements of that array to variables, while leaving the remainder of the array unmodified. You can do this in PIR as well, through the use of slurpy parameters.
A slurpy parameter slurps up all remaining positional parameters into a ResizablePMCArray, boxing all primitives appropriately in PMCs, such that a parameter in an integer register becomes an Integer PMC in a PMC register. (Remember the previous article? You can only store PMCs in aggregate PMCs. Fortunately, Parrot boxes any primitives for you into the corresponding PMCs at the point of insertion.) You only get one :slurpy parameter per subroutine (for the most part), and they look like:
.sub 'main'
.local pmc my_array
my_array = make_array( 'my_array', 4, 5, 2, 3, 1 )
.end
.sub 'make_array'
.param string name
.param pmc array :slurpy
$I0 = array
print "Slurped "
print $I0
print " elements into array '"
print name
print "'\n"
.return( array )
.end
To declare a slurpy parameter, first handle all of the positional arguments you want to bind to names and types. Then declare a PMC parameter with the :slurpy adverb. That’s all. Parrot does the rest for you. Though this example is very simple, it does show an interesting and useful Parrot idiom for converting a literal list into a PMC array.
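Once the arguments are in an array, you can walk it like any other aggregate. Here is a minimal sketch of that; the sum_all subroutine and its local names are hypothetical, invented for illustration. It relies on the same trick as make_array (assigning the array to an integer register yields its element count) and on keyed access to fetch each element as an integer.

```pir
.sub 'main' :main
    .local int total
    total = sum_all( 4, 5, 2, 3, 1 )
    print "Total: "
    print total
    print "\n"
.end

# Hypothetical helper: adds up every positional argument it receives.
.sub 'sum_all'
    .param pmc values :slurpy
    .local int n, i, total, item
    n     = values          # number of slurped elements
    i     = 0
    total = 0
  loop:
    if i >= n goto done
    item = values[i]        # unbox the Integer PMC into an int register
    total += item
    inc i
    goto loop
  done:
    .return( total )
.end
```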
The other side of slurping parameters into an array is flattening an array into individual parameters. The modifier for this occurs on the calling side, not the callee. Suppose you want to print the first three elements of an array but you don’t want to extract them from the array explicitly before calling a function. You might write:
.sub 'print_three'
.param int first
.param int second
.param int third
print first
print ", "
print second
print ", "
print third
print "\n"
.end
.sub 'main'
.local pmc my_array
my_array = make_array( 'my_array', 4, 5, 2, 3, 1 )
print_three( my_array :flat )
.end
Unfortunately, this doesn’t work; Parrot has flattened the array into individual list elements. In this case, there are five, and print_three only takes three arguments. Adding…
.param pmc rest :slurpy
… to the end of the parameter list in print_three solves the problem. It’s not the only option, but it’s a decent option for the list-processing style that many flat and slurpy calls encourage.
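Put together, the repaired version might look like the following sketch. The make_array here is a trimmed copy of the earlier one (without the name parameter or the diagnostic output), which is a simplification made for this example.

```pir
# Trimmed version of the earlier helper: turns its arguments into an array.
.sub 'make_array'
    .param pmc array :slurpy
    .return( array )
.end

.sub 'print_three'
    .param int first
    .param int second
    .param int third
    .param pmc rest :slurpy     # absorbs any elements beyond the third
    print first
    print ", "
    print second
    print ", "
    print third
    print "\n"
.end

.sub 'main' :main
    .local pmc my_array
    my_array = make_array( 4, 5, 2, 3, 1 )
    print_three( my_array :flat )
.end
```

The extra elements end up in rest, where print_three simply ignores them.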
I’ve left the particulars of the interactions between named, optional, and slurpy parameters as an exercise for you. Feel free to experiment.
Multi-Dispatch
Sometimes the presence or absence of arguments or their names isn’t as important as the type of argument passed to a function. Take addition, for example. Adding two integers is obvious: you receive an integer as a result, right? What if there’s an overflow? What if one integer is only eight bits long while the other is sixteen? What happens when you add an integer to a floating point value?
One function doesn’t necessarily meet all of your needs. Sometimes you can perform complex data flow analysis and optimize to a specific “add two integers” function call during compilation. Sometimes you can’t.
You could put complex redispatching logic or if-else blocks in a single function to check types as the program runs, but that can get complex and messy.
Another option is to take advantage of multi-dispatch, if your language supports it. This is where multiple functions share the same name but differ on the types of arguments they support. If it helps, think of how different objects can have methods of the same name. When you call a method on an object, the compiler dispatches the call on the type of the object — it uses the object’s class to decide which method to call. Multi-dispatch can use far more than just the invocant to make that choice.
Consider the silly example of a function named double. When given an integer or number, it multiplies the value by two and returns the result. When given a string, it appends a copy of the string to itself and returns the result.
.sub 'double' :multi( string )
.param string value
value .= value
.return( value )
.end
.sub 'double' :multi( int )
.param int value
value *= 2
.return( value )
.end
.sub 'main'
$S0 = double( 'some string' )
say $S0
$I0 = double( 44 )
say $I0
.end
Mark a multi-dispatch variant with the :multi modifier. This is yet another parameterized modifier. It takes a list of positional types with which to perform the check when dispatching to the appropriate variant. (double could also handle Parrot nums; in this case, you would need to copy and paste the full int variant, changing both occurrences of int to num.)
say() was added to print the value of the given register (calling get_string() on PMCs) with a trailing newline, just to shorten these examples slightly.
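For reference, the num variant described above might look like this sketch. The multiplier is written as 2.0 rather than 2 to keep the arithmetic in floating point, which is an assumption of this example and not something the int variant needs. The main subroutine exists only to exercise it.

```pir
.sub 'double' :multi( num )
    .param num value
    value *= 2.0
    .return( value )
.end

.sub 'main' :main
    $N0 = double( 1.5 )
    say $N0
.end
```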
You can dispatch on PMC types as well: not just the type of pmc but the actual type of the PMC. For example, you could reimplement the typeof operator in pure PIR along the lines of:
.sub 'typeof' :multi( Array )
.return( 'Array' )
.end
.sub 'typeof' :multi( Hash )
.return( 'Hash' )
.end
...
That would be silly, but you could do it.
Multi-dispatch can also handle multiple dispatchable parameters. The PIR-based Test::More testing library uses this technique in the is function, which takes two values and an optional test description and compares the two values for equality:
.sub 'is' :multi( string, string )
.param string left
.param string right
.param string description :optional
...
.end
.sub 'is' :multi( int, int )
.param int left
.param int right
.param string description :optional
...
.end
.sub 'is' :multi( num, num )
.param num left
.param num right
.param string description :optional
...
.end
There’s also a variant with no dispatchable parameters (:multi(), with an empty parameter list); there’s no requirement that all type lists have to have the same length, or take the same number of parameters at all. I chose to write these four variants explicitly because it was most appropriate for the kinds of calls made to this function, not because of any limitation in the multi-dispatch model.
Multi-dispatch is a good and useful tool, but it can get complex with more than a few variants. Sometimes it’s the simplest way to solve a thorny problem. If you find yourself facing a combinatorial explosion, consider rethinking your design.
Namespaces
Multi-dispatch is one way to share the same name among dissimilar functions, but it’s a poor mechanism to use if your function names merely happen to coincide. Use multi-dispatch when you have one function which needs to do different things based on the types of its parameters, not to re-use the same name in different ways.
Instead, use namespaces to group functions into larger semantic units which give context to their names.
To declare a namespace, use the .namespace compiler directive. It works much like the package keyword in Perl 5; all functions declared after that point (and all other global symbols not otherwise qualified) get stored in that namespace. You can have multiple namespace declarations in a file, and there’s no requirement which ties the name of a file to any namespace. They’re independent.
To declare a single namespace of Main, write…
.namespace [ 'Main' ]
… before any subroutine declarations you want to store in the Main namespace. (There’s nothing special about the name Main, nor do you have to use any particular namespace for your own main function. To test this, prepend a namespace directive to any of the earlier examples and change Main to any other string.)
The syntax for identifying the namespace’s name may look odd, but it’s actually the same syntax as accessing an element within an aggregate PMC (such as an array or a hash). This is the syntax for keyed access, and it’s the clearest way to work with namespaces.
Namespaces in Parrot, unlike those in Perl 5, actually nest. That means that the Test namespace can actually contain other namespaces, such as Builder and More, as well as its own subroutines and variables.
To be even more specific, namespaces in Parrot are actually PMCs themselves: instances of the NameSpace PMC. If you’ve read between the lines, you may rightly assume that you can nest namespaces manually by creating them and storing them in parent namespaces, then storing the appropriate symbols in namespaces.
If you’re writing PIR, though, it’s usually easier just to use multi-dimensional keys:
.namespace [ 'Test'; 'More' ]
.sub 'ok'
...
.end
.sub 'is'
...
.end
The entire bracketed construct is a key (represented internally as a Key PMC), with individual levels of the key identified as strings, and separated by semicolons. This snippet tells Parrot to store the ok and is subroutines in the More namespace stored in the Test namespace.
When calling functions within the same namespace, you can use the function name alone. That doesn’t work for calling functions in another namespace. Unlike Perl, there’s no way to call a function by its fully-qualified name. You must look up the symbol from the namespace directly, and then call it. The find_global opcode does this.
.namespace [ 'Other'; 'NS' ]
.sub 'external_sub'
print "From other NS\n"
.end
.namespace [ 'This'; 'NS' ]
.sub 'internal_sub'
print "From my NS\n"
.end
.sub 'main' :main
.local pmc sub_pmc
sub_pmc = find_global 'internal_sub'
sub_pmc()
sub_pmc = find_global [ 'Other'; 'NS' ], 'external_sub'
sub_pmc()
sub_pmc = find_global [ 'This'; 'NS' ], 'internal_sub'
sub_pmc()
.end
There are two forms of find_global. The first form takes a single string as the name of the symbol to find in the current namespace and returns a PMC. This PMC may or may not be a Sub PMC; that is, it may or may not be a callable function. It may be a variable. If there’s anything there, it will always be a PMC, however. (See the op’s documentation for more information about what happens if there’s an error.)
The second form of find_global takes as its first argument a key which indicates the namespace to search for the symbol named as the second argument.
The example code demonstrates fetching Sub PMCs (invokable subroutine objects) relative to the same namespace, explicitly from another namespace, and explicitly from the same namespace.
You can also use the store_global opcode to perform the converse operation. This can be useful to create an alias for a function:
.sub 'make_alias'
.param pmc sub_pmc
.param string alias
.param pmc namespace :optional
.param int have_ns :opt_flag
if have_ns goto store_qualified_alias
store_global alias, sub_pmc
.return()
store_qualified_alias:
store_global namespace, alias, sub_pmc
.return()
.end
This function takes a Sub PMC and a new name for that PMC and stores the PMC under the new name in the current namespace, or in the namespace represented by an optional Key PMC. The only information of note is that the key is the first argument to the opcode in the keyed version.
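A short usage sketch may help. The greet subroutine and the salute alias are hypothetical names chosen for illustration, and this copy of make_alias renames the optional parameter to ns purely to keep the example compact. It uses only the two-argument call, so the alias lands in the current namespace.

```pir
.sub 'greet'
    print "Hello from greet\n"
.end

# Copy of make_alias from above, with the optional parameter renamed to ns.
.sub 'make_alias'
    .param pmc    sub_pmc
    .param string alias
    .param pmc    ns      :optional
    .param int    have_ns :opt_flag

    if have_ns goto store_qualified_alias
    store_global alias, sub_pmc
    .return()

  store_qualified_alias:
    store_global ns, alias, sub_pmc
    .return()
.end

.sub 'main' :main
    .local pmc sub_pmc
    sub_pmc = find_global 'greet'
    make_alias( sub_pmc, 'salute' )

    # Look the alias up again and invoke it; it is the same Sub PMC.
    sub_pmc = find_global 'salute'
    sub_pmc()
.end
```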
You can also perform a rough form of exporting (in the Perl 5 sense) to insert functions from an external namespace into the current namespace. Here’s a modification of main from the find_global example:
.sub 'main' :main
.local pmc sub_pmc
sub_pmc = find_global 'internal_sub'
sub_pmc()
sub_pmc = find_global [ 'Other'; 'NS' ], 'external_sub'
store_global 'external', sub_pmc
store_global [ 'This'; 'NS' ], 'more_external', sub_pmc
external()
more_external()
.end
(The binding of the calls does not happen at compile time.)
That code is okay, but it’s somewhat clunky and tedious. Fortunately, the NameSpace PMC has a method called export_to() which can reduce this work.
To import one of the test functions (ok()) from Test::More, you could write:
.local pmc cur_ns, ext_ns
cur_ns = get_namespace
ext_ns = get_namespace [ 'Test'; 'More' ]
.local pmc export_list
export_list = new 'Array'
export_list = 1
export_list[0] = 'ok'
ext_ns.'export_to'( cur_ns, export_list )
Then you can call ok() as if you’d defined it yourself.
The get_namespace opcode works much like find_global and store_global. With a Key PMC as the first (and only) argument, it returns the NameSpace PMC of that name. With no arguments, it returns the NameSpace PMC representing the current namespace.
Call the export_to() method on the external namespace. Its first argument is the NameSpace to export to; its second is an array naming the subroutines to export.
PIR’s assembly roots really show if you want to import more than one element, but there’s a workaround. Here’s how to import all of the interesting functions from Test::More:
...
.local pmc exports
exports = split ' ', 'ok is diag like skip todo is_deeply isa_ok'
ext_ns.'export_to'(cur_ns, exports)
...
The split opcode takes a delimiter string as its first argument and splits its second argument on that delimiter, returning an Array-like PMC. This is easier than creating your own Array and manipulating it manually, and much shorter once you know the idiom.
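To see what split hands back, you can walk the result like any other array. This is a minimal sketch; the local names are hypothetical, and the list of words is shortened from the one above.

```pir
.sub 'main' :main
    .local pmc names
    names = split ' ', 'ok is diag'

    .local int n, i
    n = names               # element count of the Array-like PMC
    i = 0
  loop:
    if i >= n goto done
    $S0 = names[i]          # fetch each element as a string
    say $S0
    inc i
    goto loop
  done:
    .return()
.end
```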
Managing Code
So far these examples have all been short and contained in one file. Your programs aren’t likely to be that simple, especially if you write a compiler with Parrot. As well, there are several libraries included with Parrot (and plenty more underway), so you have plenty of opportunities to reuse code and write reusable code.
This is a PIR tutorial, so it’s concentrated on showing you how to work with PIR files. Parrot can also handle PASM (bare-bones Parrot assembly code) and PBC (Parrot bytecode) files, with no additional work. From within Parrot, you can also load and run code written in any language for which Parrot has a registered compiler, but that’s a discussion for another time.
The important opcode to know is load_bytecode. It takes one argument: the path to a file containing PIR, PASM, or PBC. (Filetype detection currently relies on the suffix of the file, so name your PIR files something.pir.)
The current set of reusable Parrot libraries is in runtime/parrot/ within the Parrot source tree. Thus to load the Test::More library, write:
load_bytecode 'library/Test/More.pir'
It’s okay to use Unix-style paths; Parrot will translate them to whatever’s most appropriate for your particular filesystem.
This opcode loads and compiles the named PIR file as appropriate. You don’t have to load a file out of runtime/parrot/; you can just as easily specify an absolute path to your PIR file. As well, you don’t have to load something written specifically as a library. You can load a normal program. Parrot will not run any :main-marked subroutine in the file, however. If the loaded program performs some initialization in that routine, you may have trouble.
If you do have initialization to perform in a library, run it from a subroutine marked with :load. This adverb tells Parrot to run the marked subroutine only when loading this code, whether directly from the command line or when loaded from another file.
Another adverbial marker, :init, tells Parrot to run the named function only when loading this file directly, not when loading it from another file.
There are a few important points about these modifiers. First, their design has a couple of outstanding questions related to their relationship to bytecode, so they may have subtle changes in the future. However, something will provide this behavior; it’s necessary for various types of initialization. Second, you can call any other function defined in the file from an :init or :load function. Unlike BEGIN blocks in Perl 5, Parrot does not run these functions immediately after compiling them. Parrot compiles the entire file first.
Also unlike BEGIN blocks and Perl 5’s use statement, load_bytecode is an opcode, not a compiler directive, so its effect takes place at runtime, as Parrot encounters the opcode.
Finally, :init subroutines get called before :load subroutines.
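As a sketch of how a library might use :load, consider the following file. Everything named here is hypothetical: the My::Library namespace, the setup and helper subroutines, and the idea of saving the file as my_library.pir.

```pir
# Hypothetical library file, saved as (say) my_library.pir
.namespace [ 'My'; 'Library' ]

# Per the description above, Parrot runs this when the file is loaded.
.sub 'setup' :load
    print "My::Library initialized\n"
    helper()                # other subs in the file are already compiled
.end

.sub 'helper'
    print "helper called\n"
.end
```

A program that executes load_bytecode with the path to this file should see the initialization message at the point where Parrot reaches that opcode, not at compile time.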
If these limitations aren’t appropriate for you, IMCC (the PIR compiler) provides another option in the form of the .include directive. When the compiler sees this directive, it loads the given file and inserts its contents directly into the file, then compiles the new code as if it had already been there.
For obvious reasons, the .included code must be straight PIR or PASM code; PBC will not work.
Why would you use .include over load_bytecode? It’s common when you want to create a single PBC file. For example, the Perl 6 implementation in Parrot uses separate source files to organize code into maintainable chunks. The actual perl6.pir file then includes all of these PIR files, and the Perl 6 Makefile compiles that file into perl6.pbc. This single file contains the entire Perl 6 compiler.
For libraries, this is less useful, especially where they may change separately from the files that use them.
When would you use PBC over PIR? Right now, there are often no strong advantages. PBC files bypass the PIR compilation and optimization step, so they offer a very slight speed improvement, but they are more difficult to modify as they require recompilation if you change their source code.
Practical PIR with Test::More
What’s all of this good for? By now you know enough to write your PIR code in the test-driven style… almost. Now you need to know how to use the Test::More library from PIR.
If you’ve ever used Test::More in Perl 5, you’re almost completely familiar with the interface (especially given the heritage of the PIR version). It’s time to draw everything in this tutorial together.
Start by loading the Test::More library and importing its testing functions into your namespace:
.sub 'main' :main
load_bytecode 'library/Test/More.pir'
.local pmc cur_ns, ext_ns
cur_ns = get_namespace
ext_ns = get_namespace [ 'Test'; 'More' ]
.local pmc exports
exports = split ' ', 'plan ok is like'
ext_ns.'export_to'( cur_ns, exports )
...
With that accomplished, call plan() to tell the library how many tests you plan to run:
plan( 9 )
From there, you can group separate types of tests into their own functions:
test_ok()
test_is()
test_like()
.end
These test functions can use ok(), is(), and like() as much as they like. ok() has one required argument, a boolean value (passed as an integer), and one optional argument, a string describing the test:
.sub 'test_ok'
ok( 1 )
$I0 = 1
ok( $I0, 'int register set to one should be true' )
ok( 0, '... but this test will fail because zero is false' )
.end
Similarly, is() has two required arguments. It compares them, passing if they’re equal. The third argument is optional, and it’s a description again.
.sub 'test_is'
$I0 = 2 + 2
is( $I0, 4 )
$S0 = 'hello'
$S0 .= ' '
$S0 .= 'world'
is( $S0, 'hello world', 'the .= operator performs string concatenation' )
$N0 = 1.4
$N0 *= 3.0
is( $N0, 4.2, '... watch out for floating point comparisons though' )
.end
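If exact floating point equality proves too strict, one option is to compare within a tolerance and report the outcome through ok(). The following is only a sketch of that idea: the tolerance of 0.000001 and the local names are assumptions made for this example, and this is not a feature of the Test::More library itself.

```pir
.sub 'main' :main
    load_bytecode 'library/Test/More.pir'

    .local pmc cur_ns, ext_ns, exports
    cur_ns  = get_namespace
    ext_ns  = get_namespace [ 'Test'; 'More' ]
    exports = split ' ', 'plan ok'
    ext_ns.'export_to'( cur_ns, exports )

    plan( 1 )

    .local num product, difference
    product  = 1.4
    product *= 3.0

    # Absolute distance between the computed and the expected value.
    difference = product - 4.2
    difference = abs difference

    .local int close_enough
    close_enough = 0
    if difference > 0.000001 goto report
    close_enough = 1
  report:
    ok( close_enough, '1.4 * 3.0 is within 0.000001 of 4.2' )
.end
```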
like() is a bit more difficult to describe without going into detail about regular expressions and patterns and grammars; the gory details are in Perl 6’s Synopsis 5 (http://dev.perl.org/perl6/doc/design/syn/S05.html). Still, basic pattern matching is simple. The required second argument is a string containing the pattern to match:
.sub 'test_like'
like( 'foo', 'bar', '... here is a failure' )
like( 'TEAM', '<[I]>', '... fails; no I in TEAM' )
like( 'hello, detroit', 'oi', 'a proper British exclamation' )
.end
By now you should have the hang of the library. With that, you’re probably itching to run the tests and get your results. Either pass the name of the file to parrot…
$ parrot my_test_file.pir
… or add a hash-bang line to the top of the program with the path to parrot (#!/path/to/parrot) and run it with the prove utility found in Perl 5’s Test::Harness module:
$ prove my_test_file.pir
The benefit of the prove approach is that, if everything passes, you get only a concise summary of the output:
t/library/my_test_file....ok
All tests successful.
Files=1, Tests=9, 0 wallclock secs ( 0.02 cusr + 0.00 csys = 0.02 CPU)
There’s plenty more to learn; Parrot has a lot of remaining features. However, at this point you know enough to start writing your own tests and libraries in PIR and to contribute to Parrot development as a PIR programmer.
As was suggested in the previous tutorial, browsing the docs/ops/ and docs/pdds/ directories will help you absorb new concepts. Now you should know enough to read the test files in t/op/ as well.
As usual, we’d love to see you in #parrot on irc.perl.org. Happy hacking!