ASPN ActiveState Programmer Network
  ActiveState
/ Home / Perl / PHP / Python / Tcl / XSLT /
/ Safari / My ASPN /
Cookbooks | Documentation | Mailing Lists | Modules | News Feeds | Products | User Groups | Web Services
SEARCH
advanced | search help

Reference
ActivePerl 5.8
Core Documentation
perl
perlintro
perltoc
perlreftut
perldsc
perllol
perlrequick
perlretut
perlboot
perltoot
perltooc
perlbot
perlstyle
perlcheat
perltrap
perldebtut
perlfaq1
perlfaq2
perlfaq3
perlfaq4
perlfaq5
perlfaq6
perlfaq7
perlfaq8
perlfaq9
perlsyn
perldata
perlop
perlsub
perlfunc
perlopentut
perlpacktut
perlpod
perlpodspec
perlrun
perldiag
perllexwarn
perldebug
perlvar
perlre
perlreref
perlref
perlform
perlobj
perltie
perldbmfilter
perlipc
perlfork
perlnumber
perlthrtut
perlothrtut
perlport
perllocale
perluniintro
perlunicode
perlebcdic
perlsec
perlmod
perlmodlib
perlmodstyle
perlmodinstall
perlnewmod
perlutil
perlcompile
perlfilter
perlembed
perldebguts
perlxstut
perlxs
perlclib
perlguts
perlcall
perlapi
perlintern
perliol
perlapio
perlhack
perlbook
perltodo
perlhist
perl588delta
perl587delta
perl586delta
perl585delta
perl584delta
perl583delta
perl582delta
perl581delta
perl58delta
perl573delta
perl572delta
perl571delta
perl570delta
perl561delta
perl56delta
perl5005delta
perl5004delta
perlcn
perljp
perlko
perltw
perlaix
perlamiga
perlapollo
perlbeos
perlbs2000
perlce
perlcygwin
perldgux
perldos
perlepoc
perlfreebsd
perlhpux
perlhurd
perlirix
perlmachten
perlmacos
perlmacosx
perlmint
perlmpeix
perlnetware
perlopenbsd
perlos2
perlos390
perlos400
perlplan9
perlqnx
perlsolaris
perltru64
perluts
perlvmesa
perlvms
perlvos
perlwin32

MyASPN >> Reference >> ActivePerl 5.8 >> Core Documentation
ActivePerl 5.8 documentation

perlthrtut - Tutorial on threads in Perl


NAME

perlthrtut - Tutorial on threads in Perl


DESCRIPTION

This tutorial describes the use of Perl interpreter threads (sometimes referred to as ithreads) that was first introduced in Perl 5.6.0. In this model, each thread runs in its own Perl interpreter, and any data sharing between threads must be explicit. The user-level interface for ithreads uses the the threads manpage class.

NOTE: There is another older Perl threading flavor called the 5.005 model that used the Threads class. This old model is known to have problems, is deprecated, and support for it will be removed in release 5.10. You are strongly encouraged to migrate any existing 5.005 threads code to the new model as soon as possible.

You can see which (or neither) threading flavour you have by running perl -V and looking at the Platform section. If you have useithreads=define you have ithreads, if you have use5005threads=define you have 5.005 threads. If you have neither, you don't have any thread support built in. If you have both, you are in trouble.

The the threads manpage and the threads::shared manpage modules are included in the core Perl distribution. Additionally, they are maintained as a separate modules on CPAN, so you can check there for any updates.


What Is A Thread Anyway?

A thread is a flow of control through a program with a single execution point.

Sounds an awful lot like a process, doesn't it? Well, it should. Threads are one of the pieces of a process. Every process has at least one thread and, up until now, every process running Perl had only one thread. With 5.8, though, you can create extra threads. We're going to show you how, when, and why.


Threaded Program Models

There are three basic ways that you can structure a threaded program. Which model you choose depends on what you need your program to do. For many non-trivial threaded programs, you'll need to choose different models for different pieces of your program.

Boss/Worker

The boss/worker model usually has one boss thread and one or more worker threads. The boss thread gathers or generates tasks that need to be done, then parcels those tasks out to the appropriate worker thread.

This model is common in GUI and server programs, where a main thread waits for some event and then passes that event to the appropriate worker threads for processing. Once the event has been passed on, the boss thread goes back to waiting for another event.

The boss thread does relatively little work. While tasks aren't necessarily performed faster than with any other method, it tends to have the best user-response times.

Work Crew

In the work crew model, several threads are created that do essentially the same thing to different pieces of data. It closely mirrors classical parallel processing and vector processors, where a large array of processors do the exact same thing to many pieces of data.

This model is particularly useful if the system running the program will distribute multiple threads across different processors. It can also be useful in ray tracing or rendering engines, where the individual threads can pass on interim results to give the user visual feedback.

Pipeline

The pipeline model divides up a task into a series of steps, and passes the results of one step on to the thread processing the next. Each thread does one thing to each piece of data and passes the results to the next thread in line.

This model makes the most sense if you have multiple processors so two or more threads will be executing in parallel, though it can often make sense in other contexts as well. It tends to keep the individual tasks small and simple, as well as allowing some parts of the pipeline to block (on I/O or system calls, for example) while other parts keep going. If you're running different parts of the pipeline on different processors you may also take advantage of the caches on each processor.

This model is also handy for a form of recursive programming where, rather than having a subroutine call itself, it instead creates another thread. Prime and Fibonacci generators both map well to this form of the pipeline model. (A version of a prime number generator is presented later on.)


What kind of threads are Perl threads?

If you have experience with other thread implementations, you might find that things aren't quite what you expect. It's very important to remember when dealing with Perl threads that Perl Threads Are Not X Threads for all values of X. They aren't POSIX threads, or DecThreads, or Java's Green threads, or Win32 threads. There are similarities, and the broad concepts are the same, but if you start looking for implementation details you're going to be either disappointed or confused. Possibly both.

This is not to say that Perl threads are completely different from everything that's ever come before -- they're not. Perl's threading model owes a lot to other thread models, especially POSIX. Just as Perl is not C, though, Perl threads are not POSIX threads. So if you find yourself looking for mutexes, or thread priorities, it's time to step back a bit and think about what you want to do and how Perl can do it.

However, it is important to remember that Perl threads cannot magically do things unless your operating system's threads allow it. So if your system blocks the entire process on sleep(), Perl usually will, as well.

Perl Threads Are Different.


Thread-Safe Modules

The addition of threads has changed Perl's internals substantially. There are implications for people who write modules with XS code or external libraries. However, since Perl data is not shared among threads by default, Perl modules stand a high chance of being thread-safe or can be made thread-safe easily. Modules that are not tagged as thread-safe should be tested or code reviewed before being used in production code.

Not all modules that you might use are thread-safe, and you should always assume a module is unsafe unless the documentation says otherwise. This includes modules that are distributed as part of the core. Threads are a relatively new feature, and even some of the standard modules aren't thread-safe.

Even if a module is thread-safe, it doesn't mean that the module is optimized to work well with threads. A module could possibly be rewritten to utilize the new features in threaded Perl to increase performance in a threaded environment.

If you're using a module that's not thread-safe for some reason, you can protect yourself by using it from one, and only one thread at all. If you need multiple threads to access such a module, you can use semaphores and lots of programming discipline to control access to it. Semaphores are covered in Basic semaphores.

See also Thread-Safety of System Libraries.


Thread Basics

The the threads manpage module provides the basic functions you need to write threaded programs. In the following sections, we'll cover the basics, showing you what you need to do to create a threaded program. After that, we'll go over some of the features of the the threads manpage module that make threaded programming easier.

Basic Thread Support

Thread support is a Perl compile-time option -- it's something that's turned on or off when Perl is built at your site, rather than when your programs are compiled. If your Perl wasn't compiled with thread support enabled, then any attempt to use threads will fail.

Your programs can use the Config module to check whether threads are enabled. If your program can't run without them, you can say something like:

    use Config;
    $Config{useithreads} or die('Recompile Perl with threads to run this program.');

A possibly-threaded program using a possibly-threaded module might have code like this:

    use Config;
    use MyMod;
    BEGIN {
        if ($Config{useithreads}) {
            # We have threads
            require MyMod_threaded;
            import MyMod_threaded;
        } else {
            require MyMod_unthreaded;
            import MyMod_unthreaded;
        }
    }

Since code that runs both with and without threads is usually pretty messy, it's best to isolate the thread-specific code in its own module. In our example above, that's what MyMod_threaded is, and it's only imported if we're running on a threaded Perl.

A Note about the Examples

In a real situation, care should be taken that all threads are finished executing before the program exits. That care has not been taken in these examples in the interest of simplicity. Running these examples as is will produce error messages, usually caused by the fact that there are still threads running when the program exits. You should not be alarmed by this.

Creating Threads

The the threads manpage module provides the tools you need to create new threads. Like any other module, you need to tell Perl that you want to use it; use threads; imports all the pieces you need to create basic threads.

The simplest, most straightforward way to create a thread is with create():

    use threads;
    my $thr = threads->create(\&sub1);
    sub sub1 {
        print("In the thread\n");
    }

The create() method takes a reference to a subroutine and creates a new thread that starts executing in the referenced subroutine. Control then passes both to the subroutine and the caller.

If you need to, your program can pass parameters to the subroutine as part of the thread startup. Just include the list of parameters as part of the threads->create() call, like this:

    use threads;
    my $Param3 = 'foo';
    my $thr1 = threads->create(\&sub1, 'Param 1', 'Param 2', $Param3);
    my @ParamList = (42, 'Hello', 3.14);
    my $thr2 = threads->create(\&sub1, @ParamList);
    my $thr3 = threads->create(\&sub1, qw(Param1 Param2 Param3));
    sub sub1 {
        my @InboundParameters = @_;
        print("In the thread\n");
        print('Got parameters >', join('<>', @InboundParameters), "<\n");
    }

The last example illustrates another feature of threads. You can spawn off several threads using the same subroutine. Each thread executes the same subroutine, but in a separate thread with a separate environment and potentially separate arguments.

new() is a synonym for create().

Waiting For A Thread To Exit

Since threads are also subroutines, they can return values. To wait for a thread to exit and extract any values it might return, you can use the join() method:

    use threads;
    my ($thr) = threads->create(\&sub1);
    my @ReturnData = $thr->join();
    print('Thread returned ', join(', ', @ReturnData), "\n");
    sub sub1 { return ('Fifty-six', 'foo', 2); }

In the example above, the join() method returns as soon as the thread ends. In addition to waiting for a thread to finish and gathering up any values that the thread might have returned, join() also performs any OS cleanup necessary for the thread. That cleanup might be important, especially for long-running programs that spawn lots of threads. If you don't want the return values and don't want to wait for the thread to finish, you should call the detach() method instead, as described next.

NOTE: In the example above, the thread returns a list, thus necessitating that the thread creation call be made in list context (i.e., my ($thr)). See $thr- in the threads manpagejoin()"> and THREAD CONTEXT in the threads manpage for more details on thread context and return values.

Ignoring A Thread

join() does three things: it waits for a thread to exit, cleans up after it, and returns any data the thread may have produced. But what if you're not interested in the thread's return values, and you don't really care when the thread finishes? All you want is for the thread to get cleaned up after when it's done.

In this case, you use the detach() method. Once a thread is detached, it'll run until it's finished; then Perl will clean up after it automatically.

    use threads;
    my $thr = threads->create(\&sub1);   # Spawn the thread
    $thr->detach();   # Now we officially don't care any more
    sleep(15);        # Let thread run for awhile
    sub sub1 {
        $a = 0;
        while (1) {
            $a++;
            print("\$a is $a\n");
            sleep(1);
        }
    }

Once a thread is detached, it may not be joined, and any return data that it might have produced (if it was done and waiting for a join) is lost.

detach() can also be called as a class method to allow a thread to detach itself:

    use threads;
    my $thr = threads->create(\&sub1);
    sub sub1 {
        threads->detach();
        # Do more work
    }


Threads And Data

Now that we've covered the basics of threads, it's time for our next topic: Data. Threading introduces a couple of complications to data access that non-threaded programs never need to worry about.

Shared And Unshared Data

The biggest difference between Perl ithreads and the old 5.005 style threading, or for that matter, to most other threading systems out there, is that by default, no data is shared. When a new Perl thread is created, all the data associated with the current thread is copied to the new thread, and is subsequently private to that new thread! This is similar in feel to what happens when a UNIX process forks, except that in this case, the data is just copied to a different part of memory within the same process rather than a real fork taking place.

To make use of threading, however, one usually wants the threads to share at least some data between themselves. This is done with the the threads::shared manpage module and the :shared attribute:

    use threads;
    use threads::shared;
    my $foo :shared = 1;
    my $bar = 1;
    threads->create(sub { $foo++; $bar++; })->join();
    print("$foo\n");  # Prints 2 since $foo is shared
    print("$bar\n");  # Prints 1 since $bar is not shared

In the case of a shared array, all the array's elements are shared, and for a shared hash, all the keys and values are shared. This places restrictions on what may be assigned to shared array and hash elements: only simple values or references to shared variables are allowed - this is so that a private variable can't accidentally become shared. A bad assignment will cause the thread to die. For example:

    use threads;
    use threads::shared;
    my $var          = 1;
    my $svar :shared = 2;
    my %hash :shared;
    ... create some threads ...
    $hash{a} = 1;       # All threads see exists($hash{a}) and $hash{a} == 1
    $hash{a} = $var;    # okay - copy-by-value: same effect as previous
    $hash{a} = $svar;   # okay - copy-by-value: same effect as previous
    $hash{a} = \$svar;  # okay - a reference to a shared variable
    $hash{a} = \$var;   # This will die
    delete($hash{a});   # okay - all threads will see !exists($hash{a})

Note that a shared variable guarantees that if two or more threads try to modify it at the same time, the internal state of the variable will not become corrupted. However, there are no guarantees beyond this, as explained in the next section.

Thread Pitfalls: Races

While threads bring a new set of useful tools, they also bring a number of pitfalls. One pitfall is the race condition:

    use threads;
    use threads::shared;
    my $a :shared = 1;
    my $thr1 = threads->create(\&sub1);
    my $thr2 = threads->create(\&sub2);
    $thr1->join;
    $thr2->join;
    print("$a\n");
    sub sub1 { my $foo = $a; $a = $foo + 1; }
    sub sub2 { my $bar = $a; $a = $bar + 1; }

What do you think $a will be? The answer, unfortunately, is it depends. Both sub1() and sub2() access the global variable $a, once to read and once to write. Depending on factors ranging from your thread implementation's scheduling algorithm to the phase of the moon, $a can be 2 or 3.

Race conditions are caused by unsynchronized access to shared data. Without explicit synchronization, there's no way to be sure that nothing has happened to the shared data between the time you access it and the time you update it. Even this simple code fragment has the possibility of error:

    use threads;
    my $a :shared = 2;
    my $b :shared;
    my $c :shared;
    my $thr1 = threads->create(sub { $b = $a; $a = $b + 1; });
    my $thr2 = threads->create(sub { $c = $a; $a = $c + 1; });
    $thr1->join;
    $thr2->join;

Two threads both access $a. Each thread can potentially be interrupted at any point, or be executed in any order. At the end, $a could be 3 or 4, and both $b and $c could be 2 or 3.

Even $a += 5 or $a++ are not guaranteed to be atomic.

Whenever your program accesses data or resources that can be accessed by other threads, you must take steps to coordinate access or risk data inconsistency and race conditions. Note that Perl will protect its internals from your race conditions, but it won't protect you from you.


Synchronization and control

Perl provides a number of mechanisms to coordinate the interactions between themselves and their data, to avoid race conditions and the like. Some of these are designed to resemble the common techniques used in thread libraries such as pthreads; others are Perl-specific. Often, the standard techniques are clumsy and difficult to get right (such as condition waits). Where possible, it is usually easier to use Perlish techniques such as queues, which remove some of the hard work involved.

Controlling access: lock()

The lock() function takes a shared variable and puts a lock on it. No other thread may lock the variable until the variable is unlocked by the thread holding the lock. Unlocking happens automatically when the locking thread exits the block that contains the call to the lock() function. Using lock() is straightforward: This example has several threads doing some calculations in parallel, and occasionally updating a running total:

    use threads;
    use threads::shared;
    my $total :shared = 0;
    sub calc {
        while (1) {
            my $result;
            # (... do some calculations and set $result ...)
            {
                lock($total);  # Block until we obtain the lock
                $total += $result;
            } # Lock implicitly released at end of scope
            last if $result == 0;
        }
    }
    my $thr1 = threads->create(\&calc);
    my $thr2 = threads->create(\&