The Importance of Perl

by Tim O'Reilly, O'Reilly & Associates, Inc. and Ben Smith, Ronin House

Despite all the press attention to Java and ActiveX, the real job of "activating the Internet" belongs to Perl, a language that is all but invisible to the world of professional technology analysts but looms large in the mind of anyone -- webmaster, system administrator or programmer -- whose daily work involves building custom web applications or gluing together programs for purposes their designers had not quite foreseen. As Hassan Schroeder, Sun's first webmaster, remarked: "Perl is the duct tape of the Internet."

Perl was originally developed by Larry Wall as a scripting language for UNIX, aiming to blend the ease of use of the UNIX shell with the power and flexibility of a system programming language like C. Perl quickly became the language of choice for UNIX system administrators.

With the advent of the World Wide Web, Perl usage exploded. The Common Gateway Interface (CGI) provided a simple mechanism for passing data from a web server to another program, and returning the result of that program interaction as a web page. Perl quickly became the dominant language for CGI programming.
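
A CGI program in Perl can be remarkably small. The following sketch, which assumes the standard CGI module and an invented form field, reads a value submitted by a browser and returns an HTML page:

    #!/usr/bin/perl
    # A minimal CGI script: read one form field, return an HTML page.
    use strict;
    use CGI;

    my $q    = CGI->new;                    # parses GET or POST form data
    my $name = $q->param('name') || 'world';

    print $q->header('text/html');          # emits "Content-type: text/html" and a blank line
    print "<html><head><title>Hello</title></head>\n";
    print "<body><h1>Hello, ", $q->escapeHTML($name), "!</h1></body></html>\n";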

With the development of a powerful Win32 port, Perl has also made significant inroads as a scripting language for NT, especially in the areas of system administration and web site management and programming.

For a while, the prevailing wisdom among analysts was that CGI programs--and Perl along with them--would soon be replaced by Java, ActiveX and other new technologies designed specifically for the Internet. Surprisingly, though, Perl has continued to gain ground, with frameworks such as Microsoft's Active Server Pages (ASP) and the Apache web server's mod_perl allowing Perl programs to be run directly from the server, and interfaces such as DBI, the Perl DataBase Interface, providing a stable API for integration of back-end databases.

This paper explores some of the reasons why Perl will become increasingly important, not just for the web but as a general purpose computer language. These reasons include:

  • fundamental differences in the tasks best performed by scripting languages like Perl versus traditional system programming languages like Java, C++ or C.
  • Perl's ability to "glue together" other programs, or transform the output of one program so it can be used as input to another.
  • Perl's unparalleled ability to process text, using powerful features like regular expressions. This is especially important because of the re-emergence via the web of text files (HTML) as a lingua-franca across all applications and systems.
  • The ability of a distributed development community to keep up with rapidly changing demands, in an organic, evolutionary manner.

A good scripting language is a high-level software development language that allows quick and easy development of trivial tools, while providing the process flow and data organization necessary to develop complex applications as well. It must execute quickly, and it must be efficient in its use of system services such as file operations, interprocess communication, and process control. A great scripting language runs on every popular operating system, is tuned for information processing (free-form text), and yet is excellent at data processing (numbers and raw binary data). It is embeddable and extensible. Perl fits all of these criteria.

When and Why a Scripting Language?

As John Ousterhout has elegantly argued in his paper, Scripting: Higher Level Programming for the 21st Century, "Scripting languages such as Perl and Tcl represent a very different style of programming than system programming languages such as C or Java. Scripting languages are designed for 'gluing' applications; they use typeless approaches to achieve a higher level of programming and more rapid application development than system programming languages. Increases in computer speed and changes in the application mix are making scripting languages more and more important for applications of the future."

Ousterhout goes on:

As we near the end of the 20th century a fundamental change is occurring in the way people write computer programs. The change is a transition from system programming languages such as C or C++ to scripting languages such as Perl or Tcl. Although many people are participating in the change, few people realize that it is occurring and even fewer people know why it is happening....

Scripting languages are designed for different tasks than system programming languages, and this leads to fundamental differences in the languages. System programming languages were designed for building data structures and algorithms from scratch, starting from the most primitive computer elements such as words of memory. In contrast, scripting languages are designed for gluing: they assume the existence of a set of powerful components and are intended primarily for connecting components together. System programming languages are strongly typed to help manage complexity, while scripting languages are typeless to simplify connections between components and provide rapid application development.

Scripting languages and system programming languages are complementary, and most major computing platforms since the 1960's have provided both kinds of languages. However, several recent trends, such as faster machines, better scripting languages, the increasing importance of graphical user interfaces and component architectures, and the growth of the Internet, have greatly increased the applicability of scripting languages. These trends will continue over the next decade, with scripting languages used for more and more applications and system programming languages used primarily for creating components.

System administrators were among the first to capitalize on the power of scripting languages. The problems are everywhere, on every operating system. They usually appear as the requirement to automate repetitive tasks. Even Macintosh operating systems need some user definable automation. It might be as simple as an automated backup and recovery system, or as complex as a periodic inventory of all the files on a disk, or all the system configuration changes in the last 24 hours. Many times, there are existing utilities that do part of the work, but automation requires a more general framework for running programs, capturing or transforming their output, and coordinating the work of multiple applications.

Most systems have included some form of scripting language. VMS's DCL, MS-DOS's .BAT files, UNIX's shell scripts, IBM's Rexx, Windows' Visual Basic and Visual Basic for Applications, and AppleScript are good examples of scripting languages that are specific to a single operating system. Perl is unusual in that it has broken the tight association with a single operating system and become widely used as a scripting language on multiple platforms.

Some scripting languages, most notably Perl and Visual Basic, and to a lesser extent Tcl and Python, have gained wide use as general purpose programming languages. Successful scripting languages distinguish themselves by the ease with which they call and execute operating system utilities and services. To reach the next level, and function as general purpose languages, they must be robust enough that you can build entire complex application programs. The scripting language is used to prototype, model, and test. If the scripting language is robust and fast enough, the prototype evolves directly into the application.

So why not use a general purpose programming language like C, C++ or Java instead of a scripting language? The answer is simple: Cost. Development time is more expensive than fast hardware and memory. Scripting languages are easy to learn, and simple to use.

As Ousterhout points out, scripting languages typically lack data types. They don't distinguish between integer and floating point numbers. Variables are typeless. This is one of the ways that scripting languages speed up development. The concept is to "leave the details for later." Since scripting languages are generally good at calling system utilities to do the dirty work, for instance, copying files and building directories or file folders, the details can be handled by some small utility that, if it doesn't exist and is necessary, will be easy to write in a compiled language.
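
A sketch of what typelessness looks like in practice: the same Perl scalar serves as a string or a number, depending on how it is used.

    my $value = "42";            # arrives as text, from a file or a form
    print $value + 8, "\n";      # numeric context: prints 50
    print $value . "nd\n";       # string context: prints "42nd"

    my $price = "19.95";         # no int/float declaration anywhere
    printf "total: %.2f\n", $price * 3;    # prints "total: 59.85"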

What do those data types do for compiled languages? They make memory management easier for the system, but harder for the programmer. Think about this: How much did a programmer make an hour when FORTRAN was on the ascendant? How much did memory cost then? How about now? Times have changed. Memory is cheap; programmers are expensive!

System languages need to have everything spelled out. This makes compilation of complex data structures easier, but programming harder. Scripting languages make as many assumptions as they can. As little as possible needs to be spelled out. This makes the scripting language easier to learn and faster to write in. The price to be paid is difficulty in developing complex data structures and algorithms. Perl, however, is good at both complex data structures and algorithms, without sacrificing ease of use for simple applications.
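
For example, Perl 5 references make a nested structure such as a hash of arrays a few lines to declare and traverse (the host and service names below are invented for illustration):

    use strict;

    # A hash of arrays: each host maps to the list of services it runs.
    my %services = (
        'www.example.com'  => [ 'http', 'https' ],
        'mail.example.com' => [ 'smtp', 'pop3', 'imap' ],
    );

    push @{ $services{'www.example.com'} }, 'ftp';   # grow one list

    for my $host (sort keys %services) {
        print "$host: @{ $services{$host} }\n";
    }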

Interpreted vs. Compiled Languages

Most scripting languages are interpreted languages, which contributes to the perception that they may be inappropriate for large scale programming projects. This perception needs to be addressed.

With the exception of language-specific hardware, it is true that interpreted programs run more slowly than compiled programs. The advantage of an interpreted language is that programs written in it are portable to any system the interpreter runs on. The system-specific details are handled by the interpreter, not by the application program. (There are always exceptions to this rule. For example, the application program may explicitly use a non-portable system resource.)

Operating system command interpreters such as MS-DOS's command.com and early versions of the UNIX C shell are good examples of how interpreters work: each command line is fed to the interpreter as it occurs in the script. The worst blow to efficiency is in any looping; each line in the loop is reinterpreted every time it is run. Some people think that all scripting languages work like this... slowly, inefficiently, a line at a time. This is not true.

However, there are middle languages, languages that are compiled to some intermediate code which is loaded and run by an interpreter at run time. Java is an example of this model; this is what will make Java a valuable cross-platform application language. All the Java interpreters on different hardware will be able to communicate and share data and process resources. This is perfect for embedded systems, where each device is actually a different kind of special-purpose hardware. Java is not a scripting language, however. It requires data declarations. It is compiled ahead of time (unless you count Just-In-Time compilation -- really just code generation -- as part of the process).

Perl is also a middle language. Blocks of Perl code are compiled as needed, but the executable image is held in memory instead of written to a file. The compilation happens only once for any block of the Perl script. The advantages of Perl's design make all this optimization work worthwhile. Perl maintains the portability of an interpreted language while achieving nearly the speed of a compiled language. Perl, nearly a decade old, with hundreds of thousands of developers, and now in its fifth incarnation, runs lean and fast. There is some startup latency as the script is initially compiled, but this is typically small relative to the overall running time of the script. In addition, techniques such as "fast CGI", which keep the image of a frequently accessed CGI script in memory for repeated re-execution, avoid this startup latency except on the very first execution of a script.

In any event, Perl 5.005 will include a compiler, created by Malcolm Beattie of Oxford University. The compiler eliminates the startup latency of in-process compilation, and adds some other small speed-ups as well. It also addresses the psychological barrier programmers of commercial applications sometimes experience with respect to interpreted languages. (With a compiled language, the source code is no longer available for inspection by outside parties.)

Information Processing versus Data Processing

The World Wide Web is only one instance of a fundamental change in how we interact with computers. This change is visible in the very name we now give the industry. It used to be called "Data Processing," as in "I'll have to submit my job to the data processing center at 4 AM so that I can pick up my output before noon." Now we call it "Information Services," as in "the Director of Information Services is working with our planning committee." The emphasis has shifted from "data" to "information," which typically includes a mix of text and numeric data rather than numbers alone. Perl excels at handling information.

An important part of Perl's information-handling power comes from a special syntax called regular expressions. Regular expressions give Perl enormous power to perform actions based on patterns that it recognizes in a body of free form text. Other languages support regular expressions as well (there is even a freeware regular expression library for Java), but no other language integrates them as well as Perl.
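
As a brief sketch of that integration, pattern matching and substitution are built into the language itself rather than bolted on through a library call:

    # Pull the pieces out of a date embedded in free-form text.
    my $text = "The first Perl Conference ran from August 19, 1997.";

    if ($text =~ /\b(\w+)\s+(\d{1,2}),\s+(\d{4})\b/) {
        my ($month, $day, $year) = ($1, $2, $3);
        print "month=$month day=$day year=$year\n";
    }

    # A substitution transforms the whole string in one statement.
    $text =~ s{\bPerl\b}{<b>Perl</b>}g;    # mark up every mention of Perl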

For many years, the trend was to embed text in specialized application file formats. Except for UNIX, which explicitly specified ASCII text as a universal file format for exchange between cooperating programs, most systems allowed incompatible formats to proliferate. This trend was reversed sharply by the World Wide Web, whose HTML data format consists of ASCII text with embedded markup tags. Because of the importance of the web, HTML -- and ASCII text with it -- is now center stage as an interchange format, exported by virtually all applications. There are even plans by Microsoft to provide an HTML view of the desktop. A successor to HTML, XML (eXtensible Markup Language) is widely expected to become a standard way of exchanging data in a mixed environment.

The increasing prominence of HTML plays directly to Perl's strengths. It is an ideal language for validating user input in HTML forms, for manipulating the contents of large collections of HTML files, or for extracting and analyzing data from voluminous log files.
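
A sketch of the log-analysis style of program: a dozen lines that tally hits per page from a web server access log (the pattern is a simplification of the common log format):

    #!/usr/bin/perl
    # Usage: perl tally.pl access_log
    use strict;

    my %hits;
    while (<>) {
        # Match the request line: ... "GET /some/page HTTP/1.0" ...
        next unless m{"(?:GET|POST)\s+(\S+)};
        $hits{$1}++;
    }
    for my $page (sort { $hits{$b} <=> $hits{$a} } keys %hits) {
        print "$hits{$page}\t$page\n";
    }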

That is only one side of the text processing power of Perl. Perl not only gives you several ways to pick data apart, but also several ways to glue data back together. Perl is thus ideal for taking apart an information stream and reconfiguring it. This can be done on the fly as a way of transforming information into input to other programs or for analysis and reporting.
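
The simplest of these tools are split and join, which are exact inverses; a sketch of reshaping a colon-delimited stream into a tab-separated report:

    # Turn /etc/passwd-style records into a tab-separated report.
    while (my $line = <>) {
        chomp $line;
        my ($user, $pw, $uid, $gid, $gecos) = split /:/, $line;
        print join("\t", $user, $uid, $gecos), "\n";
    }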

One can argue that the next generation of computer applications will not be traditional software applications but "information applications", in which text forms a large percentage of the user interface. Consider the classic "Intranet" web application: a human resources system through which employees can choose the mutual funds in which to invest their retirement savings, track the performance of their accounts, and access information that helps them make better investment decisions. The interface to such a system consists of a series of informational documents (typically presented as HTML), a few simple forms-based CGI scripts, and links to back-end systems (which may be outside services accessed via the Internet) for real-time stock quotes.

To build an application like this using traditional software techniques would be impractical. Each company's mix of available investments is unique; the application would not justify the amount of traditional programming required for such a localized application. Using the web as a front end, and Perl scripts as a link to back-end databases, you are essentially able to create a custom application in a matter of hours.

Or consider Amazon.com, perhaps the most visibly successful new web business. Amazon provides an information front-end to a back-end database and order-entry system, with, you guessed it, Perl as a major component tying the two together.

Perl access to databases is supported by a powerful database-independent interface called DBI. Perl + fast-cgi + DBI is probably the most widely used "database connector" on the web. ODBC modules are also available.
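
The DBI idiom looks like this in outline (the driver name, table, and credentials below are placeholders, not taken from any real system):

    use strict;
    use DBI;

    # The driver ("mysql"), database, and credentials are placeholders.
    my $dbh = DBI->connect('dbi:mysql:inventory', 'user', 'password',
                           { RaiseError => 1 });

    my $sth = $dbh->prepare('SELECT title, price FROM books WHERE price < ?');
    $sth->execute(20);
    while (my ($title, $price) = $sth->fetchrow_array) {
        print "$title: \$$price\n";
    }
    $dbh->disconnect;

Because the SQL is handed to a driver chosen at connect time, the same script can move from one database vendor to another with a change to a single string.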

Put together Perl's power to handle text on the front end, and connect to databases on the back end, and you begin to understand why it will play an increasingly important role in the new generation of information applications.

Other applications of Perl's ability to recognize and manipulate text patterns include biomedical research and data mining. Any large text database, from the gene sequences analyzed by the Human Genome Project to the log files collected by any large web site, can be studied and manipulated by Perl. Finally, Perl is increasingly being used for applications such as network-enabled research and specialized Internet search applications. Its strength with regular expressions and its facility with sockets, the communications building block of the Internet, have made it the language of choice for building Web robots, those programs that search the Internet for information.
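
The heart of such a robot is only a few lines; a sketch using the LWP (libwww-perl) modules from CPAN (a real robot would use HTML::Parser and honor robots.txt):

    use strict;
    use LWP::Simple;                 # from the libwww-perl distribution

    my $url  = 'http://www.perl.com/';
    my $page = get($url) or die "can't fetch $url\n";

    # Print every link on the page (a deliberately simplified pattern).
    while ($page =~ /<a\s+href="([^"]+)"/gi) {
        print "$1\n";
    }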

Perl for Application Development

Developers are increasingly coming to realize Perl's value as an application development language. Perl makes it possible to realistically propose projects that would be unaffordable in the traditional system programming languages. Not only is it fast to build applications with Perl, but they can be very complex, even incorporating the best attributes of object-oriented programming if necessary.

It is easier to build socket-based client-server applications with Perl than with C or C++. It is more efficient to build free-text parsing applications in Perl than in any other language. Perl has a sophisticated debugger (written in Perl), and many options for building secure applications. There are publicly available Perl modules for every sort of application. These can be dynamically loaded as needed.
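
For instance, a complete TCP client is only a handful of lines with the IO::Socket module (a sketch; the host and port are placeholders):

    use strict;
    use IO::Socket::INET;

    # Connect to a daytime-style TCP service; host and port are placeholders.
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'server.example.com',
        PeerPort => 13,
        Proto    => 'tcp',
    ) or die "connect failed: $!\n";

    print while <$sock>;    # echo whatever the server sends
    close $sock;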

Perl can be easily extended with compiled functions written in C/C++ or even Java. This means that it is easy to include system services and functions that may not already be native to Perl. This is particularly valuable when working on non-UNIX platforms since the special attributes of that operating system can be included in the Perl language.

Perl can also be called from compiled applications, or embedded into applications written in other languages. Efforts are underway, for instance, to create a standard way to incorporate Perl into Java, such that Java classes could be created with Perl implementations. Currently, such applications must embed the Perl interpreter. A new compiler back-end, to be available in fourth quarter 1997 in O'Reilly & Associates' Perl Resource Kit, will remove this obstacle, allowing some Perl applications to be compiled to Java byte-code.

Graphical Interfaces

Because it was originally developed for the UNIX environment, where the ASCII terminal was the primary input/output device (and even windowing systems such as X preserved the terminal model within individual windows), Perl doesn't define a native GUI interface. (But in today's fragmented GUI world this can be construed as a feature.) Instead, there are Perl extension modules for creating applications with graphical interfaces. The most widely used is Tk, which was originally developed as a graphical toolkit for the Tcl scripting language, but which was soon ported to Perl. Tk is still specific to the X Window System, though it is currently being ported to Microsoft Windows.
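
A complete windowed program in Perl/Tk can be this small (a sketch):

    use strict;
    use Tk;    # the Perl port of the Tk toolkit

    my $mw = MainWindow->new;
    $mw->title('Hello');
    $mw->Label(-text => 'Hello from Perl/Tk')->pack;
    $mw->Button(-text => 'Quit', -command => sub { exit })->pack;
    MainLoop;  # hand control to the Tk event loop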

However, as noted earlier, the development of native windowing applications is becoming less important as the web becomes the standard GUI for many applications. The "webtop" is fast replacing the "desktop" as the universal cross-platform application target. Write one Web interface and it works on UNIX, Mac, Windows/NT, Windows/95...anything that has a Web browser.

In fact, an increasing number of sites use Perl and the Web to create new easier-to-use interfaces to legacy applications. For example, the Purdue University Network Computing Hub provides a web-based front-end to more than thirty different circuit simulation tools, using Perl to interpret user input into web forms and transform it into command sequences for programs connected to the hub.

Multithreading

Threads are a desirable abstraction for doing multiple, concurrent processing, particularly if you are programming for duplex communications or event-driven applications. A multi-threading "patch" to Perl has been available since early 1997; it will be integrated into the standard distribution as of Perl version 5.005, in the fourth quarter.

The multitasking model that Perl has historically supported is "fork" and "wait." The granularity is the process. The flavor is UNIX. Unfortunately, the Windows/NT equivalent isn't quite the same. This is where the portability of Perl breaks down, at least for now. These problems can be avoided by building cross-platform multi-process Perl applications with a layer of abstraction between the process control and the rest of the application. Furthermore, work is underway, to be completed in the fourth quarter of 1997, to reconcile the process-control code in the UNIX and Win32 ports of Perl.
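
The fork-and-wait idiom itself is compact; a sketch of the UNIX-flavored model described above:

    use strict;

    # Fork a child to run a command; the parent waits for it to finish.
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        exec 'ls', '-l' or die "exec failed: $!\n";   # child becomes ls
    }
    waitpid($pid, 0);                                  # parent blocks here
    print "child exited with status ", $? >> 8, "\n";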

Perl on Win32 Systems

In 1996, Microsoft commissioned ActiveWare Internet Corporation (now ActiveState Tool Corp) to create a port of Perl to Win32 for inclusion in the NT Resource Kit. That port has since become widely available on the net, and reportedly, nearly half of all downloads of the Perl source code are for the Win32 platform.

Perl has taken off on Win32 platforms such as NT for several reasons. Despite the presence of Visual Basic and Visual Basic for Applications, native scripting support on Win32 is relatively weak. While VB is an interpreted scripting language, it is still a typed language, which makes it somewhat more cumbersome to use. It also lacks the advanced string-handling capabilities that are so powerful in Perl. And as larger-scale NT sites are built, the limitations of graphical user interfaces quickly become evident to administrators; scripting is essential for managing hundreds or thousands of machines.

It is not insignificant that many of the experienced administrators being called on to manage those sites cut their teeth on UNIX. Using Perl is a good way to bring the best of UNIX with you to other platforms.

Nor should you underestimate the drawing power of the web. With thousands of Perl-based CGI programs and site management tools now available, Perl support is essential for any web server platform, including the NT-based web servers from Microsoft, O'Reilly and Netscape that are becoming a more important part of the web. In particular, ActiveState's PerlScript(tm) implementation allows Perl to be used as an active scripting engine on NT web servers such as Microsoft's IIS and O'Reilly's WebSite that support the Active Server Pages (ASP) technology.

In addition to the core Perl language interpreter, the ActiveState Perl for Win32(tm) port includes modules specifically targeted to the Win32 environment. For example, it provides full access to Automation objects. As more and more system resources and components support that interface under Windows, more aspects of the operating system become directly accessible from Perl for Win32.
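
A sketch of what Automation access looks like from Perl for Win32, using the Win32::OLE module (Excel is just one example of an Automation server; any application exposing that interface can be driven the same way):

    use strict;
    use Win32::OLE;

    # Start Excel through OLE Automation and write to a cell.
    my $excel = Win32::OLE->new('Excel.Application')
        or die "can't start Excel: ", Win32::OLE->LastError, "\n";

    $excel->{Visible} = 1;                 # properties read and write like a hash
    my $book  = $excel->Workbooks->Add;
    my $sheet = $book->Worksheets(1);
    $sheet->Cells(1, 1)->{Value} = 'Hello from Perl for Win32';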

Extending the Power of Perl

Unlike languages such as Microsoft's Visual Basic or Sun's Java, Perl does not have a large corporation behind it. Perl was originally developed by Larry Wall and made available as freeware. Larry is assisted in the further development of Perl by a group of about 200 regular contributors who collaborate via a mailing list called perl5-porters. The list was originally focused on porting Perl to additional platforms, but gradually became the center for those adding to the core language.

In addition, Perl 5 includes an extension mechanism, by which independent modules can be dynamically loaded into a Perl program. This has led to the development of hundreds of add-in modules. Many of the most important modules have become part of the standard Perl distribution; additional modules are available via the Comprehensive Perl Archive Network (CPAN). The best entry point to the CPAN is probably the www.perl.com site, which also includes book reviews, articles, and other information of interest to Perl programmers and users.

While there has been a historical bias against using freeware for mission critical applications, this bias is crumbling rapidly, as it becomes widely recognized that many of the most significant computing advances of the past few decades have been developed by the freeware community. The Internet itself was largely developed as a collaborative freeware project, and its further development is still guided by a self-organizing group of visionary developers. Similarly, the leading web server platform in terms of market share, by a large margin, is Apache--again, a free software project created, extended and managed by a large collaborative developer community.

In addition to ongoing development, the Perl community provides active support via newsgroups and mailing lists. There are also numerous consultancies and paid support organizations. Excellent documentation is provided by numerous books, including most notably Programming Perl, by Larry Wall, Randal Schwartz and Tom Christiansen. The Perl Journal and www.perl.com provide information about the latest developments.

In short, because of the large developer base and the cooperative history of the freeware community, Perl has access to development and support resources matching those available to the largest corporations.

Application Stories

The following section includes a selection of user application stories, ranging from the quick and dirty "Perl saves the day" applications familiar to so many system administrators, to larger custom applications. Some of these application stories are taken from presentations at the first annual Perl Conference, held in San Jose, CA from August 19-21, 1997. The application descriptions from the conference proceedings are labeled with the names of their authors.

Case 1 - The Programming Language that Saved Netscape Technical Support
Dav Amann (dove@netscape.com)

Ok, so here's the situation. Your brand new exciting Internet company has taken off and you're selling more browsers, servers, and web applications than you ever hoped for, your company is growing by leaps and bounds, and the latest market information says that your customer base has just passed the 30 million mark in less than a year.

And the only downside is that these 30 million folks might have a few problems with their browser; they might not know exactly what the Internet is; they might want to call someone for support. They might want to call *you* for technical support.

So, when this happens, you might think, "That's OK, I'll just put some technical articles out on the web." But when you first look at the project, you realize that you're going to need some sort of content management system, some sort of distribution system, some logging analysis, and a way to gather and report your customers' feedback on your site. And you're going to want it yesterday.

Lucky for you, you know Perl. And with Perl you're able to get all of this built in 3 months in the spare time of 4 very busy technical support engineers.

Case 2 - A Quick and Dirty Conversion at BYTE

BYTE Magazine used to maintain its own information network and conferencing system, BIX, that both editors and readers used for exchanging ideas. The conferencing model was quite different from Usenet, somewhat closer to a mailing list. Since several of the BYTE editors were regular Usenet subscribers and preferred that model, BYTE built a gateway that translated and maintained the BIX editorial discussion groups as a private Usenet news group. The language was Perl. It took little more than a hundred lines of code and a few days of work.

Case 3 - Routing customer inquiries to appropriate experts

The performance testing group at one of the world's leading computer companies needed to automate query routing. They were directed to use their world-wide corporate Intranet, but were not given any budget for the project. Two engineers with only a few weeks of Perl experience created a solution. The Perl scripts responded to each query by matching its key elements against the people with that expertise. The CGI programs not only pointed the client to the experts' Web pages and E-mail addresses, but also passed the query on to all appropriate experts by E-mail. The solution took no more than a few man-weeks and so could be absorbed into other budgets.

Case 4 - Collection and analysis of email survey data

An Internet market research firm that does its research using E-mail surveys wanted to automate and generalize the handling of the anticipated ten thousand responses. Perl was used to automate the process. The Perl script generated input for SPSS, but would have been capable of doing the statistical analysis itself if the statistician had known Perl.

Case 5 - A Cross-Platform Harness for Running Benchmarks

SPEC (the Standard Performance Evaluation Corporation), an industry consortium for benchmarking computer systems, radically changed its governing program when the SPEC92 benchmarks evolved into SPEC95. SPEC wanted to make it possible for its benchmarks to run on operating systems other than UNIX without a major effort. The SPEC92 benchmarks were managed by UNIX shell scripts, which were unportable and inflexible. The SPEC95 benchmarks are managed by a portable, extensible engine written in Perl. The scripts take advantage of Perl's object-oriented capabilities, Perl's extensibility with C, and Perl's dynamic module loading. Porting SPEC95 to Windows/NT was simple. The major problem with porting to VMS is its lack of user-level forks.

Case 6 - Consultant working with Perl

Despite the years that I have spent developing in C, I have found little reason to continue to do so. Most of my work in the last ten years has been developing code that retrieves, manages, and converts information, not just data. The application programs I am involved in are merely graphical controls front-ending information retrieval, management, and conversion engines. Perl now fills the need for this kind of development better than any other language--scripting or system programming language. Even though I started using Perl merely as a glue scripting language and prototyping language, I now use it for everything. It has replaced both C and my UNIX shell programs. There will be times, I am sure, that I will have to write, or at least patch, a program in C. I expect that Java will eventually fill those requirements for me.

Cross-platform GUI interfaces are now done in HTML and run locally, in an Intranet, or as part of the Web.

Perl provides me with fast indexing to simple data structures and modules for talking to commercial databases. It provides me with system level tools for process management, file management, and interprocess communications wherever sockets are understood. It allows me to design my applications using libraries, modules, packages, and subroutines. It allows me to write applications that modify themselves; scary as that may seem, it is sometimes necessary.

The greatest benefit of Perl to me is that I can build solutions to complex problems in a fifth of the time. This appeals to managers and clients, but particularly to the people paying the bills.

Case 7 - Perl as a Rapid-Prototyping Language for Flight Data Analysis
Phil Brown, Mitre Corporation Center for Advanced Aviation System Development (CAASD) (philsie@crete.mitre.org)

Because of its robustness and flexibility, Perl has become the language of choice for many programmers in CAASD for developing rapid prototypes of concepts being explored. The Traffic Flow Management Lab (T-Lab) has implemented hundreds of Perl programs, ranging from simple data parsing and plot generation to measuring the complexity of regions of airspace and calculating the transit times of aircraft over those regions. These applications range in size from about 10 lines to over 1,200. Because many of the applications are very I/O intensive, Perl, with its many parsing and searching features, was the natural choice.

Case 8 - Online Specialty Printing
Dave Hodson (dave@iprint.com)

The iPrint Discount Printing & CyberStationery Shop (http://www.iPrint.com) is powered by a WYSIWYG desktop publishing application on the Internet, connected directly to a back-end printer and sitting on top of a sophisticated, real-time, multi-attribute product and pricing database. Customers come to our site to create, proof, and order customized printed items--business cards, stationery, labels, stamps, specialty advertising items, and so on--online.

The iPrint system includes both a front end (the website) and a back-end process that eliminates nearly all of the manual pre-flight work that printers perform, and also provides all pertinent information to iPrint's accounting system. About 95% of the approximately 80,000 lines of code behind this system is Perl 5.003 running on the Windows NT 4.0 operating system. iPrint relies heavily on a relational database (SQL Server), with all database interaction performed through Perl and ODBC. iPrint uses many modules from the CPAN archives, including MIME and Win32::ODBC.

Case 9 - The Amazon.com Editorial Production System
Chris Mealy (mookie@amazon.com)

Amazon.com used Perl to develop a CGI-based editorial production system that integrates authoring (with Microsoft Word or Emacs), maintenance (version control with CVS and searching with glimpse), and output (with in-house SGML tools).

Writers use the CGI application to start an SGML document. A writer fills out a short form, and the system generates a partially completed SGML document in the user's home directory, which may be mounted on their Microsoft Windows PC. The writer then uses their favorite editor to finish the document. With the CGI application, users can see their changes ('cvs diff') and view their SGML rendered as HTML before submitting their document ('cvs commit'). Writers can do keyword searches of the SGML repository (by way of glimpse) and track changes ('cvs log'). Editors can also schedule content with the CGI application.

Amazon.com created a base SGML renderer class that is subclassed to render different sections of the web site in different modes (HTML with graphics and HTML without graphics, and in the future PointCast, XML, braille, etc.).

All of the code is in Perl. It uses the CGI and HTML::Parser modules.

Case 10 - Specialty Print Servers at a New England Hospital

A major New England hospital uses twelve operating systems, from mainframes to desktop PCs, and seven different network protocols. There are roughly twenty thousand PC workstations, two thousand printers of one standard type, and one thousand specialty printers. The network is spread over an entire city using microwave, T1, T3, and private optical fiber. The problem is network printing. Specialty printers are required because the patient registration and billing system runs on IBM and Digital mainframes, with the output going through their proprietary networks. The goal is to have all of the operating systems able to print to a standard printer through a standard protocol.

A search for appropriate scalable printer servers uncovered the MIT Project Athena's Palladium as a good starting point. However, its model of standalone print servers didn't fit; the hospital needed a distributed server model. When a two-month effort to port Palladium to the hospital's platform, so that the necessary changes could be made, proved uneconomical, the team decided to build exactly what it wanted in fast prototyping languages: Perl for the core application and Tcl/Tk for the GUI administrative interface. Palladium represents 30,000 lines of C. The more complex distributed server model required only 5,000 lines of Perl and only four man-months to reach a first release. The Perl proved sufficiently fast on a 60MHz Pentium running a UNIX variant that no code required rewriting in C.

Case 11 - The Purdue University Network-Computing Hub
(Nirav H. Kapadia, Mark S. Lundstrom, Jose' A. B. Fortes)

In the future, computing may operate on a network-based and service-oriented model much like today's electricity and telecommunications infrastructures. This vision requires an underlying infrastructure capable of accessing and using network-accessible software and hardware resources as and when required. To address this need, we have developed a network-based virtual laboratory ("The Hub") that allows users to access and run existing software tools via standard world-wide web (WWW) browsers such as Netscape.

The Hub, a WWW-accessible collection of simulation tools and related information, is a highly modular software system that consists of approximately 12,000 lines of Perl5 code. It has been designed to: a) have a universally-accessible user-interface (via WWW browsers), b) provide access-control (security and privacy) and job-control (run, abort, and program status functions), and c) support logical (virtual) resource-organization and management. The Hub allows users to: a) upload and manipulate input-files, b) run programs, and c) view and download output - all via standard WWW browsers. The infrastructure is a distributed entity that consists of a set of specialized servers (written in Perl5) which access and control local and remote hardware and software resources. Hardware resources include arbitrary platforms, and software resources include any program (the current implementation does not support interactive and GUI-based programs).

The Hub allows tools to be organized and cross-referenced according to their domain. Resources can be added incrementally using a resource-description language specifically designed to facilitate the specification of tool and machine characteristics. For example, a new machine can be incorporated into the Hub simply by specifying its architecture (make, model, operating system, etc.) and starting a server on the machine. Similarly, a new tool can be added by "telling" the Hub the tool's location, its input behavior (e.g., command-line arguments), what kinds of machines it can run on (e.g., Sparc5), and how it fits into the logical organization of the Hub (e.g., circuit simulation tool). Each of these tasks is typically accomplished in less than thirty minutes.

To facilitate this functionality, the Hub interprets the URLs differently from the standard document-oriented web servers. The structure of the URL is decoupled from that of the underlying filesystem and interpreted in a context-sensitive manner (based on user-specific state stored by the server), thus allowing virtual accounting and arbitrary access-control. The lab-engine provides the Hub with its on-demand high-performance computing capabilities. When a user requests the execution of a program, the lab-engine uses information in the user-specified input file to predict (via an artificial intelligence sub-system - also written in Perl5) the resources required for the run, selects an appropriate platform (e.g., workstation for a 2-D problem, supercomputer for a 3-D problem), transfers relevant input files to the selected machine, and initiates the program (via the remote server). When the run is completed, the remote server notifies the lab-engine, which retrieves the output files and informs the user.

The initial prototype, the Semiconductor Simulation Hub, currently contains thirteen semiconductor technology tools from four universities. In less than one year, over 250 users have performed more than 13,000 simulations. New Hubs for VLSI design, computer architectures, and parallel programming have been added in recent months; they currently contain a modest complement of fourteen tools. These Hubs are currently being used in several undergraduate and graduate courses at Purdue as well as to facilitate collaborative research. Regular users include students at Purdue University and researchers at several locations in the U.S. and Europe.
