Two of the latest concepts in computing are new twists on old ideas, and they could be the eventual path for clinical trials data processing.
Many observers of the computer revolution have noted that old concepts and ideas are frequently reborn with a twist and relabeled as new concepts. For example, to many people the Internet phenomenon is simply a return to the dumb-terminal days, when computer operators used terminals that were little more than television screens to access large computers located elsewhere. This month we look at two related new examples of this rebirth phenomenon, one that some say may, in the long run, be as important as the Internet itself. In fact, these new trends may eventually have an impact on clinical trial data analysis. The new concepts are called grid computing and utility computing. To understand them, we must first review what has come before, specifically the notion of time-sharing.
Growth of computers
In a now famous paper written in 1965, Gordon Moore (co-founder of Intel) noted that the number of components that could be placed on a single integrated circuit had been doubling roughly every year, and that this exponential growth, which translates into ever more processing power, was likely to continue for some time. (An online copy of the original paper can be found at ftp://download.intel.com/research/silicon/moorespaper.pdf.) Remarkably, his observation of exponential growth has held true since 1965, with an overall doubling period now estimated at 24 months. This means that the top-of-the-line workstation you purchase today with a 3.06-GHz Intel Xeon processor will be surpassed by a 6-GHz processor when your kids are going back to school in 2004. Over the years, whenever theoretical barriers loomed that threatened this relentless march, computer scientists and engineers developed new chip fabrication methods that allowed computer chips to pack in more and more raw processing power.
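Because the doubling period is the only parameter in this growth claim, the arithmetic is easy to sketch. The helper names below are ours, and the 24-month default simply mirrors the estimate quoted above; the functions model idealized exponential growth, not any actual chip roadmap.

```python
import math


def growth_factor(months: float, doubling_months: float = 24.0) -> float:
    """How many times larger a quantity gets after `months` if it doubles
    every `doubling_months` (simple exponential growth)."""
    return 2.0 ** (months / doubling_months)


def months_to_reach(factor: float, doubling_months: float = 24.0) -> float:
    """Inverse of growth_factor: months needed to grow by `factor`."""
    return doubling_months * math.log2(factor)


# Over one decade, a 24-month doubling period compounds to a 32x gain.
print(growth_factor(120))       # 32.0
print(months_to_reach(2.0))     # 24.0
```

The compounding is what makes the trend so striking: five doublings in ten years is a 32-fold gain, and twenty years gives roughly a thousandfold.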
Computers were very different when Moore first wrote about them. In 1965, IBM had just launched the IBM 360, an enormous computer housed in rooms larger than a gymnasium. Yet the gigantic IBM 360 had less processing power and memory than today's Palm Pilot.
Despite the relatively primitive nature of computers in those days, a networking phenomenon arose to unleash their potential. In the 1950s and early 1960s, mainframes were expensive, difficult-to-maintain computers that could process only a single job at a time. Each computer was a shared resource for many individuals. The most common method of sharing was to run batch jobs, each of which occupied the entire processing power of the computer until it finished. An operator could prioritize jobs in a queue and kill jobs that threatened to crash the computer.
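The batch model described above can be sketched as a priority queue in which each job holds the whole machine from start to finish. This is a minimal illustration, assuming a simple numeric priority (lower runs first) and a hypothetical operator `kill` list; the class and job names are invented for the example, not drawn from any historical system.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class BatchJob:
    priority: int                          # lower number runs sooner (an assumption)
    seq: int                               # submission order breaks ties
    name: str = field(compare=False)
    cpu_minutes: int = field(compare=False)


class BatchQueue:
    """Hypothetical operator-managed queue: one job at a time, run to completion."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = 0
        self._killed = set()

    def submit(self, name: str, cpu_minutes: int, priority: int = 5) -> None:
        heapq.heappush(self._heap, BatchJob(priority, self._seq, name, cpu_minutes))
        self._seq += 1

    def kill(self, name: str) -> None:
        # The operator removes a job that threatens to crash the machine.
        self._killed.add(name)

    def run_all(self):
        """Run queued jobs strictly one after another; return (name, start, end) in minutes."""
        clock, log = 0, []
        while self._heap:
            job = heapq.heappop(self._heap)
            if job.name in self._killed:
                continue
            start = clock
            clock += job.cpu_minutes       # nobody else can use the machine meanwhile
            log.append((job.name, start, clock))
        return log


q = BatchQueue()
q.submit("payroll", 30, priority=1)
q.submit("census", 120)
q.submit("runaway", 999)
q.kill("runaway")
print(q.run_all())   # [('payroll', 0, 30), ('census', 30, 150)]
```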
Many computer tasks could not be run in batch mode because they required interactive user input. As a simple example, consider the original adventure computer game, Adventure (for a full history and a downloadable, PC-based copy, go to www.rickadams.org/adventure/a_history.html).
In this game, the computer presents a description of a location inside a cave that calls for one of a few possible responses. After a 10-second pause, the player types an answer, the computer outputs a response, the player answers, and so on. In fact, computer calculations occupy only a minute fraction of the actual elapsed time in this and other interactive software. A 12-hour session of Adventure (not much for Adventure fanatics) used a trivial amount of the computer's processing power (measured as CPU time) but required sole access to a hugely expensive resource for half a day. In addition, the processor sat idle during the loading of data, the printing of results, and other tasks, while people waiting to use the computer lined up.
The answer to this conundrum was to share the computer so that many users could run programs during the time that the computer was waiting for a response or loading data. A concept known as time-sharing was developed to provide a continuous, smooth experience for multiple users on a single computer by sharing the idle time in each other's jobs. Time-sharing and its related network structure were revolutionary in many ways. They made it possible for multiple users to access a computer at a distance: each user could sit at a terminal attached to the mainframe by the network and use the computer. Furthermore, users could work with the computer interactively over the telephone, using a modem. The modems of those days used the mouthpiece and earpiece of the telephone handset to pass audible signals back and forth between a remote computer and a local terminal.
Time-sharing evolved from a concept in 1959 to reality in 1964, when it was launched at Dartmouth College on May 1. Within a few years, the Dartmouth time-sharing computer was being used nationwide by high schools offering computer science classes. MIT and Bell Labs soon followed with a mainframe time-sharing system called the Multiplexed Information and Computing Service (Multics). Companies soon began to sell time on time-sharing computers for commercial uses as well.
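The idea of filling one user's idle time with another user's work can be sketched as a tiny round-robin simulation. This is an illustrative model under stated assumptions, not how the Dartmouth system was actually implemented: time advances in abstract ticks, each session alternates between user think time and a short CPU burst, and a session that uses up its quantum goes to the back of the ready line.

```python
from collections import deque


def time_share(sessions, quantum=1):
    """Round-robin sharing of one CPU among interactive sessions.

    sessions: {name: [(think_ticks, cpu_ticks), ...]}; the user pauses to type,
    then the computer needs cpu_ticks of processing to respond (hypothetical workload).
    Returns (finish tick per session, CPU busy ticks, elapsed ticks).
    """
    steps = {name: deque(s) for name, s in sessions.items()}
    wake, cpu_left = {}, {}              # input-arrival tick; CPU still needed
    for name in steps:
        think, cpu_left[name] = steps[name].popleft()
        wake[name] = think
    ready, finished = deque(), {}
    t = busy = 0
    while len(finished) < len(sessions):
        for name in list(wake):          # users who finished typing become ready
            if wake[name] <= t:
                del wake[name]
                ready.append(name)
        if not ready:
            t += 1                       # CPU idles: nobody needs it right now
            continue
        name = ready.popleft()
        run = min(quantum, cpu_left[name])
        t += run
        busy += run
        cpu_left[name] -= run
        if cpu_left[name] > 0:
            ready.append(name)           # quantum expired: back of the line
        elif steps[name]:
            think, cpu_left[name] = steps[name].popleft()
            wake[name] = t + think       # wait for the user's next input
        else:
            finished[name] = t
    return finished, busy, t


finished, busy, elapsed = time_share({"a": [(10, 1)] * 3, "b": [(10, 1)] * 3})
print(finished, busy, elapsed)   # {'a': 33, 'b': 34} 6 34
```

Run one after the other, these two sessions would need 66 ticks; interleaved, both finish by tick 34, and the CPU is still busy only 6 of those ticks. That leftover slack is what let many more users share the machine.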
Computer folks often speak of CPU (central processing unit) time as cycles, referring to the cycle time of the processor's clock. What time-sharing did was to put the wasted cycles of the mainframe computer to use by distributing them among other users. In effect, time-sharing provided a computing utility for those who couldn't afford to have their own computer. This is not unlike what the electric company does for those who can't afford to own their own power plant.
Those with a business orientation might be wondering how Dartmouth and others charged the various users for the computer. Because actual elapsed time was no longer a relevant measure, the owner of a time-sharing computer charged by the CPU time used. Each user received a monthly bill listing the dates and times of their CPU usage, along with a charge. This was the beginning of utility computing.
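Billing by CPU time rather than connect time is simple bookkeeping. The sketch below assumes usage records of the form (user, day, cpu_seconds) and a flat, hypothetical rate per CPU second; the function name and record layout are ours, and any real tariff of the period may well have had other components that are not modeled here.

```python
from collections import defaultdict
from datetime import date


def monthly_bills(usage, rate_per_cpu_second, year, month):
    """Build one bill per user for the given month.

    usage: iterable of (user, day, cpu_seconds) records, where day is a datetime.date.
    Charges depend only on CPU seconds consumed, not on how long the user was connected.
    Returns {user: {"lines": [(day, cpu_seconds, charge), ...], "total": charge_sum}}.
    """
    bills = defaultdict(lambda: {"lines": [], "total": 0.0})
    for user, day, cpu_seconds in usage:
        if (day.year, day.month) != (year, month):
            continue                     # record belongs to another billing period
        charge = round(cpu_seconds * rate_per_cpu_second, 2)
        bills[user]["lines"].append((day, cpu_seconds, charge))
        bills[user]["total"] = round(bills[user]["total"] + charge, 2)
    return dict(bills)


usage = [
    ("smith", date(1968, 3, 4), 120),
    ("smith", date(1968, 3, 9), 30),
    ("jones", date(1968, 4, 1), 500),    # April: not on the March bill
]
print(monthly_bills(usage, 0.5, 1968, 3)["smith"]["total"])   # 75.0
```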
Three parallel trends in computing began to converge to shape the progress of the computer revolution. The first trend was that of the machines themselves. Time marched on from those original mainframes to minicomputers and eventually to the personal computer. The second trend in computing was networking. In parallel with the development of computers, networking of computers began to progress from terminal-computer connections to interconnections of computers within an institution to dedicated, direct, inter- and intracontinental computer connections. Eventually, the Internet was developed, first for military and academic uses (ARPANET, CSNET, BITNET), and later expanded for public use.
The final influential trend in computing was the evolution of software. Software was first designed for use on a single computer, typically a mainframe and later a personal computer. The software was initially used directly at the computer itself, but in the case of mainframes and minicomputers it was eventually used at a distance through terminals. With the development of the personal computer and the workstation came the concept of client-server software. In this model, a central server program (typically with access to a data store) awaits requests from multiple client programs distributed across a network. The client programs initiate a transaction (say, data entry or data analysis), and the server program performs an action (say, storing or retrieving the data). Client-server programs made use of the computing power of the networked workstation while providing common, centralized access to data. Because central data stores were often far larger than anything a workstation could hold, client-server programs played to the strengths of both the local and the central computers.
A good example of a client-server program is Microsoft Outlook in a large company. Email storage is typically handled centrally by Microsoft Exchange, but the email is viewed, forwarded, or replied to with the Outlook client software on each individual's machine.
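The client-server division of labor can be sketched in-process with Python threads, where a `queue.Queue` stands in for the network connection. This is a hypothetical illustration of the request/response pattern, not how Outlook, Exchange, or any other product actually communicates: one server thread owns the data store, and each client sends a request and waits for the reply.

```python
import queue
import threading


def server(request_queue: queue.Queue, store: dict) -> None:
    """Central server loop: owns the data store and answers requests until shut down."""
    while True:
        msg = request_queue.get()
        if msg is None:                      # shutdown signal
            break
        op, key, value, reply = msg
        if op == "store":
            store[key] = value
            reply.put("ok")
        elif op == "retrieve":
            reply.put(store.get(key))        # None if the key is unknown
        else:
            reply.put(f"error: unknown operation {op!r}")


def client_call(request_queue: queue.Queue, op: str, key: str, value=None):
    """Client side: send one request, then block until the server replies."""
    reply = queue.Queue(maxsize=1)
    request_queue.put((op, key, value, reply))
    return reply.get()


request_queue = queue.Queue()
store = {}
srv = threading.Thread(target=server, args=(request_queue, store), daemon=True)
srv.start()

# Two "workstations" enter data concurrently; the server serializes access to the store.
clients = [
    threading.Thread(target=client_call,
                     args=(request_queue, "store", f"subject-{i:03d}", {"sbp": 120 + i}))
    for i in range(2)
]
for c in clients:
    c.start()
for c in clients:
    c.join()
print(client_call(request_queue, "retrieve", "subject-001"))   # {'sbp': 121}
request_queue.put(None)                                         # shut the server down
srv.join()
```

Because only the server thread touches `store`, the clients never need their own locks; that centralization is the same property that gave client-server systems common access to shared data.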
With the advent of the World Wide Web, the client-server model is being replaced by one built around the Web browser. The evolution of powerful, low-cost servers and the cost of supplying, updating, and maintaining client programs gradually led back to centralized applications and centralized data stores. Now the Web browser acts as a common client for many different programs. More than a dumb terminal, the Web browser provides a standardized, sophisticated tool for viewing and manipulating centralized applications and databases. For example, a Web version of Microsoft Outlook provides most of the functions of the Outlook client but requires no local software other than a Web browser. This characteristic enables universal access: email can be checked in airports and business centers at a moment's notice. Most notably, the Web browser minimizes local application maintenance and provisioning costs.
The current buzz in computing is grid computing, which is in some ways a return to the days of old. As one might predict, computer analysis and software have grown substantially more sophisticated and can now attack very computationally intensive problems. Models for analyzing these problems are very sophisticated, but they require massive amounts of computer memory. In many such problems, the individual steps in the computation can be performed in isolation. Therefore, it is possible to distribute these computations among several processors in the same computer (so-called parallel computing). Better yet, the computations can be distributed among several or even many computers.
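When the individual steps of a computation are independent, the work can be cut into pieces, computed separately, and recombined. The sketch below uses a placeholder `work` function (a hypothetical stand-in for a real model step) and Python's `concurrent.futures`. It defaults to a thread pool for portability; in CPython, CPU-bound work only runs truly in parallel if you pass a `ProcessPoolExecutor` instead, because of the global interpreter lock.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Optional, Sequence


def work(x: int) -> int:
    """Hypothetical stand-in for one independent step of a larger model."""
    return x * x


def chunked(items: Sequence[int], n_chunks: int) -> List[Sequence[int]]:
    """Split items into n_chunks contiguous pieces whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_chunks)
    pieces, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        pieces.append(items[start:end])
        start = end
    return pieces


def partial_result(piece: Sequence[int]) -> int:
    """Computed in isolation: needs nothing from any other piece."""
    return sum(work(x) for x in piece)


def parallel_total(items: Sequence[int], n_workers: int = 4,
                   executor: Optional[Executor] = None) -> int:
    """Scatter the pieces to workers, then combine the partial results."""
    pieces = chunked(items, n_workers)
    if executor is not None:
        return sum(executor.map(partial_result, pieces))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_result, pieces))


data = list(range(1000))
print(parallel_total(data) == sum(work(x) for x in data))   # True
```

The combining step here is a plain sum; the technique works whenever partial results can be merged without revisiting the raw inputs.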
Grid computing is a concept involving the coordinated cooperation of many computers on a single effort or problem solution. The term grid refers to the electrical grid, the complex maze of power supply and transmission lines through which power is distributed to users. Similarly, grid computing involves the centrally controlled distribution of a problem to a number of computers. Typically, each computer in a grid works on a piece of the problem, using spare processing cycles when it would otherwise be idle. Alternatively, the grid aspect can involve data storage on many distributed systems. In many circumstances, a grid system involves both distributed storage and distributed computation. When a given section of the problem is complete, the participating computer uploads the solution to the central grid controller and receives another piece of the problem to work on.
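The fetch, compute, upload, repeat cycle described above is essentially a master/worker (work-queue) pattern. In this sketch, threads stand in for participating computers, and `GridController`, `volunteer`, and the cube-a-number workload are all hypothetical. A real grid would also need to reissue work units that are never returned when a machine goes offline, which is omitted here.

```python
import queue
import threading


class GridController:
    """Hypothetical central controller: hands out work units and collects results."""

    def __init__(self, work_units):
        self._todo = queue.Queue()
        for unit_id, payload in enumerate(work_units):
            self._todo.put((unit_id, payload))
        self._results = {}
        self._lock = threading.Lock()

    def next_unit(self):
        """Return the next (unit_id, payload) to work on, or None when none are left."""
        try:
            return self._todo.get_nowait()
        except queue.Empty:
            return None

    def upload(self, unit_id, result, worker_name):
        """A participating computer reports a finished unit."""
        with self._lock:
            self._results[unit_id] = (result, worker_name)

    def results(self):
        with self._lock:
            return dict(self._results)


def volunteer(controller, name, compute):
    """One participating computer: fetch a unit, solve it, upload, repeat."""
    while (unit := controller.next_unit()) is not None:
        unit_id, payload = unit
        controller.upload(unit_id, compute(payload), name)


controller = GridController(range(20))
machines = [
    threading.Thread(target=volunteer, args=(controller, f"pc-{i}", lambda n: n ** 3))
    for i in range(4)
]
for m in machines:
    m.start()
for m in machines:
    m.join()
answers = controller.results()
print([answers[i][0] for i in range(5)])   # [0, 1, 8, 27, 64]
```

Because workers pull new units only when they finish, faster machines naturally take on more of the problem, with no scheduling decisions needed at the controller.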
There are a number of different models for using grid computing, ranging from tightly managed to open. Tightly managed grids consist of computers on a corporate local area network (LAN). Proprietary information would never leave the LAN, and the project could be closely monitored. The most open grid would be one in which computers from anywhere in the world might participate, with no particular affiliation. Examples include the Search for Extraterrestrial Intelligence (www.seti.org), the search for the next largest Mersenne prime (www.mersenne.org/prime.htm), and several anthrax and smallpox drug design projects (some of which have been made an option on the Google toolbar). Grid projects are also under way in which a consortium of companies participates in a particular project, sharing their spare cycles.
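To make "a piece of the problem" concrete for one of the open projects listed above: in the Mersenne prime search, each candidate exponent p can be checked independently, which is what makes the search easy to farm out. The sketch below is the textbook Lucas-Lehmer test for 2**p - 1; it is not the client software any project actually distributes, which relies on far faster arithmetic for numbers with millions of digits.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def lucas_lehmer(p: int) -> bool:
    """True if the Mersenne number 2**p - 1 is prime; p must itself be prime."""
    if p == 2:
        return True                  # 2**2 - 1 = 3; the recurrence needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Each exponent is an independent work unit that could go to a different machine.
print([p for p in range(2, 32) if is_prime(p) and lucas_lehmer(p)])
# [2, 3, 5, 7, 13, 17, 19, 31]
```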
Grid computing shares a great deal, philosophically, with time-sharing. Time-sharing allowed a single, expensive resource to be shared among many users by maximizing the usage of CPU cycles. Grid computing is emerging today because, for computationally complex or data-intensive problems, network bandwidth is relatively less expensive than processing capacity. An alternative to grid computing for these problems would be leased time on supercomputers, an idea that has also reappeared. These concepts all share the common need to gain access to significant computer processing power on demand (a slogan that IBM has adopted in its recent advertisements).
Grid computing has significant appeal for several reasons. First, it makes it possible for a corporation or consortium to take on a data- and computation-intensive task that would typically take months to years to solve and complete it in days to weeks. Second, the grid-computing concept allows companies to maximize the use of their rapidly aging assets (their networked computers) by letting those machines work on grid-based problems when they would otherwise be displaying a screensaver.
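The months-to-days claim is back-of-the-envelope arithmetic. The sketch below assumes a perfectly divisible job, a fixed share of each desktop's day available to the grid, and an optional overhead fraction for data transfer and coordination; the function name, all three inputs, and the example numbers are hypothetical.

```python
def grid_wall_clock_days(serial_cpu_days: float, machines: int,
                         idle_fraction: float, overhead_fraction: float = 0.0) -> float:
    """Idealized elapsed time for a job split evenly across part-time grid machines.

    serial_cpu_days:   compute time the job needs on one dedicated machine
    idle_fraction:     share of each machine's day donated to the grid (0 < f <= 1)
    overhead_fraction: extra work for moving data and coordinating (assumed, >= 0)
    """
    if machines < 1 or not 0 < idle_fraction <= 1 or overhead_fraction < 0:
        raise ValueError("need machines >= 1, 0 < idle_fraction <= 1, overhead >= 0")
    effective_machines = machines * idle_fraction
    return serial_cpu_days * (1 + overhead_fraction) / effective_machines


# A year of single-machine computing spread over 100 desktops idle half the day:
print(grid_wall_clock_days(365, 100, 0.5))          # about 7.3 days
print(grid_wall_clock_days(365, 100, 0.5, 0.25))    # about 9.1 days with 25% overhead
```

Real speedups fall short of this ideal whenever pieces depend on one another or data movement dominates, which is why the overhead term matters.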
Although grid computing is still in its infancy, a number of pharmaceutical companies have begun to use this methodology in drug discovery. Applications include protein folding analysis, receptor-ligand interactions, sequence alignment, gene identification, gene homology, computer-aided drug design, high-throughput screening, molecular modeling, gene and protein expression mapping in health and disease, and combinatorial chemistry.
Although grid computing emerged from academia, commercial enterprises have jumped in, providing technology and know-how to customers. Some have built a new business model around the concept of utility computing.
Utility computing involves the use of computer processing power and data storage on demand and, typically, on a pay-as-you-go basis. Large companies like IBM, Sun, and HP have vast numbers of computers, data storage systems, and networks that are frequently underused. Those companies can sell that capacity, along with the use of software programs, to other companies through a utility model. Much as we pay our water or electric bills on a pay-per-use basis, utility computing would provide enterprise-level computing infrastructure to anyone, at the exact moment they need it and for as long as they need it. For projects that require occasional, intensive bursts of computational or data storage capacity, the utility computing model can supply that capacity.
Utility computing is really a back-to-the-future concept. Like grid computing, it is very similar in philosophy to the time-sharing mainframe, although more complex to manage. The utility computing infrastructure needs to be able to accept jobs, distribute the work among an array of computers, collect the results, calculate charges, and possibly distribute payments, all with minimal human intervention.
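Whether paying by the hour beats owning capacity depends mostly on how much of the year that capacity is actually busy. The sketch below compares the two under hypothetical prices (the dollar figures are placeholders, not quotes from IBM, Sun, HP, or anyone else) and computes the utilization at which they break even; the function names are ours.

```python
HOURS_PER_YEAR = 24 * 365


def annual_costs(peak_servers, owned_cost_per_server_year,
                 busy_server_hours, rate_per_server_hour):
    """Owning means paying for peak capacity all year; renting means paying per hour used."""
    own = peak_servers * owned_cost_per_server_year
    rent = busy_server_hours * rate_per_server_hour
    return own, rent


def break_even_utilization(owned_cost_per_server_year, rate_per_server_hour):
    """Fraction of the year a server must be busy before owning it beats renting."""
    return owned_cost_per_server_year / (rate_per_server_hour * HOURS_PER_YEAR)


# Hypothetical burst: 50 servers needed for two weeks (336 hours) once a year.
own, rent = annual_costs(50, 3000.0, 50 * 336, 1.5)
print(own, rent)                                      # 150000.0 25200.0
print(round(break_even_utilization(3000.0, 1.5), 3))  # 0.228
```

Under these made-up prices, any workload that keeps a server busy less than about 23% of the year is cheaper to rent, which is exactly the occasional-burst profile described above.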
Back to clinical
Are grid and utility computing relevant to clinical trials? Analyzing a clinical trial generally requires only short periods of intense computer use (an analysis that can run overnight on a server, for example). Today, the effort required to set up a grid-computing project is too large for this kind of computing. However, setting up and managing grid projects is likely to become easier and more direct. Software has been developed that can manage the distribution of a grid computing problem to many different computers. One can imagine a time when setting up a grid computation will be no more difficult than doing the same computation on a single computer. In addition, we are beginning to ask more sophisticated questions of clinical trials. As we consider adaptive trial designs, group sequential analysis, computerized trial simulations, and in silico drug testing through organ-system and whole-body simulations, it is easy to see how the demand for processing power may begin to enter the grid arena. Pharmacogenomics is another area where establishing linkage between SNPs and pharmacodynamic responses in populations may require massive, sustained efforts that are best accomplished through grid computing.
Again, many of these uses will involve long periods of standard computer usage punctuated by intensive periods of computational complexity, an ideal scenario for utility computing.
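Computerized trial simulations are a natural fit for a grid because each simulated trial is independent. The sketch below estimates the power of a simple two-arm trial by Monte Carlo, splitting the replicates into separately seeded batches so that any batch could be computed on a different machine and only the rejection counts need to be sent back. The normal outcomes, the known-variance z-test, the function names, and all parameter values are simplifying assumptions; adaptive and group sequential designs would need considerably more machinery.

```python
import math
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)   # two-sided test at the 5% level


def simulate_trial(rng, n_per_arm, effect, sd):
    """One simulated two-arm trial; True if the z-test rejects 'no difference'."""
    control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
    treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
    diff = sum(treated) / n_per_arm - sum(control) / n_per_arm
    se = sd * math.sqrt(2.0 / n_per_arm)   # standard error assuming sd is known
    return abs(diff / se) > Z_CRIT


def simulate_batch(seed, reps, n_per_arm, effect, sd):
    """One independent work unit: its own seed, so any machine can run it."""
    rng = random.Random(seed)
    return sum(simulate_trial(rng, n_per_arm, effect, sd) for _ in range(reps))


def estimated_power(n_batches, reps_per_batch, n_per_arm, effect, sd, base_seed=2003):
    """Combine batch results; on a grid, each simulate_batch call could run elsewhere."""
    rejections = sum(
        simulate_batch(base_seed + b, reps_per_batch, n_per_arm, effect, sd)
        for b in range(n_batches)
    )
    return rejections / (n_batches * reps_per_batch)


# Analytic power for these inputs is about 0.979; the estimate should land close to it.
print(estimated_power(10, 200, n_per_arm=50, effect=0.8, sd=1.0))
```

Seeding each batch separately keeps the overall result reproducible no matter which machine computes which batch, or in what order the results come back.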
The vision of many in grid computing is that eventually all users will have access to a grid, available whenever needed, that provides utility-based supercomputer capability for complex problem solving. So, the next time you sit down at your PC, remember that it could become part of a massive network of computing power, for sale to those who need it.