Do It Yourself Software

February 1, 2005

Applied Clinical Trials

Writing computer code isn't as difficult as you might think.


Warning! This column goes off the deep end. If you aren't from the world of computers and information technology, it may even shock you. Why? Because I am going to suggest that you are probably already writing your own computer programs today, and that you should be writing programs routinely in the future.

Let's say that you were a "frequent buyer" on eBay, perhaps one with several hobbies. Each day after dinner you did a variety of complex searches and browsed through the results, carefully watching for the right item at the right price.

If it were easy to do, it would be great to write a little utility program that automatically ran the searches you wanted each day while you were at work and saved the results for you. Each night you would simply review the results, saving the hour it took to do the searches yourself. In this case, there already IS a little program that does this task very well (http://www.timeblaster.com/tbeindex.shtml). The point is that many such tasks could be automated, saving time and effort every day. The problem is that it is so darn difficult to program something like this.

How did we get here?
It is worth reviewing some of the history of computer software to understand how programming came to be as complex as it is. Needless to say, this will be a limited and superficial treatment of a complex subject.

As most people know, a computer is an electronic tool that works with pure binary information--1s and 0s. The only real information in a computer system is a string of 1s and 0s. For example, the four characters <hi> are encoded in ASCII, one 8-bit byte per character, as:

00111100 01101000 01101001 00111110

Text, numbers, and instructions to the computer are all "understood" by the computer in pure binary. The processor's machine language treats segments of this binary stream as instructions to add binary numbers (in groups of 8, 16, 32, or 64 bits), compare one to another, or keep track of where they are in memory. Programming a computer is nothing more than supplying the instructions for this adding, comparing, and tracking, which the computer carries out over and over again, very, very fast.
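You can check this encoding yourself. Python is one of the scripting languages discussed later in this column, and the minimal Python sketch below prints each character of <hi> as its 8-bit ASCII code. It also prints the hexadecimal form discussed next. The function names are illustrative choices.

```python
def ascii_bits(text: str) -> list[str]:
    """Return each character's ASCII code as an 8-digit binary string."""
    return [format(ord(ch), "08b") for ch in text]


def ascii_hex(text: str) -> list[str]:
    """Return each character's ASCII code as a 2-digit uppercase hex string."""
    return [format(ord(ch), "02X") for ch in text]


if __name__ == "__main__":
    print(" ".join(ascii_bits("<hi>")))  # 00111100 01101000 01101001 00111110
    print(" ".join(ascii_hex("<hi>")))   # 3C 68 69 3E
```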

Obviously, manipulating 1s and 0s by hand is overwhelming for even the tiniest of problems. So programmers began writing the same values in base 16 (hexadecimal) instead of base 2 (binary). Each hexadecimal digit stands for four bits, so numbers, letters, and instructions are easier to read and manipulate. Our <hi> becomes 3C 68 69 3E. As you can see, even this doesn't dramatically improve readability. Therefore, assembly language was developed. In assembly, each machine instruction is written as a short, readable mnemonic, and a program called an assembler translates the mnemonics into machine code. For example:

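fld  32real [0001BEC8]   ; push the constant 32.0 onto the floating-point stack
fchs                     ; change its sign: -32.0
fadd 32real [0001E000]   ; add FAREN: FAREN - 32.0
fdiv 32real [0001BEC0]   ; divide by the constant 1.8
fstp 32real [ebp-08]     ; store the result (CELSIUS) and pop the stack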

Assembly language is far more readable than machine language, binary or hex, but it is still far too low-level for writing complex programs efficiently. Yet in the early 1950s, machine and assembly language were the only ways to program a computer. In those early years, a project was started at IBM, fanciful at the time, whose goal was the "automatic programming" of computers. The group behind it developed FORTRAN (short for Formula Translator), a language still in (modest) use today for scientific programming. FORTRAN condensed many assembly language instructions into a single, much more readable statement. For example, the assembly instructions above could be written in FORTRAN as:

CELSIUS = (FAREN - 32) / 1.8

This single statement is part of a FORTRAN program for converting temperatures from Fahrenheit to Celsius. FORTRAN created a revolution in programming, allowing for dramatic advances in the development of computers. Since then, computer languages have been developed for many specialized purposes. One line of development aimed at a language even closer to natural language.

COBOL (Common Business Oriented Language) was developed toward the end of the 1950s. It was truly a language developed by committee. The committee was made up of government (civilian and military) and industry representatives (the major computer manufacturers--IBM, Burroughs, Honeywell, RCA, Sperry Rand, and Sylvania). The language was intended to be easily readable, essentially allowing programming in English for business and financial uses. The same temperature conversion might read:

SUBTRACT 32 FROM FAREN GIVING CELSIUS.
DIVIDE 1.8 INTO CELSIUS.

The language was hampered by its cumbersome nature, although a more symbolic form could also be used (for example, COMPUTE CELSIUS = (FAREN - 32) / 1.8). COBOL is still in heavy use in the financial and business community, mostly in legacy applications. (The vast majority of the famous Y2K problem involved COBOL programs.) Although it was intended to bring programming to the masses, COBOL never fulfilled that promise.
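For comparison with the FORTRAN and COBOL versions, here is the same Fahrenheit-to-Celsius calculation in Python, a scripting language discussed later in this column. It is a minimal sketch. The function name is an illustrative choice and is not part of any standard library.

```python
def fahrenheit_to_celsius(faren: float) -> float:
    """Convert a Fahrenheit temperature to Celsius, as in the FORTRAN statement."""
    celsius = (faren - 32) / 1.8
    return celsius


print(fahrenheit_to_celsius(212.0))  # 100.0
```

The formula line reads almost exactly like the FORTRAN statement. That resemblance shows how much of FORTRAN's readable style survives in modern languages.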

Programming languages continued to evolve in the ensuing years, including the structured languages Algol, Pascal, and C, as well as LISP, each developed to meet specialized or general needs. Another high-level language similar to FORTRAN was Basic, developed as a simple, easy-to-learn language for college students who were not studying programming. Basic was the right language at the right time. It was included with the operating system on the first IBM PCs, and its use skyrocketed. Even so, writing useful programs in Basic still required considerable education, skill, and commitment.

Computer languages have continued to develop. Two programming trends, object-oriented programming and the virtual machine, have led to a pair of commonly used professional programming languages, Java and C#. Both are challenging languages for the novice or casual user. Programs in these languages are typically written in a "visual" development environment that automates much of the user-interface programming. But they remain quite distant from the ease of use that has been the holy grail of programming since the earliest days.

From toys to software
The toy manufacturer Lego has been selling a robotics set called Lego Mindstorms (http://mindstorms.lego.com/) for over five years now. Using the kit, older children can build robots out of Lego blocks, small motors, and light and touch sensors, and those robots can perform fairly sophisticated tasks. The "brain" of the robot is a programmable Lego block (called an RCX control module) that operates the motors and receives feedback from the sensors. The RCX is programmed by downloading a program from a computer. The programs are built by clicking together "blocks" on the computer screen, each of which performs an action.

Can real programming be as easy as programming a Lego Mindstorms robot? Many believe it can, and there are some excellent examples of so-called visual programming in use today. One of these is LabVIEW. This very sophisticated software package lets lab scientists control instruments and acquire and analyze their data by manipulating symbolic objects on the screen, in a fashion reminiscent of the Mindstorms blocks. The package also lets scientists simulate instruments and perform many other tasks.

There are scattered uses of visual programming in clinical trial technologies as well. Clinical trial simulation tools allow patients, measurements, drug dosing, and other objects to be snapped together graphically, much as in LabVIEW, creating a program that simulates clinical trial results. In addition, some workflow applications for processing clinical trial data allow complex workflows to be built through "snap-in" manipulation of icons. The resulting workflows actually route documents for transfer and approval.

In all of these visual programming applications, manipulating graphic icons on the screen generates a significant amount of traditional code in an underlying application. Thus, the non-programming user is in fact doing highly sophisticated computer programming without knowing it.
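To make this concrete, here is a minimal Python sketch of what "snapping blocks together" can amount to underneath. Each on-screen block is represented as a (name, parameter) pair, and a small generator turns the chain of blocks into lines of conventional source text. The block names and the generated commands (motor_on, wait, motor_off) are hypothetical. They are not the actual Mindstorms or LabVIEW formats.

```python
# Hypothetical block types, each mapped to a template for one line of generated code.
TEMPLATES = {
    "motor_on": "motor_on(port={!r})",
    "wait": "wait(seconds={!r})",
    "motor_off": "motor_off(port={!r})",
}


def generate_code(blocks):
    """Translate a chain of (block_name, parameter) pairs into source lines."""
    lines = []
    for name, param in blocks:
        if name not in TEMPLATES:
            raise ValueError(f"unknown block: {name}")
        lines.append(TEMPLATES[name].format(param))
    return "\n".join(lines)


# A "program" built by snapping three blocks together on screen.
program = [("motor_on", "A"), ("wait", 2), ("motor_off", "A")]
print(generate_code(program))
```

The user only arranges the blocks. The text of the program is produced automatically, which is the sense in which the visual user is "programming without knowing it."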

You may already be a programmer
Many readers of this column already program regularly in their daily work. They use languages that are close to "scripting" languages and highly specialized for their tasks. Those involved with statistics are typically quite expert in the SAS programming language, and many in S-Plus or the closely related open source R programming environment. Others work more with databases and are very comfortable writing queries in SQL (Structured Query Language), which reads much like natural language. SQL queries are so straightforward and easy to learn that many people forget they are a form of programming. So, that takes care of 10% to 25% of readers. What about the rest?
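Before turning to the rest, here is a short query for anyone who has not seen SQL, showing why it reads almost like English. The sketch uses Python's built-in sqlite3 module with an in-memory database. The subjects table, its columns, and its rows are invented purely for illustration.

```python
import sqlite3

# In-memory database with a small, hypothetical table of trial subjects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (id TEXT, site TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO subjects VALUES (?, ?, ?)",
    [("S001", "Boston", 54), ("S002", "Denver", 37), ("S003", "Boston", 61)],
)

# The query itself: plain, English-like SQL.
query = "SELECT id, age FROM subjects WHERE site = 'Boston' AND age > 50 ORDER BY id"
rows = conn.execute(query).fetchall()
print(rows)  # [('S001', 54), ('S003', 61)]
```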

Another form of simple programming is the macro, found in many programs from Microsoft Word to Adobe Photoshop. In its simplest form, a macro records what a user does and repeats it on demand, when selected from a menu or triggered by a sequence of keystrokes. Suppose you wanted to run "search and replace" on 10 different character sequences in Word, one after another. You might start recording a macro and then perform the replacements. When finished, you would stop recording and name the macro. Then, to perform the same replacements in the future, you would invoke the macro, sit back, and watch.
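Conceptually, a recorded macro is just a saved list of steps that can be replayed. The Python sketch below models a recorded series of search-and-replace steps and replays them on a piece of text. It is an analogy, not how Word actually stores macros (Word records them as VBA, as described next). The replacement pairs are hypothetical examples.

```python
# A "recorded" macro: the search-and-replace steps, in the order performed.
recorded_steps = [
    ("adverse event", "AE"),
    ("case report form", "CRF"),
]


def replay(steps, text):
    """Apply each recorded (find, replace) step to the text, in order."""
    for find, replacement in steps:
        text = text.replace(find, replacement)
    return text


doc = "Record each adverse event on the case report form."
print(replay(recorded_steps, doc))  # Record each AE on the CRF.
```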

Behind the scenes, recording a macro generates a sophisticated set of Visual Basic for Applications (VBA) instructions. If you are a "power user," you can edit the VBA and improve the macro without recording it again. Unfortunately, because many viruses have used VBA in Word documents, macros are disabled by default in most Word installations.

In Adobe Photoshop, macros are called "actions." An action lets a user record a long series of manipulations of a photograph and then replay them over and over, in batch, on a series of other photographs. Macros and actions are examples of programming that a nontechnical user can easily perform.

Another form of macro is one that you may already be using. In Outlook, it is very easy to configure a filter (Outlook calls it a rule) for incoming messages. The filter checks for certain conditions (sender, recipient, text in the subject or body) and takes an action such as forwarding the message or filing it in a particular folder. Many people sort all their mail automatically in Outlook and review folders rather than the Inbox. Like a macro, such a filter is a small program, but the instructions behind it are completely hidden from the end user. If you haven't created macros in Word or filters in Outlook, go ahead and give them a try. You may be amazed at how useful they are.
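Underneath, a mail filter of this kind is a list of condition-and-action rules applied to each incoming message. Here is a minimal Python sketch of that logic. The message fields, the rule format, the example.com addresses, and the folder names are all hypothetical, and the sketch does not reflect how Outlook actually implements its rules.

```python
# Each rule pairs a condition (a function of the message) with a target folder.
RULES = [
    (lambda msg: msg["sender"].endswith("@cro.example.com"), "CRO"),
    (lambda msg: "SAE" in msg["subject"], "Safety"),
]


def file_message(msg, rules=RULES, default="Inbox"):
    """Return the folder of the first rule whose condition matches, else the default."""
    for condition, folder in rules:
        if condition(msg):
            return folder
    return default


inbox = [
    {"sender": "monitor@cro.example.com", "subject": "Visit report"},
    {"sender": "safety@sponsor.example.com", "subject": "SAE follow-up"},
    {"sender": "friend@example.org", "subject": "Lunch?"},
]
for msg in inbox:
    print(msg["subject"], "->", file_message(msg))
```

Rules are checked in order and the first match wins. Outlook also lets you order rules, which is why that design choice is worth keeping in mind.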

More and more people are becoming comfortable with HTML, the very straightforward language in which simple Web pages are "programmed." Without scripted elements (e.g., JavaScript, PHP, and others), most HTML has little or no logic built in, and an HTML Web page may not appear to meet the minimal requirements of computer programming. However, many people build Web pages with a Web-page building program (or online environment). That program can insert scripts, and even active, server-based elements, into a Web page without the user writing a single line of code. By dragging and dropping an active element onto a WYSIWYG (what you see is what you get) Web page designer, the user is actually doing active computer programming.

You might protest that computer programming means writing computer code. Not necessarily. Even sophisticated Java and C# programming is often done in a visual, WYSIWYG environment in which large blocks of code are inserted into a program automatically.

SQL is used behind the scenes in a variety of database settings. Consider a search of a database on a Web site (think Amazon or eBay) or a transaction at an ATM. By pressing buttons, you are actually creating custom SQL queries that are run against a database, either to display your results or to take actions like withdrawing money. The original developers of SQL at IBM thought they were creating a language that anyone could use. Unfortunately, its rigid syntax proved too frustrating for nontechnical users. However, when the writing of queries is automated, SQL becomes a very powerful language that everyone uses daily.
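The idea that "buttons build the query" can be sketched in a few lines. In this hypothetical Python example, the fields a user fills in on a search form are turned into a WHERE clause and run with the standard sqlite3 module. Values are passed as ? placeholders, so user input is never pasted into the SQL text. The items table, its columns, and the allowed form fields are invented for illustration.

```python
import sqlite3

# Form fields a (hypothetical) search page is allowed to filter on.
ALLOWED_FIELDS = {"category", "max_price"}


def build_query(form):
    """Turn filled-in form fields into a SQL statement plus its parameters."""
    clauses, params = [], []
    for field, value in form.items():
        if field not in ALLOWED_FIELDS or value in (None, ""):
            continue  # ignore unknown or empty fields
        if field == "max_price":
            clauses.append("price <= ?")
        else:
            clauses.append(f"{field} = ?")  # safe: field comes from the whitelist
        params.append(value)
    sql = "SELECT title, price FROM items"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY price", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("Old stamp", "stamps", 12.5), ("Rare stamp", "stamps", 250.0), ("Coin", "coins", 8.0)],
)

sql, params = build_query({"category": "stamps", "max_price": 100})
print(sql)  # SELECT title, price FROM items WHERE category = ? AND price <= ? ORDER BY price
print(conn.execute(sql, params).fetchall())  # [('Old stamp', 12.5)]
```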

Scripting for fun and profit
While the mainstream of computer languages progressed from FORTRAN to Java, another very important side story developed: scripting languages. This side story may provide the bridge between computer programming and the casual programmer. We have already mentioned several scripting languages. JavaScript and PHP are used in the development of Web pages, and VBA is a scripting language for Word and Excel macros.

Scripting languages are a hodgepodge, ranging from very simple to fairly sophisticated. Some are now used to write large computer programs; Python is one example. But they are generally intended to connect one component (application or data source) with another. Most of the traditional languages discussed so far, such as FORTRAN, COBOL, and C, are compiled languages. Before a program written in one of them is used, it is typically converted into an executable file (.exe) of machine code that the processor runs directly. (Java and C# are a bit different, but that's another column.)

Scripting languages are typically interpreted languages. Their programs exist in plain text form and are run through an interpreter program, which reads and executes each line of code at the time the program is run. As a result, a program written in a scripting language may run more slowly and use more memory than a compiled program. But scripts are much easier to write and maintain, because many of the mundane tasks of programming (such as declaring variable types or managing memory) are either done automatically or are not needed. The result is a program that can be written quickly and efficiently.
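To illustrate what reading and executing each line at run time means, here is a toy interpreter written in Python for a made-up three-command language (SET, ADD, PRINT). The language is hypothetical and far simpler than any real scripting language. Real interpreters, including Python's own, usually translate code into an intermediate form first. The point is only that the program text is read and acted on line by line while it runs.

```python
def run(script):
    """Interpret a tiny, made-up language one line at a time; return printed lines."""
    variables = {}
    output = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # blank line: nothing to do
        command, args = parts[0].upper(), parts[1:]
        if command == "SET":        # SET name number
            variables[args[0]] = float(args[1])
        elif command == "ADD":      # ADD name number
            variables[args[0]] += float(args[1])
        elif command == "PRINT":    # PRINT name
            output.append(f"{args[0]} = {variables[args[0]]}")
        else:
            # Errors surface only when the interpreter reaches the bad line.
            raise ValueError(f"line {line_no}: unknown command {parts[0]!r}")
    return output


program = """
SET dose 10
ADD dose 2.5
PRINT dose
"""
print(run(program))  # ['dose = 12.5']
```

Nothing is compiled ahead of time. A misspelled command on the last line is reported only when the interpreter gets there, after the earlier lines have already run.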

The most common use of scripting languages is to tie together diverse applications. A script might, for example, start a program every hour, feed its output as input to another program, and save that program's output in a file. One very basic scripting language is the set of MS-DOS commands used in batch files, which are simple text files ending in .bat (anyfile.bat). Batch files used to be very important for PC users, automating a number of maintenance tasks now managed by Windows and utility programs. It is still perfectly useful to create a batch file to automate a repetitive task. For example, suppose your daily work required you to start three programs at once and open two specific files. You could do this easily with a three-line batch file that can be double-clicked, and all three programs will open, along with the two selected files.
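The pattern of "take one program's output and feed it to another" can also be scripted in Python with the standard subprocess module. In this minimal sketch, the two "programs" are tiny hypothetical Python one-liners, so the example runs anywhere Python is installed. In practice you would substitute your own commands. Running the script every hour would be left to the operating system's scheduler (such as cron or Windows Task Scheduler).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def chain(first_cmd, second_cmd, out_path):
    """Run first_cmd, feed its output to second_cmd as input, and save the result."""
    first = subprocess.run(first_cmd, capture_output=True, text=True, check=True)
    second = subprocess.run(
        second_cmd, input=first.stdout, capture_output=True, text=True, check=True
    )
    Path(out_path).write_text(second.stdout)
    return second.stdout


# Stand-in "programs" (hypothetical): one prints a line, the other upper-cases its input.
producer = [sys.executable, "-c", "print('site boston: 3 subjects')"]
consumer = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"]

report = Path(tempfile.gettempdir()) / "report.txt"
result = chain(producer, consumer, report)
print(result, end="")  # SITE BOSTON: 3 SUBJECTS
```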

Programming by example
One form of computer "programming" is often taken for granted: the spreadsheet. The spreadsheet (typically Microsoft Excel) is a highly specialized environment for programming a series of numerical calculations. While many people use Excel as a simple database or for adding up a column of numbers, most of us have seen very complex spreadsheets. If you haven't already done so, you might want to look at http://www.exinfm.com/free_spreadsheets.html, where you can find many useful, free spreadsheets. Interestingly, spreadsheets are an example of a programming philosophy known as "programming by example": show the program what type of answer you want, and it figures out how to get it for you.
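Underneath, a spreadsheet is a set of cells. Some hold values and others hold formulas that refer to other cells, and the formulas are recalculated whenever an input changes. This minimal Python sketch models that idea. The cell names echo Excel's A1 style, but representing formulas as Python functions is a simplification for illustration, not how Excel works internally. The sketch also does no checking for circular references.

```python
# Each cell holds either a plain value or a formula: a function of a cell lookup.
cells = {
    "A1": 120.0,                              # e.g., weight in pounds
    "A2": 2.2046,                             # pounds per kilogram
    "A3": lambda get: get("A1") / get("A2"),  # weight in kilograms
    "A4": lambda get: round(get("A3"), 1),    # rounded for display
}


def value(name):
    """Evaluate a cell, recursively evaluating any cells its formula refers to."""
    content = cells[name]
    return content(value) if callable(content) else content


print(value("A4"))  # 54.4
cells["A1"] = 150.0  # change an input...
print(value("A4"))  # ...and the dependent cells recalculate: 68.0
```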

If you are really ambitious, try learning a fun and easy scripting language that is tailor-made for the Internet. It is called Rebol (http://www.rebol.com), and the core system is available as a free download: http://www.rebol.com/view-platforms.html. Rebol is easy to learn, and with it you can quickly write very useful scripts that read Web pages, extract specific content, send bulk email, and much, much more. Using Rebol, it wouldn't be difficult to write the simple but functional eBay search program described at the beginning of this column. It is a very different, and easy, place to start in understanding the power of scripting languages.
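For comparison, here is a minimal Python sketch of the "read a Web page and extract specific content" task, using only the standard library. It collects the text and address of every link whose text contains a keyword. The sample HTML and the commented-out example.com URL are placeholders. A real auction site would need extraction logic tailored to its own page layout, so this is an outline rather than a working eBay tool.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkFinder(HTMLParser):
    """Collect (text, href) pairs for links whose text contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.matches = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if self.keyword in text.lower():
                self.matches.append((text, self._href))
            self._href = None


def find_links(html, keyword):
    """Return matching (text, href) pairs from an HTML string."""
    finder = LinkFinder(keyword)
    finder.feed(html)
    finder.close()
    return finder.matches


def fetch(url):
    """Download a page as text (assumes UTF-8; adjust for real sites)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


sample = '<p><a href="/item/1">Antique pocket watch</a> <a href="/item/2">Stamp album</a></p>'
print(find_links(sample, "watch"))  # [('Antique pocket watch', '/item/1')]
# Real use (placeholder URL): find_links(fetch("http://www.example.com/"), "watch")
```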

If you want to learn a great deal more about the birth and development of software, I recommend the book Go To by Steve Lohr. Many of the ideas and examples in this column came from this book.
