
Implementation Notes

This appendix contains information mainly of interest to implementors and maintainers of gawk. Everything in it applies specifically to gawk, and not to other implementations.

Downward Compatibility and Debugging

See section Extensions in gawk Not in POSIX awk, for a summary of the GNU extensions to the awk language and program. All of these features can be turned off by invoking gawk with the `--traditional' option, or with the `--posix' option.

If gawk is compiled for debugging with `-DDEBUG', then there is one more option available on the command line:

-W parsedebug
Print out the parse stack information as the program is being parsed.

This option is intended only for serious gawk developers, and not for the casual user. It probably has not even been compiled into your version of gawk, since it slows down execution.

Making Additions to gawk

If you should find that you wish to enhance gawk in a significant fashion, you are perfectly free to do so. That is the point of having free software; the source code is available, and you are free to change it as you wish (see section GNU GENERAL PUBLIC LICENSE).

This section discusses the ways you might wish to change gawk, and any considerations you should bear in mind.

Adding New Features

You are free to add any new features you like to gawk. However, if you want your changes to be incorporated into the gawk distribution, there are several steps that you need to take in order to make it possible for me to include your changes.

  1. Get the latest version. It is much easier for me to integrate changes if they are relative to the most recent distributed version of gawk. If your version of gawk is very old, I may not be able to integrate them at all. See section Getting the gawk Distribution, for information on getting the latest version of gawk.
  2. Follow the GNU Coding Standards. This document describes how GNU software should be written. If you haven't read it, please do so, preferably before starting to modify gawk. (The GNU Coding Standards are available as part of the Autoconf distribution, from the FSF.)
  3. Use the gawk coding style. The C code for gawk follows the instructions in the GNU Coding Standards, with minor exceptions. The code is formatted using the traditional "K&R" style, particularly as regards the placement of braces and the use of tabs. In brief, the coding rules for gawk are: If I have to reformat your code to follow the coding style used in gawk, I may not bother.
  4. Be prepared to sign the appropriate paperwork. In order for the FSF to distribute your changes, you must either place those changes in the public domain, and submit a signed statement to that effect, or assign the copyright in your changes to the FSF. Both of these actions are easy to do, and many people have done so already. If you have questions, please contact me (see section Reporting Problems and Bugs), or gnu@prep.ai.mit.edu.
  5. Update the documentation. Along with your new code, please supply new sections and/or chapters for this book. If at all possible, please use real Texinfo, instead of just supplying unformatted ASCII text (although even that is better than no documentation at all). Conventions to be followed in Effective AWK Programming are provided after the `@bye' at the end of the Texinfo source file. If possible, please update the man page as well. You will also have to sign paperwork for your documentation changes.
  6. Submit changes as context diffs or unified diffs. Use `diff -c -r -N' or `diff -u -r -N' to compare the original gawk source tree with your version. (I find context diffs to be more readable, but unified diffs are more compact.) I recommend using the GNU version of diff. Send the output produced by either run of diff to me when you submit your changes. See section Reporting Problems and Bugs, for the electronic mail information. Using this format makes it easy for me to apply your changes to the master version of the gawk source code (using patch). If I have to apply the changes manually, using a text editor, I may not do so, particularly if there are lots of changes.

Although this sounds like a lot of work, please remember that while you may write the new code, I have to maintain it and support it, and if it isn't possible for me to do that with a minimum of extra work, then I probably will not.
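
As a rough illustration of the "K&R" layout named in step 3, here is a small sketch. Only the brace placement and the tab indentation come from the text above; the function `count_fields', the return type on its own line, and the comment layout are assumptions made for this example, not rules quoted from the gawk sources.

```c
#include <stddef.h>

/*
 * count_fields: count whitespace-separated fields in a record.
 * Hypothetical helper written for this illustration; not part of gawk.
 */
static size_t
count_fields(const char *record)
{
	size_t count = 0;
	int in_field = 0;

	/* Opening braces of control structures stay on the same line. */
	for (; *record != '\0'; record++) {
		if (*record == ' ' || *record == '\t' || *record == '\n') {
			in_field = 0;
		} else if (! in_field) {
			in_field = 1;
			count++;
		}
	}
	return count;
}
```

Calling `count_fields("a b  c")' would return 3. The opening brace of the function body sits on its own line, while the braces of `for' and `if' share a line with the keyword, which is the usual reading of the K&R layout.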

Porting gawk to a New Operating System

If you wish to port gawk to a new operating system, there are several steps to follow.

  1. Follow the guidelines in section Adding New Features, concerning coding style, submission of diffs, and so on.
  2. When doing a port, bear in mind that your code must co-exist peacefully with the rest of gawk, and the other ports. Avoid gratuitous changes to the system-independent parts of the code. If at all possible, avoid sprinkling `#ifdef's just for your port throughout the code. If the changes needed for a particular system affect too much of the code, I probably will not accept them. In such a case, you will, of course, be able to distribute your changes on your own, as long as you comply with the GPL (see section GNU GENERAL PUBLIC LICENSE).
  3. A number of the files that come with gawk are maintained by other people at the Free Software Foundation. Thus, you should not change them unless it is for a very good reason; i.e., changes are not out of the question, but changes to these files will be scrutinized extra carefully. The files are `alloca.c', `getopt.h', `getopt.c', `getopt1.c', `regex.h', `regex.c', `dfa.h', `dfa.c', `install-sh', and `mkinstalldirs'.
  4. Be willing to continue to maintain the port. Non-Unix operating systems are supported by volunteers who maintain the code needed to compile and run gawk on their systems. If no one volunteers to maintain a port, that port becomes unsupported, and it may be necessary to remove it from the distribution.
  5. Supply an appropriate `gawkmisc.???' file. Each port has its own `gawkmisc.???' that implements certain operating-system-specific functions. This is cleaner than a plethora of `#ifdef's scattered throughout the code. The `gawkmisc.c' in the main source directory includes the appropriate `gawkmisc.???' file from each subdirectory. Be sure to update it as well. Each port's `gawkmisc.???' file has a suffix reminiscent of the machine or operating system for the port, for example, `pc/gawkmisc.pc' and `vms/gawkmisc.vms'. The use of separate suffixes, instead of plain `gawkmisc.c', makes it possible to move files from a port's subdirectory into the main subdirectory, without accidentally destroying the real `gawkmisc.c' file. (Currently, this is only an issue for the MS-DOS and OS/2 ports.)
  6. Supply a `Makefile' and any other C source and header files that are necessary for your operating system. All your code should be in a separate subdirectory, with a name that is the same as, or reminiscent of, either your operating system or the computer system. If possible, try to structure things so that it is not necessary to move files out of the subdirectory into the main source directory. If that is not possible, then be sure to avoid using names for your files that duplicate the names of files in the main source directory.
  7. Update the documentation. Please write a section (or sections) for this book describing the installation and compilation steps needed to install and/or compile gawk for your system.
  8. Be prepared to sign the appropriate paperwork. In order for the FSF to distribute your code, you must either place your code in the public domain, and submit a signed statement to that effect, or assign the copyright in your code to the FSF.

Following these steps will make it much easier to integrate your changes into gawk, and have them co-exist happily with the code for other operating systems that is already there.

In the code that you supply, and that you maintain, feel free to use a coding style and brace layout that suits your taste.
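
The dispatch arrangement described in step 5 can be pictured with a sketch like the following. It is not the real `gawkmisc.c': the macro names `VMS', `MSDOS' and `OS2' and the function `os_label' are assumptions made for illustration, and the default branch is kept inline only so that the sketch compiles on its own on a system where none of those macros is defined.

```c
#include <stdio.h>

/*
 * Sketch of a single dispatch file in the spirit of `gawkmisc.c'.
 * Each port is assumed to define os_label() in its own file;
 * the macro names tested here are assumptions, not taken from gawk.
 */
#if defined(VMS)
#include "vms/gawkmisc.vms"
#else
#if defined(MSDOS) || defined(OS2)
#include "pc/gawkmisc.pc"
#else
/* Stand-in for the default port, inline so the sketch is self-contained. */
static const char *
os_label(void)
{
	return "generic";
}
#endif
#endif

/* System-independent code calls the function with no `#ifdef' of its own. */
static int
print_os_label(FILE *fp)
{
	return fprintf(fp, "running on: %s\n", os_label());
}
```

The point of the design is that the conditional compilation lives in one place. Code such as `print_os_label' stays identical on every system, and a new port only adds one more branch plus its own file.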

Probable Future Extensions

AWK is a language similar to PERL, only considerably more elegant.
Arnold Robbins

Larry Wall

This section briefly lists extensions and possible improvements that indicate the directions we are currently considering for gawk. The file `FUTURES' in the gawk distribution lists these extensions as well.

This is a list of probable future changes that will be usable by the awk language programmer.

The GNU project is starting to support multiple languages. It will at least be possible to make gawk print its warnings and error messages in languages other than English. It may be possible for awk programs to also use the multiple language facilities, separate from gawk itself.
It may be possible to map a GDBM/NDBM/SDBM file into an awk array.
The special files that provide process-related information (see section Special File Names in gawk) may be superseded by a PROCINFO array that would provide the same information, in an easier to access fashion.
More lint warnings
There are more things that could be checked for portability.
Control of subprocess environment
Changes made in gawk to the array ENVIRON may be propagated to subprocesses run by gawk.

This is a list of probable improvements that will make gawk perform better.

An Improved Version of dfa
The dfa pattern matcher from GNU grep has some problems. Either a new version or a fixed one will deal with some important regexp matching issues.
Use of GNU malloc
The GNU version of malloc could potentially speed up gawk, since gawk relies heavily on the use of dynamic memory allocation.
Use of the rx regexp library
The rx regular expression library could potentially speed up all regexp operations that require knowing the exact location of matches. This includes record termination, field and array splitting, and the sub, gsub, gensub and match functions.

Suggestions for Improvements

Here are some projects that would-be gawk hackers might like to take on. They vary in size from a few days to a few weeks of programming, depending on which one you choose and how fast a programmer you are. Please send any improvements you write to the maintainers at the GNU project. See section Adding New Features, for guidelines to follow when adding new features to gawk. See section Reporting Problems and Bugs, for information on contacting the maintainers.

  1. Compilation of awk programs: gawk uses a Bison (YACC-like) parser to convert the script given it into a syntax tree; the syntax tree is then executed by a simple recursive evaluator. This method incurs a lot of overhead, since the recursive evaluator performs many procedure calls to do even the simplest things. It should be possible for gawk to convert the script's parse tree into a C program which the user would then compile, using the normal C compiler and a special gawk library to provide all the needed functions (regexps, fields, associative arrays, type coercion, and so on). An easier possibility might be for an intermediate phase of awk to convert the parse tree into a linear byte code form like the one used in GNU Emacs Lisp. The recursive evaluator would then be replaced by a straight line byte code interpreter that would be intermediate in speed between running a compiled program and doing what gawk does now.
  2. The programs in the test suite could use documenting in this book.
  3. See the `FUTURES' file for more ideas. Contact us if you would seriously like to tackle any of the items listed there.
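
To make the byte code idea in item 1 concrete, here is a minimal sketch that contrasts a recursive tree evaluator with a straight-line stack interpreter. Everything in it is hypothetical: the node and opcode names, the fixed-size buffers, and the restriction to numeric addition and multiplication are assumptions chosen to keep the example short, and none of it reflects gawk's actual internal data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree: a number or a binary operation. */
enum node_type { N_NUM, N_ADD, N_MUL };

struct node {
	enum node_type type;
	double value;				/* used when type == N_NUM */
	const struct node *left, *right;	/* used for N_ADD and N_MUL */
};

/* Recursive evaluator: one C function call per tree node. */
static double
tree_eval(const struct node *n)
{
	switch (n->type) {
	case N_NUM:
		return n->value;
	case N_ADD:
		return tree_eval(n->left) + tree_eval(n->right);
	default:	/* N_MUL */
		return tree_eval(n->left) * tree_eval(n->right);
	}
}

/* Linear byte code for a small stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
	enum opcode op;
	double operand;		/* used by OP_PUSH only */
};

#define MAX_CODE 64
#define MAX_STACK 64

/* Flatten the tree into postfix order; returns the new code length. */
static size_t
compile(const struct node *n, struct instr *code, size_t len)
{
	if (n->type == N_NUM) {
		assert(len < MAX_CODE);
		code[len].op = OP_PUSH;
		code[len].operand = n->value;
		return len + 1;
	}
	len = compile(n->left, code, len);
	len = compile(n->right, code, len);
	assert(len < MAX_CODE);
	code[len].op = (n->type == N_ADD) ? OP_ADD : OP_MUL;
	return len + 1;
}

/* Straight-line interpreter: a single loop, no recursion. */
static double
run(const struct instr *code)
{
	double stack[MAX_STACK];
	size_t sp = 0;

	for (;; code++) {
		switch (code->op) {
		case OP_PUSH:
			assert(sp < MAX_STACK);
			stack[sp++] = code->operand;
			break;
		case OP_ADD:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] += stack[sp];
			break;
		case OP_MUL:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] *= stack[sp];
			break;
		case OP_HALT:
			assert(sp == 1);
			return stack[0];
		}
	}
}

/* Compile a tree, append OP_HALT, and run the resulting byte code. */
static double
eval_bytecode(const struct node *n)
{
	struct instr code[MAX_CODE];
	size_t len = compile(n, code, 0);

	assert(len < MAX_CODE);
	code[len].op = OP_HALT;
	return run(code);
}
```

For a tree representing (2 + 3) * 4, both `tree_eval' and `eval_bytecode' yield 20. The difference is in how the work is done: the tree walker makes one function call per node every time the expression is evaluated, whereas the byte code is produced once and then executed by a single loop.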
