AWKA-ELMREF(5)         AWKA API REFERENCE        AWKA-ELMREF(5)





NAME
       awka-elmref  -  Awka API Reference for use with Awka-ELM
       libraries.


DESCRIPTION
       Awka is a translator of AWK programs to ANSI-C code, and
       a  library  (libawka.a) against which the code is linked
       to create executables.  Awka is described  in  the  awka
       manpage.

       The  Extended  Library  Methods  (ELM)  provide a way of
       adding new functions to the AWK language, so  that  they
       appear  in  your  AWK code as if they were builtin func-
       tions such as substr() or index().  The awka-elm manpage
       contains an introduction to Awka-ELM.

       This  page  lists the available data structures, defini-
       tions, functions and macros provided by  libawka.h  that
       you may use in creating C libraries that link with awka-
       generated code.

       I have broken the page into the following main sections:
       BASIC  VARIABLE  METHODS,  ARRAY  METHODS, BUILTIN FUNC-
       TIONS, I/O METHODS,  REGULAR  EXPRESSION  METHODS.   So,
       without further ado...


BASIC VARIABLE METHODS
       Data Structures


          a_VAR

            typedef struct {
              double dval;          /* the variable's numeric value */
              char * ptr;           /* pointer to string, array or RE structure */
              unsigned int slen;    /* length of string ptr as per strlen */
              unsigned int allc;    /* space mallocated for string ptr */
              char type;            /* records current cast of variable */
              char type2;           /* double-typed variable flag, explained later. */
              char temp;            /* TRUE if a temporary variable */
            } a_VAR;

            The a_VAR structure is used to store everything related to AWK
            variables.  This includes those named & used in your program, and
            transient variables created to return values from functions and other
            operations like string concatenation.  As such, this structure is
            ubiquitous throughout libawka and awka-generated code.

            The type value is set to one of a number of #define values,
            described in the Defines paragraph below.  Many functions and macros
            exist for working with the contents of a_VARs - see the Functions &
            Macros paragraph for details.



          a_VARARG

            typedef struct {
              a_VAR *var[256];
              int used;
            } a_VARARG;

            This structure is typically used to pass variable numbers of a_VARs to
            functions.  Up to 256 a_VARs may be referenced by an a_VARARG, and the
            used value contains the number of a_VARs present.



          struct gvar_struct

            struct gvar_struct {
              char *name;
              a_VAR *var;
            };

            Provides a mapping of the global variable names in an AWK script to pointers
            to their a_VAR structures.

       Internal Libawka Variables


          a_VAR * a_bivar[a_BIVARS]
            This array contains all the AWK internal variables, such as ARGV, ARGV,
            CONVFMT, ENVIRON and so on, along with $0 and the field variables $1..$n.
            a_BIVARS is a define, as are the identities of which element in the
            array belongs to which variable.  Again, look for functions that manage
            these variables rather than working with them directly if possible.



          extern struct gvar_struct *_gvar;
            This is actually created & populated by the translated C code generated
            by awka, rather than by libawka.a.  It is a NULL-terminated array
            of the gvar_struct structure defined earlier in this page, and contains
            the names of all global variables in an AWK script, mapped to their a_VAR
            structures.

       Defines


          a_VARNUL - the type value of an a_VAR if the variable is unused.

          a_VARDBL - the type value for an a_VAR cast to a number.

          a_VARSTR - type where the a_VAR has been cast to a string.

          a_VARARR - type where the a_VAR contains an array.

          a_VARREG - type where the a_VAR contains a regular expression.

          a_VARUNK - type where the a_VAR is a string, but could also be a
                     number.  Variables populated by getline, the FILENAME variable,
                     and elements of an array created by split(), are all of this
                     special type.


          a_DBLSET - for a string a_VAR that has been read in context as a number, the
                     type2 flag is set to this #define to prevent the string-to-number
                     conversion being done again.

          a_STRSET - the opposite of the above.  The variable is a number, has been read
                     as a string, hence the value of ptr is current, and the type2
                     flag is set to this #define.


          a_BIVARS provides the number of elements in the a_bivar[] array.

       a_ARGC,  a_ARGIND, a_ARGV, a_CONVFMT, a_ENVIRON, a_FILE-
       NAME, a_FNR, a_FS, a_NF,  a_NR,  a_OFMT,  a_OFS,  a_ORS,
       a_RLENGTH,   a_RS,  a_RSTART,  a_RT,  a_SUBSEP,  a_DOL0,
       a_DOLN, a_FIELDWIDTHS, a_SAVEWIDTHS, a_SORTTYPE  provide
       indexes to which elements in the a_bivar[] array are for
       which AWK internal variable.

       Functions & Macros


          awka_getd(a_VAR *)
            This macro calls the awka_getdval() function, appending the calling file & line number
            for debug purposes.  It read-casts the variable to a number, and returns the double
            value of the variable.  By read-cast, we mean that if the variable is a string it
            remains so, but dval is set, and type2 is set to a_DBLSET.  But if the a_VAR
            is a regular expression, the re structure is dropped and the variable converted to a
            number.  If you're not sure whether an a_VAR you're about to read is a number, and you
            want to read it as one, simply call awka_getd(varname) - its the easiest way.



          awka_getd1(a_VAR *)
            Same as awka_getd, except this will be faster if the a_VAR * is a variable.  Do not
            use this if the a_VAR * is a function call return value, as it'll call the function
            several times!  In this case, use awka_getd() instead.



          awka_gets(a_VAR *)
            Similar to awka_getd(), this read-casts an a_VAR to a string, and returns the character
            array pointed to by ptr.



          awka_gets1(a_VAR *)
            Use this where the a_VAR * is a variable, not a function call that returns an a_VAR *,
            for faster performance.



          awka_getre(a_VAR *)
            Write-casts the a_VAR * to a regular expression, and returns the pointer to the awka_regexp
            structure.  Write-cast means that the existing contents of the variable are dropped in
            favour of the new contents.



          static char *awka_strcpy(a_VAR *var, char *str)
            This function sets var to string type, and copies to it the contents of str.
            It returns a pointer to var->ptr.



          a_VAR *awka_varcpy(a_VAR *va, a_VAR *vb)
            This function copies the contents of scalar a_VAR *vb to scalar a_VAR *va,
            and returns a pointer to va.



          double awka_varcmp(a_VAR *va, a_VAR *vb)
            This function compares the contents of the two scalar variables, and returns 0 if the
            variables are equal, -1 if va is less than vb, or 1 if va is greater.  Numerical
            comparison is used where possible, otherwise string.



          a_VAR *awka_vardup(a_VAR *va)
            This function creates a new a_VAR *, copies the contents of va, and returns a pointer
            to the new structure.



          awka_varinit(a_VAR *)
            A macro that takes a NULL a_VAR *, mallocs space for it, and initialises it to a_VARNUL.



          void awka_killvar(a_VAR *)
            Frees all memory used by the a_VAR, except the structure itself.



          static a_VAR * awka_argv()
            You can use a_bivar[a_ARGV] directly when reading the value of elements in the array,
            but when you want to write to the array, use the above function instead, as it will
            make sure the changes are recognised elsewhere in libawka.



          static a_VAR * awka_argc()
            You can use a_bivar[a_ARGC] directly when reading its value, but when you want to write
            to it, use the above function instead, as it will make sure the change is recognised
            elsewhere in libawka.


ARRAY METHODS
       Data Structures & Variables
       These are strictly internal to the array  module  within
       libawka.  If you need functionality other than that pro-
       vided by the array functions, I recommend creating  your
       own  custom  array  data  structures and interface func-
       tions, otherwise you could cause serious problems.   The
       structure  definitions are too lengthy to list here, and
       the foolhardy may find them in  lib/array.h  within  the
       awka distribution.

       Defines

          a_ARR_TYPE_NULL
            The 'type' of an array that has not been initialised, or has been deleted.

          a_ARR_TYPE_SPLIT
            The 'type' of an array populated by the split() builtin function.

          a_ARR_TYPE_HSH
            The 'type' of arrays populated within the AWK script, eg. arr["pigs"] = cows.


          a_ARR_CREATE
            When searching arrays, specifies that an element is to be created if it doesn't
            already exist in the array.

          a_ARR_QUERY
            When searching arrays, this will not create a new element if it doesn't already
            exist.

          a_ARR_DELETE
            In an array search, this flag will cause the element to be deleted from the array.

       Functions


          void awka_arraycreate( a_VAR *var, char type );
            Allocates an array structure of type type, makes var->ptr point
            to it, and sets var->type to a_VARARR.  The type argument may
            be one of a_ARR_TYPE_NULL, a_ARR_TYPE_SPLIT or a_ARR_TYPE_HSH, according to
            how the array will be populated.



          void awka_arrayclear( a_VAR *var );
            Assumes var is an a_VARARR, this deletes the contents of the array structure
            pointed to by var->ptr.



          a_VAR * awka_arraysearch1( a_VAR *v, a_VAR *element, char create, int set );
            Searches array variable v for index element.  If it does not exist, and
            create is a_ARR_CREATE, a new element in the array for this value will be added.
            If the element is found (or created) and create is not a_ARR_DELETE, the
            function will return a pointer to the a_VAR for that element.  For a_ARR_DELETE, the
            element will be deleted from the array.  The set value should be FALSE.



          a_VAR * awka_arraysearch( a_VAR *v, a_VARARG *va, char create );
            Searches array variable v as per awka_arraysearch1(), except that this works
            with multiple index subscripts (eg, arr[x, y]).



          double awka_arraysplitstr( char *str, a_VAR *v, a_VAR *fs, int max );
            The AWK builtin split() function.  It splits str into array variable v,
            based on fs, up to max number of fields.  If fs is NULL, then
            a_bivar[a_FS] will be used.  Otherwise fs may contain an empty string, a
            single-character string, or a regular expression.  The number of fields created
            in v is returned.



          int awka_arrayloop( a_ListHdr *ah, a_VAR *v );
            This function implements the "for (i in j)" feature in AWK.  You provide ah,
            making sure it is initialised to zeroes.

            The best way to understand how to call this function is to type:

              awka 'BEGIN { for (i in j) x = j[i]; }'

            and see what is generated as a result.  You don't have to understand the a_ListHdr
            structure or sub-structures to use this function.



          int awka_arraynext( a_VAR *v, a_ListHdr *ah, int pos );
            Given that ah has been populated by a call to awka_arrayloop(), this function
            copies the (string or integer) element at position pos in the list to v,
            then returns pos+1, or zero if there are no more elements in the array list.



          void awka_alistfree( a_ListHdr *ah );
            Frees the last list element in ah.



          void awka_alistfreeall( a_ListHdr *ah );
            Frees all memory held by ah, and sets its contents to zero/NULL.



          a_VAR * awka_dol0(int set);
            The best means of accessing the $0 a_VAR, as it updates its contents with any pending
            changes.  Make set zero if you're reading the value of $0, but if you want to
            set $0, make it 1.



          a_VAR * awka_doln(int fld, int set);
            This function returns the a_VAR * of the $1..$n variable identified by fld,
            updating the field array with any refreshed $0 contents first if necessary.  If you
            want to read the value of $fld, make set zero, otherwise it should be 1.


BUILTIN FUNCTIONS
       These  are  documented  in  lib/builtin.h  in  the  awka
       distribution.  You can call any of the builtin functions
       as  often  as  you like.  Those that return a_VAR's also
       provide a keep flag that, if TRUE, will return  a  vari-
       able  that you must free, otherwise they will use a tem-
       porary variable that you don't have to worry about free-
       ing,  but will be reused elsewhere sooner or later.  The
       functions should be pretty much as  you'd  expect  them,
       except  that  many  require an a_VARARG as input, and we
       haven't discussed how to create one - we will now.

          a_VARARG * awka_arg0(char);

          a_VARARG * awka_arg1(char, a_VAR *);

          a_VARARG * awka_arg2(char, a_VAR *, a_VAR *);

          a_VARARG * awka_arg3(char, a_VAR *, a_VAR *, a_VAR *);


          a_VARARG * awka_vararg(char, a_VAR *var, ...);
            These functions populate & return a pointer to an a_VARARG structure.  The char
            argument, if TRUE, will make you responsible for freeing the structure, otherwise
            it'll be a temporary one that libawka will manage.  awka_arg0() will return an
            empty structure (ie. no args), awka_arg1() will have one a_VAR * in it, and so
            on.  Where you want to put more than four a_VAR *'s inside an a_VARARG, you can
            call awka_vararg with as many as you like, or if there's seriously a lot, maybe
            write your own loop of code to populate an a_VARARG - its not rocket science.


I/O METHODS
       Data Structures & Variables


          _a_IOSTREAM

            typedef struct {
              char *name;       /* name of output file or device */
              FILE *fp;         /* file pointer */
              char *buf;        /* input buffer */
              char *current;    /* where up to in buffer */
              char *end;        /* end of data in buffer */
              int alloc;        /* size of input buffer */
              char io;          /* input or output stream flag */
              char pipe;        /* true/false */
              char interactive; /* whether from a /dev/xxx stream or not */
            } _a_IOSTREAM;

            extern _a_IOSTREAM *_a_iostream;
            extern int _a_ioallc, _a_ioused;

            Controls input and output streams used by AWK's getline, print and printf
            builtin functions.  The two int variables record the space allocated in the
            _a_iostream array, and the number of elements used, respectively.  I list this
            information here in case you wish to create fread, fwrite and fseek functions for
            awka, as these will need low-level access to the streams.

       Functions


          a_VAR * awka_getline(char keep, a_VAR *target, char *input, int pipe, char main);
            As previously described, keep controls whether you want to be responsible for
            freeing the a_VAR the function returns or not.  Moving on, target is the a_VAR
            to hold the line of data to be read (you provide this one).  input is the name
            of the input file or command.  pipe is TRUE if input is a command rather
            than a file, eg. "sort stuff | getline x".  main should always be false.

            If input is NULL, getline will try to read from the file identified by
            a_bivar[a_FILENAME], or from the next element in the a_bivar[a_ARGV] array.

       I won't go into detail  about  awka_fflush,  awka_close,
       awka_printf  &  so on, as these should be easy enough to
       understand and use, and the chances are you  should  use
       the native C variety anyway where possible.


REGULAR EXPRESSIONS
       Ah,  now  we're in murky water indeed, as awka inherited
       its RE library from Libc, and treats it like  a  magical
       black  box  that  does  its  bidding.   Want  my advice?
       Treating the RE library & structure like a black box  is
       a wise thing to do, as its ugly-looking stuff.

       Ok, we know that when an a_VAR has been set to a_VARREG,
       its ptr value will point to  an  awka_regexp  structure.
       Do  we  need  to know what's in this structure?  I don't
       think so.  What we do need are the functions  that  help
       us  compile and execute regular expressions.  Oops, get-
       ting ahead of myself.  RE's are like  C  programs,  they
       need  to  be  compiled before they can be used to search
       strings.  This basically is a parsing of the RE  pattern
       into  a  tree structure that is easier to navigate while
       searching, and is a one-off task.


          awka_getre(a_VAR *)
            This macro is the easiest method of creating & compiling a regexp.  Providing you've
            set the a_VAR to the string value of the re pattern, this macro call works a treat.



          a_VAR *awka_match(char keep, char fcall, a_VAR *va, a_VAR *rva);
            This function is the implementation of AWK's match() function, and is the most
            simple way of evaluating an RE against a string.  keep is as previously
            discussed, fcall should be set to TRUE if you want a_bivar[a_RSTART] and
            a_bivar[a_RLENGTH] to be set, otherwise FALSE, va contains the string, and
            rva contains the regular expression.  The numerical a_VAR returned is
            1 on success, zero on failure.

       I was going to describe the lower-level methods of  com-
       piling  and  matching against RE's, but when I looked at
       it, there seemed to be a lot of complexity for  no  real
       gain  in  functionality.   All you get is the ability to
       avoid using  a_VAR  structures  to  manage  the  regular
       expressions,  and  honestly  I don't see what you'd gain
       from this given how much more complexity you'd  have  to
       deal with.


NOTES
       I  haven't  described  all of the functions available in
       libawka.h, not by any means.  But I have tried to  avoid
       functions  that  are really only meant for internal use,
       or that are only needed by translated code and should be
       done  in  other  ways  by library code.  In the same way
       I've avoided describing structures that were intended to
       remain  privy to a module within libawka, and you really
       shouldn't need to tamper with them.

       Any questions at all, or suggestions for improving  this
       page, let me know via andrewsumner@yahoo.com.  Make sure
       you preface any message title with the word "awka" so  I
       know its not spam.


SEE ALSO
       awka(1), awka-elm(5).












Version 0.7.x              Aug 8 2000            AWKA-ELMREF(5)
