Table of Contents
awka-elmref - Awka API Reference for use with Awka-ELM libraries.
Awka is a translator of AWK programs to ANSI-C code, and a library
(libawka.a) against which the code is linked to create executables. Awka
is described in the awka manpage.
The Extended Library Methods (ELM) provide
a way of adding new functions to the AWK language, so that they appear
in your AWK code as if they were builtin functions such as substr() or
index(). The awka-elm manpage contains an introduction to Awka-ELM.
This page
lists the available data structures, definitions, functions and macros
provided by libawka.h that you may use in creating C libraries that link
with awka-generated code.
I have broken the page into the following main
sections: BASIC VARIABLE METHODS, ARRAY METHODS, BUILTIN FUNCTIONS, I/O
METHODS, REGULAR EXPRESSION METHODS. So, without further ado...
Data Structures
- a_VARtypedef struct { double dval; /* the variable’s numeric value
*/ char * ptr; /* pointer to string, array or RE structure */
unsigned int slen; /* length of string ptr as per strlen */ unsigned
int allc; /* space mallocated for string ptr */ char type;
/* records current cast of variable */ char type2; /* double-typed
variable flag, explained later. */ char temp; /* TRUE if a temporary
variable */} a_VAR;The a_VAR structure is used to store everything related
to AWKvariables. This includes those named & used in your program, and transient
variables created to return values from functions and otheroperations like
string concatenation. As such, this structure isubiquitous throughout libawka
and awka-generated code.The type value is set to one of a number of #define
values, described in the Defines paragraph below. Many functions and macros
exist for working with the contents of a_VARs - see the Functions & Macros
paragraph for details.
- a_VARARGtypedef struct { a_VAR *var[256]; int used;} a_VARARG;This structure
is typically used to pass variable numbers of a_VARs to functions. Up to
256 a_VARs may be referenced by an a_VARARG, and theused value contains
the number of a_VARs present.
- struct gvar_structstruct gvar_struct { char *name; a_VAR *var;};Provides
a mapping of the global variable names in an AWK script to pointersto their
a_VAR structures.
- Internal Libawka Variables
- a_VAR * a_bivar[a_BIVARS]This array contains all the AWK internal variables,
such as ARGV, ARGV, CONVFMT, ENVIRON and so on, along with $0 and the field
variables $1..$n. a_BIVARS is a define, as are the identities of which element
in the array belongs to which variable. Again, look for functions that
manage these variables rather than working with them directly if possible.
- extern struct gvar_struct *_gvar;This is actually created & populated by
the translated C code generated by awka, rather than by libawka.a. It is
a NULL-terminated array of the gvar_struct structure defined earlier in
this page, and contains the names of all global variables in an AWK script,
mapped to their a_VAR structures.
- Defines
a_VARNUL - the type value of an a_VAR if the variable is unused.
a_VARDBL - the type value for an a_VAR cast to a number.
a_VARSTR - type where the a_VAR has been cast to a string.
a_VARARR - type where the a_VAR contains an array.
a_VARREG - type where the a_VAR contains a regular expression.
a_VARUNK - type where the a_VAR is a string, but could also be a
number. Variables populated by getline, the FILENAME variable,
and elements of an array created by split(), are all of this special
type.
a_DBLSET - for a string a_VAR that has been read in context as a number,
the type2 flag is set to this #define to prevent the string-to-number
conversion being done again.
a_STRSET - the opposite of the above. The variable is a number, has been
read as a string, hence the value of ptr is current, and the type2
flag is set to this #define.
a_BIVARS provides the number of elements in the a_bivar[] array.
a_ARGC,
a_ARGIND, a_ARGV, a_CONVFMT, a_ENVIRON, a_FILENAME, a_FNR, a_FS, a_NF,
a_NR, a_OFMT, a_OFS, a_ORS, a_RLENGTH, a_RS, a_RSTART, a_RT, a_SUBSEP,
a_DOL0, a_DOLN, a_FIELDWIDTHS, a_SAVEWIDTHS, a_SORTTYPE provide indexes
to which elements in the a_bivar[] array are for which AWK internal variable.
Functions & Macros
- awka_getd(a_VAR *)This macro calls the awka_getdval() function, appending
the calling file & line number for debug purposes. It read-casts the variable
to a number, and returns the doublevalue of the variable. By read-cast,
we mean that if the variable is a string itremains so, but dval is set,
and type2 is set to a_DBLSET. But if the a_VARis a regular expression,
the re structure is dropped and the variable converted to anumber. If you’re
not sure whether an a_VAR you’re about to read is a number, and youwant
to read it as one, simply call awka_getd(varname) - its the easiest way.
- awka_getd1(a_VAR *)Same as awka_getd, except this will be faster if the
a_VAR * is a variable. Do notuse this if the a_VAR * is a function call
return value, as it’ll call the functionseveral times! In this case, use
awka_getd() instead.
- awka_gets(a_VAR *)Similar to awka_getd(), this read-casts an a_VAR to a
string, and returns the characterarray pointed to by ptr.
- awka_gets1(a_VAR *)Use this where the a_VAR * is a variable, not a function
call that returns an a_VAR *,for faster performance.
- awka_getre(a_VAR *)Write-casts the a_VAR * to a regular expression, and
returns the pointer to the awka_regexpstructure. Write-cast means that the
existing contents of the variable are dropped infavour of the new contents.
- static char *awka_strcpy(a_VAR *var, char *str)This function sets var to
string type, and copies to it the contents of str.It returns a pointer to
var->ptr.
- a_VAR *awka_varcpy(a_VAR *va, a_VAR *vb)This function copies the contents
of scalar a_VAR *vb to scalar a_VAR *va, and returns a pointer to va.
- double awka_varcmp(a_VAR *va, a_VAR *vb)This function compares the contents
of the two scalar variables, and returns 0 if the variables are equal,
-1 if va is less than vb, or 1 if va is greater. Numerical comparison is
used where possible, otherwise string.
- a_VAR *awka_vardup(a_VAR *va)This function creates a new a_VAR *, copies
the contents of va, and returns a pointerto the new structure.
- awka_varinit(a_VAR *)A macro that takes a NULL a_VAR *, mallocs space for
it, and initialises it to a_VARNUL.
- void awka_killvar(a_VAR *)Frees all memory used by the a_VAR, except the
structure itself.
- static a_VAR * awka_argv()You can use a_bivar[a_ARGV] directly when reading
the value of elements in the array,but when you want to write to the array,
use the above function instead, as it willmake sure the changes are recognised
elsewhere in libawka.
- static a_VAR * awka_argc()You can use a_bivar[a_ARGC] directly when reading
its value, but when you want to write to it, use the above function instead,
as it will make sure the change is recognised elsewhere in libawka.
Data Structures & Variables
These are strictly internal to the array module within libawka. If you
need functionality other than that provided by the array functions, I
recommend creating your own custom array data structures and interface
functions, otherwise you could cause serious problems. The structure definitions
are too lengthy to list here, and the foolhardy may find them in lib/array.h
within the awka distribution.
Defines
a_ARR_TYPE_NULLThe ’type’ of an array that has not been initialised, or has
been deleted.
a_ARR_TYPE_SPLITThe ’type’ of an array populated by the split() builtin function.
a_ARR_TYPE_HSHThe ’type’ of arrays populated within the AWK script, eg. arr["pigs"]
= cows.
a_ARR_CREATEWhen searching arrays, specifies that an element is to be created
if it doesn’talready exist in the array.
a_ARR_QUERYWhen searching arrays, this will not create a new element if
it doesn’t alreadyexist.
a_ARR_DELETEIn an array search, this flag will cause the element to be
deleted from the array.
Functions
- void awka_arraycreate( a_VAR *var, char type );Allocates an array structure
of type type, makes var->ptr pointto it, and sets var->type to a_VARARR. The
type argument maybe one of a_ARR_TYPE_NULL, a_ARR_TYPE_SPLIT or a_ARR_TYPE_HSH,
according tohow the array will be populated.
- void awka_arrayclear( a_VAR *var );Assumes var is an a_VARARR, this deletes
the contents of the array structurepointed to by var->ptr.
- a_VAR * awka_arraysearch1( a_VAR *v, a_VAR *element, char create, int set
);Searches array variable v for index element. If it does not exist, andcreate
is a_ARR_CREATE, a new element in the array for this value will be added.If
the element is found (or created) and create is not a_ARR_DELETE, thefunction
will return a pointer to the a_VAR for that element. For a_ARR_DELETE,
theelement will be deleted from the array. The set value should be FALSE.
- a_VAR * awka_arraysearch( a_VAR *v, a_VARARG *va, char create );Searches
array variable v as per awka_arraysearch1(), except that this workswith
multiple index subscripts (eg, arr[x, y]).
- double awka_arraysplitstr( char *str, a_VAR *v, a_VAR *fs, int max );The
AWK builtin split() function. It splits str into array variable v,based
on fs, up to max number of fields. If fs is NULL, thena_bivar[a_FS] will
be used. Otherwise fs may contain an empty string, asingle-character string,
or a regular expression. The number of fields createdin v is returned.
- int awka_arrayloop( a_ListHdr *ah, a_VAR *v );This function implements
the "for (i in j)" feature in AWK. You provide ah, making sure it is initialised
to zeroes. The best way to understand how to call this function is to type:
awka ’BEGIN { for (i in j) x = j[i]; }’and see what is generated as a result.
You don’t have to understand the a_ListHdrstructure or sub-structures to
use this function.
- int awka_arraynext( a_VAR *v, a_ListHdr *ah, int pos );Given that ah has
been populated by a call to awka_arrayloop(), this functioncopies the (string
or integer) element at position pos in the list to v, then returns pos+1,
or zero if there are no more elements in the array list.
- void awka_alistfree( a_ListHdr *ah );Frees the last list element in ah.
- void awka_alistfreeall( a_ListHdr *ah );Frees all memory held by ah, and
sets its contents to zero/NULL.
- a_VAR * awka_dol0(int set);The best means of accessing the $0 a_VAR, as
it updates its contents with any pendingchanges. Make set zero if you’re
reading the value of $0, but if you want toset $0, make it 1.
- a_VAR * awka_doln(int fld, int set);This function returns the a_VAR * of
the $1..$n variable identified by fld, updating the field array with any
refreshed $0 contents first if necessary. If youwant to read the value
of $fld, make set zero, otherwise it should be 1.
These
are documented in lib/builtin.h in the awka distribution. You can call any
of the builtin functions as often as you like. Those that return a_VAR’s
also provide a keep flag that, if TRUE, will return a variable that you
must free, otherwise they will use a temporary variable that you don’t have
to worry about freeing, but will be reused elsewhere sooner or later.
The functions should be pretty much as you’d expect them, except that many
require an a_VARARG as input, and we haven’t discussed how to create one
- we will now.
a_VARARG * awka_arg0(char);
a_VARARG * awka_arg1(char, a_VAR *);
a_VARARG * awka_arg2(char, a_VAR *, a_VAR *);
a_VARARG * awka_arg3(char, a_VAR *, a_VAR *, a_VAR *);
- a_VARARG * awka_vararg(char, a_VAR *var, ...);These functions populate & return
a pointer to an a_VARARG structure. The char argument, if TRUE, will make
you responsible for freeing the structure, otherwiseit’ll be a temporary
one that libawka will manage. awka_arg0() will return anempty structure
(ie. no args), awka_arg1() will have one a_VAR * in it, and soon. Where
you want to put more than four a_VAR *’s inside an a_VARARG, you can call
awka_vararg with as many as you like, or if there’s seriously a lot, maybewrite
your own loop of code to populate an a_VARARG - its not rocket science.
Data Structures & Variables
- _a_IOSTREAMtypedef struct { char *name; /* name of output file or
device */ FILE *fp; /* file pointer */ char *buf; /* input
buffer */ char *current; /* where up to in buffer */ char *end;
/* end of data in buffer */ int alloc; /* size of input buffer
*/ char io; /* input or output stream flag */ char pipe;
/* true/false */ char interactive; /* whether from a /dev/xxx stream
or not */} _a_IOSTREAM;extern _a_IOSTREAM *_a_iostream;extern int _a_ioallc,
_a_ioused;Controls input and output streams used by AWK’s getline, print
and printfbuiltin functions. The two int variables record the space allocated
in the _a_iostream array, and the number of elements used, respectively.
I list this information here in case you wish to create fread, fwrite
and fseek functions for awka, as these will need low-level access to the
streams.
- Functions
- a_VAR * awka_getline(char keep, a_VAR *target, char *input, int pipe, char
main);As previously described, keep controls whether you want to be responsible
forfreeing the a_VAR the function returns or not. Moving on, target is
the a_VARto hold the line of data to be read (you provide this one). input
is the nameof the input file or command. pipe is TRUE if input is a command
ratherthan a file, eg. "sort stuff | getline x". main should always be false.If
input is NULL, getline will try to read from the file identified by a_bivar[a_FILENAME],
or from the next element in the a_bivar[a_ARGV] array.
- I won’t go into detail
about awka_fflush, awka_close, awka_printf & so on, as these should be easy
enough to understand and use, and the chances are you should use the native
C variety anyway where possible.
Ah, now we’re in murky
water indeed, as awka inherited its RE library from Libc, and treats it
like a magical black box that does its bidding. Want my advice? Treating
the RE library & structure like a black box is a wise thing to do, as its
ugly-looking stuff.
Ok, we know that when an a_VAR has been set to a_VARREG,
its ptr value will point to an awka_regexp structure. Do we need to know
what’s in this structure? I don’t think so. What we do need are the functions
that help us compile and execute regular expressions. Oops, getting ahead
of myself. RE’s are like C programs, they need to be compiled before they
can be used to search strings. This basically is a parsing of the RE pattern
into a tree structure that is easier to navigate while searching, and is
a one-off task.
- awka_getre(a_VAR *)This macro is the easiest method of creating & compiling
a regexp. Providing you’veset the a_VAR to the string value of the re pattern,
this macro call works a treat.
- a_VAR *awka_match(char keep, char fcall, a_VAR *va, a_VAR *rva);This function
is the implementation of AWK’s match() function, and is the mostsimple way
of evaluating an RE against a string. keep is as previouslydiscussed, fcall
should be set to TRUE if you want a_bivar[a_RSTART] anda_bivar[a_RLENGTH]
to be set, otherwise FALSE, va contains the string, andrva contains the
regular expression. The numerical a_VAR returned is1 on success, zero on
failure.
- I was going to describe the lower-level methods of compiling and
matching against RE’s, but when I looked at it, there seemed to be a lot
of complexity for no real gain in functionality. All you get is the ability
to avoid using a_VAR structures to manage the regular expressions, and
honestly I don’t see what you’d gain from this given how much more complexity
you’d have to deal with.
I haven’t described all of the functions available
in libawka.h, not by any means. But I have tried to avoid functions that
are really only meant for internal use, or that are only needed by translated
code and should be done in other ways by library code. In the same way
I’ve avoided describing structures that were intended to remain privy to
a module within libawka, and you really shouldn’t need to tamper with them.
Any questions at all, or suggestions for improving this page, let me know
via andrewsumner@yahoo.com. Make sure you preface any message title with
the word "awka" so I know its not spam.
awka(1)
, awka-elm(5)
.
Table of Contents