What is it
----------

This is a collection of tools for single and multiple constant
multiplication (SCM and MCM) using shifts and adds.  Generation of
minimal cost multiplier blocks is NP-complete even for 1 constant.

Here we provide an optimal cost generator for a single constant <= 19 bits,
based on reference [4].

We also implement our own heuristic MCM algorithm [1], and Dempster/Macleod RAG-n
from [3], and Bull/Horrocks algorithm [2] with Dempster/Macleod modifications from [3]
called BHM.

References
----------

[1] Yevgen Voronenko and Markus Pschel
Multiplierless Multiple Constant Multiplication
to appear in ACM Transactions on Algorithms

[2] D. R. Bull and D. H. Horrocks
Primitive operator digital filters
IEE Proceedings G, 138(3):401-412, 1991

[3] A. G. Dempster and M. D. Macleod Use of minimum-adder multiplier
blocks in FIR digital filters IEEE Transactions in Circuits and
Systems-II: Analog and Digital Signal Processing, 42(9):569-577, 1995

[4] A. G. Dempster and M. D. Macleod.  Constant integer multiplication
using minimum adders IEE Proceedings - Circuits, Devices and Systems,
141(5):407-413, 1994

Build
------

We  provide three  utilities.  Please use  'acm  -gc' if  you need  to
generate code. 'acm' takes all of the arguments of synth (see below in
'synth usage'). Note  however, that 'acm -gc' is  *not* same as 'synth
-gc'.

Primary usage of 'synth' is to experiment with the algorithm and produce
performance plots, while 'acm' should be used as a code generator. 
   
make chains (optimal SCM)
make synth (Hcub, BHM, RAG-n algorithms and testbench)
make acm   (code generator, takes all arguments of synth)

chains usage
-------------

This is  a generator for optimal  SCM DAGs. The DAGs  are only optimal
for 19-bit and smaller constants. Even  though the tool will run for a
wider constants,  the results  will not, in  general be  optimal.  For
20-bit  constants,  there  is   a  very  small  number  of  suboptimal
results. We know of only one cost-6 20-bit constants (699829), however
this tool will produce about 30 such constants,

./chains <bits+1>  2> out

it really  takes maximum shift, and  to generate optimals  for b bits,
max shift must be b+1

There are some constants on  top of chains.cpp that could be adjusted,
like what is printed (costs or actual decompositions).

The output is  set up in a slightly  non-standard way. Status messages
go to stdout, while the actual output to stderr. So that

./chains 14 2> out 

is typical usage.

synth usage
-----------

You can either specify a constant set, or do an experiment on a random
constant set.

1. 'synth' can be used with a specific constant set, but it is not intended
   for code generation, but for (cost) analysis. Given constants c1, c2, c3

   ./synth c1 c2 c3  -v 1 <options> 

   Gives the  information about the  solution.  To get code  use 'acm'
   with  -gc. Also  see the  short 'acm'  section in  the end  of this
   document.

   ./acm c1 c2 c3 -gc


2. Performance data  for random constant  set. This is the  main usage
   model of this tool. 

   ./synth -r <num-experiments> <num-constants> -b <bits> <options> [-seed <random seed>]
	run experiments on random constant set(s)

   ./synth -ex -b <bits> <options>   
	exhaustively generate all (single) constants of given bitwidth

   Typical usage is 

   ./synth -r 100 5 -b 19 -v 1

   runs  100  experiments  on  5  19-bit  constants.  -v  1  increases
   verbosity to 1  and the forces the tool to output  the cost of each
   individual  experiment.  Regardless  of vebosity  the  final output
   line is summary info and average cost.


   One can also run several benchmarks as in 

   ./synth -r 100 1 2 4 6 8 10 -b 19 

   runs 100 experiments on 19-bit constant sets of size 1, 2, 4, 6, 8,
   and 10, and outputs summary info and average cost.

	 $ ./synth  -b 14 -r 100 1 2 4 6 8 10
	  1   4.31   3.42  dep 2.87 sd 0.62 ci95   2.18   4.66 
	  2   8.10   5.32  dep 3.28 sd 0.80 ci95   3.72   6.92 
	  4  15.88   8.56  dep 4.10 sd 0.85 ci95   6.86  10.26 
	  6  24.50  11.69  dep 4.52 sd 1.06 ci95   9.58  13.80 
	  8  32.68  14.33  dep 5.49 sd 1.18 ci95  11.96  16.70 
	 10  41.09  16.79  dep 5.94 sd 1.13 ci95  14.54  19.04 
         ^^n ^^^^ub ^^^^^cost  

         n - number of constants in a set for a given experiment series

         ub - average upper bound on cost (=average CSD cost, unless -esth/-estv is used)

	 cost - average adder cost of the generated solution, this is the 
	        most interesting column

	 dep - average depth of generated solution

	 sd - standard deviation of the 'cost' column (in this case, across 100 expemeriments)

	 ci95 - 95% confidence interval on the cost = [avg.cost - 2sd; avg.cost + 2sd]

3. Depth limits

   The modes  described above  support -maxdepth <depth>  option. This
   constrains the maximal depth of the generated solution, potentially
   increasing  the cost.   Note that  in practice,  under  tough depth
   constraints the cost increases only very modestly.

   To get  minimal possible depth, one  can just use  -maxdepth 0, and
   the tool  will automatically increase  the depth bound so  that the
   solution is feasible.

   NB: This feature currently makes distance function not admissible (
       as defined  in the  paper), and the  tool will  sometimes crash
       with assertion  failed error message. The workaround  is to try
       rerunning  the tool  several times  (without fixing  the random
       seed) or  increasing the depth limit (+1  over minimal possible
       depth is enough).
       
   Example:

   $ ./synth  -b 19 -r 1 1 -maxdepth 0
   Solution infeasible with MAX_DEPTH=0. Increasing MAX_DEPTH=3.
     1   6.00   5.00  dep 3.00 sd 0.00 ci95   5.00   5.00 

   If this fails with an assertion, increasing the depth bound to 3+1=4 
   or rerunning the tool multiple times (with different -seed value, if
   it is used) is typically sufficient.

   If -nofs option is used there is no randomness, and rerunning won't
   help. 

4. Randomness

   In the Hcub search space there are lots of choice. Current strategy
   is to  always optimize for depth,  however there are  still lots of
   choices for intermediate  graph vertices, considered equally "good"
   under the Hcub heuristics. In these situations the tool makes a
   random choice.

   Local randomness can lead to a different solution cost between 
   several runs of the tool. If this is not desired, fix the random
   seed using -seed <seed>, where <seed> is some integer value.

   Also, if you are not satisified with the solution - rerun the tool!
   However, to get reproducible solution always use -seed <seed>, and
   then vary the seed, until you like the solution.

Options:

        -ex       exhaustively go over all constants of given bitwidth
        -v <n>    verbosity n=0,1,2,3 (default=0)
        -b        maximum bitwidth to consider 
        -g        

    Output:

        -gc      Generate C code
        -ga      Generate A-operations, i.e. adders. Shifts are printed values less 1. 
	           2733 = adder(19, 43, -1/7, 4) means that 2733 = -19 + 43<<6 (4 is depth)
                   i.e. -1 is subtraction with no shift (-2 is subtraction and shift by 1)
	           7 is shift by 6
        -gd      Generate DAG (values after slash, eg. t1/-3, are shifts encoded as described
	         above)
        -gdot    Genereate DAG in 'dot' format (dot is utility from graphviz.org package)

    Random constant set generation:

        -r <e> <n>  Run <e> experiments with <n> random constants each.
        -seed <n>   Set random seed to <n> (makes possibly reproducible experiments)
        -gu         Generate <n> unique odd fundamentals when using -r <e> <n>, 
                    (default is sampling with replacement from 1..2^bits-1)
        -pt         Print randomly selected targets

    Algorithm selection: (default our algo with Hcub heuristic)

        -maxb    Use our algo with Hmaxb heuristic (OBSOLETE, this option no longer works)
	-ac1 <f> Use Dempster/Macleod RAG-n, <f> must point to optimal SCM table (.chains)
        -iac1    Improve RAG-n with complete dist-2 tests, note that -ac1 without -iac1
	         seems to be worse than original RAG-n, while -iac1 is slightly better  
        -bhm     Use Dempster/Macleod BHM
	-c1algo  Use Dempster/Macleod depth minimization algo, works on top of our algo, 
	         and possibly RAG-n

    Other:
        -mg      MAX_GEN, don't use
        -ms      MAX_SHIFT, don't use

        -nofs    minimize non-output fundamental sum (NOFS). This option disables all
                 randomness, and solutions are the same between different runs.
                 Non-output fundamentals are intermediate constants generated for the
                 multiplier block. Using this option will reduce adder size in hardware.

	-maxdepth <d>  generate solution of depth <d> or smaller, if <d> is too small,
	               the tool will increase the constraint automatically. Currently
		       with this feature the tool might crash with "assertion failed".
		       Read about easy workarounds in "synth" section above.
                       (!! only works with Hcub)

        -succs2  generate S^2 set, not immediately useful, for research purposes


	-estv    use # of bits in coefficient, i.e. ceil(log2(c)) as cost_estimate 
	         (default = # non-0 CSD bits = CSD cost)

	-esth    use  2*bits+CSD_cost as cost_estimate

        -estopt  use optimal SCM cost as auxiliary cost measure Est(z), 
	         it is not admissible, so the program might crash

	-hc <n>  unused
  

acm usage
---------

acm is a more user friendly code generator. It takes all arguments of
synth, but when generating code handles negative constants, and does
all the final post-processing and shifting etc.

For example, it will generate code for multiplication by 3 and 6, while synth will
just reduce 6 to 3, and assume nothing needs to be generated.

-code                 generate code to stdout [default]
-noheader             do not generate an informative code header
-gap                  generate code that can be processed by Spiral's GAP
-dot                  generate a dag (to stdout) in Graphviz's Dot format
-dotcode <dotfile>    geneate code to stdout and dag to <dotfile>

