Consider the following exercise (Ex. 1.5 in “Paradigms of Artificial Intelligence Programming” by Peter Norvig):

Write a function to compute the dot product of two sequences of numbers, represented as lists. The dot product is computed by multiplying corresponding elements and then adding up the resulting products. Example:

`(dot-product '(10 20) '(3 4)) => 110`

Here are four implementations (the first one is mine; the other three are the solutions to the exercise given in the book):

#### 1. My solution: Recursion with accumulator — tail call optimized (TCO)

```lisp
(defun dot-product (lst1 lst2 &optional (acc 0))
  "Computes the dot product of two sequences, represented as lists."
  (if (not lst1)
      acc
      (let ((x (first lst1))
            (y (first lst2))
            (lst11 (rest lst1))
            (lst22 (rest lst2)))
        (dot-product lst11 lst22 (+ acc (* x y))))))
```

#### 2. Solution 1 in PAIP: apply and mapcar

```lisp
(defun dot-product1 (lst1 lst2)
  (apply #'+ (mapcar #'* lst1 lst2)))
```

#### 3. Solution 2 in PAIP: recursive without TCO

```lisp
(defun dot-product2 (lst1 lst2)
  (if (or (null lst1) (null lst2))
      0
      (+ (* (first lst1) (first lst2))
         (dot-product2 (rest lst1) (rest lst2)))))
```

#### 4. Solution 3 in PAIP: iteration and indexing

```lisp
(defun dot-product3 (lst1 lst2)
  (let ((sum 0))
    (dotimes (i (length lst1))
      (incf sum (* (elt lst1 i) (elt lst2 i))))
    sum))
```

### Performance

We test the solutions like this:

Set up some test data:

```
CL-USER> (defparameter *a* (make-list 100000 :initial-element 1))
```

**dot-product**: Use the `time` macro to measure the performance of the function `dot-product`:

```
CL-USER> (time (dot-product *a* *a*))
Evaluation took:
  0.002 seconds of real time
  0.001302 seconds of total run time (0.001301 user, 0.000001 system)
  50.00% CPU
  3,450,092 processor cycles
  0 bytes consed
100000
```

The function does not allocate any new memory, uses 50% CPU on average, and finishes in 0.001302 seconds for a 100,000-element list.

Similarly, we test the other three functions:

**dot-product1**:

```
CL-USER> (time (dot-product1 *a* *a*))
Evaluation took:
  0.003 seconds of real time
  0.002625 seconds of total run time (0.002611 user, 0.000014 system)
  100.00% CPU
  6,967,544 processor cycles
  3,205,632 bytes consed
100000
```

**dot-product2**:

```
CL-USER> (time (dot-product2 *a* *a*))
Control stack guard page temporarily disabled: proceed with caution
...
```

This one aborted because it ran out of stack space before it could complete.

**dot-product3**:

```
CL-USER> (time (dot-product3 *a* *a*))
Evaluation took:
  58.350 seconds of real time
  58.347480 seconds of total run time (58.330376 user, 0.017104 system)
  99.99% CPU
  155,215,653,812 processor cycles
  0 bytes consed
100000
```

So it seems my solution is about twice as fast as the fastest one in PAIP and has the advantage of not allocating any extra memory, whereas that PAIP solution conses nearly 3 MB (about 32 bytes per element). My solution is also roughly 45,000 times faster than the PAIP solution that does not allocate extra memory. Admittedly, the PAIP solutions are not about performance but about showing how to write code in Lisp, and at that point in the book the author has not yet covered tail-call optimization, so I am not criticizing PAIP at all.

Well, `dot-product1` looks very concise and elegant, doesn’t it? What if we try to run it with a larger list?

```
CL-USER> (defparameter *a* (make-list 1000000 :initial-element 1))
*A*
```

And now, alas, it also exhausts the stack space:

```
CL-USER> (time (dot-product1 *a* *a*))
Control stack guard page temporarily disabled: proceed with caution
; Evaluation aborted on #<SB-KERNEL::CONTROL-STACK-EXHAUSTED {100341FE93}>.
```

Presumably this is because we are trying to pass 1,000,000 arguments to `apply`, and the arguments have to be pushed onto the stack. Perhaps we can tweak this a bit so that the arguments are not pushed onto the stack:

```lisp
(defun dot-product4 (lst1 lst2)
  (reduce #'+ (mapcar #'* lst1 lst2) :initial-value 0))
```

And now this function does run:

```
CL-USER> (time (dot-product4 *a* *a*))
Evaluation took:
  0.212 seconds of real time
  0.210735 seconds of total run time (0.182024 user, 0.028711 system)
  [ Run times consist of 0.143 seconds GC time, and 0.068 seconds non-GC time. ]
  99.53% CPU
  563,737,032 processor cycles
  47,974,560 bytes consed
1000000
```

Hmm.. it seems to be allocating even more bytes per element than before (nearly 48 bytes per element), and there is now some GC overhead. What about my function?

```
CL-USER> (time (dot-product *a* *a*))  ; *a* is a list with a million elements
Evaluation took:
  0.014 seconds of real time
  0.013474 seconds of total run time (0.013462 user, 0.000012 system)
  92.86% CPU
  36,021,059 processor cycles
  0 bytes consed
1000000
```

Still jolly good!

#### Update

Svente posted another (more efficient, very readable) solution in the comments below:

```lisp
(defun dot-product5 (lst1 lst2)
  (loop for element1 in lst1
        for element2 in lst2
        sum (* element1 element2)))
```

This code uses two parallel `for` clauses (stepping through both lists in lockstep, not nested) and an implicit local variable to hold the sum. How does it perform?

```
CL-USER> (time (dot-product5 *a* *a*))
Evaluation took:
  0.007 seconds of real time
  0.007470 seconds of total run time (0.007470 user, 0.000000 system)
  100.00% CPU
  19,859,546 processor cycles
  0 bytes consed
1000000
```

It performs very well indeed; better than my recursive solution.

### Summary

CPU, time, and memory taken with a list of 100,000 elements:

Solution | CPU % | Time (s) | Processor cycles/element | Bytes consed
---|---|---|---|---
`dot-product` | 50.0% | 0.001302 | 34.5 | 0
`dot-product1` | 100.0% | 0.002625 | 69.7 | 3,205,632
`dot-product2` | does not run to completion | | |
`dot-product3` | 99.99% | 58.35 | 1552 | 0

CPU, time, and memory taken with a list of 1,000,000 elements:

Solution | CPU % | Time (s) | Processor cycles/element | Bytes consed
---|---|---|---|---
`dot-product` | 100.0% | 0.010221 | 27.2 | 0
`dot-product4` | 99.21% | 0.1259 | 336 | 48M
`dot-product5` | 100.0% | 0.0079 | 21.2 | 0

## Comments

The main trouble is that, unlike Scheme, the Common Lisp standard doesn’t guarantee proper tail calls, so your solution is implementation-specific.

True. These tests were done with SBCL on a Mac. Other implementations might not perform TCO; I vaguely remember that CMUCL did not.
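For implementations that do not merge tail calls, the same accumulator idea can be written as an explicit `do` loop, which uses constant stack space on any conforming implementation. A minimal sketch (the name `dot-product-iter` is mine, not from the post):

```lisp
;; A portable sketch: the same accumulator idea as dot-product above,
;; expressed as an explicit DO loop so that it does not depend on the
;; implementation performing tail-call optimization.
(defun dot-product-iter (lst1 lst2)
  (do ((l1 lst1 (rest l1))
       (l2 lst2 (rest l2))
       (acc 0 (+ acc (* (first l1) (first l2)))))
      ((null l1) acc)))
```

`(dot-product-iter '(10 20) '(3 4))` returns 110, and stack usage stays flat no matter how long the lists are.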

First off, I think that the following is pretty idiomatic:

```lisp
(defun dot-product-4 (list-1 list-2)
  (loop :for element-1 :in list-1
        :for element-2 :in list-2
        :sum (* element-1 element-2)))
```

It also runs about 10% faster than yours.

Note that tail call optimization is not a guaranteed feature of Common Lisp implementations.

Solution 1 of PAIP is OK. It is not the most efficient one, but it will scale linearly. Solutions 2 and 3 have obvious problems: solution 2 blows the stack, and solution 3 is a Shlemiel-the-painter algorithm (using `elt` in a loop over a list makes every access a linear walk from the head).

I have also found that the most flexible utility functions are often not compiled to the most efficient code. It seems that current compilers are reluctant to apply aggressive inlining and other optimizations. This might get better with higher speed-optimization declarations, but the generality of these functions often limits how aggressively they can be optimized.

Another example is the concatenation of strings. For the occasional concatenation, use `(concatenate 'string …)`, but in tight inner loops it is better to use `with-output-to-string` or even to preallocate the target string and loop.
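To illustrate that comment with a sketch (the helper names `join-slow` and `join-fast` are hypothetical, not from the comment): repeated `concatenate` copies the whole accumulated string on every call, so joining n pieces is quadratic, whereas `with-output-to-string` appends into a single stream buffer:

```lisp
;; Quadratic: each CONCATENATE copies the entire accumulated string.
(defun join-slow (strings)
  (let ((result ""))
    (dolist (s strings result)
      (setf result (concatenate 'string result s)))))

;; Linear: WITH-OUTPUT-TO-STRING writes every piece into one growing buffer.
(defun join-fast (strings)
  (with-output-to-string (out)
    (dolist (s strings)
      (write-string s out))))
```

Both return the same string; only their scaling behaviour differs.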

Svente: thank you for your comments. Yes, your solution is both very readable and very efficient. I’ve updated my post to include your solution.

As for your comment about “Solution 1”:

| Solution 1 of PAIP is OK. It is not the most efficient one, but it will scale linearly.

Umm.. it too blows the stack with a million-element list, presumably because `apply` requires all the arguments to be pushed onto the stack.

Simply write `(reduce #'+ …)` instead of `(apply #'+ …)`. It’s a classic mistake to run into `call-arguments-limit` by using `apply` too broadly.

It is true that tail call optimization is not a guaranteed feature of Common Lisp implementations, but it is a feature of all CL implementations in general use: CMUCL, SBCL, CCL, Allegro, LispWorks, CLISP, and ECL.
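As a hedged sketch of making that reliance explicit: in SBCL, for instance, tail-call merging can be suppressed by high `debug` optimization settings, so a recursive definition that depends on TCO can declare its intent (the exact effect of these optimize qualities is implementation-dependent):

```lisp
;; SBCL normally merges tail calls, but (debug 3) can disable this;
;; declaring low debug and high speed makes the intent explicit.
;; The precise behaviour of OPTIMIZE qualities varies by implementation.
(defun dot-product-tco (lst1 lst2 &optional (acc 0))
  (declare (optimize (speed 3) (debug 0) (safety 1)))
  (if (null lst1)
      acc
      (dot-product-tco (rest lst1) (rest lst2)
                       (+ acc (* (first lst1) (first lst2))))))
```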