Consider the following exercise (Ex. 1.5 in “Paradigms of Artifical Intelligence Programming” by Peter Norvig):
Write a function to compute the dot product of two sequences of numbers, represented as lists. The dot product is computed by multiplying corresponding elements and then adding up the resulting products. Example:
(dot-product '(10 20) '(3 4)) => 110
Here are four implementations (the first one is mine, the other three are the solutions to the exercise in the book)
1. My solution: Recursion with accumulator — tail call optimized (TCO)
(defun dot-product (lst1 lst2 &optional (acc 0))
"Computes the dot product of two sequences, represented as lists."
(if (not lst1)
acc
(let ((x (first lst1))
(y (first lst2))
(lst11 (rest lst1))
(lst22 (rest lst2)))
(dot-product lst11 lst22 (+ acc (* x y))))))
2. Solution 1 in PAIP: apply and mapcar
(defun dot-product1 (lst1 lst2)
(apply #'+ (mapcar #'* lst1 lst2)))
3. Solution 2 in PAIP: recursive without TCO
(defun dot-product2 (lst1 lst2)
(if (or (null lst1) (null lst2))
0
(+ (* (first lst1) (first lst2))
(dot-product2 (rest lst1) (rest lst2)))))
4. Solution 3 in PAIP: iteration and indexing
(defun dot-product3 (lst1 lst2)
(let ((sum 0))
(dotimes (i (length lst1))
(incf sum (* (elt lst1 i) (elt lst2 i))))
sum))
Performance
We test the solutions like this:
Set up some test data:
CL-USER> (defparameter *a* (make-list 100000 :initial-element 1))
dot-product: Use the time macro to measure the performance of the function dot-product:
CL-USER> (time (dot-product *a* *a*))
Evaluation took:
0.002 seconds of real time
0.001302 seconds of total run time (0.001301 user, 0.000001 system)
50.00% CPU
3,450,092 processor cycles
0 bytes consed
100000
The function does not allocate any new memory, uses 50% CPU on average and finishes in 0.001302 seconds for a 10,000 element list.
Similarly, we test the other three functions:
dot-product1:
CL-USER> (time (dot-product1 *a* *a*))
Evaluation took:
0.003 seconds of real time
0.002625 seconds of total run time (0.002611 user, 0.000014 system)
100.00% CPU
6,967,544 processor cycles
3,205,632 bytes consed
100000
dot-product2:
CL-USER> (time (dot-product2 *a* *a*))
Control stack guard page temporarily disabled: proceed with caution
...
This one aborted because it ran out of stack space before it could complete.
dot-product3:
CL-USER> (time (dot-product3 *a* *a*))
Evaluation took:
58.350 seconds of real time
58.347480 seconds of total run time (58.330376 user, 0.017104 system)
99.99% CPU
155,215,653,812 processor cycles
0 bytes consed
100000
So it seems my solution is twice as fast as the fastest one in PAIP and has the advantage of not using any extra memory, whereas the PAIP solution takes up nearly 3MB (32 bytes per element). My solution is 44,813 times faster than the PAIP solution that does not allocate extra memory. Admittedly, the PAIP solutions are not about getting into performance but more about showing how to write code in Lisp; and neither has the author delved into Tail-Call-Optimization yet; so I am not criticizing PAIP at all.
Well, dot-product1 looks very concise and elegant, doesn’t it? What if we try to run it with a larger list?
CL-USER> (defparameter *a* (make-list 1000000 :initial-element 1))
*A*
And now, alas, it also exhausts the stack space:
CL-USER> (time (dot-product1 *a* *a*))
Control stack guard page temporarily disabled: proceed with caution
; Evaluation aborted on #<SB-KERNEL::CONTROL-STACK-EXHAUSTED {100341FE93}>.
Presumably this is because we are trying to pass in 1,000,000 arguments to apply and the arguments have to be pushed on to the stack. Perhaps we can tweak this a bit so that the arguments are not pushed on to the stack:
(defun dot-product4 (lst1 lst2)
(reduce #'+ (mapcar #'* lst1 lst2) :initial-value 0))
And now this function does run:
CL-USER> (time (dot-product4 *a* *a*))
Evaluation took:
0.212 seconds of real time
0.210735 seconds of total run time (0.182024 user, 0.028711 system)
[ Run times consist of 0.143 seconds GC time, and 0.068 seconds non-GC time. ]
99.53% CPU
563,737,032 processor cycles
47,974,560 bytes consed
1000000
Hmm.. it seems to be allocating even more bytes per element than before (47 bytes per element) and there is now some GC overhead. What about my function?
CL-USER> (time (dot-product *a* *a*)) ; *a* is a list with a million elements
Evaluation took:
0.014 seconds of real time
0.013474 seconds of total run time (0.013462 user, 0.000012 system)
92.86% CPU
36,021,059 processor cycles
0 bytes consed
1000000
Still jolly good!
Update
Svente posted another (more efficient, very readable) solution in the comments below:
(defun dot-product5 (lst1 lst2)
(loop for element1 in lst1
for element2 in lst2
sum (* element1 element2)))
This code uses two nested for loops and a (invisible, hidden) local variable to hold the sum. How does it perform?
CL-USER> (time (dot-product5 *a* *a*))
Evaluation took:
0.007 seconds of real time
0.007470 seconds of total run time (0.007470 user, 0.000000 system)
100.00% CPU
19,859,546 processor cycles
0 bytes consed
1000000
It performs very well indeed; better than my recursive solution.
Summary
CPU, time, memory taken with a list of 10,000 elements:
| Soluton |
CPU % |
Time (s) |
Processor cycles/element |
Bytes consed |
dot-product |
50.0% |
0.001302 |
34.5 |
0 |
dot-product1 |
100.0% |
0.002625 |
69.7 |
3,205,632 |
dot-product2 |
does not even run to completion |
dot-product3 |
99.99% |
58.35 |
1552 |
0 |
CPU, time, memory taken with a list of 1,000,000 elements:
| Soluton |
CPU % |
Time (s) |
Processor cycles/element |
Bytes consed |
dot-product |
100.0% |
0.010221 |
27.2 |
0 |
dot-product4 |
99.21% |
0.1259 |
336 |
48M |
dot-product5 |
100.0% |
0.0079 |
21.2 |
0 |