Tag Archives: erlang

Facebook uses Erlang

Here is proof: http://www.joyent.com/?gclid=CLeIy4La-pQCFRMJewoddC7c3A

For Facebook Chat, we rolled our own subsystem for logging chat messages (in C++) as well as an epoll-driven web server (in Erlang) that holds online users’ conversations in-memory and serves the long-polled HTTP requests. Both subsystems are clustered and partitioned for reliability and efficient failover. Why Erlang? In short, because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space “processes”, share-nothing message-passing semantics, built-in distribution, and a “crash and recover” philosophy proven by two decades of deployment on large soft-realtime production systems.

Up and running with Emacs, Erlang, and Distel

Umm… did I mention that I yearn for hacking Erlang code, too? Following Bill Clementson’s lead, I set up  Emacs+Erlang on my Ubuntu hardy box.

Step 1: Install erlang and erlang-mode

I used the version of erlang and erlang-mode for emacs in Ubuntu’s repositories. I might come to regret it some day, but for now the following was just too simple to resist:

$ sudo aptitude install erlang
# The above will install erlang-mode too;
# if it does not just "apt-get install erlang-mode"

Step 2: Configure erlang-mode in Emacs

I just put the following in my .emacs:

;; Erlang-mode
(require 'erlang-start)
(add-to-hook 'erlang-mode-hook
             (lambda ()
            ;; when starting an Erlang shell in Emacs, the node name
            ;; by default should be "emacs"
            (setq inferior-erlang-machine-options '("-sname" "emacs"))
            ;; add Erlang functions to an imenu menu
            (imenu-add-to-menubar "imenu")))

Now I can launch Emacs and open an erlang file, which puts me in Erlang mode.  I can start and Erlang shell with “C-c C-z”, compile the Erlang code with “C-c C-k”, and view the compilation result in the Erlang buffer (if the buffer is hidden) with “C-c C-l”.  Or, I can just switch to the Erlang shell with “C-c C-z”.  The customizations above enable me to get a menu item in the Emacs menubar with a list of functions defined in the file I am visiting, which is a big help. I am used to “ecb-mode” when coding Python; ECB does not grok Erlang yet, so the imenu is a handy substitute when coding Erlang.

Step 3: Install Distel

Distel is to Erlang and erlang-mode what SLIME is to Lisp and lisp-mode.

To get distel:

$ cd ~/.emacs.d/
$ svn co http://distel.googlecode.com/svn/trunk/ distel
$ cd distel
$ make
$ cd doc
$ make postscript && make postscript # must run twice
$ make info && sudo make install # install the Info documentation
$ info distel # read the distel info documentation

Step 4: Configure Emacs to use Distel

I just need to put this in my .emacs:

(push "/home/parijat/.emacs.d/distel/elisp/" load-path)
(require 'distel)
(distel-setup)

Step 5: Configure Erlang

Distel is designed to work in a distributed Erlang system. It can connect to specified Erlang nodes. My strategy is to use the standard erlang-mode command “C-c C-z” to start an Erlang shell, and connect to it using Distel by “C-c C-d n”. The latter asks for a nodename, which is going to be “emacs@beowulf” (because beowulf is the short hostname of my laptop).

But before we start using remote Erlang nodes, we should create on each physical machine that we will use a ~/.erlang.cookie file with a password, for inter-Erlang node authentication:

$ echo "secret" > ~/.erlang.cookie
$ chmod 0400 ~/.erlang.cookie

Step 6: Play with Distel

So, now we can launch an Erlang node, either via Emacs using “C-c C-z”, or on the command line using “erl -sname mynode”. Then, from within Emacs, we can connect Distel to this node using “C-c C-d n” specifiying the nodename on the prompt. Now we can ask Distel to interrogate the Erlang node for various things.

C-c C-d l ; list erlang processes
PID/Name              Initial Call                              Reds     Msgs
init                  otp_ring0:start/2                         3821        0
erl_prim_loader       erlang:apply/2                           87239        0
error_logger          proc_lib:init_p/5                          229        0
application_controlle erlang:apply/2                            2501        0
<0.7.0>               proc_lib:init_p/5                           45        0
<0.8.0>               application_master:start_it/4               91        0
kernel_sup            proc_lib:init_p/5                         1498        0
rex                   proc_lib:init_p/5                          493        0
global_name_server    proc_lib:init_p/5                           69        0
<0.12.0>              erlang:apply/2                              25        0
<0.13.0>              erlang:apply/2                               4        0
<0.14.0>              erlang:apply/2                               3        0
inet_db               proc_lib:init_p/5                          129        0
net_sup               proc_lib:init_p/5                          312        0
erl_epmd              proc_lib:init_p/5                          147        0

									

Erlang bit syntax wrinkles

The Erlang bit syntax is wonderful, if complicated.

Say you have a Binary like:

84> Bin1 = <>.
<>

Then, this will match:

85> <<A:1/binary, B:8, C:1/binary>> = Bin1.
<<1,0,2>> %% A => 1, C => 2

But this will not match:

86> <<A:1/binary, B:1/binary, C:1/binary>> = Bin1.

=ERROR REPORT==== 19-Aug-2007::17:46:17 ===
Error in process <0.139.0> with exit value: {{badmatch,<<3>>},[{erl_eval,expr,3}]}

** exited: {{badmatch,<<1,0,2>>,[{erl_eval,expr,3}]} **

It seems that I can specify a literal ‘0’ as an int type (the default type) but not a binary type (see http://erlang.org/doc/reference_manual/expressions.html#bit_syntax for the gory details). Wonder why?

Concurrency: Erlang vs Java

Its fashionable to extol the high-performance of Erlang concurrency these days. Programming Erlang, Section 8.11, has this problem:

Write a ring benchmark. Create N processes in a ring. Send a message
round the ring M times so that a total of N * M messages get
sent. Time how long this takes for different values of N and M.

Well, I copied off the Java program from here: http://www.sics.se/~joe/ericsson/du98024.html

And then I wrote this Erlang program:

-module(threadperftest).
-compile(export_all).

timeit() ->
register(mainproc, self()),
io:format(" Procs, Mesgs, SpawnTotal, SpawnProc, RunTotal, RunMesg~n", []),
timeit_aux(1000, 1000, 10000),
init:stop().

timeit_aux(NProcs, NMsgs, NProcsMax) ->
if NProcs =
main([NProcs, NMsgs]),
timeit_aux(NProcs+1000, NMsgs, NProcsMax);
true -> void
end.

main([N, M]) ->
{_, _W0} = statistics(wall_clock),
FirstPid = spawn(fun loop0/0),
% io:format("First: ~p~n", [FirstPid]),
LastPid = setup(N-1, FirstPid),
FirstPid ! LastPid,
{_, W1} = statistics(wall_clock),
LastPid ! M,
receive
stop ->
void
end,
{_, W2} = statistics(wall_clock),
io:format("~6B, ~6B, ~10g, ~10g, ~10g, ~10g~n", [N, M, W1*1.0, 1.0*W1/N, W2*1.0, W2*1000.0/(M*N+N)]).

setup(0, PrevPid) ->
PrevPid;
setup(HowMany, PrevPid) ->
Pid = spawn(fun() -> loop(PrevPid) end),
% io:format("~p: Linking to: ~p~n", [Pid, PrevPid]),
setup(HowMany - 1, Pid).

loop0() ->
receive
LinkedPid when is_pid(LinkedPid) ->
% io:format("First: ~p: Linking to ~p~n", [self(), LinkedPid]),
loop1(LinkedPid);
X ->
erlang:error({unknownMessageError, X, self()})
end.

loop1(LinkedPid) ->
receive
M when is_integer(M), M > 0 ->
% io:format("~p: Received: ~p~n", [self(), M]),
LinkedPid ! (M-1),
loop1(LinkedPid);
M when is_integer(M), M =:= 0 ->
% io:format("~p: Received: ~p, Terminating~n", [self(), M]),
mainproc ! stop,
done;
X ->
erlang:error({unknownMessageError, X, self()})
end.

loop(LinkedPid) ->
receive
M when is_integer(M), M > 0 ->
% io:format("~p: Received: ~p~n", [self(), M]),
LinkedPid ! M,
loop(LinkedPid);
M when is_integer(M), M =:= 0 ->
% io:format("~p: Received: ~p, Terminating~n", [self(), M]),
LinkedPid ! M,
done;
X ->
erlang:error({unknownMessageError, X, self()})
end.

The performance is something like this (all times are in milliseconds, except the last one, which is in microseconds):

 Procs,  Mesgs, SpawnTotal,  SpawnProc,   RunTotal,    RunMesg
1000, 1000, 0.00000e+0, 0.00000e+0, 266.000, 0.265734
2000, 1000, 16.0000, 8.00000e-3, 562.000, 0.280719
3000, 1000, 0.00000e+0, 0.00000e+0, 969.000, 0.322677
4000, 1000, 0.00000e+0, 0.00000e+0, 1672.00, 0.417582
5000, 1000, 15.0000, 3.00000e-3, 2469.00, 0.493307
6000, 1000, 16.0000, 2.66667e-3, 3234.00, 0.538462
7000, 1000, 16.0000, 2.28571e-3, 4015.00, 0.572998
8000, 1000, 16.0000, 2.00000e-3, 4750.00, 0.593157
9000, 1000, 31.0000, 3.44444e-3, 5469.00, 0.607060
10000, 1000, 31.0000, 3.10000e-3, 6375.00, 0.636863

The numbers in the RunMesg column are the time it takes to send one message from one process to another. Why is it increasing? Does the runtime have to do more work to find the target process?

Here is the Java output (program below):

 Procs,  Mesgs, SpawnTotal, SpawnProcs,   RunTotal,   RunMesg
100, 1000, 16.000, 0.160, 15531.000, 15.531
200, 1000, 62.000, 0.310, 27047.000, 27.047
300, 1000, 94.000, 0.313, 37828.000, 37.828
400, 1000, 125.000, 0.313, 46859.000, 46.859
500, 1000, 141.000, 0.282, 54578.000, 54.578
600, 1000, 203.000, 0.338, 60594.000, 60.594
700, 1000, 234.000, 0.334, 65282.000, 65.282
800, 1000, 406.000, 0.508, 68156.000, 68.156
900, 1000, 484.000, 0.538, 69782.000, 69.782
1000, 1000, 735.000, 0.735, 70015.000, 70.015

The last column is in milliseconds whereas for Erlang it was in microseconds, AND the number of procs is 100-1000, whereas for Erlang they were 1000-10,000.

Programming Erlang exercise problem: trouble in the concurrent universe?

Going through Programming Erlang, encountered this exercise problem in Section 8.11:

Write a function start(AnAtom, Fun) to register AnAtom as spawn(Fun). Make sure your program works correctly in the case when two parallel processes simultaneously evaluate start/2. In this case, you must guarantee that one of these processes succeeds and the other fails.

Here is my first solution, and it seems to compile and run.

-module(register_function).
-export([start/2]).

start(AnAtom, Fun) ->
case whereis(AnAtom) of
undefined -> register(AnAtom, spawn(Fun));
_ -> error
end.

Now, is there race condition in the above? I mean, consider Processes A and B both executing start/2:

  1. Process A calls whereis(AnAtom) which evaluates to ‘undefined’;
  2. Process B calls whereis(AnAtom) which evaluates to ‘undefined’;
  3. Process A calls register(AnAtom, spawn(Fun)), which succeeds and returns true;
  4. Process B calls register(AnAtom, spawn(Fun)), which fails as AnAtom has been registered

Superficially, we have satisfied the requirements of the problem: Process A succeeded and B failed. But the sinister problem is that spawn(…) has been called by both processes anyway. Thus we have two instances of Fun being executed, with one of them in an anonymous process that I, at least for now, don’t know how to access.

Ironically, the problem seems to be because presumably register puts AnAtom in some “global” table. Exactly the kind of concurrency problem that Erlang tries to avoid.

How would this be solved? Here is an idea:

  1. Create a “register_function” server;
  2. start/2 sends a message to this server with AnAtom and Fun as contents
  3. presumably the server processes one message at a time and the code between matching a message and executing some expressions is atomic with respect to other messages to the same server

Puff. You’d think Erlang would come with built in functionality for this. Is this important?