XMPP Chinese translation project website



The iq:roster transaction flow really is complicated

A report on Concurrent Programming

I. How I came to work with Erlang
1. RFID Middleware

2. Jabber (xml::stream, http://zh.wikipedia.org/wiki/Jabber)

3. ejabberd (http://www.process-one.net/en/ )

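II. Problems facing today's commercial environment (web servers)

Traditionally, httpd has used prefork to cope with bursts of many short-lived connections. In today's environment that approach faces a serious challenge: architectures such as HTTP streaming, server push, and Comet keep connections open for a long time, so httpd can serve fewer connections. The biggest problem with forking a process is the large amount of memory each process occupies, and so other server architectures have emerged (lighttpd, ghttpd ...)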

The C10K problem( http://www.kegel.com/c10k.html )
It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now.
And computers are big, too. You can buy a 1000MHz machine with 2 gigabytes of RAM and an 1000Mbit/sec Ethernet card for $1200 or so. Let's see - at 20000 clients, that's 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. (That works out to $0.08 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck???

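III. Concurrent Programming
1. fork
(program + data) --fork (makes a copy)--> (program + data)

After a program forks, the child inherits a copy of the parent's data, and from then on the two are independent. To pass information between them you need a pipe, shared memory, or a Unix socket.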

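2. thread
On Linux a thread is in fact just a lightweight process: the kernel creates a new process with the fork() system call and creates a thread with the clone() system call. The only difference between fork() and clone() is that clone() lets you specify which resources are shared with the parent process; when all resources are shared with the parent, the result is equivalent to a thread. Because threads share resources between parent and child, they very easily lead to problems such as deadlocks and race conditions.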

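3. lightweight threads ( http://www.defmacro.org/ramblings/concurrency.html )
An Erlang process is a lightweight thread, so it is very cheap to start or stop one and fast to switch between them. Under the hood it is little more than a function being evaluated, so processes avoid most of the context-switching overhead and only switch between functions (Erlang's threads are VM-level threads).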
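To get a feel for how cheap these processes are, here is a minimal sketch. The module name spawn_demo and its functions are made up for illustration and do not come from the linked article; the sketch only assumes the standard spawn, send, and receive primitives.

```erlang
-module(spawn_demo).
-export([run/1]).

%% Spawn N processes that each wait for one message, then ask all of
%% them to report back. Returns the number of replies collected.
run(N) when is_integer(N), N >= 0 ->
    Parent = self(),
    Pids = [spawn(fun() -> wait(Parent) end) || _ <- lists:seq(1, N)],
    %% Wake every process up.
    lists:foreach(fun(Pid) -> Pid ! ping end, Pids),
    collect(Pids, 0).

%% Body of each lightweight process: block until pinged, reply, exit.
wait(Parent) ->
    receive
        ping -> Parent ! {pong, self()}
    end.

%% Wait for exactly one pong from each spawned pid.
collect([], Count) ->
    Count;
collect([Pid | Rest], Count) ->
    receive
        {pong, Pid} -> collect(Rest, Count + 1)
    end.
```

Calling spawn_demo:run(10000) from the shell should return 10000. No operating-system threads are created per process; the VM schedules all of them itself.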


IV. Erlang ( http://www.erlang.org/ )
1. Here is how the "About Erlang" page describes the language

Erlang is a programming language which has many features more commonly associated with an operating system than with a programming language: concurrent processes, scheduling, memory management, distribution, networking, etc.
The initial open-source Erlang release contains the implementation of Erlang, as well as a large part of Ericsson's middleware for building distributed high-availability systems.
Erlang is characterized by the following features:
Concurrency - Erlang has extremely lightweight processes whose memory requirements can vary dynamically. Processes have no shared memory and communicate by asynchronous message passing. Erlang supports applications with very large numbers of concurrent processes. No requirements for concurrency are placed on the host operating system.
Distribution - Erlang is designed to be run in a distributed environment. An Erlang virtual machine is called an Erlang node. A distributed Erlang system is a network of Erlang nodes (typically one per processor). An Erlang node can create parallel processes running on other nodes, which perhaps use other operating systems. Processes residing on different nodes communicate in exactly the same way as processes residing on the same node.
Soft real-time - Erlang supports programming "soft" real-time systems, which require response times in the order of milliseconds. Long garbage collection delays in such systems are unacceptable, so Erlang uses incremental garbage collection techniques.
Hot code upgrade - Many systems cannot be stopped for software maintenance. Erlang allows program code to be changed in a running system. Old code can be phased out and replaced by new code. During the transition, both old code and new code can coexist. It is thus possible to install bug fixes and upgrades in a running system without disturbing its operation.
Incremental code loading - Users can control in detail how code is loaded. In embedded systems, all code is usually loaded at boot time. In development systems, code is loaded when it is needed, even when the system is running. If testing uncovers bugs, only the buggy code need be replaced.
External interfaces - Erlang processes communicate with the outside world using the same message passing mechanism as used between Erlang processes. This mechanism is used for communication with the host operating system and for interaction with programs written in other languages. If required for reasons of efficiency, a special version of this concept allows e.g. C programs to be directly linked into the Erlang runtime system.

2. An overview of the Erlang language
Book: ( http://pragmaticprogrammer.com/titles/jaerlang/index.html )

[ Sequential Erlang ]


Consider the factorial function N! defined by:
N! = N * (N-1)! when N > 0
N! = 1 when N = 0


fac(N) when N > 0 -> N * fac(N-1);
fac(0)-> 1.


-export([sum1/1, sum2/1]).

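%% sum1 is plain recursion: the addition happens after the recursive call returns.
sum1([H | T]) -> H + sum1(T);
sum1([]) -> 0.

%% sum2 is tail-recursive: the running total is carried in N.
sum2(L) -> sum2(L, 0).
sum2([], N) -> N;
sum2([H | T], N) -> sum2(T, H+N).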

[ Concurrency Programming ]



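-module(tut14).
-export([start/0, say/2]).

%% Print What the given number of times, then return done.
say(_What, 0) ->
    done;
say(What, Times) ->
    io:format("~p~n", [What]),
    say(What, Times - 1).

%% Start two processes that print concurrently.
start() ->
    spawn(tut14, say, [hello, 3]),
    spawn(tut14, say, [goodbye, 3]).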



-module(area_server).
-export([loop/0]).
loop() ->
    receive
        {rectangle, Width, Ht} ->
            io:format("Area of rectangle is ~p~n",[Width * Ht]),
            loop();
        {circle, R} ->
            io:format("Area of circle is ~p~n", [3.14159 * R * R]),
            loop();
        Other ->
            io:format("I don't know what the area of a ~p is ~n",[Other]),
            loop()
    end.

We can create a process which evaluates loop/0 in the shell:

Pid = spawn(area_server,loop,[]).
Pid ! {rectangle, 6, 10}.
Pid ! {circle, 23}.
Pid ! {triangle,2,4,5}.

4. Erlang-style processes or an event-based model for actors ( http://lambda-the-ultimate.org/node/1615 )
( http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf )

Message passing
Each process has its own input queue for messages it receives. New messages received are put at the end of the queue. When a process executes a receive, the first message in the queue is matched against the first pattern in the receive, if this matches, the message is removed from the queue and the actions corresponding to the pattern are executed.
However, if the first pattern does not match, the second pattern is tested, if this matches the message is removed from the queue and the actions corresponding to the second pattern are executed. If the second pattern does not match the third is tried and so on until there are no more patterns to test. If there are no more patterns to test, the first message is kept in the queue and we try the second message instead. If this matches any pattern, the appropriate actions are executed and the second message is removed from the queue (keeping the first message and any other messages in the queue). If the second message does not match we try the third message and so on until we reach the end of the queue. If we reach the end of the queue, the process blocks (stops execution) and waits until a new message is received and this procedure is repeated.
Of course the Erlang implementation is "clever" and minimizes the number of times each message is tested against the patterns in each receive.
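
The queue-scanning rule described above can be seen in a small experiment. This is only a sketch: the module name mailbox_demo and the message shapes {low, N} and {high, N} are invented for the illustration. It runs in a freshly spawned process so that no unrelated messages are sitting in the mailbox.

```erlang
-module(mailbox_demo).
-export([run/0]).

%% Run the experiment in a fresh process and return its result.
run() ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, demo()} end),
    receive
        {Ref, Result} -> Result
    end.

%% Queue three messages to ourselves, then receive them out of order.
demo() ->
    self() ! {low, 1},
    self() ! {low, 2},
    self() ! {high, 3},
    %% Only {high, _} matches here, so the two {low, _} messages are
    %% skipped and stay queued in their original order.
    First = receive {high, H} -> {high, H} end,
    %% These patterns match anything, so the oldest queued message wins.
    Second = receive M2 -> M2 end,
    Third = receive M3 -> M3 end,
    [First, Second, Third].
```

mailbox_demo:run() returns [{high,3},{low,1},{low,2}]: the third message is taken first because it is the first one that matches, and the skipped messages are delivered later in arrival order.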
V. Erlang resources
Open Source Erlang

Mailing list:
Erlang-questions -- Erlang/OTP discussions

Concurrent Programming in Erlang
Programming Erlang: Software for a Concurrent World

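Notes from implementing XMPP

I have recently been writing a Jabber server. Jabber is an IM system built on the XMPP protocol. Because the RFC standardization process is slow, Jabber builds on XMPP and defines roughly 200 additional protocols of its own, XEP-0001 to XEP-0214. XMPP itself consists mainly of five specifications: RFC 3920 to RFC 3923 and RFC 4622.

My implementation has reached the point where IM clients such as Exodus or Pandion can connect to my own Jabber server. By next week I expect clients to be able to chat through it and subscribe to each other's presence. The further I get, the more complex it feels; it is probably the most complex server I have written so far.

Still, I have one insight to share. Some people ask: what does Jabber look like? If we set aside its role as an IM implementation (delivering messages is also a kind of computing resource), we can think of it as a company. A company offers services to its customers, and when it has a product to build it needs resources and has to recruit staff. Every recruit has to operate according to the company's rules (component plug-ins). Each one reports for duty after being hired, formally starts work, and is assigned tasks according to their expertise, and the cycle repeats. When a new product is needed, the company can recruit resources from new specialties and define new working rules. Everything inside the organization runs on these rules, and for Jabber the rules are its protocol.

That is why I describe XMPP as a distributor of resources and computation: you can build a basic grid computing environment on it and spread computing work across any number of machines. If you ask what it can be used for, the Jabber protocols already define the vast majority of applications (VoIP, audio and video ...), and any computing resource can be plugged in afterwards. What Google Talk implements is probably less than 1/20 of all of Jabber, which shows how large its scale and extensibility are. In short, think of it as an organized company; companies just come in different sizes. XMPP's distribution of work and resources really does resemble this description. Try implementing it yourself sometime to get a feel for it!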

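A JavaScript minification tool


This is a JavaScript minification tool. It works by stripping unnecessary line breaks, whitespace, and so on from the JavaScript source. The site also offers minifiers for PHP, .NET, and others. Minifying reduces network bandwidth use. Look at Google's home page: Google minifies not only its JS but its HTML as well, which saves a great deal of the bandwidth spent downloading the page.

Google also gzip-compresses the pages it sends, which I do not quite understand. Using caching actually saves more than gzip does (HTTP headers: Cache-Control, Expires, If-Modified-Since, ETag ..., especially for a home page that rarely changes). However, IE 6 has a bug: when a page is served with gzip, the headers above stop working (Firefox does not have this problem). So this is the part that puzzles me.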


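This morning every network link to mainland China seemed to be down. I tried to reach http://lukeshei.javaeye.com/ and could not connect. At first I assumed only the JavaEye server had crashed, but then I found that every site on the other side of the strait was unreachable. Has HiNet's undersea cable to the mainland been cut again?

PS: connectivity to mainland China seems to have been restored (2007/04/15, 1:40 pm)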

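It looks like MyBlogLog's developers should install a script debugger too

It looks like MyBlogLog's developers should install a script debugger too (see http://rd-program.blogspot.com/2007/04/js-script-debugger.html). Today several sites I visited that have MyBlogLog installed showed JavaScript errors. I had installed it on my own blog as a test, but removed it once I found the errors.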



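A high-performance web server: lighttpd

A high-performance web server -> http://www.lighttpd.net/

Its event-driven architecture is optimized for a large number of parallel connections (keep-alive) which is important for high performant AJAX applications.

Most mainstream designs today build event-driven servers on epoll/kqueue or AIO. These can handle a large number of network connections in parallel, and unlike Apache's prefork architecture they do not eat so much memory under many connections that the site stops serving. The HEMiDEMi (黑米書籤) site runs lighttpd/1.4.13.

Here is a discussion of the C10K problem (in short, the problem of a single machine handling 10,000 connections): http://www.kegel.com/c10k.html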


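My mentor originally brought up this article during a discussion. It describes some of the problems C++ currently faces and uses Haskell to discuss some possible solutions.

The Next Mainstream Programming Language: A Game Developer's Perspective
by Tim Sweeney (from Epic Games, Unreal)


This reminds me of another article that introduces concurrent programming and Erlang.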



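An email address sticker can be placed on a blog to keep spammers away, but after searching for a long time I could only find sticker generators for Gmail, Yahoo ... mailboxes

PS: see the "Email address sticker service (beta)" link on the right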

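If you write JS you should install Script Debugger

Anyone who writes JavaScript for web pages should install this tool, Script Debugger. It surfaces error messages you would otherwise miss. Because I have it installed, I often see sites with JavaScript errors that their owners are unaware of. The screenshot above is one example!

The tool is a free download from http://www.microsoft.com/downloads/details.aspx?familyid=2f465be0-94fd-4569-b3c4-dffdf19ccd99&displaylang=en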


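Government tenders now all seem to require passing the web accessibility check.
In an actual test, the HTML produced by the (XML+XSL) transformation conforms to the accessibility guidelines. The result: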


from: steve_wang


A Visual Basic-like editor built with Qt. It currently runs on Win32, Solaris, and many of the major Linux distributions. See http://gambas.sourceforge.net/

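JavaScript can leak data from Web 2.0 sites

JavaScript can leak data from Web 2.0 sites (CNET News, 03/04/2007)

A report published on the 2nd by source-code analysis vendor Fortify Software says that JavaScript can be used to harvest data from Web 2.0 sites that are not properly protected.


JavaScript is a key ingredient of Web 2.0, but malicious JavaScript, especially when combined with increasingly common web security flaws, can enable stealthy network attacks.

Through JavaScript hijacking, malicious script can attack the data transport mechanism of the many web applications that also use JavaScript, letting an unauthorized attacker read the confidential data being transferred. Jeremiah Grossman of WhiteHat Security demonstrated this kind of attack last year using a flaw of the same type in Google Gmail. Because Gmail transferred data as unprotected JavaScript, an attacker could steal a Gmail user's contact list.

Fortify examined 12 popular web programming toolkits and found that only one escaped the problem. The company said: "Only DWR 2.0 implements mechanisms for preventing JavaScript hijacking. The other frameworks do not explicitly provide any protection, and do not mention any security concerns in their documentation."

Fortify examined four server-integrated toolkits, Direct Web Remoting (DWR), Microsoft ASP.NET Ajax (Atlas), Xajax, and Google Web Toolkit (GWT), and eight client-side libraries: Prototype, Script.aculo.us, Dojo, Moo.fx, jQuery, Yahoo UI, Rico, and MochiKit.

To defend against JavaScript hijacking, Fortify recommends that Web 2.0 applications include a hard-to-guess parameter in every request so that malicious requests can be rejected. In addition, applications should prevent attackers from using legitimate client features to execute the JavaScript directly. (Translated by 陳智文)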
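
As a rough illustration of the "hard-to-guess parameter" idea, here is a minimal sketch in Erlang. It is not taken from the Fortify report or from any of the frameworks listed; the module name csrf_token_demo and both function names are hypothetical, and it assumes the crypto application is available in the Erlang installation.

```erlang
-module(csrf_token_demo).
-export([new_token/0, check/2]).

%% Generate an unguessable token: 16 random bytes from the crypto
%% application, base64-encoded into a binary. The result may contain
%% '+', '/' and '=', so it would need URL-encoding in a query string.
new_token() ->
    base64:encode(crypto:strong_rand_bytes(16)).

%% Accept a request only if it carries the token issued for the session.
%% Note: =:= is a plain comparison, not a constant-time one.
check(SessionToken, RequestToken)
  when is_binary(SessionToken), is_binary(RequestToken) ->
    case SessionToken =:= RequestToken of
        true -> ok;
        false -> {error, rejected}
    end;
check(_, _) ->
    {error, rejected}.
```

The server would store the value from new_token() with the user's session, embed it in pages it serves, and call check/2 on each data request. A script on another site cannot read the token, so its forged requests come back as {error, rejected}.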

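Comet is only a supporting player; it will not replace Ajax

The main problems with Comet are:
1. It cannot use page compression: the JS stream is still in progress and has not ended, so there is nothing complete to compress or decompress.
3. It can be blocked by certain network equipment; some firewalls and proxies wait until the connection closes and then deliver everything at once.
5. It poses a big challenge to existing servers (which is why I wrote my own web server for the previous example). Connections last a long time but very little data is actually transferred, which works against Apache's prefork architecture. There are a few Apache architectures I have not tested yet (MINA, event MPM); I previously tested the threaded MPM on Win32 and the results were not good.

Today's web servers almost all focus on epoll/kqueue instead of the multi-thread/multi-process architectures used before, and this lets techniques like Comet survive. Fundamentally, though, Comet will only ever be a supporting player; it will never take the lead role or replace Ajax.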

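What is Comet? (a simple example)


To test it, first open two or more windows, then click on/off in any one of them. You will see the other windows switch on and off along with it.
That is about all there is to the Comet technique, but it is not the real point of this test. What I actually want to test is the web server I wrote myself. The server you are connected to is a single-threaded server; in theory it can sustain at least 1,500 connections per second. I am now looking for an environment to test it in. If anyone is interested, send me mail (the address is at http://se2.program.com.tw:9100/alert.htm)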
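
The push behaviour in this demo (one window toggles, every other window follows) can be modelled with a small hub process. This is only a sketch of the idea in Erlang, not the code of the server mentioned above; the module name hub_demo and the message shapes are invented for the illustration, and each subscriber process stands in for one open browser window holding a long-lived connection.

```erlang
-module(hub_demo).
-export([start/0, subscribe/1, toggle/1]).

%% Start the hub with the switch off and no subscribers.
start() ->
    spawn(fun() -> loop(off, []) end).

%% Register the calling process to receive {state, on | off} messages.
subscribe(Hub) ->
    Hub ! {subscribe, self()},
    ok.

%% Flip the switch; the hub pushes the new state to every subscriber.
toggle(Hub) ->
    Hub ! toggle,
    ok.

loop(State, Subs) ->
    receive
        {subscribe, Pid} ->
            loop(State, [Pid | Subs]);
        toggle ->
            New = flip(State),
            lists:foreach(fun(Pid) -> Pid ! {state, New} end, Subs),
            loop(New, Subs)
    end.

flip(off) -> on;
flip(on) -> off.
```

In a real Comet server each subscriber would write the pushed state down its open HTTP connection; here the subscriber simply finds {state, on} or {state, off} in its mailbox.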


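Is Comet the next generation after Ajax?


Actually, there is one thing this article does not mention: an iframe cannot cross domains, just like
Ajax. If you connect through script src='..' and use JSON, then

Also, an architecture like Apache prefork cannot serve Comet at scale. You need a single-threaded server such as LiteSpeed to handle a large number of long-lived connections, and this is where a model like yaws shows its strength.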
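
For comparison with prefork, here is a minimal sketch of the process-per-connection style that Erlang servers are typically built around: one lightweight Erlang process per connection instead of one OS process. It is an echo server written for illustration only, not code from yaws; the module name echo_demo is hypothetical, and it uses only the standard gen_tcp calls.

```erlang
-module(echo_demo).
-export([start/1]).

%% Listen on Port (0 lets the OS pick one) and accept in a separate
%% process. The caller owns the listen socket and must stay alive.
start(Port) ->
    Opts = [binary, {active, false}, {reuseaddr, true}],
    {ok, Listen} = gen_tcp:listen(Port, Opts),
    spawn(fun() -> accept_loop(Listen) end),
    {ok, Listen}.

%% Hand every accepted connection to its own lightweight process.
accept_loop(Listen) ->
    case gen_tcp:accept(Listen) of
        {ok, Socket} ->
            Pid = spawn(fun() -> echo(Socket) end),
            %% Transfer socket ownership; ignore failure if the
            %% handler has already finished.
            _ = gen_tcp:controlling_process(Socket, Pid),
            accept_loop(Listen);
        {error, _Reason} ->
            ok
    end.

%% Per-connection process: echo data back until the peer disconnects.
echo(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            _ = gen_tcp:send(Socket, Data),
            echo(Socket);
        {error, _Reason} ->
            gen_tcp:close(Socket)
    end.
```

An idle long-lived connection here costs one blocked Erlang process rather than a whole forked httpd child, which is the property that matters for Comet-style traffic.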


Both Haskell and Erlang are general-purpose functional programming languages, but they also have many differences. Haskell is a lazy, statically typed, purely functional language featuring higher-order functions, polymorphism, type classes, and monadic effects. Erlang is a strict, dynamically typed functional programming language with built-in support for concurrency, communication, distribution, and fault-tolerance. In contrast to Haskell, which arose from an academic initiative, Erlang was developed in the Ericsson Computer Science Laboratory, and has been actively used in industry both within Ericsson and beyond.