The Relationship Between PHP-CGI Processes at 100% CPU and the file_get_contents Function
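On medium and large websites, calling APIs over HTTP is routine. PHP programmers like the simple and convenient file_get_contents("http://example.com/") function for fetching the content returned by a URL. However, if http://example.com/ responds slowly, file_get_contents() just hangs there and, for practical purposes, does not time out: unless you say otherwise, the wait is governed by the default_socket_timeout ini setting, which defaults to 60 seconds.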
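As we know, php.ini has a max_execution_time parameter that sets the maximum execution time of a PHP script. Under php-cgi (php-fpm), however, it does not help here: on Unix-like systems it does not count time spent waiting on stream operations such as a network read. What actually controls the maximum execution time of a PHP script is the following parameter in the php-fpm.conf configuration file: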
    The timeout (in seconds) for serving a single request after which the worker process will be terminated
    Should be used when 'max_execution_time' ini option does not stop script execution for some reason
    '0s' means 'off'
    <value name="request_terminate_timeout">0s</value>
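The default value is 0 seconds, which means the limit is off and a PHP script is allowed to keep running indefinitely. As a result, once every php-cgi process is stuck in file_get_contents(), this Nginx+PHP web server can no longer handle new PHP requests, and Nginx returns "502 Bad Gateway" to users. Changing this parameter to set a maximum execution time for PHP scripts is necessary, but it treats the symptom rather than the cause. Suppose you set it to 30s: if file_get_contents() is slow to fetch pages, each process can be tied up for the full 30 seconds, so 150 php-cgi processes can handle only 150 / 30 = 5 requests per second, and the web server will still struggle to avoid "502 Bad Gateway".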
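The arithmetic behind that figure can be sketched as a small helper. This is only an illustration of the worst case described above; the function name max_requests_per_second() is made up for this example and is not part of PHP or php-fpm.

```php
<?php
/**
 * Worst-case throughput when every worker is held for the full timeout.
 * Hypothetical helper for illustration; not part of PHP or php-fpm.
 *
 * @param int   $workers           number of php-cgi worker processes
 * @param float $secondsPerRequest time each request occupies a worker
 */
function max_requests_per_second(int $workers, float $secondsPerRequest): float
{
    if ($workers < 0 || $secondsPerRequest <= 0) {
        throw new InvalidArgumentException('workers must be >= 0 and secondsPerRequest > 0');
    }
    // Each worker completes one request every $secondsPerRequest seconds.
    return $workers / $secondsPerRequest;
}

// The case from the text: 150 workers, each blocked for 30 seconds.
echo max_requests_per_second(150, 30), "\n"; // prints 5
```

With a 1-second timeout on the outgoing HTTP call instead, the same 150 workers could in principle serve up to 150 such requests per second, which is why bounding the call itself matters more than bounding the whole script.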
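The only way to solve the problem at its root is for PHP programmers to drop the habit of calling file_get_contents("http://example.com/") directly. Instead, make a small change and add a timeout, issuing the HTTP GET request as shown below. If that feels like too much trouble, you can wrap the following code in a function of your own.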
    <?php
    $ctx = stream_context_create(array(
        'http' => array(
            'timeout' => 1 // set a timeout, in seconds
        )
    ));
    // Second argument is use_include_path; the context goes third.
    file_get_contents("http://example.com/", false, $ctx);
    ?>
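As suggested above, the snippet can be wrapped in a function. The following is a minimal sketch, not a definitive implementation: the name http_get_with_timeout(), its default of 1 second, and the choice to return false on failure are all assumptions made for this example. It uses only stream_context_create() and file_get_contents() with the documented 'timeout' HTTP context option.

```php
<?php
/**
 * Fetch a URL with HTTP GET, giving up after $timeoutSeconds.
 * Hypothetical wrapper for illustration; name and defaults are assumptions.
 *
 * @param string $url            URL to fetch
 * @param float  $timeoutSeconds timeout in seconds, must be positive
 * @return string|false          response body, or false on failure or timeout
 */
function http_get_with_timeout(string $url, float $timeoutSeconds = 1.0)
{
    if ($timeoutSeconds <= 0) {
        throw new InvalidArgumentException('timeoutSeconds must be positive');
    }

    $ctx = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => $timeoutSeconds, // timeout, in seconds
        ),
    ));

    // "@" suppresses the warning PHP raises on failure;
    // the caller is expected to check the return value against false.
    return @file_get_contents($url, false, $ctx);
}
```

A call such as http_get_with_timeout("http://example.com/", 1.0) then returns either the response body or false, so the caller can fall back to a cached value or an error page instead of holding a php-cgi process while a slow upstream responds.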
Of course, this is not the only reason a php-cgi process can hit 100% CPU. So how do you determine that file_get_contents() is the cause?
First, use the top command to find the php-cgi processes with high CPU usage.
    top - 10:34:18 up 724 days, 21:01,  3 users,  load average: 17.86, 11.16, 7.69
    Tasks: 561 total,  15 running, 546 sleeping,   0 stopped,   0 zombie
    Cpu(s):  5.9%us,  4.2%sy,  0.0%ni, 89.4%id,  0.2%wa,  0.0%hi,  0.2%si,  0.0%st
    Mem:   8100996k total,  4320108k used,  3780888k free,   772572k buffers
    Swap:  8193108k total,    50776k used,  8142332k free,   412088k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    10747 www       18   0  360m  22m  12m R 100.6 0.3    0:02.60 php-cgi
    10709 www       16   0  359m  28m  17m R 96.8  0.4    0:11.34 php-cgi
    10745 www       18   0  360m  24m  14m R 94.8  0.3    0:39.51 php-cgi
    10707 www       18   0  360m  25m  14m S 77.4  0.3    0:33.48 php-cgi
    10782 www       20   0  360m  26m  15m R 75.5  0.3    0:10.93 php-cgi
    10708 www       25   0  360m  22m  12m R 69.7  0.3    0:45.16 php-cgi
    10683 www       25   0  362m  28m  15m R 54.2  0.4    0:32.65 php-cgi
    10711 www       25   0  360m  25m  15m R 52.2  0.3    0:44.25 php-cgi
    10688 www       25   0  359m  25m  15m R 38.7  0.3    0:10.44 php-cgi
    10719 www       25   0  360m  26m  16m R  7.7  0.3    0:40.59 php-cgi
Pick the PID of one of the php-cgi processes at 100% CPU and trace it with the following command:
    strace -p 10747
If the screen shows:
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
    select(7, [6], [6], [], {15, 0})        = 1 (out [6], left {15, 0})
    poll([{fd=6, events=POLLIN}], 1, 0)     = 0 (Timeout)
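then you can conclude that file_get_contents() is causing the problem. In this trace, select() returns at once because socket fd 6 is ready for writing (the full 15 seconds are still left), while poll() with a zero timeout finds nothing to read, so the process spins in this loop waiting for the remote server's response and burns CPU as it does so.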