Maksim Moshkov. On building a resilient web server
---------------------------------------------------------------
Theses for the WebClub seminar
Date: 17 Nov 1999
---------------------------------------------------------------
What a server can do
An ordinary Intel host running Linux or FreeBSD with Apache can
serve 100-150 static requests per second.
That is 7 million requests per day, which corresponds to 200 thousand visitors
per day. The traffic this generates is 1-2M per second.
The question is: do you need more?
300 MaxClients corresponds to 3M requests per day, 60 thousand people.
400M RAM
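The per-day and per-second figures above are linked by simple arithmetic. A minimal sketch in Python; the function names are hypothetical, and spreading the load evenly over the day is an assumption made only for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_rate(requests_per_day):
    """Average requests per second if the daily load were spread evenly (assumption)."""
    return requests_per_day / SECONDS_PER_DAY

def requests_per_visitor(requests_per_day, visitors_per_day):
    """Average number of requests one visitor generates."""
    return requests_per_day / visitors_per_day

# Figures quoted in the text: 7 million requests and 200 thousand visitors per day.
print(round(average_rate(7_000_000)))            # average requests per second
print(requests_per_visitor(7_000_000, 200_000))  # requests per visitor
```

The average comes out somewhat below the 100-150 per second peak figure, which fits a load that is uneven over the day.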
Example: Lenta.Ru on election day.
400Mb RAM, MaxClients 512, Timeout 120, CacheTime 900
500,000 cgi-html requests + 5 million image files
at peak hour 35,000 requests, 540 simultaneous httpd processes, load 8-15
no swapping occurred
Disk: SCSI only.
That is all you can ask of the machine.
And NO swap! (Meaning that a web server should have a swap area,
but it must stay empty.)
ALWAYS SET THE Last-Modified ATTRIBUTE IN CGI SCRIPT OUTPUT
- a document without a timestamp is not stored in the local
cache and is fetched again every time it is viewed
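As an illustration of the rule, here is a minimal sketch of a CGI script that sends Last-Modified. It is written in Python rather than in the perl discussed later; the function name cgi_headers and the choice of timestamp are assumptions, not the author's code:

```python
import sys
import time
from email.utils import formatdate

def cgi_headers(last_modified_ts, content_type="text/html"):
    """Build CGI response headers with a Last-Modified line in HTTP date format."""
    return (
        f"Content-Type: {content_type}\r\n"
        f"Last-Modified: {formatdate(last_modified_ts, usegmt=True)}\r\n"
        "\r\n"
    )

# Hypothetical usage: the page counts as rebuilt "now". A real script would
# rather use the modification time of its data, e.g. os.path.getmtime(data_file).
sys.stdout.write(cgi_headers(time.time()))
sys.stdout.write("<html><body>generated page</body></html>\n")
```

Taking the timestamp from the underlying data instead of the current time lets caches keep the page until the data actually changes.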
Rename your CGI script directory from cgi-bin to
something else
- Proxy servers do not cache URLs of the form
http://host.name/cgi-bin/file/name.txt and have to go back
to your server every time.
Always set the Last-Modified field in Russian Apache with
automatic charset detection
+ Yes, if you leave this field unset, files in the wrong encoding
will not get stuck on proxy servers.
- But think how much that strains all the other users (more than 95% of them),
and the web server itself...
CharsetDisableForcedExpires on
CacheNegotiatedDocs
Do not use automatic redirect by charset in Russian Apache
CharsetNormalizeToUrl none
CharsetAutoRedirect koi8-r none
CharsetAutoRedirect windows-1251 none
Store documents on the server in the windows-1251 encoding
CharsetSourceEnc koi8-r
+ Since 95% of visitors live in this encoding, the server will not
need to recode documents for them.
- Russian Apache _always_ recodes the document. Even win to win.
The server drops Last-Modified for files with SSI, but this can be cured
SSI creates extra load. It is better to enable it only for a
separate extension, .shtml, and leave purely static .htm and .html alone.
The server config has a directive that restores Last-Modified for SSI files
XBitHack full
and then run
chmod 755 *.shtml
Do not use frames
They complicate programming and add extra requests
(a two-frame page is 3 files instead of one!)
Do not make splash pages; put as much information as possible on the front page
An extra click means lost visitors and shallower browsing.
NO ANIMATED GIFS
- Because of a bug in Netscape Navigator, it keeps re-requesting an
animated GIF over the network, sending a request to the server every 10-15 seconds.
Imagine that twenty Netscape browsers have opened your page with 10 animated GIFs
and are just looking at it without clicking anything.
The Netscapes will start sending your server If-Modified-Since requests
on their own at a rate of up to 20 requests per second.
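The rate quoted above follows from the numbers in the example. A small sketch, assuming every open browser re-requests every GIF once per interval (polling_rate is a hypothetical name):

```python
def polling_rate(browsers, gifs_per_page, interval_seconds):
    """Requests per second when each browser re-requests each GIF once per interval."""
    return browsers * gifs_per_page / interval_seconds

# 20 idle Netscape windows, 10 animated GIFs each.
print(polling_rate(20, 10, 10))            # re-request every 10 seconds
print(round(polling_rate(20, 10, 15), 1))  # re-request every 15 seconds
```

With a 10-second interval this gives the 20 requests per second from the text; at 15 seconds it is still about 13.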
Extra images = lost money
+ Many hosting providers do not charge for traffic, so the size of the graphics
need not be counted.
- But they often meter _incoming_ foreign traffic.
Remember that the HTTP request itself from a foreign visitor is _incoming_.
It is only 200-300 bytes. But if every page carries
20 decorative GIF files, one HTML click from abroad costs
4Kb of incoming traffic. Multiply by 10 thousand pages a day and by
30 days: 1.2Gb of incoming foreign traffic. That is 100-200 bucks, just like that.
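The traffic estimate above can be reproduced in a few lines. A sketch with a hypothetical function name; the inputs are the figures from the text:

```python
def inbound_bytes_per_month(images_per_page, request_bytes, pages_per_day, days=30):
    """Incoming bytes spent on image requests alone over one month."""
    return images_per_page * request_bytes * pages_per_day * days

low = inbound_bytes_per_month(20, 200, 10_000)   # 200-byte requests
high = inbound_bytes_per_month(20, 300, 10_000)  # 300-byte requests
print(low / 1e9, high / 1e9)  # gigabytes (decimal)
```

The lower bound matches the 1.2Gb in the text; with 300-byte requests the same site would pay for 1.8Gb.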
Extra images = slower response and lost visitors
- Many additional requests for graphics clog the incoming queue and
exhaust MaxClients; higher-priority requests for ordinary html
have to wait in the common queue, delaying the response by 10-30 seconds.
+ Move all graphics to a separate port and hang a separate "thin"
web server on it that can only serve static
files and nothing else. It has a shorter TimeOut and eats less
virtual memory.
+ khttpd for Linux runs as a kernel module, with minimal overhead.
http://www.fenrus.demon.nl/index.html
+ thttpd handles up to 2000 requests/sec with no limit on the number of connections;
under FreeBSD it powers images.rambler.ru, under Linux it is buggy
http://www.acme.com -> freeware
Mathopd (top.list.ru runs on it)
+ Thoughts and advice on high-performance http servers:
Disable .htaccess in user directories
Set
AllowOverride None
otherwise, on opening any document, the server will walk through
all the parent directories one by one looking for a .htaccess in each
- A robot gone mad collects an incredible number of 404 errors,
looping in them forever
Do not implement the 404 page as a cgi script
Do not make the 404 page "pretty", with GIFs and pointers to other sections
robots.txt
Be sure to create a robots.txt file, because it is the most requested
document on the server, and otherwise it produces a mass of 404s (see above), especially
if the 404 is a cgi script
Sensible robots obey the prohibitions in robots.txt
# Say NO to offline downloaders
User-Agent: DISCo Pump, Wget, WebZIP, Teleport Pro, WebSnake, Offline Explorer, Web-By-Mail
Disallow: /
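How a robot reads such a file depends on its parser. As a hedged illustration, the sketch below feeds two variants to Python's standard urllib.robotparser: one with a User-agent line per robot and one with a comma-separated list. The helper name blocked, the two sample files and the example.com URL are assumptions, and the result says nothing about how any particular downloader behaves:

```python
from urllib.robotparser import RobotFileParser

def blocked(robots_txt, user_agent, url="http://example.com/index.html"):
    """True if this parser considers the URL forbidden for the given robot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

ONE_PER_LINE = """\
User-agent: Wget
User-agent: WebZIP
Disallow: /
"""

COMMA_LIST = """\
User-agent: Wget, WebZIP
Disallow: /
"""

print(blocked(ONE_PER_LINE, "Wget/1.21"))    # True
print(blocked(COMMA_LIST, "Wget/1.21"))      # False
print(blocked(ONE_PER_LINE, "Mozilla/4.7"))  # False
```

This parser treats the comma-separated line as one long robot name, so listing each robot on its own User-agent line is the safer form.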
Access control via httpd.conf
This example blocks access to our .zip files when they
are linked from outside rather than from our own pages.
SetEnvIfNoCase Referer lib\.ru internal_referer
SetEnvIfNoCase User-Agent Teleport internal_referer
SetEnvIfNoCase User-Agent Vampire internal_referer
SetEnvIfNoCase User-Agent ReGet internal_referer
SetEnvIfNoCase User-Agent GetRight internal_referer
SetEnvIfNoCase User-Agent Wget internal_referer
<Files ~ "\.zip$">
ErrorDocument 403 http://lib.ru/books/index.htm
order deny,allow
deny from all
allow from env=internal_referer
</Files>
It can be extended in various directions: handle different
User-Agents differently, check the client IP and much more. The main point is that all of this
is done not in a cgi script but at the level of the base httpd, so it costs
the server little.
If a robot persists, it gets destroyed
route add -host 123.456.789.1 gw localhost
With mod_rewrite it goes something like this, by conditions:
RewriteCond %{HTTP_USER_AGENT} Teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCoFinder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} spider [NC]
all requests from known robots for dynamic pages are redirected
to a static stub
RewriteRule ^/news.html? /static_index.html [R]
NC = No Case
OR = combine with the next condition by OR
R = redirect
L = Last rule
For example, redirecting every external referrer that points at the archives to the site's front page
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)?lib\.ru/ [NC]
RewriteBase /home/lib-www/docs/
RewriteRule ^arc/.*\.(zip|rar)$ http://www.lib.ru/ [R]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?lib\.ru/ [NC]
RewriteBase /home/lib-www/docs/
RewriteRule ^index2\.html$ http://www.lib.ru/ [R]
Or like this:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://allowed-site1.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.allowed-site1.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://allowed-site2.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.allowed-site2.com.*$ [NC]
RewriteRule ^.*$ http://site.com/another_pic.gif [R,L]
Or even like this:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domen.ru/.*$ [NC]
RewriteRule \.(gif|jpg)$ http://www.domen.ru/fuck_off.gif [R,L]
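To check which requests the last rule set catches, its logic can be modelled outside Apache. A rough sketch using Python's re module, which is close to but not identical to Apache's regex engine; is_redirected and the sample paths and referers in the usage note are hypothetical:

```python
import re

# The same patterns as in the RewriteCond / RewriteRule lines above.
OWN_SITE = re.compile(r"^http://(www\.)?domen.ru/.*$", re.IGNORECASE)  # [NC]
IMAGE = re.compile(r"\.(gif|jpg)$")

def is_redirected(path, referer):
    """True when the rule would fire: an image request with a non-empty foreign Referer."""
    if not IMAGE.search(path):
        return False          # RewriteRule pattern does not match
    if referer == "":
        return False          # first RewriteCond: empty Referer is let through
    return OWN_SITE.search(referer) is None  # second RewriteCond

print(is_redirected("/pic/logo.gif", "http://other.example/page.html"))
print(is_redirected("/pic/logo.gif", "http://www.domen.ru/index.html"))
print(is_redirected("/pic/logo.gif", ""))
```

Only the first call reports a redirect: images linked from a foreign page are replaced, while the site's own pages and requests without a Referer are served normally.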
RewriteEngine on
RewriteCond %{REMOTE_ADDR} !^81.19.69.21$
RewriteRule ^(/n/.*) https://lenta.ru$1 [R,L]
RewriteEngine on
RewriteCond %{REMOTE_ADDR} !^81.19.69.28
RewriteCond %{REMOTE_ADDR} !^81.19.68.[6-9]
RewriteCond %{REMOTE_ADDR} !^81.19.68.1[012].
RewriteRule ^(/N/.*) https://lenta.ru$1 [R,L]
# Allow from 81.19.68.64/255.255.255.224
Do not put banners at the very top
- A banner at the top takes 1-2 of the 4 parallel requests, and as a result it loads first,
holding back your own site's images
+ in the banner's img src link, use an IP instead of a hostname: you save
the visitor the dns lookup, which takes 2-30 seconds.
A delay in loading "your" content keeps extra httpd processes alive on _your_ side
Under no circumstances make a unique URL for the banner with an SSI virtual cgi
include. Because ps -axf will show you:
12858 ? S 0:00 \_ /usr/local/apache/sbin/httpd
12859 ? S 0:00 \_ /usr/local/apache/sbin/httpd
12862 ? S 0:00 \_ /usr/local/apache/sbin/httpd
13097 ? Z 0:00 | \_ (rand.cgi <zombie>)
13098 ? Z 0:00 | \_ (rb2 <zombie>)
13103 ? Z 0:00 | \_ (rb2 <zombie>)
13104 ? Z 0:00 | \_ (c4.pl <zombie>)
13105 ? Z 0:00 | \_ (random.cgi <zombie>)
12863 ? S 0:00 \_ /usr/local/apache/sbin/httpd
12868 ? S 0:00 \_ /usr/local/apache/sbin/httpd
Instead, use a variable: the date
<!--#config timefmt="%H%w%e%M%S"-->
<a href=http://rb2.design.ru/cgi-bin/href/nit?<!--#echo var="date_local"-->
target="_top">
<!--#config timefmt="%M%H%S%I%e"-->
<a href=http://www1.reklama.ru/cgi-bin/href/nit?<!--#echo var="date_local"-->
target=_top>
<img src=http://www1.reklama.ru/cgi-bin/banner/nit?<!--#echo var="date_local"-->
width=468 height=60 border=0 vspace=10
alt="www.reklama.ru. The Banner Network." ismap></a>
Tx3 suggests fetching the banner internally: that is an extra cgi script,
and the script then calls the banner engine, which delays
html generation and therefore leaves more httpd processes hanging in memory.
200 thousand per day = 3 scripts per second
30 static per second =
suexec, which runs cgi scripts under the user's id, does improve security, but
it doubles the number of fork+exec calls for every cgi script launch. Avoid it
as far as possible.
Watch what is compiled into httpd. Yes, of course code in unix is reentrant,
but modperl and php3 have huge areas of initialized data, and all of that
eats virtual memory and adds time to every request; even just
checking the hooks the modules hang on takes time. Is it worth
serving 100 static http requests, for which
the single default module is enough, with a 5M monster httpd that has
modperl, php3 and ssl compiled in, when only 2-5 requests out of those
100 need them?
Of course the best language for writing cgi scripts is perl. But it is merciless to
the server.
Perl scripts are compiled on every call. Compilation speed varies a lot,
but it is still roughly 0.1 sec per 20Kb of perl code. The moral:
even without counting the run time of the program itself, a 60Kb script can
execute no more than 2-3 times per second!
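The limit follows directly from the compile-time estimate. A sketch using the rough figure of 0.1 sec per 20Kb from the text; the names are hypothetical, and the script's own run time is ignored, so the result is an upper bound:

```python
COMPILE_SECONDS_PER_KB = 0.1 / 20  # the text's rough estimate

def max_runs_per_second(script_kb):
    """Upper bound on sequential runs per second, counting compilation time only."""
    return 1 / (script_kb * COMPILE_SECONDS_PER_KB)

print(round(max_runs_per_second(60), 1))  # 60Kb script
print(round(max_runs_per_second(20), 1))  # 20Kb script
```

A 60Kb script spends about 0.3 sec compiling, so about 3 runs per second is the ceiling before it does any real work.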
How do you get out of this?
Split the big script into many small parts and load them
only when a given piece of code is needed for the current
run. In perl this is done with the "require" operator. (It is a smart analogue of
include: smart because require is an executable statement, so it
pulls in additional code only when it is actually called for, and a repeated
require does NOT recompile it.)
Precompiling perl. Perl2C. modperl. FastCGI...
Caching.
You can save the script's output in a cache file and, on repeated requests,
serve it instead of generating it again.
By subjective impression, it is better to serve the cache file not from the script itself
open(IN, $file); while(<IN>){print;}
but with an internal redirect
print "Location: http:$file\n\n";
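The cache-file idea can be sketched as follows. This is an illustration in Python, not the author's perl; cached_output, its parameters and the freshness rule based on file age are all assumptions:

```python
import os
import time

def cached_output(cache_path, max_age_seconds, generate):
    """Return the cached page if it is fresh enough, otherwise regenerate and store it."""
    try:
        fresh = time.time() - os.path.getmtime(cache_path) < max_age_seconds
    except OSError:  # the cache file does not exist yet
        fresh = False
    if fresh:
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    text = generate()
    # Write to a temporary file and swap it in, so readers never see a partial page.
    # A single temp name is used here; concurrent writers would need unique names.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp_path, cache_path)
    return text
```

A script would call it as cached_output("/some/cache/page.html", 900, build_page), where the path, the 900-second lifetime and build_page are placeholders.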
Caching with squid in proxy-accelerator mode
Probably the best solution if you need to speed up a cgi-driven server. The speed
of a squid accelerator and the load it puts on the machine match those of an httpd serving
static html and image files. And it cuts the load on the cgi engine by a factor of
2-3.
Squid can handle If-Modified-Since and REGET requests for the script's
output, which, understandably, is no fun to implement in the script yourself.
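To show what a script would have to do on its own if no accelerator handled it, here is a minimal sketch of the If-Modified-Since check alone (REGET is not covered). The function name status_for and the rule of sending the full page on a malformed header are assumptions:

```python
from email.utils import formatdate, parsedate_to_datetime

def status_for(if_modified_since, last_modified_ts):
    """Return 304 if the client's copy is still current, otherwise 200."""
    if not if_modified_since:
        return 200  # unconditional request
    try:
        client_ts = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):
        return 200  # unparseable header: send the full document
    return 304 if int(last_modified_ts) <= client_ts else 200

header = formatdate(1000, usegmt=True)  # what a browser would echo back
print(status_for(header, 1000))  # page unchanged since the client's copy
print(status_for(header, 2000))  # page changed afterwards
```

In a real CGI script the header value would come from the HTTP_IF_MODIFIED_SINCE environment variable.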
The machines stood face to face so that the sliding
coffee-cup holder of one pressed the Reset button of the other,
and vice versa.
The previous incarnation of my lib.ru lived in one case with another
machine. Inside they had a home-made circuit, soldered on the knee, that
let each one power-cycle its neighbor.
In general, though, an ordinary smart UPS suits this kind of thing best. And
the UPS com port should be wired either to a Cisco, since those do not die, or to a
modem that you dial into from home.
From: Exler
Since the security guard walked through the premises about three times a night checking for
fire (he came into the room, turned on the light, looked around, turned off
the light and left), a button was attached to the light switch for the night, and
pressing the switch automatically reset the machine.
Configuration parameters that affect speed.
Options FollowSymLinks - lets the server skip symlink checks
AllowOverride None - lets the server skip looking for .htaccess in every subdirectory
Very important! On a heavily visited server: 1. Move the images to a
dedicated server (port), or a separate server process, and turn off keep-alive
KeepAlive Off
because keep-alive is used only for fetching images, while for html
the browser opens a new connection anyway. With KeepAlive on, each server process that has served
a request hangs around in memory for another 15 seconds waiting to see whether a new request for
an image arrives, increasing the number of processes about 4 times.
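The effect on the process count can be estimated with Little's law: processes in use = connection rate x time each process is held. In the sketch below only the 15-second wait comes from the text; the rate of 10 connections per second and the 5 seconds of service time are assumed values picked for illustration, and connection reuse is ignored:

```python
def busy_processes(connections_per_second, service_seconds, keepalive_idle_seconds=0):
    """Little's law: processes in use = arrival rate x time each one is held."""
    return connections_per_second * (service_seconds + keepalive_idle_seconds)

without_keepalive = busy_processes(10, 5)
with_keepalive = busy_processes(10, 5, keepalive_idle_seconds=15)
print(without_keepalive, with_keepalive, with_keepalive / without_keepalive)
```

Under these assumptions the idle wait alone quadruples the number of processes, which is the order of magnitude the text reports.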
Moving a server and changing its IP address
The old IP address stays in DNS caches for quite a long time (officially up to 8 hours,
in practice up to two days and more). All that time many clients go to
the old IP, where nobody is waiting for them any more: visitor losses while
DNS "settles" reach 20 to 60%.
The way out: a two-step IP change using redirects.
Step 1. Two days before the real IP change, bring up a virtual
stub web server on the new IP that answers to www.washserver.ru, and in its
config put a redirect of all requests to http://washserver.ru
httpd.conf on the new IP address:
<VirtualHost new-IP:*>
ServerName www.washserver.ru
Redirect / http://washserver.ru/
</VirtualHost>
DNS zone of the washserver.ru domain:
@ IN A old-IP
www IN A new-IP
After that, set the new IP in DNS for www.washserver.ru
and leave washserver.ru on the old one.
Visitors who arrive at www.washserver.ru will be redirected to
washserver.ru, so we lose nobody, and we wait 2 days for the new IP
of www.washserver.ru to propagate
Step 2, two days later. The real IP change of the server. At the same time:
On the old IP bring up a virtual stub web server that
answers to washserver.ru and redirects all requests to
http://www.washserver.ru
In DNS point washserver.ru at the new IP
Visitors who arrive at washserver.ru via the old IP will be redirected to
www.washserver.ru with the new IP, so we lose nobody. And after 2 days
the new IP for the name washserver.ru will have propagated through DNS and the redirect can be
removed.
httpd.conf on the old IP address:
<VirtualHost old-IP:*>
ServerName washserver.ru
Redirect / http://www.washserver.ru/
</VirtualHost>
DNS zone of the washserver.ru domain:
@ IN A new-IP
www IN A new-IP
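The claim that nobody is lost can be checked with a toy model of the two steps. This is only a sketch: the phase names, the dictionaries and reaches_site are hypothetical, a client is reduced to the IP its DNS cache holds for each name, and the real site is assumed to answer to both names:

```python
OLD, NEW = "old-IP", "new-IP"
BARE, WWW = "washserver.ru", "www.washserver.ru"
SITE = "site"  # marker: the real site answers here

# For each phase: what each IP does for each Host name
# (SITE, or the name it redirects to; a missing name means nobody answers).
PHASES = {
    "direct": {NEW: {BARE: SITE, WWW: SITE}, OLD: {}},         # plain one-step move
    "step1": {OLD: {BARE: SITE, WWW: SITE}, NEW: {WWW: BARE}},
    "step2": {NEW: {BARE: SITE, WWW: SITE}, OLD: {BARE: WWW}},
}

def reaches_site(name, dns_cache, phase, max_hops=3):
    """Follow redirects using the IPs this client has cached for each name."""
    for _ in range(max_hops):
        action = PHASES[phase][dns_cache[name]].get(name)
        if action == SITE:
            return True
        if action is None:
            return False
        name = action  # redirected to the other host name
    return False

all_stale = {BARE: OLD, WWW: OLD}    # client still sees the original DNS
www_moved = {BARE: OLD, WWW: NEW}    # client sees www on the new IP
all_moved = {BARE: NEW, WWW: NEW}    # client sees the final DNS

step1_ok = all(reaches_site(n, c, "step1") for n in (BARE, WWW) for c in (all_stale, www_moved))
step2_ok = all(reaches_site(n, c, "step2") for n in (BARE, WWW) for c in (www_moved, all_moved))
print(step1_ok, step2_ok, reaches_site(BARE, all_stale, "direct"))
```

In this model every client reaches the site during both steps, while a plain one-step move loses the clients whose cache still holds the old IP.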
Last-modified: Tue, 12 Apr 2005 05:24:00 GMT