随意吧: February 2014

Monday, February 10, 2014

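Configuring the Management IP Address of a Cisco 3020 in an HP Blade Server

This management IP gave me trouble for a long time. Following the method HP gave me, I changed it in the C7000 management interface several times, but the change never took effect, which was a headache. The management IP I had set earlier was this:

But the OA management interface still showed http://0.0.0.0, which was very frustrating.

Later, with some spare time, I poked around on the switch and discovered that it also has a FastEthernet 0 interface. This port is not visible on the switch's front panel and is not one of the 16 built-in gigabit ports. It shows up in the switch's interface list:

I then assigned an IP to this FastEthernet port, saved the configuration and rebooted, and it worked. (PS: I could not find detailed information in HP's documentation; I did not actually want to solve this purely by trial and error.)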

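nginx as an image cache server

A large volume of images calls for a dedicated image server. Since nginx can replace squid as a caching proxy, I found some time today to try it, and it works quite well. Here is the process:

Build: add the cache_purge module, which is used to purge cache entries.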

./configure --prefix=/home/ngx_openresty --with-http_stub_status_module \
--add-module=/root/ngx_cache_purge-master/ --with-pcre=/home/tao.li1/pcre-8.30

The main part of the nginx configuration:

    proxy_temp_path /dev/shm/img_temp;
    proxy_cache_path /dev/shm/img_cache levels=1:2 keys_zone=pic_cache:500m inactive=1d max_size=10g;
    server {
        listen       82;
        server_name  sample.com;
        location / {
                proxy_cache pic_cache;
                proxy_cache_valid 200 304 24h;
                proxy_cache_key $host$uri$is_args$args;
                proxy_set_header Host  $host;
                proxy_set_header X-Forwarded-For  $remote_addr;
                proxy_pass http://127.0.0.1:88;
                expires      1d;
                }
        location ~ /purge(/.*) {
            allow       127.0.0.1;
            deny        all;    # without this, "allow" alone does not restrict access
            proxy_cache_purge    pic_cache   $host$1$is_args$args;
        }
    } 
    server {
        listen 88;
        server_name 127.0.0.1;
        root /diska/htdocs/images/;
        location ~ .*\.(gif|jpg|jpeg|png|bmp|swf|ico)$ {
             expires      10s; 
             #expires      -1;
             access_log logs/88pic.log;
        }
    }

A test with two consecutive requests:

[root@m88 logs]# curl -I 'http://127.0.0.1:82/color.png'
HTTP/1.1 200 OK
Server: ngx_openresty/1.4.3.9
Date: Thu, 02 Jan 2014 03:03:11 GMT
Content-Type: image/png
Content-Length: 892
Connection: keep-alive
Last-Modified: Thu, 12 Apr 2012 08:50:01 GMT
ETag: "4f869739-37c"
Expires: Fri, 03 Jan 2014 03:03:11 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes

[root@m88 logs]# ls -lh
total 16K
-rw-r--r-- 1 root root 530 Jan  2 11:03 88pic.log
-rw-r--r-- 1 root root 527 Jan  2 11:03 access.log
[root@m88 logs]# curl -I 'http://127.0.0.1:82/color.png'
HTTP/1.1 200 OK
Server: ngx_openresty/1.4.3.9
Date: Thu, 02 Jan 2014 03:03:22 GMT
Content-Type: image/png
Content-Length: 892
Connection: keep-alive
Last-Modified: Thu, 12 Apr 2012 08:50:01 GMT
ETag: "4f869739-37c"
Expires: Fri, 03 Jan 2014 03:03:22 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes

[root@m88 logs]# ls -lh
total 16K
-rw-r--r-- 1 root root 530 Jan  2 11:03 88pic.log
-rw-r--r-- 1 root root 701 Jan  2 11:03 access.log

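From the sizes of the access logs we can see that 88pic.log has no new entry for color.png, while the front-end access.log has grown. This shows that the second request was served from the server-side cache. To purge this cache entry, just run: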

[root@m88 logs]# curl -I 'http://127.0.0.1:82/purge/color.png'
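The purge URL is simply the cached URL with /purge placed in front of its path. A minimal sketch of that mapping, assuming the purge location is configured as `location ~ /purge(/.*)` as above; the function name `purge_url` and its `prefix` parameter are hypothetical helpers, not part of nginx or the cache_purge module:

```python
from urllib.parse import urlsplit, urlunsplit


def purge_url(url, prefix="/purge"):
    """Return the purge URL for a cached URL by prefixing its path.

    `prefix` is a hypothetical parameter; it must match the purge
    location configured in nginx.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, prefix + parts.path,
                       parts.query, parts.fragment))


print(purge_url("http://127.0.0.1:82/color.png"))
```

The query string is carried over unchanged, because the cache key in the configuration includes $is_args$args.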

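Now a look at some of the details:

1. The Cache-Control: max-age=86400 that curl receives corresponds to the front-end expires in the nginx configuration; no question there.

2. So how long is the image actually cached on the server? That also depends on the inactive time in proxy_cache_path.

I ran a few tests myself. When the expires on the back end (port 88) is set to -1 or no-cache (that is, do not cache), both the inactive time in proxy_cache_path and the front-end expires have no effect. That is: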

proxy_cache_path /dev/shm/img_cache levels=1:2 keys_zone=pic_cache:500m  max_size=10g;
http {
    server {
        listen 82;
        ......
        
        ......
    }
    server {
        listen 88;
        ......
        
        ......
    }
}

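With the configuration above, the caching you want simply cannot work.

So what exactly is the precedence here? After a few more tests I found that the effective cache lifetime is determined by the smaller of the inactive value in proxy_cache_path and the expires value on the back end (port 88); making them equal is of course best. This is consistent with the nginx documentation, which states that caching parameters set in the upstream response headers take priority over proxy_cache_valid. That is, if the configuration looks like this: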

proxy_cache_path /dev/shm/img_cache levels=1:2 keys_zone=pic_cache:500m  max_size=10g;
http {
    server {
        listen 82;
        ......
                ......
    }
    server {
        listen 88;
        ......
        
        ......
    }
}

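the effective cache lifetime is 10s (the back-end value wins). We can look at the cache file on the server (I simulated a request with curl) and see that the timestamp of the cache file /dev/shm/img_cache/4/39/40aaedc4a0bd8e5d6e507e7eca39b394 has changed: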

[root@m88 logs]# curl -I '127.0.0.1:82/color.png' &&
 ls -l /dev/shm/img_cache/4/39/40aaedc4a0bd8e5d6e507e7eca39b394 &&
 sleep 90 && curl -I '127.0.0.1:82/color.png' &&
 ls -l /dev/shm/img_cache/4/39/40aaedc4a0bd8e5d6e507e7eca39b394

HTTP/1.1 200 OK
Server: ngx_openresty/1.4.3.9
Date: Thu, 02 Jan 2014 06:31:50 GMT
Content-Type: image/png
Content-Length: 892
Connection: keep-alive
Last-Modified: Thu, 12 Apr 2012 08:50:01 GMT
ETag: "4f869739-37c"
Expires: Thu, 02 Jan 2014 06:31:49 GMT
Cache-Control: no-cache
Accept-Ranges: bytes

-rw------- 1 nobody nobody 1267 Jan  2  /dev/shm/img_cache/4/39/40aaedc4a0bd8e5d6e507e7eca39b394
HTTP/1.1 200 OK
Server: ngx_openresty/1.4.3.9
Date: Thu, 02 Jan 2014 06:33:20 GMT
Content-Type: image/png
Content-Length: 892
Connection: keep-alive
Last-Modified: Thu, 12 Apr 2012 08:50:01 GMT
ETag: "4f869739-37c"
Expires: Thu, 02 Jan 2014 06:33:19 GMT
Cache-Control: no-cache
Accept-Ranges: bytes

-rw------- 1 nobody nobody 1267 Jan  2  /dev/shm/img_cache/4/39/40aaedc4a0bd8e5d6e507e7eca39b394

Comparing the two access logs shows:

[root@m88 logs]# ls -lh
total 16K
-rw-r--r-- 1 root root 875 Jan  2 14:33 88pic.log
-rw-r--r-- 1 root root 870 Jan  2 14:33 access.log

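Both logs also recorded two requests. So with expires at 10s, the entry is considered expired and is fetched from the back end again. What if the configuration looks like this: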

proxy_cache_path /dev/shm/img_cache levels=1:2 keys_zone=pic_cache:500m  max_size=10g;
http {
    server {
        listen 82;
        ......
        
        ......
    }
    server {
        listen 88;
        ......
        
        ......
    }
}

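We can read this as: if the cache file under /dev/shm/img_cache is not requested within 10s, it is removed automatically and a fresh copy has to be fetched for the cache, which requires a request to the back end. The effective lifetime is still 10s.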

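3. The parameters of proxy_cache_path:

1) levels sets the directory hierarchy of the cache, which is easy to understand. At most three levels are allowed, each one or two characters wide.

2) keys_zone sets the name and size of the shared memory zone. The size is not a per-process limit: the zone is shared by all nginx worker processes and holds only the cache keys and metadata. According to the nginx documentation, a 1 MB zone can store about 8 thousand keys.

3) inactive is the time after which an entry that has not been accessed is deleted, regardless of its freshness; the default is 10 minutes. Once deleted, the file must be cached again.

4) max_size is the maximum size of the cache, that is, of the directory given, such as /dev/shm/img_cache. When it is exceeded, the cache manager removes the least recently used data.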
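To make the levels parameter concrete: nginx names each cache file after the MD5 hash of the cache key and takes the directory names from the end of that hash. The sketch below reproduces this layout for levels=1:2. The function names are hypothetical, and which key produced the hash seen earlier is not asserted here:

```python
import hashlib
import os


def cache_path_from_digest(root, digest, levels=(1, 2)):
    """Build the on-disk path nginx uses for a given MD5 hex digest.

    Directory names come from the end of the digest: with levels=1:2 the
    first directory is the last character, the second is the two
    characters before it.
    """
    dirs = []
    end = len(digest)
    for width in levels:
        dirs.append(digest[end - width:end])
        end -= width
    return os.path.join(root, *dirs, digest)


def cache_path(root, key, levels=(1, 2)):
    """Hash a cache key with MD5 and map it to its cache file path."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return cache_path_from_digest(root, digest, levels)


print(cache_path_from_digest("/dev/shm/img_cache",
                             "40aaedc4a0bd8e5d6e507e7eca39b394"))
```

For the hash shown in this post this yields the 4/39/ directory pair, which matches the path of the cache file inspected with ls.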

[root@APP ~]# date +%Y-%m-%d
2013-11-01
[root@APP ~]# date -d "-1 month" "+%m "
10
[root@APP ~]# date -s 2013-12-01
Sun Dec 1 00:00:00 CST 2013
[root@APP ~]# date -d "-1 month" "+%m "
11
[root@APP ~]# date -s 2013-12-31
Tue Dec 31 00:00:00 CST 2013
[root@APP ~]# date -d "-1 month" "+%m "
12
[root@APP ~]# date -s 2013-12-30
Mon Dec 30 00:00:00 CST 2013
[root@APP ~]# date -d "-1 month" "+%m "
11

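When today is the 31st, the calculation goes wrong. Take the typical case of February as another example:
(In effect the calculation subtracts the number of days in the previous month, for example the 28 days of February. GNU date decrements the month field and then normalizes the resulting nonexistent date, so February 29, 2014 becomes March 1.)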
[root@APP ~]# date -s 2014-03-28
Fri Mar 28 00:00:00 CST 2014
[root@APP ~]# date -d "-1 month" "+%m "
02
[root@APP ~]# date -s 2014-03-29
Sat Mar 29 00:00:00 CST 2014
[root@APP ~]# date -d "-1 month" "+%m "
03
[root@APP ~]#

A reliable solution I found:

[root@APP ~]# date +%Y-%m-%d
2014-03-29
[root@APP ~]# date +%Y-%m-%d -d "-$(date +%d)days -0 month"
2014-02-28
[root@APP ~]# date -d last-month +%Y-%m-%d
2014-03-01
[root@APP ~]# date -d "-1 month" "+%Y-%m-%d"
2014-03-01
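The same idea, stepping back from the start of the current month instead of subtracting "one month", can be sketched in Python. The function name `previous_month` is hypothetical and the dates are the ones used in the tests above:

```python
import datetime


def previous_month(today):
    """Return (year, month) of the month before `today`.

    Stepping back one day from the 1st of the current month always
    lands in the previous month, whatever its length.
    """
    first_of_month = today.replace(day=1)
    last_of_previous = first_of_month - datetime.timedelta(days=1)
    return last_of_previous.year, last_of_previous.month


for day in ("2013-12-31", "2014-03-29", "2014-01-15"):
    d = datetime.datetime.strptime(day, "%Y-%m-%d").date()
    print(day, "->", "%04d-%02d" % previous_month(d))
```

Because the computation never builds a date such as "February 29" or "November 31", there is nothing for the date library to normalize.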

An nginx 502 error caused by a php-fpm misconfiguration

Do not look only at the nginx log files.

Part of the output of cat /home/php/logs/php-fpm.log:

"Error: Wrong IP address ' 10.11.80.49' in listen.allowed_clients"

"Error: Connection disallowed: IP address '10.11.80.49' has been dropped."

ERROR: Connection disallowed: IP address "10.11.80.49" has been dropped

Now back to the php-fpm configuration file:

cat /home/php/etc/php-fpm.conf

[global]

pid = /home/php/logs/php-fpm.pid

error_log = /home/php/logs/php-fpm.log

log_level = notice

[www]

listen = 10.11.80.49:9000

listen.allowed_clients = 10.11.80.47,10.11.80.48, 10.11.80.49

user = nobody

group = nobody

slowlog = /home/php/logs/slow.log

catch_workers_output = yes

request_terminate_timeout = 30s

request_slowlog_timeout = 5s

pm = dynamic

pm.max_children = 100

pm.start_servers = 50

pm.min_spare_servers = 5

pm.max_spare_servers = 100

This is the relevant part:

[www]

listen = 10.11.80.49:9000

listen.allowed_clients = 10.11.80.47,10.11.80.48, 10.11.80.49

user = nobody

group = nobody

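netstat showed the port in LISTEN state, and php-fpm started without errors.

The other two machines, 47 and 48, had no problem, which puzzled me, so I tested with telnet:

#telnet 10.11.80.49 9000

Trying 10.11.80.49...

Connected to 10.11.80.49.

Escape character is '^]'.

I ran the test from 47 and from 48 as well; the difference was that the connection from 49 was closed immediately.

Looking at the configuration file again, there was an extra space after "10.11.80.48,". I removed it, restarted, tested, and it worked. Unbelievable.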
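A stray space like this is easy to miss by eye. The following is a minimal sketch of a check that could be run against the value of listen.allowed_clients before restarting php-fpm. The function name `check_allowed_clients` is hypothetical; it only assumes, as the log above suggests, that each comma-separated entry must be a bare IP address:

```python
import ipaddress


def check_allowed_clients(value):
    """Split a listen.allowed_clients value and flag suspicious entries.

    Returns a list of (raw_entry, problem) tuples; an empty list means
    every entry is a bare IP address with no surrounding whitespace.
    """
    problems = []
    for raw in value.split(","):
        if raw != raw.strip():
            problems.append((raw, "surrounding whitespace"))
            continue
        try:
            ipaddress.ip_address(raw)
        except ValueError:
            problems.append((raw, "not a valid IP address"))
    return problems


line = "10.11.80.47,10.11.80.48, 10.11.80.49"
for raw, problem in check_allowed_clients(line):
    print(repr(raw), "->", problem)
```

Printing the entry with repr() makes the leading space visible, just as the quoted address in the php-fpm error log did.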

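Python: images fail to display, IOError: decoder zip not available

I recently set up a blog with mezzanine, and the media-library section of the admin kept raising an error:

About the PIL Error - IOError: decoder zip not available

I reinstalled it, and the build summary showed that support was present:

*** ZLIB (PNG/ZIP) support available

I reinstalled several times to no avail. After installing version 2.3.0 of the Pillow module instead, the page opened successfully.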

With that solved, a new problem appeared: URLs containing Chinese could not be resolved:

http://ip/%E4%BA%B2%E6%83%85/

'ascii' codec can't encode characters in position 35-36: ordinal not in range(128)

The fix was to set a UTF-8 locale:

export LANG='en_US.UTF-8'

export LC_ALL='en_US.UTF-8'

The backup and migration of mezzanine still need some attention. This virtual host expires in about half a year.

Learning Python: checking the validity of the /etc/hosts file

Single-threaded version:

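#!/usr/bin/env python3
# Ping every address listed in /etc/hosts and report the ones that do not answer.

import subprocess

with open("/etc/hosts") as f:
    for line in f:
        fields = line.split()
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        ip = fields[0]
        # ping exits with status 0 only if at least one reply was received.
        status = subprocess.call(["ping", "-c", "3", "-w", "5", ip],
                                 stdout=subprocess.DEVNULL,
                                 stderr=subprocess.DEVNULL)
        if status != 0:
            print("Warning: " + ip + " timed out!")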

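Multi-threaded version:

#!/usr/bin/env python3
# Ping all addresses in /etc/hosts in parallel, one thread per address.
import re
import subprocess
from threading import Thread

class testit(Thread):
   def __init__(self, ip):
      Thread.__init__(self)
      self.ip = ip
      self.status = -1   # -1: ping printed no summary (e.g. unknown host)
   def run(self):
      # -w5: give up after 5 seconds; -c2: send two probes
      ping = subprocess.Popen(["ping", "-w5", "-c2", self.ip],
                              stdout=subprocess.PIPE,
                              stderr=subprocess.DEVNULL,
                              universal_newlines=True)
      for line in ping.stdout:
         igot = testit.lifeline.findall(line)
         if igot:
            self.status = int(igot[0])
      ping.wait()
# Matches the "N received" part of ping's summary line.
testit.lifeline = re.compile(r"(\d+) received")
pinglist = []

with open("/etc/hosts") as f:
   for line in f:
      fields = line.split()
      # Skip blank lines and comment lines.
      if not fields or fields[0].startswith("#"):
         continue
      current = testit(fields[0])
      pinglist.append(current)
      current.start()

for pingle in pinglist:
   pingle.join()
   # 0 replies, or no summary at all, both count as a failure.
   if pingle.status <= 0:
      print(pingle.ip + " timed out!")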
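One thread per hosts entry can mean a lot of threads on a machine with a large /etc/hosts. A possible variation, sketched here with a bounded thread pool from the standard library, is shown below. The function names and the default of 20 workers are assumptions for illustration, not part of the original scripts:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_hosts(text):
    """Return the unique addresses from hosts-file text, in order."""
    seen = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments
        if fields and fields[0] not in seen:
            seen.append(fields[0])
    return seen


def is_alive(ip):
    """True if ping gets at least one reply (exit status 0)."""
    return subprocess.call(["ping", "-c", "2", "-w", "5", ip],
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0


def check_hosts(path="/etc/hosts", workers=20):
    """Ping every address in `path` with at most `workers` threads.

    Returns the list of addresses that did not answer.
    """
    with open(path) as f:
        ips = parse_hosts(f.read())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(is_alive, ips))
    return [ip for ip, ok in zip(ips, results) if not ok]
```

Calling check_hosts() and printing each returned address gives the same report as the scripts above, while addresses that appear on several lines are pinged only once.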

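lvs timeout problem

Client-side HTTP fsopen calls occasionally timed out, and dmesg on one of the ingress machines reported:

nf_conntrack: table full, dropping packet.

Because of the lvs architecture, iptables rules are in place, so connection tracking is active. The fix is to raise the max value and reduce the timeout: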

net.nf_conntrack_max = 655360

net.netfilter.nf_conntrack_tcp_timeout_established = 1200

Other versions may use:

net.ipv4.netfilter.ip_conntrack_max = 655350

net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 1200
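To see how close the table is to the limit before packets start being dropped, the current entry count can be compared with the maximum. A minimal sketch, assuming the nf_conntrack variant of the /proc paths (older kernels expose ip_conntrack names instead); the function names and the 80% threshold are arbitrary choices for illustration:

```python
def read_int(path):
    """Read a single integer from a /proc file."""
    with open(path) as f:
        return int(f.read().strip())


def usage_ratio(count, maximum):
    """Fraction of the connection-tracking table that is in use."""
    return count / maximum


def is_near_full(count, maximum, threshold=0.8):
    """True when the table usage has reached `threshold` (assumed 80%)."""
    return usage_ratio(count, maximum) >= threshold


def conntrack_usage(
        count_path="/proc/sys/net/netfilter/nf_conntrack_count",
        max_path="/proc/sys/net/netfilter/nf_conntrack_max"):
    """Return (count, maximum, ratio) read from the running kernel."""
    count = read_int(count_path)
    maximum = read_int(max_path)
    return count, maximum, usage_ratio(count, maximum)
```

Running conntrack_usage() periodically from a monitoring job would give a warning before the "table full" message shows up in dmesg.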

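too many open files problem

The maximum number of files a user may open: in the past I basically only changed the value in limits.conf. Yesterday, while dealing with connections to sharded MySQL tables, I ran into a nasty problem.

When the value was set above roughly 1024000, the system could no longer be reached over ssh. Strictly speaking, two places need to be changed:

1. The value of /proc/sys/fs/file-max. This is a kernel parameter; change it directly with echo, and write it to /etc/sysctl.conf to make it persistent: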

echo 2048 > /proc/sys/fs/file-max

2. Edit the /etc/security/limits.conf file:

* - nofile 2040

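The value in limits.conf must not exceed the value of file-max. I tested this: if it does, you had better test a new login before leaving your current ssh session, or you will end up unable to log in over ssh. (The threshold of roughly 1024000 seen above is close to 1048576, the default of fs.nr_open, the kernel's per-process ceiling for nofile, which may well be the limit that was actually hit.)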
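Before writing a new nofile value it can be compared against the kernel limits. The sketch below is one way to do that. The check against fs.nr_open is an assumption added here and is not something the tests in this post covered; the function names are hypothetical:

```python
def read_sysctl(name):
    """Read an integer kernel parameter from /proc/sys, e.g. 'fs/file-max'."""
    with open("/proc/sys/" + name) as f:
        return int(f.read().split()[0])


def check_nofile(nofile, file_max, nr_open):
    """Return a list of warnings for a proposed nofile limit."""
    warnings = []
    if nofile > file_max:
        warnings.append("nofile exceeds fs.file-max")
    if nofile > nr_open:
        # Assumption: limits above fs.nr_open cannot be applied at login.
        warnings.append("nofile exceeds fs.nr_open")
    return warnings


print(check_nofile(2040, 2048, 1048576))
```

On a live system the last two arguments would come from read_sysctl("fs/file-max") and read_sysctl("fs/nr_open"); an empty list means the proposed value passes both checks.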

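hosts changes not taking effect promptly because of the nscd DNS cache

Below, the IP for one host name in hosts had to be changed, but the change did not take effect: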

[root@app]# cat /etc/hosts |grep app_db.56.com
10.11.81.13 app_db.56.com
[root@app]# ping app_db.56.com
PING uevent_db.56.com (120.31.133.142) 56(84) bytes of data.

120.31.133.142 is the old address. The IP in the hosts file has already been changed accordingly:

10.11.81.13 app_db.56.com app_db

but the system still uses the old IP when pinging:

[root@app]# ping app_db
PING app_db.56.com (10.11.81.13) 56(84) bytes of data.
64 bytes from SHNHDX81-13.opi.com (10.11.81.13): icmp_seq=1 ttl=60 time=41.3 ms
64 bytes from SHNHDX81-13.opi.com (10.11.81.13): icmp_seq=2 ttl=60 time=31.2 ms

[root@app]# ping app_db.56.com
PING app_db.56.com (10.11.81.13) 56(84) bytes of data.
64 bytes from SHNHDX81-13.opi.com (10.11.81.13): icmp_seq=1 ttl=60 time=31.9 ms
64 bytes from SHNHDX81-13.opi.com (10.11.81.13): icmp_seq=2 ttl=60 time=31.2 ms

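Note that the host name on the PING line differs from the one on the reply lines (the ones with ttl).

Check the host.conf file:

[root@ZHONGSH26-131-DX-DB etc]# vim /etc/host.conf
multi on

It uses multi on. I changed it to

order hosts,bind

Still no effect at all; it still points to the old host name: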

[root@app]# ping app2.56.com           
PING app2.56.com (10.11.81.123) 56(84) bytes of data.
^C
--- app2.56.com ping statistics ---
24 packets transmitted, 0 received, 100% packet loss, time 23447ms

[root@app]# ping 2.56.com              
PING app2.56.com (10.11.81.13) 56(84) bytes of data.
64 bytes from app2.56.com (10.11.81.13): icmp_seq=1 ttl=60 time=31.3 ms
64 bytes from app2.56.com (10.11.81.13): icmp_seq=2 ttl=60 time=31.2 ms
^C
--- app2.56.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1736ms
rtt min/avg/max/mdev = 31.250/31.292/31.335/0.181 ms
[root@ZHONGSH26-131-DX-DB etc]# cat /etc/hosts |grep app2
10.11.81.14 app2.56.com 2.56.com

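I suspected a problem in the system kernel, but adding any new IP with a matching host name worked fine. Only modifying an existing entry failed to take effect.

After all these years in the trade I still have blind spots. This was the first time I had seen this, and I had to find a way to track it down. I used strace to see which files were actually being accessed.

[root@app]# strace -f -F -o /root/ping.txt /bin/ping app_db.56.com

After about two packets had been sent I interrupted it with Ctrl-C. Looking at ping.txt, there was indeed a finding: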

21292 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
21292 connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
21292 sendto(4, "\2\0\0\0\r\0\0\0\6\0\0\0hosts\0", 18, MSG_NOSIGNAL, NULL, 0) = 18

Why is this socket being used? Googling it showed that it belongs to nscd, a local DNS cache service. Following up, ps showed that it was running:

[root@app]# ps -ef |grep nscd
nscd      3027     1  0 17:29 ?        00:00:00 /usr/sbin/nscd
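The payload in the sendto() line of the strace output can be taken apart to see what ping asked the daemon for. A minimal sketch, assuming the request starts with three 32-bit little-endian integers (version, request type, key length) followed by the key. What request type 13 stands for is defined in the glibc nscd sources and is not asserted here:

```python
import struct


def decode_nscd_request(payload):
    """Split an nscd request into (version, request_type, key).

    Assumes a header of three 32-bit little-endian integers
    (version, type, key length) followed by the key bytes.
    """
    version, req_type, key_len = struct.unpack_from("<iii", payload, 0)
    key = payload[12:12 + key_len]
    return version, req_type, key


# The bytes strace printed for the sendto() call, 18 bytes in total.
payload = b"\2\0\0\0\r\0\0\0\6\0\0\0hosts\0"
print(decode_nscd_request(payload))
```

The key is the NUL-terminated string "hosts", which shows that the request concerns the hosts database even before the nscd configuration is inspected.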

The configuration file is /etc/nscd.conf. I had a look, and it mainly contains this:

        enable-cache            passwd          yes
        positive-time-to-live   passwd          600
        negative-time-to-live   passwd          20
        suggested-size          passwd          211
        check-files             passwd          yes
        persistent              passwd          yes
        shared                  passwd          yes
        max-db-size             passwd          33554432
        auto-propagate          passwd          yes

        enable-cache            group           yes
        positive-time-to-live   group           3600
        negative-time-to-live   group           60
        suggested-size          group           211
        check-files             group           yes
        persistent              group           yes
        shared                  group           yes
        max-db-size             group           33554432
        auto-propagate          group           yes

        enable-cache            hosts           yes
        positive-time-to-live   hosts           3600
        negative-time-to-live   hosts           20
        suggested-size          hosts           211
        check-files             hosts           yes
        persistent              hosts           yes
        shared                  hosts           yes
        max-db-size             hosts           33554432

        enable-cache            services        yes
        positive-time-to-live   services        28800
        negative-time-to-live   services        20
        suggested-size          services        211
        check-files             services        yes
        persistent              services        yes
        shared                  services        yes
        max-db-size             services        33554432

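It looks like every cache is enabled. For us, with all applications relying on hosts, this service is quite useless and only brings trouble. I also plan to study this service a bit. As a first step I removed the commented part and checked whether hosts changes took effect promptly: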

# ping app_db.56.com
PING app_db.56.com (10.11.81.14) 56(84) bytes of data.
64 bytes from app_db.56.com (10.11.81.14): icmp_seq=1 ttl=60 time=31.3 ms

Changed it again, and it took effect:

# ping app_db.56.com 
PING app_db.56.com (10.11.81.13) 56(84) bytes of data.
64 bytes from app_db.56.com (10.11.81.13): icmp_seq=1 ttl=60 time=31.3 ms
64 bytes from app_db.56.com (10.11.81.13): icmp_seq=2 ttl=60 time=31.4 ms
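To see at a glance what nscd is caching and for how long, the three-column lines of nscd.conf can be parsed into a dictionary. This is a rough sketch with a hypothetical function name, and the sample text repeats a few of the hosts lines quoted above. As an aside, nscd also has an -i (--invalidate) option, so `nscd -i hosts` may be enough to drop stale entries without touching the configuration:

```python
def nscd_settings(text):
    """Parse nscd.conf text into {service: {option: value}}.

    Only three-column lines (option, service, value) are considered;
    comments and global settings are ignored.
    """
    settings = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 3:
            option, service, value = fields
            settings.setdefault(service, {})[option] = value
    return settings


sample = """
enable-cache            hosts           yes
positive-time-to-live   hosts           3600
check-files             hosts           yes
"""
hosts = nscd_settings(sample)["hosts"]
print(hosts["enable-cache"], hosts["positive-time-to-live"])
```

With the values from this machine, successful host lookups are configured to be kept for 3600 seconds.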

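An explanation found online is attached below:

What a DNS cache does on a server

When data has to be exchanged with the outside world by domain name, a DNS cache comes in handy: it reduces the time spent on name resolution and improves efficiency. For example:
crawling page data from the web,
fetching user data from other platforms (such as Weibo or QQ) via the OAuth 2.0 protocol,
using third-party payment interfaces,
sending SMS messages through an SMS gateway, and so on.

How much performance can a DNS cache actually gain?

First, it depends on the network and on the capability of the DNS server: the slower DNS resolution is, the bigger the advantage of the cache. For example, the DNS server we use in Beijing, 202.106.0.20, and Google's DNS server 8.8.8.8 differ considerably in speed.
If the DNS server is fairly stable, its impact on efficiency is a constant. How large is that constant?
I ran a simple test: a stress test inside the LAN against a static page served by nginx, using the DNS server 202.106.0.20 and no DNS cache, averaged 270,000 requests per minute. Against a simple PHP page it averaged 220,000 requests per minute. With the nscd service added, the static page averaged 1.2 million requests per minute, more than 4 times as fast, and the PHP page averaged 500,000 requests per minute, more than twice as fast.
For a search engine or for proxy-style services such as SMS gateways or data push services, this gain is quite significant. In an ordinary project, however, a single server rarely sends 220,000 requests per minute, so the gain is negligible.
Still, on the road to the extreme, every small step matters.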
