从Nginx日志中提取UserAgent、IP等信息


有时我们需要用到大量UserAgent信息,而Nginx日志文件中包含了海量真实的UserAgent,从Nginx文件中提取所有UserAgent信息就很有必要;

Python代码实现从Nginx日志文件中提取UserAgent信息:

import re

def extract_nginx_log(log):

    regex = re.compile(
        '''(?P<remote_addr>[\d\.]{7,}) - - (?:\[(?P<datetime>[^\[\]]+)\]) "(?P<request>[^"]+)" (?P<status>\d+) (?P<size>\d+) "(?:[^"]+)" "(?P<user_agent>[^"]+)"'''
    )

    return regex.match(log).groupdict()

ua_lst = []

with open('access.log','r') as fp:
    for line in fp.readlines():
        ua_lst.append(extract_nginx_log(line)['user_agent'])

ua_lst = list(set(ua_lst)) # 简单去重

with open('user_agent.txt','w') as fp:
    for ua in ua_lst:
        fp.write(ua+'\n')

转载声明
本文版权归作者所有

如需转载,请注明出处;本文地址: https://www.perfcode.com/p/nginx-log-user-agent.html