博客内容Blog Content

网站访客SQL数据分析 Website Visitor SQL Data Analysis

BlogType : Website releaseTime : 2024-09-08 15:00:00

记录一下网站截至目前的一些访客的数据情况,包括国家、访问情况、以及对这些数据的一些简单分析 Record some data regarding the website's visitors up to the present, including the countries, visit details, and a brief analysis of this data.

背景 Background 

自从博客恢复运营至今,服务器上的nginx日志已经攒了一段时间的访客数据,目前使用脚本每日去重已经免费的ip解析接口解析后,对用户的数据来源做了一些查询,并对其进行分析

Since the blog resumed operations, the Nginx logs on the server have accumulated visitor data over time. Currently, using a script for daily deduplication and free IP resolution interfaces, I have performed some queries on the sources of user data and conducted an analysis on them.


分析 Analysis

目前的数据表

current data table

image.png


由于使用了不同平台的免费接口,所以country字段不统一,使用的country_code更准确,目前来访问的国家总数有:

Since I used free interfaces from different platforms, the country field is not uniform. The country_code is more accurate. So far, the total number of countries that have visited is:

image.png


查看访客最多的国家

 check the countries with the most visitors.

select country_code, count(distinct(ip)) as cnt
from dwd_daily_ip_cnt
group by country_code
order by count(distinct(ip)) desc
limit 5;

image.png

请求是来自美国和中国,符合预期

The requests are from the United States and China, as expected.



然而,日志中有大量仅简单进行一次访问的,还在相同ip号段下

However, there are a large number of logs showing only a single, simple visit, and they are under the same IP range.

image.png

这种估计是网络爬虫或者扫描器人,它们是怎么发现我网站的呢?

These are likely web crawlers or scanning bots. How are they discovering my website?



查看nginx日志核实,确实如此

After checking the Nginx logs to verify, this seems to be the case.

image.png



查看访问了七天以上的人数

Looking at the number of people who have visited for more than seven days,

select count(*) 
from (
	select ip, count(log_date) as cnt 
	from dwd_daily_ip_cnt
	group by ip
) t
where cnt >= 7
order by cnt desc;

image.png