博客内容Blog Content
个人网站数据分析及可视化 Personal Website Data Analysis and Visualization
一个将网站相关数据获取、加工、存储、分析和可视化的过程 A Process for acquiring, processing, storing, analyzing, and visualizing data related to the website
背景 Background
任何数据都是有价值的,数据工程的艺术在于如何获取这些数据,并在合理加工存储分析后,能够查询或挖掘进而产生新的价值。对于本网站的相关数据也同理,因此本文章用于记录相关数据如何使用
All data has value, and the art of data engineering lies in how to obtain this data and, after appropriate processing, storage, and analysis, make it queryable or mineable to generate new value. The same principle applies to the data related to this website, so this article is intended to document how the relevant data is used.
目前实现效果如图:
The current effect is as shown in the image:
目标 Target
数据日志记录:访问ip等
Data Log Recording: visit IPs, and more.
网站可视化分析:机器负载网站日活、dau、ip来源分析等
Website Visualization Analysis: Machine load, website daily active users (DAU), IP source analysis, and more.
架构图 Architecture Diagram
数据流 Data Flow
数据实例 Data Example
ods和dwd访问ip频率记录表 visit ip & frequence record table at ods&dwd levels
日活 DAU
SELECT log_date AS stat_date, COUNT(1) AS distinct_ip_cnt, SUM(cnt) AS total_ip_cnt FROM ods_daily_ip_cnt GROUP BY log_date ORDER BY log_date ASC;
ODS和DWD数据对比 data comparsion between ODS&DWD
SELECT log_date AS stat_date, SUM(IF(source = 'ods', distinct_ip_cnt, 0)) AS ods_distinct_ip_cnt, SUM(IF(source = 'ods', total_ip_cnt, 0)) AS ods_total_ip_cnt, SUM(IF(source = 'dwd', distinct_ip_cnt, 0)) AS dwd_distinct_ip_cnt, SUM(IF(source = 'dwd', total_ip_cnt, 0)) AS dwd_total_ip_cnt FROM ( SELECT log_date, COUNT(1) AS distinct_ip_cnt, SUM(cnt) AS total_ip_cnt, 'ods' AS source FROM ods_daily_ip_cnt GROUP BY log_date UNION ALL SELECT log_date, COUNT(1) AS distinct_ip_cnt, SUM(cnt) AS total_ip_cnt, 'dwd' AS source FROM dwd_daily_ip_cnt GROUP BY log_date ) combined GROUP BY stat_date ORDER BY stat_date ASC;