Building a Website AI Assistant
I built an AI customer service bot on top of my own knowledge base. It answers questions using the materials in that knowledge base, built on a RAG architecture with local vector retrieval.
Overview
AI customer service bot entry point: here
I built the AI customer service bot because it is an effective way to improve efficiency. On one hand, it helps people find answers quickly instead of digging through scattered documentation. On the other hand, I no longer need to clean up the formatting and layout of the material I maintain; I can simply drop it into the knowledge base.
Principle
RAG (Retrieval-Augmented Generation) combines two components: a retrieval model and a generation model. The retrieval model first converts the input query into a vector and retrieves relevant documents from an external knowledge base; a generation model (such as GPT or BART) then produces the answer based on the retrieved documents. The strength of RAG is that it can pull information from an external knowledge base and combine it with text generation to produce more accurate, information-rich answers.
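To make the flow concrete, here is a minimal retrieve-then-generate sketch in Python. It assumes the OpenAI Python SDK (v1 style) for embeddings and chat completions; the knowledge_base structure, model names, and helper functions are illustrative, not the code actually running on this site.

```python
# Minimal RAG sketch: embed the query, retrieve the most similar chunks,
# then ask the chat model to answer using only those chunks.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical in-memory knowledge base: each entry is (chunk text, embedding vector)
knowledge_base: list[tuple[str, np.ndarray]] = []

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def retrieve(query: str, top_k: int = 3) -> list[str]:
    q = embed(query)
    # Cosine similarity against every stored chunk (brute-force scan)
    scored = [(float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), t)
              for t, v in knowledge_base]
    scored.sort(reverse=True)
    return [t for _, t in scored[:top_k]]

def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    messages = [
        {"role": "system", "content": "Answer only using the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content
```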
Compared with a fine-tuned model, however, RAG has some drawbacks. It depends on the quality of retrieval: if the retrieved results are inaccurate, the generated answer suffers. It also requires ongoing maintenance of the external knowledge base, which adds complexity and cost, and the extra retrieval step makes responses slower. A fine-tuned model, by contrast, learns the task data directly, responds faster and more precisely, and does not depend on an external knowledge base.
Architecture and Details
First, I built a service called chatbot-api with Python Flask and deployed it on the internal network. It maintains the knowledge base and implements the RAG logic; the code is available here. The prompt restricts the instructions so that only questions within the relevant scope are answered, and the relevant GPT parameters are configured there as well.
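As a rough sketch of what such a service might look like (not the actual chatbot-api code; the route name, model, parameter values, and the placeholder retrieval helper are assumptions), a minimal Flask endpoint could restrict the conversation scope through the system prompt and pin down the GPT parameters like this:

```python
# Illustrative Flask endpoint in the style of chatbot-api.
from flask import Flask, jsonify, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment

# System prompt that limits the bot to questions about the knowledge base
SYSTEM_PROMPT = (
    "You are a customer service assistant for this website. "
    "Only answer questions related to the provided knowledge base. "
    "If a question is out of scope, politely decline to answer."
)

def retrieve_context(question: str) -> str:
    # Placeholder: the real service would do the embedding-based top-K lookup
    # against the knowledge base here (see the retrieval sketch above).
    return ""

@app.route("/chat", methods=["POST"])
def chat():
    question = request.json.get("question", "")
    context = retrieve_context(question)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model name
        temperature=0.2,       # low temperature for factual, consistent answers
        max_tokens=512,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return jsonify({"answer": resp.choices[0].message.content})
```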
On top of that sits the external Spring Boot web application. The overall architecture is simple: the front end is a chat window, and the back end queries the knowledge base for similar entries, then sends the top-K matches together with the prompt to the GPT API, which composes the final answer.
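The web layer itself is Spring Boot, but to keep all examples in one language, here is the equivalent of the call the backend makes to the internal chatbot-api, sketched in Python with requests; the host, port, path, and JSON fields are assumptions, not the actual API contract.

```python
# Illustration of the backend-to-chatbot-api call (the real caller is Spring Boot;
# the internal address and JSON fields here are hypothetical).
import requests

def ask_assistant(question: str) -> str:
    resp = requests.post(
        "http://chatbot-api.internal:5000/chat",  # hypothetical internal address
        json={"question": question},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["answer"]

print(ask_assistant("How do I reset my password?"))
```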
Summary and Optimization
I am fairly satisfied with the results so far. On one hand, the prompt successfully restricts the scope of the conversation; on the other, the embedding-based similarity matching works well.
Points still to optimize:
The system uses two different technology stacks, Java Spring Boot and Python Flask, which makes maintenance relatively complex. Because the earlier project was deployed on Tomcat, the front-end pages are still served by Spring Boot, but Java's support for the OpenAI libraries is not very convenient, so Python has to be used to call the OpenAI SDK. Ideally both parts would be implemented in a single language.
Because the bot is driven by stateless API calls, it has no conversational context memory, so some follow-up questions get poor answers. This still needs optimization: the conversation history has to be appended to every call, which costs extra tokens (see the first sketch after this list).
The embedding similarity matching currently uses a brute-force scan over all entries, which becomes inefficient as the data grows; a vector index or vector database should be used instead (see the second sketch after this list).
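For the context-memory point, a minimal sketch of the usual fix, assuming the OpenAI Python SDK (v1): keep the recent turns in a list and resend them on every call, which is exactly where the extra token cost comes from. The truncation limit is an arbitrary choice.

```python
# Sketch of simple conversational memory: resend recent turns on every call.
from openai import OpenAI

client = OpenAI()
history: list[dict] = []  # accumulated user/assistant turns

def chat_with_memory(question: str, max_turns: int = 5) -> str:
    history.append({"role": "user", "content": question})
    messages = [{"role": "system", "content": "Answer using the knowledge base only."}]
    messages += history[-2 * max_turns:]  # keep only recent turns to limit token usage
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```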
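For the brute-force matching point, one common option is a vector index such as FAISS instead of scanning every entry. A minimal sketch, assuming the chunk embeddings are already computed as float32 vectors:

```python
# Sketch of replacing the brute-force scan with a FAISS index (exact inner-product
# search; vectors are L2-normalized so inner product equals cosine similarity).
import numpy as np
import faiss

dim = 1536                      # e.g. the output size of text-embedding-3-small
index = faiss.IndexFlatIP(dim)  # inner-product index
chunks: list[str] = []          # chunk texts, aligned with the index rows

def add_chunks(texts: list[str], embeddings: np.ndarray) -> None:
    vectors = np.ascontiguousarray(embeddings, dtype="float32")
    faiss.normalize_L2(vectors)
    index.add(vectors)
    chunks.extend(texts)

def search(query_embedding: np.ndarray, top_k: int = 3) -> list[str]:
    q = np.ascontiguousarray(query_embedding, dtype="float32").reshape(1, -1)
    faiss.normalize_L2(q)
    _, ids = index.search(q, top_k)
    return [chunks[i] for i in ids[0] if i != -1]
```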