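We add web search to the existing code, using requests and BeautifulSoup to extract information from Baidu search results. Below is the complete code, including the project structure, the README.md file, and all the required source.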

Project structure

xihe241117/
├── data/
│   └── train_data.jsonl
├── logs/
├── models/
│   └── xihua_model.pth
├── requirements.txt
├── README.md
└── xihe_chatbot.py
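
The script reads three fields from every record in data/train_data.jsonl: `question`, `human_answers`, and `chatgpt_answers`, using the first element of each answer list. As a minimal sketch of that record shape, the snippet below writes one record and reads it back with the standard library. The sample text, the helper names, and the `sample_train_data.jsonl` file name are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample record; the field names match what xihe_chatbot.py reads.
record = {
    "question": "What is the capital of France?",
    "human_answers": ["Paris."],
    "chatgpt_answers": ["The capital of France is Paris."],
}


def write_jsonl(path, records):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "sample_train_data.jsonl")
write_jsonl(path, [record])
loaded = read_jsonl(path)
```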

README.md

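# Xihe Chatbot

## Project Overview
Xihe Chatbot is a BERT-based Chinese question-answering system. Users ask a question and receive an answer. If the answer from the model is unsatisfactory, the user can mark it as inaccurate, and the chatbot then searches Baidu for related information and shows that result instead.

## Directory Structure
xihe241117/
├── data/
│ └── train_data.jsonl
├── logs/
├── models/
│ └── xihua_model.pth
├── requirements.txt
├── README.md
└── xihe_chatbot.py

## Install Dependencies
pip install -r requirements.txt

## Run the Project
python xihe_chatbot.py

## Features
- Ask a question
- Get an answer from the model
- Rate the answer (accurate / inaccurate)
- If the answer is inaccurate, automatically search Baidu for related information
- View the history
- Save the history
- Train the model
- Retrain the model
- Evaluate the model (not yet implemented)

## Contact
For questions or suggestions, please contact [554687453@qq.com]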
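
The rating step in the feature list above (record the rating, then fall back to a search when the answer is rejected) can be sketched without any GUI code. This is only an illustration: `handle_feedback` is a hypothetical helper, not a function from xihe_chatbot.py, and `search_fn` stands in for the real search function so the example runs without network access.

```python
def handle_feedback(history, is_accurate, search_fn):
    """Record the rating on the latest history entry.

    Returns fallback text from `search_fn` when the answer was rated
    inaccurate, otherwise None. `search_fn` is a stand-in for search_baidu.
    """
    if not history:
        return None
    entry = history[-1]
    entry["accuracy"] = is_accurate
    if is_accurate:
        return None
    return search_fn(entry["question"])


# Usage with a stub search function and a made-up history entry.
history = [{
    "question": "What is BERT?",
    "answer_type": "Xihe answer",
    "specific_answer": "A language model.",
    "accuracy": None,
}]
fallback = handle_feedback(history, False, lambda q: f"search result for: {q}")
```

Passing the search function in as an argument keeps the rating logic testable on its own; the GUI version calls search_baidu directly.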

requirements.txt

torch
transformers
jsonlines
requests
beautifulsoup4
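
The GUI also depends on tkinter, which is bundled with most Python installers rather than installed through pip. A small pre-flight check like the one below can report which modules are unavailable before the script is launched. It is a sketch only: `missing_modules` is a hypothetical helper, and the list uses import names, which can differ from package names (beautifulsoup4 is imported as bs4).

```python
import importlib.util

# Import names used by xihe_chatbot.py (beautifulsoup4 is imported as bs4).
REQUIRED_MODULES = ["torch", "transformers", "jsonlines", "tkinter", "requests", "bs4"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```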

xihe_chatbot.py

import os
import json
import jsonlines
import torch
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from transformers import BertModel, BertTokenizer
import tkinter as tk
from tkinter import filedialog, messagebox, ttk
import logging
from difflib import SequenceMatcher
from datetime import datetime
import requests
from bs4 import BeautifulSoup

# Project root directory
PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))

# Logging configuration
LOGS_DIR = os.path.join(PROJECT_ROOT, 'logs')
os.makedirs(LOGS_DIR, exist_ok=True)


def setup_logging():
    # One log file per run, named after the start time
    timestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    log_file = os.path.join(LOGS_DIR, timestamp + '_羲和.txt')
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler(log_file, encoding='utf-8'),
            logging.StreamHandler()
        ]
    )


setup_logging()


# Dataset class
class XihuaDataset(Dataset):
    def __init__(self, file_path, tokenizer, max_length=128):
        self.tokenizer = tokenizer
        self.max_length = max_length
        self.data = self.load_data(file_path)

    def load_data(self, file_path):
        data = []
        if file_path.endswith('.jsonl'):
            with jsonlines.open(file_path) as reader:
                # Parse errors are raised while iterating, so skip bad lines
                # in the iterator itself instead of wrapping append()
                for item in reader.iter(skip_invalid=True):
                    data.append(item)
        elif file_path.endswith('.json'):
            with open(file_path, 'r', encoding='utf-8') as f:
                try:
                    data = json.load(f)
                except json.JSONDecodeError as e:
                    logging.warning(f"Skipping invalid file {file_path}: {e}")
        return data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]
        question = item['question']
        human_answer = item['human_answers'][0]
        chatgpt_answer = item['chatgpt_answers'][0]

        try:
            inputs = self.tokenizer(question, return_tensors='pt', padding='max_length',
                                    truncation=True, max_length=self.max_length)
            human_inputs = self.tokenizer(human_answer, return_tensors='pt', padding='max_length',
                                          truncation=True, max_length=self.max_length)
            chatgpt_inputs = self.tokenizer(chatgpt_answer, return_tensors='pt', padding='max_length',
                                            truncation=True, max_length=self.max_length)
        except Exception as e:
            # Fall back to the next item if this one cannot be tokenized
            logging.warning(f"Skipping invalid item {idx}: {e}")
            return self.__getitem__((idx + 1) % len(self.data))

        return {
            'input_ids': inputs['input_ids'].squeeze(),
            'attention_mask': inputs['attention_mask'].squeeze(),
            'human_input_ids': human_inputs['input_ids'].squeeze(),
            'human_attention_mask': human_inputs['attention_mask'].squeeze(),
            'chatgpt_input_ids': chatgpt_inputs['input_ids'].squeeze(),
            'chatgpt_attention_mask': chatgpt_inputs['attention_mask'].squeeze(),
            'human_answer': human_answer,
            'chatgpt_answer': chatgpt_answer
        }


# Data loader helper
def get_data_loader(file_path, tokenizer, batch_size=8, max_length=128):
    dataset = XihuaDataset(file_path, tokenizer, max_length)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)


# Model definition
class XihuaModel(torch.nn.Module):
    def __init__(self, pretrained_model_name='F:/models/bert-base-chinese'):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained_model_name)
        # Single logit: positive for human-style answers, negative otherwise
        self.classifier = torch.nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.classifier(pooled_output)
        return logits


# Training function
def train(model, data_loader, optimizer, criterion, device, progress_var=None):
    model.train()
    total_loss = 0.0
    num_batches = len(data_loader)

    for batch_idx, batch in enumerate(data_loader):
        try:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            human_input_ids = batch['human_input_ids'].to(device)
            human_attention_mask = batch['human_attention_mask'].to(device)
            chatgpt_input_ids = batch['chatgpt_input_ids'].to(device)
            chatgpt_attention_mask = batch['chatgpt_attention_mask'].to(device)

            optimizer.zero_grad()

            human_logits = model(human_input_ids, human_attention_mask)
            chatgpt_logits = model(chatgpt_input_ids, chatgpt_attention_mask)

            # Human answers are labelled 1, ChatGPT answers 0
            human_labels = torch.ones(human_logits.size(0), 1).to(device)
            chatgpt_labels = torch.zeros(chatgpt_logits.size(0), 1).to(device)

            loss = criterion(human_logits, human_labels) + criterion(chatgpt_logits, chatgpt_labels)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            if progress_var is not None:
                progress_var.set((batch_idx + 1) / num_batches * 100)
        except Exception as e:
            logging.warning(f"Skipping invalid batch: {e}")

    return total_loss / len(data_loader)


# Main training function
def main_train(retrain=False):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    logging.info(f'Using device: {device}')

    tokenizer = BertTokenizer.from_pretrained('F:/models/bert-base-chinese')
    model = XihuaModel(pretrained_model_name='F:/models/bert-base-chinese').to(device)

    if retrain:
        model_path = os.path.join(PROJECT_ROOT, 'models/xihua_model.pth')
        if os.path.exists(model_path):
            model.load_state_dict(torch.load(model_path, map_location=device))
            logging.info("Loaded existing model")
        else:
            logging.info("No existing model found; using the pretrained model")

    optimizer = optim.Adam(model.parameters(), lr=1e-5)
    criterion = torch.nn.BCEWithLogitsLoss()

    train_data_loader = get_data_loader(os.path.join(PROJECT_ROOT, 'data/train_data.jsonl'),
                                        tokenizer, batch_size=8, max_length=128)

    num_epochs = 30
    for epoch in range(num_epochs):
        train_loss = train(model, train_data_loader, optimizer, criterion, device)
        logging.info(f'Epoch [{epoch+1}/{num_epochs}], Loss: {train_loss:.8f}')

    torch.save(model.state_dict(), os.path.join(PROJECT_ROOT, 'models/xihua_model.pth'))
    logging.info("Model training finished and saved")


# Web search function
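def search_baidu(query):
    url = "https://www.baidu.com/s"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        # params lets requests URL-encode the query; the timeout keeps a
        # stalled request from blocking the GUI indefinitely
        response = requests.get(url, params={'wd': query}, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        logging.warning(f"Baidu search failed: {e}")
        return "No relevant information found"

    soup = BeautifulSoup(response.text, 'html.parser')
    results = soup.find_all('div', class_='c-abstract')
    if results:
        return results[0].get_text().strip()
    return "No relevant information found"


# GUI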
class XihuaChatbotGUI:
    def __init__(self, root):
        self.root = root
        self.root.title("Xihe Chatbot")

        self.tokenizer = BertTokenizer.from_pretrained('F:/models/bert-base-chinese')
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = XihuaModel(pretrained_model_name='F:/models/bert-base-chinese').to(self.device)
        self.load_model()
        self.model.eval()

        # Load the training data so it can be searched when answering
        self.data = self.load_data(os.path.join(PROJECT_ROOT, 'data/train_data.jsonl'))

        # Question/answer history
        self.history = []

        self.create_widgets()

    def create_widgets(self):
        # Top frame: question input
        top_frame = tk.Frame(self.root)
        top_frame.pack(pady=10)

        self.question_label = tk.Label(top_frame, text="Question:", font=("Arial", 12))
        self.question_label.grid(row=0, column=0, padx=10)

        self.question_entry = tk.Entry(top_frame, width=50, font=("Arial", 12))
        self.question_entry.grid(row=0, column=1, padx=10)

        self.answer_button = tk.Button(top_frame, text="Get answer", command=self.get_answer, font=("Arial", 12))
        self.answer_button.grid(row=0, column=2, padx=10)

        # Middle frame: answer display
        middle_frame = tk.Frame(self.root)
        middle_frame.pack(pady=10)

        self.answer_label = tk.Label(middle_frame, text="Answer:", font=("Arial", 12))
        self.answer_label.grid(row=0, column=0, padx=10)

        self.answer_text = tk.Text(middle_frame, height=10, width=70, font=("Arial", 12))
        self.answer_text.grid(row=1, column=0, padx=10)

        # Bottom frame: rating, training, log, and history controls
        bottom_frame = tk.Frame(self.root)
        bottom_frame.pack(pady=10)

        self.correct_button = tk.Button(bottom_frame, text="Accurate", command=self.mark_correct, font=("Arial", 12))
        self.correct_button.grid(row=0, column=0, padx=10)

        self.incorrect_button = tk.Button(bottom_frame, text="Inaccurate", command=self.mark_incorrect, font=("Arial", 12))
        self.incorrect_button.grid(row=0, column=1, padx=10)

        self.train_button = tk.Button(bottom_frame, text="Train model", command=self.train_model, font=("Arial", 12))
        self.train_button.grid(row=0, column=2, padx=10)

        self.retrain_button = tk.Button(bottom_frame, text="Retrain model",
                                        command=lambda: self.train_model(retrain=True), font=("Arial", 12))
        self.retrain_button.grid(row=0, column=3, padx=10)

        self.progress_var = tk.DoubleVar()
        self.progress_bar = ttk.Progressbar(bottom_frame, variable=self.progress_var, maximum=100, length=200)
        self.progress_bar.grid(row=1, column=0, columnspan=4, pady=10)

        self.log_text = tk.Text(bottom_frame, height=10, width=70, font=("Arial", 12))
        self.log_text.grid(row=2, column=0, columnspan=4, pady=10)

        self.evaluate_button = tk.Button(bottom_frame, text="Evaluate model", command=self.evaluate_model, font=("Arial", 12))
        self.evaluate_button.grid(row=3, column=0, padx=10, pady=10)

        self.history_button = tk.Button(bottom_frame, text="View history", command=self.view_history, font=("Arial", 12))
        self.history_button.grid(row=3, column=1, padx=10, pady=10)

        self.save_history_button = tk.Button(bottom_frame, text="Save history", command=self.save_history, font=("Arial", 12))
        self.save_history_button.grid(row=3, column=2, padx=10, pady=10)

    def get_answer(self):
        question = self.question_entry.get()
        if not question:
            messagebox.showwarning("Input error", "Please enter a question")
            return

        inputs = self.tokenizer(question, return_tensors='pt', padding='max_length',
                                truncation=True, max_length=128)
        with torch.no_grad():
            input_ids = inputs['input_ids'].to(self.device)
            attention_mask = inputs['attention_mask'].to(self.device)
            logits = self.model(input_ids, attention_mask)

        # The sign of the logit selects which stored answer to show
        if logits.item() > 0:
            answer_type = "Xihe answer"
        else:
            answer_type = "Zero answer"

        specific_answer = self.get_specific_answer(question, answer_type)

        self.answer_text.delete("1.0", tk.END)
        self.answer_text.insert(tk.END, f"{answer_type}\n{specific_answer}")

        # Add to the history
        self.history.append({
            'question': question,
            'answer_type': answer_type,
            'specific_answer': specific_answer,
            'accuracy': None  # not rated yet
        })

    def get_specific_answer(self, question, answer_type):
        # Fuzzy-match the most similar stored question
        best_match = None
        best_ratio = 0.0
        for item in self.data:
            ratio = SequenceMatcher(None, question, item['question']).ratio()
            if ratio > best_ratio:
                best_ratio = ratio
                best_match = item

        if best_match:
            if answer_type == "Xihe answer":
                return best_match['human_answers'][0]
            else:
                return best_match['chatgpt_answers'][0]
        return "I'm not sure about this either; try asking Zero"

    def load_data(self, file_path):
        data = []
        if file_path.endswith('.jsonl'):
            with jsonlines.open(file_path) as reader:
                # Parse errors are raised while iterating, so skip bad lines
                # in the iterator itself instead of wrapping append()
                for item in reader.iter(skip_invalid=True):
                    data.append(item)
        elif file_path.endswith('.json'):
            with open(file_path, 'r', encoding='utf-8') as f:
                try:
                    data = json.load(f)
                except json.JSONDecodeError as e:
                    logging.warning(f"Skipping invalid file {file_path}: {e}")
        return data

    def load_model(self):
        model_path = os.path.join(PROJECT_ROOT, 'models/xihua_model.pth')
        if os.path.exists(model_path):
            self.model.load_state_dict(torch.load(model_path, map_location=self.device))
            logging.info("Loaded existing model")
        else:
            logging.info("No existing model found; using the pretrained model")

    def train_model(self, retrain=False):
        file_path = filedialog.askopenfilename(filetypes=[("JSONL files", "*.jsonl"), ("JSON files", "*.json")])
        if not file_path:
            messagebox.showwarning("File selection error", "Please select a valid data file")
            return

        try:
            dataset = XihuaDataset(file_path, self.tokenizer)
            data_loader = DataLoader(dataset, batch_size=8, shuffle=True)

            # Load previously trained weights when retraining
            if retrain:
                self.model.load_state_dict(torch.load(os.path.join(PROJECT_ROOT, 'models/xihua_model.pth'),
                                                      map_location=self.device))

            self.model.to(self.device)
            self.model.train()

            optimizer = torch.optim.Adam(self.model.parameters(), lr=1e-5)
            criterion = torch.nn.BCEWithLogitsLoss()

            num_epochs = 30
            for epoch in range(num_epochs):
                train_loss = train(self.model, data_loader, optimizer, criterion, self.device, self.progress_var)
                logging.info(f'Epoch [{epoch+1}/{num_epochs}], Loss: {train_loss:.4f}')
                self.log_text.insert(tk.END, f'Epoch [{epoch+1}/{num_epochs}], Loss: {train_loss:.4f}\n')
                self.log_text.see(tk.END)

            torch.save(self.model.state_dict(), os.path.join(PROJECT_ROOT, 'models/xihua_model.pth'))
            # Switch back to inference mode so later answers do not use dropout
            self.model.eval()
            logging.info("Model training finished and saved")
            self.log_text.insert(tk.END, "Model training finished and saved\n")
            self.log_text.see(tk.END)
            messagebox.showinfo("Training finished", "Model training finished and saved")
        except Exception as e:
            logging.error(f"Model training failed: {e}")
            self.log_text.insert(tk.END, f"Model training failed: {e}\n")
            self.log_text.see(tk.END)
            messagebox.showerror("Training failed", f"Model training failed: {e}")

    def evaluate_model(self):
        # Model evaluation logic can be added here
        messagebox.showinfo("Evaluation result", "Model evaluation is not implemented yet")

    def mark_correct(self):
        if self.history:
            self.history[-1]['accuracy'] = True
            messagebox.showinfo("Rating saved", "You rated this answer as accurate")

    def mark_incorrect(self):
        if self.history:
            self.history[-1]['accuracy'] = False
            # Fall back to a web search for the same question
            question = self.history[-1]['question']
            answer = search_baidu(question)
            self.answer_text.delete("1.0", tk.END)
            self.answer_text.insert(tk.END, f"Search engine result:\n{answer}")
            messagebox.showinfo("Rating saved", "You rated this answer as inaccurate")

    def view_history(self):
        history_window = tk.Toplevel(self.root)
        history_window.title("History")

        history_text = tk.Text(history_window, height=20, width=80, font=("Arial", 12))
        history_text.pack(padx=10, pady=10)

        for entry in self.history:
            history_text.insert(tk.END, f"Question: {entry['question']}\n")
            history_text.insert(tk.END, f"Answer type: {entry['answer_type']}\n")
            history_text.insert(tk.END, f"Answer: {entry['specific_answer']}\n")
            if entry['accuracy'] is None:
                history_text.insert(tk.END, "Rating: not rated\n")
            elif entry['accuracy']:
                history_text.insert(tk.END, "Rating: accurate\n")
            else:
                history_text.insert(tk.END, "Rating: inaccurate\n")
            history_text.insert(tk.END, "-" * 50 + "\n")

    def save_history(self):
        file_path = filedialog.asksaveasfilename(defaultextension=".json", filetypes=[("JSON files", "*.json")])
        if not file_path:
            return

        # Write UTF-8 explicitly because ensure_ascii=False keeps Chinese text as-is
        with open(file_path, 'w', encoding='utf-8') as f:
            json.dump(self.history, f, ensure_ascii=False, indent=4)

        messagebox.showinfo("Saved", "History saved to file")


# Main entry point
if __name__ == "__main__":
    # Start the GUI
    root = tk.Tk()
    app = XihuaChatbotGUI(root)
    root.mainloop()
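
The answer lookup in get_specific_answer relies on difflib.SequenceMatcher to find the stored question closest to the user's input. The standalone sketch below shows that matching step on made-up data. The `best_match` helper and its `min_ratio` cutoff are assumptions added for illustration; the original method accepts any match with a ratio above 0.

```python
from difflib import SequenceMatcher


def best_match(question, records, min_ratio=0.0):
    """Return (record, ratio) for the most similar stored question.

    `min_ratio` is a hypothetical cutoff that is not in the original code;
    records scoring at or below it are ignored.
    """
    best, best_ratio = None, min_ratio
    for rec in records:
        ratio = SequenceMatcher(None, question, rec["question"]).ratio()
        if ratio > best_ratio:
            best, best_ratio = rec, ratio
    return best, best_ratio


# Made-up records in the same shape as the training data.
records = [
    {"question": "how do I install python", "human_answers": ["Use the official installer."]},
    {"question": "what is machine learning", "human_answers": ["Learning patterns from data."]},
]

match, ratio = best_match("how to install python", records)
no_match, _ = best_match("zzz", records, min_ratio=0.9)
```

A cutoff of this kind is one way to avoid returning an unrelated stored answer when nothing in the data resembles the question.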

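Notes
Web search: the new search_baidu function fetches related information from Baidu search results.
Marking an answer as inaccurate: when the user marks an answer as inaccurate, the mark_incorrect method calls search_baidu and shows the first result snippet in the answer box.
Project structure: make sure the project layout matches the one described in README.md.
With these changes, the chatbot answers from the model first and falls back to a web search when the user rejects that answer.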
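
As a rough sketch of the search step, the snippet below separates fetching from parsing so the parsing half can be exercised offline. It assumes, as the script does, that Baidu result snippets sit in `div` elements with the class `c-abstract`; Baidu's markup changes over time, so that selector may no longer match. The function names, the shortened User-Agent, and the 10-second timeout are illustrative choices rather than part of the original code.

```python
import requests
from bs4 import BeautifulSoup

NOT_FOUND = "No relevant information found"


def parse_first_abstract(html):
    """Return the text of the first <div class="c-abstract">, or NOT_FOUND."""
    soup = BeautifulSoup(html, "html.parser")
    first = soup.find("div", class_="c-abstract")
    return first.get_text().strip() if first else NOT_FOUND


def fetch_baidu_html(query, timeout=10):
    """Fetch the result page; `params` lets requests URL-encode the query."""
    response = requests.get(
        "https://www.baidu.com/s",
        params={"wd": query},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text


# Offline check with a made-up HTML fragment (no network access needed).
sample_html = (
    '<div class="c-abstract">  Example snippet  </div>'
    '<div class="c-abstract">Second</div>'
)
snippet = parse_first_abstract(sample_html)
empty = parse_first_abstract("<p>nothing here</p>")
```

A live lookup would then be `parse_first_abstract(fetch_baidu_html(question))`, wrapped in a `try` block that catches `requests.RequestException`.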

