Contents

    • Downloading the data
    • Changing the default save location: TRANSFORMERS_CACHE
    • Saving locally & loading from local disk
      • Saving
      • Loading
    • Reading `.arrow` data


Downloading the data

1. Download with Python code

from datasets import load_dataset

imdb = load_dataset("imdb")
# Some datasets define configurations that are selected with the name parameter,
# e.g. "full" downloads all of the data and "mini" downloads a small subset
# dataset = load_dataset(model_name, name="full")

imdb
'''
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})
'''

By default, the data is saved under the ~/.cache/huggingface folder.

The on-disk layout is as follows:

$ cd datasets/imdb/
$ tree
.
└── plain_text
    └── 0.0.0
        ├── e6281661ce1c48d982bc483cf8a173c1bbeb5d31
        │   ├── dataset_info.json
        │   ├── imdb-test.arrow
        │   ├── imdb-train.arrow
        │   └── imdb-unsupervised.arrow
        ├── e6281661ce1c48d982bc483cf8a173c1bbeb5d31.incomplete_info.lock
        └── e6281661ce1c48d982bc483cf8a173c1bbeb5d31_builder.lock

3 directories, 6 files

2. Download with the huggingface-cli command
A download made this way is also saved under the ~/.cache/huggingface folder.

huggingface-cli download --repo-type dataset imdb

3. git
The dataset repository can also be cloned with git.


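Changing the default save location: TRANSFORMERS_CACHE

Add an environment variable:

export TRANSFORMERS_CACHE='path/'

Or set it in code:

import os
# Set the variable before importing the Hugging Face libraries so that it is picked up
os.environ['TRANSFORMERS_CACHE'] = 'path/'

Note that TRANSFORMERS_CACHE controls where the transformers library caches models. The datasets library reads HF_DATASETS_CACHE instead, and HF_HOME moves the whole ~/.cache/huggingface root.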
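To see which directory a given set of environment variables would point the datasets cache at, the lookup order can be sketched in plain Python. This is a minimal sketch, assuming the precedence is HF_DATASETS_CACHE, then HF_HOME, then XDG_CACHE_HOME, then ~/.cache, which is my reading of the library's configuration rather than something the article states. The helper name `resolve_datasets_cache` is hypothetical and does not exist in the library.

```python
import os
from pathlib import Path


def resolve_datasets_cache(env=None):
    """Return the directory the datasets cache would resolve to (hypothetical helper)."""
    env = os.environ if env is None else env
    # 1. An explicit datasets cache location wins (assumed precedence).
    if env.get("HF_DATASETS_CACHE"):
        return Path(env["HF_DATASETS_CACHE"]).expanduser()
    # 2. Otherwise use <HF_HOME>/datasets when HF_HOME is set.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "datasets"
    # 3. Otherwise fall back to <XDG_CACHE_HOME or ~/.cache>/huggingface/datasets.
    xdg = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(xdg).expanduser() / "huggingface" / "datasets"


# Example environments (made-up paths) passed as plain dicts.
print(resolve_datasets_cache({"HF_DATASETS_CACHE": "/data/hf_datasets"}))
print(resolve_datasets_cache({"HF_HOME": "/data/hf"}))
print(resolve_datasets_cache({}))
```

Calling `resolve_datasets_cache()` with no argument inspects the current process environment, which is a quick way to check that an `export` actually took effect.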

Saving locally & loading from local disk

Saving

from datasets import load_from_disk

save_path = '/Users/xx/Downloads/imdb'
imdb.save_to_disk(save_path)
'''
Saving the dataset (1/1 shards): 100%|█| 25000/25000 [00:00<00:00, 97903.42 exam
Saving the dataset (1/1 shards): 100%|█| 25000/25000 [00:00<00:00, 251032.07 exa
Saving the dataset (1/1 shards): 100%|█| 50000/50000 [00:00<00:00, 88591.53 exam
'''

# Reload the saved copy to check that it matches the original
imdb2 = load_from_disk(save_path)
imdb2
'''
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})
'''

The storage layout is as follows:

$ cd imdb/
$ tree
.
├── dataset_dict.json
├── test
│   ├── data-00000-of-00001.arrow
│   ├── dataset_info.json
│   └── state.json
├── train
│   ├── data-00000-of-00001.arrow
│   ├── dataset_info.json
│   └── state.json
└── unsupervised
    ├── data-00000-of-00001.arrow
    ├── dataset_info.json
    └── state.json

3 directories, 10 files

Loading

from datasets import load_dataset, load_from_disk

# Load only the test split
save_path1 = '/Users/xx/Downloads/imdb/test'
imdb3 = load_from_disk(save_path1)
imdb3
'''
Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})
'''

imdb4 = load_dataset('imdb')  # loads the data in `.cache` by default

# load_dataset does not restore a save_to_disk directory: it reads each
# split's state.json as data, which gives one-row splits of metadata fields
imdb4 = load_dataset(path='/Users/xx/Downloads/imdb')
'''
Generating train split: 1 examples [00:00, 69.32 examples/s]
Generating test split: 1 examples [00:00, 277.31 examples/s]
'''
imdb4
'''
DatasetDict({
    train: Dataset({
        features: ['_data_files', '_fingerprint', '_format_columns', '_format_kwargs', '_format_type', '_output_all_columns', '_split'],
        num_rows: 1
    })
    test: Dataset({
        features: ['_data_files', '_fingerprint', '_format_columns', '_format_kwargs', '_format_type', '_output_all_columns', '_split'],
        num_rows: 1
    })
})
'''

# Loading a single file directly: fails
save_path2 = '/Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow'
imdb4 = load_from_disk(save_path2)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

load_from_disk cannot load from .cache/huggingface/datasets

Every level of the cache directory fails with the same error:

from datasets import load_from_disk

path = '/Users/xx/.cache/huggingface/datasets/imdb'
imdb2 = load_from_disk(path)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/ is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/ is neither a `Dataset` directory nor a `DatasetDict` directory.
'''

path1 = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/'
imdb2 = load_from_disk(path1)
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
    raise FileNotFoundError(
FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/ is neither a `Dataset` directory nor a `DatasetDict` directory.
'''
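The error message suggests that load_from_disk decides what a directory is from the marker files inside it. Comparing the two directory listings, the saved copy has dataset_dict.json at the top and state.json in each split folder, while the cache folder has neither. The check can be sketched with the standard library alone. This is an approximation under that assumption, not the library's actual code, and `saved_layout` is a hypothetical helper name.

```python
import os
import tempfile


def saved_layout(path):
    """Classify a directory by its marker files (hypothetical helper, assumed rule)."""
    # A DatasetDict directory carries a top-level dataset_dict.json.
    if os.path.isfile(os.path.join(path, "dataset_dict.json")):
        return "DatasetDict"
    # A single Dataset directory carries both dataset_info.json and state.json.
    has_info = os.path.isfile(os.path.join(path, "dataset_info.json"))
    has_state = os.path.isfile(os.path.join(path, "state.json"))
    if has_info and has_state:
        return "Dataset"
    return None


def touch(directory, names):
    """Create empty placeholder files so the layouts can be mimicked."""
    os.makedirs(directory, exist_ok=True)
    for name in names:
        open(os.path.join(directory, name), "w").close()


with tempfile.TemporaryDirectory() as root:
    # Mimic the cache folder: dataset_info.json and .arrow files, but no state.json.
    cache_dir = os.path.join(root, "cache")
    touch(cache_dir, ["dataset_info.json", "imdb-test.arrow"])

    # Mimic a save_to_disk folder with one split.
    saved_dir = os.path.join(root, "saved")
    touch(saved_dir, ["dataset_dict.json"])
    split_dir = os.path.join(saved_dir, "test")
    touch(split_dir, ["data-00000-of-00001.arrow", "dataset_info.json", "state.json"])

    results = {
        "cache": saved_layout(cache_dir),
        "saved": saved_layout(saved_dir),
        "saved/test": saved_layout(split_dir),
    }

print(results)
# {'cache': None, 'saved': 'DatasetDict', 'saved/test': 'Dataset'}
```

Under this reading, the cache folder is rejected because it lacks state.json and dataset_dict.json, not because its .arrow files are unreadable.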

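Reading .arrow data

Double-clicking an .arrow file does not open it for viewing; the code below shows its contents.

import pyarrow.ipc


def read_arrow_to_df_julia_ok(path):
    # Open the file with the Arrow stream reader and convert it to a DataFrame
    # (read_pandas requires pandas to be installed)
    with open(path, "rb") as f:
        r = pyarrow.ipc.RecordBatchStreamReader(f)
        df = r.read_pandas()
    return df


# Two candidate files; the second assignment overrides the first
path = '/Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow'
path = '/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow'
table = read_arrow_to_df_julia_ok(path)
# Print the data
print('Printed data:\n', table)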

Result

Printed data:
                                                     text  label
0      I love sci-fi and am willing to put up with a ...      0
1      Worth the entertainment value of a rental, esp...      0
2      its a totally average film with a few semi-alr...      0
3      STAR RATING: ***** Saturday Night **** Friday ...      0
4      First off let me say, If you haven't enjoyed a...      0
...                                                  ...    ...
24995  Just got around to seeing Monster Man yesterda...      1
24996  I got this as part of a competition prize. I w...      1
24997  I got Monster Man in a box set of three films ...      1
24998  Five minutes in, i started to feel how naff th...      1
24999  I caught this movie on the Sci-Fi channel rece...      1

