
news 2025/8/4 0:01:49


Using CatBoost to Extract Signals from RNN, ARIMA, and Prophet Models for Forecasting

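Ensembling a variety of weak learners can improve forecast accuracy, and even when our models are already strong, ensembling can often add a final boost. The popular machine learning library scikit-learn provides a StackingRegressor that can be used for time series tasks. However, StackingRegressor has a limitation: it only accepts estimators that implement the scikit-learn API. Models that are not available in scikit-learn, such as ARIMA, or models built as deep neural networks therefore cannot be used with it. In this post, I show how to stack the forecasts from any model, regardless of the library it comes from.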
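The key point is that a stacking meta-learner only needs the base models' forecasts as input features, not the model objects themselves. Below is a minimal NumPy-only sketch of that idea. It assumes three hypothetical base models represented purely by synthetic forecast arrays, and it swaps in a plain least-squares linear meta-learner instead of the CatBoost model used later in the article.

```python
import numpy as np

def fit_linear_stacker(base_preds, y):
    """Least-squares meta-learner: weights (plus intercept) over base forecasts.

    base_preds: shape (n_obs, n_models), one column of forecasts per base model.
    y: shape (n_obs,), the actual values.
    """
    X = np.column_stack([np.ones(len(y)), base_preds])  # intercept column first
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_linear_stacker(coef, base_preds):
    """Combine new base forecasts with the fitted meta-learner weights."""
    X = np.column_stack([np.ones(base_preds.shape[0]), base_preds])
    return X @ coef

# Hypothetical base forecasts: plain arrays, so they could come from any library.
rng = np.random.default_rng(0)
y = 50 + 10 * np.sin(np.arange(200) / 5)
base_preds = np.column_stack([
    y + rng.normal(0, 1.0, 200),  # model A: unbiased but noisy
    0.9 * y + 5,                  # model B: systematically biased, no noise
    y + rng.normal(0, 3.0, 200),  # model C: unbiased, noisier
])

# Fit the meta-learner on the first 150 points, evaluate on the last 50.
coef = fit_linear_stacker(base_preds[:150], y[:150])
stacked = predict_linear_stacker(coef, base_preds[150:])

stacked_mae = np.mean(np.abs(stacked - y[150:]))
base_maes = np.mean(np.abs(base_preds[150:] - y[150:, None]), axis=0)
```

Because model B's bias is exactly linear in this toy data, the linear meta-learner can undo it almost perfectly. Real forecasts are rarely this clean. The article uses CatBoost, a nonlinear learner, as the meta-model instead.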

We will use the following packages:

```shell
pip install --upgrade scalecast
conda install tensorflow
conda install shap
conda install -c conda-forge cmdstanpy
pip install prophet
```

Dataset

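The data are hourly (series H1 from the M4 competition's hourly set, as the code below shows) and are split into a training set (700 observations) and a test set (48 observations). The following code reads the data and stores it in a Forecaster object: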

```python
import pandas as pd
import numpy as np
from scalecast.Forecaster import Forecaster
from scalecast.util import metrics
import matplotlib.pyplot as plt
import seaborn as sns

def read_data(idx='H1', cis=True, metrics=['smape']):
    info = pd.read_csv(
        'M4-info.csv',
        index_col=0,
        parse_dates=['StartingDate'],
        dayfirst=True,
    )
    # dropna() strips trailing NaN padding, if any, from the wide CSV row
    train = pd.read_csv('Hourly-train.csv', index_col=0).loc[idx].dropna()
    test = pd.read_csv('Hourly-test.csv', index_col=0).loc[idx]
    y = train.values
    sd = info.loc[idx, 'StartingDate']
    fcst_horizon = info.loc[idx, 'Horizon']
    cd = pd.date_range(
        start=sd,
        freq='H',
        periods=len(y),
    )
    f = Forecaster(
        y=y,                        # observed values
        current_dates=cd,           # current dates
        future_dates=fcst_horizon,  # forecast length
        test_length=fcst_horizon,   # test-set length
        cis=cis,                    # whether to evaluate intervals for each model
        metrics=metrics,            # what metrics to evaluate
    )
    return f, test.values

f, test_set = read_data()
f  # display the Forecaster object
```

The output looks like this:

Models

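Before we start building models, we need a baseline: the simplest possible forecast. Here that is the seasonal naive method, which propagates the most recent 24 observations forward.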

```python
f.set_estimator('naive')
f.manual_forecast(seasonal=True)
```
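Conceptually, the seasonal naive forecast is just the last 24 hourly values repeated across the forecast horizon. A minimal NumPy sketch of that idea follows. The function name `seasonal_naive` is hypothetical and this is not scalecast's implementation.

```python
import numpy as np

def seasonal_naive(y, horizon, season_length=24):
    """Repeat the last `season_length` observations forward for `horizon` steps."""
    y = np.asarray(y, dtype=float)
    last_season = y[-season_length:]
    reps = int(np.ceil(horizon / season_length))  # enough copies to cover the horizon
    return np.tile(last_season, reps)[:horizon]

history = np.arange(72, dtype=float)  # three "days" of toy hourly values 0..71
fcst = seasonal_naive(history, horizon=48)  # repeats the values 48..71 twice
```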

We then fit ARIMA, LSTM, and Prophet models as benchmarks.

ARIMA

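ARIMA (AutoRegressive Integrated Moving Average) is a popular and simple time series technique that uses a series' lags and errors to predict its future values in a linear fashion. Through EDA, we determined that this series is highly seasonal, so we chose a seasonal ARIMA model of order (5,1,4)x(1,1,1,24).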

```python
f.set_estimator('arima')
f.manual_forecast(
    order=(5, 1, 4),
    seasonal_order=(1, 1, 1, 24),
    call_me='manual_arima',
)
```
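The middle `1` in `order=(5,1,4)` and in `seasonal_order=(1,1,1,24)` means the ARMA part is fit to the series after one regular difference and one seasonal (lag-24) difference, that is, to (1 - B)(1 - B^24) y_t, where B is the backshift operator (B y_t = y_{t-1}). The sketch below shows only that differencing transformation, not the full ARIMA fit. The helper names are hypothetical, and the toy series (a linear trend plus a fixed daily pattern) is an assumption chosen to make the effect easy to check.

```python
import numpy as np

def difference(y, lag=1):
    """Return y_t - y_{t-lag}; the output is `lag` elements shorter."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

def sarima_differencing(y, d=1, D=1, s=24):
    """Apply d regular differences, then D seasonal differences of period s."""
    out = np.asarray(y, dtype=float)
    for _ in range(d):
        out = difference(out, 1)
    for _ in range(D):
        out = difference(out, s)
    return out

t = np.arange(24 * 10)
pattern = np.sin(2 * np.pi * t / 24)  # repeats every 24 hours
y = 0.5 * t + 10 * pattern            # linear trend + daily seasonality
z = sarima_differencing(y)            # trend and fixed daily pattern both cancel
```

On real data the differenced series is not zero. What remains is the short-term structure that the AR(5), MA(4), and seasonal AR/MA(1) terms then model.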

LSTM

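If ARIMA is one of the simpler time series models, LSTM is one of the more advanced. It is a deep learning technique with many parameters, including a mechanism for discovering long- and short-term patterns in sequential data, which in theory makes it an ideal choice for time series. Here we build the model with TensorFlow, fitting two variants that each stack three 100-unit LSTM layers on 48 lags and differ only in their activation function (tanh vs. ReLU).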

```python
f.set_estimator('rnn')

# Three stacked LSTM layers with tanh activation
f.manual_forecast(
    lags=48,
    layers_struct=[
        ('LSTM', {'units': 100, 'activation': 'tanh'}),
        ('LSTM', {'units': 100, 'activation': 'tanh'}),
        ('LSTM', {'units': 100, 'activation': 'tanh'}),
    ],
    optimizer='Adam',
    epochs=15,
    plot_loss=True,
    validation_split=0.2,
    call_me='rnn_tanh_activation',
)

# Same architecture with ReLU activation
f.manual_forecast(
    lags=48,
    layers_struct=[
        ('LSTM', {'units': 100, 'activation': 'relu'}),
        ('LSTM', {'units': 100, 'activation': 'relu'}),
        ('LSTM', {'units': 100, 'activation': 'relu'}),
    ],
    optimizer='Adam',
    epochs=15,
    plot_loss=True,
    validation_split=0.2,
    call_me='rnn_relu_activation',
)
```
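`lags=48` means each training example gives the network the previous 48 hourly values. The sketch below shows, in general terms, how such a supervised dataset can be built from a univariate series for one-step-ahead training. It illustrates the windowing idea only and is an assumption about, not a copy of, scalecast's internal preprocessing, which may also scale the data or predict several steps at once. The function name is hypothetical.

```python
import numpy as np

def make_lag_windows(y, lags=48):
    """Turn a 1-D series into (X, target) pairs for one-step-ahead training.

    X has shape (n_samples, lags, 1), the (samples, timesteps, features)
    layout that Keras LSTM layers expect; target has shape (n_samples,).
    """
    y = np.asarray(y, dtype=float)
    n_samples = len(y) - lags
    X = np.stack([y[i:i + lags] for i in range(n_samples)])  # sliding windows
    target = y[lags:]  # the value right after each window
    return X[..., np.newaxis], target

y = np.arange(100, dtype=float)
X, target = make_lag_windows(y, lags=48)
```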

Prophet

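Despite its popularity, some claim that Prophet's accuracy is unimpressive, mainly because its trend extrapolation is sometimes unrealistic and it does not account for local patterns through autoregressive modeling. But it has its own strengths: it automatically applies holiday effects to the model and accounts for several types of seasonality, all with minimal input from the user. That is why I like to use it as a signal rather than as the final forecast.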

```python
f.set_estimator('prophet')
f.manual_forecast()
```

Comparing the results

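Now that we have forecasts from each model, let's see how they perform on the validation set, which consists of the last 48 observations of our training set.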

```python
results = f.export(determine_best_by='TestSetSMAPE')
ms = results['model_summaries']
ms[
    [
        'ModelNickname',
        'TestSetLength',
        'TestSetSMAPE',
        'InSampleSMAPE',
    ]
]
```
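SMAPE (symmetric mean absolute percentage error) divides each absolute error by the average magnitude of the actual and forecast values. The sketch below uses the M4 competition's definition, expressed in percent. scalecast's `metrics.smape` may scale the result differently (for example, as a fraction rather than a percentage), so treat this as an illustration of the formula, not a reimplementation of the library.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent (M4-style): mean of 2|a - f| / (|a| + |f|), times 100."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return 100 * np.mean(2 * np.abs(a - f) / (np.abs(a) + np.abs(f)))

# Two toy points, each off by 10: errors are 20/210 and 20/390 before averaging.
score = smape([100, 200], [110, 190])
```

Because the denominator uses the forecast as well as the actual, the same absolute error counts for less when the values are larger, as the two toy points show.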

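Every model outperforms the naive method. The ARIMA model performs best, with a SMAPE of 4.7%, followed by the Prophet model. Let's plot all of the forecasts against the validation set: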

```python
f.plot(order_by="TestSetSMAPE", ci=True)
plt.show()
```

All of these models perform reasonably on this series, and there are no large differences between them. Now let's stack them!

Stacking the models

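Every stacking model needs a final estimator that sifts through the other models' estimates to create a new set of predictions. We will stack the earlier results with a CatBoost estimator. CatBoost is a powerful gradient boosting library, and the hope is that it can extract the best signal from each of the models already applied.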

```python
f.add_signals(
    f.history.keys(),  # add signals from all previously evaluated models
)
f.add_ar_terms(48)
f.set_estimator('catboost')
```

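The code above adds the forecasts from each evaluated model to the Forecaster object, which calls these forecasts "signals." They are treated the same way as any other covariate stored in the object. We also add the series' first 48 lags as additional regressors that the CatBoost model can use to make predictions. Now let's fit three CatBoost models: one that uses all available signals and lags, one that uses only the lags, and one that uses only the signals.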

```python
f.manual_forecast(
    Xvars='all',
    call_me='catboost_all_reg',
    verbose=False,
)
f.manual_forecast(
    Xvars=[x for x in f.get_regressor_names() if x.startswith('AR')],
    call_me='catboost_lags_only',
    verbose=False,
)
f.manual_forecast(
    Xvars=[x for x in f.get_regressor_names() if not x.startswith('AR')],
    call_me='catboost_signals_only',
    verbose=False,
)
```
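The three feature sets differ only in which regressor names they keep. The split relies on the lag regressors having an `AR` prefix, as the `startswith('AR')` filters above imply. Here is a standalone sketch of the same selection logic. The signal column names are hypothetical placeholders, not necessarily the names scalecast generates.

```python
# Hypothetical regressor names: 48 lag terms plus one signal per base model.
lag_names = [f'AR{i}' for i in range(1, 49)]
signal_names = ['signal_naive', 'signal_manual_arima', 'signal_prophet']  # hypothetical
regressor_names = lag_names + signal_names

# Same prefix-based filtering as the three manual_forecast calls above.
feature_sets = {
    'all_reg': regressor_names,
    'lags_only': [x for x in regressor_names if x.startswith('AR')],
    'signals_only': [x for x in regressor_names if not x.startswith('AR')],
}
```

One consequence of this design: a signal whose name itself began with `AR` (say, a model nicknamed `ARIMA` in upper case) would be misclassified as a lag. The lower-case nickname `manual_arima` used above avoids that.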

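Now we can compare all of the models on the held-out test set. We look at two metrics: SMAPE and mean absolute scaled error (MASE), the two metrics used in the actual M4 competition.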

```python
test_results = pd.DataFrame(index=f.history.keys(), columns=['smape', 'mase'])
for k, v in f.history.items():
    test_results.loc[k, ['smape', 'mase']] = [
        metrics.smape(test_set, v['Forecast']),
        metrics.mase(test_set, v['Forecast'], m=24, obs=f.y),
    ]

test_results.sort_values('smape')
```
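MASE divides the forecast's mean absolute error over the horizon by the in-sample mean absolute error of a seasonal naive forecast, here with the season length `m=24` passed to `metrics.mase`. A value below 1 means the forecast's MAE is smaller than the seasonal naive method's in-sample MAE. The function below is a minimal NumPy sketch of the standard definition, with hypothetical toy numbers. It is not claimed to match scalecast's implementation in every detail.

```python
import numpy as np

def mase(actual, forecast, insample, m=24):
    """Mean absolute scaled error.

    Numerator: MAE of the forecast over the test horizon.
    Denominator: in-sample MAE of the seasonal naive method, mean |y_t - y_{t-m}|.
    """
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    y = np.asarray(insample, dtype=float)
    scale = np.mean(np.abs(y[m:] - y[:-m]))  # seasonal naive in-sample error
    return np.mean(np.abs(a - f)) / scale

history = np.array([1.0, 3.0, 2.0, 5.0])  # toy in-sample series
# Scale: mean(|2 - 1|, |5 - 3|) = 1.5; forecast MAE: mean(1, 0) = 0.5.
score = mase(actual=[4.0, 6.0], forecast=[5.0, 6.0], insample=history, m=2)
```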

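Combining signals from different types of models produced two estimators that outperform all the others: the CatBoost model trained on all signals and lags, and the CatBoost model trained on the signals only. Both achieve an out-of-sample SMAPE of about 2.8%. Here is a comparison plot: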

```python
fig, ax = plt.subplots(figsize=(12, 6))
f.plot(
    models=['catboost_all_reg', 'catboost_signals_only'],
    ci=True,
    ax=ax,
)
sns.lineplot(
    x=f.future_dates,
    y=test_set,
    ax=ax,
    label='held out actuals',
    color='darkblue',
    alpha=.75,
)
plt.show()
```

Which signals matter most?

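To round out the analysis, we can use Shapley scores to determine which signals are most important. Shapley scores are considered one of the most advanced methods for measuring the predictive power of the inputs to a given machine learning model: the higher the score, the more important the input is to that particular model.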

```python
f.export_feature_importance('catboost_all_reg')
```
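Shapley values come from cooperative game theory: a feature's score is its weighted average marginal contribution to the prediction across all coalitions of the other features. Libraries such as `shap` (installed at the start) use far more efficient algorithms. The brute-force sketch below computes exact values for a tiny toy model only to show what the scores mean. The model, the feature values, and the convention that an "absent" feature is replaced by a baseline value are all hypothetical choices for illustration.

```python
import itertools
import math
import numpy as np

def exact_shapley(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    A feature absent from a coalition is replaced by its baseline value.
    Cost grows exponentially with the number of features: toy examples only.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        for j in coalition:
            z[j] = x[j]  # features in the coalition take their actual values
        return model(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi

def toy_model(z):
    """Hypothetical model: two additive terms plus an interaction of features 0 and 2."""
    return 3 * z[0] + 2 * z[1] + z[0] * z[2]

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(toy_model, x, baseline)
```

Here the additive terms are credited entirely to features 0 and 1, the interaction term is split evenly between features 0 and 2, and the scores sum to `toy_model(x) - toy_model(baseline)`.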

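The plot above shows only the top few predictors, but we can see that the ARIMA signal is the most important, followed by the first lag of the series, and then Prophet. The RNN models also score higher than many of the lag terms. If we want to train a lighter model in the future, this could be a good starting point.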

Summary

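In this post, I showed the power of ensembling models in a time series context and how combining different models can achieve higher accuracy on a time series. We used the scalecast package, which is quite capable; if you like it, take a look at its homepage: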

https://avoid.overfit.cn/post/cd910a41e6b94852b762cd6f2abf8b16

Author: Michael Keith
