当前位置：首页 > news >正文

给企业建设网站的流程图百度推广怎么收费标准

news 2025/7/24 4:22:51

给企业建设网站的流程图,百度推广怎么收费标准,北京城乡建设网站首页,坡头手机网站建设目录 1. 执行过程1.1 分割1.2 Map1.3 Combine1.4 Reduce 2. 代码和结果2.1 pom.xml中依赖配置2.2 工具类util2.3 WordCount2.4 结果参考 1. 执行过程假设WordCount的两个输入文本text1.txt和text2.txt如下。 Hello World Bye WorldHello Hadoop Bye Hadoop1.1 分割将每个文…

1. 执行过程

假设WordCount的两个输入文本text1.txt和text2.txt如下。

Hello World
Bye World

Hello Hadoop
Bye Hadoop

1.1 分割

将每个文件拆分成split分片，由于测试文件比较小，所以每个文件为一个split，并将文件按行分割形成<key，value>对，如下图所示。这一步由MapReduce自动完成，其中key值为偏移量，由MapReduce自动计算出来，包括回车所占的字符数。
在这里插入图片描述

1.2 Map

将分割好的<key，value>对交给用户定义的Map方法处理，生成新的<key，value>对。处理流程为先对每一行文字按空格拆分为多个单词，每个单词出现次数设初值为1，key为某个单词，value为1，如下图所示。
在这里插入图片描述

1.3 Combine

得到Map方法输出的<key，value>对后，Mapper将它们按照key值进行升序排列，并执行Combine合并过程，将key值相同的value值累加，得到Mapper的最终输出结果，并写入磁盘，如下图所示。
在这里插入图片描述

1.4 Reduce

Reducer先对从Mapper接受的数据进行排序，并将key值相同的value值合并到一个list列表中，再交由用户自定义的Reduce方法进行汇总处理，得到新的<key，value>对，并作为WordCount的输出结果，存入HDFS，如下图所示。
在这里插入图片描述

2. 代码和结果

2.1 pom.xml中依赖配置

	<dependencies><dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>4.11</version><scope>test</scope></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-common</artifactId><version>3.3.6</version><exclusions><exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion></exclusions></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-mapreduce-client-core</artifactId><version>3.3.6</version><type>pom</type></dependency><dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-mapreduce-client-jobclient</artifactId><version>3.3.6</version></dependency></dependencies>

2.2 工具类util

util.removeALL的功能是删除hdfs上的指定输出路径(如果存在的话)，而util.showResult的功能是打印wordcount的结果。

import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;public class util {public static FileSystem getFileSystem(String uri, Configuration conf) throws Exception {URI add = new URI(uri);return FileSystem.get(add, conf);}public static void removeALL(String uri, Configuration conf, String path) throws Exception {FileSystem fs = getFileSystem(uri, conf);if (fs.exists(new Path(path))) {boolean isDeleted = fs.delete(new Path(path), true);System.out.println("Delete Output Folder? " + isDeleted);}}public static void  showResult(String uri, Configuration conf, String path) throws Exception {FileSystem fs = getFileSystem(uri, conf);String regex = "part-r-";Pattern pattern = Pattern.compile(regex);if (fs.exists(new Path(path))) {FileStatus[] files = fs.listStatus(new Path(path));for (FileStatus file : files) {Matcher matcher = pattern.matcher(file.getPath().toString());if (matcher.find()) {FSDataInputStream openStream = fs.open(file.getPath());IOUtils.copyBytes(openStream, System.out, 1024);openStream.close();}}}}
}

2.3 WordCount

正常来说，MapReduce编程都是要把代码打包成jar文件，然后用hadoop jar jar文件名主类名称输入路径输出路径。下面代码中直接给出了输入和输出路径，可以直接运行。

import java.io.IOException;
import java.util.StringTokenizer;import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;public class App {public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {System.out.println(key + " " + value);Text keyOut;IntWritable valueOut = new IntWritable(1);StringTokenizer token = new StringTokenizer(value.toString());while (token.hasMoreTokens()) {keyOut = new Text(token.nextToken());context.write(keyOut, valueOut);}}}public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {int sum = 0;for (IntWritable value : values) {sum += value.get();}context.write(key, new IntWritable(sum));} }public static void main(String[] args) throws Exception {Configuration conf = new Configuration();String[] myArgs = {"file:///home/developer/CodeArtsProjects/WordCount/text1.txt", "file:///home/developer/CodeArtsProjects/WordCount/text2.txt", "hdfs://localhost:9000/user/developer/wordcount/output"};util.removeALL("hdfs://localhost:9000", conf, myArgs[myArgs.length - 1]);Job job = Job.getInstance(conf, "wordcount");job.setJarByClass(App.class);job.setMapperClass(MyMapper.class);job.setReducerClass(MyReducer.class);job.setCombinerClass(MyReducer.class);job.setOutputKeyClass(Text.class);job.setOutputValueClass(IntWritable.class);for (int i = 0; i < myArgs.length - 1; i++) {FileInputFormat.addInputPath(job, new Path(myArgs[i]));}FileOutputFormat.setOutputPath(job, new Path(myArgs[myArgs.length - 1]));int res = job.waitForCompletion(true) ? 0 : 1;if (res == 0) {System.out.println("WordCount结果:");util.showResult("hdfs://localhost:9000", conf, myArgs[myArgs.length - 1]);}System.exit(res);}
}