Apache Hadoop Stack: MapReduce, Pig, Spark, Hive

Sunday, Jun 19, 2022

1. HDFS CLI: load input data

List all directories:
    hadoop fs -ls
Make a new directory to store movie data:
    hadoop fs -mkdir movieData
Copy data file from local to HDFS:
    hadoop fs -copyFromLocal u.data movieData/u.data

2. MapReduce with Python: movies sorted by rating counts

Script:
    from mrjob.job import MRJob
    from mrjob.step import MRStep

    class RatingsBreakdown(MRJob):
        def steps(self):
            return [
                MRStep(mapper=self.mapper_get_movie,
                       combiner=self.combiner_count_ratings,
                       reducer=self.reducer_count_ratings),
                MRStep(reducer=self.
@ rushi
6 minutes read
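The "movies sorted by rating counts" idea from the excerpt above can be sketched without a cluster. This is a minimal pure-Python simulation of a map, shuffle, reduce, and sort pipeline, not the post's mrjob script. It assumes u.data is tab-separated with user ID, movie ID, rating, and timestamp columns; the sample rows and the `run_job` helper are made up for illustration.

```python
from collections import defaultdict


def mapper_get_movie(line):
    # Assumed row layout: user_id <TAB> movie_id <TAB> rating <TAB> timestamp
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("\t")
    yield movie_id, 1


def reducer_count_ratings(movie_id, counts):
    # Sum the 1s emitted for each movie to get its rating count.
    yield movie_id, sum(counts)


def run_job(lines):
    # Hypothetical driver that stands in for the framework's shuffle step.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper_get_movie(line):
            groups[key].append(value)

    totals = []
    for key, values in groups.items():
        totals.extend(reducer_count_ratings(key, values))

    # Sort by rating count, breaking ties by movie ID.
    return sorted(totals, key=lambda pair: (pair[1], pair[0]))


# Made-up sample rows in the assumed layout.
sample = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t242\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t242\t1\t886397596",
    "298\t302\t4\t884182806",
]

result = run_job(sample)
for movie_id, count in result:
    print(movie_id, count)
```

In a real job the framework handles the grouping between mapper and reducer; the dictionary here only imitates that step on a single machine.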

Collaborative Filtering on Amazon Products With PySpark

Sunday, Jun 19, 2022

Goals
Recommend the top 5 products for a user: RMSE = 1.22

Data set description
This is a list of over 34,000 consumer reviews for Amazon products like the Kindle, Fire TV Stick, and more, provided by Datafiniti’s Product Database. The dataset includes basic product information, rating, review text, and more for each product. Note that this is a sample of a large dataset. The full dataset is available through Datafiniti.

from pyspark.
@ rushi
4 minutes read
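The two numbers in the excerpt above, an RMSE score and a top-5 list, can be illustrated with plain Python. This is only a sketch of how such values are typically computed, not the post's PySpark code. The `rmse` and `top_n` helpers, the product IDs, and all ratings below are hypothetical.

```python
import heapq
import math


def rmse(actual, predicted):
    # Root mean squared error between true and predicted ratings.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def top_n(predictions, n=5):
    # predictions: list of (product_id, predicted_rating) pairs for one user.
    return heapq.nlargest(n, predictions, key=lambda item: item[1])


# Made-up ratings, only to exercise the helpers.
error = rmse([5, 3, 4, 2], [4, 3, 5, 2])

# Made-up predicted ratings for a single user.
predictions = [
    ("P1", 4.1), ("P2", 3.0), ("P3", 4.8), ("P4", 2.2),
    ("P5", 4.5), ("P6", 3.9), ("P7", 4.9),
]
top5 = top_n(predictions)
print(round(error, 3), [product for product, _ in top5])
```

A lower RMSE means the predicted ratings sit closer to the held-out true ratings, on the same scale as the ratings themselves.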

Golden Hour of Publishing Comments

Sunday, Jun 19, 2022

Hacker News is a site similar to Reddit where user-submitted stories (known as “posts”) are voted on and commented on. In the tech and startup worlds, Hacker News is immensely popular, and pieces that reach the top of the site’s listings can get hundreds of thousands of views. We’ll compare ‘Ask HN’ and ‘Show HN’ posts to determine the following: 1. Do ‘Ask HN’ or ‘Show HN’ posts receive more comments on average?
@ rushi
6 minutes read
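The first question in the excerpt above, whether Ask HN or Show HN posts get more comments on average, comes down to grouping posts by title prefix and averaging. Here is a minimal sketch, assuming each post is a (title, comment count) pair; the titles, counts, and helper names are invented for illustration and are not the post's data or code.

```python
from collections import defaultdict


def post_type(title):
    # Classify a post by its title prefix, ignoring case.
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


def average_comments(posts):
    # posts: iterable of (title, num_comments) pairs.
    totals = defaultdict(int)
    counts = defaultdict(int)
    for title, num_comments in posts:
        kind = post_type(title)
        totals[kind] += num_comments
        counts[kind] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


# Made-up sample posts.
sample = [
    ("Ask HN: How do you learn Rust?", 10),
    ("Show HN: My weekend project", 4),
    ("Ask HN: Best books of the year?", 20),
    ("A post about databases", 7),
    ("Show HN: A tiny static site generator", 8),
]

averages = average_comments(sample)
print(averages)
```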

Modeling and Analysis of One Finger QWERTY Keyboard Typing Using Fitts's and Zipf's Laws

Sunday, Jun 19, 2022

Goals
1. Modeling the keyboard
2. Fitts’s law parameter estimation: r-squared = 0.709
3. Average typing time of the 1000 most frequent words: 0.99
4. Zipf’s law parameter estimation & average typing time of the 1000 most frequent words: 0.71

    import matplotlib.pyplot as plt
    import math
    import numpy
    numpy.set_printoptions(precision=2)
    import scipy.stats as stats
    import statsmodels.api as sm
    from statsmodels.graphics.regressionplots import abline_plot
    import seaborn as sns

1. Keyboard modeling

    # Define keyboard
    line1 = 'qwertyuiop'
    line2 = 'asdfghjkl'
    line3 = 'zxcvbnm'
    # Define a keyboard as a list of keys.
@ rushi
10 minutes read
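To make the keyboard-modeling idea above concrete, here is a small sketch that places the three key rows on a grid and estimates one-finger typing time with the Shannon form of Fitts's law, T = a + b * log2(D / W + 1). The row offsets, the unit key width, and the default a and b values are assumptions chosen for illustration; they are not the parameters estimated in the post.

```python
import math

line1 = 'qwertyuiop'
line2 = 'asdfghjkl'
line3 = 'zxcvbnm'

# Assumed layout: unit-width keys, lower rows shifted right by these offsets.
ROW_OFFSETS = (0.0, 0.25, 0.75)


def key_positions():
    # Map each key to an (x, y) center on the assumed grid.
    positions = {}
    rows = zip((line1, line2, line3), ROW_OFFSETS)
    for row, (keys, offset) in enumerate(rows):
        for col, key in enumerate(keys):
            positions[key] = (col + offset, float(row))
    return positions


def fitts_time(distance, width=1.0, a=0.0, b=1.0):
    # Shannon formulation of Fitts's law; a and b are placeholder values.
    return a + b * math.log2(distance / width + 1)


def word_time(word, positions, a=0.0, b=1.0):
    # Sum the movement time between consecutive keys of the word.
    total = 0.0
    for prev, cur in zip(word, word[1:]):
        (x1, y1), (x2, y2) = positions[prev], positions[cur]
        total += fitts_time(math.hypot(x2 - x1, y2 - y1), a=a, b=b)
    return total


positions = key_positions()
print(word_time("qr", positions))  # q to r spans three key widths
```

With fitted a and b, the same `word_time` helper could be averaged over a word list, optionally weighting each word by a Zipf-style frequency.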

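About Me

g1eny0ung's ❤️ blog

Notes on 🌈 life and technology

A fourth-year university student

About to graduate (already graduated) from 🏫 Dalian Neusoft University of Information

Front-end engineer by profession

In my spare time I work on open source and Apple apps (OSX & iOS)

My main tech stack: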

  • JavaScript & TypeScript
  • React.js
  • Electron
  • Rust

Languages I write (or have written) for fun:

  • Java & Clojure & CLJS
  • OCaml & Reason & ReScript
  • Dart & Swift

Currently working at PingCAP

– Updated September 9, 2020

Other

If you like my open source projects or find them helpful, you can buy me a coffee ☕.

PayPal

https://paypal.me/g1eny0ung

Patreon:

Become a Patron!

WeChat tip code

wechat

It is best to attach some information or leave a message so that I can record the donation 📝. Thank you very much 🙏.