I have been working on Spark-related projects for almost 2 years. Today I submitted a small patch to the Spark community. Hope to become a contributor~
https://issues.apache.org/jira/browse/SPARK-21859
CyannyLive
AI and Big Data
Spark Streaming Exactly-Once Analysis
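I've been working a lot with Spark Streaming recently, focusing mainly on its correctness requirements. After almost half a year of this, I can't help asking why we need to spend so much time on exactly-once. What is the difference between the processing logic of streaming and batch? In my view, streaming is better suited to simple filtering and other logic that can finish within 100 ms, and that same logic could also be computed in batch, so why use streaming at all? What users really want is speed. If batch could meet low-latency requirements, streaming systems would be unnecessary. So the real question is: why do we need a separate streaming system?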
Set Up Apache Storm on Mac in 10min
Storm is a great real-time streaming system. My recent project is about Spark Streaming, and I want to learn Storm too, to know more about streaming systems. Okay, let's fire it up.
Today I tried to install a Storm cluster on my local Mac.
It was easy to install and will take you about 10 minutes.
Machine Learning Logistic Regression
Logistic regression is used for classification problems, where the predicted value takes fixed discrete values, such as 1 for positive or 0 for negative. The essentials of logistic regression are:
- the hypothesis function is the sigmoid function
- the cost function: J(θ)
- gradient descent and related optimization algorithms
- advanced optimization, with regularization to address the overfitting problem.
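The points above can be tied together in a small sketch: a sigmoid hypothesis, the regularized cross-entropy cost J(θ), and plain batch gradient descent. This is a minimal illustration assuming dense `Array[Double]` features where `x(0)` is the constant bias feature 1.0; the names `alpha` (learning rate), `lambda` (regularization strength) and `train` are my own choices, not from the course or any library.

```scala
object LogisticRegressionSketch {
  // Sigmoid (logistic) function g(z) = 1 / (1 + e^(-z)).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Hypothesis h_theta(x) = g(theta^T x); x(0) is assumed to be the bias feature 1.0.
  def hypothesis(theta: Array[Double], x: Array[Double]): Double =
    sigmoid(theta.zip(x).map { case (t, xi) => t * xi }.sum)

  // Regularized cross-entropy cost J(theta); theta(0) (the bias) is not regularized.
  def cost(theta: Array[Double], xs: Array[Array[Double]], ys: Array[Double], lambda: Double): Double = {
    val m = xs.length
    val dataTerm = xs.zip(ys).map { case (x, y) =>
      val h = hypothesis(theta, x)
      -y * math.log(h) - (1 - y) * math.log(1 - h)
    }.sum / m
    val regTerm = lambda / (2.0 * m) * theta.drop(1).map(t => t * t).sum
    dataTerm + regTerm
  }

  // One batch gradient-descent step: theta_j := theta_j - alpha * dJ/dtheta_j.
  def step(theta: Array[Double], xs: Array[Array[Double]], ys: Array[Double],
           alpha: Double, lambda: Double): Array[Double] = {
    val m = xs.length
    val errors = xs.zip(ys).map { case (x, y) => hypothesis(theta, x) - y }
    theta.indices.map { j =>
      val grad = xs.zip(errors).map { case (x, e) => e * x(j) }.sum / m
      val reg = if (j == 0) 0.0 else lambda / m * theta(j)
      theta(j) - alpha * (grad + reg)
    }.toArray
  }

  // Runs a fixed number of steps starting from theta = 0.
  def train(xs: Array[Array[Double]], ys: Array[Double], alpha: Double,
            lambda: Double, iterations: Int): Array[Double] =
    (1 to iterations).foldLeft(Array.fill(xs.head.length)(0.0)) { (theta, _) =>
      step(theta, xs, ys, alpha, lambda)
    }
}
```

Predicting a class is then `hypothesis(theta, x) >= 0.5`. A fixed iteration count keeps the sketch short; the "advanced optimization" methods from the course would replace `step` with a smarter optimizer.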
Binary Search Algorithm in Scala
One day, I wanted to use binary search in one of the features of my project. My friend said the algorithm was not easy to implement bug-free. I didn't believe that, and spent 10 minutes writing it.
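The implementation itself isn't included in this excerpt, so here is one possible sketch, not necessarily the version from the post. It searches a sorted `IndexedSeq[Int]` over a half-open range `[lo, hi)` and computes the midpoint as `lo + (hi - lo) / 2`, which avoids the integer overflow that `(lo + hi) / 2` can hit on very large arrays, one of the classic sources of bugs.

```scala
object BinarySearchSketch {
  // Returns Some(index) of target in a sorted sequence, or None if it is absent.
  def binarySearch(xs: IndexedSeq[Int], target: Int): Option[Int] = {
    // Invariant: if target is present, its index lies in the half-open range [lo, hi).
    @annotation.tailrec
    def loop(lo: Int, hi: Int): Option[Int] =
      if (lo >= hi) None
      else {
        val mid = lo + (hi - lo) / 2 // overflow-safe midpoint
        if (xs(mid) == target) Some(mid)
        else if (xs(mid) < target) loop(mid + 1, hi)
        else loop(lo, mid)
      }
    loop(0, xs.length)
  }
}
```

For example, `binarySearch(Vector(1, 3, 5, 7, 9), 7)` returns `Some(3)`, and searching for `4` returns `None`.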
My Booklist and Resolutions for 2017
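I never got around to writing a review of 2016, which touched on many areas. 2016 was an eventful year, and it also gave me new thoughts on technical growth: working in tech is no longer about obsessively grinding away at a particular tool, algorithm, or bug; in essence, it is about solving problems or building better products. Although I don't build end-user products but rather low-level tools and platforms, these tools still depend on a "pillar application" to reach users, so thinking more about that is worthwhile.
Work kept me busy in 2016 and I didn't read many books. Zuckerberg, who is busier than any of us, took on a challenge of 23 books in a year, which is truly impressive. Honestly, my own time management isn't great: most weekends I lazed around sleeping or went out shopping. Looking back at 2016, here are the books I read: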
Scala Collections
Scala has many fancy collections with great utilities. Here are some key notes on Scala collections that helped me a lot.
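The notes themselves aren't shown in this excerpt, so as a hedged illustration here is a small sample of commonly used immutable-collection operations from the Scala standard library (my own selection, not necessarily the ones the post covers). The values `nums` and `words` are made-up example data.

```scala
object CollectionsSketch {
  val nums = List(1, 2, 3, 4, 5, 6)

  val doubled = nums.map(_ * 2)                       // List(2, 4, 6, 8, 10, 12)
  val (evens, odds) = nums.partition(_ % 2 == 0)      // (List(2, 4, 6), List(1, 3, 5))
  val total = nums.foldLeft(0)(_ + _)                 // 21
  // Groups by key; each group keeps the original element order.
  val byParity = nums.groupBy(n => if (n % 2 == 0) "even" else "odd")

  val words = Vector("spark", "streaming", "rdd")
  val lengths = words.map(w => w -> w.length).toMap   // word -> length
  val firstLong = words.find(_.length > 4)            // Some(spark)
  val pairs = words.zipWithIndex                      // Vector((spark,0), (streaming,1), (rdd,2))
}
```

All of these return new collections and leave `nums` and `words` unchanged, which is what makes chaining them safe.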
春江花月夜 (A Moonlit Night on the Spring River)
Some say this poem by Zhang Ruoxu is well worth memorizing.
Eight Queens Problem in Scala
I have been dedicated to reading Programming in Scala for about 4 months. My work is busy, but I can't give up reading more books.
Scala is a fabulous language, both object-oriented and functional.
The eight queens problem can be expressed in Scala easily and concisely.
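As a sketch of that conciseness, here is a recursive for-expression solution in the style popularized by Programming in Scala (this may differ from the code in the full post). Queens are placed one per row; `placeQueens(k)` returns every safe placement of the first `k` rows, and each new queen is kept only if it does not share a column or diagonal with the ones already placed.

```scala
object EightQueens {
  type Queen = (Int, Int) // (row, column), both 1-based

  // Two queens attack each other if they share a row, a column, or a diagonal.
  def inCheck(q1: Queen, q2: Queen): Boolean =
    q1._1 == q2._1 || q1._2 == q2._2 || (q1._1 - q2._1).abs == (q1._2 - q2._2).abs

  def isSafe(queen: Queen, placed: List[Queen]): Boolean =
    placed.forall(q => !inCheck(queen, q))

  // All ways to place n non-attacking queens on an n x n board.
  def queens(n: Int): List[List[Queen]] = {
    def placeQueens(k: Int): List[List[Queen]] =
      if (k == 0) List(List.empty[Queen])
      else for {
        placed <- placeQueens(k - 1)
        column <- (1 to n).toList
        queen = (k, column)
        if isSafe(queen, placed)
      } yield queen :: placed
    placeQueens(n)
  }
}
```

`EightQueens.queens(8)` yields the 92 solutions of the classic puzzle; each solution lists its queens with the last-placed row first.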
Machine Learning Neural Networks
This week is about the mysterious neural networks. This week's lectures just explain the basics of neural networks.
What Are Neural Networks
A neural network is a technique for learning from our data, inspired by how the human brain works. A simple neural network has:
- input layer
- hidden layer
- output layer
We use neural networks for classification and regression.
We use the sigmoid function to map data from the input layer to the hidden layer and then to the output layer; this function is called the activation function.
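To make the layer-to-layer mapping concrete, here is a minimal forward-propagation sketch with one hidden layer and sigmoid activations. It assumes each weight row stores `[bias, w1, w2, ...]` for one unit, and a constant 1.0 bias input is prepended at each layer. The weights `hiddenW` and `outputW` are hand-picked so the network computes XNOR of two binary inputs, a common textbook illustration; they are not learned and are not taken from the post.

```scala
object ForwardPropSketch {
  // Sigmoid activation g(z) = 1 / (1 + e^(-z)).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // One layer: prepend the bias input 1.0, take each unit's weighted sum, apply sigmoid.
  def layer(weights: Vector[Vector[Double]], input: Vector[Double]): Vector[Double] = {
    val withBias = 1.0 +: input
    weights.map(row => sigmoid(row.zip(withBias).map { case (w, a) => w * a }.sum))
  }

  // Input layer -> hidden layer -> output layer.
  def forward(hidden: Vector[Vector[Double]], output: Vector[Vector[Double]],
              x: Vector[Double]): Vector[Double] =
    layer(output, layer(hidden, x))

  // Hidden unit 1 ~ x1 AND x2; hidden unit 2 ~ (NOT x1) AND (NOT x2).
  val hiddenW = Vector(Vector(-30.0, 20.0, 20.0), Vector(10.0, -20.0, -20.0))
  // Output unit ~ h1 OR h2, which gives XNOR(x1, x2).
  val outputW = Vector(Vector(-10.0, 20.0, 20.0))
}
```

Rounding `forward(hiddenW, outputW, Vector(x1, x2)).head` gives 1 when both inputs are equal and 0 otherwise, showing how a hidden layer lets the network represent a function a single sigmoid unit cannot.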