Archives

This is the complete archive of posts from Informatica in reverse chronological order.

25 Dec 2016

Deep Sentiment Prediction as Web Service

I have long wanted to build a web service for sentiment analysis, that is, a service that tells how emotionally positive or negative a given piece of text is. Despite the potentially huge range of interesting applications and use cases, we will focus on sentiment analysis for tweets. This article describes what happened and how.

01 Feb 2016

Spark on time series preference data

To be more general here in the introduction, the situation is that we have a user-item preference matrix which evolves over time. Essentially, we have a collection of user-item preference matrices, one for each time point. The preference matrix can be, for example, users' preferences for a collection of books, the popularity of movies among people, or the effectiveness of a set of keywords on a collection of campaigns. The prediction task is to forecast the user-item preference matrix of the next time point.
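
As a rough illustration of this setup (not the method used in the post), the history can be held as a list of user-item matrices and the next one forecast with a naive baseline. The sketch below assumes small dense matrices stored as nested lists; the function name `forecast_next` and the `decay` parameter are hypothetical, and the exponentially weighted average is only a stand-in for a real model.

```python
from typing import List

Matrix = List[List[float]]


def forecast_next(history: List[Matrix], decay: float = 0.5) -> Matrix:
    """Naive baseline: exponentially weighted average of past matrices.

    history[t][u][i] is the preference of user u for item i at time t.
    The most recent matrix gets weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on.
    """
    if not history:
        raise ValueError("history must contain at least one matrix")
    n_users, n_items = len(history[0]), len(history[0][0])
    last = len(history) - 1
    weights = [decay ** (last - t) for t in range(len(history))]
    total = sum(weights)
    return [
        [
            sum(w * m[u][i] for w, m in zip(weights, history)) / total
            for i in range(n_items)
        ]
        for u in range(n_users)
    ]


# Two time points, two users, two items (made-up numbers).
history = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 0.0], [0.0, 1.0]],
]
prediction = forecast_next(history, decay=0.5)
```

A baseline like this is mainly useful as a reference point: any learned forecasting model should at least beat it.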

31 Jan 2016

GPU computation on Amazon EC2

Running a deep learning algorithm properly is not a big deal. We discuss the setup that allows us to run a deep learning algorithm, in particular neural style, on Amazon GPU instances.

21 Jan 2016

Cool stuff at the NIPS 2015 conference - Neural Style

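NIPS is a top academic conference in theoretical machine learning and artificial intelligence. Its paper acceptance rate is slightly lower than that of other top machine learning conferences (ICML, AISTATS, ICCV, CVPR). NIPS favors strongly theoretical work, pioneering and forward-looking work, and milestone work. My supervisor has invited me to review NIPS submissions over the years, and the question I asked myself most often while reviewing was whether the paper at hand is original or built on previous work; papers that lean towards the latter mostly do not survive. Those who have just stepped into machine learning and AI may wonder why NIPS papers are cited less than papers from other machine learning conferences. The reason is simple: most people cannot understand NIPS papers, and how do you cite what you cannot read? Open one and it is page after page of mathematical formulas; I give the paper my whole heart and read it through, yet I still do not understand it. Still, mathematics has always been known for being concise, precise, and general, and I think that is part of the charm of NIPS. As a practitioner (read: blue-collar worker) in machine learning and AI, I too flew to dark and cold Montreal last December to have my intelligence humiliated over and over and over again. In the next few posts, I will summarize (read: torture myself with) the bits and pieces of NIPS 2015.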

05 Jan 2016

Cool stuff in NIPS 2015 (symposium) - Neural Style

The deep learning algorithm Neural Style is also known as neural art. Similar algorithmic techniques have been seen in the so-called deep dream. It is a recent work in the field of deep learning, and of course it's super cool. The algorithm has been around for a few months already and I noticed it a while ago. Let's take a close look at the technology behind the scenes.

01 Jan 2016

A super fabulous beginning of a super great year 2016

Wish you all a super great New Year of 2016. Here are some pictures taken on the very first day of the year 2016 and the very last day of the year 2015 to celebrate a super great New Year of 2016.

31 Dec 2015

Data science in the next 50 years - are machine learning and statistics complementary?

The following is a brief summary of the note ‘Are ML and Statistics Complementary?’ by Max Welling.

26 Dec 2015

Cool stuff in NIPS 2015 (workshop) - Non-convex optimization in machine learning

“When I am a grown-up I want to do non-convex optimization.” This blog post is about the NIPS 2015 workshop on non-convex optimization in machine learning and some interesting papers on non-convex optimization that appeared at the NIPS conference.

21 Dec 2015

A rich and dynamic December

This December appeared to be very busy but super rich and dynamic. I traveled to many places, met and chatted with many brilliant and great people, sat down and talked with a few old friends, got familiar with some cool and smart guys, and experienced the advances in modern information technology - machine learning and artificial intelligence mostly. Finally, I came back to Helsinki but realized that there is still no snow :snowflake:

20 Dec 2015

My research on machine learning and AI

I have been doing machine learning and AI for a while, about 5-6 years? My research is along the direction of Machine Learning for Structured Data. Currently I focus on developing accurate learning models for structured output prediction. In general, I am interested in

16 Dec 2015

NIPS conference 2015

Some experiences from the NIPS 2015 conference.

15 Dec 2015

Me

Here are a few slides about me as a living being :laughing:

09 Nov 2015

Xplanner in Junction Hackathon 2015

As a team of four with Xiao, Li, and Shen, we had a great time at Junction Hackathon 2015, Helsinki.

29 Oct 2015

Teaser solution

The task is to provide an easy-to-read map for recruiters to find the top analyst. This was one of the interview quizzes from Zalando.

20 Oct 2015

Pabulo, my lovely cat

As you might have noticed already, this is my technical blog hosted on GitHub. The blog documents my work and life. With my cat Pabulo, my life is not very technical.

19 Oct 2015

Spark regression models

Spark regression models

19 Oct 2015

Chinese national day celebration in China embassy Helsinki

Oh, I was invited to join the Chinese National Day celebration at the Chinese embassy in Helsinki. By the way, the invitation letter looks very impressive. Like it :thumbsup: I assume the invitation letter is a byproduct of the outstanding PhD candidate award :relaxed:

18 Oct 2015

Spark classification models

Spark classification models.

13 Oct 2015

Spark with Python: collaborative filtering

Spark with Python: collaborative filtering

12 Oct 2015

Feature extraction, selection and predictive modeling with Scikit

Feature extraction, selection and predictive modeling with Scikit

10 Oct 2015

Novelty detection and outlier detection with Scikit

Novelty detection and outlier detection with Scikit.

28 Aug 2015

One class classification with Scikit

One class classification with Scikit.

25 Aug 2015

Predicting transporter proteins

In bioinformatics, transporter proteins are a family of proteins that are specialized in transporting various metabolites through cell walls. Understanding transporter proteins is essential in, e.g., analyzing the interaction between cells and the environment and modeling the dynamics of biochemical reactions. In this recent project, we aim to reliably predict the transporter classification (TC) of an arbitrary protein with machine learning approaches. The transporter classification is a hierarchical classification scheme where each element in the hierarchy is a category of function. The preliminary results demonstrate that the proposed machine learning framework is very accurate in transporter protein prediction, achieving about 99% accuracy and 98% AUC over a collection of 12000 proteins.

24 Aug 2015

Searching Algorithm

Searching algorithm.

20 Aug 2015

BFS and DFS

BFS and DFS.

18 Aug 2015

SQL related

SQL stuff.

16 Aug 2015

Compute TF-IDF with Hadoop Python

Compute TF-IDF with Hadoop Python

15 Aug 2015

Mapreduce with Hadoop via Python with Examples

Mapreduce with Hadoop via Python with Examples

13 Aug 2015

Scikit: A machine learning package for Python

Scikit: A machine learning package for Python

12 Aug 2015

Outstanding doctoral candidate award of 2014

On 5th August 2015 (last Wednesday), I received from our ambassador the Chinese government award for outstanding doctoral candidate of 2014. This award finally marks the end of my life as a student :relaxed:. I am now trying to briefly write down my 20-odd years of memories and experiences here, just in case I forget some of them someday.

12 Aug 2015

Get Emoji support for Jekyll pages

Get Emoji support for Jekyll pages

03 Aug 2015

Heap

Heap

30 Jul 2015

Stack and Queue

Stack and queue.

29 Jul 2015

Recursion

29 Jul 2015

Dynamic programming related problems

Here are some interesting algorithmic problems related to dynamic programming

27 Jul 2015

Setup Hadoop on Macos

Setup Hadoop on Macos

26 Jul 2015

Spark via Python: basic setup, count lines, and word counts

This post is about how to set up Spark for Python. In particular, it shows the steps to set up Spark on an interactive cluster located at the University of Helsinki, Finland. In addition, there are two super simple but classical problems, counting the lines in a file and counting words, together with the solution code.
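
For orientation, the two problems can be sketched in plain Python without Spark. This is a minimal sketch and not the post's solution code: the function name `count_lines_and_words` and the sample input are hypothetical, and words are assumed to be lowercased, whitespace-separated tokens.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_lines_and_words(lines: Iterable[str]) -> Tuple[int, Counter]:
    """Count the lines and the occurrences of each word in one pass."""
    n_lines = 0
    counts: Counter = Counter()
    for line in lines:
        n_lines += 1
        # Split on whitespace; lowercasing merges "To" and "to".
        counts.update(line.lower().split())
    return n_lines, counts


# Any iterable of strings works, including an open text file.
sample = ["to be or not to be", "that is the question"]
n_lines, word_counts = count_lines_and_words(sample)
```

A distributed version would typically split each line into words, emit one count per word, and sum the counts per key, which is the same logic spread over many machines.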

22 Jul 2015

Palindrome problems

Palindrome problems

19 Jul 2015

SQL refreshment

SQL

12 Jul 2015

Bit integer for operating large numbers

Bit integer for operating large numbers

17 Jun 2015

Feature extraction for protein sequences via InterProScan

This post illustrates how to install and run InterProScan on a local machine. InterProScan is frequently used in bioinformatics data analysis to extract various protein features based on sequence alignment and database search. As scanning a protein sequence is time consuming even on a local machine, we also install the lookup database, from which the features of known proteins can be extracted directly.

16 Jun 2015

Sequence alignment with NCBI-BLAST search

This post will illustrate how to install the NCBI BLAST package on your local computer, how to install private sequence databases for BLAST search, how to run BLAST search with sequence databases, and how to write a parallel Python script for running BLAST search on a computer cluster.

09 Jun 2015

Facebook challenge of detecting robots

This blog post contains the strategies, models and algorithms I used in the Facebook challenge of detecting bots in an online bidding environment. The model I developed has an AUC score of 93.269%, whereas the best AUC score reported on the leaderboard is about 94.254%, ranking 56 out of 1004. Everything documented here was done within about one week's time, and mostly at night. The model can surely be improved given more time. In particular,

22 May 2015

A projected Newton method for optimizing structured output model

The slides illustrate a projected Newton method to optimize a multilabel structured output learning model with random spanning tree approximation.

21 May 2015

Spark with Python: optimization algorithms

Spark with Python: optimization algorithms

17 May 2015

Spark with Python: linear models in MLlib

Spark with Python: linear models in MLlib

15 May 2015

Some useful Coding techniques

Some useful Coding techniques

12 May 2015

Spark with Python: configuration and a simple Python script

Spark with Python: configuration and a simple Python script

11 May 2015

The quickest way to blog, GitHub + Jekyll

The quickest way to blog, GitHub + Jekyll

29 Dec 2011

Jekyll Introduction

This Jekyll introduction will outline specifically what Jekyll is and why you would want to use it. Directly following the intro we’ll learn exactly how Jekyll does what it does.