Thursday, May 21, 2015

A study plan

Now that I have a better idea of what I need to learn, it's time to make a study plan. The point is to put things in an order that makes sense, knock off the highest priorities as early as I can, and set quantitative goals to meet.

I found a really lovely graphic (left) with an explanation of what a data scientist needs to know, and Nature Jobs has a fairly good article from 2013 on the subject. There are also good discussions on Quora. Taking these sources together, here are the major tasks I need to accomplish, in no particular order.


*) Get good at math, statistics, and machine learning
By "math," they mean algebra, calculus, and linear algebra. I may need to brush up on the last of these. A friend recommended this book, which is free and has some overlap with machine learning.

*) Learn to code
Specifically, pick a first language. Python, I choose you! Learn it at Codecademy and Google Classroom. I've already gone through the Codecademy course.

*) Learn about databases
This is where they keep the data. I should at least learn SQL.

*) Learn about data munging, visualization, and reporting
Munging means putting the data into a digestible form, which I assume involves dimension reduction. I understand the principles, but need to look into it more. This category seems like it should come out organically when I start working on side projects for fun.

*) Start using Big Data
This happens after smaller projects succeed.

*) Get experience
Kaggle competitions, side projects, and the like. Has to happen once I'm comfy with the basics.

*) Internship, bootcamp, job
I'll apply for an internship, but I won't have a shot unless I make huge progress before then.

*) Engage with the community
I already read FiveThirtyEight. I signed up for a couple of societies and followed some people on Twitter. That's a start. There's not enough time in the day to consume all the content created by popular data scientists.


Here's a plan that makes sense to me:

  1. Take the Coursera machine learning course. It's free, I have the required math chops, and I've already started it. There are about 43 hours' worth of video in this course, plus assignments. They estimate 5-7 hours per week for an unknown number of weeks, so... that's not useful. Let's say three months, so I expect to be done by August.
  2. Simultaneously with (1), start programming. Python will be my default language, but I'll use something else if I have a good reason. I already have a project lined up, but it's in C++. I'll tell you about it soon.
  3. After the Machine Learning course, take more statistics. Intro to Statistics by Udacity, and/or OpenIntro Statistics.
  4. Simultaneously with (3), start entering Kaggle competitions and engaging the community there. Scale up to big data when possible.


During all of this, I'll keep up with the blog and try to post mini-projects. I think this is a good start. I have a quantitative deadline of finishing the ML course by August, and I have a project that must be done by the end of the summer that involves programming. Hopefully I have not succumbed to the planning fallacy.
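
As a rough sanity check on the three-month guess in step 1, here's a quick sketch in Python. The 43 hours and the 5-7 hours per week come from the course info above; the factor for assignments is purely my own assumption (I have no idea yet how long they really take), so treat the output as a ballpark, not a schedule.

```python
import math


def weeks_to_finish(total_hours, hours_per_week):
    """Return the number of whole weeks needed to cover total_hours."""
    return math.ceil(total_hours / hours_per_week)


VIDEO_HOURS = 43         # video time quoted for the course
ASSIGNMENT_FACTOR = 1.5  # ASSUMPTION: assignments add 50% on top of video time

for pace in (5, 7):  # the course's suggested hours per week
    video_only = weeks_to_finish(VIDEO_HOURS, pace)
    with_work = weeks_to_finish(VIDEO_HOURS * ASSIGNMENT_FACTOR, pace)
    print(f"{pace} h/week: {video_only} weeks (video only), "
          f"{with_work} weeks (with assignments)")
```

Under those assumptions it comes out to somewhere between 7 and 13 weeks, so three months looks like a reasonable upper bound as long as I actually keep up the pace.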

- b

2 comments:

  1. This is a really nice outline. part of me wishes i didn't just get drunk and trust in my luck when i had to learn this stuff...

  2. Well, it all turned out ok. We're results-oriented people, right?
