Wednesday, May 13, 2015

Hello World!

I'm a physicist, and I want to be a data scientist. This blog documents my transition. I will now pretend that you are asking me questions.


Q: What is a data scientist?

A: Data scientists look at (typically large) collections of various kinds of data, subject them to statistical analysis, and draw descriptive, predictive, and prescriptive conclusions. Then they present those conclusions to other people. The job relies heavily on machine learning, statistics, probability, decision theory, and programming ability.


Q: Why do you want to be a data scientist?

A: I've been a data nerd for a long time, and have had a... let's say "strong interest" in probability theory since about mid-grad-school. My partner says that every conversation with me ends in Bayes' Rule if it goes on long enough. I see the world mostly in mathematical models, and like explaining technical things to people, whether or not they're usually into technical things.


Q: Why do you want to leave physics?

A: I don't, in particular. But something that surprises people about me is that I've never cared much about physics. I do care about learning new things, working with smart people, and solving problems. Physics has been a wonderful place to do those things, but it's not the only place.

I'm also getting pretty discouraged about the driving motivations of academic research. Most of my research energy is directed at (1) what can get funded, and (2) what can be published, leading to more funding. It's how we can afford to eat. There was an episode of The West Wing (which I can't find right now) where someone comments that they spend all their time getting elected, hoping that they will accidentally do some good in the process. This is how I feel a lot of the time. I want to produce something that people need produced.


Q: Why are you blogging about this?

A: Two reasons. First, I work better with structure. This will hopefully force me to make progress, even if it's slow. Second, I'm going to need a portfolio to get a job. That includes projects on GitHub, competition entries at Kaggle, and it wouldn't hurt to have evidence of my progress along the way.


Q: What do you have to learn before you can be a data scientist?

A: I only sort of know the answer to this question. I definitely need either Python or C++ fluency, experience with data science problems, a good understanding of high-level statistics, and a network of contacts. I may need to be familiar with SQL, R, or other specialized software.


Q: What do you have going for you?

A: I'm more than competent at MATLAB. Compared to the average physicist, I'm a good speaker, and a damn good technical writer. For whatever perverse reason, I love uncertainty and noise analysis more than almost everything.


Q: What's the plan?

A: The nice thing about starting a project is that there's a lot of low-hanging fruit. Today I made a GitHub account and did the tutorial. I'm going to take a Coursera course on machine learning, and brush up on my Python. If I know you, and you know about data, I'm going to talk to you.

That's it for now. See you next time.

- b
