Technical skills I've used
When I was at Insight I was given a two-page list of skills to brush up on, from abstract data structures to algorithms to interview tips. I'm sure that each of these is useful across the wide range of job descriptions that go with the title of data scientist, but here are the ones that are useful to me:
- SQL: I spend a lot of time writing database queries, and my SQL coding has improved drastically. I've learned that the capabilities of the language go far beyond what's covered in online tutorials, and that there are many things that can go wrong. There are also multiple ways to accomplish the same task, and they may vary greatly in efficiency. I think the only way to learn this is through experience.
- R and Python: I split my time roughly 40% R to 60% Python. I've found that R is convenient for quick manipulation of data frames. Compared to R's dplyr package, Pandas in Python is long-winded and unintuitive. But Python is better for longer scripting projects for a few reasons, not the least of which is that package version control is easier. The point is, learn them both.
- Microsoft Office: First of all, Excel is the bomb. I hadn't really used it since high school, but for very standard analysis like filtering, histograms, and pivot charts on small-ish data sets, it can't be beat. It blows IPython notebook out of the water for speed, and the chart styles have come a long way since 1996. PowerPoint is still the gold standard of deck-building, like it or not. And I work at a consulting company, so I build decks. PPT gets the job done.
- Machine learning: Here's a helpful hint about machine learning. Gradient boosted decision trees get you 90% of the way there 90% of the time.
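As a small illustration of the SQL point above (several ways to get the same answer, with potentially different costs), here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its rows, and the helper names are made up for the example and are not from my actual work. Both queries return each customer's largest order: one uses a correlated subquery, the other aggregates once and joins back. On a toy table the difference is invisible, and how each form performs on real data depends on the database engine, its optimizer, and the indexes available.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ann", 10.0), ("ann", 25.0), ("bob", 5.0), ("bob", 7.5), ("cy", 40.0)],
    )
    return conn


# Version 1: correlated subquery. Conceptually the inner query runs once
# per outer row, though the actual plan is up to the engine.
CORRELATED = """
SELECT o.customer, o.amount
FROM orders AS o
WHERE o.amount = (
    SELECT MAX(i.amount) FROM orders AS i WHERE i.customer = o.customer
)
ORDER BY o.customer
"""

# Version 2: compute each customer's maximum once, then join back.
GROUPED = """
SELECT o.customer, o.amount
FROM orders AS o
JOIN (
    SELECT customer, MAX(amount) AS top FROM orders GROUP BY customer
) AS m
  ON m.customer = o.customer AND m.top = o.amount
ORDER BY o.customer
"""


def largest_order_per_customer(conn, query):
    """Run one of the two queries and return its rows as a list of tuples."""
    return conn.execute(query).fetchall()


conn = build_demo_db()
rows_a = largest_order_per_customer(conn, CORRELATED)
rows_b = largest_order_per_customer(conn, GROUPED)
print(rows_a)            # [('ann', 25.0), ('bob', 7.5), ('cy', 40.0)]
print(rows_a == rows_b)  # True
```

To see how a given engine actually executes each version, inspect its query plan (in SQLite, prefix the statement with `EXPLAIN QUERY PLAN`) rather than guessing from the query text.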
What my supervisors expect and appreciate
- Technical ability is essential, but taken as a given. 90% of my job is technical in nature, but very little of the interview process or later evaluations directly tested those abilities. It's also taken as given that I will be rigorous and intellectually honest. It's in my interest and the company's to test my results at every step along the way, and to ask other people to look over things when I need a pair of eyes outside the problem.
- When I've received explicitly positive feedback, it has without exception been due to my ability to translate my results to our clients.
- Catching mistakes, before or after they happen, is crucial. I think this skill follows nicely from the skepticism that is learned from academic research.
- My workplace is a community, and my contribution to building that community is appreciated. I trust my colleagues to be highly competent and helpful, and they trust the same of me. My former PI used to say that in order to be successful in research you need (1) devotion to work, (2) creativity, and (3) the ability to work with others. You can do it with only two of these, but it's much harder. I've found that this is not true for business. You must have all three.
Where I'm going next
Day by day, I'm choosing a trajectory in data science. By expressing interest, volunteering to take on responsibilities, and performing well on certain tasks, I make it more likely that I'll be assigned similar tasks in the future. To balance that, my supervisors have an incentive to make me a well-rounded employee so that I can be put to work on a wider range of problems. But where should I aim to go?
Should I try to become the modeling expert in the company? Should I learn more about data engineering to be a more well-rounded technical resource? Should I aim for project management and client interaction? There is little feedback from my supervisors on this, mostly because they want me to do what I like, and to accomplish my own goals. I'll be useful to them regardless.
My inclination has always been to increase breadth of expertise, sometimes at the expense of depth. I find that by having more context I can work efficiently and be creative. It helps me aim for a few big wins instead of many small wins. I can also take on more diverse projects that way, which is part of the reason I transitioned to data science. Right now this means trying to get more client interaction, and absorbing as much domain knowledge as I can. That's a frustratingly slow process, but I'm not in a rush. I'm having fun.