Thursday, July 23, 2015

10 tips for 10-minute presentations

One of the things I love most in life is helping people improve their academic talks. My advice is pretty consistent, and I thought it would be helpful to gather it here. Specifically, I've been thinking about ten-minute presentations, which are often misunderstood. Below is some advice, and a video of an example talk.

The problem with short talks

There’s a lot of room for error in an hour-long talk. You have time to build a rapport with the audience, reacting to confused or sleeping faces by changing your pace or giving some extra background information. According to the peak-end principle, the audience will mostly remember the best part of the presentation and the end of it, so if you nail those you’ll be fine. People may also be there specifically to see you, so they start out primed to listen carefully and make an effort to understand. 

Not so for a ten-minute presentation: you get one chance to grab and keep the attention of an audience who might rather be playing with their phones.

In other words, you have to be better than Candy Crush.

If you only have ten minutes to speak, it’s probably because the organizers had to squeeze too many talks into too short a time period. People will be tired of listening, will be thinking about their own talk, and will probably not have expertise in your field. So all of the decisions we make will revolve around three principles:
  • Give the audience an incentive to pay attention by being interesting or entertaining.
  • Don't make the audience work hard to follow your talk, or require prior specialized knowledge.
  • Make the most of your ten minutes by trimming anything unnecessary.
These principles apply in longer talks as well, but they're absolutely critical for short ones.

Here are some tips for putting these principles into practice:

  1. A 10-minute talk is not a shorter or faster version of a long talk. You can tell exactly one short story in ten minutes. Decide on the one thing you want your audience to take home with them, and write it down in a sentence. Don’t be afraid to repeat this sentence more than once in your talk. Your job is to provide just barely enough context to understand that story, and then tell it well. And no matter how slowly you think you're talking, you're talking faster than that. Slow down.
  2. The audience can either listen or read, but not both at the same time. Mostly you want them listening, so eliminate as much text as you can. If you’re afraid you’ll forget to say something, put it in the presenter notes. It can help to think of the slides as being there for the presenter’s benefit: they jog your memory so you can remember what comes next in the story.
  3. You will have the audience’s full attention when the talk starts. You will lose it immediately if you have a bad cover slide. Put the effort into making it tasteful but eye-catching, and spend time talking with that slide shown. That will buy you an extra minute of attention while you set up your story. Also, shorter titles are better. They're easy to understand, and the audience can listen to you instead of reading them.
  4. Don't use an outline slide. Outlines are useful when you’re going to talk about several topics and you don’t want your audience to get lost. In a short talk, you won't need to organize a complicated story, so outlines just eat the time you could be using to tell a simple one.
  5. Include animation only when it helps your audience follow the message. Make text come up a line at a time when you don’t want people reading ahead. That goes for figures too. You don't want the audience trying to digest your slide instead of listening to you.
  6. If you ask your audience to read text, make it easy on them. Using no font smaller than size 30 will not only guarantee readability, but will force you to limit the total amount of text on each slide (this includes chart labels!). If you’re still using Comic Sans… don't.

    6.1: Related note on equations: include them only if they make the talk easier to follow. Equations are compact, efficient sentences that can be read in English. Being compact, they're difficult to unpack in real time. Don’t make the audience work that hard. Help them understand the symbols, and make it crystal clear why they lead to better understanding of the material.
  7. Spend the time and effort to make beautiful figures, and emphasize them instead of text. Help your audience understand them by pointing, and literally telling them "look over here." If you have graphs, always identify the axes out loud and teach people how to read them. Point out the important features. As usual, we want people listening instead of parsing data.
  8. End by saying “That concludes my talk. Thank you for your attention.” Don’t read your acknowledgements out loud (it eats time and gives the audience a chance to forget what they were going to ask), and don’t ask for questions (only the moderator knows how much time there is for questions).
  9. Make extensive use of supplementary slides. Paste entire presentations into the supplemental section, and load it up with equations and text. It doesn’t have to be pretty. This is your security blanket. If someone asks a question you can’t answer easily, you want to be able to find the answer in your supplementary slides. You’ll look like a genius for having thought ahead.
  10. Practice, but not to memorize your lines. Practice so you can find out what works and what doesn't, what's clear and what's not, and what you can safely trim away from the talk. Find the clunky transitions or the graphs that take too long to explain. There's no reason for a ten-minute talk not to sound smooth, relaxed, and well-oiled. If you can't get it to that point, you're trying to say too much.

Example talk

I made this talk as an example of how to implement some of the above suggestions, using a recent Python/Twitter project as subject material. It's not perfect (there was no easy way to record a pointer, for example), but hopefully you'll find it useful anyway.




Sunday, July 19, 2015

Data mining Twitter [with code]

I recently applied to the Insight Data Science Fellowship, and was invited to do a short Skype interview. The interview includes a short demo, which is supposed to show them what kinds of data I work with, and highlight some of the skills I bring to the data science table. Since I mostly work with MATLAB, I wanted to do a mini-project emphasizing some more relevant skills.

I got the email on Thursday with the interview scheduled the following Monday morning, so I needed something I could do over the weekend. Data mining Twitter seemed like a good option, since I could do it in Python (highly relevant for data science), it’s “real” data (as opposed to experimental data, I guess), and it lends itself to varied analysis including statistics and language processing. I just had to pick something to track.

Tracking #NorthFire

Friday night, there was a wildfire near Los Angeles, in the Cajon Pass. The fire jumped the highway, and about 20 cars burned. The hashtag #NorthFire started trending, and I tracked it for about 14 hours, creating a database of about 5500 tweets, taking up about 32 megabytes of space.

First I’ll show the results, and then I’ll go into detail about the analysis. I’m also going to include the code I used, in case it's helpful to anyone. I made this graphic summarizing the analysis:

Summary of analysis for #NorthFire tracking over 14 hours. Thanks to my talented friend and freelance graphic artist Bethany Beams for helping me with this. She looked at my first draft and gave me some tips that improved readability substantially.

What did I find?

There’s not too much that’s surprising here. People were using the word “fire” a lot to describe the fire. Popular tweets include comparisons to Armageddon and references to exploding trucks. Standard, IMO. Tweets became less frequent as the fire raged on and people went to bed, and then picked up again when people started waking up and reading the news. One interesting thing is the popularity of the word “drone.” It turns out that some hobbyists had flown some drones in to get a closer look at the fire, which prevented the helicopters from dropping water. That’s why it’s important not to have a hobby.

Details on collection and analysis

I followed this wonderful tutorial to collect the tweets and perform some of the analysis. Collecting tweets basically involves:
  1. Registering an app with Twitter, which gives you access to their API
  2. Using Python to log on with your authentication details
  3. Using a package called “Tweepy” to open a stream and filter for a particular hashtag
  4. Saving tweets to file in the right format

Anatomy of a Tweet

A tweet is an ugly object. If you want to know how the sausage is made, look here:



It’s a database entry with the text of the tweet, the time it was created, a list of everyone involved, and about 30 other things I didn’t care about. The saved file is in JSON format, which is convenient for data science. Funny story, Twitter automatically supplies the tweets in this format, but Tweepy reformats them, so you have to manually change them back. Thanks, Tweepy.
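To make the format concrete, here is a minimal sketch of reading such a file back in, assuming one JSON object per line. The sample records are hypothetical and heavily trimmed (a real tweet carries many more fields), and `load_tweets` is an illustrative helper name, not something from the repository.

```python
import json

def load_tweets(lines):
    """Parse an iterable of JSON strings (one tweet per line), skipping blank lines."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweets.append(json.loads(line))
    return tweets

# Hypothetical, heavily trimmed records standing in for lines of the saved file.
sample_lines = [
    '{"created_at": "Sat Jul 18 01:02:03 +0000 2015", '
    '"text": "Fire jumped the highway #NorthFire", '
    '"user": {"screen_name": "example_user"}}',
    '',
    '{"created_at": "Sat Jul 18 01:05:00 +0000 2015", '
    '"text": "RT @example_user: Fire jumped the highway #NorthFire", '
    '"user": {"screen_name": "another_user"}}',
]

tweets = load_tweets(sample_lines)
for tweet in tweets:
    # Each tweet is now a plain dict, so fields are looked up by key.
    print(tweet["created_at"], "|", tweet["text"])
```

With a real file, the same function can be fed an open file handle instead of the sample list, since a file object iterates line by line.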

Counting Tweets and Retweets

The next step was to count the number of originals and retweets, and save a new data file containing only the originals. This was important for the language analysis I wanted to do: having 600 retweets would seriously throw off the statistics. To find the retweets, I just looked in the text of the tweet, where retweets always begin with “RT”.
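The check described above can be sketched in a few lines. This is an illustration rather than the actual discard_RT.py; it assumes each tweet has already been parsed into a dict with a "text" key, and `split_retweets` is a made-up name.

```python
def split_retweets(tweets, prefix="RT"):
    """Separate tweets into originals and retweets based on how the text starts."""
    originals, retweets = [], []
    for tweet in tweets:
        if tweet.get("text", "").startswith(prefix):
            retweets.append(tweet)
        else:
            originals.append(tweet)
    return originals, retweets

# Hypothetical sample data.
sample = [
    {"text": "Cars burning on the freeway #NorthFire"},
    {"text": "RT @example_user: Cars burning on the freeway #NorthFire"},
    {"text": "RT @example_user: Cars burning on the freeway #NorthFire"},
]

originals, retweets = split_retweets(sample)
print(len(originals), "originals,", len(retweets), "retweets")
```

Passing `prefix="RT @"` is a slightly stricter variant that avoids catching an original tweet that merely happens to start with the letters "RT".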

Most Common Words

I then used the file with the original tweets to track the most common words and bigrams. First, the text of each tweet has to be tokenized, meaning the string of text is parsed into words and symbols. It’s also prudent to ignore punctuation and “stop words” like “the” and “a.” Python has a natural language toolkit that makes all of this pretty easy to do. Again, I used this tutorial.
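For readers who want to see the idea without installing anything, here is a rough standard-library sketch of tokenizing, dropping stop words, and counting words and bigrams. It swaps a simple regular expression and a tiny hand-written stop list in for the natural language toolkit, so treat it as an approximation of the approach, not the code in count_frequencies.py.

```python
import re
from collections import Counter

# A tiny illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "at", "over", "rt"}

def tokenize(text):
    """Lowercase the text and pull out words, keeping hashtags and @-mentions whole."""
    return re.findall(r"[#@]?\w+", text.lower())

def common_terms(texts, n=5):
    """Return the n most common words and the n most common bigrams."""
    words, bigrams = Counter(), Counter()
    for text in texts:
        tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
        words.update(tokens)
        # Pair each remaining token with the one that follows it.
        bigrams.update(zip(tokens, tokens[1:]))
    return words.most_common(n), bigrams.most_common(n)

# Hypothetical sample texts.
sample_texts = [
    "The fire jumped the highway #NorthFire",
    "Drone grounds helicopters at the fire #NorthFire",
    "A drone over the fire",
]

top_words, top_bigrams = common_terms(sample_texts, n=3)
print(top_words)
print(top_bigrams)
```

Two caveats: this regular expression breaks URLs into pieces, and because stop words are removed before pairing, a bigram can join two words that were not adjacent in the original tweet.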

Most Retweeted

Finding the most-retweeted tweets (getting tired of typing “tweets”) is similarly straightforward. I found some code here, but basically it just looks up the number of retweets for each tweet, puts them in order, and prints a list. You can set the minimum number of retweets and the number it displays.
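A sketch of that lookup-and-sort logic is below. It assumes the tweets are parsed dicts in which a retweet carries the original under "retweeted_status", with "retweet_count" and "id_str" fields; the function name and sample data are hypothetical, and this is not the code from retweet_stats.py.

```python
def most_retweeted(tweets, min_retweets=1, top_n=10):
    """Return (retweet_count, text) pairs for the most-retweeted originals."""
    best = {}
    for tweet in tweets:
        # For a retweet, the counts of interest live on the embedded original.
        original = tweet.get("retweeted_status", tweet)
        count = original.get("retweet_count", 0)
        key = original.get("id_str") or original.get("text")
        # Keep the highest count seen for each original, since it grows over time.
        if count >= min_retweets and count > best.get(key, (-1, ""))[0]:
            best[key] = (count, original.get("text", ""))
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_n]

# Hypothetical sample data.
sample = [
    {"id_str": "1", "text": "a", "retweet_count": 0},
    {"id_str": "2", "text": "RT @x: b",
     "retweeted_status": {"id_str": "9", "text": "b", "retweet_count": 5}},
    {"id_str": "3", "text": "RT @x: b",
     "retweeted_status": {"id_str": "9", "text": "b", "retweet_count": 7}},
    {"id_str": "4", "text": "RT @y: c",
     "retweeted_status": {"id_str": "8", "text": "c", "retweet_count": 3}},
]

for count, text in most_retweeted(sample, min_retweets=2):
    print(count, text)
```

The `min_retweets` and `top_n` parameters play the role of the two settings mentioned above: the minimum number of retweets and how many results to display.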

Tweet Frequency Chart

The final thing I wanted to do was track the tweet frequency as a function of time. Each tweet contains a timestamp that reports the year, month, day, hour, minute, and second. I converted that to seconds using very straightforward code adapted from this page, and then saved all of the timestamps to a text file. I used MATLAB’s “histcounts” function to make a histogram, and plotted the counts as a line using the area plot function. In Adobe Illustrator, I recolored the histogram using the gradient tool.
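The conversion and binning can be sketched in Python as follows. The format string assumes timestamps shaped like "Sat Jul 18 01:02:03 +0000 2015", and the binning function is a simple stand-in for MATLAB's histcounts rather than a port of it. The function names and sample values are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Assumed layout of the "created_at" field: weekday, month, day, time, UTC offset, year.
# Note that %a and %b expect English names, so this relies on an English/C locale.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def to_epoch_seconds(created_at):
    """Convert a created_at string into seconds since 1970-01-01 UTC."""
    return int(datetime.strptime(created_at, TWITTER_TIME_FORMAT).timestamp())

def bin_counts(seconds, bin_width=3600):
    """Count timestamps per fixed-width bin, starting at the earliest timestamp."""
    start = min(seconds)
    counts = Counter((s - start) // bin_width for s in seconds)
    # Fill in empty bins with zero so gaps in activity show up in the plot.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]

# Hypothetical sample timestamps.
stamps = [
    "Sat Jul 18 00:00:00 +0000 2015",
    "Sat Jul 18 00:10:00 +0000 2015",
    "Sat Jul 18 02:30:00 +0000 2015",
]
seconds = [to_epoch_seconds(s) for s in stamps]
print(bin_counts(seconds, bin_width=3600))
```

The resulting list of counts can be passed to any plotting tool as the heights of the frequency curve.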

Code

Here is the code on GitHub. Everything is a separate file in the interest of coding time. I may turn each of the files into a function at some point. The important things are:

listen_tweets.py: Stream tweets from Twitter, filtering for a certain string. You have to put in your authentication details, like the consumer key and secret. You get those when you register an app.

discard_RT.py: For each tweet, check if it's a retweet. If not, save it to a file. Count the number of original tweets and retweets.

count_frequencies.py: Tokenize the text from all tweets in a file and find the most common words or bigrams using the natural language toolkit.

retweet_stats.py: List the most common retweets in order.

get_timestamps.py: Convert the "created_at" value from each tweet into seconds, and store all of the values to a .txt file.


That’s it for now. I hope this was helpful. Next time I’ll talk a bit about the interview.