DSR Day 13 – You need a Corgi!

Did you know that Corgis are from Wales, and that the Welsh apparently say that in the old days, fairies used to ride on them? And that the white spot on the back of many Corgis is hence called a “fairy saddle”?
You didn’t? Well, now you know, and so do I – thanks to today’s presentation training. We’ve covered Corgis, How Shazam works (super interesting!), Sleep (and the lack thereof), Random Graphs, and How to master a new skill.

What have we learned from that?

If you can choose a topic, pick something everybody can relate to.
Everybody likes Corgis! And everybody suffers from a lack of sleep. Also, if you can find something remotely scientific (when presenting to a lay audience), people will love that. Science papers just look impressive. Careful with scientists though: I personally felt the urge to look for a paper that claims the opposite of the one presented. And I knew there would be one.

Make it personal.
Somebody told his personal story today, the story of how he wanted to learn touch typing. He also told how he failed, and what made him finally succeed. A good personal drama just works.

Be concise.
Goethe (the German Shakespeare) once wrote in a letter: “Please forgive me for sending such a long letter, I simply had no time to write a short one.” Running overtime is annoying because it means your talk was not well planned, and that you don’t value your audience’s time.

Talk to your audience.
Don’t use the “whiteboard of death” (teacher’s words) because it’s difficult to talk and write at the same time, and you will inevitably talk to the whiteboard. Personally I don’t mind whiteboards. You just shut up while you’re writing, then turn around and get the attention back on you. No big problem at all.
Also remove obstacles between you and the audience. A stand doesn’t help you.

– Apart from that…the usual: Visually appealing, non-cluttered slides. Know how to pronounce specific terms (Poisson ≠ Poison!), don’t mumble or stutter if you’re thinking – just make a strategic pause. Be funny.

– Also (my opinion), a bit of jargon (a.k.a. buzzwords) does some good. If you only have ten minutes, you don’t want to spend eight of them explaining what you’re talking about. Just use the damn buzzword, even if it’s not 100% right. Chances are that your audience will not know the difference anyway – but they will immediately understand your talk.

Btw: Shazam recognises songs by Fourier-transforming the song in short time slots – i.e. you end up with a function of frequency for each time slot. This info is then converted into a (not so) gigantic hash table that can easily be compared against the existing database.
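
To make that a bit more concrete, here is a toy sketch of the idea in Python, using only the standard library. It is not Shazam's actual algorithm: the slot length, the "one strongest frequency per slot" rule, the pairing of neighbouring peaks as hash keys, and all the names (`peak_bin`, `fingerprint`, `identify`, the two synthetic "songs") are assumptions made up for illustration.

```python
import cmath
import math
from collections import Counter

SLOT = 64  # samples per time slot (arbitrary choice for this toy)


def peak_bin(window):
    """Index of the strongest frequency bin in one time slot (naive DFT)."""
    n = len(window)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):  # skip the DC bin, keep positive frequencies
        coeff = sum(window[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin


def fingerprint(samples):
    """Peak bin per slot, then pairs of neighbouring peaks as hashable keys."""
    peaks = [peak_bin(samples[i:i + SLOT])
             for i in range(0, len(samples) - SLOT + 1, SLOT)]
    return list(zip(peaks, peaks[1:]))


def tone_sequence(bins):
    """Synthetic 'song': one pure tone per slot, at the given DFT bins."""
    return [math.sin(2 * math.pi * b * t / SLOT)
            for b in bins for t in range(SLOT)]


# Hypothetical database of two made-up songs.
songs = {
    "song_a": tone_sequence([3, 7, 5, 9, 4, 11]),
    "song_b": tone_sequence([6, 2, 10, 8, 12, 13]),
}

# The hash table: fingerprint key -> set of songs containing it.
table = {}
for name, samples in songs.items():
    for key in fingerprint(samples):
        table.setdefault(key, set()).add(name)


def identify(snippet):
    """Return the song with the most matching keys, or None if nothing matches."""
    votes = Counter(name
                    for key in fingerprint(snippet)
                    for name in table.get(key, ()))
    return votes.most_common(1)[0][0] if votes else None


snippet = songs["song_a"][2 * SLOT:5 * SLOT]  # three slots from the middle
print(identify(snippet))
```

The lookup is cheap because the snippet is reduced to a handful of small keys before it ever touches the database; the expensive part (the transform) only runs on the short recording.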

Coming Up: International Open Data Hackathon, Feb 21st 2015

This year’s open data hackathon is just around the corner. The event is about the possibilities of open data, but also aims to connect people working in the field across the world, as well as to attract interested folks who would like to have a first taste of what open data can actually do.

There’s a wiki listing events around the world. Alternatively, you could just check the German website. Or just go directly to Berlin’s Hackathon, kicking off at 10 am at Correct!v.

DSR, Day 4 – From WTF to OMG

R, R all over again. It’s fun: I feel like I’m back on track now. Things are still a little slow for me. I have done very little coding for the past two years, and it’s noticeable. But I moved from “WTF!” (I can’t do anything anymore) to “OMG” (this is awesome!).

We’re still covering variable types, and fairly simple operations on them. Today was data frames (love them, extremely versatile), matrices (always hated them, but now made my peace with them); also working with attributes and factors.
I liked factors, because they used to make things run quicker for categorical data, but apparently that feature is gone. Yet, they still make life simpler through easy renaming and ordering (good for plotting graphs!).

Apart from that…cut(): it bins data into different buckets according to breaks. That does make categorising much easier than using the plain old data$category[which(data$x > 5)] <- "category5".
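
For comparison, here is a rough Python sketch of the same bucketing idea, using only the standard library's bisect module. The helper name `bin_value`, the break points and the labels are made up for illustration; the intervals are right-closed, (lo, hi], to mirror cut()'s default, and values outside the breaks come back as None (where R would give NA).

```python
from bisect import bisect_left


def bin_value(x, breaks, labels):
    """Return the label of the right-closed interval (lo, hi] containing x.

    `breaks` must be sorted; `labels` needs one entry per interval.
    Values outside the breaks yield None.
    """
    i = bisect_left(breaks, x)
    if i == 0 or i == len(breaks):
        return None
    return labels[i - 1]


breaks = [0, 5, 10]        # hypothetical break points
labels = ["low", "high"]   # one label per interval: (0, 5] and (5, 10]
xs = [1, 5, 7, 10, 12]

categories = [bin_value(x, breaks, labels) for x in xs]
print(categories)  # ['low', 'low', 'high', 'high', None]
```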

Dolores burritos for lunch – very packed, both the place and the burritos, yet both absolutely great.

Data Science Retreat, Day 3 – Queen of Typos

– Learning two different programming languages at the same time is strange – like learning two foreign languages at the same time. I’ve been using R for most of my PhD, and feel quite okay with it, and with its philosophy. I’m learning lots of new things – the kind of stuff one only learns when somebody with a wide knowledge is explaining it PROPERLY. But still, the R thinking has made me lazy. Learning Python makes me notice. I have absolutely no idea why Python needs a while or for loop so frequently (well, I have, but why is there no such thing as lapply in Python?).
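
On the lapply question: Python's closest relatives are probably the built-in map() and list comprehensions. A minimal sketch, where `square` and `values` are just placeholder names for illustration:

```python
def square(x):
    """Toy function standing in for whatever you would hand to lapply."""
    return x * x


values = [1, 2, 3, 4]

# Roughly lapply(values, square): map() applies the function to each element.
# map() is lazy, so list() is needed to actually get the results.
squared_with_map = list(map(square, values))

# The spelling most Python code uses for the same thing: a list comprehension.
squared_with_comprehension = [square(v) for v in values]

print(squared_with_map)            # [1, 4, 9, 16]
print(squared_with_comprehension)  # [1, 4, 9, 16]
```

Neither form needs an explicit for or while statement with an index or an accumulator, which is most of what makes lapply feel convenient.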

– I am the queen of typos: writing more or less complex functions is no massive problem, but there will inevitably be a stray bracket somewhere.

– Project ideas! I need one. One of my “colleagues” came up with an idea for media recommendation that involved scraping data from a company (“I just change my IP as soon as they block me.”). Well, if they’re blocking your IP when you scrape data, it means they don’t WANT you to steal their data anonymously. At this point one might at the very least ask them whether they would give you access to the dataset you need?

– I am thinking about a project revolving around either social media and prediction of x, or something around finding flats for sale/sold flats, and trying to predict the price development. Please not a recommendation algorithm. The world doesn’t need another one.

– I haven’t seen Sascha Lobo today.

Data Science Retreat, Day 2 – Intimately R

– Today was R day! 🙂

I’ve been using R since 2008, and I’ve learnt a lot today. My favourite was that

"<-"(x,1)

works the same way as

x<-1

That seems minor, but it is very neat, esp. when writing more complex calls. I knew that basically everything relies on functions in R, but somehow I had never thought about what this means for “<-”, “[” etc.

– Also, R is the language for lazy programmers: There are many in-built functions that work like loops. Mostly there’s no need to rely heavily on if/for/while loops. That explains why I usually feel like cheating when saying I’d be quite ok in R “programming”: the language does a lot for you. So, for me, using R was always more about stitching together existing functions and finding the right packages than about fancy programming.

– I still feel a little weird because of that
http://live.amcharts.com/ZmY2E/

but that’s going to pass, probably. We will see.

– Oh, and I saw Sascha Lobo this morning on my way to Zalando.

Data Science Retreat, Day 1 – The Nerd Shock

Disclaimer: I’m German, and my parents think “Not complaining is enough praise”. 😉

I woke up really early (5.30 am!), got myself a coffee, and decided to quickly update my computer to MacOS Yosemite. Spoiler: This was a spectacularly stupid decision. But more on that later.

On the way to the tube I ran across a Sascha Lobo.

I got to Zalando’s offices fairly early, entered, was welcomed by the organiser in the entrance hall and ushered to the seminar room on the 9th floor. There: billions of cables. I think they had every cable ever produced, to connect monitors to the laptops people brought. The place looked like the mutant child of an unholy wedding of an IT storage room and a student union office.

The view from the 9th floor of Zalando’s Mollstrasse offices is fabulous – even when it’s very overcast. You can see as far as Prenzlauer Berg.

The people were … mostly a bunch of nerdy looking people. The small talk starter at lunch was the question “So what do you prefer, MongoDB or MySQL?”.

Women (among trainees): 1 out of 10 (incl. me). Women (among teachers/mentors): 0 (up to now.)

There was no internet. How can there be no internet? At some point there was really slow Wi-Fi. And there was cable LAN – but I don’t have an Ethernet port on my computer. I also don’t usually carry a Thunderbolt-to-Ethernet adapter.

My (own, personal) computer lagged massively, to the point that it was completely unusable. (I blame the Yosemite update.) Later it turned out to be caused by Ghostery (my tracking blocker browser plugin) and/or the Wi-Fi. As soon as I turned off both, everything was running smoothly again.

We covered the basics of Python (I am a Python idiot novice), and I found that afternoon really useful.

It’s noticeable that the programme is still fairly new. There’s the odd non-organisation here and there. But hey. Next time, they’ll probably send around the software requirements in advance. Then the 80 minutes it took for everybody to install Python, Anaconda, and IPython Notebook might be filled with more useful teaching.

The environment and people are quite charming. And smart. And did I mention charming? Yet, I usually work for an internet agency that sells brand consulting, shiny web applications and marketing. Our style is very different.