Does anyone else know that feeling? You listen to too many people doing awesome things, you eventually get a small existential crisis. Well. I do.
In Day one we explored Google Next ’19 and my highlights. While Google Next was still on, I was up to other shenanigans. My real reason to come to San Francisco: Kaggle Days!
Kaggle is an online community of data scientists and machine learners, owned by Google. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. Kaggle got its start by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and short form AI education. (CC-BY-SA, https://en.wikipedia.org/wiki/Kaggle)
Out of over 800 people, I was selected to come by and participate in talks, workshops and break-out sessions revolving around machine learning and data science competitions. Kaggle has an internal ranking system of contributors, where the top tier is the “Grandmaster”, and many Grandmasters were there sharing insights with the participants. They often work on Kaggle competitions full-time, work in data science or have startups themselves.
Check out my popular open kernel on the TGS Salt detection challenge. You may notice that Kaggle has many insider words, Kernel and Grandmaster being two of those. I find that this makes it hard to introduce people to Kaggle for education. Once you’re in, it does feel good though, climbing the ranks and sharing your insights.
You may have noticed from my earlier writing that I’m not from the Americas. I took a 10-hour flight to attend this event in San Francisco. Why?
This event was based in San Francisco, which would attract competitive and interesting participants. It would also be easy for Kaggle employees to attend. While my social anxiety kept me from mingling with all of them, the social events, fueled by craft beers, helped me get out of my shell and meet awesome people working in the field. Some things never change.
The talks and workshops condensed a few insights that I had gained and that have helped me tremendously in implementing data science and machine learning pipelines for my PhD. Here are the main ones:
- Data leakage / cross-contamination can kill your project
- There is no shame in working with usable programs
- Think about the information in your data
- Always cross-validate
- Shake-Up will happen (it’s a Kaggle thing.)
While data leakage can kill your project, it probably won’t, and it’s most likely fixable. If you’re at a hackathon, training overnight, data leakage is your greatest enemy.
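To make the leakage and cross-validation points a bit more concrete, here is a minimal sketch in plain Python. It is not taken from any of the talks. The toy data, the one-feature linear model and all function names are my own assumptions for illustration. The one idea it tries to show is that preprocessing statistics (here a mean and standard deviation) are learned from the training fold only, so nothing from the held-out fold sneaks into training.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_scaler(values):
    """Learn mean and standard deviation from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero spread
    return mean, std


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_mean - slope * x_mean


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean squared error on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(xs)) if i not in held_out_set]
        # Preprocessing statistics come from the training fold only.
        mean, std = fit_scaler([xs[i] for i in train])
        train_x = [(xs[i] - mean) / std for i in train]
        train_y = [ys[i] for i in train]
        slope, intercept = fit_line(train_x, train_y)
        # The held-out fold is scaled with the training statistics.
        errors = [(slope * (xs[i] - mean) / std + intercept - ys[i]) ** 2
                  for i in held_out]
        scores.append(statistics.fmean(errors))
    return scores


# Toy data (an assumption for this sketch): y = 2x + 1 plus a little noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]
scores = cross_validate(xs, ys, k=5)
print([round(s, 3) for s in scores])
```

The leaky version would call `fit_scaler(xs)` once on all the data before splitting. For plain standardization the damage is usually small, but the same pattern applies to heavier steps such as feature selection or target encoding, where it can make validation scores look far better than they are.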
Our Saviour: Keras
Yes, François Chollet was there, and he talked about building APIs. An API so good that Google is adopting it in v2.0 of the deep learning library TensorFlow.
Outtrained by Computers
This may be foreshadowing, but on day 2 of Kaggle Days, covered in the next post in this series, we will encounter AutoML. It is the Neural Architecture Search implementation from Google, led by Quoc Le. The algorithm searches in TensorFlow for whether it should train a Random Forest or a Neural Network, and then goes on to find the ideal configuration of architecture and hyperparameters. The biggest challenge? Overfitting.
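To be clear, the following is not AutoML or Neural Architecture Search. It is a toy random search in plain Python that I am adding only to illustrate why overfitting is the challenge in any automated search. The k-nearest-neighbour model, the toy data and every name in it are assumptions. The search picks the hyperparameter `k` with the lowest validation error, and a separate test split is kept aside because the winning validation score is an optimistic estimate: it was selected for being low.

```python
import random
import statistics


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def mse(train, data, k):
    """Mean squared error on data of a k-NN regressor built from train."""
    return statistics.fmean([(knn_predict(train, x, k) - y) ** 2
                             for x, y in data])


def random_search(train, valid, n_trials=10, seed=0):
    """Sample k at random and keep the one with the lowest validation error."""
    rng = random.Random(seed)
    best_k, best_score = None, float("inf")
    for _ in range(n_trials):
        k = rng.randint(1, len(train))
        score = mse(train, valid, k)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy data (an assumption for this sketch): y = x^2 plus noise.
rng = random.Random(1)
data = []
for _ in range(90):
    x = rng.uniform(-3, 3)
    data.append((x, x * x + rng.gauss(0, 1.0)))
train, valid, test = data[:50], data[50:70], data[70:]

best_k, valid_score = random_search(train, valid)
test_score = mse(train, test, best_k)  # untouched by the search
print(best_k, round(valid_score, 3), round(test_score, 3))
```

The more configurations a search tries, the more likely it is that the best validation score is partly luck, which is why the final number should come from data the search never saw.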
Day One was fantastic. So many inspiring and insightful talks and workshops. The food was incredible, and the people I met were kind, smart and very interesting. Absolutely worth the little burst of anxiety in between.
Absolutely worth the 10h flight.