Keep it Clean: Why Bad Data Ruins Projects and How to Fix it

Thursday Oct 24

15:30 – 16:15

B 07 - B 08

The Internet is full of examples of how to train models, but in reality industrial projects spend most of their time working with data. The largest performance improvements often come from improving the underlying data.

Bad data is costing the US economy an estimated $3.1 trillion, and approximately 27% of the data in the world's top companies is flawed. Bad data also contributes to the failure of many Data Science projects. Who can forget Tay.ai, Microsoft's Twitter bot that learned to be genocidal when users' tweets were not cleaned?

This presentation will discuss the circumstances in which bad data can affect your project, along with some high-profile case studies. We will then spend the remaining time going through some of the techniques you will need to fix that bad data. The talk is aimed at those with intermediate-level Data Science experience.

What will the audience learn from this talk?
This presentation will discuss the circumstances in which bad data can affect your project, along with some high-profile case studies. We will then spend the remaining time going through some of the techniques you will need to fix that bad data.

Does it feature code examples and/or live coding?
No

Prerequisite attendee experience level:
Level 200

Phil Winder

CEO of Winder.AI, author of "Reinforcement Learning"
