The Death of the Data Scientist – Here’s How Not to Drown in the Data Lake

Databases of big corporations are like huge, deep lakes. For years, they’ve been slowly filling up with data and the bottom is no longer in sight. Data from every part of the company just flows into the lake through different streams, to be sorted out later.

As a data scientist working for a corporation, you are expected to dive into the data lake and do your magic. But you can spend years trying to find a lifeline, a tactic that works for you. Some data scientists drown and leave the company. 

That’s the scenario Marc Weimer Hablitzel of Gründerallianz sketches at the Data Natives Big Data Meetup on the 23rd of May. He wants startups to use the data lakes of big corporations so that they can develop new products, while helping create solutions for corporations. 

“Startups only have a pond in the middle of the desert,” he says during his talk. “But, most importantly, they have the technology and a team that is excited to bring new products to the market.” 

This is what the large heavy-industry companies in Germany’s Ruhr region, the target area of the Gründerallianz project, need. They have lots of data and a desire to explore it. “It’s a large untapped potential,” Weimer Hablitzel says. “Imagine machine data, well structured, built for over forty years.” 

During the first round of the project in 2018, one startup created a profitable product after helping a water treatment company discover how errors slipped into its rain prediction tools. 

In her talk ‘Self Serve Analytics with Big Data – is it even possible?’, Stefania Olafsdottir, CEO of analytics company AVO, argues that it is useless to build data lakes big enough to drown in. Because of the cloud, companies can store huge amounts of data without considering whether they need to store it at all. “You get a feeling of safety that you probably have what you need,” Olafsdottir says. “But we are storing a lot of useless data.” 

In today’s digital product development, which moves at lightning speed, this is not sustainable. “You have to store your data properly, so you can make your decisions when you need to,” Olafsdottir says.

“Don’t let analytics be a blocker.”

At companies facing this problem, data scientists spend lots of time answering basic questions, instead of working on exciting new projects. 

To keep up, companies need self-serve analytics that are clear and accessible to everyone on the team, from developers to decision makers. Crucially, these analytics must be built on relevant and reliable data, because humans “write bugs in their data all the time”, says Olafsdottir. 

So how do we do that? According to Olafsdottir, you need to define your KPIs in order to know what data is relevant. Plan metrics for each feature and involve the relevant stakeholders. You will also need to define what success looks like for the feature and how you will measure it. To get reliable data, Olafsdottir recommends using version control as the source of truth for the data structure. You also need to automate the validation of the implementation and to monitor for data regressions. AVO builds systems for both. 
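To make the idea of automated validation concrete, here is a minimal sketch in Python. It is not AVO’s product or API. The event names, properties and the `validate_event` helper are all hypothetical, and the plan is written inline as a dictionary, whereas in practice it would be a file kept under version control. The sketch checks each incoming event against the agreed data structure and reports anything that deviates.

```python
from __future__ import annotations

from typing import Any

# Hypothetical tracking plan. In practice this would be a version-controlled
# file that serves as the source of truth for the data structure.
TRACKING_PLAN: dict[str, dict[str, type]] = {
    "signup_completed": {"user_id": str, "plan": str},
    "song_played": {"user_id": str, "duration_seconds": int},
}


def validate_event(name: str, properties: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the event matches the plan."""
    schema = TRACKING_PLAN.get(name)
    if schema is None:
        return [f"unknown event: {name}"]

    problems = []
    # Every planned property must be present and have the planned type.
    for key, expected in schema.items():
        if key not in properties:
            problems.append(f"missing property: {key}")
        elif not isinstance(properties[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    # Properties that are not in the plan are reported as well.
    for key in properties:
        if key not in schema:
            problems.append(f"unexpected property: {key}")
    return problems
```

Run in a test suite or a build pipeline, a check like this could flag a renamed or mistyped property before it reaches the data lake. Counting the problems per release would be one simple way to watch for data regressions.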

Tobias Joest’s talk, “Why Data Science Needs a Human Centered Design”, is worth keeping in mind when creating a product. According to Joest, data scientists swimming in the data lake risk developing a tool that no one ends up using. It might work perfectly but fail to address the real issue. That is where human centred design steps in. 

“How would you change a lightbulb?” he asks the audience. Data scientists will look at how high the ceiling is or where the power source is, how many turns you have to twist the bulb and what the safety requirements are. But what about a human centric designer? “It might be a bit of a joke, but it’s also reality,” Joest says. 

“The human centric designer would ask: Why do we need a lightbulb at all?” 

In essence, human centered design boils down to more inventive solutions that actually address human problems. Another example Joest gives is a problematic five-lane roundabout in London. What would data scientists look at to understand the issue? “Traffic!” calls someone in the audience. “Speed”, says another. Accidents, pedestrians, times of traffic and public transportation are also suggested. So what would human centered design look like here? “Observing people, doing interviews,” he says. “Putting yourself in the shoes of other people, cycling around and seeing what happens.”

He explains that data science focuses on what is possible, whereas human centered design focuses on which problems to solve. A combined approach is best, Joest thinks. “If we dive in the data lake and use it to create something no-one needs, it won’t be helpful for our business or society,” he says. “On the other hand, if we use human centred design and we don’t have the right data to solve the problem, we won’t achieve anything either.” 

So go, dear Data Scientists, dive deep into the data lake, but remember to keep your human mission in sight as, at the end of the day, data should be serving the common good of humanity. 

__________

Use the opportunity to join us for more exciting talks at the DN19 September Tour! We will kick off in Berlin, scatter across 9 cities and head back for the Data Natives 2019 Conference. Pick your city and secure your free spot:

Berlin, 03.09.19|Hamburg, 04.09.19|Amsterdam, 05.09.19|Cologne, 10.09.19|Vienna, 12.09.19|Munich, 17.09.19|London, 18.09.19|Frankfurt, 24.09.19|Paris, 25.09.19|Zurich, 26.09.19

 
