Businesses can now grow and learn about their customers faster than ever before. A huge amount of information is available, and it is increasingly high-dimensional and complex. To make that data valuable, data modeling and visualization are crucial. Let’s discuss with DN19 speakers Samantha Edds from Bunch AI and Adam Janes, fellow at Dalarub and Ettrich.
Through graphs, charts and maps, we can make data insightful, revealing hidden trends and patterns. This is called data visualization, and in the world of Big Data it is becoming massively important because it makes data more understandable and usable.
Data modeling, on the other hand, ensures that the data stored in a database is accurately represented. It defines the data objects, their associations and the rules that govern them. Through modeling, you can create a clear picture of the data and identify missing or redundant data.
As our societies are increasingly data-driven, data modeling and visualization are crucial to that development. Let’s check out what these magical disciplines are all about.
“On the surface, it seems like our society has unanimously agreed that data is an extremely valuable thing,” says Adam Janes, who has taught data visualization to over 8,000 students on Udemy. “However, in reality, value is generated through insight, not through collection.” Data visualization lets us make sense of abstract information. It makes data concrete, so it can guide our future actions.
It is both an art and a science, as a lot of design comes into play to make the data visually attractive. Some creative approaches work well; he points to interactive data journalism such as pieces from The New York Times and National Geographic. “But for most people who make data visualizations, this just ends up causing clutter,” he says.
And that is the challenge of Data Visualization: designing something that is actually easy to comprehend. “There’s always a temptation to try to reinvent the wheel and make something completely unique, because we don’t want to just keep drawing bar charts and line graphs,” says Janes.
“If a graph is too different and it takes time and attention to understand, there is a danger that people are not willing to try to understand it.”
In terms of technology for data visualization, Janes works with and recommends D3.js, which specifically helps to display data on web pages. “It has a huge amount of flexibility, and you can really visualize pretty much anything that you can imagine with it, but it ends up being overkill for what most people really want to do,” he says.
“A more barebones solution that might be familiar to data scientists is something like MATLAB, which can already give some pretty interesting data visualizations out of the box – and even regular statistics programs like R and STATA have some good options for generating summary graphics.”
For Janes, it’s important to point out that getting into data visualization is doable. After all, everyone is “into Data Visualization” to some degree: “If you’ve ever made a bar chart in Excel for example, then you’ve already had to wrestle with the problem of how best to communicate abstract numbers into tangible insights.”
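Janes’ Excel example translates directly into a few lines of Python. The sketch below is purely illustrative (the quarterly revenue figures are hypothetical, and it assumes matplotlib is installed); it turns a handful of raw numbers into a labeled bar chart, the same “numbers into insight” step he describes.

```python
# Minimal sketch: turning raw numbers into a bar chart.
# The figures here are made up for illustration.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [120, 95, 140, 160]  # hypothetical values, in thousands

fig, ax = plt.subplots()
bars = ax.bar(quarters, revenue)
ax.set_ylabel("Revenue (thousands)")
ax.set_title("Revenue by quarter")
fig.savefig("revenue.png")  # write the chart to an image file
```

The same few lines scale from a toy example like this to a real dataset; the hard part, as Janes says, is deciding what the chart should communicate, not drawing it.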
Today, companies have lots of data from many different sources. And there are always new ways to view information because of device fragmentation. “Being able to harness all of it to keep up and deliver to your clients is key,” says Samantha Edds, Senior Data Scientist at Bunch AI.
“There are so many dependencies going on that we cannot easily see or model well with regressions. So modeling in any form, particularly machine learning, is very important.”
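The kind of dependency Edds describes, one that a plain regression cannot see, can be shown with a small synthetic sketch. This is an illustrative comparison using scikit-learn (not anything from her own work): the target is a pure interaction of two features, so a linear regression finds almost no signal while a random forest captures it.

```python
# Illustrative sketch: a dependency that linear regression misses
# but a machine learning model picks up. Synthetic data only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = X[:, 0] * X[:, 1]  # pure interaction: no linear structure at all

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

print("linear R^2:", round(linear.score(X_test, y_test), 2))  # close to 0
print("forest R^2:", round(forest.score(X_test, y_test), 2))  # high
```

On data like this, the linear model’s R² sits near zero because no weighted sum of the two features approximates their product, while the forest models the interaction directly.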
Edds got into data modeling through a Master’s in Statistics, and she is interested in machine learning, modeling and validation. She loves how data modeling can uncover patterns and make sense of information that would be too difficult to understand without large amounts of data and processing. She says it was eye-opening when, in a previous role, she was able to predict which students would drop out before finishing high school in the US, often as early as the 7th grade. “It gives educators and other services the ability to reach out to these students a lot sooner than traditional information flags, which often come too late.”
At the moment, a lot is happening in data modeling. Right now, training large models such as BERT is still expensive in terms of time and processing power. But change is on the way:
“There are many companies and research groups dedicated to creating open source solutions, particularly in Python, for modeling extremely large amounts of data in different ways,” says Edds. “I think creative solutions will keep coming, because this is the future of data.”
But even though there are constantly new ways to understand and model data easily, we have to watch out for biases. “We have to keep in mind that regardless of the way in which we choose to model data, if it is biased, we will get biased results,” she says. “There are a lot of dangers that come from biases being perpetuated; look at the Amazon hiring algorithm. If you feed biased data into a model, whether or not the bias is captured directly in a variable, the model often picks up on it.” At DN19 she will talk about this challenge in data modeling.
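The effect Edds warns about, where bias leaks into a model even when the sensitive variable itself is excluded, can be sketched with entirely hypothetical synthetic data (this is an illustration of the general mechanism, not the Amazon case). The protected attribute is dropped from the features, but a correlated proxy remains, and the model reproduces the historical gap.

```python
# Illustrative sketch of bias leaking through a proxy variable.
# All data here is synthetic and hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5000
group = rng.integers(0, 2, n)              # hypothetical protected attribute
proxy = group + rng.normal(0, 0.3, n)      # correlated stand-in (e.g. a location-like feature)
skill = rng.normal(0, 1, n)                # a legitimate, job-relevant feature

# Historical labels are biased against group 1, independent of skill:
label = (skill + 1.5 * (1 - group) + rng.normal(0, 0.5, n) > 0.75).astype(int)

# The protected attribute is deliberately excluded, but the proxy stays in:
X = np.column_stack([skill, proxy])
model = LogisticRegression().fit(X, label)

rate0 = model.predict(X[group == 0]).mean()
rate1 = model.predict(X[group == 1]).mean()
print("positive rate, group 0:", round(rate0, 2))
print("positive rate, group 1:", round(rate1, 2))  # much lower, despite equal skill
```

Even though `group` never enters the model, the proxy carries enough of its signal that the learned predictions mirror the biased labels, which is exactly why dropping a sensitive column is not, by itself, a fix.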
For those interested in getting into data modeling, Edds recommends taking a look at the many online courses and certificates available. Right now the language of choice is Python, and coding is generally required. “Also, creating your own projects on GitHub is a great way to show what you can do,” she says. “Finally, I would recommend meet-ups to learn from others, and reaching out to people in the field to discuss.”