Weekly Big Data Catch-Up

Dataconomy » Weekly Newsletter

Big Data News, Events, and Expert Opinion

Excerpts:

Starting an Analytics Function: A Five Step Framework

“How do we start an analytics practice?” I have often been asked this question. There is no one-size-fits-all solution, and various organizations have achieved success through various models. Still, we at Kruxonomy have come up with a comprehensive 5-step framework for getting started with an analytics function, one that mitigates risks and exemplifies analytics output. These steps could be taken up internally by businesses, though we recommend employing external experts who bring a third-eye perspective to your business. Identify Opportunities: Even before you acquire skill sets and set aside your yearly budget for analytics, know what is needed. Get an assessment of your current data landscape and the key use cases that matter for your business success. This is essential to identify...
Read on »

Open-Sourcing the Human Body With the “Open Humans” Platform

Researchers from Harvard, NYU and the University of California San Diego have come together to set up the “Open Humans Network,” an effort to let individuals share their personal health data to accelerate medical breakthroughs. Funding for the initiative comes from the John S. and James L. Knight Foundation and the Robert Wood Johnson Foundation, each contributing $500,000 in separate grants. The project’s director, Jason Bobe, who also runs the project’s parent organization, PersonalGenomes.org, explains, “Think of it as open-sourcing your body.” “There is tremendous potential for accelerating medical discoveries by helping individuals take their health and personal data out of data silos and making the data more broadly used.” The project will allow “willing individuals” to easily access and share data with researchers through...
Read on »

MapR’s 4 Takeaway Lessons from the Hadoop Revolution

To get value out of today’s big and fast data, organizations must evolve beyond traditional analytic cycles that are heavy with data transformation and schema management. The Hadoop revolution is about merging business analytics and production operations to create the ‘as-it-happens’ business. It’s not a matter of running a few queries to gain insight to make the next business decision, but to change the organization’s fundamental metabolic rate. It is essential to take a data-centric approach for infrastructure to provide flexible, real-time data access, collapsing data silos and automating data-to-action for immediate operational benefits. Lesson #1: Journey to the Real-Time Data-Centric Enterprise. There are two types of companies in the big data space. Those that are born in big data to deliver a competitive edge through the software or...
Read on »

The National Data Science Bowl Makes Plankton Research Sexy

Rare is the man whose heart starts racing upon hearing the word “plankton”. It might not be the sexiest field in the world, but the study and categorisation of these marine microorganisms can tell us a huge amount about the health of our oceans. The problem? There are tonnes of them. The dataset of the Oregon State University Hatfield Marine Science Center is the same size as 400,000 3-minute YouTube videos. Manually categorising all that data would take marine biologists two lifetimes to complete. Of course, in our age, there’s a solution: machine learning-optimised automation. The Oregon State team handed their dataset over to the National Data Science Bowl, which challenged entrants to automate the ocean health assessment process. The entries to the contest, co-sponsored by Kaggle & Booz Allen Hamilton, were...
Read on »

New York By the Numbers

PlaceILive is a startup that uses big and open data to visualize what is happening in your city, whether that is air pollution, unemployment, internet speeds or the number of coffee places in your neighborhood. The goal is to measure livability through all this data so that you, as a smart citizen, can make better decisions and governments can make more sustainable cities. Crucial to this is their signature Life Quality Index: an algorithm that takes aspects like transportation, safety, and affordability into account to score every building in New York, Chicago, San Francisco, London and Berlin. This score makes it possible for people to easily compare the livability of different houses. You can read more on their methodology and sources here. The data comes from open government sources and...
Read on »

SQL for Documents is the Next Frontier for NoSQL Startup Couchbase

Couchbase Server, the open source, distributed NoSQL document-oriented database, has unveiled a new query tool compatible with SQL. The new offering has been available to developers as N1QL for over a year. It will henceforth be shipped as part of Couchbase Server under the name SQL for Documents, explained Couchbase president and chief executive Bob Wiederhold. With this offering, Couchbase stands out among other NoSQL outfits like MongoDB and DataStax that don’t offer SQL-based database querying, Wiederhold said. VentureBeat points out that SQL query languages becoming the standard is evidence that “database companies are looking to expand their relevance beyond developers who build applications and need to store and serve up data. Many business analysts know SQL well and could end up feeling comfortable enough to run...
Read on »

14 Best Python Pandas Features

Pandas is the most widely used tool for data munging. It contains high-level data structures and manipulation tools designed to make data analysis fast and easy. In this post, I am going to discuss the most frequently used pandas features. I will be using the olive oil data set for this tutorial; you can download the data set from this page (scroll down to the data section). Apart from serving as a quick reference, I hope this post will help you to quickly start extracting value from Pandas. So let’s get started! 1) Loading Data: “The Olive Oils data set has eight explanatory variables (levels of fatty acids in the oils) and nine classes (areas of Italy)”. For more information you can check my IPython notebook. I am importing numpy, pandas and matplotlib...
Read on »

Meet Pinnability: Pinterest’s Machine Learning Tool for Personalising Your Feed

With the aim of offering more nuanced content to the user, Pinterest has unveiled a machine learning tool that provides the user with the most personalized and relevant Pins. The ML tool, titled Pinnability, runs on smart feed, “and estimates the relevance score of how likely a Pinner will interact with a Pin. With accurate predictions, we prioritize those Pins with high relevance scores and show them at the top of home feed,” explains Pinterest software engineer Yunsong Guo. Previously, all home feed content from each source (e.g., following and Picked For You) was put together chronologically; a newer Pin from the same source would show up before an older Pin, irrespective of how much interest either Pin held for the user. It stunted the discovery...
Read on »

Dataconomy » Upcoming Big Data Events


Dataconomy » Big Data Jobs

Was this email forwarded to you? Click HERE to sign up for Dataconomy updates in your inbox! 