Friday, June 26, 2015
Big data needs to be fast and smart.
DAUNTING DATA
Every minute, 48 hours of video are uploaded to YouTube, 204 million e-mail messages are sent, and 600 new websites are created. 600,000 pieces of content are shared on Facebook, and more than 100,000 tweets are sent. And that does not even begin to scratch the surface of data generation, which extends to sensors, medical records, corporate databases, and more.
As we record and generate a growing amount of data every millisecond, we also need to be able to understand that data just as quickly. From monitoring traffic to tracking epidemic spreads to trading stocks, time is of the essence. A few seconds’ delay in understanding information could cost not only money, but also lives.
BIG DATA’S NOT A BUBBLE WAITING TO BURST
Though “Big Data” has recently been dismissed as an overhyped buzzword, it is not going away any time soon. Information overload is a phenomenon and a challenge we face now and will inevitably continue to face, perhaps with increased severity, over the coming decades. In fact, large-scale data analytics, predictive modeling, and visualization are increasingly crucial for companies in both high-tech and mainstream fields to survive. Big data capabilities are a need today, not a want.
“Big Data” is a broad term that encompasses a variety of angles. There are complex challenges within it that must be prioritized and addressed, such as “Fast Data” and “Smart Data.”
SMART DATA
“Smart Data” means information that actually makes sense. It is the difference between seeing a long list of numbers referring to weekly sales and identifying the peaks and troughs in sales volume over time. Algorithms turn meaningless numbers into actionable insights. Smart data is data from which signals and patterns have been extracted by intelligent algorithms. Collecting large amounts of statistics and numbers brings little benefit if there is no layer of added intelligence.
IN-THE-MOMENT DECISIONS
By “Fast Data” we mean as-it-happens information that enables real-time decision-making. A PR firm needs to know how people are talking about its clients’ brands in real time so it can nip bad messages in the bud; a few minutes too late and a viral message may be uncontainable. A retail company needs to know how its latest collection is selling as soon as it is released. Public health workers need to understand disease outbreaks in the moment so they can act to curb the spread. A bank needs to stay abreast of geo-political and socio-economic situations to make the best investment decisions with a global-macro strategy. A logistics company needs to know how a public disaster or road diversion is affecting transport infrastructure so that it can react accordingly. The list goes on, but one thing is clear: Fast Data is crucial for modern enterprises, and businesses are now catching on to the real need for such data capabilities.
GO REAL-TIME OR GO OBSOLETE
Fast data means real-time information: the ability to gain insights from data as it is generated, literally as things happen. Why is streaming data so hot at the moment? Because time-to-insight is increasingly critical and often plays a large role in smart, informed decision-making. In addition to the obvious business edge a company gains from exclusive knowledge of the present, or even the future, streaming data also comes with an infrastructure advantage.
Big data brings technical challenges to address, one of which is the costly and complex issue of data storage. But storage is only required where data must be archived for historical use. Increasingly, as more real-time data is recorded by sensors, mobile phones, and social media platforms, on-the-fly streaming analysis is sufficient, and storing all of that data is unnecessary.
STREAMING VS. STORING & DATA’S EXPIRATION DATE
Historical data is useful for retroactive pattern detection, but there are many cases in which in-the-moment analysis is more useful. Examples include quality control in manufacturing plants, weather monitoring, the spread of epidemics, traffic control, and more. You need to act on information coming in by the second. Re-directing traffic around a new construction project or a large storm requires knowing the current traffic and weather situation, for example, rendering last week’s information useless.
When the kind of data you are interested in does not require archiving, or only selective archiving, it does not make sense to build storage infrastructure that retains all of the data historically.
Imagine that you wanted to listen for negative tweets about Justin Bieber. You could either store historical tweets about the pop star or analyze streaming tweets about him. Recording the entire history of Twitter just for this purpose would cost tens of thousands of dollars in server costs, not to mention the RAM required to run algorithms over such a massive store of information.
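To make this concrete, here is a minimal Python sketch of the stream-then-discard approach: tweets are scored as they arrive and only the alerts matter, so nothing needs to be archived. The tweet source and sentiment scorer below are hypothetical stand-ins, not a real Twitter or machine-learning API.

```python
def tweet_stream(keyword, sample):
    """Stand-in for a streaming API client: yields matching tweets one at a time."""
    for tweet in sample:
        if keyword.lower() in tweet.lower():
            yield tweet

def sentiment(text):
    """Toy scorer that counts negative words; a real system would use a trained model."""
    negative_words = {"hate", "awful", "worst", "terrible"}
    return -sum(word in negative_words for word in text.lower().split())

def watch_negative_mentions(keyword, sample, threshold=-1):
    for tweet in tweet_stream(keyword, sample):
        if sentiment(tweet) <= threshold:
            # React immediately, then let the tweet go -- nothing is stored.
            print("ALERT:", tweet)

sample = [
    "I hate this new Bieber single, worst ever",
    "Loving the new Bieber album",
]
watch_negative_mentions("bieber", sample)
```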
It is crucial to know what kind of data you have and what you want to learn from it in order to pick a flexible analytics solution to suit your needs. Sometimes data needs to be analyzed from the stream, not stored. Do we need such massive cloud infrastructure when we do not need persistent data? Perhaps we need more non-persistent data infrastructures for data that does not need to be stored eternally.
Data’s Time-To-Live (TTL) can be set so that it expires after a specific length of time, taking the burden off your storage capabilities. For example, sales data from two years ago might be irrelevant to predicting your company’s sales today, and that outdated data should be laid to rest in a timely manner. Just as compulsive hoarding is unnecessary and often a hindrance in people’s lives, so is mindless data storage.
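As a rough illustration, the sketch below shows the TTL idea with a tiny in-process store; a real deployment would lean on the built-in expiry features of a cache or database rather than code like this.

```python
import time

class TTLStore:
    """Illustrative key-value store whose entries silently expire."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        """Store a value that expires after ttl_seconds."""
        self._data[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        """Return the value if it is still fresh, otherwise drop it."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._data[key]  # expired: the record is laid to rest
            return None
        return value

store = TTLStore()
store.put("weekly_sales_2013_w02", 84213, ttl_seconds=1)  # short TTL for the demo
time.sleep(1.1)
print(store.get("weekly_sales_2013_w02"))  # None -- the record has expired
```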
BEYOND BATCH PROCESSING
Aside from determining data life cycles, it is also important to think about how the data should be processed. Let’s look at the options for data processing and the type of data appropriate for each.
Batch processing: In batch processing, a series of non-interactive jobs is executed by the computer all at once. For data analysis, this means that you manually feed the data to the computer and then issue a series of commands that the computer executes in one go. There is no interaction with the computer while the tasks are being performed. If you have a large amount of data to analyze, for instance, you can queue the tasks in the evening and the computer will analyze the data overnight, delivering the results the following morning. The results of the analysis are static and will not change if the original data sets change, unless a whole new series of commands is issued to the computer. An example is the way all credit card bills are processed by the credit card company at the end of each month.
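A toy example of batch processing, assuming a hypothetical sales.csv file with date and amount columns: the whole file is crunched in one non-interactive pass, and the result stays fixed until the next run.

```python
import csv
from collections import defaultdict

def nightly_sales_report(path="sales.csv"):
    """Aggregate daily sales totals from a file in a single, non-interactive pass."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):      # every record is processed in one go
            totals[row["date"]] += float(row["amount"])
    return dict(totals)                    # a fixed snapshot until the next batch run

# Typically triggered overnight by a scheduler (cron, etc.) rather than interactively:
# report = nightly_sales_report()
```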
Real-time data analytics: With real-time data analysis, you get updated results every time you query something; answers arrive in near real time and reflect the most up-to-date data available at the moment the query was sent. As with batch processing, real-time analytics requires that you send a query to the computer, but the task is executed much more quickly, and the data store is automatically updated as new data comes in.
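Here is a small sketch of that query-on-demand pattern, using a hypothetical in-memory sales store: events update the store as they arrive, and each query answers with everything received up to that moment.

```python
class LiveSalesStore:
    """Illustrative store that is updated as events arrive and queried on demand."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def ingest(self, amount):
        """Called whenever a new sale event arrives."""
        self.total += amount
        self.count += 1

    def query_average(self):
        """The answer reflects everything ingested so far, not a stale batch."""
        return self.total / self.count if self.count else 0.0

store = LiveSalesStore()
store.ingest(19.99)
store.ingest(5.50)
print(store.query_average())   # current as of this query
store.ingest(120.00)
print(store.query_average())   # a later query sees the newer data
```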
Streaming analytics: Unlike batch and real-time analyses, streaming analytics means that the computer automatically updates results as new pieces of data flow into the system. Every time a new piece of information arrives, the signals are updated to account for it. Streaming analytics provides as-it-occurs signals from incoming data without the need to query for anything manually.
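The sketch below illustrates the streaming pattern with an illustrative spike rule (an exponential moving average, chosen here only for simplicity): the signal fires the moment an anomalous event arrives, with no query involved.

```python
def streaming_spike_detector(events, window=0.9, factor=3.0):
    """Emit an alert the moment a value far exceeds its running average."""
    running_avg = None
    for value in events:
        if running_avg is not None and value > factor * running_avg:
            yield ("spike", value, running_avg)          # as-it-occurs signal
        running_avg = value if running_avg is None else (
            window * running_avg + (1 - window) * value  # exponential moving average
        )

traffic = [100, 110, 95, 105, 420, 100]   # e.g. requests per second
for alert in streaming_spike_detector(traffic):
    print(alert)                           # fires when 420 arrives, not later
```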
REAL-TIME, DISTRIBUTED, FAULT-TOLERANT COMPUTATION
How can we process large amounts of real-time data in a seamless, secure, and reliable way? One way to ensure reliability and reduce cost is distributed computing: instead of running algorithms on one machine, we run them across 30 to 50 machines, which spreads the processing load and reduces the stress on each machine.
Fault-tolerant computing ensures that, should any computer in a distributed network fail, another computer takes over the failed machine’s job seamlessly and automatically. This guarantees that every piece of data is processed and analyzed, and that no information gets lost even in the case of a network or hardware breakdown.
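As a simplified, single-process illustration of the retry-and-reassign idea (real systems do this across machines with a cluster coordinator), failed work is simply put back in the queue until every record has been processed:

```python
import random

def process(record):
    """Pretend worker that occasionally crashes mid-task."""
    if random.random() < 0.2:
        raise RuntimeError("worker crashed")
    return record * 2

def fault_tolerant_map(records, max_retries=5):
    """Apply process() to every record, reassigning work whenever a worker fails."""
    results = {}
    pending = list(enumerate(records))
    passes = 0
    while pending and passes <= max_retries:
        still_pending = []
        for index, record in pending:
            try:
                results[index] = process(record)        # any healthy worker picks it up
            except RuntimeError:
                still_pending.append((index, record))   # put it back for the next pass
        pending = still_pending
        passes += 1
    return [results[i] for i in sorted(results)]

print(fault_tolerant_map([1, 2, 3, 4, 5]))  # every record processed despite failures
```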
IN SHORT
In an age when time-to-insight is critical across diverse industries, we need to cut it down from weeks to seconds.
Traditional, analog data-gathering took months. Traffic police or doctors would jot down information about patients’ infections or drunk-driving accidents, and these forms would be mailed to a hub that aggregated the data. By the time all of these details were compiled into one document, a month had passed since the outbreak of a new disease or the emergence of a problem in driving behavior. Now that digital data is aggregated rapidly, we have the opportunity to make sense of the information just as quickly.
This requires analyzing millions of events per second against trained, learning algorithms that detect signals from large amounts of real, live data – much like rapidly fishing for needles in a haystack. In fact, it is like finding the needles the moment they are dropped into the haystack.
How is real-time data analysis useful? Applications range from detecting faulty products on a manufacturing line to sales forecasting to traffic monitoring, among many others. The coming years will usher in a golden age not for any old data, but for fast, smart data: a golden age of as-it-happens actionable insights.
Original post: http://www.augify.com/big-data-fast-data-smart-data/
READ: Workshop Report: Data Revolution in Africa
Following on from our update from the Regional Agenda Setting workshop in Addis Ababa, here is our extended workshop report, outlining our open data roadmap in Africa. We’re excited that open data is becoming a critical element of the Africa Data Consensus and is already being discussed in the Implementation Roadmap meetings in Lagos and OGP Regional Civil Society meetings.
Some of the key requirements identified for open data to have a transformative impact in Africa include the need for political goodwill, the role of intermediaries, and moving beyond open data portals to understanding the real use and impact of open data.
Building on this, we will be rolling out our Africa research and architecture for the Open Data Labs Africa by the end of June 2015. Thanks to all those who were a part of this in Addis and beyond. If you have comments or questions, please contact our Open Data Research Lead, Savita Bailur.
Source: http://webfoundation.org
Labels: Africa, Data, Data Report, Report, Revolution, Workshop
East Africa Data Centre announces Ksh1bn expansion on galloping demand for secure data space
East Africa Data Centre, the only Tier 3 secure electronic data centre in east and central Africa, has today announced a Ksh1bn expansion to meet rack space demand that initially forced it to ration allocations to customers. Unveiling the second and third floors of the data centre, the East Africa Data Centre announced that the centre would now be extended to four floors, totalling 2,000 sq m.
The data centre, which now houses Kenya’s Internet Exchange Point, has been credited by the global Internet Society as a key factor in driving down internet prices in Kenya to among the lowest in Africa.
The East Africa Data Centre hosts the Points of Presence for global carriers with international coverage, including Tata, Level3, Seacom, and Liquid Telecom, as well as carriers owning fibre network infrastructure, including Safaricom, JamiiTelcom, Access Kenya, Orange Telkom Kenya Ltd, Wananchi Online, and Frontier Optical Network.
It is further playing a key role in enabling financial and corporate organisations to hold data securely, protecting them in the event of cyber crime and offering 24/7 secure housing for their data and back-ups.
“The East Africa Data Centre has transformed how data traffic is handled in the region. By providing a central point for interconnect services, it has reduced latency, improved data services, reduced costs and made it easier to transfer data across networks,” said Dan Kwach, General Manager, East Africa Data Centre.
“By keeping African data in Africa we continue to help reduce the costs of internet access while creating an environment that encourages innovation and entrepreneurial culture in the field of ICT and local businesses,” said Mr Kwach.
Within six months of starting, it had fully sold Phase 1 of the centre’s rack space – which houses the servers holding data – amounting to the entire first floor. It has now opened another 500 sq m floor, which is already 90 per cent occupied, and with the third floor already prepped for occupancy, East Africa Data Centre unveiled plans to expand to the fourth floor immediately, to cater for demand.
“We had to ration the rack space when we were selling the first floor due to huge demand, until we could get the second floor built and operational and the third floor ready to go quickly. The second floor took roughly eight months, but now that we have the space ready, we can move much quicker and customers can buy the amount they want,” said Mr Kwach.
The accelerated expansion in EADC’s rack space has benefitted the local engineering and construction services sector, with all of the contractors for the expansion sourced locally.
It also comes amid growing concern over data security. In late 2013, Kenya’s Information, Communication and Technology Cabinet Secretary, Fred Matiangi, raised the flag on estimates that the country would lose KSh 2 billion (about US$23 million) through cyber crime, with the number of cyber attacks detected in Kenyan cyberspace more than doubling last year to 5.4m, compared with 2.6m in 2012. Uganda, which last year reported a 14.9 per cent surge in economic crimes, singled out cybercrime, principally mobile money and Automated Teller Machine (ATM) fraud, as responsible for the loss of about USh1.5bn (Ksh51m), while Bank of Tanzania (BoT) statistics indicate that TZS 1.3bn (Ksh69.5m) has been stolen across the country through cyber fraud, according to the Kenya Cyber Security Report 2014.
Financial institutions are also introducing potentially vulnerable web and mobile applications: a recent study of 33 online banking portals found that only two had adequate online security deployed on their web applications. As a result, many financial institutions are now looking to EADC to store their data, reported the Kenya Cyber Security Report 2014.
“Banks and financial institutions are the second largest type of occupant at the East Africa Data Centre, at about 30 per cent. With about 43 banks in Kenya, the demand for highly secure, stable environments like ours, for use as disaster recovery, high-availability, or primary sites, has been rising,” said Mr Kwach.
East Africa Data Centre recently signed a collaboration agreement with Teraco Data Environments in South Africa to share synergies between the two data centre operators, improve EADC’s efficiency, and maximise available investment opportunities.
About East Africa Data Centre
East Africa Data Centre, a carrier-neutral data centre in Nairobi, is the largest and most sophisticated in East Africa, offering secure and reliable space for dedicated hosting, interconnect services, colocation, disaster recovery, network-based services, applications, and cloud services. A Tier 3 data centre built to international standards, it is the only purpose-built data centre in East Africa.
East Africa Data Centre is an independent company within The Liquid Telecom Group with a dedicated management team.