Lurking within these bulging repositories of data lie previously undreamt-of clues and patterns, shedding new light on everything from consumer behaviour in the retail sector to the complex forces underpinning mineral exploration, disease and drug research, and even how to discover, pre-empt and prosecute criminals.
Once the sort of issue that only enterprises would have troubled to think about, big data has suddenly opened a new door to small businesses thanks to expanding possibilities, falling costs and easier deployment. At the same time, the jury is far from in at the big end of town, with enterprises still grappling with an abundance of choice, not to mention complexity.
But with so many different 'big data' solutions in the market, it is hard for companies to know which are best for their purposes. After all, big data is only as powerful as the tools we use to mine it.
In this, the latest roundtable to be hosted by CRN, in partnership with Nextgen Distribution and Oracle, our expert group of executives and IT entrepreneurs reveal the essential business and technical insights to help resellers align their customers with the big data solutions that best address their needs.
In other words, to help the channel and its customers ride the wave of this phenomenal and disruptive trend, rather than being swept away in the tsunami.
Featuring:
Stuart Long, system sales director, Oracle
Scott Newman, senior pre-sales director, Oracle
Chris Mendes, chief technology officer, Sirca
Rob Silver, senior software engineer, Remora
Laura Brundle, NSW district manager, Dataflex
Tiberio Caetano, senior researcher, NICTA
Jonathan Rubinsztein, CEO, Redrock Consulting
Andrew McLean, enterprise solutions manager, Intel
John Walters, managing director, Nextgen Distribution
Anthony Viel, partner, Deloitte Forensic
Peter Kazacos, managing director and chairman, Anittel
CRN: Anthony, you were at the ground level with Deloitte Australia some years ago leading a global initiative with big data. It took a few years to get going, but now it sounds like one of the fastest growing businesses in the organisation.
Anthony: My business is about six years old. It started from nothing: I was the first recruit in what we call the analytics business. When I came into the organisation at Deloitte in Australia, I was part of redefining what we offer within Forensic. For anyone who hasn't come across a forensic issue, the most public one in recent years is Siemens' [bribery scandal of 2007].
We used data and analytics to transform the Forensic offering. It wasn't a big jump for Forensic, because you're generally in the data already; you need to form a position pretty quickly to represent it in a court of law or in front of a regulator, whatever the case may be – so it demands a very robust, 100 percent focus.
Some things were not obvious to clients, so you had to leverage some of the best analytics techniques around.
In the course of doing that, we took the business from number five or six in the Australian marketplace to number one, and over four years we will be double the size of number two. With that, we started to say, 'hey, there's something in this data stuff – our clients want to talk to us about it; everyone says we've got mountains of data but no insight, or oceans of data and only islands of insight, if you will'.
We started to focus analytics on audit, the traditional backbone of a business like Deloitte's. We started focusing analytics on internal audit and risk services. We started focusing analytics on tax, and so on – across everything we offer in the space. We not only took it out to market, but now we're starting to really transform our own business.
CRN: So what do you think organisations need to focus on in order to develop effective big data solutions?
Anthony: Action. Everyone can build stuff around it and point to how sexy the opportunities presented by big data are – more data available, and available to you faster, in a shorter period of time – but until you actually get the organisation to change, the justification for the investment, or the transformation, you need to take full advantage of big data is not going to stack up.
Big data is at risk of becoming a dirty word, or two words, as the case may be, like ‘cloud’.
I’m proud to say for Deloitte the big data journey began in Australia.
Our CEO, who is an ex-auditor, doesn't speak like an auditor any more. I'll take a bit of credit for that.
CRN: Peter, as one of the largest and most successful resellers and more recently carrier organisations in Australia, talk us through where you see the opportunities in big data.
Peter: If you look at the three words, the three product sets – cloud, big data and tablets – I think they're three of the biggest trends at the moment. But if we focus on the big data side, it's interesting, given the words used to describe it, how quickly people understand it – whereas with cloud, OK, more people are understanding it, but in a lot of cases they still don't really get it.
But 'big data' – people know they've got that, because storage has been cheap, so people have been storing it. Before, people were asking 'what should we store, because it's going to cost us a lot of money?' – now they are storing it, even though they're not doing much with it. And traditionally, data mining was seen as something really out there, really hard to do and only for larger organisations.
Now it's about people saying 'we need to be able to understand more about what's there', and one of the areas where I think there's a huge opportunity is marketing organisations that really understand that, so they can deliver targeted marketing campaigns.
But if you look at the types of 'big data', you've got the stuff logged in databases, but more important is the stuff held in all the social media sites, and you can't really digest that information because it isn't structured, so you can't just use simple tools; you need a richer set of tools.
I see it as an opportunity, but one of the issues I've got at the moment is that we are not a creator of products, so we need product that's available to us to use in that space, as well as services. We focus on two ends, and we have this huge base in our SMB space.
As a service, they don't really want to take something on board themselves; they want something that can deliver intelligent information to them, and so I think it's a huge opportunity in this space.
Anthony: It's the fact that you are keeping more and more information about what you've done, or not done, to discharge your duties and responsibilities to regulators, consumers and the like, and we have started to see the tipping point of that in the US in particular, with all these large discovery actions.
As an example, there's one going on now around BP – obviously they had a problem in the Gulf – and the point is that the corporation, and the directors of the corporation, have accountabilities and responsibilities. I haven't been in a forensic situation where you come into an organisation and the information hasn't been there.
Now that more and more obligations are being put on organisations – in areas like safety, and some of the sanctions on money laundering and that sort of thing, with jail terms attached – the flipside of the opportunity is to say, 'wait a second, if you're not collecting it, somebody else is, because they're more corporately responsible'. And it's becoming more pervasive: you want access to that information, whether it be through social media or what have you.
So it's just a bigger lever to get interested in this quickly, because the exposure it's creating is a real opportunity.
CRN: We are hearing a lot of talk about the exciting possibilities of ‘big data’. We are still at an early stage in terms of realising them, but Chris you seem to be someone who is at the coalface in terms of what could be discovered, particularly in terms of financial data, about what’s going on in the exchanges around the world.
Chris: Yes, there is actually a lot happening in that space. Obviously there are a lot of things we can't talk about, partly because the people who use that data – the ones with the magic algorithms – don't want to tell us. But interestingly, one of the things Anthony hinted at there comes not so much from a strictly compliance perspective as from a purely competitive one, such as people doing comparative costings on the efficiency of exchanges, based on stock market data.
That was a bit unexpected: it's no longer just the guy who wants to make money purely on transactions, it's actually the exchanges competing with each other.
One of the other things, which relates to what Peter was saying, is just the sheer scale. We've got a very big data set – all the data going back to 1996 for every exchange in the world – and that in itself is a huge problem. It's our core business, and we've spent a lot of time solving it.
The problem we haven't been solving is processing the logs of what our customers are actually doing with that data. Recently I thought that should be easy – I'll just pull it off the logs and load it into a spreadsheet, because that's about where my data processing capability begins.
And of course Excel just died a horrible death. The logs going back five years came to a quarter of a terabyte – that doesn't sound like a lot of data, you're talking 250GB – but to process that data and gain intelligence from it is a challenge.
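[Editor's note: as a rough illustration of the kind of approach Chris is describing – streaming a large log rather than opening it in a spreadsheet – here is a minimal Python sketch. The file name and column names are invented for the example; the point is simply that you aggregate line by line, so memory use stays flat regardless of how big the log grows.]

```python
import csv
from collections import Counter

def top_users(log_path, limit=20):
    """Stream a delimited access log and count requests per user.

    Reads one row at a time, so memory use stays flat no matter how
    large the file is (unlike loading the whole thing into Excel).
    """
    counts = Counter()
    with open(log_path, newline="") as f:
        reader = csv.DictReader(f)      # assumes a header row, e.g. timestamp,user,dataset
        for row in reader:
            counts[row["user"]] += 1    # hypothetical 'user' column
    return counts.most_common(limit)

if __name__ == "__main__":
    for user, hits in top_users("access_log.csv"):   # hypothetical file name
        print(user, hits)
```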
I think the real issue, or real surprise, for our industry is the guy who comes out and can say 'I've got a tool that lets my accounts department take a stab at things'.
Jonathan: I think we have got the hardware and software that allows us to get insight from data. But there is a risk: just having the ability to get insight isn't enough, because the insight itself needs to be relevant. So working out what insight you want, and where to invest your limited resources, is critical.
It all comes back to strategy – understanding what your business is and where that insight can add value – because otherwise the problem is that you are just hoping there is insight in the data you've got.
You're hoping that because you've got a whole lot of this data, you might be able to find something cool in it, because you've got a tool – but guess what, you might (a) not find anything cool, and (b) have no ability to take that insight and convert it into a return for your business.
I agree with Anthony's comment that this is not only about revenue return – there is also risk you can manage, which is itself a real insight – but I still think we have the risk of technologists. I always say 'a fool with a tool is still a fool'. You've got technologists who get excited about the technology, because guess what, you can find out that a whole lot of people have done X, Y, Z, and you go 'that's good' – and they'll keep doing X, Y, Z anyway, and you've just spent five million dollars working that out.
It's important that we step back and work out where the insight might be. It's like mining for minerals – guess what, you might mine in the wrong place and find nothing. So the risk we have with this big data, if we continue down the wrong paths, is that we'll waste a lot of money.
Tiberio: This is a very important point, and my view is that there are ways to prevent that from happening, and they have pretty much everything to do with starting from your business problem, or from your interpretation of a business problem.
Let's encode our business problems in some abstract way that the computer can understand, and then let's do the type of analysis in our 'big data' project that actually meets those requirements.
I'm a research scientist, and I've been working for more than ten years on technology that tries to do that sort of thing. Of course 'big data' is only booming now, but it's been around for a while – particle accelerators have been there for a long time, and we have been struggling with this problem for a long time. It's just that now everyone has access to 'big data': it's got a cool name and it's cheap to store.
But now we are making that link between business problems and technology. There's a name for it: we call it 'machine learning'.
I've been working on this for ten years. At NICTA we try to leverage that technology, that science – which is an important science in itself – for 'big data'. We have realised that it's really a key component of making the connection.
Jonathan: The question is around limited resources – getting the opportunity to spend on new capability for 'big data' at all – and keeping your eye on the ability to understand and drive it from a business perspective. So it's not just articulating the problem, but trying to establish the return on the investment.
As people who either consult or sell, most of us think about these problems. You have to be able to step back and understand what you're actually trying to achieve – how significant is this risk, where's the revenue opportunity – and quantify that.
How likely am I to find that return if I do some cool analysis? If Telstra is looking to understand whether there are trends in hot spots or black spots in its mobile network, what is my business return if I spend 10 million bucks?
Anthony: With machine learning we can find hypotheses that we couldn't have thought to test for ourselves. But I do agree with you about the need to link to strategy. I've been working in this business for five years now, and I haven't put a case study on the table for a client that doesn't show a significant return on doing things differently – sometimes of the order of 50:1, and at a minimum 3:1. That is, I could spend a third of my marketing dollars and still get the same result.
Then it comes back to us changing the way we run our business. If you can't get people to adapt and accept that the machine learning is saying to turn left instead of right, and you can't carry that right through to your coalface, then it doesn't matter how good the technology collecting this stuff is, how good the analytic technique is, or how well the analytics is linked to the strategy you're trying to implement.
That's why I come back to my first point: big data will be a dirty word before you know it.
John: An important issue in this area is static analysis versus real-time analysis. Looking at the spreadsheet and analysing what we've already stored is one thing, but real-time 'big data' analysis allows us to market better, or do this or that better.
Anthony: That's an excellent point. I think real-time is coming, but the only ones I see that are close to real-time at present would be the law enforcement agencies – some of the more advanced ones, which have lots of money and generally come from North America.
Then there's the credit card monitoring by the banks, but that would be less than 2 percent of what's going on. The rest of it is looking back on static data, in my experience. Very few organisations – and I smiled when you said back to 1996 – very few organisations are looking at stuff from even three or four years ago.
John: Five years is usually the minimum and most people aren’t interested beyond that.
Anthony: They're not interested, yes, I agree with you, but with some organisations that's going to let you down. Brambles, the pallet guys, have data back to 1964 available on their system, and transaction data back to '87. You could say that's not really relevant, but when the GFC hit us, you go back and look for the last catastrophic event, which was around 2001, and the one before that in '91, so you can really benefit from that sort of longevity of information. But real-time – it's coming, but it's not here yet.
CRN: Presumably one of the important technologies in terms of real time, and the general advancement of big data and analytics, is in-memory analytics.
Tiberio: Yes, that's a big trend. Of course you can have a big cluster of computers and have your data distributed entirely in memory; there are ways to leverage substantial amounts of data that way. But it's hard to do with extremely large amounts of data – otherwise everybody would be doing it.
So there is a sort of threshold there, and it differs for different questions, different business problems. For some, it would be worthwhile to invest in the infrastructure to have everything in memory. For others, you just have to live with what you have and explore at least a substantial part of it. So it really depends on your question; it's not something you can say there is one right answer for.
Andrew: It's definitely a growth area. We expect to have two-terabyte databases in memory, and certainly some of the products that we [Intel] have been bringing to market make that much more possible.
IT is now front and centre in any organisation. Even if you just look at marketing, think of the amount of spend moving from traditional above-the-line marketing into looking at social networks – how do we really talk to our customers, and what can we learn from them? IT has to enable that to happen for an organisation.
And the question of the desire to do it, but also the budget to go and do these things, is very real. There are so many things that an organisation could be doing in this space, but do they have any extra money to go and do it? A lot of the time they don't.
IT departments need to start looking at how to drive down the cost of the infrastructure they have at the moment. How old is that infrastructure? How much is it costing us? How much more efficiency can we get in a data centre if we can have huge levels of consolidation?
And if we have simpler infrastructure, how much more manageable is it? All the time we spend managing infrastructure – if we can get rid of some of those costs, it starts freeing up money and time for IT professionals to start working on these problems that organisations have.
Organisations will live and die by their ability to respond to what their customers are saying, and the speed of response is going to be critical.
It won’t just be in low latency financial environments where that will be important. It will be your average consumers out there shopping. They are standing in a shop and they’ll get online and say ‘where else can I get this product, and what sort of prices can I get?’ and you have to be able to respond extremely quickly to customers.
I think more and more you’ll see in-memory will be very important, and certainly outside of science and outside of financial services, it will really start to grow.
CRN: Robert, I’m curious as to what your customers are saying about big data.
Robert: Utilities is probably a space where we are actually playing with ‘big data’ in quite a substantial way. A term we quite often use is ‘data exhaust’. A lot of our clients have got a lot of machine-generated data that is not actually being captured. It’s just being blown out the window, and nobody is doing anything with it to actually gain market intelligence to try to improve their operations.
A utility organisation with a number of power stations around Australia obviously has market data on the sell price of electricity, and prices go up and down based on demand. Its power stations are also generating a vast amount of data from all the SCADA devices in their machinery, which can tell them a lot of useful information from a historical perspective as well as in real time – and they're doing this today, getting real visibility of what's going on.
But one profound thing that happened during this exercise is that the organisation realised that power can go to a negative price, so if the grid is over-producing and the data is telling them that, somebody has got to consume that power.
So between capturing all this data, and capturing or sourcing additional data over and above what you're generating, you've obviously got a lot of data to deal with, and you need the tools and techniques to actually use that data to get effective outcomes.
For example, projecting how much power might be used or required next week, or the week after, because of high temperatures or something like that.
This client, from accessing all this data, interrogating it and developing the reports, was able to make vast amounts of money by consuming off the grid rather than producing power. So with the ability to access all this 'big data' and the appropriate analytical tools to get that information and produce the right reports, the ROI on that project was absolutely phenomenal.
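[Editor's note: a toy sketch of the decision rule Robert describes – when the spot price drops below zero, or below the marginal cost of generation, it pays to consume off the grid rather than produce. The prices, cost and capacity figures below are invented; real dispatch decisions involve far more constraints.]

```python
# Hypothetical half-hourly spot prices in $/MWh; negative values mean the
# grid is over-producing and will effectively pay consumers to take load.
spot_prices = [42.0, 38.5, -12.0, -30.0, 5.0, 61.0]

MARGINAL_COST = 25.0   # assumed cost to generate one MWh
CAPACITY_MWH = 100.0   # assumed plant output (or load) per interval

def dispatch(price):
    """Return (action, earnings) for a single price interval."""
    if price >= MARGINAL_COST:
        return "generate", (price - MARGINAL_COST) * CAPACITY_MWH
    if price < 0:
        # Being paid to consume: earnings are the negative price times the load taken.
        return "consume", -price * CAPACITY_MWH
    return "idle", 0.0

total = 0.0
for p in spot_prices:
    action, earned = dispatch(p)
    total += earned
    print(f"price {p:7.1f} $/MWh -> {action:8s} earns ${earned:,.0f}")
print(f"total over the window: ${total:,.0f}")
```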
We are seeing a lot of that, and it's really about the data exhaust we're finding. Storage is cheap now, so capture that data and keep it. You might not have a business use for it right now, but having that historical record, and then using a lot of the emerging tools on the market to interrogate it against what you already have, is going to give you a lot of business benefit in the future. That's the most profound change we are finding.
Some of our customers are capturing 100, 200, even 500 gigabytes of data a day from machinery they've never captured data from before. That's a lot of data, and you need very clever tools to interrogate it and get what you need. So the tools are the next step.
Scott: The interesting question for me is what gave them the impetus or insight to actually start looking at the data tools?
Robert: Large power stations have sensors and transducers. SCADA software is currently very expensive to deploy, and it's more of a control function – controlling the SCADA devices – than something that monitors and provides analytics. So capturing that data and storing it gave them the capability to actually look and see what is going on.
Scott: They were capturing it, but what made them think to start looking at it? Previously it was regarded as exhaust – not actually that valuable, or low value density – and they weren't really looking at it, so what made them start? Are they the first ones you're aware of that have done it?
Robert: With this particular product yes. Splunk is the technology that was actually utilised in this case. And it was more about stumbling across the information that actually changed their business. There wasn’t a business directive involved in the project.
They just wanted to capture the data, and they have lots of analytical people within their organisation. After playing with it, they stumbled across a very important piece of information the business was missing out on, and the outcome was very positive.
Scott: It would be interesting to see whether you could control some of that stumbling shall we say. That seems to be the thing, people stumble across these insights, and that drives them to invest further – and then they start to wonder ‘what if’, but the tricky thing is how do you get them to deliberately, intelligently, architecturally start to stumble?
It's interesting when you talk about big data – loading the stuff up and doing it yourself. I think one of the issues in the past is that end users have relied on tools like Excel to make their own interpretation, whereas in the old days they needed to get these big reports out of the data centres, which were sent out one by one, and now they say 'well, I can do something in Excel'.
But Excel has a limited view of the data, and it would be good to be able to say to them 'you can't do that anymore, but here is something else'. They still want to be able to do it themselves.
Peter: Yes – why should I change the way I get this information, and why should I change the way I solve problems?
Scott: Within the next couple of years, 25 percent of marketing managers will have transferred to digital, at the expense of some of the traditional media. Now if they're doing that, they're going to want to make sure they're getting a return on investment, and that will be an impetus for asking how they can get more insights.
Chris: There are some examples of that, I think some of the simpler things with Google Analytics for your website - it sounds trivial, but it’s actually a non-trivial problem if you’re a big company and you’ve got a very large number of hits on your website.
You can get some very good insights out of that, and it's not done in your own shop. But I think it still comes back to this issue: the guy who runs the business is the one who wants to be able to make the decisions, particularly if he has to pick the strategy.
Peter: Marketing people are going to be more accountable, because you can in fact measure it. If someone said 'okay, I'll put that over here, I want a lot of people seeing it', you couldn't tell whether they did or not, whereas now you know whether they've looked at the page, you know whether they've flicked through, and there's a lot more information you can analyse. So there's a level of accountability here that wasn't there before.
Scott: Yes, it also makes the brand more accountable, right, because you have a more immediate link with your consumer.
Jonathan: I think you're seeing an industry emerge. Look at the SEO industry, which reverse-engineers the 'big data' behind search results and then analyses it to place keywords, as one example. Look at online shopping, or parts of the online gaming industry, which is very sophisticated online. You just need to follow where the money is – and dare I say it's probably in porn.
I'm sure it's in online gaming, where people are releasing new games very quickly, looking at the 'swarming' effects of who's actually playing the games, working up ROIs, and shutting down those that don't meet the criteria. So I think there's a lot of insight.
I think your question around how to control the stumbling is really interesting. It's really back to business basics: you've got revenue and costs, and if you understand you've got a cost issue or a revenue opportunity, you can then go and say 'maybe I should explore around here' and 'maybe I can get the right people with the right tools', because the tools are there.
John: One of the interesting things is that because you've got the digital capability, you also have a lot of digital noise – people have been hired to go through and do the click-throughs – so you've got to have intelligent systems that can get beyond that, to know whether your site is really popular or whether someone is just telling you it's popular. There's a lot of that starting to happen more and more, and you won't really be able to understand it unless you've got the right tools to handle it.
CRN: So do we feel that humans have essentially become stupid in this world?
Tiberio: No, humans are essential – you have to turn the power on. I agree with Peter: basically it's a lot about the intelligence of the tools. I agree that the tools are there, but the tools we'll see in five years' time will look very different.
It's essential that we keep inventing tools, simply because there's a lot of value in improving your tools and the system. So there are two ends of the spectrum, so to speak: putting energy into improving the tools themselves, and working out how you actually make the tools usable by the business – which is this point about helping the customer.
John: The other thing about stumbling, with each of the market segments what we’ve got to do is find the first movers and help them get a competitive advantage, and as soon as they get a competitive advantage, the others will move pretty quickly.
Scott: Everyone is playing a waiting game: when do we get into this? We know we need to invest, but we're just not sure where that value is going to lie. But the risk is that if you wait too long, you will suffer from others using this to their advantage – and you end up being the client.
I think the classic example is that little app that Amazon offers on your phone, which gives you the capability to walk out of the store, because they’ll offer you that product.
So you use their app to do a search on the product, and they will guarantee you 5 percent cheaper or whatever it is if you walk out of the store and buy from them. Now what are they doing with that information? That’s big data in real-time.
Jonathan: With big data the insight is slightly obscure, because you're trying to find it, or stumble across it. On the other hand – and this gets back to my ROI question – there is data and information sitting in organisations that people can't even access. It's not obscure, it's right there; there's IP that's just not in an accessible format, and it's so obvious.
So my view is that, again, you need to understand where the assets are in your organisation. We roll out a whole lot of Oracle enterprise content management applications where the IP is sitting in the organisation and people can't even access it – never mind 'big data', where you're not even sure where the IP might be. Technologists do get excited, and in some large organisations – utilities, telcos, banks – there is real value, because there's so much data and so many transactions that the return is multiplied many times over.
But certainly in ROI terms, there are sometimes much easier places to start: just collecting the internal assets, putting them in a repository and getting a brain across them to identify what's there.
CRN: Laura, your company DataFlex has put a lot of computing power into government agencies over the years. We know from research that governments have been fairly receptive to the cloud concept, but are you finding in the government sector that they are stumbling with regard to big data as well?
Laura: Well I’m rather enjoying this concept of stumbling, because I think we are an enabler of stumbling. As an IT reseller, we are the conduit between the emerging technologies and the business requirements of these organisations.
Certainly, talking about federal government and all the things we've discussed today: we've spent years putting infrastructure and storage in place, storage has become cheaper, and they've got the data. Now the shift for us as an IT reseller is from responding to a client's hardware and software requirements to becoming a thought leader.
That means helping and assisting them to address what they're trying to achieve with these major data silos, and to get benefit and worth out of the data they already have on premise.
So it’s interesting the discussion around the so-called dirty words like ‘cloud’ and ‘big data’.
Peter you mentioned ‘tablets’ as among the ‘dirty words’ for you. One of ours is probably ‘mobility’.
It’s an opportunity for an organisation like DataFlex, because we want to help organisations, especially in the Federal Government space, to go from what exists now, to what they could potentially be doing in the future, and I like this term ‘stumbling’, because that is it, we are the conduits from one phase to another.
Part 2
CRN: Do we feel that the conversation about big data is still pretty much limited to CIOs, or is it seeping beyond them, with CEOs and other decision makers catching on to the dirty words and calling for action?
Chris: I think you'd have to be on a desert island not to have heard the term. But I still think people are wondering how to deal with it and work out what to do. We are sitting around this table saying we know what the tool set is, and that's good – undoubtedly we do – but the CEO of a small business down the road doesn't, and even if they did know what the tool set is, they couldn't harness it. I still think it's a really niche thing to do.
That’s why there are still people in the research sector who are thinking about it.
Rob: I 100 percent agree with that. It's all about making the data relevant. You can capture so much data, but it's about making it relevant: marketing departments are going to want certain visibility, certain pieces of information, and linkages between certain data sets that give them reporting or insight they're not currently getting.
But you've also got other areas within the business that are going to need to see different pieces of data mining and analytics – for example, security departments within organisations, which need to see different pieces of converged data to give them the information they need to make their decisions. I think that's where it's actually sitting right now.
CIOs are hearing enough, and they're starting to understand that there are business opportunities in utilising all this data, but it's now time to get down into the real specialist areas within businesses, where they're actually going to be able to make good business use of it. That's where I think it's heading right now.
CRN: Chris, you touched on this point via email – the distinction between analytics and reporting. It speaks to the point of having massive amounts of data and needing to do something with it, and then the difference between extracting information and actually doing something with it.
Chris: That's right, and a couple of people have touched on the subject today. We were talking about getting the report from the data centre. We do get lots of information thrown at us, and one of the things that has come out from people around the table today is the opportunity to take the knowledge or information we've already got – sometimes it's already digested information, not even big data – and use it.
I just wonder sometimes if we have got to a point in our businesses where we are running them so lean, and pressing people so tightly, that they don't even have time. I keep thinking about the accounts department in our office: they're constantly dealing with invoicing and purchase orders – do they have time to give me the business intelligence that I need?
Peter: That's the point. One of the issues right now is that IT departments have been asked to cut costs – cut costs in the IT department, not use IT to cut costs in the business – so I think there's going to be a bit of resistance there.
It's actually a strange thing, because IT was brought in to save the business money, and now it's seen as a cost centre in itself. That's going to create some difficulties in the IT department actually providing funding. So when you mentioned earlier who should be approached, I think it's the CEO and CFO, not the IT department, because they're still going to be defending having to cut their costs. If I look at it from a purist perspective, information technology has until now been largely about how you process things efficiently – the processing.
Now it's analysis, and I think it's the era of analysing. Things have been processed to death: you've got all sorts of business applications, you can do it all electronically, we've done all that. Now it's time to start analysing, and I think it's going to be the era of analysis.
CRN: Do you think a number of CEOs could be justified in turning around and saying this sounds a bit like Y2K, like you’ve got to do it, but we don’t really know why, but you need to invest, otherwise you’re going to fall behind?
Peter: I think, as John said, if you get a couple of organisations who say 'I've done something here and it's saved me money or got me more sales', then you'll actually start to see a domino effect.
Jonathan: I totally agree, but it will be people following the money. So you'll get a hedge fund that has the ability to trade and arbitrage in XYZ by analysing all this data and making money, and people will go 'well, that's exciting' – or a utility going 'I've found this cool thing' – or a CIO getting a better real-time understanding of risks in whatever area. It's not only revenue; people will do this because people will make money.
Google Analytics is a great example of those scenarios, and I think that's what is happening in most of our businesses, at different levels of maturity and different scale. We are analysing and getting insight, and the businesses that are faster and better at it will be more successful, and there will be case studies that people start to follow.
Tiberio: In a way that's precisely what has happened with the internet properties. Some large companies were extremely innovative – Google, Facebook – and they just broke a few paradigms in terms of which technology they should be using and how they should be using it, and guess what happened: they demonstrated value.
As soon as they demonstrated value companies tried to come along and play the same game. There is no reason why this is not going to happen in other industries.
John: I spent a number of years in the military, and the military has generally been a leader in the evolution of technology – go back to the Prussian wars, artillery and all that sort of thing. The military has been using big data for battlefield analytics for a long time. It started with humans and data, analysts and all the rest of it, and you can throw the FBI and CIA in there too.
But think about the complexity of warfare now, and how they have been using big data to bring human intelligence, signals intelligence and all the other intelligence fields into a central command area to determine what the targets are, and to process that very quickly, with all of that data coming in – a real-time, unstructured data problem. They're doing that now.
How did they catch Bin Laden? You can go on with lots of other examples. Given the speed of warfare, we're talking real time; we're not talking about holding that data for a few weeks, processing it at a later date and then doing something with it. I think a lot of those technologies are now starting to come out of there into commercial segments and problems as well.
CRN: It’s like looking back to the origins of the internet isn’t it?
John: That's right, it is. Putting my telco hat on here: big data needs fibre. Our ability to have fibre around the place is important going forward, although a lot of people go the other way and say 'well, you don't need fibre because we're doing okay'. But you're starting to push big data around, and the way you're pushing it, it's going to go to the cloud rather than stay in-house, and you'll need fibre for that.
Chris: That's a really interesting point. You obviously want to be able to use that compute today. We've got our own data centre, and the big challenge for us is not the ability to do that from a technological perspective, it's the fact that I've got two petabytes of data and I need a fast link to the cloud to do the processing.
Nobody can sell me that at a competitive rate, or at a rate that would make it attractive. So we have actually done the analysis on this, and we worked out that nothing technical is going to stop us using the cloud as an adjunct to our own hardware – it comes down to the price of the comms.
Scott: It’s ironic really, because you’ve taken a new cost effective processing tool and made it cost ineffective.
Chris: Exactly, and I think the whole question of communications – how we shift the data around – is actually the current bottleneck.
Scott: Another form of data where it's still early days is video analytics. There are some leaders in that space who can do very effective analytics in crowded situations, and the potential there is enormous, because now you're introducing another sense, in human terms, into the whole big data conversation, and the enhanced context that could give you around what you're doing with all those boxes is amazing.
But how are you going to shift that video around? If it works well it’s going to be working well at a low resolution, because it’s going to be more easily adoptable, but it’s still bigger than a text file.
Stuart: I think it's almost the opposite: big data would solve that problem, because it would stop the data getting big in the first place. A lot of the stuff we're talking about here is how you move to these micro-transactions and location-based services.
That's not big data, right – it's very small amounts of data – but the problem is that we're storing it, so we're doing post-processing. As soon as you turn it into real time, you will find that data has a value over time: you will put a value on it, and you'll either use it or get rid of it, but you won't keep storing it, because a lot of the stuff we're talking about only has value for a very small period of time. A lot of the people I speak to around the region aren't going to store all the data.
They just can’t afford to do that, even though disk is cheap, it’s still not effective for them to store it, and so it’s really saying, well do I capture everyone who came to my website, or do I only capture the people who made a transaction? There’s information you can build around both of those, but it’s just working out which has actually got value to you.
Chris: It may go beyond just extracting or filtering the data. It may get to the point where you really need local processing to extract some intelligence from that data – the meaning of the information you were talking about.
What might you do instead of transmitting video to a central place and having processing there? You build a network camera that understands that’s a person, that’s a briefcase, the five points of facial recognition, you transmit the face.
Tiberio: Just to support your point, it's much cheaper to move computation around than to move data around – that's something we've learned with the internet, again. The internet is really a good teacher for us.
I can see this becoming a real incentive for the processing to be done locally, simply for the very reason you explained: it's just too expensive otherwise. And to Stuart's point, I totally agree.
The more data you have, the more you learn about which pieces of your data are the really relevant ones, and then you want to revise your protocol so you sample and acquire the data that is informative. There is science to guide you on how to do those things.
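[Editor's note: a schematic Python sketch of the 'move the computation, not the data' idea Chris and Tiberio are describing – process each frame at the camera and transmit only a small feature record instead of the raw image. The frame size and face descriptor below are invented placeholders, not a real vision pipeline.]

```python
import json

FRAME_BYTES = 2_000_000   # assumed size of one raw HD frame

def detect_faces(frame_id):
    """Placeholder for on-camera analytics: a real system would run a face
    detector here. Returns one small descriptor per detected face."""
    return [{"frame": frame_id, "landmarks": [0.1, 0.4, 0.7, 0.2, 0.9]}]

def bytes_shipped(num_frames):
    """Compare bytes sent upstream: raw video versus extracted features."""
    raw = num_frames * FRAME_BYTES
    features = 0
    for frame_id in range(num_frames):
        for face in detect_faces(frame_id):
            features += len(json.dumps(face).encode())
    return raw, features

raw, features = bytes_shipped(num_frames=1_000)
print(f"raw video sent:     {raw:>13,} bytes")
print(f"features only sent: {features:>13,} bytes")
```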
Jonathan: But there's still storage, so you can still store it – if you think the data might have value in the future, you can store it locally without shifting or moving it. So there is that lifecycle management of data, where you can keep it on really low-cost kit, and you can get near-infinite storage at really low cost if you think it might one day have value.
Stuart: We were talking before about internet data. When you capture it, you don’t understand why you captured it; it’s just unstructured data. It’s about trying to work out that value.
The issue for most people I talk to about this is the real-time nature and what they can then do with it, because when they look at their organisation, even if they could make a decision, they can't actually respond in real time.
So it calls the whole organisation into question. We are dealing with an energy company that is looking to do coal seam gas. Instead of the old way of running a minerals business – where you'd go and build a really big mine and mine there for ten to fifteen years – they're going to build 1,500 coal seam gas mines across the whole outback of Australia.
What that allows them to do is basically produce gas, by igniting the coal, when they require it. They've got a real-time way of producing fuel to produce real-time electricity. So that's how the nature of the business is changing. When you try to compete against that model it's very difficult, so it's changing the way customers look at their business and try to work out how they can compete in real time.
It comes back to the way people make decisions today, given that it’s based on investments they made in the past. How do you change that whole nature? As you see new entrants come into the market, they’re not using the same technology, not using the thing in the same way, and so this is what’s causing the issue.
If you've gone and invested millions of dollars in data warehousing – if you bought Teradata equipment, you have invested huge amounts of money – why are you going to move off that? And at what point have you actually got a return on that investment?
These are the issues where a lot of people are saying ‘That’s why I’m not investing now, because I don’t know what I’m going to need to invest in, and whether my business is capable of running real time’.
CRN: How important do you think open source software and open source approaches are for solving big data problems?
Tiberio: In my opinion it's very important, for many reasons, but let me focus on one of them. Many creative people out there really subscribe to the open source model – it's a very simple principle – so you can leverage a legion of extremely smart, intelligent, creative individuals who will actually create this infrastructure.
But if you close your eyes to open source, you are not leveraging that. It’s playing an important role and it’s not the whole solution, but I think it plays a critical role. Talking to Scott, he was making the point that it’s very important for big corporations as well. So the answer is definitely, extremely important, not a single solution, but extremely important.
CRN: Chris, something you were talking about earlier that I thought was interesting is this issue of knowing too much – of discovery opening up whole new avenues of information that could potentially expose organisations to legal threats around issues like privacy.
Chris: Yes. Look at what we do as our core business – selling stock information – and at what the hedge traders are doing. What's actually starting to happen now is that the regulators are looking at the data, and the first thing they do is say, 'I wonder what the exchanges are doing – let's make sure the exchanges aren't being naughty and that they're doing their transactions as cheaply as possible'.
But the question that a few regulators have come up with for example, is to say ‘well shouldn’t we start regulating hedge funds?’ and there’s a whole set of questions about how do you use this data the right way. How much data do you reveal and to whom, and what are they going to do with it.
So the minute something's there, someone's going to find a use for it, and it may not always be a pleasant one. Look at what happened with Google and the Google Maps thing – what's come out in the press is quite appalling, really.
There’s a whole element of should you publish what you know, should you use what you know necessarily. If it’s available, does that just make it right to do something with it?
CRN: So are we talking about opportunities only in terms of large organisations as customers? Or is there an opportunity for mid-sized companies and SMBs to harness big data – or is it a bit too early, too immature and maybe too expensive?
Stuart: I think it's actually easier for the mid-sized players to adopt the technology. We're seeing mid-sized and smaller companies adopt it, purely because they don't have huge investments in other areas, and they're more able and capable of responding in real time.
When you get into large organisations where supply chains become fixed, and decisions are made nine months out, having real time data doesn’t change anything. You’re still budgeted and you’re going to produce this many things no matter what. So it becomes a very different market when you look at that type of thing.
CRN: Are they actually looking at big data solutions?
Andrew: They are looking at it as a service – for someone to provide it to them as a service. I think analytics as a service will keep growing.
Chris: The opportunity’s there. The smaller people come to us, because they’re the ones who are nimble, and not only that, often times, because they’re a small firm, they’ve got people who are dead keen on making a difference to the company and they can.
But I still think the tool set's not ready for them yet, and maybe the answer is that it comes as a service. Maybe the other side of it – and I went to a talk recently by the guy who founded Freelancer – is where you say 'okay, I've got this big data problem, here's the data, I want this answer', and Mr IT in Bangalore says 'I can do that for you, just give me $500'. The average cost quoted there for generating an iPhone app was $650.
Now maybe they're absolutely rubbish as iPhone applications, but the fact is that you can get one registered in the app store for that, and that's where you might find that analytics becomes outsourced – and outsourced cheaply.
Tiberio: The main barrier really is privacy. For some corporations, you just cannot do that. So I see it as one of the main barriers to success. It’s going to grow, Google has proven that it’s definitely a new revolution as a service, an analytical service, but for some critical industries, it seems not yet to be possible, because people are still too conservative, and they’re concerned about privacy problems.
Jonathan: I think it's different by industry. If you look at the tech industry, I've seen billionaires with 20 or 30 people in their organisation run rings around most large organisations, because they've got really tech-savvy, smart guys who understand some of this stuff. And whilst I think Stuart is right, you will find some of the big, cumbersome organisations just have big investments in the older data applications.
The question is capability. I'm seeing so much disruption in every industry that you don't need a billion dollars to be able to disrupt one.
The technology is giving smaller players who are nimble such a competitive advantage, and often you don't need the invested capital that's costing you so much money to be able to disrupt and play.
In financial services, the smart hedge fund guys can leverage and play without that much capital these days, especially if they can do it enough times, the data is right, and they're getting it right and hitting more often than anyone else.
So I do think that, in the right industry – and I think we're all making this point – technology is becoming the critical success factor. If you can leverage it, make the right plays and the right bets, stay one step ahead, and avoid being hamstrung by your old technology... I see the world changing so quickly. Look at the top organisations today versus the top ones 15 years ago.
We have to step back and say 'where is the asset in our organisation?' If we can work out where our asset is – and if it happens to be in the data, or the metadata, or the interaction between them – and we can identify it, leverage it, play, and make money out of it, we'll be ahead of everyone else.
But I do think we have a risk where you have someone in an organisation going 'big data is very cool, I want to spend five million dollars looking for something', as opposed to someone saying 'I think I've got a real asset here, and because I'm a smart guy, guess what – there might be a different way of drilling for coal seam gas, or running a hedge fund, or whatever, and maybe I can look at the data differently'. That's where the money is, and people will make money out of it.
Rob: With online retailing, you've actually seen a massive transformation recently in Australia, with online retail organisations challenging the large retailers, who are trying to catch up – and they're trying hard. Myer, David Jones, Kmart, whoever, are all trying to keep up with Catch of the Day and so on.
They're all acquiring new technology that is transforming their knowledge of how the market actually operates, because it's a different way of operating. There are no bricks and mortar, and all the data you collect in an online business is pure.
Traditional bricks-and-mortar businesses with an online presence have to start integrating the physical data with the online data, and that becomes quite complex.
The online retailers are just scooping it up, they really are, and they're growing by hundreds of millions of dollars a year, very quickly. They are adopting new technology and getting better insight and intelligence into what they're doing, and that's why they're achieving the outcomes they are. They're already doing it.
CRN: Tiberio, presumably this group of smaller, smarter companies is largely the target market for the spinoff you were talking about earlier. Can you tell us more about the company and its markets?
Tiberio: NICTA is spinning out a company focused on big data analytics, called Enviata, which will basically harness data to grow businesses. We are focusing a large amount of our energy on sectors such as financial services and retail, and we are working with some of the largest Australian companies.
So what is our vision? It is to be in just about everything, in all of the data, in a form that is scalable. It's based on the principles of the internet properties – large clusters of computers and that kind of infrastructure, among other technologies – and the algorithmic pillars that sit on that infrastructure are basically the machine learning running behind it.
In other words the model is predictive in its ability to really automate the process of finding which patterns are relevant for which business problems, and the patterns will be different for different business problems, and the algorithms are going to decide which patterns are relevant.
Contrary to what Jonathan was saying, I think intuition is to a great extent overrated. It is very important to get you started, but in my experience as a researcher, what you would guess are the most informative aspects of your data often simply are not, and the most predictive features end up being something completely unintuitive. How can you come up with that if you give too much weight to your prior knowledge, your domain expertise?
You need domain expertise – that's fundamental – but you also need technology that properly balances prior knowledge with exploration of new realities, new things that the human mind just cannot grasp, and that's what we call machine learning. It adds so much value, and it has revolutionised the way people search on the internet – say, for example, the ranking of web pages.
That's why video technology, video analytics, is actually working so well. The computer can recognise a human face better than a human can today. How do you think that is done? It's not based on rules – it's not a rule-based system of 'if this pixel is red and that one is blue' and so on. It's not like that.
It's a statistical approach on a huge scale, where you ingest huge numbers of mug-shot pictures of people, label each one – this is this person, this is that person – and you train the system to memorise, so to speak, all that stuff.
Precisely because you've got a lot of data, you can do that. That's the whole thing: you use the data you have to build a very sophisticated technology that's actually doing something very simple. But you need to be able to leverage the idiosyncrasies of that data, because big data is different from small data.
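[Editor's note: to make the 'learn from labelled examples, not hand-written rules' point concrete, here is a minimal, purely illustrative sketch using scikit-learn on synthetic feature vectors. It is not a face recogniser; the class labels and feature distributions are invented, and the only point is that the model infers the decision boundary from labelled data.]

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for image feature vectors: two classes drawn from
# slightly different distributions. A real system would extract these
# features from labelled mug-shot images.
features_a = rng.normal(loc=0.0, scale=1.0, size=(500, 32))
features_b = rng.normal(loc=0.8, scale=1.0, size=(500, 32))
X = np.vstack([features_a, features_b])
y = np.array([0] * 500 + [1] * 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# No hand-written pixel rules: the classifier is trained on the labelled examples.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```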
Big data is a mixture of different collections. You have online data and offline data; you have data that can properly be analysed by batch processing, and data that is just streaming through. You have free text, you have transactional data – lots of everything. So if you build your technology around strong assumptions for one particular data silo, so to speak, you're going to wash out the information from the others.
You need techniques that can jointly leverage all of these sources, without washing out or fuzzing the information from any of them. That's largely the type of technology we are using at Enviata. NICTA has the largest machine learning group in the Southern Hemisphere, and one of the largest in the world.
I would say probably only Google or Microsoft are ahead of us. We are several hundred people, including scientists, research programmers and top experts in the field of machine learning. We really want to get machine learning out there to make a difference for big data.
Andrew: So take it, as Tiberio said, to the next level: once you get all this data, how do you look at it, what do you see? To do what you're talking about with lots of tables of pros and cons, it's going to take even longer to work out how to put it together, and I think graphical display – graph databases – is the next evolution.
I saw an example the other day which blew me away. With some smart algorithms and some coding, you were looking at how two people were interrelated, based on the phone numbers they had, and therefore the phone calls – and then you added degrees of separation, so this was pumped up to five or six degrees of separation.
There were 1.5 million phone numbers in there, with all the phone records behind them – a real set of data out of America. Then they picked two phone numbers – it could be yours, Stuart, it could be yours, Tiberio – and they hit the button, with some Intel compute power behind it, and within 1.5 seconds it traversed the 1.5 million phone records and produced a spider-web graph of how those two people were interrelated, based on the phone calls they'd made with everyone else, to five degrees of separation.
1.5 seconds, bang! You can imagine a law enforcement agency using that, and then you take that example and transpose it across other market segments where it makes sense. But the power of it was not only the compute and the algorithms and all the rest of it; the power was actually the visualisation, which I think is the next step.
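[Editor's note: a stripped-down sketch of the graph query Andrew describes – treat phone numbers as nodes, calls as edges, and run a breadth-first search to find how two numbers are connected. The tiny call list is invented; at the scale he mentions you would use a graph database and parallel traversal, but the idea is the same.]

```python
from collections import defaultdict, deque

# Hypothetical call records: (caller, callee) pairs.
calls = [
    ("0411-111", "0411-222"),
    ("0411-222", "0411-333"),
    ("0411-333", "0411-444"),
    ("0411-222", "0411-555"),
    ("0411-555", "0411-444"),
]

# Build an undirected adjacency list: a call links both parties.
graph = defaultdict(set)
for a, b in calls:
    graph[a].add(b)
    graph[b].add(a)

def connection_path(start, target):
    """Breadth-first search: shortest chain of numbers linking start to
    target, or None if they are not connected at all."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in graph[path[-1]] - seen:
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None

# e.g. ['0411-111', '0411-222', '0411-333', '0411-444'] - three hops apart
print(connection_path("0411-111", "0411-444"))
```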
Tiberio: We work with some of Australia's largest organisations – in financial services, for example – and the problem is how they best make a decent profit. It's really behaviour-driven, right? So we've got transaction logs; we've got so much data that hasn't yet been used, as Robert mentioned previously.
If you are really clever with the way you do your test, control and treatment groups in your marketing design, you can elicit the right information to be able to really customise. Again, this is all about the internet properties: we know the business model of a company like Google or Facebook is largely reliant on advertising, and that technology can be transferred to normal businesses to a large extent, and you can really improve marketing.
Or think about drugs – I'm talking about non-illicit drugs, drugs for the ageing and the ill. Say you are in Sydney today and you are prescribed certain drugs, then you go to Perth tomorrow for business, and when you get there you're sick again, because of the flight or whatever, and you go to another doctor who diagnoses you with a certain thing and wants to give you other drugs.
Now those two drugs put together could form a deadly cocktail, and the doctor can't rely on you telling him what you've taken the day before either. But what if he could go into a network and see what drugs you are already taking – and more than just what drugs have been taken?
He goes into a database that tells him how the new drug will interact with the others, which goes beyond human intervention because it's there straight away. That is a real, live example happening in the US today, where the drug enforcement agency, with smart technology across that system of doctors and hospitals, has the ability to work out what drugs you've used, what you can use, and what the impact would be.
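[Editor's note: a minimal sketch of the lookup Tiberio describes – a shared record of what a patient is already taking, checked against a pairwise interaction table before a new prescription goes through. The drug names, interactions and patient record here are invented placeholders, not medical data.]

```python
# Hypothetical interaction table: unordered pairs known to be dangerous together.
DANGEROUS_PAIRS = {
    frozenset({"drug_a", "drug_b"}),
    frozenset({"drug_c", "drug_d"}),
}

# Hypothetical shared prescription history, keyed by patient ID.
patient_history = {
    "patient_001": ["drug_a", "drug_e"],   # e.g. prescribed in Sydney yesterday
}

def check_new_prescription(patient_id, new_drug):
    """Return any already-prescribed drugs that clash with new_drug."""
    current = patient_history.get(patient_id, [])
    return [d for d in current if frozenset({d, new_drug}) in DANGEROUS_PAIRS]

# The Perth doctor proposes drug_b; the Sydney prescription shows up immediately.
conflicts = check_new_prescription("patient_001", "drug_b")
if conflicts:
    print("warning: interacts with", conflicts)
else:
    print("no known interactions")
```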