CRN roundtable: the lowdown on big data

I see that this is going to really be an incentive for actually the processing being done locally, simply because of the very reason you explained, it’s just really too expensive. To Stuart’s point, I totally agree.

I think the more data you have, the more you learn about which pieces of your data are the really relevant pieces of data, and then you want to revise your protocol to really sample and acquire the data that is informative. There is science to actually guide you on how to do those things.

Jonathan: But there’s still storage, so you can still store, and if you ever think that data has value in the future, you can still store it locally without shifting or moving it. So there is that lifecycle management of data, where you can just store it on really low-cost stuff, and you can get infinite storage at really low cost, if you one day think it might have value back.

Stuart: We were talking before about internet data. When you capture it, you don’t understand why you captured it; it’s just unstructured data. It’s about trying to work out that value. 

The issue for most people I talk to about this is the real-time nature of it and what they can then do, because when they look at their organisation, even if they could make a decision, they can’t actually respond in real time.

So it then questions the whole organisation. We are dealing with an energy company that is looking to do coal seam gas. Instead of the old way of running a minerals business, where you used to go and build a really big mine and you mined somewhere there for ten to fifteen years, now they’re going to build 1500 coal seam gas mines across the whole outback of Australia.

What that allows them to do is basically produce gas by igniting the coal when they require it. They’ve got a real-time way of producing fuel, to produce real-time electricity. So these are the ways that nature’s changing. When you try to compete against that model it’s very difficult, so it’s changing the way that customers are looking at their business, and trying to work out how I can compete in real time.

It comes back to the way people make decisions today, given that it’s based on investments they made in the past. How do you change that whole nature? As you see new entrants come into the market, they’re not using the same technology, not using the thing in the same way, and so this is what’s causing the issue. 

If you’ve gone and invested millions of dollars in data warehousing, and if you bought Teradata stuff, you have invested huge amounts of money, so why are you going to move off that? And at what point have you basically got a return on that investment?

These are the issues where a lot of people are saying ‘That’s why I’m not investing now, because I don’t know what I’m going to need to invest in, and whether my business is capable of running real time’.

CRN: How important do you think open source software and open source approaches are for solving big data problems?

Tiberio: It’s very important, in my opinion, for many reasons, but let me focus on one of them. Many creative people out there really subscribe to the open source philosophy. It’s a very simple principle. So you can actually leverage a legion of extremely smart, creative individuals who will actually create this infrastructure.

But if you close your eyes to open source, you are not leveraging that.  It’s playing an important role and it’s not the whole solution, but I think it plays a critical role. Talking to Scott, he was making the point that it’s very important for big corporations as well. So the answer is definitely, extremely important, not a single solution, but extremely important.

CRN: Chris, something you were talking about earlier that I thought was interesting is this issue of knowing too much, or discovering whole new avenues of information which could potentially expose organisations to legal threats around issues like privacy.

Chris: Yes. Looking at what we do as our core business, selling stock information, and you look at what the hedge traders are doing, what’s actually starting to happen now is that the regulatory sector is looking at the data. The first thing they do is say ‘I wonder what the exchanges are doing, let’s make sure the exchanges aren’t being naughty and that they’re doing their transactions as cheaply as possible’.

But the question that a few regulators have come up with for example, is to say ‘well shouldn’t we start regulating hedge funds?’ and there’s a whole set of questions about how do you use this data the right way. How much data do you reveal and to whom, and what are they going to do with it. 

So the minute something’s there, someone’s going to find a use for it.  It may not always be a pleasant one. Look at what happened with Google and the Google maps thing. What’s come out in the press is really quite appalling really. 

There’s a whole element of should you publish what you know, should you use what you know necessarily. If it’s available, does that just make it right to do something with it?

CRN: So are we talking about opportunities in terms of customers that are large organisations only? Or is there an opportunity for mid-size companies or SMBs to harness big data, or is it a bit too early, too immature and maybe too expensive?

Stuart: I think it’s actually easier for the mid-sized people to adopt technology. We’re seeing the smaller companies, mid-sized and smaller companies do adopt the technology, purely because they don’t have huge investments in other areas, and they’re more able and capable of responding in real time. 

When you get into large organisations where supply chains become fixed, and decisions are made nine months out, having real time data doesn’t change anything.  You’re still budgeted and you’re going to produce this many things no matter what. So it becomes a very different market when you look at that type of thing.

CRN: Are they actually looking at big data solutions?

Andrew: They are looking at it as a service, for someone to provide that as a service to them. I think analytics is a service that will be growing.

Chris: The opportunity’s there. The smaller people come to us, because they’re the ones who are nimble, and not only that, often times, because they’re a small firm, they’ve got people who are dead keen on making a difference to the company and they can. 

But I still think that the tool set’s not ready for them yet, and maybe the answer is that it comes up as a service. Maybe the other side of it is, and I went to this talk recently by the guy who founded Freelancer, maybe this is where you say ‘okay I’ve got this Big Data problem, here’s the data, I want this answer’ and Mr IT in Bangalore says ‘I can do that for you here, just give me $500’. The average cost for the generation of an iPhone app was $650.

Now maybe they’re absolutely rubbish as an iPhone application, but the fact is that you can get one registered in the app store for that, and that’s where you might find that analytics becomes outsourced, and becomes outsourced cheaply.

Tiberio: The main barrier really is privacy.  For some corporations, you just cannot do that.  So I see it as one of the main barriers to success. It’s going to grow, Google has proven that it’s definitely a new revolution as a service, an analytical service, but for some critical industries, it seems not yet to be possible, because people are still too conservative, and they’re concerned about privacy problems.

Jonathan: I think that it’s different by industry. If you look at the tech industry, I’ve seen billionaires who have 20 or 30 people in their organisation run rings around most large organisations, because they’ve got really tech savvy smart guys who understand some of this stuff, and whilst I think Stuart is right, you might find some of the big cumbersome organisations that just have big investments in the older data applications.

The question is the ability. I’m just seeing so much disruption in every industry place, that you don’t need a billion dollars to be able to disrupt an industry.

The technology is making smaller players who are nimble, giving them such competitive advantage, and often you don’t need the capital invested that’s costing you so much money, to be able to disrupt and play.

In financial services the smart hedge fund guys can leverage and play without that much capital these days, and especially if they can do it enough times, and the data is right and they’re getting the right, and they’re hitting more often than anyone else.

So I do think that in the right industry, and I think we’re making the point, technology is becoming the critical success factor in so many of these industries that if we can leverage it and make the right plays and the right bets, and if you are one step ahead, and often if you’re not hamstrung by your old technology ---- but I see the world changing so quickly.  Look at the top organisations today, versus the top 15 years ago. 

We have to step back and say ‘where is our asset in our organisation?’ If we can work out where our asset is, and if it happens to be in the data or the meta data or the interaction between the meta data and you can identify it and leverage off it, and play, and make money out of it, you’ll be ahead of everyone else.

But I do think we have a risk where you have someone in the organisation going ‘big data is very cool, I want to spend five million dollars looking for something’. As opposed to someone saying ‘I think I’ve got some real asset here, and because I’m a smart guy, and guess what, there might be a different way of drilling for coal seam gas, or running a hedge fund, or whatever, maybe I can look at the data differently’. That’s where the money is and people will make money out of it.

Rob: With online retailing, you’ve actually seen a massive transformation recently in Australia with online retail organisations challenging the large retailers who are trying to catch up, and they’re trying hard. Myer, David Jones, Kmart whatever, are all trying to keep up with Catch of the Day and so on.

They’re all acquiring new technology that is transforming their knowledge in how the market actually operates, because it’s a different way of operation. There is no bricks and mortar.  All the data you collect in an online business is pure. 

For traditional bricks and mortar businesses with an online presence they’ve got to start integrating that data together with the physical versus the online, and that becomes quite a complex thing. 

Online retailers are just scooping it up, they really are, and they’re growing at hundreds of millions of dollars a year, very very quickly.  They are adopting new technology and better insight into intelligence on what they’re doing, and that’s why they’re actually achieving the outcomes that they are. They’re already doing it.

CRN: Tiberio, presumably this group of smaller, smarter companies are largely the target market for this spinoff you were talking about earlier, can you tell us more about the company and the markets.

Tiberio: So NICTA is looking after a company that has big data analytics, it’s called Enviata, and it’s going to well basically harness data to grow businesses. We are basically focusing a large amount of our energy in sectors such as financial services and retail and are working with some of the largest Australian companies.

So what is our vision? It is in just about everything, in all of the data. What form is able to be scalable? It’s based on principles of internet property, large clusters of computers, infrastructure amongst other technologies, and the pillars, the algorithmic pillars below this infrastructure are larger than basically what is the machine running behind.

In other words the model is predictive in its ability to really automate the process of finding which patterns are relevant for which business problems, and the patterns will be different for different business problems, and the algorithms are going to decide which patterns are relevant.

Contrary to what Jonathan was saying, I think that intuition to a great extent is overrated. It is very important to get you started, but in my experience as a researcher, often what you would guess to be the most informative aspects of your data are simply not, and the most predictive features end up being something that is completely unintuitive. How can you come up with that if you give very strong attention to your prior knowledge, your domain expertise?

You need to learn domain expertise, that’s fundamental, but you need a critical technology that properly balances prior knowledge and exploration of new realities, new things that the human mind just cannot grasp, and that’s what we call machine learning. It just adds so much value and it has revolutionised the way people search on the internet, say, for example, the ranking of web pages.

That’s why video technology, video analytics is actually working so well. The computer can better recognise a human face than a human today.  How do you guess that is done?  It’s not based on rules, it’s not a rule based system, if you have this pixel you use red and the other is blue and so on, it’s not like that. 

It’s a very huge statistical approach, where you just ingest huge amounts of mug shot pictures of people and it just is this person this job, this person this job, and you just train the system to memorise so to speak, all that stuff. 

Precisely because you’ve got a lot of data you can do that. That’s the whole thing. So you use the data. You use the data that you have to make a very sophisticated technology that’s actually doing something that’s very simple, but you need to be able to leverage the idiosyncrasies of this data, because big data is different from small data. 

Big data is a mixture of different collections. You have online data, offline data, you have data that you can properly analyse by batch processing, and you have data that is just streaming through. You have free text, you have transactional data, lots of everything. So if you make strong assumptions in your technology for one particular data silo, so to speak, that’s going to wash out all the information from the others.

So you need techniques that can jointly leverage all of these systems, without washing, without fuzzing the information from different sources.  And that’s largely the type of business that we are using at Enviata. NICTA has the largest machine learning group in the Southern Hemisphere. One of the largest in the world. 

I would say probably only Google or Microsoft are ahead of us. We are several hundred people, including scientists, research programmers and top experts in the field of machine learning. We really want to get machine learning out there to make a difference for big data.

Andrew: So if you take it like Tiberio said to the next level and once you get all this data how do you look at it, what do you see. To do what you’re talking about and to do lots of table forms of pros and cons, it’s going to take even longer to think about how to put it together, and I think the graphical display, graphical databases is the next evolution.

I saw an example the other day, which blew me away, where with some smart algorithms and some coding, you were looking at how two people interrelated, based on the phone numbers they had and therefore the phone calls, and then you added degrees of separation – so this was pumped up to six degrees, five degrees of separation.

It was 1.5 million phone records, or phone numbers in there, with all the phone records behind it, real set of data out of America and then they pick two phone numbers – it could be yours Stuart, it could be yours Tiberio, and they hit the button with some Intel compute power behind it, and within 1.5 seconds it traversed the 1.5 million phone records and put a spider web graph of how those two people were interrelated, based on what phone calls they’ve made with everyone else, the five degrees of separation. 

1.5 seconds, bang! So you can imagine a law enforcement agency using that, and then you look at that example, and then you transpose it across other market segments where it makes sense. But the power of that was not only the compute and the algorithms and all the rest of it; the power was actually the visualisation, which I think is the next step.
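The kind of lookup Andrew describes can be sketched as a breadth-first search over a graph built from call records. This is only an illustrative sketch, not the system from the demonstration: the function name `connection_path`, the record format (pairs of phone numbers) and the `max_degrees` cut-off are all assumptions made for the example.

```python
from collections import deque


def connection_path(call_records, source, target, max_degrees=5):
    """Return the shortest chain of numbers linking source to target, or None.

    call_records is an iterable of (caller, callee) pairs (hypothetical format).
    max_degrees caps how many hops (degrees of separation) are explored.
    """
    # Build an undirected adjacency map: a call links both numbers.
    graph = {}
    for caller, callee in call_records:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set()).add(caller)

    if source == target:
        return [source]

    # Breadth-first search finds the fewest-hop path first.
    parents = {source: None}
    queue = deque([(source, 0)])
    while queue:
        number, depth = queue.popleft()
        if depth == max_degrees:
            continue  # do not look beyond the allowed degrees of separation
        for neighbour in graph.get(number, ()):
            if neighbour in parents:
                continue  # already reached by a path at least as short
            parents[neighbour] = number
            if neighbour == target:
                # Walk back through the parents to rebuild the chain.
                path = [neighbour]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((neighbour, depth + 1))
    return None


records = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
print(connection_path(records, "A", "D"))
```

The returned chain is what a visualisation layer would then draw as the "spider web" between the two numbers; a production system over millions of records would need a dedicated graph store rather than an in-memory dictionary.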

Tiberio: We work with some of Australia’s largest organisations. For example financial services, and the problems are how they best make a decent profit.  It’s really behaviour driven right. So we’ve got a transaction log. We’ve got so much data which hasn’t been yet used, as Robert mentioned previously.  

If you are really clever with the way you do your test and control and treatment in your marketing design, you can elicit the right information to be able to really customise. All this talking about is internet property here, and we know that the business model of a company like Google and Facebook for example is largely reliant on time to advertise, and technology can be transferred to normal businesses to a large extent, and you can really improve marketing. 
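As a rough illustration of the test-and-control idea, the sketch below compares the response rate of customers who received a campaign (treatment) with those who did not (control) using a standard two-proportion z-test. The function name `campaign_lift` and the example figures are made up for illustration; they are not from the discussion.

```python
from math import erf, sqrt


def campaign_lift(control_conversions, control_size,
                  treated_conversions, treated_size):
    """Compare conversion rates between a control and a treated group."""
    rate_control = control_conversions / control_size
    rate_treated = treated_conversions / treated_size

    # Pooled rate under the assumption that the campaign had no effect.
    pooled = (control_conversions + treated_conversions) / (control_size + treated_size)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_size + 1 / treated_size))

    z = (rate_treated - rate_control) / std_err if std_err > 0 else 0.0
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    return {"lift": rate_treated - rate_control, "z": z, "p_value": p_value}


# Hypothetical numbers: 1000 customers in each group.
result = campaign_lift(100, 1000, 130, 1000)
print(result)
```

A small p-value suggests the difference in response is unlikely to be chance alone, which is the information a marketer would use to decide whether to customise the offer further.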

Think about drugs, I’m talking about non-illicit drugs, drugs for the ageing and the ill. If you are in Sydney today and you are prescribed these drugs, you then go to Perth tomorrow for business, and when you get there, you’re sick again, because of the flight or whatever, and you go to another doctor and he diagnoses you with a certain thing and wants to give you other drugs.

Now those two drugs put together could form a deadly cocktail. But the doctor can’t rely on you telling him what you’ve taken the day before either. So if he can go into your network and see what drugs are being taken. But more than just what drugs have been taken.

They go into a database which tells you how it will interact with other drugs, which goes beyond human intervention because it’s there straight away. That is a real live example that’s happening in the US today, where the drug enforcement agency has got that availability with smart technology across that system of doctors and hospitals. They’ve got that ability to work out what drug you’ve used, what you can use, and what the impact is.
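The check described here amounts to looking up each drug a patient is already taking against a newly prescribed one in an interaction table. The sketch below is a minimal, hypothetical version: the function `find_interactions`, the table layout and the placeholder drug names are assumptions for illustration and do not represent any real system or real medical data.

```python
def find_interactions(current_drugs, new_drug, interaction_table):
    """Return warnings for every known interaction with the new drug.

    interaction_table maps an unordered pair of drug names (a frozenset)
    to a severity label; pairs that are absent are treated as unknown.
    """
    warnings = []
    for drug in current_drugs:
        pair = frozenset((drug, new_drug))
        if pair in interaction_table:
            warnings.append((drug, new_drug, interaction_table[pair]))
    return warnings


# Placeholder data only, not real medical information.
table = {
    frozenset(("drug_a", "drug_b")): "severe",
    frozenset(("drug_a", "drug_c")): "mild",
}
patient_history = ["drug_a", "drug_d"]
print(find_interactions(patient_history, "drug_b", table))
```

Keying the table on an unordered pair means the lookup works regardless of which of the two drugs was prescribed first.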

Copyright © nextmedia Pty Ltd. All rights reserved.