Last week I presented on big data and its impact on eHealth at the CeBIT Big Data Conference.
The slides are now available via SlideShare.
Gartner have put Big Data on the upslope of the hype cycle, stating that big data is one of the most hyped terms in the market today. Some (admittedly with an interest in seeing this field grow) have declared Gartner wrong, but for me, a telling measure of ‘hype’ is that I’ve presented at three Australian big-data industry workshops in the last few weeks. Nonetheless, of the technologies on the hype cycle, big data tops my list of those most likely to change the world, and the biggest change is likely to be felt in Health.
Big Data changes everything for Health
The Big Data I’m talking about is not the data sitting on a billing system server. It’s the breadcrumbs of information: everything that is known and recorded about my health, including the health of anyone who is anything like me, which together show what interventions could (or should) apply to improve my health. And it’s not just the existence of that data: Big Data is about turning all that data into actionable information. Although the idea of actionable information grew out of business intelligence, the concept applies directly to health care services.
Another thing that’s quite interesting is the whole big data and analytics side. It is quite hyped up right now, but the promise is there. Companies are able to process information that they could not process before – unstructured information – but it’s also about the speed at which they can do this.
Part of Big Health Data is the massive data of genomics (about a terabyte per person) and proteomics (perhaps hundreds of terabytes per person). The main challenge with that data is massive pattern matching, and the main advances come from efficient signal processing that can crunch those patterns on commodity hardware. This is the so-called “lab at the bedside” translational medicine that NICTA and others are working on. For the ’omics, the problem is either to crush the data (reduce storage) or to crunch it faster, producing the equivalent of supercomputing on commodity hardware.
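To make “crush the data” a little more concrete, the simplest lossless trick is to store each nucleotide in 2 bits rather than an 8-bit character, roughly a fourfold saving before any general-purpose compression. This is only an illustrative sketch: it assumes sequences contain nothing but A/C/G/T, the function names are hypothetical, and real genomics pipelines use considerably more elaborate schemes.

```python
# Sketch: pack DNA bases (A, C, G, T) into 2 bits each, four bases per byte.
# Assumption: the input contains only A/C/G/T (no N or other IUPAC codes).

BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes; base j of each group of four sits at bit offset 2*j."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= BASE_TO_BITS[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the string; `length` is needed because the final byte may be only partly used."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BITS_TO_BASE[(byte >> (2 * j)) & 0b11])
    return "".join(bases[:length])


seq = "GATTACACGT"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 10 bases -> 3 bytes
print(unpack(packed, len(seq)))  # GATTACACGT
```

The same 10 bases stored as ASCII text would take 10 bytes; here they take 3.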
For me, the more interesting aspect is making all the unstructured data (the stuff that ends up in the “other” field of SQL databases) actionable.
By that I mean the myriad small pieces of clinical notes and other information floating around that together describe whole-of-person and whole-of-nation health. Sebastian Seung has suggested this might be called my connectome. In this sense, Big Data is not a new technology but a new philosophy. It’s not that the data is suddenly available: we’ve had electronic records and digitally shareable information for decades. The key is the demand that we combine all this unstructured data and use it, together with the expectation that this is possible.
Three tenets for Big Unstructured Health Data
Value the other
Current medical information systems are often designed with relational databases in mind: fields are created that characterise the information, perhaps with a view to giving researchers access later. Eventually, the designer creates a field called “other” where free-text information is placed. This field often captures the main value in the health record. In some cases many of the structured fields are left blank, and the full notes are placed in “other”. For this reason, natural language processing and unstructured data mining approaches are needed for the “other” data. The key message: some structure is good, but forced structure is not needed if sufficiently powerful analytic approaches are used.
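To illustrate mining the “other” field at its very simplest, here is a rule-based Python sketch that looks for a few concepts in free-text notes and applies a crude negation check, loosely in the spirit of NegEx-style rules. The records, the concept vocabulary and the negation cues are all hypothetical; real clinical natural language processing needs far more robust tooling.

```python
import re

# Hypothetical records: the structured field is blank and the content sits in "other".
records = [
    {"id": 1, "smoker": None, "other": "Pt reports chest pain on exertion. Current smoker, 20/day."},
    {"id": 2, "smoker": None, "other": "Denies chest pain. No history of diabetes."},
    {"id": 3, "smoker": None, "other": "Type 2 diabetes, poorly controlled. Review HbA1c."},
]

# Illustrative vocabulary (an assumption, not a clinical terminology).
CONCEPTS = {
    "chest_pain": r"chest pain",
    "smoker": r"smoker",
    "diabetes": r"diabetes",
}

# A mention counts as negated if one of these cues appears earlier in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def extract(text):
    """Return {concept: True/False} for concepts mentioned in text (False means negated)."""
    found = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for concept, pattern in CONCEPTS.items():
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match:
                found[concept] = not NEGATION.search(sentence[:match.start()])
    return found


for rec in records:
    print(rec["id"], extract(rec["other"]))
```

Even this toy version recovers smoking status for record 1, which its blank structured field would have missed.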
The next generation of informatics leaders is unlikely to be working in the same health service that needs the next generation of informatics software. The key here is social innovation: building structures that support the social innovation ecosystem will drive major advances. The objective is to harness the best of the open innovation community for the task at hand.
Examples are starting to emerge, such as Kaggle, which coined the phrase “Making Data Science a Sport” and has built its approach around data analytics as a competitive sport. One such competition is the $3m Heritage Health Prize, where NICTA’s combined team is in the top 20 of 1,441 teams.
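What turns analytics into a sport is a public leaderboard: every team’s predictions are scored against held-out outcomes with the same error metric. The sketch below assumes a root mean squared logarithmic error (RMSLE) on predicted days in hospital, which is how I understand the Heritage Health Prize was scored; the team names and numbers are made up.

```python
import math

# Assumption: predictions are scored by root mean squared logarithmic error (RMSLE).


def rmsle(predicted, actual):
    """Root mean squared logarithmic error; lower is better."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    total = sum((math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(total / len(actual))


def leaderboard(submissions, actual):
    """Rank (hypothetical) teams from best (lowest error) to worst."""
    scores = {team: rmsle(preds, actual) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


actual_days = [0, 2, 0, 5]  # hypothetical days in hospital for four members
submissions = {
    "team_a": [0, 1, 0, 4],
    "team_b": [1, 1, 1, 1],
    "team_c": [0, 2, 0, 5],
}
for rank, (team, score) in enumerate(leaderboard(submissions, actual_days), start=1):
    print(rank, team, round(score, 4))
```

The logarithm means that predicting 1 day when the answer was 0 costs far more than predicting 4 when the answer was 5.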
Analytics as a service
There is growing interest in the idea of Machine Learning as a Service. This is where web-style (RESTful) APIs give the best algorithms safe access to data, allowing end-users to build inference engines in much the same way that a child might build a Lego city. Google has announced a similar approach, and an excellent review is available. Analytics as a service for health extends the data mining and social innovation concepts so that anyone who wants analysis done can get the best results without first employing a team of data analysts. The approach combines:
- A data service that exposes APIs and de-identified data to analytics engines
- An analytics service that allows algorithms to be safely housed and applied to the data
- Storage for the data and metadata, plus privacy-preserving techniques (middleware) that ensure the service conforms to specifications
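As a rough illustration of how these three pieces might fit together, the Python sketch below models them in-process. Everything in it is an assumption for illustration: the class names, the list of identifier fields, and the privacy rule (refusing to return results computed over fewer than five records, a simple small-cell suppression policy). A real service would sit behind RESTful endpoints and use much stronger de-identification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]
IDENTIFIERS = {"name", "address", "medicare_no"}  # assumed direct identifiers
MIN_GROUP_SIZE = 5  # assumed rule: suppress results computed on tiny groups


@dataclass
class DataService:
    """Holds raw records but only ever hands out de-identified copies."""
    records: List[Record]

    def deidentified(self) -> List[Record]:
        return [{k: v for k, v in r.items() if k not in IDENTIFIERS} for r in self.records]


@dataclass
class AnalyticsService:
    """Registry of algorithms that may be applied to de-identified data."""
    algorithms: Dict[str, Callable[[List[Record]], object]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[List[Record]], object]) -> None:
        self.algorithms[name] = fn


@dataclass
class Middleware:
    """Applies the privacy rule before any result leaves the service."""
    data: DataService
    analytics: AnalyticsService

    def run(self, name: str, where: Callable[[Record], bool] = lambda r: True) -> object:
        subset = [r for r in self.data.deidentified() if where(r)]
        if len(subset) < MIN_GROUP_SIZE:
            raise PermissionError(f"group of {len(subset)} is below the minimum of {MIN_GROUP_SIZE}")
        return self.analytics.algorithms[name](subset)


# Usage with hypothetical records: ten patients, half of them diabetic.
records = [{"name": f"p{i}", "age": 40 + i, "diabetic": i % 2 == 0} for i in range(10)]
analytics = AnalyticsService()
analytics.register("diabetes_rate", lambda rs: sum(r["diabetic"] for r in rs) / len(rs))
service = Middleware(DataService(records), analytics)
print(service.run("diabetes_rate"))  # 0.5
```

The end-user only ever names an algorithm and a filter; asking for, say, patients older than 46 (three records) is refused by the middleware rather than leaking a small group.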
This allows medical data users to treat analytics as a commodity resource, in the same way that one might view IT, or telephone carriers, as a commodity.
Opening up the unstructured data of health will allow data triage, highlighting the actionable information in the raw data of clinical records. We are not yet at the point Google suggested in its 2008 April Fool’s joke, “searching tomorrow’s internet today”, but we are moving toward analysing tomorrow’s health today.
Changing the 80-20 rule of health
Many commentators suggest that the software maxim of 80-20 applies to health. In software development, the rule of thumb is that about 80% of the processor time is consumed by about 20% of the code. In health, similar statistics exist: about 80% of health spending is directed toward around 20% of health outcomes, such as the massive spending on acute care. Conversely, around 80% of the need (e.g. preventative care) receives 20% of the attention. In fact, preventative care receives much less than 20% of health spending (see the previous post). But accessing the unstructured big data of health may start to shift this balance without requiring a substantial budget shift. At least, that’s the hope for Big Data.
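One way to check an 80-20 claim against real spending data is a simple Pareto calculation: sort categories by spend and find the smallest share of categories that accounts for 80% of the total. The sketch below uses made-up category names and figures purely to show the arithmetic; they are not health expenditure data.

```python
# Sketch of a Pareto (80-20) check on hypothetical spending figures.


def pareto_share(spend, target=0.8):
    """Fraction of categories (largest first) needed to reach `target` of total spend."""
    values = sorted(spend.values(), reverse=True)
    total = sum(values)
    running = 0.0
    for count, value in enumerate(values, start=1):
        running += value
        if running >= target * total:
            return count / len(values)
    return 1.0  # only reached for an empty input


hypothetical_spend = {
    "category_a": 55,
    "category_b": 27,
    "category_c": 7,
    "category_d": 5,
    "category_e": 3,
    "category_f": 3,
}
print(pareto_share(hypothetical_spend))  # 2 of 6 categories reach 80% of spend
```

Here the two largest categories (82 of 100 units) already pass the 80% mark, so a third of the categories absorb four-fifths of the spend.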