Big Data Analysis


The volume of data produced in health informatics has grown to the point that it is referred to as big data. Kamesh, Neelima and Priya (2015) define big data as the apparatus and procedures that make it possible for a company to create, control and handle huge data sets as well as storage facilities. Herland, Khoshgoftaar and Wald (2014, p. 2) use five V’s to define big data, which are “volume, velocity, variety, veracity and value.” Volume is the amount of data, velocity is the speed at which new data is generated, variety is the complexity of the data, veracity is the trustworthiness of the data, and value is the quality of the data. Analyzing such data opens up many opportunities for gaining knowledge in healthcare.

As technology has changed, the field of health informatics has come to handle big data. Data mining is making it simpler to diagnose, treat and care for patients within a healthcare setting (Ryu & Song, 2014). Data mining is the evaluation of huge data sets to identify patterns and use them to predict the likelihood of future events (Crockett, Johnson & Eliason, 2014). Health informatics employs data mining in healthcare with the objective of acquiring medical insight. Big data has to be analyzed in a reliable manner, and several approaches are available.

One approach is to analyze gene expression data with the objective of predicting clinical outcomes. Data collected at the molecular level frequently faces the challenge of high dimensionality. This means that each record contains a huge number of independent features, because the molecular level involves numerous possible molecules, each of which is represented in the dataset. Gene expression analysis makes it possible to answer clinical questions. For instance, for a patient who has been diagnosed with cancer, gene expression makes it possible to predict the likelihood of the patient relapsing within five years (Salazar et al., 2011).
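The dimensionality problem described above can be illustrated with a minimal Python sketch. It is not the method of Salazar et al. (2011). It is a simplified stand-in that keeps only the genes whose expression varies most across patients and then assigns a new patient to the nearer of two group averages. The function names and the expression values are hypothetical and invented for illustration.

```python
from statistics import mean, pvariance

# Hypothetical toy data: each row is one patient, each column one gene.
# Real expression data would have thousands of columns.
relapsed = [[5.0, 1.0, 2.0, 9.0], [5.1, 1.1, 2.0, 8.0]]
no_relapse = [[5.0, 1.0, 2.1, 2.0], [4.9, 0.9, 2.0, 1.0]]


def top_variance_genes(samples, k):
    """Return the indices of the k genes that vary most across samples."""
    n_genes = len(samples[0])
    variances = [pvariance([s[g] for s in samples]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return ranked[:k]


def centroid(samples, genes):
    """Average expression of the selected genes within one patient group."""
    return [mean(s[g] for s in samples) for g in genes]


def predict_relapse(profile, genes, relapse_centroid, no_relapse_centroid):
    """Return True if the profile is closer to the relapse group average."""
    def squared_distance(center):
        return sum((profile[g] - c) ** 2 for g, c in zip(genes, center))
    return squared_distance(relapse_centroid) < squared_distance(no_relapse_centroid)


# Reduce four genes to the single most variable one, then classify.
genes = top_variance_genes(relapsed + no_relapse, k=1)
relapse_c = centroid(relapsed, genes)
no_relapse_c = centroid(no_relapse, genes)
new_patient = [5.0, 1.0, 2.0, 7.5]
print(predict_relapse(new_patient, genes, relapse_c, no_relapse_c))
```

Filtering features before classification is only one of several ways to handle high-dimensional data. A real study would also need validation on patients who were not used to build the model.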

Another way of analyzing big data reliably is the use of MRI data to make clinical predictions. MRI data, when combined with clinical information, makes it possible to determine connections between physical illnesses and different brain regions (Wang, Li & Perrizo, 2015). Analysis of big data from MRIs may be crucial to clinical diagnosis as well as to predictions that give physicians an additional option when making decisions. Forming a complete connectivity map of a patient’s brain may yield information that can assist in validating the reasons individuals have specific brain disorders. This eases physicians’ diagnosis, supports early detection of prospective disease, and may help prevent mental or physical diseases.

Big data can also be analyzed reliably by using population-level data to answer both clinical questions and epidemic-scale questions. In general, health informatics information is collected from physicians, hospitals and now the internet. Internet data can be obtained from social media sites such as Twitter, or from Google. Posts on Twitter serve as data that can be applied in tracking epidemics. A lot of the information posted on social media relates to health care (McDonald & Brown, 2013), so this large volume of data may contain helpful epidemic information. Data mining makes it possible to analyze the information posted on social media and, in the end, find useful information (Signorini, Segre & Polgreen, 2011).
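As a rough illustration of how social media posts might be mined for epidemic signals, the Python sketch below counts posts that mention flu-related terms in each calendar week. It is not the method of Signorini, Segre and Polgreen (2011). The keyword list, the function names and the sample posts are all hypothetical, and a real system would need far more careful text processing.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list chosen for illustration only.
FLU_KEYWORDS = {"flu", "influenza", "h1n1", "fever"}


def mentions_flu(text):
    """Return True if any word in the post matches a flu keyword."""
    words = {w.strip(".,!?#@:;\"'()").lower() for w in text.split()}
    return not words.isdisjoint(FLU_KEYWORDS)


def weekly_flu_counts(posts):
    """Count flu-related posts per (ISO year, ISO week)."""
    counts = Counter()
    for day, text in posts:
        if mentions_flu(text):
            year, week, _ = day.isocalendar()
            counts[(year, week)] += 1
    return counts


# Invented sample posts as (date, text) pairs.
posts = [
    (date(2009, 10, 5), "Home with the flu today."),
    (date(2009, 10, 6), "Worried about #H1N1, anyone else?"),
    (date(2009, 10, 7), "Great weather for a run"),
    (date(2009, 10, 12), "Fever and chills all night"),
]

counts = weekly_flu_counts(posts)
for week, n in sorted(counts.items()):
    print(week, n)
```

A rising weekly count would only suggest increased discussion of illness. It would have to be compared against official surveillance figures before being treated as evidence of an outbreak.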


Crockett, D., Johnson, R., & Eliason, B. (2014). What is data mining in healthcare? Health Catalyst, 1-12.

Herland, M., Khoshgoftaar, T. M., & Wald, R. (2014). A review of data mining using big data in health informatics. Journal of Big Data, 1(2), 1-35.

Kamesh, D. B. K., Neelima, V., & Priya, R. R. (2015). A review of data mining using big data in health informatics. International Journal of Scientific and Research Publications, 5(3), 2250-3153.

McDonald, E., & Brown, C. T. (2013). Working with big data in bioinformatics. CoRR, 1–18. Retrieved from

Ryu, S., & Song, T. (2014). Big data analysis in healthcare. Healthcare Informatics Research, 20(4), 247-248.

Salazar, R., Roepman, P., Capella, G., Moreno, V., Simon, I., Dreezen, C., Lopez-Doriga, A., Santos, C… (2011). Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. Journal of Clinical Oncology, 29, 17–24.

Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic. PLoS ONE, 6(5).

Wang, B., Li, R., & Perrizo, W. (2015). Big data analytics in bioinformatics and healthcare. Hershey, PA: Medical Information Science Reference.