Cloud Computing, 5V, Data Warehousing and Business Intelligence

The 5V (Volume, Variety, Velocity, Veracity, Value) Story:

Data warehouses maintain data loaded from operational databases using Extract-Transform-Load (ETL) tools like Informatica, DataStage, Teradata ETL utilities, etc.
Data is extracted from the operational store (which contains daily operational, tactical information) at regular intervals defined by load cycles. A delta (incremental) load or a full load is taken into the data warehouse, which contains fact and dimension tables modeled on a STAR (around 3NF) or SNOWFLAKE schema.
During business analysis we come to know the granularity at which we need to maintain data. For example, (Country, product, month) may be one granularity, while (State, product group, day) may be the requirement for a different client. The level at which we need to analyse the business depends on its key drivers.
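To make granularity concrete, here is a minimal Python sketch (the fact rows, column names, and numbers are hypothetical, purely for illustration) that rolls the same fact data up to the two granularities mentioned above:

```python
from collections import defaultdict

# Hypothetical fact rows at the lowest grain:
# (country, state, product_group, product, day, sales)
fact_rows = [
    ("US", "CA", "Electronics", "Phone",  "2013-06-01", 120.0),
    ("US", "CA", "Electronics", "Laptop", "2013-06-01", 340.0),
    ("US", "NY", "Electronics", "Phone",  "2013-06-02", 200.0),
]

def rollup(rows, key_fn):
    """Aggregate sales to the granularity chosen by key_fn."""
    totals = defaultdict(float)
    for country, state, group, product, day, sales in rows:
        totals[key_fn(country, state, group, product, day)] += sales
    return dict(totals)

# (Country, product, month) granularity for one client...
by_country_product_month = rollup(
    fact_rows, lambda c, s, g, p, d: (c, p, d[:7]))

# ...versus (State, product group, day) for another.
by_state_group_day = rollup(
    fact_rows, lambda c, s, g, p, d: (s, g, d))

print(by_country_product_month)
print(by_state_group_day)
```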

There are many databases specially built for data warehouse requirements: low-level indexing, bitmap indexes, and highly parallel loads using multiple-partition clauses for SELECT (during analysis) and INSERT (during load). Data warehouses are optimized for those requirements.
For analytics we require data at the lowest level of granularity, but a normal data warehouse maintains data at the level of granularity desired by the business requirements, as discussed above.
For data characterized by the 3Vs (volume, velocity, variety) of the cloud, traditional data warehouses cannot accommodate the high volume of, say, video traffic or social-networking data. An RDBMS engine can load only limited data for analysis, and even when it does, the large number of programs like triggers, constraints, and relations, with their many processes running in the background, makes it slow. Sometimes formalizing data into a strict table format is difficult, which is when data is dumped as a BLOB into a column of a table. All of this slows data reads and writes, even if the data is partitioned.
Since the advent of the Hadoop distributed file system, data can be inserted into files and maintained using unlimited Hadoop clusters working in parallel, with execution controlled by the MapReduce algorithm. Hence cloud-based, file-distributed cluster databases proprietary to social-networking needs, like Cassandra used by Facebook, have mushroomed. The Apache Hadoop ecosystem has created Hive (a data warehouse).
http://sandyclassic.wordpress.com/2011/11/22/bigtable-of-google-or-dynamo-of-amazon-or-both-using-cassandra/
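As a minimal sketch of the MapReduce pattern that coordinates this parallel execution, here is the classic word count written as plain Python functions in the style of a Hadoop Streaming mapper and reducer (a local simulation of the map, shuffle/sort, and reduce cycle, not a production job):

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map step: emit (word, 1) for every word.
    Hadoop runs many mapper instances in parallel across the cluster."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def reducer(pairs):
    """Reduce step: pairs arrive grouped by key after the shuffle/sort;
    sum the counts per word."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Locally, sorted() inside reducer() stands in for Hadoop's shuffle.
    for word, total in reducer(mapper(sys.stdin)):
        print(f"{word}\t{total}")
```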

With Apache Mahout, the Hadoop analytics engine, analysis of real-time, high-3V data is made possible. The ecosystem has evolved full circle: Pig, a data-flow language; ZooKeeper, for coordination services; Hama, for massive scientific computation;

and HIPI, the Hadoop Image Processing Interface, a library that has made large-scale image processing using Hadoop clusters possible:
http://hipi.cs.virginia.edu/

Real-time data is where all data of the future is moving. It is gaining traction, with large server data logs to be analysed, which is what made Cisco acquire Truviso, a real-time data analytics company: http://www.cisco.com/web/about/ac49/ac0/ac1/ac259/truviso.html

Analytics being the basis of this kind of action, see the example:
http://sandyclassic.wordpress.com/2013/06/18/gini-coefficient-of-economics-and-roc-curve-machine-learning/

With innovation in the Hadoop ecosystem spanning every direction, changes have even started happening on the other side of the cloud stack, with VMware acquiring Nicira. With huge petabytes of data being generated, there is no way forward but to exponentially parallelize data processing using MapReduce algorithms.
There is huge data yet to be generated, with IPv6 making it possible to give vast arrays of devices unique IP addresses: Machine-to-Machine (M2M) interaction logs, and huge growth in video and image data from the vast array of cameras lying in every nook and corner of the world. Data of such epic proportions cannot be loaded and kept in an RDBMS engine, whether the data is structured or unstructured. Only analytics can be used to predict behavior, or agent-oriented computing to direct you towards your target search. This is where big-data technologies like Apache Hadoop, Hive, HBase, Mahout, Pig, Cassandra, etc., as discussed above, will make a huge difference.

Kindly answer this poll:

Which tool/technology do you use more often for analysis?
- Data warehousing, ETL tools
- Business Intelligence
- Hadoop, Hive, HBase, Mahout
- Other big-data tools


Some of these technologies remain, to some extent, vendor-locked and proprietary, but Hadoop is actually completely open, leading to its utilization across multiple projects. Every data-analysis product has support for Hadoop. New libraries are added almost every day. Map and reduce cycles are turning product architectures upside down. The 3Vs (variety, volume, velocity) of data are increasing each day: each day a new variety comes up, a new level of speed or velocity of data is broken, and records of volume are broken.
The intuitive interfaces for analysing data in business intelligence systems are changing to adjust to such dynamism. Since we cannot look at every bit of data, not even every changing bit, we need our attention directed to the more critical bits out of the heap of petabytes of data generated by huge arrays of devices, sensors, and social media. What directs us to the critical bit? As in the example given here:
http://sandyclassic.wordpress.com/2013/06/18/gini-coefficient-of-economics-and-roc-curve-machine-learning/
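The linked post ties the Gini coefficient to the ROC curve; here is a minimal sketch of that relation, Gini = 2·AUC − 1, using a rank-based AUC on made-up default scores (all data below is invented for illustration):

```python
import numpy as np

def auc(y_true, y_score):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Made-up model scores: 1 = defaulted, 0 = repaid.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.3, 0.8, 0.6, 0.4, 0.2, 0.7, 0.5])

a = auc(y_true, y_score)
print("AUC :", a)          # 1.0 here: the scores separate the classes perfectly
print("Gini:", 2 * a - 1)  # Gini coefficient of the ranking
```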
Hedge funds use the Hedgehog language provided by:
http://www.palantir.com/library/
Such processing can be achieved using Hadoop and its MapReduce algorithm. There is a plethora of tools and technologies that make the development process fast. New companies are emerging from the ecosystem, developing tools and IDEs to make the transition to this new style of development easy and fast.

When a market gets commoditized, as it hits the plateau of marginal gains from first-mover advantage, the ability to execute becomes critical. What big data changes is cross-analysis: a kind of first-mover validation before actually moving. Here, speed of execution becomes more critical. As a production function, innovation gives returns in multiples. So it is differentiate or die: analyse, act on the feedback quickly, and move faster in the market…

This will make cloud-computing development tools faster to develop, with crowdsourcing, big data, and social-analytics feedback.

Economic Slowdown: Problem, Lesson and Solution

Einstein predicted the economic slowdown and the related problems caused by the overuse of mathematics.
Albert Einstein said, "Elegance is for tailors", warning against believing in mathematics only because of its beautiful formulae.
A monster called the synthetic CDO, which was created during that episode, caused the financial crisis; investment banking funds demonstrated this overindulgence in mathematics. Sentiments are psychology and sociology, the fundamentals are economics and finance, and mathematics is just calculation. As Buddha said, everything is in balance: who can balance, who is equal in all areas, who can do justice to each area with no preference for one?
It is easy to make 200 rupees from 100 rupees, but it is difficult to make 200 crore from 100 crore, because returns average out. Hedge funds invest in bulk in big-ticket investments, which has a similar effect: macro- and micro-economic conditions from time to time favour a particular sector with a high growth trajectory, but there is the law of diminishing marginal utility, which, when applied to the market, reads: as more and more of the market absorbs money, the returns average out. As a person increases consumption of a product, while keeping consumption of other products constant, there is a decline in the marginal utility that person derives from consuming each additional unit of that product.

The utility of a sector, such as biotech now, decreases as more and more of it is absorbed by the market. In the same way, when we consume a sweet, the first time we feel it is very sweet, but as we marginally increase the amount of sweet, the value we derive from each single unit decreases marginally.

This is how buffet-style restaurants operate. They entice you with "all you can eat", all the while knowing each additional plate of food provides less utility than the one before. And despite their enticement, most people will eat only until the utility they derive from additional food is slightly lower than the original.
Excellent example: say you go to a buffet and the first plate of food you eat is very good. On a scale of ten you would give it a ten. Now your hunger has been somewhat tamed, but you get another full plate of food. Since you're not as hungry, your enjoyment rates a seven at best. Most people would stop before their utility drops even more, but say you go back to eat a third full plate of food and your utility drops to a three. If you kept eating, you would eventually reach a point at which eating makes you sick, providing dissatisfaction, or "dis-utility".
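Formalizing the buffet example, with each plate's rating treated as marginal utility (the numbers are the ones from the example above):

```latex
% Marginal utility of the n-th plate: the change in total utility U
\[ MU_n = U(n) - U(n-1) \]
% Diminishing marginal utility, with the plate ratings above:
\[ MU_1 = 10 \;>\; MU_2 = 7 \;>\; MU_3 = 3 \]
% Total utility still rises while MU_n > 0:
\[ U(3) = 10 + 7 + 3 = 20 \]
% 'Dis-utility' is the point where MU_n drops below zero.
```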

Quants in investment banking depend on complex mathematics to predict future interest rates, how volatile those rates will be, or what prepayments will be in the future; how that translates into a price depends on your view. But mathematics did not cause the financial crisis; the greed did. It can be corrected into a win-win for everyone; what is needed is a middle path which gives equal respect to economics, mathematics, psychology, and sociology, which are all equally important. Mathematics is just the medium: if we do not quantize everything, we cannot relate and predict, which would be a worse situation. But the problem is that these models are not actually predicting; they are views you derive. Nobody can predict with accuracy how many people are going to prepay their mortgages or default in the future, or how many companies are going to default. It is put into algorithms, and for some time it is very satisfying to see everything work according to them.
CDOs were excellent instruments: all the risk is first packaged into one bond, which can then be sliced and diced, and based on the risk you want to take, you get to choose your slice. To sell a CDO you need a big profit margin to cover risk, a margin for error. Then diminishing marginal utility comes into the picture. At first the utility is great, so everyone wants to enter CDOs; as more people enter, competition increases and the profit margin shrinks. The CDO brand gets commoditized, which leads to a fall in the profit investment banking firms take. So profit decreases, but the margin needed for error stays the same. As more people enter the CDO market, the size of the investments becomes huge, leading to a further fall in profit and exposing firms to a margin of error that is no longer covered, due to the increased competition. The danger of mathematical credit derivatives becomes evident.
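A toy sketch of that squeeze, with entirely hypothetical numbers: the profit margin shrinks as competitors enter the CDO market, while the margin for error needed to cover model risk stays fixed:

```python
# Entirely hypothetical numbers: the first mover prices a CDO with a fat
# margin; each new competitor bids the margin down, but the buffer needed
# to absorb model error does not shrink with it.
initial_margin = 0.40   # 40% profit margin for the first mover
error_margin = 0.10     # fixed buffer needed to cover risk of model error

for sellers in range(1, 8):
    margin = initial_margin / sellers   # commoditization squeezes the price
    status = "EXPOSED to model error" if margin < error_margin else "covered"
    print(f"{sellers} sellers: margin = {margin:.1%} -> {status}")
```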

Minimal computation: the abstract manipulation of symbols, the ability to see patterns in abstract mathematical symbols. For example, the probability of housing-loan defaults happening depends on whether the behaviour of two companies is independent or correlated; the factors in the probability of risk, that is, how these mortgages interact with each other, change accordingly. But even this is not the problem; the problem is that assumptions are then made on top of it, and then it is incorporated into the model.
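A minimal simulation of why that correlation assumption matters (the 5% default probability and the correlation values are made up, and a Gaussian latent-factor model here merely stands in for whatever the quants actually used): the joint probability of two loans defaulting together is far higher under correlation than under independence.

```python
import numpy as np

rng = np.random.default_rng(0)
P_DEFAULT = 0.05     # assumed 5% chance each loan defaults on its own
THRESHOLD = -1.645   # approx. inverse normal CDF at 5%: default below this

def joint_default_prob(rho, n=1_000_000):
    """Simulate two latent credit factors with correlation rho;
    a loan defaults when its factor falls below THRESHOLD."""
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    both_default = (z[:, 0] < THRESHOLD) & (z[:, 1] < THRESHOLD)
    return both_default.mean()

# Independent loans: joint default ~ 0.05 * 0.05 = 0.0025.
print("rho = 0.0:", joint_default_prob(0.0))
# Correlated loans: same marginal risk, but defaults cluster together.
print("rho = 0.8:", joint_default_prob(0.8))
```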