Hadoop Professionals

A Community for Hadoop Users

Hi,

I'm working on a data warehouse and am deciding whether to use Hadoop or MySQL.

The dataset is likely to be no bigger than 40 GB in the first year, then perhaps 80 GB the next year, and possibly 120 GB the year after.

We want to be able to query all of the data at any point in the future. We aren't interested in throwing data away, since we can't envisage how we might want to use it.

So, would Hadoop be the right choice? I don't need high availability since this will be a back-office application, and the number of different queries won't be problematic. If staff want to perform queries, they don't really need real-time results. Would it be better to just have a reasonably powerful MySQL server grinding through the data? At what point does it become useful to use Hadoop? Initially we won't need more than a single node.

Any help would be appreciated.

Thanks

Replies to This Discussion

If your data is highly relational, your users will have an easier time accessing it if it is stored in a more traditional data warehouse.
The sizes you are talking about are very small. I have some of the higher-end solid-state storage devices that could easily hold several years of your data, and I see peak access rates of over 1 gigabyte per second. You could probably just run MySQL on top of them at that data size.
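
As a rough back-of-the-envelope check on why these sizes count as small, the sketch below estimates how long one full sequential pass over the data would take. It assumes the 40/80/120 GB figures from the question are the total dataset size in each year, that the roughly 1 GB/s peak rate can be sustained for a whole scan, and that GB means decimal gigabytes. The function and variable names are made up for illustration, and real query times will depend on indexes, the storage engine, and the query itself.

```python
# Rough estimate only: time for one full sequential read of the dataset.
# Assumptions (not measurements): sizes are yearly totals from the question,
# and the ~1 GB/s peak rate quoted above is sustained for the whole scan.

def full_scan_seconds(dataset_gb: float, throughput_gb_per_s: float) -> float:
    """Return the seconds needed to read dataset_gb at a constant rate."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dataset_gb / throughput_gb_per_s


def scan_times(sizes_gb: dict, throughput_gb_per_s: float) -> dict:
    """Map each year to its estimated full-scan time in seconds."""
    return {
        year: full_scan_seconds(gb, throughput_gb_per_s)
        for year, gb in sizes_gb.items()
    }


# Hypothetical inputs taken from the figures in this thread.
SIZES_GB = {1: 40, 2: 80, 3: 120}
THROUGHPUT_GB_PER_S = 1.0

if __name__ == "__main__":
    for year, seconds in scan_times(SIZES_GB, THROUGHPUT_GB_PER_S).items():
        print(f"year {year}: about {seconds / 60:.1f} minutes per full scan")
```

Under those assumptions, even the year-three dataset could be read end to end in a couple of minutes on a single machine, which fits with not needing real-time results.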

AsterData offers a hybrid solution.

For non-relational data, Hadoop can be a very cost-effective solution. Running Hadoop at scale does require significant skill.

© 2012   Created by Jason Venner.
