Hello all,
Hope all is well in the community. I am inquiring about how to apply Hadoop to retrieve information from various blogs, news feeds, etc. in a particular fashion.
I have identified three groups of word pairs that are valuable to me. I would like to explore how these word pairs cluster across particular URLs in their respective blog spaces, news feeds, etc.
So, given that I have an expected output structure, i.e. three groups of words that I believe have distinct attributes, I am aware that I am trying to develop a supervised learning method.
The question is: is there a simple way to develop this procedure with Hadoop? I would like to search the web spaces for, let's say, one week and record the following information: the common occurrence of a particular group of words, the host name, and any other meta tag information that I find relevant. Afterwards, I would analyze the data with a supervised clustering technique that is supported by dendrogram and hierarchical cluster graphical output.
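To make the recording step concrete, here is a minimal sketch of how the word-pair counts per host could be expressed as a map step and a reduce step. It is only an illustration under assumptions: the word pairs in GROUPS, the sample pages, and the function names map_page, reduce_counts, and host_vectors are all hypothetical placeholders, and the shuffle between map and reduce is simulated in-process with a sort rather than run on a cluster.

```python
import re
from collections import defaultdict
from itertools import groupby
from urllib.parse import urlparse

# Hypothetical word-pair groups; replace with the real three groups.
GROUPS = {
    "group_a": [("solar", "panel")],
    "group_b": [("stock", "market")],
    "group_c": [("flu", "vaccine")],
}


def map_page(url, text):
    """Map step: emit ((host, group), 1) for each word pair whose
    two words both occur somewhere in the page text."""
    host = urlparse(url).netloc
    words = set(re.findall(r"[a-z']+", text.lower()))
    for group, pairs in GROUPS.items():
        for first, second in pairs:
            if first in words and second in words:
                yield (host, group), 1


def reduce_counts(mapped):
    """Reduce step: sum the counts per (host, group) key.
    Sorting stands in for the shuffle phase of a real job."""
    totals = {}
    for key, items in groupby(sorted(mapped), key=lambda kv: kv[0]):
        totals[key] = sum(count for _, count in items)
    return totals


def host_vectors(totals):
    """Turn the reduced counts into one count vector per host,
    with one component per group (in sorted group-name order)."""
    names = sorted(GROUPS)
    vectors = defaultdict(lambda: [0] * len(names))
    for (host, group), count in totals.items():
        vectors[host][names.index(group)] = count
    return dict(vectors)


# Made-up sample pages standing in for one week of crawled content.
pages = [
    ("http://blog.example.com/a",
     "Solar panel prices fell while the stock market rose."),
    ("http://blog.example.com/b", "A new solar panel design."),
    ("http://news.example.org/x",
     "Flu vaccine supplies and the stock market."),
]

mapped = [kv for url, text in pages for kv in map_page(url, text)]
totals = reduce_counts(mapped)
vectors = host_vectors(totals)
print(vectors)
```

On an actual cluster the same two functions could be adapted to read from standard input and write tab-separated key/value lines, which is the contract Hadoop Streaming expects. The per-host vectors are the kind of input a hierarchical clustering routine takes, for example SciPy's scipy.cluster.hierarchy.linkage and dendrogram, though whether that fits the supervised setup described above is a separate question.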
I would greatly appreciate any information.
Best,
Mark