Hadoop Professionals

A Community for Hadoop Users


sheeraz mughal's Page


Latest Activity

sheeraz mughal left a comment for Jason Venner
Hi, thank you very much for your reply; it was really informative. I have been given the responsibility of creating and heading a research group at one of the leading universities in Pakistan, with university funding. I…
Dec 13, 2009
Jason Venner left a comment for sheeraz mughal
Currently sequence files support compression but not encryption. It would be straightforward to encrypt the keys and values in sequence files by modifying the value class. Block-level encryption would require some changes. If your application is…
Dec 8, 2009
sheeraz mughal left a comment for Jason Venner
Hi, can we encrypt a data file in HDFS with a triple DES compatible algorithm, or any other, and then decrypt the input in the map method before passing it on to the business logic and then to the reducer? I am a beginner in Hadoop…
Nov 13, 2009
sheeraz mughal left a comment for Gonçalo
Hi Gonçalo, I am interested in image recognition with Hadoop. Could you share the scope of your project and how you plan to achieve it? Thanks, sheeraz
Nov 11, 2009
Jason Venner left a comment for sheeraz mughal
Hadoop is a wonderful tool for working with large datasets, particularly when you fold in tools from the Mahout project for various data analysis steps. When you have multi-terabyte data sets, Hadoop is your best friend.
Nov 9, 2009
sheeraz mughal left a comment for Jason Venner
Hi, to all Hadoop professionals and others: I am working in an organization that has the world's largest biometric database. Keeping this fact in mind, kindly let me know what can be built on top of it using Hadoop where Hadoop's core…
Nov 9, 2009
sheeraz mughal is now a member of Hadoop Professionals
Nov 9, 2009

Profile Information

Hadoop Experience Level
Beginner
Interests
Technology research, all kinds of sports
Expertise
Java, Oracle ERP, LAMP, etc.
Past Projects
Worked as core developer and designer on the following projects:
(CNIC) Computerized National Identity Card for Pakistani citizens, currently the world's largest biometric database
(MRP) Machine Readable Passport; developing a portal for Smart Card systems
Current Project
Looking for a Hadoop-based problem area to write a research proposal for university admission.
Available for Consulting
No
Your Website
http://www.jagoyar.com
Search Expertise
Beginner
HBase Expertise
Novice
Machine Learning Expertise
Beginner

Comment Wall (2 comments)


At 7:15am on December 8, 2009, Jason Venner said…
Currently sequence files support compression but not encryption.
It would be straightforward to encrypt the keys and values in sequence files by modifying the value class.
Block-level encryption would require some changes.
If your application is prepared to handle the tokenization of the input stream after decryption, and the packaging of the output, you could shim in encryption.
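
To illustrate the value-class idea, here is a minimal sketch of a value that holds plaintext in memory and emits ciphertext when serialized. Everything in it is an assumption for illustration: the class name `EncryptedValueSketch` is hypothetical, it only mirrors the `write(DataOutput)` / `readFields(DataInput)` method shapes of Hadoop's `Writable` interface without depending on Hadoop, and it uses AES-GCM from the JDK rather than triple DES. Key distribution is out of scope.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical value class: plaintext in memory, ciphertext when serialized. */
public class EncryptedValueSketch {
    private static final int IV_BYTES = 12;   // common IV size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKeySpec key;
    private byte[] plaintext = new byte[0];

    public EncryptedValueSketch(byte[] rawAesKey) {
        this.key = new SecretKeySpec(rawAesKey, "AES");
    }

    public void set(String text) { plaintext = text.getBytes(StandardCharsets.UTF_8); }

    public String get() { return new String(plaintext, StandardCharsets.UTF_8); }

    /** Same shape as Writable.write: emits IV, ciphertext length, ciphertext. */
    public void write(DataOutput out) throws IOException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV for every serialized record
        byte[] ciphertext = crypt(Cipher.ENCRYPT_MODE, iv, plaintext);
        out.write(iv);
        out.writeInt(ciphertext.length);
        out.write(ciphertext);
    }

    /** Same shape as Writable.readFields: reads the record back and decrypts it. */
    public void readFields(DataInput in) throws IOException {
        byte[] iv = new byte[IV_BYTES];
        in.readFully(iv);
        byte[] ciphertext = new byte[in.readInt()];
        in.readFully(ciphertext);
        plaintext = crypt(Cipher.DECRYPT_MODE, iv, ciphertext);
    }

    private byte[] crypt(int mode, byte[] iv, byte[] input) throws IOException {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IOException("encryption or decryption failed", e);
        }
    }

    /** Serializes a value, reads it back into a new instance, returns the text. */
    public static String roundTrip(byte[] rawAesKey, String text) {
        try {
            EncryptedValueSketch source = new EncryptedValueSketch(rawAesKey);
            source.set(text);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            EncryptedValueSketch copy = new EncryptedValueSketch(rawAesKey);
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] demoKey = new byte[16]; // all-zero demo key, never use in practice
        System.out.println(roundTrip(demoKey, "record-42"));
    }
}
```

Because each `write` call in this sketch uses a fresh random IV, the same plaintext serializes to different bytes every time, so anything that compares or sorts the serialized form will not see plaintext order.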

The simplest case of this that I can think of is to use the chain mapping (ChainMapper) from Hadoop 0.19 on the map side, where the first map in the chain handles decrypting the input blocks, parses out the key/value pairs, and feeds them to the second map in the chain.
The reduce phase is not straightforward, however, as the decrypted records would not be available for combining or ordering.

If you are willing to have the data unencrypted while in process, then the problem is much simpler, as a custom InputFormat/RecordReader and OutputFormat/RecordWriter can handle this.
The map output data would be stored unencrypted on the local task tracker nodes and passed to the reducer via HTTP.
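
As a rough sketch of the decrypt-on-read idea behind a custom record reader, the code below wraps an encrypted byte stream in a `CipherInputStream` and tokenizes the decrypted text into line records. It is an illustration under assumptions, not a Hadoop `RecordReader`: the class name `DecryptingLineReaderSketch` and its methods are hypothetical, it uses AES-CTR from the JDK instead of triple DES, and it ignores input splits, which a real record reader would have to handle.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: decrypts a stream on read and splits it into lines. */
public class DecryptingLineReaderSketch {

    private static Cipher cipher(int mode, byte[] rawAesKey, byte[] iv) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, new SecretKeySpec(rawAesKey, "AES"), new IvParameterSpec(iv));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher setup failed", e);
        }
    }

    /** Produces the encrypted bytes that would sit in the stored file. */
    public static byte[] encrypt(byte[] rawAesKey, byte[] iv, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new CipherOutputStream(
                sink, cipher(Cipher.ENCRYPT_MODE, rawAesKey, iv))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decrypts while reading and returns one record per line of plaintext. */
    public static List<String> readRecords(byte[] rawAesKey, byte[] iv, InputStream encrypted) {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new CipherInputStream(encrypted, cipher(Cipher.DECRYPT_MODE, rawAesKey, iv)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(line); // downstream code sees only plaintext records
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] demoKey = new byte[16]; // all-zero demo key, never use in practice
        byte[] demoIv = new byte[16];  // fixed demo IV; reusing an IV with CTR is unsafe
        byte[] stored = encrypt(demoKey, demoIv, "k1\tv1\nk2\tv2\n");
        System.out.println(readRecords(demoKey, demoIv, new ByteArrayInputStream(stored)));
    }
}
```

In this arrangement the map logic never touches ciphertext, which matches the trade-off of keeping data unencrypted while it is being processed.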
At 8:14am on November 9, 2009, Jason Venner said…
Hadoop is a wonderful tool for working with large datasets, particularly when you fold in tools from the Mahout project for various data analysis steps.
When you have multi-terabyte data sets, Hadoop is your best friend.
© 2012   Created by Jason Venner.
