CC Open Source Blog

Summer of Code Project: "Including RDFa support in Nutch: Updating the ccNutch plug-in"

by alankelon on 2007-04-13

Hello!

I'm one of the students selected for Google Summer of Code 2007, and I'm pleased to be joining the Creative Commons community this summer. My project is titled "Including RDFa support in Nutch: Updating the ccNutch plug-in", and my mentor is Nathan R. Yergler. The abstract (with the hyperlinks that are missing from the SoC page) is:

RDFa is an emerging standard from the W3C. It provides a syntax for expressing the semantics of structured data, using a set of elements and attributes that embed RDF in HTML, such as the license on a document or a photo's creator name and camera settings.

Nutch is an open source search engine that uses Lucene for searching the Web (or a subset of it) or in a customized form for an intranet. ccNutch is a plug-in for Nutch to search Creative Commons content. Currently, ccNutch indexes only text documents and does not support RDFa very well.

Including RDFa in ccNutch will be a great improvement for the advancement of the semantic web, because we could easily index images, audio and video contained in web pages through their RDFa metadata and then search them. In this way, we will increase the range of searchable artifacts available under Creative Commons licenses, which is well worth trying.
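
To make the idea of indexing media through license metadata a little more concrete, here is a minimal sketch in Python of pulling license statements out of a page. It is not ccNutch code (the plug-in itself lives in the Java-based Nutch code base) and it is far from a full RDFa processor: it only looks at `rel="license"` links, takes the subject from an `about` attribute on the same element, and otherwise falls back to the page URL. The names `LicenseLinkParser`, `extract_license_triples`, `SAMPLE_PAGE` and the example URLs are all hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect (subject, "license", license_url) triples from rel="license" links.

    Simplification (an assumption of this sketch): the subject comes only from
    an `about` attribute on the same element; real RDFa also inherits the
    subject from enclosing elements, which is not handled here.
    """

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="license nofollow"
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            subject = attributes.get("about") or self.page_url
            self.triples.append((subject, "license", href))


def extract_license_triples(html, page_url):
    """Return every license triple found in the given HTML string."""
    parser = LicenseLinkParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.triples


# Hypothetical page: one license for a photo, one for the page itself.
SAMPLE_PAGE = """
<div>
  <img src="sunset.jpg" alt="Sunset">
  <a about="sunset.jpg" rel="license"
     href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>
  <a rel="license"
     href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA</a>
</div>
"""

for triple in extract_license_triples(SAMPLE_PAGE, "http://example.org/page.html"):
    print(triple)
```

An indexer could store each triple's subject and license URL as searchable fields, so that a photo such as `sunset.jpg` becomes findable by its license even though it contains no text.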

My first step is to update ccNutch with the source code from the Lucene repository. Then I'll start writing the Requirements and Architecture documents to define precisely what I'll do. To do so, I'm going to study the ccNutch and Nutch code bases more deeply, as well as the RDFa standard. After that, I'll write the Project Plan document to define our schedule and milestones and to make a risk assessment.

Now, let me introduce myself: my name is Alan Kelon, I'm 23 years old and I live in Recife, Brazil. I am a first-year Ph.D. student in Computer Science (in Portuguese) at the Informatics Center (in Portuguese), Federal University of Pernambuco (in Portuguese), a.k.a. CIn/UFPE. The university and my house are very close to the Ricardo Brennand Institute :-) I also hold an M.Sc. degree in Computer Science from the Federal University of Pernambuco (2005-2007), with a thesis entitled "A Software Process Proposal to Open Source Software Factories", and a B.Sc. in Computer Science from the Federal University of Paraíba (2005). In 2006, I was a teaching assistant in a graduate-level Software Engineering class entitled "Software Engineering: Building Open Source Software Factories". This year's edition of the course starts at the end of this month and I'll be a lecturer again. This year, I also lectured in an undergraduate-level class entitled "Advanced Topics in Software Engineering: Open Source Software".

I have been involved with free software since my undergrad studies. My first contact was building and maintaining a Beowulf Linux cluster and developing a high-availability system, from 10/2002 to 01/2005. In the past, I was also involved with Debian in my local community, played with AndroMDA in the very early stages of OpenERP, developed VENSSO CRM (in Portuguese), and mentored/founded GVS and Telescope. The last of these is my active research project, as part of my Ph.D. Finally, I lead the research group on Open Source and Distributed Software Development at the Informatics Center, Federal University of Pernambuco, and C.E.S.A.R (Recife Center for Advanced Studies and Systems), in strong collaboration with the local software industry, where I have the opportunity to advocate the open source development model and philosophy.

All in all: "Talk is cheap, show me the code". Let's do it now ;-)