DiscoverEd Code Sprint: Day 3

by akozak on 2010-06-18 AgShare code DiscoverEd nutch

Day 1 and Day 2 of the DiscoverEd code sprint) turned out to be very productive, and the third and final day didn't disappoint either. All teams were able to complete or make significant contributions to useful new features.

Our team that originally developed support for arbitrary metadata and a plugin architecture to pull in external data in Day 1 and 2 switched to working on integrating branches with the main DiscoverEd source tree. They spent some time fixing bugs in new code that connects the RDFa parser with the Jena store, so that it could be merged with other branches in the code repository, including their work on the metadata plugins, the master branch, and the provenance work previously begun by Creative Commons. After merging those branches, they also took some time to update their work from Day 1 and Day 2 to store the provenance of the metadata. Towards the end of the day they began working on support for running all the DiscoverEd tests with a script. After today they plan on doing some housekeeping work, namely creating documentation about new plugin support in the CC wiki.

The "flexible query interface" team continued to debug problems with DiscoverEd. They had identified the bugs encountered in Days 1 and 2, so they worked on transitioning away from an old metadata-writing API that they suspect doesn't work with the current version of Nutch. It appears this issue may be related to the upgrade from Nutch 0.8 to 1.1. Going forward, they plan on migrating code to the new API, and testing if it resolves the problems.

The "user generated metadata" team decided to divide and conquer smaller tasks of their project. They mocked up the form interface and began work on building it into a JSP. They were also building tests for the back end of DiscoverEd, which basically test that once you add a tag, you can get the resource ID and the tag back through a search, and that it's been added into the Jena store such that when the Nutch index is run again the tag will be added to the Lucene index. They worked through a lot of merge conflicts, which had stalled development.

Creative Commons thanks AgShare project funders (The Gates Foundation), MSU, vuDAT, MSU Global, and to the participants#Attendees) in the sprint for making all of the contributions to DiscoverEd over the past three days possible.

What makes DiscoverEd exciting is that, while most search engines use algorithmic analyses of resources alone for search, DiscoverEd can incorporate facts and semantic information provided by the publishers or curators, enabling more useful search. Structured data in standardized formats such as RDFa is a powerful way for otherwise unrelated projects and resource curators to cooperate and express facts about their resources in the same way so that third-party tools (like DiscoverEd) can use that data for other purposes (like search and discovery). We look forward to deploying this innovative tool for our AgShare partners to enable search and discovery of educational resources about agriculture and hope that it's found useful in other contexts as well.

Open source

DiscoverEd Code Sprint: Day 3

Tags