GSoC 2019: Project Ideas

This is the project idea list for the Google Summer of Code 2019 program. We have a mix of projects that are meant to be installed widely (such as plugins for other software) and projects that are more focused on improving user experience for users of Creative Commons licenses. Regardless of scope, these projects all have a broad and positive community impact.



Copyright status and public domain information awareness tool

  • Description:

    A tool that promotes awareness of the copyright status and public domain information (e.g. how long a work has been in the public domain) of works, using the Wikidata properties associated with each work.

    This is an open-ended project, and many tools could be built to improve copyright status awareness. We'd like a proposal for a single tool to build. Here are a few ideas:

    • a standalone tool or service that acts as an interface on top of Wikidata that displays works according to their copyright status and allows people to edit the information easily.
    • an improved process for bulk-updating Wikidata with information from other sources (e.g. Metropolitan Museum of Art, Cleveland Museum of Art open APIs)
    • an API that cultural heritage institutions could use to make bulk queries about copyright status.
    • a browser extension that parses and analyzes works from the current website and displays the status of those works.
  • Rationale:

    We'd like public domain resources to be reused and remixed, and a good first step toward that is making people aware of what content is free to use.

  • Resources:
  • Expected result:

    An open-source software project that makes the public domain date and copyright status information of works on Wikidata easier to update or use.

  • Skills recommended: Python, JavaScript
  • Mentors: Sophine Clachar (primary), Kriti Godey (backup)
  • Difficulty: Hard
  • Proposal tag to use: Data Visualization
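
As a starting point for any of the ideas above, here is a minimal sketch of two building blocks: a SPARQL query for works marked as public domain, and a helper that computes how long a work has been in the public domain. The Wikidata identifiers (P6216, Q19652, P3893) are assumptions that should be checked on Wikidata before use, and the function names are hypothetical. The query string could be sent to the Wikidata Query Service; the sketch itself makes no network calls.

```python
from datetime import date

# Assumed Wikidata identifiers; verify them on wikidata.org before relying on them.
COPYRIGHT_STATUS = "P6216"    # property: copyright status
PUBLIC_DOMAIN = "Q19652"      # item: public domain
PUBLIC_DOMAIN_DATE = "P3893"  # property: public domain date


def build_query(limit=10):
    """Return a SPARQL query listing works marked as public domain."""
    return (
        "SELECT ?work ?workLabel ?pdDate WHERE {\n"
        f"  ?work wdt:{COPYRIGHT_STATUS} wd:{PUBLIC_DOMAIN} .\n"
        f"  OPTIONAL {{ ?work wdt:{PUBLIC_DOMAIN_DATE} ?pdDate . }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def years_in_public_domain(pd_date, today=None):
    """Whole years elapsed since pd_date, or 0 if it is still in the future."""
    today = today or date.today()
    if pd_date > today:
        return 0
    years = today.year - pd_date.year
    # Subtract one year if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (pd_date.month, pd_date.day):
        years -= 1
    return years
```

A standalone interface or browser extension could show the result of `years_in_public_domain` next to each work, while a bulk-update tool would write data rather than read it and would need a different approach.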

No-click attribution for CC-licensed content

  • Description:

    Prototype a tool that removes all friction from correct attribution. This could play out in a number of ways, including attaching attribution and related information when an image is downloaded from CC Search (no-click attribution), an attribution filter or plugin service that links attribution in bulk, or a credit that is automatically added by a platform or related service.

    Another route to no-click attribution could be opt-out watermarking and, most importantly, metadata embedding. How can we add CC metadata to MP3 files, or EXIF-like content to photos? Is it possible to encourage advertisers to display a non-intrusive barcode or QR code with the bare work ID? Can the ID become a visual mark for the commons that is interesting enough to display? Imagine a T-shirt showing a unique ID such as CC-ID#1; looking that ID up online would reveal that the visual representation of the ID is itself the licensed work.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for a link to all insights):

    • People are motivated to give credit to other people, but they find attribution complicated and a hassle.
  • Resources:
  • Expected result:
    • A new feature or set of features added to CC Search to make no-click attribution possible. Some high-level ideas are in the project description but the implementation is completely up to you.
  • Skills recommended: JavaScript, Python
  • Mentors: Alden Page (primary), Breno Ferreira (backup)
  • Difficulty: Hard
  • Proposal tag to use: Search
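
To make the idea concrete, here is a minimal sketch of generating a credit line that a download button or plugin could attach automatically. The function names, field names, and wording of the credit are assumptions for illustration, not an existing CC Search feature. The plain-text form could be embedded in file metadata, and the HTML form could be pasted next to an image.

```python
from html import escape


def attribution_text(title, creator, license_name):
    """Plain-text credit line, e.g. for embedding in file metadata."""
    return f'"{title}" by {creator} is licensed under {license_name}'


def attribution_html(title, creator, license_name, work_url, license_url):
    """HTML credit line with links to the work and the license deed."""
    # Escape every value so that titles or names cannot inject markup.
    return (
        f'<a href="{escape(work_url, quote=True)}">{escape(title)}</a>'
        f" by {escape(creator)} is licensed under "
        f'<a href="{escape(license_url, quote=True)}">{escape(license_name)}</a>'
    )
```

For example, `attribution_text("Sunset", "Jane Doe", "CC BY 4.0")` returns `"Sunset" by Jane Doe is licensed under CC BY 4.0`.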

Reward and delight users of CC licenses

  • Description:

    Prototype a small, fun idea that gives reward and delight to users, e.g. a graphic CC mascot overlaid to help users navigate the licensing process.

  • Rationale:

    Addresses all the insights from our user research.

  • Resources:
  • Expected result:

    There is a wide range of acceptable results for this idea. We're looking for an improvement to one of our existing tools, or an entirely new tool, that makes users smile when they work with CC licenses.

  • Skills recommended: JavaScript, Python or WordPress/PHP
  • Mentors: Kriti Godey (primary), Breno Ferreira (backup)
  • Difficulty: Hard
  • Proposal tag to use: Usability

Supercharge our search indexer

  • Description:

    CC Search is a system for searching hundreds of millions (eventually billions) of Creative Commons works. We store all of these documents in a PostgreSQL database. To enable rapid search performance on a dataset of this size, we mirror the documents to Elasticsearch weekly. It takes about 20 hours to index 276 million documents, but the speed could be greatly improved through parallelization across multiple nodes and multithreading. This project represents a great opportunity to learn about the challenges of distributed computing.

  • Rationale:

    Faster indexing allows us to deliver higher quality search results to our users in less time.

  • Resources:
  • Expected result:

    Ideally, distributing the indexing process across 5 nodes should cut the indexing time by 80%, to about 4 hours, compared with about 20 hours for the current single-node, single-threaded implementation.

  • Skills recommended: Python, basic understanding of threads, basic understanding of databases, benchmark-driven mindset.
  • Mentors: Alden Page (primary), Timid Robot Zehta (backup)
  • Difficulty: Hard
  • Proposal tag to use: Search
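
The parallelization idea and the 80% target can be sketched as follows. This is a simplified illustration, not our indexer: `index_range` is a hypothetical stand-in for reading rows from PostgreSQL and bulk-indexing them into Elasticsearch, and threads stand in for what would be separate nodes in a real deployment. The timing helper assumes perfectly linear scaling, which real systems rarely achieve.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total, parts):
    """Split the ID range [0, total) into `parts` contiguous, near-equal ranges."""
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges each take one leftover document.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def index_range(id_range):
    """Hypothetical worker: stands in for reading rows and bulk-indexing them."""
    start, end = id_range
    return end - start  # number of documents "indexed"


def index_all(total, workers):
    """Index every range concurrently and return the total document count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_range, split_ranges(total, workers)))


def ideal_hours(single_node_hours, nodes):
    """Best-case wall-clock time assuming perfectly linear scaling."""
    return single_node_hours / nodes
```

With the figures above, `ideal_hours(20, 5)` gives 4.0 hours, which is the 80% reduction named as the goal. Benchmarks would show how close a real implementation gets once coordination overhead and database read throughput are taken into account.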

Unique IDs for CC-licensed content

  • Description:

    Prototype a CC unique ID registry that links to the CC catalog and provides information about each CC work through the ID, e.g. CC/12345 would display information such as author, number of shares, etc.

  • Rationale:

    Addresses the following insights from our user research (see the Resources section for a link to all insights):

    • People are motivated to give credit to other people, but they find attribution complicated and a hassle.
    • People like seeing how their work is used, where it goes, and who it touches, but have no easy way to find this out. This insight incorporated the following two insights:
      • People care that the work they share resonates with people, especially personally, but can only know this if they are told directly by the person it resonated with.
      • People want their work to have real world or social impact, but their sense of what these impacts are is vague. However, people can identify some real or potential outcomes from sharing their work that they enjoy.
    • People want to share and find good work, but find it difficult to navigate the abundance of content and information online.
  • Resources:
  • Expected result:

    A working prototype of the unique ID registry for CC-licensed content. It should be connected to our existing data indexed for CC Search and there should be a frontend to show data for a given work.

  • Skills recommended: Python, basic understanding of databases
  • Mentors: Alden Page (primary), Sophine Clachar (backup)
  • Difficulty: Hard
  • Proposal tag to use: Search
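
As a rough illustration of what such a registry does, here is a minimal in-memory sketch. The ID format is an assumption based on the CC/12345 example above, and the class, field, and method names are hypothetical; a real prototype would be backed by the data indexed for CC Search rather than a Python dictionary.

```python
import re
from dataclasses import dataclass

ID_PATTERN = re.compile(r"^CC/(\d+)$")  # assumed format, based on the CC/12345 example


@dataclass
class Work:
    title: str
    creator: str
    license_name: str
    share_count: int = 0


class Registry:
    """In-memory stand-in for a registry backed by the CC catalog."""

    def __init__(self):
        self._works = {}
        self._next_number = 1

    def register(self, work):
        """Store a work and return its newly assigned ID."""
        work_id = f"CC/{self._next_number}"
        self._works[work_id] = work
        self._next_number += 1
        return work_id

    def lookup(self, work_id):
        """Return the work for an ID, or None if unknown; reject malformed IDs."""
        if not ID_PATTERN.match(work_id):
            raise ValueError(f"malformed ID: {work_id!r}")
        return self._works.get(work_id)
```

A frontend for the registry would call something like `lookup` with the ID taken from the URL and render the returned fields.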

Your idea here

  • We are open to original ideas for projects that will help increase the utility of CC-licensed content, ease the process for creators applying CC licenses to their content, or improve CC's internal tools or processes. Please talk to us on the #cc-gsoc channel on Slack or via the mailing list to find a mentor for the project before submitting your proposal.