Learn, unlearn, and relearn - Hello NoSQL Databases!

Jun 30, 2015

I started working on Ask GitHub a few months ago to search the latest GitHub public timeline to answer questions and identify basic insights. Along the way I wanted to learn and experiment with new technologies.

The Technical Challenge & Solution

During a 24-hour period, the GitHub public timeline includes on average 500K commits, 25K new repositories, 30K starred repositories, and 100K contributors, adding up to more than 1GB of data. Storing and accessing this data is vital. What are my options?

After some initial trials with different databases, my options were clear: MongoDB & Neo4j. The driving factors for implementing a NoSQL database were that it is big data ready, cluster friendly, and flexible in schema, that it makes storing, processing, and managing large streams of non-transactional data simple, and that it has an active user community. By taking advantage of the strengths of the different NoSQL data storage solutions, Ask GitHub searches thousands of documents, builds relations, and offers visitors interesting perspectives.

Ask GitHub is driven by Compose.io (MongoDB hosting) & GrapheneDB (Neo4j hosting)

Shift in Mindset

Learning to code for a NoSQL database requires a shift in mindset: unlearn the traditional relational databases driven by several tables, keys, and joins, and relearn how to store data as documents and write aggregation operations to process the data for MongoDB, and how to create nodes and build relations between nodes for Neo4j. It took me a few days to unlearn and relearn.

MongoDB

Aggregate operation to find trending repositories based on stargazers. Code in Python.

  • Events are matched on type WatchEvent
  • Result is grouped by full_name and counted as stars
  • Result is sorted by stars in descending order
  • Result is then limited to 10
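
The four steps above can be sketched as a MongoDB aggregation pipeline. This is a minimal illustration, not necessarily the exact code behind Ask GitHub: the function names, the `$full_name` field path, and the pymongo-style `collection` argument are assumptions, and the field path depends on how the timeline events are actually stored.

```python
def trending_pipeline(limit=10):
    """Build an aggregation pipeline for the four steps listed above."""
    return [
        # Step 1: keep only star events (GitHub records a star as a WatchEvent).
        {"$match": {"type": "WatchEvent"}},
        # Step 2: one group per repository, adding 1 per event to count stars.
        # "$full_name" is an assumed field path; adjust it to the stored documents.
        {"$group": {"_id": "$full_name", "stars": {"$sum": 1}}},
        # Step 3: most-starred repositories first.
        {"$sort": {"stars": -1}},
        # Step 4: keep the top `limit` results.
        {"$limit": limit},
    ]


def trending_repositories(collection, limit=10):
    """Run the pipeline on a pymongo-style collection and return a list of dicts."""
    return list(collection.aggregate(trending_pipeline(limit)))


# Possible usage with pymongo (connection string, database and collection
# names are placeholders):
#   from pymongo import MongoClient
#   events = MongoClient("mongodb://localhost:27017")["askgithub"]["events"]
#   for repo in trending_repositories(events):
#       print(repo["_id"], repo["stars"])
```

Keeping the pipeline in its own function makes it possible to inspect or reuse the stages without a database connection.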

Do you see the simplicity and power of the aggregation operation?

Neo4j

Cypher to count repositories that have a relation with an organization. Code in Python.

Here the Cypher query matches nodes of type Repository that have a relation with nodes of type Organization and counts them.
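
A query of this shape might look like the sketch below. It is an illustration under assumptions rather than the exact Ask GitHub code: the constant and function names are hypothetical, the relation is matched without a type or direction because the text names none, and `DISTINCT` is my addition so that a repository linked to an organization more than once is counted once. The `session` argument is assumed to behave like a session from the official `neo4j` Python driver.

```python
# Match Repository nodes that have any relation to an Organization node,
# then count them. DISTINCT avoids counting a repository twice.
COUNT_ORG_REPOSITORIES = (
    "MATCH (r:Repository)--(o:Organization) "
    "RETURN count(DISTINCT r) AS repositories"
)


def count_org_repositories(session):
    """Run the Cypher query on a driver-style session and return the count."""
    record = session.run(COUNT_ORG_REPOSITORIES).single()
    return record["repositories"]


# Possible usage with the neo4j driver (URI and credentials are placeholders):
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   with driver.session() as session:
#       print(count_org_repositories(session))
#   driver.close()
```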

Do you see the simplicity and power of Cypher?

Tags

  • nosql
  • mongodb
  • neo4j
Harish Chakravarthy is an intrapreneur leveraging technology to make a positive difference. Interests include API integration, user experience, data visualization and analytics.
