Learn, unlearn, and relearn - Hello NoSQL Databases!
I started working on Ask GitHub a few months ago to search the latest GitHub public timeline, answer questions, and identify basic insights. Along the way I wanted to learn and experiment with new technologies.
The Technical Challenge & Solution
During a 24-hour period the GitHub public timeline includes, on average, 500K commits, 25K new repositories, 30K starred repositories, and 100K contributors, adding up to more than 1GB of data. Storing and accessing this data is vital. What are my options?
After some initial trials with different databases, my options were clear: MongoDB and Neo4j. Readiness for big data, cluster friendliness, a flexible schema, simplicity in storing, processing, and managing large streams of non-transactional data, and an active user community were the driving factors for choosing NoSQL databases. By taking advantage of the strengths that different NoSQL data stores provide, Ask GitHub searches thousands of documents, builds relations, and offers visitors interesting perspectives.
Ask GitHub is driven by Compose.io (MongoDB hosting) & GrapheneDB (Neo4j hosting)
Shift in Mindset
Learning to code for NoSQL databases requires a shift in mindset: unlearn traditional relational databases driven by tables, keys, and joins, and relearn how to work with each store. For MongoDB that means storing data as documents and writing aggregation operations to process them; for Neo4j it means creating nodes and building relations between them. It took me a few days to unlearn and relearn.
MongoDB
Aggregate operation to find trending repositories based on stargazers. Code in Python.
- Events are matched for type WatchEvent
- Result is grouped by full_name and counted as stars
- Result is sorted by stars in descending order
- Result is then limited to 10
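The steps above can be sketched as a MongoDB aggregation pipeline. This is a minimal illustration, not the exact code behind Ask GitHub: it assumes each event document has a top-level `type` field and a top-level `full_name` field, and the helper names `build_trending_pipeline` and `trending_repositories` are hypothetical. The `collection` argument is assumed to be a PyMongo collection (anything with an `aggregate` method will do).

```python
def build_trending_pipeline(limit=10):
    """Build the pipeline: match, group, sort, limit."""
    return [
        # Keep only star events (GitHub reports a star as a WatchEvent).
        {"$match": {"type": "WatchEvent"}},
        # Group by repository name and count one star per event.
        # Assumption: the repository name is stored in a top-level "full_name" field.
        {"$group": {"_id": "$full_name", "stars": {"$sum": 1}}},
        # Most-starred repositories first.
        {"$sort": {"stars": -1}},
        # Keep only the top results.
        {"$limit": limit},
    ]


def trending_repositories(collection, limit=10):
    """Run the pipeline on a collection and return the results as a list."""
    return list(collection.aggregate(build_trending_pipeline(limit)))
```

With a PyMongo client this might be called as `trending_repositories(client["askgithub"]["events"])`, where the database and collection names are placeholders. Each result is a document whose `_id` holds the repository name and whose `stars` field holds the count.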
Do you see the simplicity and power of the aggregation operation?
Neo4j
Cypher to count repositories that have a relation with an organization. Code in Python.
Here the Cypher query matches nodes of type Repository that have a relation with nodes of type Organization and counts them.
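A query along those lines might look like the sketch below. It is an illustration under assumptions rather than the original code: the relationship type is not given in the text, so the pattern matches any relationship in either direction, and `DISTINCT` is added so a repository linked to several organizations is counted once. The helper name `count_org_repositories` is hypothetical, and `session` is assumed to be a session from the official Neo4j Python driver (any object whose `run(query)` result offers `single()` will do).

```python
# Cypher: match Repository nodes related to an Organization node and count them.
# Assumption: any relationship type, in either direction, qualifies.
COUNT_ORG_REPOSITORIES = (
    "MATCH (r:Repository)--(:Organization) "
    "RETURN count(DISTINCT r) AS repositories"
)


def count_org_repositories(session):
    """Run the count query and return the number of matching repositories."""
    # An aggregate-only RETURN yields exactly one row, so single() is safe here.
    record = session.run(COUNT_ORG_REPOSITORIES).single()
    return record["repositories"]
```

The function takes the session as a parameter so connection details (URI, credentials) stay outside the query logic.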
Do you see the simplicity and power of Cypher?