Simple recommendation engine using Neo4j
This project builds a simple recommendation engine that recommends repositories based on shared contributors and organizations.
The public GitHub timeline from GitHub Archive is parsed hourly using a [node.js streaming parser](https://github.com/harishvc/githubanalytics/blob/master/bin/FetchParseGitHubArchive.js).
Currently two event types, PushEvent and WatchEvent, are captured.
PushEvent contains information about commits pushed to a repository and their authors.
WatchEvent contains information about popular repositories. All the data is first stored in MongoDB. The data stored in MongoDB is then
processed by [Neo4jSync.py](https://github.com/harishvc/githubanalytics/blob/master/bin/Neo4jSync.py) to generate CSV files, which are imported into Neo4j (hosted on GrapheneDB).
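The capture and export steps described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual parser (which is written in node.js) or its Neo4jSync.py script: it skips MongoDB entirely, assumes each archive line is a JSON event with `type`, `repo.name` and an optional `org.login` field, and uses hypothetical CSV column names.

```python
import csv
import io
import json

# Event types kept by the pipeline, as described in the text.
CAPTURED_TYPES = {"PushEvent", "WatchEvent"}


def filter_events(lines):
    """Yield decoded events whose type is one of CAPTURED_TYPES."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") in CAPTURED_TYPES:
            yield event


def to_csv(events):
    """Write one CSV row per event: repository, organization, event type."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Column names are hypothetical; the real CSV layout is not documented here.
    writer.writerow(["repository", "organization", "type"])
    for event in events:
        repo = event.get("repo", {}).get("name", "")
        org = event.get("org", {}).get("login", "")  # empty for personal repositories
        writer.writerow([repo, org, event["type"]])
    return out.getvalue()


# Made-up sample lines standing in for one hour of archive data.
sample = [
    '{"type": "PushEvent", "repo": {"name": "example-org/example-repo"}, "org": {"login": "example-org"}}',
    '{"type": "ForkEvent", "repo": {"name": "someone/other-repo"}}',
    '{"type": "WatchEvent", "repo": {"name": "someone/other-repo"}}',
]

print(to_csv(filter_events(sample)))
```

The ForkEvent line is dropped, and the WatchEvent row has an empty organization column because the sample repository does not belong to one.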
Currently there are three types of nodes:
Repository node contains information about the repository and when the node was created.
Organization node contains information about the organization a specific repository belongs to and when the node was created.
People node contains information about contributors (their email addresses) and when the node was created.
An IN_ORGANIZATION relationship exists between a Repository node and an Organization node.
An IS_ACTOR relationship exists between a Repository node and a People node. There can be more than one person contributing to a repository.
Nodes & Relationships model developed using YUML
Simple Recommendation Engine
A Cypher query finds similar repositories based on contributors and organizations.
For edx/edx-platform, the query finds all repositories that share an IS_ACTOR or IN_ORGANIZATION relationship with it.
The result is then sorted by number of connections in descending order.
For example, edx/edx-platform and amir-qayyum-arbisoft/edx-platform share 25 contributors. Isn’t it amazing!
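A minimal sketch of what such a query might look like follows, together with the same logic in plain Python so it can be tried without a database. It should not be read as the project's actual query: the `name` property and the `$name` parameter are assumptions, the patterns are undirected because the relationship directions are not stated, and the sample repositories and email addresses are made up.

```python
from collections import Counter

# Hypothetical Cypher: the `name` property and the $name parameter are assumptions.
# Patterns are undirected because the relationship directions are not stated.
SIMILAR_REPOSITORIES = """
MATCH (r:Repository {name: $name})-[:IS_ACTOR|IN_ORGANIZATION]-(shared)
      -[:IS_ACTOR|IN_ORGANIZATION]-(other:Repository)
WHERE other <> r
RETURN other.name AS repository, count(DISTINCT shared) AS connections
ORDER BY connections DESC
"""


def recommend(target, links):
    """Rank other repositories by how many people or organizations they share with target.

    links maps a repository name to the set of People and Organization
    nodes it is connected to.
    """
    connections = Counter()
    for repo, nodes in links.items():
        if repo == target:
            continue
        common = len(links[target] & nodes)
        if common:
            connections[repo] = common
    # most_common() sorts by count in descending order, like ORDER BY ... DESC.
    return connections.most_common()


# Made-up graph: two contributors and one organization around a target repository.
links = {
    "org-a/platform": {"person:a@example.com", "person:b@example.com", "org:org-a"},
    "user-x/platform": {"person:a@example.com", "person:b@example.com"},
    "org-a/tools": {"org:org-a"},
    "user-y/unrelated": {"person:z@example.com"},
}

print(recommend("org-a/platform", links))
# [('user-x/platform', 2), ('org-a/tools', 1)]
```

Counting distinct shared nodes gives contributors and organizations equal weight; weighting them differently would be one way to tune the ranking.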
Additional types of nodes can be created to improve the recommendations. The Cypher query could also be further optimized.