Launching Ask GitHub!
Ask GitHub searches the latest GitHub public timeline to answer questions and provide insights. Ask GitHub is hosted on Heroku, and the technology stack includes the languages Python and Node.js, the NoSQL databases MongoDB and Neo4j, the Flask web framework, the Bootstrap front-end framework, Typeahead integration, and scalable vector icons from Font Awesome. Visit Ask GitHub.
Background
I developed an initial prototype for the Third Annual GitHub Data Challenge. That prototype got me thinking about the awesome potential and amazing data points the GitHub public timeline provides. Iterating further on it, I developed Ask GitHub to search the past 24 hours of the GitHub public timeline to answer questions and provide interesting insights. After working on this project for the past few months (mostly during late evenings and weekends), I am pleased to officially announce the launch of Ask GitHub! The code driving Ask GitHub is available on GitHub, so let’s collaborate!
Data Gathering
The public GitHub timeline from GitHub Archive is parsed hourly using a Node.js streaming parser. Currently the event types PushEvent, CreateEvent and WatchEvent are captured. PushEvent contains information about commits and authors. CreateEvent contains new repositories. WatchEvent contains information about popular repositories.
Output log of data gathered hourly
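The filtering step described above can be sketched in a few lines. Ask GitHub itself uses a Node.js streaming parser; the version below is only a Python illustration, and it assumes each hourly archive is a gzipped file with one JSON event per line. The function names and the sample events are hypothetical.

```python
import gzip
import io
import json

# Event types the post says are captured.
CAPTURED_TYPES = {"PushEvent", "CreateEvent", "WatchEvent"}


def filter_events(stream, wanted=CAPTURED_TYPES):
    """Yield only the events whose type is in `wanted`, one JSON line at a time."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") in wanted:
            yield event


def read_hourly_archive(raw_bytes):
    """Decompress one hourly archive (assumed gzip + JSON lines) and keep captured events."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8") as stream:
        return list(filter_events(stream))


# Hypothetical sample standing in for one downloaded hourly file.
sample_events = [
    {"type": "PushEvent", "repo": {"name": "example/alpha"}},
    {"type": "IssuesEvent", "repo": {"name": "example/beta"}},
    {"type": "CreateEvent", "repo": {"name": "example/gamma"}},
    {"type": "WatchEvent", "repo": {"name": "example/alpha"}},
]
sample_archive = gzip.compress(
    "\n".join(json.dumps(event) for event in sample_events).encode("utf-8")
)

captured = read_hourly_archive(sample_archive)
captured_types = [event["type"] for event in captured]
print(captured_types)
```

Reading the archive line by line keeps memory use flat no matter how large the hourly file is, which is the same reason a streaming parser is attractive on the Node.js side.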
Data Storage
Data is stored in MongoDB hosted on Compose. A text index is set on the field full_name, and search results are sorted by the dynamically generated document score.
Documents older than 24 hours are deleted.
Search aggregation pipeline using text index score
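As a rough sketch of what such a query might look like, the snippet below builds a text-search aggregation pipeline and a 24-hour expiry filter as plain Python dictionaries, so it runs without a database. The timestamp field name created_at, the function names, and the result limit are assumptions for illustration, not taken from the Ask GitHub code.

```python
from datetime import datetime, timedelta, timezone


def build_search_pipeline(query, limit=10):
    """Build an aggregation pipeline that ranks text-index matches by score."""
    return [
        # Requires a text index on the searched field (full_name in the post).
        {"$match": {"$text": {"$search": query}}},
        # Sort by the relevance score MongoDB computes for each match.
        {"$sort": {"score": {"$meta": "textScore"}}},
        {"$limit": limit},
    ]


def build_expiry_filter(now, max_age_hours=24, field="created_at"):
    """Build a filter matching documents older than max_age_hours.

    `field` is a hypothetical timestamp field name.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return {field: {"$lt": cutoff}}


pipeline = build_search_pipeline("bootstrap", limit=5)
expiry_filter = build_expiry_filter(datetime(2014, 8, 2, 12, 0, tzinfo=timezone.utc))

# With pymongo these could be used roughly as:
#   collection.create_index([("full_name", "text")])
#   results = collection.aggregate(pipeline)
#   collection.delete_many(expiry_filter)
print(pipeline)
print(expiry_filter)
```

Keeping the pipeline and the filter as plain data makes them easy to inspect and unit test before they are sent to the database.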
Web Framework
Flask is a lightweight web application framework written in Python. It serves Ask GitHub using responsive design templates built with Bootstrap. Bootstrap offers a highly customizable grid system that works across different devices, numerous built-in classes for styling, and an extensive list of components.
Bootstrap badges & accordion
User Experience
Twitter’s Typeahead is integrated into the search box to provide a list of pre-defined questions, and matching repositories are dynamically generated to guide users. Scalable vector icons from Font Awesome are used in addition to text to share interesting data points. The FuzzyWuzzy string comparison library is used to provide suggestions and "did you mean" queries.
Example 1: Questions?
Example 2: Top new repositories
Example 3: User commit frequency
Example 4: Suggestion
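To give a feel for how a "did you mean" suggestion can work, here is a minimal sketch. It deliberately uses difflib from the Python standard library as a stand-in for FuzzyWuzzy, so the scoring differs from what Ask GitHub actually does. The function name, the similarity cutoff, and the repository names are hypothetical.

```python
import difflib


def suggest(query, candidates, limit=3, cutoff=0.6):
    """Return up to `limit` candidates that closely resemble `query`.

    Matching is case-insensitive; `cutoff` is an assumed similarity
    threshold between 0 and 1.
    """
    # Map lowercased names back to their original spelling.
    lowered = {candidate.lower(): candidate for candidate in candidates}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[match] for match in matches]


# Hypothetical list of known repository names.
known_repositories = ["twbs/bootstrap", "mongodb/mongo", "neo4j/neo4j"]

# A misspelled query still finds the closest known repository.
suggestions = suggest("twbs/bootstrp", known_repositories)
print(suggestions)
```

An empty result means nothing was similar enough, in which case the interface could fall back to the list of pre-defined questions.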
Roadmap
Automate the recommendation engine using Neo4j and keep iterating.
Tags
- github
- analytics
- bootstrap
- typeahead
- mongodb
- Neo4j