Launching Ask GitHub!

Feb 15, 2015

Ask GitHub

Ask GitHub searches the latest GitHub public timeline to answer questions and provide insights. It is hosted on Heroku, and the technology stack includes Python and Node.js, the NoSQL databases MongoDB and Neo4j, the Flask web framework, the Bootstrap front-end framework, Typeahead integration, and scalable vector icons from Font Awesome. Visit Ask GitHub.

Background

I developed an initial prototype for the Third Annual GitHub Data Challenge. That prototype got me thinking about the awesome potential and amazing data points the GitHub public timeline provides. Iterating further on it, I developed Ask GitHub to search the past 24 hours of the GitHub public timeline to answer questions and provide interesting insights. After working on this project for the past few months (mostly during late evenings and weekends), I am pleased to officially announce the launch of Ask GitHub! The code driving Ask GitHub is available on GitHub, so let’s collaborate!

Data Gathering

The public GitHub timeline from GitHub Archive is parsed hourly using a Node.js streaming parser. Currently the event types PushEvent, CreateEvent and WatchEvent are captured. PushEvent contains information about commits and authors. CreateEvent covers newly created repositories. WatchEvent is recorded when a user stars a repository, which makes it a signal for popular repositories.

Output log of data gathered hourly
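The filtering step described above can be sketched in a few lines. Ask GitHub itself uses a Node.js streaming parser, so the Python below is only an illustration, not the project's code. It assumes each hourly archive is a gzip file with one JSON event per line and a top-level type field. The function names filter_events and read_hourly_archive are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator

# Event types the post says are currently captured.
CAPTURED_TYPES = {"PushEvent", "CreateEvent", "WatchEvent"}


def filter_events(lines: Iterable[str], wanted=CAPTURED_TYPES) -> Iterator[dict]:
    """Yield only the events whose type is in `wanted` (hypothetical helper).

    Assumes one JSON object per line; blank or malformed lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("type") in wanted:
            yield event


def read_hourly_archive(path: str) -> Iterator[dict]:
    """Stream one downloaded hourly archive file without loading it all in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from filter_events(handle)
```

Because both functions are generators, events are processed one at a time, which is the same reason a streaming parser suits hourly files of this size.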

Data Storage

Data is stored in MongoDB hosted on Compose. A text index is set on the field full_name, and search results are sorted by the dynamically generated document score. Documents older than 24 hours are deleted.

Search aggregation pipeline using text index score
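As a rough sketch of how such a search and the 24-hour cleanup could be expressed, the helpers below build plain MongoDB query documents in Python. This is not necessarily the pipeline Ask GitHub runs. The helper names, the limit parameter and the created_at timestamp field are assumptions made for illustration; only full_name and the text score come from the description above.

```python
from datetime import datetime, timedelta, timezone


def build_search_pipeline(query: str, limit: int = 10) -> list:
    """Aggregation stages for a text search ranked by relevance (hypothetical helper).

    Requires a text index on full_name; $text must appear in the first stage.
    """
    return [
        {"$match": {"$text": {"$search": query}}},
        # Sort by the score MongoDB computes per document for this query.
        {"$sort": {"score": {"$meta": "textScore"}}},
        {"$limit": limit},
    ]


def build_expiry_filter(now: datetime, max_age_hours: int = 24) -> dict:
    """Filter matching documents older than the cutoff.

    Assumes each document stores its timestamp in a field named created_at.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return {"created_at": {"$lt": cutoff}}
```

With a driver such as PyMongo, the first result could be passed to a collection's aggregate method and the second to delete_many. Keeping the query documents as plain dictionaries makes them easy to inspect and test without a running database.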

Web Framework

Flask, a lightweight web application framework written in Python, serves Ask GitHub using responsive design templates built with Bootstrap. Bootstrap offers a highly customizable grid system that works across different devices, numerous built-in classes for styling, and an extensive list of components.

Bootstrap badges & accordion

repository information

User Experience

Twitter’s Typeahead is integrated into the search box to provide a list of pre-defined questions, and matching repositories are dynamically generated to guide users. Scalable vector icons from Font Awesome are used in addition to text to share interesting data points. The FuzzyWuzzy string comparison library is used to provide suggestions and "did you mean" queries.
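To give a feel for how a "did you mean" suggestion works, here is a minimal sketch. It uses difflib from the Python standard library rather than FuzzyWuzzy, so it shows the general idea of similarity-based matching, not the project's actual code. The function name did_you_mean, the cutoff value and the sample names are assumptions.

```python
import difflib


def did_you_mean(query, known_names, cutoff=0.6, max_results=3):
    """Return known names that closely resemble `query`, best match first.

    Comparison is case-insensitive; `cutoff` is a similarity ratio in [0, 1].
    """
    # Map lowercase forms back to the original spelling for display.
    lowered = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=max_results, cutoff=cutoff
    )
    return [lowered[match] for match in matches]


if __name__ == "__main__":
    names = ["bootstrap", "flask", "neo4j", "mongodb"]
    print(did_you_mean("bootstap", names))
```

Raising the cutoff makes suggestions stricter, while lowering it returns more, possibly less relevant, candidates.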

Example 1: Questions?

pre-defined questions

Example 2: Top new repositories


Example 3: User commit frequency


Example 4: Suggestion


Roadmap

Automate the recommendation engine using Neo4j and keep iterating.


Tags

  • github
  • analytics
  • bootstrap
  • typeahead
  • mongodb
  • Neo4j
Harish Chakravarthy is an intrapreneur leveraging technology to make a positive difference. Interests include API integration, user experience, data visualization and analytics.
