Using seed to explore APIs
An overview of what we’re working on and how to explore APIs
- tags
- happy_seed
- rails
- github
- rake
I’ve been working to update seed, HappyFunCorp’s app generator that makes it easy to kick off MVPs. Check out the website for more information. One of the things that I’ve started to do is to separate out the dependencies more, and to begin writing tutorials on how to use each of the different features. After I link to that stuff, let’s walk through a way to combine the different techniques we’ve discussed.
- Check out the site
- Or watch the video:
The basic process
- Get access to the API.
- Open up a console and start playing with commands.
- Start storing the data locally once we need to get more.
- Play around with the data.
- Design a UI around it.
- Refactor the test “scripts” into Ruby classes that fit that UI.
Using Seed, OAuth, and Rake to explore github data
What I want to do is to look at all of the Gemfiles for all of our projects, and see which gems are the most popular, which versions we are using, and if we can develop some more expertise around them. But first, I want to get the data.
I could go to each of the repos in github, but there are well over a hundred so that doesn’t work. Github has an API, which is great, but I need oauth2 to access it. Let’s get going.
Creating a github application
The first step is to make sure you have a github application.
- Go to developer applications
- Click Register New Application
- Enter http://localhost:3000/users/auth/github/callback as the callback URL.
- Fill in whatever else.
- Register the Application
At this point you should have the Client ID and the Client Secret.
Create a seed_defaults file
Once I create a remote app, I like to put the credentials into the ~/.seed_defaults file. That way, the next time I generate a seed app, these are the default credentials that are used.
Generate the app
If you haven’t installed happy_seed yet, do so now by typing gem install happy_seed.
Then generate the app:
Now it’s going to install things. Initially, just say no to everything. Once it’s done, install the github generator:
That will do its thing. Then we need to create the database and get going:
Ask for more scope
When the github generator is run, it configures the oauth scope that it requests in config/initializers/devise.rb. We need to ask for a few more permissions, so open up that file and change the requested scope to "user,repo,read:org", so:
config.omniauth :github, ENV['GITHUB_APP_ID'], ENV['GITHUB_APP_SECRET'], scope: "user,repo,read:org"
Now let’s start the server and see what’s up:
- Point your browser to http://localhost:3000
- Select “Sign up” from the “Account” menu.
- Press “Sign in with Github”
- You should be bounced to github, where you have to accept.
- You should be back on localhost with the message Successfully authenticated from Github account.
Let’s play
- Get access to the API. (done)
- Open up a console and start playing with commands.
- Start storing the data locally once we need to get more.
- Play around with the data.
- Design a UI around it.
- Refactor the test “scripts” into Ruby classes that fit that UI.
Let’s get in that console and see what we can do. If you look at app/models/user.rb you can see that there’s code to set up the github client, and we can access it like so:
Ok, now we can start figuring out what we need to do to get access to the data. We have an authenticated user account, and we can start hitting the API. I know for a fact that I have way more than 30 repos – I mean, seriously – so first thing is to figure out why that is and how to get more. It’s probably related to pagination.
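If pagination is new to you, here’s a tiny simulation of what’s probably going on. GitHub’s list endpoints hand back 30 items per page by default, so a single call only ever shows the first page. FakeRepoApi and all_repos are made-up names for this sketch, and nothing here talks to the real API:

```ruby
# Hypothetical in-memory stand-in for a paginated list endpoint.
class FakeRepoApi
  PER_PAGE = 30 # GitHub's default page size for list endpoints

  def initialize(total)
    @repos = Array.new(total) { |i| "repo-#{i + 1}" }
  end

  # Returns a single page of results; pages are numbered from 1.
  def repos(page: 1)
    @repos.each_slice(PER_PAGE).to_a.fetch(page - 1, [])
  end
end

# Keep asking for the next page until an empty one comes back.
def all_repos(api)
  results = []
  page = 1
  loop do
    batch = api.repos(page: page)
    break if batch.empty?

    results.concat(batch)
    page += 1
  end
  results
end
```

With FakeRepoApi.new(125), a plain repos call returns 30 entries, while all_repos walks every page and returns all 125. That walk is the sort of thing a client library can do for you.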
OK, looking through the octokit issues, this can be dealt with by turning auto_paginate: true on when we load up the client. So let’s edit app/models/user.rb to do that:
Back in the console, do reload! and load up our client again. Notice that I’m making this a one-liner: since we’re going to be doing it over and over, it’s nice to be able to use the arrow keys.
OK, that looks better. That will give us a list of all the repos, so now we just need to see how to get the contents of our file, and then we can put it all together.
Looking at this, we see that github returns the contents of the file base64 encoded. I guess that makes sense, so if we want to print it out:
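Here’s a minimal sketch of the decoding step on its own. SAMPLE_RESPONSE is a hand-built stand-in for the API response (the real one comes back from the client as an object rather than a plain hash), and decoded_content is a name I made up for this example. String#unpack1("m") is the core Ruby call that Base64.decode64 wraps, so no require is needed:

```ruby
# Hand-built stand-in for a contents response; only the fields used below.
# Array#pack("m") base64-encodes with line breaks, much like the API output.
SAMPLE_RESPONSE = {
  "name"     => "Gemfile.lock",
  "encoding" => "base64",
  "content"  => ["GEM\n  specs:\n    rake (10.4.2)\n"].pack("m")
}.freeze

# Decode the base64 payload back into the original file text.
def decoded_content(response)
  unless response["encoding"] == "base64"
    raise ArgumentError, "unexpected encoding: #{response["encoding"].inspect}"
  end

  response["content"].unpack1("m")
end

puts decoded_content(SAMPLE_RESPONSE)
```

Checking the encoding field first is a small safety net, so the sketch fails loudly if a response ever arrives in some other form.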
Using Rake to pull the data down
- Get access to the API. (done)
- Open up a console and start playing with commands. (done)
- Start storing the data locally once we need to get more.
- Play around with the data.
- Design a UI around it.
- Refactor the test “scripts” into Ruby classes that fit that UI.
Let’s use rake to start managing the data. We’re going to be using some of the techniques that were outlined in the using rake for dataflow programming and data science post. The first step is to create a lib/tasks/github.rake file to put our tasks in.
Be sure to change the HappyFunCorp to your organization, or use the repos call instead of the organization one.
Now let’s run rake data/projects.json. If you run it a second time, notice that rake returns immediately and doesn’t hit the remote server.
- The file task only runs if the file doesn’t exist.
- Rake::Task["environment"].invoke is a way to ensure that a task has been run without forcing it to run.
- The API calls are from our console experiments.
- Just save it to a file.
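To see the skip-if-it-exists behaviour in isolation, here’s a small sketch that runs as a plain Ruby script. It calls Rake::FileTask.define_task directly, which is what the file method in a Rakefile calls under the hood. fetch_projects, FETCH_LOG and define_projects_task are hypothetical names, and the fake fetch stands in for the real API call:

```ruby
require "rake"
require "json"
require "fileutils"

# Records every time the (pretend) remote API gets hit.
FETCH_LOG = []

# Hypothetical stand-in for the API call from the console experiments.
def fetch_projects
  FETCH_LOG << Time.now
  [{ "name" => "project_stats_demo" }]
end

# Define a file task whose action only runs when the file is missing.
def define_projects_task(path)
  Rake::FileTask.define_task(path) do |t|
    FileUtils.mkdir_p(File.dirname(t.name))
    File.write(t.name, JSON.pretty_generate(fetch_projects))
  end
end
```

Calling define_projects_task("data/projects.json").invoke writes the file and logs one fetch. Rake remembers that a task already ran within a single process, so to mimic a second rake run you call reenable on the task and invoke it again; the file is there, so the fetch count stays at one.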
OK, now let’s loop over everything to load the files that we want. First we define a method that lets us define a task to loop over all the entries in a JSON array, and then we’ll call it with our block, which loads up the contents. (Add this to the end of the github.rake file.)
And run it with rake load_gemfiles. Depending upon how many repos you have, this could take a few seconds. (Also make sure you’ve updated the organization!)
- Define a file task for each output file, that we will invoke at the very end.
- Inside the task, make sure that the environment is loaded.
- Pull down the contents of the Gemfile.lock from the API.
If you run this a second time, notice that it only attempts to load from the files that weren’t loaded before.
For fun, delete the data directory and run the rake task again. BOOM!
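The loop-over-a-JSON-array idea can be sketched without Rails or the API. This is my own rough version, not the exact code from the rake file: define_loop_task is a hypothetical helper, the entries are assumed to carry a "name" key, and the block you pass in stands in for the call that pulls down the Gemfile.lock.

```ruby
require "rake"
require "json"
require "fileutils"

# Define a task that reads a JSON array and builds one output file per entry.
# Each output is its own file task, so files that already exist are skipped.
def define_loop_task(name, json_path, out_dir, &fetch)
  Rake::Task.define_task(name) do
    JSON.parse(File.read(json_path)).each do |entry|
      out = File.join(out_dir, "#{entry.fetch("name")}.lock")
      file_task = Rake::FileTask.define_task(out) do |t|
        FileUtils.mkdir_p(File.dirname(t.name))
        File.write(t.name, fetch.call(entry)) # the block supplies the content
      end
      file_task.invoke
    end
  end
end
```

A call like define_loop_task("load_gemfiles", "data/projects.json", "data/gemfiles") { |entry| ... } returns the task, and invoking it only runs the block for entries whose output file is missing. That is why a second run only retries the files that didn’t load the first time.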
Massaging the data into something usable
- Get access to the API. (done)
- Open up a console and start playing with commands. (done)
- Start storing the data locally once we need to get more. (done)
- Play around with the data.
- Design a UI around it.
- Refactor the test “scripts” into Ruby classes that fit that UI.
OK, now that we have all the data, let’s figure out how to slice and dice it. Let’s just wire together some standard UNIX tools to filter and get some info.
Running rake filter_gemfiles will go through and show only the specific gems that are locked in the Gemfile.lock files. Obviously, filtering the file based on the fact that a line starts with exactly 4 spaces isn’t robust, but it works.
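If you’d rather stay in Ruby than shell out, the same four-space rule is easy to express as a regular expression. This is a sketch of the idea rather than the rake task itself; SPEC_LINE and locked_gems are names I picked for the example:

```ruby
# A locked gem sits on a line indented by exactly four spaces, for example
# "    rake (10.4.2)". Its own dependencies are indented by six, so they
# fail the match because the fifth character is another space.
SPEC_LINE = /\A {4}(\S+) \((.+)\)\z/

# Return [name, version] pairs for every locked gem in a Gemfile.lock.
def locked_gems(lockfile_text)
  lockfile_text.each_line(chomp: true).filter_map do |line|
    match = SPEC_LINE.match(line)
    [match[1], match[2]] if match
  end
end
```

It is just as fragile as the grep version, since it leans entirely on how the lockfile happens to be indented, but it does give you the name and the version as separate values to play with.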
Let’s add a couple of other nifty methods:
I’m going to stop here, but in case you are wondering, the top gems that we use are:
- (82) json
- (81) tzinfo
- (81) i18n
- (81) activesupport
- (79) rack
- (79) multi_json
- (78) sass
- (77) rack-test
- (76) tilt
- (76) mime-types
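A ranking like the one above can also be produced in a few lines of Ruby. This sketch assumes each number is the count of projects whose lockfile includes the gem; gem_popularity and print_top_gems are hypothetical names, and the input is simply an array of lockfile contents as strings:

```ruby
# Count how many lockfiles each gem appears in, most common first.
# Ties are broken alphabetically so the order is stable.
def gem_popularity(lockfiles)
  lockfiles
    .flat_map { |text| text.scan(/^ {4}(\S+) \(/).flatten.uniq }
    .tally
    .sort_by { |name, count| [-count, name] }
end

# Print the top entries in the same "(count) name" shape as the list above.
def print_top_gems(lockfiles, limit = 10)
  gem_popularity(lockfiles).first(limit).each do |name, count|
    puts "(#{count}) #{name}"
  end
end
```

Feeding it the contents of every file the load step saved, for example Dir["data/**/*.lock"].map { |f| File.read(f) } if that is where yours ended up, gives you the whole ranking in one go.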
Repeatable data in 10 minutes
There’s lots of stuff you can do from there, the most likely one being “sending an email and forgetting about it.” But let’s look at what we have.
- The access key isn’t hard coded anywhere. When you come back to this, if it expires, you just reconnect on the website.
- It’s way easier to get access keys this way; only a few oauth providers make this simple. (Twitter does, for example; github doesn’t.)
- There’s a direct process transitioning from ‘playing around’ to automated.
- Loading the data from the remote API is automated and repeatable. If you’ve set up the dependencies correctly, you can run the rake tasks and things magically get up to date.
- If you do want to build a UI around this, you already have a webapp up and running…
Importantly, this is something that you can get up and going with in under 10 minutes, at least if you know how the API works. It takes less than 1 minute to get to the point where you have an authenticated client to the remote service and you can spend time exploring.
That’s one of the reasons I like having seed around to help prototype and explore ideas!
Source code can be found: https://github.com/wschenk/project_stats_demo