howto

Making a JSON api from a CSV file using fly

download, process, serve, update

tags
csv
flyio
ruby
sinatra

I wanted to be able to serve up and process a CSV file on fly, but have the actual processing of that data happen on the server without any sort of fancy database setup. Here's a way to do it using the persistent volumns.

Bookkeeping

Bookworm has ruby 3.1, so lets use that

1
  asdf local ruby 3.1.4

And lets create a quick sintara app:

1
2
    bundle init
    bundle add sqlite3 sinatra puma rerun rackup sinatra-activerecord

config.ru

1
2
3
  require File.expand_path('app', File.dirname(__FILE__))

  run App

app.rb:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
  require 'sinatra/base'
  require 'sqlite3'
  require "sinatra/activerecord"

  class App < Sinatra::Base
    register Sinatra::ActiveRecordExtension
    
    set :database, {adapter: "sqlite3", database: "stations.db"}
    
    get '/' do
      "Hello world"
    end
  end

And then you can run with

1
  rerun bundle exec rackup

And test

1
 curl http://localhost:9292
Hello world

Package the app

Deploy the app

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
  FROM debian:bookworm-slim

  RUN apt-get update
  RUN apt-get install -y ruby ruby-dev \
      build-essential curl sqlite-utils \
      python3-click-default-group

  WORKDIR /app

  RUN gem install bundler:2.3.26

  COPY Gemfile* ./
  RUN bundle install

  COPY * ./

  EXPOSE 8080
  CMD ["bundle", "exec", "rackup", "--host", "0.0.0.0", "--port", "8080"]

Test

1
  docker build . -t test && docker run -it --rm -p 8080:8080 test
1
 curl http://localhost:8080
Hello world

Looks good!

Deploy the app

1
  fly launch --no-deploy --name=chargermap

Inside of the created fly.toml, lets add a section for a persistent volume:

1
2
3
[mounts]
  source="myapp_data"
  destination="/data"

Then we can deploy with

1
  fly deploy

And then test:

1
curl https://chargermap.fly.dev
Hello world

Write the logic

Ok, so now that we have something served up, lets actually write the code.

loader.rb:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
  require 'csv'
  require 'sqlite3'

  class Loader
    def initialize
      @dir = ENV['DB_DIR'] || '.'
    end

    def db; "#{@dir}/db"; end
    def csv; "#{@dir}/csv"; end
    
    def db_exists?; File.exists? db; end
    def csv_exists?; File.exists? csv; end

    def ensure!
      if !db_exists?
        if !csv_exists?
          puts "Downloading csv"
          download_csv
        end
      end

      if !db_exists?
        create_db
      end
    end
    
    def download_csv
      puts "Downloading csv"
      system( "curl https://willschenk.com/alt_fuel_stations.csv -o #{csv}" )
    end

    def create_db
      puts "Creating database"

      system( "sqlite-utils insert #{db} data #{csv} --csv --detect-types" )
    end
  end

  if __FILE__ == $0
    puts "Hello there"

    l = Loader.new
    puts "DB Exists? #{l.db_exists?}"
    puts "CSV Exists? #{l.csv_exists?}"

    l.ensure!

    puts "DB Exists? #{l.db_exists?}"
    puts "CSV Exists? #{l.csv_exists?}"
  end
1
ruby loader.rb
Hello there
DB Exists? false
CSV Exists? false
DB Exists? false
CSV Exists? true

app.rb:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
  require 'sinatra/base'
  require 'sqlite3'
  require "sinatra/activerecord"
  require_relative './loader'

  class Data < ActiveRecord::Base
  end

  class App < Sinatra::Base
    register Sinatra::ActiveRecordExtension
    l = Loader.new
    
    set :database, {adapter: "sqlite3", database: l.db}
    
    get '/' do
      l = Loader.new
      content_type :json

      { db: l.db, csv: l.csv, csv_exists: l.csv_exists?, db_exists: l.db_exists? }.to_json
    end

    get '/stats' do
      content_type :json
      {
        count: Data.count,
        ct: Data.where( "State = ?", "CT" ).count,
        ny: Data.where( "State = ?", "NY" ).count
      }.to_json
    end

    post '/' do
      l = Loader.new

      l.ensure!

      redirect '/'
    end
  end
1
  curl http://localhost:9292 | jq .
{
  "db": "./db",
  "csv": "./csv",
  "csv_exists": true,
  "db_exists": true
}
1
  curl -X POST http://localhost:9292 | jq .
1
  curl http://localhost:9292 | jq .
{
  "csv_exists": true,
  "db_exists": true
}
1
  curl http://localhost:9292/stats | jq .
{
  "count": 73454,
  "ct": 822,
  "ny": 3793
}

Deploy new code

Inside of the fly.toml lets set the DB_DIR to point to our directory, and then deploy this sucker!

1
2
[env]
  DB_DIR="/data"

Then

1
  fly deploy
1
curl https://chargermap.fly.dev | jq .
{
  "db": "/data/db",
  "csv": "/data/csv",
  "csv_exists": false,
  "db_exists": false
}

Now do a post to set things up

1
  curl -X POST https://chargermap.fly.dev

And then

1
  curl https://chargermap.fly.dev/stats | jq .
{
  "count": 73454,
  "ct": 822,
  "ny": 3793
}

We can go to the console and stop the machine, and then it will automatically start itself up again when you hit it!

1
  curl https://chargermap.fly.dev/stats | jq .
{
  "count": 73454,
  "ct": 822,
  "ny": 3793
}

Next steps

When do you want to reload the file? Is it every couple of days? What further transformations do you want to have on the data?

All things to keep playing with.

Previously

labnotes

Deploying puppeteer on fly.io

tags
puppeteer
flyio

Next

labnotes

Address already in use

lsof

tags
network