Download PubChem

Download all of PubChem... in case you need half a terabyte of SDfiles.

Published Feb 16, 2016 in Programming

PubChem offers an FTP site. This script downloads all of the gzipped SDfiles to the current local directory. There is about 58 GB of compressed data, which expands to about 535 GB once unzipped.

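require 'net/ftp'

# NOTE: the host and directory were lost from the original listing. The values
# below are assumptions based on PubChem's public FTP layout; adjust as needed.
Net::FTP.open('ftp.ncbi.nlm.nih.gov') do |ftp|
  ftp.login                       # anonymous login
  ftp.passive = true
  ftp.chdir('pubchem/Compound/CURRENT-Full/SDF')
  files = ftp.list('*')
  total = 0
  # Keep only the gzipped SDfiles from the directory listing.
  sdf_files = files.select { |f| f.match(/\.sdf\.gz$/) }
  sdf_files.each_with_index do |file, index|
    tokens = file.split(/\s+/)
    size = tokens[4].to_i         # fifth field of an `ls -l` style line
    total += size
    filename = tokens.last
    puts "Getting [#{filename}] [#{size}] [#{index + 1} of #{sdf_files.count}]"
    ftp.getbinaryfile(filename, filename, 1024)
  end
  # Total bytes downloaded, then the same figure in GiB.
  puts "#{total} :: #{total.to_f / (1024 * 1024 * 1024)}"
end; nil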
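
The size bookkeeping in the script depends on the shape of the server's listing lines, so it can be worth checking that step on its own before starting a multi-hour download. Here is a minimal sketch of the parsing pulled into a standalone helper, assuming Unix-style `ls -l` output where the fifth whitespace-separated field is the byte count. The name `parse_listing` and the sample lines are hypothetical, made up for illustration, and not real PubChem data.

```ruby
# Hypothetical helper: extract [filename, size_in_bytes] pairs for .sdf.gz
# entries from Unix-style FTP LIST lines (assumes size is the fifth field).
def parse_listing(lines)
  lines.select { |line| line.match?(/\.sdf\.gz$/) }.map do |line|
    tokens = line.split(/\s+/)
    [tokens.last, tokens[4].to_i]
  end
end

# Made-up sample lines, not real PubChem data.
sample = [
  '-r--r--r--   1 ftp      anonymous  1073741824 Feb 01 12:00 example_0001.sdf.gz',
  '-r--r--r--   1 ftp      anonymous  536870912 Feb 01 12:00 example_0002.sdf.gz',
  '-r--r--r--   1 ftp      anonymous        120 Feb 01 12:00 README'
]

entries = parse_listing(sample)
total_bytes = entries.sum { |_, size| size }
puts "#{entries.length} files, #{total_bytes.to_f / (1024**3)} GiB"
```

Feeding the result of `ftp.list('*')` to the same helper would give a size estimate without downloading anything, provided the server uses this listing format.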

You can then unzip these using:

gunzip *.gz

(expect it to take a while)
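
If you would rather stay in Ruby, the same step can be sketched with the standard `zlib` library. This is an illustrative alternative, not part of the original script, and `gunzip_file` is a hypothetical helper name. Unlike plain `gunzip`, it keeps the compressed originals, so it needs disk space for both copies.

```ruby
require 'zlib'

# Hypothetical helper: stream-decompress one .gz file to the same path
# without the .gz suffix, reading in chunks so memory use stays small.
def gunzip_file(path, chunk_size = 1024 * 1024)
  target = path.sub(/\.gz\z/, '')
  Zlib::GzipReader.open(path) do |gz|
    File.open(target, 'wb') do |out|
      # read(n) returns nil once the compressed stream is exhausted
      while (chunk = gz.read(chunk_size))
        out.write(chunk)
      end
    end
  end
  target
end

# Decompress every gzipped SDfile in the current directory.
Dir.glob('*.sdf.gz').sort.each do |path|
  puts "Unzipping [#{path}]"
  gunzip_file(path)
end
```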

Convert Your Dash Snippets to Quiver

Convert Dash's sqlite database to the directory structure expected by Quiver.

Published Feb 12, 2016 in Programming