Connecting R to a PostgreSQL Database

We’ve an amazing collection of data stored in a PostgreSQL database. I use R for help visualizing and analyzing some of this data. In the past I’ve used ODBC for general database connectivity and RODBC from within R.

I’ve found setup of ODBC with PostgreSQL driver quick and easy in Ubuntu Linux. Something along the lines of this (untested) block of code:

sudo apt-get install odbc-postgresql
sudo sh -c 'echo "[PostgreSQL]
Description = PostgreSQL driver for Linux & Windows
Driver = /usr/lib/odbc/psqlodbcw.so
Setup = /usr/lib/odbc/libodbcpsqlS.so" > /etc/odbcinst.ini'

sudo sh -c 'echo "[ODBC Data Sources]
demodb = The Data Base

[demodb]
Driver = /usr/lib/odbc/psqlodbcw.so
Database = demo
Servername = localhost
Port = 5432
Protocol = 9.0.5
ReadOnly = Yes

[ODBC]
InstallDir = /usr/lib" >> /etc/odbc.ini'

Continue by installing the RODBC library within R and everything is good to go.  And then I got a Mac…

Problems with ODBC in OSX

I tried to use a similar ODBC setup to communicate with the PostgreSQL database on my MacBook, but I failed to succeed in doing this.

First attempt used the iODBC framework configured and the psqlodbc driver configured using advice on setting up linker flags from this email. Attempts to connect using RODBC segfaulted.

Second attempt tried using unixODBC installed via port and a reconfigured psqlodbc driver. Attempts to connect using RODBC segfaulted again.

I poked a bit outside of R to see if I could spot the problem. Testing outside of R seemed to show that ODBC was unable to connect to my server, but didn’t spend very much time getting a solid diagnosis. I wanted a solution that worked, rather than one that required non-trivial compiling, configuring, and testing. Having burnt my allocated time on documentation and forums I decided to try solutions outside of ODBC.

Solution with RPostgreSQL with DBI

A bit of forum searching brought me upon this email describing how to connect to a remote PostgreSQL database from R on mac osx. I was happy to see that the external dependancies I’m used to with ODBC did not exist. Installed the DBI and RPostgreSQL R libraries, then fired up R.

$ R
> dbname <- "demo"
> dbuser <- "demo_uer"
> dbpass <- "demo_password"
> dbhost <- "localhost"
> dbport <- 5432
> drv <- dbDriver("PostgreSQL")
> con <- dbConnect(drv, host=dbhost, port=dbport, dbname=dbname,
user=dbuser, password=dbpass)
> dbListTables(con)

All good! I guess that moving forward I’ll transition myself over to using RPostgreSQL with DBI within R.

Django Setup from Shell

Django uses a number of settings typically defined in the settings module within a project. These can be loaded into the interactive python shell for extra interactive fun.

from django.core.management import setup_environ
from myapp import settings
setup_environ(settings)

,

Migrated from my local machine to a remote server a data analysis that plots data using Python’s matplotlib and saves the plots to PNGs. Running the analysis turned up an unexpected error that ends by noting:

no display name and no $DISPLAY environment variable

Bummer. A quick search turned up more documentation on this from both the official matplotlib documentation on using matplotlib in a webapp and generating PNGs in matplotlib when DISPLAY is not defined.

The two typical solutions are to explicitly set a backend in the Python code (this must be done before any other matplotlib imports):

import matplotlib
# Force matplotlib to not use any Xwindows backend.
matplotlib.use('Agg')

or set the backend using the matplotlibrc file.

I’m a command line junkie who only turns to higher level scripting when
a line of piped together commands falls short. A common task that I run
into is summing a column of data from a text file. Typically, my data
is in a gnuplot friendly format makes extracting the column easy:

grep -v "^#" bandwidth.dat | cut -f3 -d' '

Now what is an easy way to sum that data? I’ve seen solutions using
awk, but I’m not much of an awk fan. Much more exciting is a version
using paste:

grep -v "^#" bandwidth.dat | cut -f3 -d' ' | paste -sd + | bc