Introduction
There are various ways to interact with databases in Platform, including the Civis API, the API Clients, the SQL Scripts interface, and the interactive Query interface. These methods all use Platform to execute SQL statements on your Redshift or PostgreSQL databases. However, you can also interact more directly with these databases using Python tools such as SQLAlchemy or Psycopg2. This article provides examples of using those tools.
Networking Restrictions
Note that direct database connections will only work within Platform (i.e., via Scripts or Notebooks) due to security restrictions. If you want to interact with a Platform database locally (e.g., from your laptop), you will need to use the API clients or Platform tools mentioned above.
Finding the Remote Host name
The first step in setting up a direct connection is finding out the database host name. It will look something like “your-org-name.civis.orgs.civis.io”. You can issue the command civis.APIClient().remote_hosts.list() with the Python client or visit https://api.civisanalytics.com/remote_hosts in your browser when logged in to Platform to see a list of database hosts. You’ll just need the hostname part of the “url” attribute (e.g., “jdbc:postgresql//your-org-name.civis.orgs.civis.io:5432/dev”), after the protocol (“jdbc:postgresql//”) and before the port (“:5432”).
Creating a Connection
Next, you’ll need to get your database credentials into the Notebook or Script environment and use them with the host name from above to create a connection. For Notebooks or Services, you’ll need to select your database credential under Settings. Your default database credential usually has the same name as your username. For Scripts, you’ll need to add a database credential parameter to the Script object and specify the appropriate credential as the argument, as described here.
database_cred_name = 'DATABASE_CRED_OR_PARAM_NAME' # change this
db_user = os.environ[f'{database_cred_name}_USERNAME']
# don't print or save db_pass in plaintext!
db_pass = os.environ[f'{database_cred_name}_PASSWORD']
host_name = 'your-database-name.your-org-name.orgs.civis.io' # change this
db_port = 5432 # the standard port for PostgreSQL and Redshift
Psycopg2
The following will provide a connection via the psycopg2 library and issue a test query:
conn = psycopg2.connect(
dbname='dev', user=db_user, port=db_port,
host=host_name, password=db_pass, sslmode='require'
)
cur = conn.cursor()
cur.execute("""SELECT 1234""")
print(cur.fetchall())
For further information about using psycopg2, see this page.
SQLAlchemy
You can also connect to a database via SQLAlchemy, which enables you to interact with a database using an ORM approach or via constructing complex queries as Python objects. You’ll need to create a SQLAlchemy engine that knows how to create a database connection. The process is similar to Psycopg2:
def connect():
return psycopg2.connect(
dbname='dev', user=db_user, port=db_port,
host=host_name, password=db_pass, sslmode='require'
)
engine = sqlalchemy.create_engine('redshift+psycopg2://', creator=connect)
statement = sqlalchemy.select([sqlalchemy.literal(1234)])
conn = engine.connect()
print(conn.execute(statement).fetchall())
The SQLAlchemy and sqlalchemy-redshift packages are not available by default for Notebooks in Platform. To install them as custom packages, see this page. Or, as a quick workaround, you can execute a notebook cell with “!pip install sqlalchemy sqlalchemy-redshift”.
For further information about how to use SQLAlchemy, see this page.
Note that pandas also has functionality that integrates with SQLAlchemy.
Comments
0 comments
Please sign in to leave a comment.