The following scripts import one or more files from an FTP/SFTP server to Civis Platform. Each file imported is saved as a job output for further processing, if necessary.
Adding a S/FTP Credential
To add your S/FTP credential to Platform go to New Credential and fill out the fields and click "Save"
- Name: what you would like this credential to be named
- Credential Type: Custom
- Username: S/FTP Username
- Password: S/FTP Password
Creating the jobs
There are two scripts required to import multiple files from S/FTP onto your database. Once you run the "Multi from S/FTP to Civis" script and your files have been loaded onto the files endpoint, you will need to run the "Multi from Civis" script to import the files from the files endpoint onto your database
I. Import files from S/FTP to files endpoint
- On the top navigation menu, click Code, and then "More Script Templates...". Select "Multi from S/FTP to Civis".
- Fill our the following required parameters. You may also include additional info in the optional and advanced fields as necessary:
Required:
- Protocol - specify the transfer protocol to use. The options are 'ftp', 'ftp_tls', or 'sftp'.
- Host - enter the url of the s/ftp location, e.g. ‘ftp.examplesite.com’
- Credential - select the credential containing the appropriate username and password for the host.
- Credential password is a private key? - if the password to your ftp credential is a private key, check this box. Ultimately, this option depends on your server implementation and is only used for ftp_tls and sftp. In most cases it does not need to be checked.
- Port - specify the port on which to establish the connection. If left blank the default is 21 for ftp and 22 for sftp.
- Start path - directory within the host to search for files, e.g. '/Download/'.
- Filename match - transfer only files matching this text, e.g. Type 'users' to transfer all files with the word 'users' in the filename. Supports regular expressions.
Optional:
- Secondary filename match - comma separated list of text. If set, only files matching one of these text values will be transferred. Useful for zip files with multiple file members. Supports regular expressions.
- Incremental sync - check the box to import only matched files newer than the newest file previously imported by this job.
- Start utc timestamp - yyyy-mm-dd hh:mi:ss am/pm timestamp for the earliest files to transfer, e.g. '2016-03-14 12:00:00 am'. Must be in utc, but note that your s/ftp directory explorer program may show you timestamps in local time.
- End utc timestamp- yyyy-mm-dd hh:mi:ss am/pm timestamp for the latest files to transfer, e.g. '2016-03-14 12:00:00 pm'. Must be in utc, but note that your s/ftp directory explorer program may show you timestamps in local time.
- Extra auth password- custom credential, for sftp only. 'Password' should be a password to decrypt the key in your ftp credential, if that key is encrypted. 'Username' is not needed. This credential is not needed in most cases.
Advanced
- Recursive search - search all directories under the start path.
- Max chunk size (b) - if left blank, uses connection default. The default works for most connections.
- Required cpu (mcpu) - set the desired cpu to allocate to this script. The default works for most connections.
- Required memory (mib) - set the desired memory to allocate to this script. The default works for most connections.
- Required disk space (gb) - set the desired disk space to allocate to this script. The default works for most connections.
- Debug - enable debug level logging.
II. Import files from files endpoint to Redshift
- Once you run the "Multi from S/FTP to Civis" script, please proceed to Code, and then "More Script Templates..." and select "Multi from Civis"
- Fill our the following required parameters. You may also include additional info in the optional and advanced fields as necessary:
Required:
- Column delimiter: If these are CSVs, choose "comma".
- First row is header: Check this box if the first row of your CSVs is a comma.
- If table(s) exist: Choose how you want Platform to handle imports where those tables already exist. If the CSVs are refreshed on a regular basis but always keep historical data, we recommend using the "drop" option, which will drop and re-create the tables each time. Other options include:
- Fail: The import will fail if the table exists so that you don't overwrite the existing table
- Append: Add any new data to the existing table.
- Wipe: Replace the data but keep the columns and data type.
- Max errors: Values between 0- 100000.
- Destination database: Your database name.
- Destination credential: Your database username (will show up in the dropdown).
- Filename match: The names of the files you're pulling down from the files endpoint.
- Destination schema and table name: The schema.table where these files should be stored. *The order of the tables in this field MUST match the order of the files in the above field.
- Civis Job ID: ID of the first job you made which pulled files from the S/FTP job (can be found at the end of the job's URL).
You can change the rest of the fields depending on the file you're trying to import, however we recommend leaving the remaining fields blank (unless you already know of changes you'd like to make), and updating them after you've imported tables for the first time.
Comments
0 comments
Please sign in to leave a comment.