Civis Platform includes a built-in integration with Jupyter, the most popular notebook tool. However, Jupyter was not built as a cloud-aware application. This document collects best practices for common issues that arise when using Jupyter in a cloud environment, gathered from thousands of users who explore and work with their data in Civis Platform every day.
Best Practice #1: Keep the notebook open while you are working. If you need to browse other parts of Civis Platform, do so in a separate tab.
- Reason: Notebook cells are stored in your browser. If you close the notebook tab, or reload the page without saving, any unsaved cells will be lost.
Best Practice #2: When you want to shut down the notebook, hit the Save button, then wait for the "Checkpoint created" message in the upper right. Once that message appears, it is safe to shut down the server.
- Reason: Saving a notebook takes longer than it does locally since the notebook is stored in the cloud. When you see the "Checkpoint created" message, the save operation is complete. Note that the "X" button in the top right does not free up resources. It only closes the notebook page.
Best Practice #3: Try to use notebooks only when you have a stable Internet connection. If your connection is unstable, or you expect to switch networks or go offline, save the result of every computation to a variable. When you reconnect, you can evaluate the variable to get your result.
- Reason: Computations still happen even if you become disconnected from the notebook, but the output of a cell will only be saved if the browser is online and connected to the server.
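The pattern above can be sketched as follows. This is a minimal illustration, not Platform-specific code: slow_sum is a hypothetical stand-in for whatever long-running computation you are doing.

```python
def slow_sum(n):
    """Hypothetical stand-in for a long-running computation."""
    return sum(range(n))

# Cell 1: bind the result to a variable instead of relying on the
# cell's displayed output, which may be lost if the browser disconnects.
total = slow_sum(1_000_000)

# Cell 2 (run after reconnecting): evaluating the variable shows the
# result, because the value still lives in the running kernel.
total
```

This only helps while the notebook server is still running. Once the server shuts down, variables held in the kernel are gone.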
Best Practice #4: If you have a long-running computation, you might need to increase the "Idle Timeout" setting. The setting is measured in hours, and the default is 3 hours.
- Reason: We find that it is easy for people to forget that they have a notebook running. Therefore, we shut down idle notebooks after 3 hours. You can increase this setting before starting a notebook or while it is running. The maximum idle timeout is 24 hours.
Best Practice #5: Save dataframes to Civis files or tables, using the Civis client loaded in every notebook, to help with resuming work.
- Reason: The servers that notebooks run on are ephemeral, so you can't save things to disk between notebook runs. However, you can use civis.io.dataframe_to_file and civis.io.file_to_dataframe in the Python client to move data between your notebook and Platform. The R client has similar functionality through the write_civis_file and read_civis functions.
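As a rough sketch of how the two Python functions named above might be wrapped for checkpointing: save_checkpoint and load_checkpoint are hypothetical helper names, and the civis_io parameter is an assumption added only so the helpers can be exercised without Platform credentials. Inside a Platform notebook you would leave it unset.

```python
def save_checkpoint(df, name="checkpoint.csv", civis_io=None):
    """Upload a dataframe as a Civis file and return its file ID."""
    if civis_io is None:
        import civis.io as civis_io  # the Civis client loaded in the notebook
    # index=False is forwarded to DataFrame.to_csv, so the index is not
    # written out as an extra column.
    return civis_io.dataframe_to_file(df, name=name, index=False)


def load_checkpoint(file_id, civis_io=None):
    """Download a previously saved Civis file back into a dataframe."""
    if civis_io is None:
        import civis.io as civis_io
    return civis_io.file_to_dataframe(file_id)
```

In a notebook, file_id = save_checkpoint(df) stores the data, and df = load_checkpoint(file_id) restores it in a later session. Record the file ID somewhere that survives shutdown, such as a saved notebook cell, because the variable holding it is lost along with the kernel.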
Best Practice #6: Queries made via the Civis API are convenient, but they are not recommended when you need real-time query performance. The Query UI in Platform is much faster than making those same queries via the API. If you need real-time performance, the fastest possible method is connecting directly to your database.
- Solution: Connect directly to the database from the notebook. Add your database credential to the notebook under Settings -> Credentials and then use it to connect to the database using the information in this article.
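A direct connection might look roughly like the sketch below. Several things here are assumptions rather than documented behavior: that the credential's username and password are readable from environment variables named with a common prefix (MY_DB_USERNAME and MY_DB_PASSWORD are hypothetical names), that the database is PostgreSQL-compatible such as Redshift (whose default port is 5439), and that the psycopg2 driver is installed. The host and database name are placeholders you would take from your own database settings.

```python
import os


def connection_params(prefix, host, dbname, port=5439, env=None):
    """Collect connection settings; the env variable names are assumed."""
    env = os.environ if env is None else env
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": env[f"{prefix}_USERNAME"],      # hypothetical variable name
        "password": env[f"{prefix}_PASSWORD"],  # hypothetical variable name
    }


def connect(params):
    """Open a connection with psycopg2 (assumed to be installed)."""
    import psycopg2
    return psycopg2.connect(**params)
```

Reading the password from the environment keeps it out of the notebook's saved cells. A call such as connect(connection_params("MY_DB", host, dbname)) would then return a connection you can pass to your usual query tools.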