From 0afa5d2cbaf1f4e009d81fe9c117f28178f3cbba Mon Sep 17 00:00:00 2001
From: Maxime Beauchemin
Date: Fri, 8 Apr 2016 08:44:28 -0700
Subject: [PATCH] Added FAQ and db dependencies to docs

---
 docs/faq.rst          | 34 ++++++++++++++++++++++++++++++++++
 docs/index.rst        |  1 +
 docs/installation.rst | 36 +++++++++++++++++++++++++++++++++++-
 3 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 docs/faq.rst

diff --git a/docs/faq.rst b/docs/faq.rst
new file mode 100644
index 0000000000000..efecfcee7248a
--- /dev/null
+++ b/docs/faq.rst
@@ -0,0 +1,34 @@
+FAQ
+===
+
+
+Can I query/join multiple tables at one time?
+---------------------------------------------
+Not directly, no. A Caravel SQLAlchemy datasource can only be a single table
+or a view.
+
+When working with tables, the solution would be to materialize
+a table that contains all the fields needed for your analysis, most likely
+through some scheduled batch process.
+
+A view is a simple logical layer that abstracts an arbitrary SQL query as
+a virtual table. This can allow you to join and union multiple tables, and
+to apply transformations using arbitrary SQL expressions. The limitation
+there is your database performance, as Caravel effectively runs a query
+on top of your query (view). A good practice may be to limit yourself to
+joining your main large table to one or many small tables only, and to avoid
+using ``GROUP BY`` where possible, as Caravel does its own ``GROUP BY`` and
+doing the work twice might slow down performance.
+
+Whether you use a table or a view, the important factor is whether your
+database is fast enough to serve it in an interactive fashion to provide
+a good user experience in Caravel.
+
+
+How BIG can my data source be?
+------------------------------
+
+It can be gigantic! As mentioned above, the main criterion is whether your
+database can execute queries and return results in a time frame that is
+acceptable to your users. Many distributed databases out there can execute
+queries that scan through terabytes in an interactive fashion.
diff --git a/docs/index.rst b/docs/index.rst
index 07c19448eb0e3..29478e2b39565 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -34,6 +34,7 @@ Contents
     tutorial
     videos
     gallery
+    faq
 
 
 Indices and tables
diff --git a/docs/installation.rst b/docs/installation.rst
index c8224e0e1f974..fcfc63c53356d 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -121,6 +121,40 @@ the `Flask App Builder Documentation
 `_
 for more information on how to configure Caravel.
 
+Database dependencies
+---------------------
+
+Caravel does not ship bundled with connectivity to databases, except
+for SQLite, which is part of the Python standard library.
+You'll need to install the required packages for the database you
+want to use as your metadata database as well as the packages needed to
+connect to the databases you want to access through Caravel.
+
+Here's a list of some of the recommended packages.
+
++---------------+-------------------------------------+-------------------------------------------------+
+| Database      | PyPI package                        | SQLAlchemy URI prefix                           |
++===============+=====================================+=================================================+
+| MySQL         | ``pip install mysqlclient``         | ``mysql://``                                    |
++---------------+-------------------------------------+-------------------------------------------------+
+| Postgres      | ``pip install psycopg2``            | ``postgresql+psycopg2://``                      |
++---------------+-------------------------------------+-------------------------------------------------+
+| Presto        | ``pip install pyhive``              | ``presto://``                                   |
++---------------+-------------------------------------+-------------------------------------------------+
+| Oracle        | ``pip install cx_Oracle``           | ``oracle://``                                   |
++---------------+-------------------------------------+-------------------------------------------------+
+| SQLite        |                                     | ``sqlite://``                                   |
++---------------+-------------------------------------+-------------------------------------------------+
+| Redshift      | ``pip install sqlalchemy-redshift`` | ``redshift+psycopg2://``                        |
++---------------+-------------------------------------+-------------------------------------------------+
+| MSSQL         | ``pip install pymssql``             | ``mssql+pymssql://``                            |
++---------------+-------------------------------------+-------------------------------------------------+
+
+Note that many other databases are supported, the main criterion being the
+existence of a functional SQLAlchemy dialect and Python driver. Searching the web
+for the keyword ``sqlalchemy`` along with a keyword that describes the
+database you want to connect to should get you to the right place.
+
 Caching
 -------
 
@@ -147,7 +181,7 @@ parameters exposed by SQLAlchemy. In the ``Database`` edit view,
 you will find an ``extra`` field as a ``JSON`` blob.
 
 .. image:: _static/img/tutorial/add_db.png
-   :scale: 50 %
+   :scale: 30 %
 
 This JSON string contains extra configuration elements.
The ``engine_params`` object gets unpacked into the