Merge pull request #87 from coursera/develop
Dataduct 0.2.0
sb2nov committed Mar 20, 2015
2 parents c846f73 + a9e65bb commit 5bc0675
Showing 194 changed files with 9,363 additions and 1,170 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -13,3 +13,11 @@
# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
/*.egg

# Images created should be checked in manually
*.png

.coverage

# pycharm or intellij
.idea/
46 changes: 46 additions & 0 deletions .travis.yml
@@ -0,0 +1,46 @@
language: python
python:
  - 2.7

sudo: false

addons:
  apt_packages:
    - graphviz
# command to install dependencies
install:
  - pip install coveralls
  - pip install -r requirements.txt

# Setup config file
before_script:
  - mkdir ~/.dataduct
  - |+
    echo "
    etl:
        ROLE: DataPipelineDefaultRole
        RESOURCE_ROLE: DataPipelineDefaultResourceRole
        S3_ETL_BUCKET: FILL_ME_IN
    ec2:
        CORE_INSTANCE_TYPE: m1.large
    emr:
        CLUSTER_AMI: 2.4.7
    redshift:
        DATABASE_NAME: FILL_ME_IN
        CLUSTER_ID: FILL_ME_IN
        USERNAME: FILL_ME_IN
        PASSWORD: FILL_ME_IN
    mysql:
        DATABASE_KEY:
            HOST: FILL_ME_IN
            USERNAME: FILL_ME_IN
            PASSWORD: FILL_ME_IN" > ~/.dataduct/dataduct.cfg
# Run tests
script: nosetests --with-coverage --cover-package=. --cover-erase
after_success:
  coveralls
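
The `before_script` step above writes a placeholder `~/.dataduct/dataduct.cfg` in which several values are left as `FILL_ME_IN`. As a minimal sketch (not part of this commit, and not dataduct's own API), the following standard-library Python lists which keys in such a config still hold the placeholder. The helper name `find_unfilled`, the sample text, and the assumed nesting of keys by indentation are all illustrative assumptions.

```python
def find_unfilled(config_text, placeholder="FILL_ME_IN"):
    """Return dotted key paths whose value is still the placeholder.

    Hypothetical helper: it only understands simple `key: value` lines
    nested by indentation, not full YAML.
    """
    unfilled = []
    stack = []  # (indent, key) pairs for the keys enclosing the current line
    for raw in config_text.splitlines():
        stripped = raw.strip()
        # Skip blank lines, comment lines, and anything that is not key: value
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        indent = len(raw) - len(raw.lstrip())
        key, _, value = stripped.partition(":")
        # Drop keys that are at the same or a deeper indentation level
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if value.strip() == placeholder:
            unfilled.append(".".join([k for _, k in stack] + [key]))
        stack.append((indent, key))
    return unfilled


# Assumed sample, shaped like the config echoed in before_script
SAMPLE = """\
etl:
    ROLE: DataPipelineDefaultRole
    S3_ETL_BUCKET: FILL_ME_IN
redshift:
    DATABASE_NAME: FILL_ME_IN
mysql:
    DATABASE_KEY:
        HOST: FILL_ME_IN
"""

if __name__ == "__main__":
    for path in find_unfilled(SAMPLE):
        print("still a placeholder:", path)
```

A check like this could run before a pipeline is activated outside CI, where the placeholders are expected to have been replaced with real values; in the Travis build they are presumably left as-is because the tests do not contact AWS.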
32 changes: 31 additions & 1 deletion CHANGES.md
@@ -1,4 +1,34 @@
# Changes in ETL_Lib
# Changes in dataduct

### 0.2.0
- Travis integration for continuous builds
- QA steps and logging to S3
- Visualizing pipeline
- Dataduct CLI updated as a single entry point
- RDS connections for scripts
- Bootstrap step for pipelines
- Backfill or delay activation
- Output path and input path options
- Script directory for transform step
- SQL sanitization for DBA actions
- SQL parser for select and create table statements
- Logging across the library
- Support for custom steps
- Pipeline dependency step
- Reduce verbosity of imports
- Step parsing is isolated in steps
- More examples for steps
- Sync config with S3
- Config overrides with modes
- Rename keywords and safe config failure handling
- EMR Streaming support with hadoop 2
- Exceptions cleanup
- Read the docs support
- Creating tables automatically for various steps
- History table support
- EC2 and EMR config control from YAML
- Slack integration
- Support for Regions in DP

### 0.1.0
- Initial version of the dataduct library released
2 changes: 0 additions & 2 deletions MANIFEST.in
@@ -1,7 +1,5 @@
include *.txt
include *.md
include *.rst
include *.sh
include *.py
recursive-include bin *
recursive-include scripts *
14 changes: 11 additions & 3 deletions README.rst
@@ -1,13 +1,13 @@
Dataduct
----------
Dataduct |build-status| |coverage-status|
-----------------------------------------
Dataduct is a wrapper built on top of AWS Datapipeline which makes it easy to
create ETL jobs. All jobs can be specified as a series of steps in a YAML file
and would automatically be translated into datapipeline with appropriate
pipeline objects.

**Documentation and Details**

Documentation and more details can be found at http://pythonhosted.org/dataduct/
Documentation and more details can be found at http://dataduct.readthedocs.org/en/latest/

**License**

@@ -24,3 +24,11 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

.. |build-status|
   image:: https://travis-ci.org/coursera/dataduct.svg?branch=develop
      :target: https://travis-ci.org/coursera/dataduct
.. |coverage-status|
   image:: https://coveralls.io/repos/coursera/dataduct/badge.svg?branch=develop
      :target: https://coveralls.io/r/coursera/dataduct?branch=develop