
added remove all jobs button #34

Open

wants to merge 41 commits into master

41 commits
70eb790
added remove all button
kalombos Sep 4, 2017
2065b2f
show stats for each job
kalombos Oct 3, 2017
6a23062
Merge branch 'stats' into develop
kalombos Oct 3, 2017
ec69897
some cosmetic fixes
kalombos Oct 3, 2017
b5aa1a6
Merge branch 'fixes' into develop
kalombos Oct 3, 2017
4027768
remove find_project_by_id method, some pep-8 fixes
kalombos Oct 3, 2017
6d14f56
Merge branch 'fixes' into develop
kalombos Oct 3, 2017
442f6e1
fixes for postgresql
kalombos Oct 12, 2017
5841e4c
fork spiderkeeper
kalombos Oct 16, 2017
3eb5050
Bump version: 0.0.1 → 0.0.2
kalombos Oct 16, 2017
5e4e7e7
fixes sql for postgresql, remove project creating, added project sync…
kalombos Oct 18, 2017
58c3214
fix avg runtime, added cascade deletion
kalombos Oct 18, 2017
c10d729
Bump version: 0.0.2 → 0.1.0
kalombos Oct 18, 2017
79cdc80
remove logger, added foreign keys
kalombos Oct 20, 2017
771632d
Bump version: 0.1.0 → 0.1.1
kalombos Oct 20, 2017
9635988
separate scheduler into own process, remove create project view, fixes
kalombos Oct 23, 2017
de986d6
Bump version: 0.1.1 → 0.1.2
kalombos Oct 23, 2017
f7a2ebd
remove log styles, refactoring
kalombos Oct 23, 2017
f9de501
Bump version: 0.1.2 → 0.1.3
kalombos Oct 23, 2017
942c3c2
revert logs grabbing with requests
kalombos Oct 24, 2017
38a4fa7
Bump version: 0.1.3 → 0.1.4
kalombos Oct 24, 2017
a7082c9
update README, fix spiderkeeper script
kalombos Oct 26, 2017
163b6a9
Bump version: 0.1.4 → 0.2.0
kalombos Oct 26, 2017
193178b
remove lost pending jobs
kalombos Mar 28, 2018
d6e14bd
Bump version: 0.2.0 → 0.2.1
kalombos Mar 28, 2018
cafe3c9
fix removing lost jobs
kalombos Mar 28, 2018
1be27f2
Bump version: 0.2.1 → 0.2.2
kalombos Mar 28, 2018
0ee0281
fix readme
kalombos Apr 9, 2018
3374d32
move scripts to head, fix static urls
kalombos May 21, 2018
3114155
Bump version: 0.2.2 → 0.2.3
kalombos May 21, 2018
48f2d1f
revert removing lost jobs
kalombos May 22, 2018
295b446
Bump version: 0.2.3 → 0.2.4
kalombos May 22, 2018
c6ac9fb
refactoring flask application architecture
kalombos Sep 14, 2018
f2814de
Bump version: 0.2.4 → 0.2.5
kalombos Sep 14, 2018
9e5a732
Merge pull request #1 from kalombos/develop
kalombos Sep 14, 2018
79d8dd3
update changelog
kalombos Sep 14, 2018
184582f
update requirements.txt
kalombos Sep 14, 2018
e537773
integrated into scrapyd service
kalombos Sep 17, 2018
c00b9cb
Bump version: 0.2.5 → 0.3.0
kalombos Sep 17, 2018
3c9894c
fix requirements
kalombos Sep 17, 2018
380896c
Bump version: 0.3.0 → 0.3.1
kalombos Sep 17, 2018
5 changes: 5 additions & 0 deletions .bumpversion.cfg
@@ -0,0 +1,5 @@
[bumpversion]
current_version = 0.3.1
files = SpiderKeeper/__init__.py
commit = True

44 changes: 13 additions & 31 deletions CHANGELOG.md
@@ -1,34 +1,16 @@
# SpiderKeeper Changelog
## 1.2.0 (2017-07-24)
- support choosing a server manually
- support setting cron expressions manually
- fix Chinese log decoding problem
- fix scheduler trigger not firing
- fix projects not being deleted on scrapyd
# SpiderKeeper-2 Changelog

## 1.1.0 (2017-04-25)
- support basic auth
- show spider crawl time info (last_runtime,avg_runtime)
- optimized for mobile
## 0.3.0 (2018-09-14)
- SpiderKeeper was integrated into the scrapyd service

## 1.0.3 (2017-04-17)
- support view log
## 0.2.5 (2018-09-14)
- refactored the Flask application

## 1.0.0 (2017-03-30)
- refactor
- support py3
- optimized api
- optimized scheduler
- more scalable (can support access multiply spider service)
- show running stats

## 0.2.0 (2016-04-13)
- support viewing jobs across multiple daemons.
- support running on multiple daemons.
- support choosing the running daemon automatically.

## 0.1.1 (2016-02-16)
- add status monitor(https://github.com/afaqurk/linux-dash)

## 0.1.0 (2016-01-18)
- initial.
## 0.2.0 (2017-10-26)
- SpiderKeeper was forked to SpiderKeeper-2
- Added a button for removing all periodic jobs
- All tasks now show stats.
- When running SpiderKeeper under WSGI you should not use a background scheduler ([see issue](https://github.com/agronholm/apscheduler/issues/160)); the scheduler must run in a separate process, so it was moved into its own module.
- Added foreign-key constraints to models.
- No need to create a project manually any more; all projects are synchronized automatically with scrapyd.
- Fixed bugs.
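The WSGI/scheduler note above can be sketched in miniature. This is a hypothetical, standard-library-only illustration (plain `multiprocessing` and a loop, not APScheduler itself) of the idea that periodic jobs should live in their own process rather than inside a WSGI worker; `run_scheduler` and `heartbeat` are made-up names, not part of this PR:

```python
import multiprocessing
import time


def run_scheduler(jobs, interval, iterations=None):
    """Run every callable in `jobs` each `interval` seconds.

    `iterations` bounds the loop so the sketch is testable; a real
    scheduler process would loop forever (iterations=None).
    """
    count = 0
    while iterations is None or count < iterations:
        for job in jobs:
            job()
        count += 1
        time.sleep(interval)


def heartbeat():
    print('sync job status')


if __name__ == '__main__':
    # The web workers never own this loop; it lives in a dedicated process.
    proc = multiprocessing.Process(target=run_scheduler, args=([heartbeat], 5))
    proc.daemon = True
    proc.start()
    time.sleep(0.2)  # let the demo tick once before exiting
```

Because the loop runs in its own process, each WSGI worker stays free of background threads, which is exactly the failure mode the linked APScheduler issue describes.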
74 changes: 15 additions & 59 deletions README.md
@@ -1,8 +1,9 @@
# SpiderKeeper
# SpiderKeeper-2
#### This is a fork of [SpiderKeeper](https://github.com/DormyMo/SpiderKeeper). See [changelog](https://github.com/kalombos/SpiderKeeper/blob/master/CHANGELOG.md) for new features

[![Latest Version](http://img.shields.io/pypi/v/SpiderKeeper.svg)](https://pypi.python.org/pypi/SpiderKeeper)
[![Python Versions](http://img.shields.io/pypi/pyversions/SpiderKeeper.svg)](https://pypi.python.org/pypi/SpiderKeeper)
[![The MIT License](http://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/DormyMo/SpiderKeeper/blob/master/LICENSE)
[![Latest Version](http://img.shields.io/pypi/v/SpiderKeeper-2.svg)](https://pypi.python.org/pypi/SpiderKeeper-2)
[![Python Versions](https://img.shields.io/pypi/pyversions/SpiderKeeper-2.svg)](https://pypi.python.org/pypi/SpiderKeeper-2)
![The MIT License](http://img.shields.io/badge/license-MIT-blue.svg)

A scalable admin ui for spider service

@@ -12,10 +13,10 @@ A scalable admin ui for spider service
- With a single click deploy the scrapy project
- Show spider running stats
- Provide api
- Integrated in scrapyd



Currently supported spider services
- [Scrapy](https://github.com/scrapy/scrapy) ( with [scrapyd](https://github.com/scrapy/scrapyd))

## Screenshot
![job dashboard](https://raw.githubusercontent.com/DormyMo/SpiderKeeper/master/screenshot/screenshot_1.png)
@@ -29,83 +30,38 @@ Current Support spider service


```
pip install spiderkeeper
pip install spiderkeeper-2
```

### Deployment

```

spiderkeeper [options]

Options:

  -h, --help            show this help message and exit
  --host=HOST           host, default: 0.0.0.0
  --port=PORT           port, default: 5000
  --username=USERNAME   basic auth username, default: admin
  --password=PASSWORD   basic auth password, default: admin
  --type=SERVER_TYPE    spider server type, default: scrapyd
  --server=SERVERS      servers, default: ['http://localhost:6800']
  --database-url=DATABASE_URL
                        SpiderKeeper metadata database, default: sqlite:////home/souche/SpiderKeeper.db
  --no-auth             disable basic auth
  -v, --verbose         log level


example:

spiderkeeper --server=http://localhost:6800

```

## Usage

```
Visit:

- web ui : http://localhost:5000

1. Create Project

2. Use [scrapyd-client](https://github.com/scrapy/scrapyd-client) to generate egg file
1. Run ```spiderkeeper```

scrapyd-deploy --build-egg output.egg
2. Visit http://localhost:5000/

2. upload egg file (make sure you started scrapyd server)
3. upload egg file (make sure you started scrapyd server)

3. Done & Enjoy it
4. Done & Enjoy it

- api swagger: http://localhost:5000/api.html

```

## TODO
- [ ] Job dashboard support filter
- [x] User Authentication
- [ ] Collect & Show scrapy crawl stats
- [ ] Optimize load balancing

## Versioning

We use [SemVer](http://semver.org/) for versioning. For the versions available, see the [tags on this repository](https://github.com/DormyMo/SpiderKeeper/tags).

## Authors

- *Initial work* - [DormyMo](https://github.com/DormyMo)
- *Fork author* - [kalombo](https://github.com/kalombos/)

See also the list of [contributors](https://github.com/DormyMo/SpiderKeeper/contributors) who participated in this project.

## License

This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details
This project is licensed under the MIT License.

## Contributing

Contributions are welcome!

## Feedback
![Contact](https://raw.githubusercontent.com/DormyMo/SpiderKeeper/master/screenshot/qqgroup_qrcode.png)

## Donate
![Contact](https://raw.githubusercontent.com/DormyMo/SpiderKeeper/master/screenshot/donate_wechat.png)
4 changes: 2 additions & 2 deletions SpiderKeeper/__init__.py
@@ -1,2 +1,2 @@
__version__ = '1.2.0'
__author__ = 'Dormy Mo'
__version__ = '0.3.1'
__author__ = 'kalombo'
193 changes: 80 additions & 113 deletions SpiderKeeper/app/__init__.py
@@ -1,129 +1,96 @@
# Import flask and template operators
import logging
import datetime
import traceback

import apscheduler
from apscheduler.schedulers.background import BackgroundScheduler
from flask import Flask
from flask import jsonify
from flask import Flask, session, jsonify
from flask_basicauth import BasicAuth
from flask_restful import Api
from flask_restful_swagger import swagger
from flask_sqlalchemy import SQLAlchemy
from werkzeug.exceptions import HTTPException

import SpiderKeeper
from SpiderKeeper import config

# Define the WSGI application object
app = Flask(__name__)
# Configurations
app.config.from_object(config)

# Logging
log = logging.getLogger('werkzeug')
log.setLevel(logging.ERROR)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
app.logger.setLevel(app.config.get('LOG_LEVEL', "INFO"))
app.logger.addHandler(handler)

# swagger
api = swagger.docs(Api(app), apiVersion=SpiderKeeper.__version__, api_spec_url="/api",
description='SpiderKeeper')
# Define the database object which is imported
# by modules and controllers
db = SQLAlchemy(app, session_options=dict(autocommit=False, autoflush=True))


@app.teardown_request
def teardown_request(exception):
if exception:
db.session.rollback()
db.session.remove()
db.session.remove()

# Define apscheduler
scheduler = BackgroundScheduler()


class Base(db.Model):
__abstract__ = True

id = db.Column(db.Integer, primary_key=True)
date_created = db.Column(db.DateTime, default=db.func.current_timestamp())
date_modified = db.Column(db.DateTime, default=db.func.current_timestamp(),
onupdate=db.func.current_timestamp())


# Sample HTTP error handling
# @app.errorhandler(404)
# def not_found(error):
# abort(404)


@app.errorhandler(Exception)
def handle_error(e):
code = 500
if isinstance(e, HTTPException):
code = e.code
app.logger.error(traceback.format_exc())
return jsonify({
'code': code,
'success': False,
'msg': str(e),
'data': None
})


# Build the database:
from SpiderKeeper.app.spider.model import *


def init_database():
db.init_app(app)
db.create_all()


# regist spider service proxy
from SpiderKeeper.app.proxy.spiderctrl import SpiderAgent
from SpiderKeeper.app.blueprints.dashboard.views import dashboard_bp
from SpiderKeeper.app.blueprints.dashboard.api import api
from SpiderKeeper.app.blueprints.dashboard.model import Project, SpiderInstance
from SpiderKeeper.app.proxy import agent
from SpiderKeeper.app.proxy.contrib.scrapy import ScrapydProxy
from SpiderKeeper.app.extensions.sqlalchemy import db

agent = SpiderAgent()


def regist_server():
def register_server(app):
if app.config.get('SERVER_TYPE') == 'scrapyd':
for server in app.config.get("SERVERS"):
agent.regist(ScrapydProxy(server))


from SpiderKeeper.app.spider.controller import api_spider_bp

# Register blueprint(s)
app.register_blueprint(api_spider_bp)

# start sync job status scheduler
from SpiderKeeper.app.schedulers.common import sync_job_execution_status_job, sync_spiders, \
reload_runnable_spider_job_execution

scheduler.add_job(sync_job_execution_status_job, 'interval', seconds=5, id='sys_sync_status')
scheduler.add_job(sync_spiders, 'interval', seconds=10, id='sys_sync_spiders')
scheduler.add_job(reload_runnable_spider_job_execution, 'interval', seconds=30, id='sys_reload_job')


def start_scheduler():
scheduler.start()


def init_basic_auth():
def init_basic_auth(app):
if not app.config.get('NO_AUTH'):
basic_auth = BasicAuth(app)
BasicAuth(app)


def initialize():
init_database()
regist_server()
start_scheduler()
init_basic_auth()
def init_database(app):
db.init_app(app)
with app.app_context():
# Extensions like Flask-SQLAlchemy now know what the "current" app
# is while inside this block, so db.create_all() can run here.
db.create_all()


def register_extensions(app):
init_database(app)
init_basic_auth(app)
api.init_app(app)


def register_blueprints(app):
# Register blueprint(s)
app.register_blueprint(dashboard_bp)


def create_flask_application(config):
# Define the WSGI application object
app = Flask(__name__)
# Configurations
app.config.from_object(config)
app.jinja_env.globals['sk_version'] = SpiderKeeper.__version__
register_extensions(app)
register_blueprints(app)
register_server(app)

@app.context_processor
def inject_common():
return dict(now=datetime.datetime.now(),
servers=agent.servers)

@app.context_processor
def inject_project():
project_context = {}
project = None
projects = Project.query.all()
project_context['project_list'] = projects
if projects:
project = projects[0]

project_id = session.get('project_id')
if isinstance(project_id, int):
project = Project.query.get(project_id) or project

if project:
session['project_id'] = project.id
project_context['project'] = project
project_context['spider_list'] = [
spider_instance.to_dict() for spider_instance in
SpiderInstance.query.filter_by(project_id=project.id).all()]
else:
project_context['project'] = {}
return project_context

@app.errorhandler(Exception)
def handle_error(e):
code = 500
if isinstance(e, HTTPException):
code = e.code
app.logger.error(traceback.format_exc())
return jsonify({
'code': code,
'success': False,
'msg': str(e),
'data': None
})
return app
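The factory refactor in this file moves app construction out of import time: `create_flask_application` builds the app and binds extensions and blueprints to it explicitly. A minimal stand-alone sketch of that shape, using stand-in classes rather than Flask itself (all names below are illustrative, not from this PR):

```python
class StubApp:
    """Stand-in for flask.Flask, just enough to show the factory shape."""

    def __init__(self):
        self.config = {}
        self.blueprints = []
        self.extensions = []

    def register_blueprint(self, bp):
        self.blueprints.append(bp)


def register_extensions(app):
    # In the real code this is db.init_app(app), BasicAuth(app), api.init_app(app).
    app.extensions.append('sqlalchemy')


def register_blueprints(app):
    # In the real code this registers dashboard_bp.
    app.register_blueprint('dashboard')


def create_app(config):
    app = StubApp()
    app.config.update(config)
    register_extensions(app)
    register_blueprints(app)
    return app


app = create_app({'SERVER_TYPE': 'scrapyd'})
```

Because nothing binds to a module-level `app` at import time, tests and WSGI entry points can each build their own isolated app instance from their own config.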