A custom Django model field that automatically archives a URL.
Install this library from the Python Package Index.
$ pip install django-urlarchivefield
Add it to one of your models.
from django.db import models
from urlarchivefield.fields import URLArchiveField
class MyModel(models.Model):
    archive = URLArchiveField(upload_to="my_archive")
Saving a URL's HTML content to your storage backend becomes this easy.
>>> obj = MyModel.objects.create(archive="http://www.latimes.com")
>>> # You can do it this way too
>>> obj = MyModel()
>>> obj.archive = "http://www.latimes.com"
>>> obj.save()
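For readers curious about what archiving a URL involves, here is a rough standalone sketch. It is not this library's implementation: the function names `fetch_html` and `snapshot_url` are hypothetical, and the returned dictionary only borrows the attribute names documented in this README for illustration.

```python
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Dict


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it, assuming UTF-8 when no charset is declared."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def snapshot_url(url: str, fetch: Callable[[str], str] = fetch_html) -> Dict[str, object]:
    """Hypothetical helper: capture a URL's HTML together with when it was fetched."""
    return {
        "archive_url": url,
        "archive_timestamp": datetime.now(timezone.utc),
        "archive_html": fetch(url),
    }
```

Passing the fetcher as an argument keeps the sketch easy to try without network access, for example `snapshot_url("http://www.latimes.com", fetch=lambda url: "<html></html>")`.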
The field attribute now has all the typical qualities of a Django FileField with a few additions.
# Return the URL that was originally submitted for archival
>>> obj.archive.archive_url
# Return the timestamp when the archive was created
>>> obj.archive.archive_timestamp
# Return the HTML from the archive
>>> obj.archive.archive_html
By default, the data are compressed with gzip before being saved. If you'd rather store the raw
HTML, set the compress keyword argument to False on the field.
class MyModel(models.Model):
    archive = URLArchiveField(upload_to="my_archive", compress=False)
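To get a feel for what gzip compression does to HTML, here is a small round trip using only Python's standard library. It is an illustrative sketch rather than the field's internal code, and the helper names `compress_html` and `decompress_html` are assumptions made for this example.

```python
import gzip


def compress_html(html: str) -> bytes:
    """Encode the HTML as UTF-8 and gzip-compress the bytes."""
    return gzip.compress(html.encode("utf-8"))


def decompress_html(data: bytes) -> str:
    """Reverse compress_html: gunzip the bytes and decode them as UTF-8."""
    return gzip.decompress(data).decode("utf-8")


# Repetitive markup, like most real pages, compresses well.
html = "<html><body>" + "<p>Hello, archive!</p>" * 100 + "</body></html>"
blob = compress_html(html)

# The round trip is lossless, and the compressed form is smaller.
assert decompress_html(blob) == html
print(len(html.encode("utf-8")), "bytes raw,", len(blob), "bytes gzipped")
```

Storing the raw HTML instead trades disk space for files that can be opened directly without decompressing them first.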
This is a joint project of PastPages.org, The Reynolds Journalism Institute and the University of Missouri.