Discovers all the pages in a site or single-page app (SPA) and creates a tree of the result in `./output/<site slug>/crawl.json`. Optionally takes screenshots of each page as it is visited.
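For illustration, the tree written to crawl.json might look roughly like the sketch below. The field names are assumptions based on the common d3 tree convention of nested `children`; check a real crawl.json for the exact schema.

```json
{
  "name": "https://www.bbc.co.uk/sport",
  "children": [
    {
      "name": "https://www.bbc.co.uk/sport/football",
      "children": []
    }
  ]
}
```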
- Node v8+
- On macOS you can install Node using Homebrew
```
git clone https://github.com/ptutty/sitemapcreator
cd sitemapcreator
npm install
```
- edit `config-sample.json`, rename it to `config.json`, and add the URL details of the site you wish to crawl
- `depth` is how many levels deep to crawl
- while testing, you may find it useful to set `headless: false` to see what is going on
- the `filters` flag lets you customize which anchor links are crawled
```json
{
  "host": "https://www.bbc.co.uk",
  "path": "/sport",
  "depth": 2,
  "headless": true,
  "filters": false
}
```
Filters let you remove unwanted cruft from the visualisation, such as in-page anchor links, links back to the homepage, links to documents, intranet links etc. See the array `excludeAnchorsWhichContain` below.

Sometimes you may not wish to crawl the navigation again on each subpage; you can list those URL fragments in the array `excludeSubpageAnchorsEndingWith`.
```json
{
  "excludeSubpageAnchorsEndingWith": [
    "/live/",
    "/programmes/"
  ],
  "excludeAnchorsWhichContain": [
    "#",
    ".pdf",
    "docx",
    "doc"
  ]
}
```
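To make the two arrays concrete, here is a minimal sketch of the kind of check they imply. This is an illustration, not the project's actual code: the `shouldCrawl` function is made up, and it assumes the arrays are available on a `filters` object.

```js
// Illustrative sketch: applying the two filter arrays to an anchor's href.
// `shouldCrawl` is a hypothetical name, not part of the project.
function shouldCrawl(href, isSubpage, filters) {
  // Drop anchors containing any excluded fragment, e.g. "#" or ".pdf".
  if (filters.excludeAnchorsWhichContain.some(frag => href.includes(frag))) {
    return false;
  }
  // On subpages, also drop navigation links ending with a listed fragment.
  if (isSubpage &&
      filters.excludeSubpageAnchorsEndingWith.some(frag => href.endsWith(frag))) {
    return false;
  }
  return true;
}

// e.g. shouldCrawl('https://www.bbc.co.uk/sport/live/', true, filters) === false
```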
To start a crawl, run the command below in the console - make sure you are in the project directory.
```
node app.js
```
You will see URLs being crawled in the console. You can also run a crawl and capture optional screenshots:
```
node app.js --screenshots
```
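The `headless` option suggests the crawler drives a headless browser such as Puppeteer. Under that assumption, per-page screenshot capture could look roughly like the sketch below; the function and file naming are illustrative, not the project's actual code.

```js
// Hedged sketch: one page visit with a screenshot, using Puppeteer.
const puppeteer = require('puppeteer');

async function visitWithScreenshot(url, outDir) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  // Derive a file-safe name from the URL, mirroring the slug style
  // seen in the output path (e.g. "https___yourspa.com").
  const name = url.replace(/[^a-z0-9.]/gi, '_') + '.png';
  await page.screenshot({ path: `${outDir}/${name}`, fullPage: true });
  await browser.close();
}

// Example: visitWithScreenshot('https://www.bbc.co.uk/sport', './output');
```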
To view the result, start a local server:

```
node server.js
```

Then open the URL below in a browser:

```
http://localhost:8080/html/d3tree.html?url=../output/https___yourspa.com/crawl.json
```