refactor(perf): make the action faster (#3)
## Summary

Instead of using Algolia's Docker image for DocSearch, this PR uses the source repository for scraping and uploading to Algolia.

## Details

Using the source repository removes the need for jq, the docker-cli installation, the Algolia DocSearch Docker image, and other peer dependencies.

With python:3.6 as the base image (which comes with git preinstalled), the algolia/docsearch-scraper repository is cloned first.
pipenv is then installed and used to install the packages from the Pipfile.

## Improvements

The running time of the action is now reduced by 40 seconds.

## Further Comments

I have also made a few other fixes and corrections, such as correcting the spelling of Algolia.

I also changed config.example.json, since the previous example took a long time to index, which made the speed difference hard to see.

Closes #2
aditya-mitra authored Jun 23, 2021
1 parent 1eaef43 commit e3b8c55
Showing 6 changed files with 46 additions and 59 deletions.
9 changes: 6 additions & 3 deletions .github/workflows/main.yml
@@ -1,13 +1,16 @@
on: [push]
name: Test the Action

on:
- push

jobs:
example_job:
runs-on: ubuntu-latest
name: test the action
steps:
- uses: actions/checkout@master
- uses: actions/checkout@v2
- name: test
uses: darrenjennings/algolia-docsearch-action@master
uses: ./
with:
algolia_api_key: ${{ secrets.ALGOLIA_API_KEY }}
algolia_application_id: ${{ secrets.ALGOLIA_APPLICATION_ID }}
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
FROM ubuntu:latest
FROM python:3.6

COPY entrypoint.sh /entrypoint.sh

9 changes: 4 additions & 5 deletions README.md
@@ -5,19 +5,18 @@ This action runs the docsearch scraper and updates an index.
## Inputs

### `algolia_application_id`
**Required** 'Aloglia docsearch `APPLICATION_ID`
**Required** Algolia docsearch `APPLICATION_ID`

### `algolia_api_key`
**Required** Aloglia docsearch `API_KEY`
**Required** Algolia docsearch `API_KEY`

### `file`
**Required** File able to be accessed from $GITHUB_WORKSPACE, used in tandem
with `actions/checkout@master`
**Required** File able to be accessed from $GITHUB_WORKSPACE, used in tandem with `actions/checkout@master`

## Example usage

```yaml
- uses: actions/checkout@master
- uses: actions/checkout@v2
- uses: darrenjennings/algolia-docsearch-action@master
with:
algolia_application_id: 'XXXXX83LWT'
```
4 changes: 2 additions & 2 deletions action.yml
@@ -5,10 +5,10 @@ branding:
color: 'blue'
inputs:
algolia_application_id:
description: 'Aloglia docsearch APPLICATION_ID'
description: 'Algolia docsearch APPLICATION_ID'
required: true
algolia_api_key:
description: 'Aloglia docsearch API_KEY'
description: 'Algolia docsearch API_KEY'
required: true
file:
description: 'File path to docsearch'
47 changes: 13 additions & 34 deletions config.example.json
@@ -1,41 +1,20 @@
{
"index_name": "prod_EE",
"index_name": "algolia_docsearch_action",
"start_urls": [
{
"url": "https://docs.konghq.com/enterprise/(?P<version>.*?)/",
"variables": {
"version": {
"url": "https://docs.konghq.com/enterprise/",
"js": "var versions = $('ul[aria-labelledby=version-dropdown] a, button#version-dropdown').map(function(i, e) { return $(e).text().replace(/\\s+/g, '').replace(/Version/g, '').replace('(2020)', '').replace('(latest)', ''); }).toArray(); return JSON.stringify(versions);"
}
}
}
],
"sitemap_urls": [
"https://docs.konghq.com/sitemap.xml"
],
"stop_urls": [

"https://aquaimpact.github.io/CovidSusTrackerDocs"
],
"stop_urls": [],
"selectors": {
"lvl0": {
"selector": ".docs-navigation > a.active",
"global": true,
"default_value": "Kong"
},
"lvl1": ".content h1",
"lvl2": ".content h2",
"lvl3": ".content h3",
"lvl4": ".content h4",
"text": ".content p, .content li"
"lvl0": ".doc-content h1",
"lvl1": ".doc-content h2",
"lvl2": ".doc-content h3",
"lvl3": ".doc-content h4",
"lvl4": ".doc-content h5",
"lvl5": ".doc-content h6",
"text": ".doc-content p, .doc-content li"
},
"selectors_exclude": [
"#next-steps",
"#next-steps ~ p"
],
"only_content_level": true,
"conversation_id": [
"534091583"
"1313246279"
],
"nb_hits": 18645
}
"nb_hits": 98
}
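
With jq no longer installed, the config file reaches the scraper without any earlier sanity check. A quick pre-flight check could look like the sketch below. `check_config_keys` is a hypothetical helper that is not part of this commit, and it is a crude `grep` test for key names, not a JSON parser. The key names in the usage line are simply those that appear in config.example.json.

```shell
# Hypothetical helper (not part of this commit): report the first key name
# that does not appear in the config file. This only greps for `"key":`
# and does not validate the JSON itself.
check_config_keys() {
  # $1 = config file path; remaining arguments = key names to look for
  config_file=$1
  shift
  for key in "$@"; do
    if ! grep -q "\"$key\"[[:space:]]*:" "$config_file"; then
      echo "missing key in config: $key" >&2
      return 1
    fi
  done
}

# Possible usage before starting the scraper:
#   check_config_keys "$GITHUB_WORKSPACE/$FILE" index_name start_urls selectors
```

The function returns a non-zero status on the first missing key, so a caller can stop before the scraper starts crawling.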
34 changes: 20 additions & 14 deletions entrypoint.sh
@@ -4,17 +4,23 @@ APPLICATION_ID=$1
API_KEY=$2
FILE=$3

apt update
apt install jq -y

# install docker
apt install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
apt update
apt-cache policy docker-ce
apt install docker-ce -y

ls -la $GITHUB_WORKSPACE
cat $GITHUB_WORKSPACE/$FILE | jq -r tostring
docker run -e APPLICATION_ID=$APPLICATION_ID -e API_KEY=$API_KEY -e "CONFIG=$(cat $GITHUB_WORKSPACE/$FILE | jq -r tostring)" algolia/docsearch-scraper
# build from the main source repository
git clone https://github.com/algolia/docsearch-scraper.git

cd docsearch-scraper/

# install pipenv without cache
pip install --no-cache-dir --trusted-host pypi.python.org pipenv

# install packages without virtualenv
pipenv install --system --deploy --ignore-pipfile

# create the .env file for docsearch
echo "APPLICATION_ID=${APPLICATION_ID}
API_KEY=${API_KEY}
" > .env

# run algolia docsearch
python docsearch run $GITHUB_WORKSPACE/$FILE

echo "🚀 Successfully indexed and uploaded the results to Algolia"
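
The script above expands `$GITHUB_WORKSPACE/$FILE` unquoted and writes `.env` with a multi-line `echo`. As a possible follow-up, the same two steps can be sketched with quoting and an early existence check. `write_env_file` and `resolve_config` are hypothetical names introduced here for illustration and are not part of this commit.

```shell
# Hypothetical helpers (not part of this commit), written for POSIX sh.

# Write the scraper's .env file into a directory.
# printf emits each value literally and adds no trailing blank line.
write_env_file() {
  # $1 = target directory, $2 = application id, $3 = API key
  printf 'APPLICATION_ID=%s\nAPI_KEY=%s\n' "$2" "$3" > "$1/.env"
}

# Print the full config path, or fail if the file does not exist.
resolve_config() {
  # $1 = workspace directory, $2 = config path relative to it
  if [ ! -f "$1/$2" ]; then
    echo "config file not found: $1/$2" >&2
    return 1
  fi
  printf '%s\n' "$1/$2"
}

# Possible usage inside entrypoint.sh, after `cd docsearch-scraper/`:
#   CONFIG_PATH=$(resolve_config "$GITHUB_WORKSPACE" "$FILE") || exit 1
#   write_env_file . "$APPLICATION_ID" "$API_KEY"
#   python docsearch run "$CONFIG_PATH"
```

Quoting the expansions keeps a config path that contains spaces intact, and the explicit check makes a missing file fail before the scraper starts.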
