This repository has been archived by the owner on Jun 3, 2021. It is now read-only.

Commit

fix: make ocr adapted to serverless
beetcb committed Feb 18, 2021
1 parent f33bfb8 commit f0dc6f5
Showing 2 changed files with 53 additions and 7 deletions.
20 changes: 15 additions & 5 deletions README.md
@@ -163,6 +163,8 @@

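 - Persistent authentication: authentication data is cached in memory and refreshed only when it expires

+- Cloud-compatible OCR: on many cloud services (such as cloud functions) not all of the file system is writable, so the [tesseract.js](https://github.com/naptha/tesseract.js) data and trained-data cache used for captcha OCR are staged in `/tmp` to reduce errors; to speed up access from within mainland China, the download is hosted on Gitee
+
 - Non-blocking multi-user support: Node.js's asynchronous model lets multiple users be processed in parallel, so simultaneous operations for many users complete within milliseconds

 - About check-in: (during school configuration) the Baidu Maps API is used to obtain the school-wide check-in address, and the check-in data returned by the 今日校园 API supplies the check-in latitude and longitude. In short, knowing the school's English abbreviation is enough to configure all check-in information, with minimal effort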
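The non-blocking multi-user point in the feature list above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `signIn`, `signInAll`, and the shape of the user objects are hypothetical stand-ins, not the project's actual API.

```javascript
// Hypothetical stand-in for one user's sign-in request; it settles after
// `delayMs` to imitate network latency.
function signIn(user) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (user.fail) reject(new Error(`${user.name}: sign-in failed`))
      else resolve(`${user.name}: signed in`)
    }, user.delayMs)
  })
}

// Start every user's task at once, then wait for all of them to settle so
// that one failing user does not abort the others.
async function signInAll(users) {
  const settled = await Promise.allSettled(users.map(signIn))
  return settled.map((r) =>
    r.status === 'fulfilled' ? r.value : r.reason.message
  )
}
```

Because every request is started before any of them is awaited, the total wait is roughly that of the slowest user rather than the sum over all users.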
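@@ -186,16 +188,24 @@ npm i -g @beetcb/cea

 2. Initialize the school and users

-- School configuration:
+- User configuration:
+
+Configure users interactively:

 ```sh
-cea -s
+cea -u
 ```

-- User configuration:
+Or configure users from the `conf.toml` file, which also configures the school

 ```sh
-cea -u
+cea load
 ```

+- School configuration:
+
+```sh
+cea -s
+```
+
 - (Optional) Configure users from a file: create `conf.toml` in the root directory, following the example below: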
@@ -274,7 +284,7 @@ cea rm 'all'
 The encryption and decryption steps of the login flow draw heavily on the [wisedu-unified-login-api](https://github.com/ZimoLoveShuang/wisedu-unified-login-api) project; many thanks
-Thanks to [cloudbase-framework](https://github.com/Tencent/cloudbase-framework), [Github Actions](https://github.com/actions), and [Coding CI](https://help.coding.net/docs/ci/intro.html) for their excellent services 🎉
+Thanks to [cloudbase-framework](https://github.com/Tencent/cloudbase-framework), [Github Actions](https://github.com/actions), [Coding CI](https://help.coding.net/docs/ci/intro.html), and [Gitee Pages](https://gitee.com/help/articles/4136) for their excellent services 🎉
 ## Disclaimer
40 changes: 38 additions & 2 deletions crawler/captcha.js
@@ -1,8 +1,42 @@
 const { createWorker } = require('tesseract.js')
+const fetch = require('node-fetch')
+const fs = require('fs')
+const tessdataPath = '/tmp/eng.traineddata.gz'

-module.exports = async function ocr(captchaUrl) {
+async function downloadTessdata() {
+  process.env.TESSDATA_PREFIX = '/tmp'
+  // check folder exists
+  if (!fs.existsSync('/tmp')) {
+    fs.mkdirSync('/tmp')
+  } else {
+    // check file exists
+    if (fs.existsSync(tessdataPath)) {
+      return
+    }
+  }
+  const result = await download(
+    'https://beetcb.gitee.io/filetransfer/tmp/eng.traineddata.gz',
+    tessdataPath
+  )
+  console.log(result)
+}
+
+async function download(url, filename) {
+  const stream = fs.createWriteStream(filename)
+  const res = await fetch(url)
+  const result = await new Promise((resolve, reject) => {
+    res.body.pipe(stream)
+    res.body.on('error', reject)
+    stream.on('close', () => resolve(`Downloaded tess data as ${filename}`))
+  })
+  return result
+}
+
+async function ocr(captchaUrl) {
+  await downloadTessdata()
   const worker = createWorker({
-    langPath: 'crawler',
+    langPath: '/tmp',
+    cachePath: '/tmp',
   })
   await worker.load()
   await worker.loadLanguage('eng')
@@ -17,3 +51,5 @@ module.exports = async function ocr(captchaUrl) {
   await worker.terminate()
   return text
 }
+
+module.exports = ocr
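
The download-once pattern in this commit could also be written so that write-side errors reject the promise and an interrupted transfer never leaves a truncated file under the final name. The following is only a sketch under stated assumptions, not the commit's code: `saveOnce`, `openSource`, and `demo` are hypothetical names, and an in-memory stream stands in for the HTTP response body.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const { pipeline, Readable } = require('stream')
const { promisify } = require('util')

const pipelineAsync = promisify(pipeline)

// Hypothetical helper (not part of the commit): save a readable stream to
// `filename` once, and report whether a write actually happened.
async function saveOnce(openSource, filename) {
  // Skip the transfer entirely when an earlier invocation left the file behind.
  if (fs.existsSync(filename)) return false

  // Write to a sibling ".part" file first so an interrupted transfer never
  // leaves a truncated file under the final name.
  const partial = `${filename}.part`
  const source = await openSource()

  // pipeline() rejects if either the source or the write stream errors.
  await pipelineAsync(source, fs.createWriteStream(partial))
  fs.renameSync(partial, filename)
  return true
}

// Local demo: an in-memory stream stands in for the response body.
async function demo() {
  const target = path.join(os.tmpdir(), `save-once-demo-${process.pid}.txt`)
  if (fs.existsSync(target)) fs.unlinkSync(target)

  const open = async () =>
    Readable.from([Buffer.from('tess'), Buffer.from('data')])
  const first = await saveOnce(open, target) // writes the file
  const second = await saveOnce(open, target) // finds the file and skips
  const content = fs.readFileSync(target, 'utf8')
  fs.unlinkSync(target)
  return { first, second, content }
}
```

With `node-fetch` v2, where `res.body` is a Node.js readable stream, the call might look like `saveOnce(async () => (await fetch(url)).body, tessdataPath)`. Checking `res.ok` before saving would additionally keep an HTTP error page from being stored as trained data.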
