SMM

Teammates

王竣樺, 梁致銓, 曾繁斌

Patent Crawler

Build Env

python -m venv venv
# Windows; on Linux/macOS use: source venv/bin/activate
venv\Scripts\activate
pip install -r requirements.txt

Proxy Pool Usage

We currently use 10 proxies provided by a free Webshare account.
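A minimal sketch of how a small proxy pool can be rotated in round-robin order, using only the standard library. The proxy URLs and the helper names `proxy_cycle` and `build_opener_for` are hypothetical placeholders, not the crawler's actual code; substitute the endpoints from your own Webshare account.

```python
import itertools
import urllib.request

# Hypothetical proxy endpoints (assumption); replace with your own.
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]


def proxy_cycle(proxies):
    """Return an endless iterator that yields the proxies in round-robin order."""
    return itertools.cycle(proxies)


def build_opener_for(proxy_url):
    """Build a urllib opener that routes HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


rotation = proxy_cycle(PROXIES)
# Each request takes the next proxy; after the last one the cycle restarts.
first_three = [next(rotation) for _ in range(3)]
opener = build_opener_for(first_three[0])
```

Calling `opener.open(url, timeout=10)` would then send the request through the selected proxy; moving to the next proxy after a failure is a common choice when free proxies are unreliable.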

Warning

Remember to delete the page-saving part of get_page(), or your disk will fill up with HTML files.

Run Crawler

cd Patent_Search_Crawler
# crawling
python main.py
# merging files
python merger.py

Data

The complete raw data is available at this link: link

We use merge_data.sqlite for training.

Tip

Place merge_data.sqlite in the SMM folder.
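To confirm that the file is in place and readable, you can list the tables it contains. This is a minimal sketch; the `list_tables` helper is hypothetical, and the table names inside merge_data.sqlite are not documented here, so the code only queries SQLite's built-in `sqlite_master` catalog.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path, sorted by name."""
    # Note: sqlite3.connect creates an empty database if db_path does not exist,
    # so an empty result may simply mean the path is wrong.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Example usage (run from the SMM folder):
#     print(list_tables("merge_data.sqlite"))
```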

Explanation of the preprocessed data (in /SMM/EDGPAT/data):

Tip

The datasets below are ready for training; no preprocessing is needed.

2-1-level.csv: IPC Level mapping from 1 to 2.
3-2-level.csv: IPC Level mapping from 2 to 3.
4-3-level.csv: IPC Level mapping from 3 to 4.
5-4-level.csv: IPC Level mapping from 4 to 5.
real-data.json: Each company's patents, grouped by year.

Note

We encode the year 2018 as 0.
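A minimal sketch of this encoding, assuming later years map to consecutive integers (2019 as 1, 2020 as 2, and so on). That offset rule and the helper names are assumptions; only the mapping of 2018 to 0 is stated above.

```python
BASE_YEAR = 2018  # the year that is encoded as 0


def encode_year(year):
    """Map a calendar year to its index (assumes a simple offset from BASE_YEAR)."""
    return year - BASE_YEAR


def decode_year(index):
    """Map an encoded index back to the calendar year."""
    return index + BASE_YEAR
```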

IPC Patent level example:
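As a sketch of how an IPC symbol breaks down into the five standard hierarchy levels (section, class, subclass, main group, subgroup), the snippet below parses a symbol with a regular expression. The `ipc_levels` helper is hypothetical, and it is an assumption that the level numbers in the CSV files above follow this standard ordering.

```python
import re

# Section letter, two-digit class, subclass letter, main group, subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def ipc_levels(code):
    """Return the five hierarchy levels of an IPC symbol, from broadest to narrowest."""
    match = IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Not a full IPC symbol: {code!r}")
    section, cls, subclass, group, subgroup = match.groups()
    prefix = f"{section}{cls}{subclass}"
    return [
        section,                        # level 1: section
        f"{section}{cls}",              # level 2: class
        prefix,                         # level 3: subclass
        f"{prefix} {group}/00",         # level 4: main group
        f"{prefix} {group}/{subgroup}", # level 5: subgroup
    ]
```

Under this reading, each level-mapping file would pair an entry at one level with its parent at the level above, for example `G06F` with `G06`.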

Final Project: Exploring Patent Trends in Taiwan with Event-based Graph Techniques

Our Goal

Our project focuses on developing a patent prediction model specifically for forecasting Taiwan's future patent trends. Utilizing Event-based Graph techniques, this model analyzes historical patent data to identify emerging trends and patterns.

Data-Driven: Uses real-world patent data (Taiwan) to identify trends.
Dynamic: Adapts to changes in technology and innovation.
Predictive: Forecasts areas likely to see growth in patent filings.

Model Framework

Framework of the proposed model. For simplicity, we show only the calculations for the patent classification codes and one of the related companies.

Code

We use the code from EDGPAT.

Warning

The Python env should be Python 3.6!
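A small guard like the following can catch a wrong interpreter before training starts. This is a sketch; the `is_supported` helper is hypothetical and is not part of the EDGPAT code.

```python
import sys

REQUIRED = (3, 6)  # major and minor version required by this project


def is_supported(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3.6.x."""
    return tuple(version_info[:2]) == REQUIRED


if not is_supported():
    print(
        "Warning: expected Python 3.6, found "
        f"{sys.version_info[0]}.{sys.version_info[1]}"
    )
```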

Required packages:

Preprocessing

Run build_input.ipynb.

We split the data into three parts: training, validation and testing by year.
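A minimal sketch of a year-based split. The record layout (a dict with a `"year"` key) and the cutoff years are assumptions for illustration; the actual boundaries are set in build_input.ipynb.

```python
def split_by_year(records, val_year, test_year):
    """Split records chronologically.

    Years before val_year go to training, years from val_year up to (but not
    including) test_year go to validation, and the rest go to testing.
    """
    train, val, test = [], [], []
    for record in records:
        year = record["year"]
        if year < val_year:
            train.append(record)
        elif year < test_year:
            val.append(record)
        else:
            test.append(record)
    return train, val, test


# Hypothetical records with encoded years (0 = 2018).
records = [{"year": y, "patents": []} for y in range(4)]
train, val, test = split_by_year(records, val_year=2, test_year=3)
```

Splitting by year rather than at random keeps later data out of the training set, which matters when the goal is to forecast future filings.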

Training

Run the code:

sh EDGPAT/train.sh

Note

This command writes the training results to EDGPAT/out.txt.

Our Results

Original Paper Results

	Recall	NDCG	PHR
Top-10	0.1175	0.1725	0.5491
Top-20	0.1646	0.1742	0.6304
Top-30	0.1868	0.1769	0.6612
Top-40	0.2006	0.1769	0.6800

Our Results

	Recall	NDCG	PHR
Top-10	0.1577	0.3395	0.8381
Top-20	0.2349	0.3364	0.8995
Top-30	0.2837	0.3401	0.9217
Top-40	0.3150	0.3427	0.9321

Dropout 0.5

	Recall	NDCG	PHR
Top-10	0.1668	0.3552	0.8407
Top-20	0.2363	0.3496	0.8851
Top-30	0.2762	0.3487	0.9073
Top-40	0.3051	0.3490	0.9164

Dropout 0

	Recall	NDCG	PHR
Top-10	0.1577	0.3395	0.8381
Top-20	0.2349	0.3364	0.8995
Top-30	0.2837	0.3401	0.9217
Top-40	0.3150	0.3427	0.9321
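The tables above report Recall, NDCG, and PHR at several cutoffs. The sketch below shows how these metrics are commonly computed for set prediction with binary relevance. It is not the EDGPAT evaluation code; the function names and the interpretation of PHR as the share of test cases with at least one hit in the top k are assumptions.

```python
import math


def recall_at_k(ranked, truth, k):
    """Fraction of the true items that appear in the top-k predictions."""
    hits = sum(1 for item in ranked[:k] if item in truth)
    return hits / len(truth) if truth else 0.0


def ndcg_at_k(ranked, truth, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in truth
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(truth))))
    return dcg / ideal if ideal else 0.0


def phr_at_k(all_ranked, all_truth, k):
    """Share of test cases whose top-k predictions contain at least one true item."""
    if not all_ranked:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(all_ranked, all_truth)
        if any(item in truth for item in ranked[:k])
    )
    return hits / len(all_ranked)
```

For instance, with predictions `["A", "B", "C", "D"]` and true items `{"A", "D"}`, `recall_at_k(..., k=2)` counts one hit out of two true items.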

