This project is a Go web application built with the Fiber framework. It provides a platform for searching and retrieving trademark information from USPTO bulk data by criteria such as mark identification, attorney names, owners, serial number, class codes, and application date.
- Overview
- Usage
- Search Architecture
- Features
- Data Source
- Extracting Data
- Bulk Insertion
- Search
- Postman Documentation
This is a data-driven application built on the United States Patent and Trademark Office (USPTO) dataset. It uses the USPTO daily trademark files to provide search and retrieval of trademark information.
The following prerequisites are required to run this application:
- Go
- Elasticsearch
- PostgreSQL
- Clone the repository:
  ```sh
  git clone https://github.com/prashant42b/trademarks-elastic-search-engine.git
  cd trademarks-elastic-search-engine
  ```
- Download the dependencies and run the application:
  ```sh
  go mod download
  go run main.go
  ```
![image](https://private-user-images.githubusercontent.com/63443918/296379601-0bd994b0-04ec-47ba-84aa-9a6c3afa784e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2OTI1MjcsIm5iZiI6MTczOTY5MjIyNywicGF0aCI6Ii82MzQ0MzkxOC8yOTYzNzk2MDEtMGJkOTk0YjAtMDRlYy00N2JhLTg0YWEtOWE2YzNhZmE3ODRlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE2VDA3NTAyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTZkNDIyYTFlYjA3M2RjYzk0NWU1NjBkOWRlNDlkNDZmMzExYjA2NzhkYjE4ODI4MmJjZTdlNWRhYTE5ZTdlNTUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.ERn93Li0yD8XJzIKSMyHPah9m_OBmfSBrxwks-lclS0)
- Search trademarks by mark-identification, serial number, attorney name(s), owner(s), application date, and class code(s).
- Parse USPTO trademark data from XML and convert it to JSON for storage.
- Store and retrieve trademark information in a PostgreSQL database through GORM (ORM).
The USPTO data is sourced from the official United States Patent and Trademark Office database.
- Files: xmlFromArchive.go and xmlToJSON.go
- A script decompresses the daily data file and extracts the XML file into a designated folder.
- The encoding/xml and encoding/json packages are then used to parse the extracted fields and convert them to JSON.
```go
// CaseFile maps the XML fields of a trademark case file to Go fields.
type CaseFile struct {
	SerialNumber       string  `xml:"serial-number"`
	FilingDate         string  `xml:"case-file-header>filing-date"`
	StatusCode         string  `xml:"case-file-header>status-code"`
	MarkIdentification string  `xml:"case-file-header>mark-identification"`
	MarkDrawingCode    string  `xml:"case-file-header>mark-drawing-code"`
	AttorneyNames      string  `xml:"case-file-header>attorney-name"`
	Owners             []Owner `xml:"case-file-owners>case-file-owner"`
	ApplicationDate    string  `xml:"transaction-date"`
	RegistrationNumber string  `xml:"registration-number"`
	ClassCode          string  `xml:"classifications>classification>international-code"`
	RegistrationDate   string  `xml:"case-file-header>registration-date"`
}

// Owner holds the party name of one case-file owner.
type Owner struct {
	Name string `xml:"party-name"`
}
```
- Files: insertIntoDB.go and insertIntoESDB.go
- Two utilities bulk-insert the JSON data from the converted_data.json file.
- Bulk insertion into PostgreSQL: insertIntoDB.go uses a GORM model to handle the inserts.
- Bulk insertion into Elasticsearch: insertIntoESDB.go bulk-inserts the JSON data into the Elasticsearch index named trademarks.
The search engine is built on Elasticsearch. Users can search and retrieve trademark information from USPTO bulk data by mark identification, attorney names, owners, serial number, class codes, and application date.
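As an illustration of how several criteria might be combined, the sketch below builds an Elasticsearch `bool` query body with the standard library. It is not the project's actual query: `SearchParams`, `buildQuery`, and the field names (`mark_identification`, `serial_number`, `class_code`) are assumptions, and the `term` filters assume those fields are indexed as keywords.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchParams holds optional criteria; empty fields are skipped.
// (Hypothetical type; only three criteria are shown.)
type SearchParams struct {
	MarkIdentification string
	SerialNumber       string
	ClassCode          string
}

// buildQuery returns a search request body: a full-text match on
// the mark plus exact-value filters, or match_all when no
// criteria are given.
func buildQuery(p SearchParams) ([]byte, error) {
	boolQuery := map[string]interface{}{}
	if p.MarkIdentification != "" {
		boolQuery["must"] = []interface{}{
			map[string]interface{}{
				"match": map[string]string{"mark_identification": p.MarkIdentification},
			},
		}
	}
	var filters []interface{}
	if p.SerialNumber != "" {
		filters = append(filters, map[string]interface{}{
			"term": map[string]string{"serial_number": p.SerialNumber},
		})
	}
	if p.ClassCode != "" {
		filters = append(filters, map[string]interface{}{
			"term": map[string]string{"class_code": p.ClassCode},
		})
	}
	if len(filters) > 0 {
		boolQuery["filter"] = filters
	}

	query := map[string]interface{}{"match_all": map[string]interface{}{}}
	if len(boolQuery) > 0 {
		query = map[string]interface{}{"bool": boolQuery}
	}
	return json.Marshal(map[string]interface{}{"query": query})
}

func main() {
	body, err := buildQuery(SearchParams{MarkIdentification: "ACME", ClassCode: "025"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Putting exact-value criteria under `filter` rather than `must` keeps them out of relevance scoring, so ranking is driven by the full-text match on the mark.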
- Postman Documentation: please refer to the above-mentioned API documentation.