Paperbaum is a decentralized academic paper publishing and verification system built on a custom Substrate-based parachain. It addresses issues of authorship verification, restricted access, and inefficient paper linking in academic publishing.
Link to the slides: here
Paperbaum has three main components:
- Substrate Parachain: Custom runtime that stores paper metadata in a Merkle tree and handles verification.
- IPFS Integration: Decentralized storage for full paper content.
- Vector Similarity Engine: NLP-based system for semantic paper linking.
The core of Paperbaum is built on a custom Substrate parachain, providing a robust and flexible foundation for academic paper management and verification. The custom pallet uses a Merkle tree to natively link papers together. This pallet provides functionality for:
- Managing a Merkle tree of paper hashes
- Verifying Merkle proofs
- Storing and retrieving paper metadata
- Enforcing size limits on various paper attributes
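To illustrate the first two items, here is a minimal JavaScript sketch of building a Merkle tree over paper hashes and verifying an inclusion proof. It is not the pallet's code (the pallet runs in the Substrate runtime). SHA-256, the rule of pairing an unpaired node with itself, and every function name below are assumptions made for illustration.

```javascript
const { createHash } = require('node:crypto');

// SHA-256 helper returning a Buffer (hash choice is an assumption).
const sha256 = (data) => createHash('sha256').update(data).digest();

// Hash two child nodes into their parent.
const hashPair = (left, right) => sha256(Buffer.concat([left, right]));

// Build all tree levels bottom-up; levels[0] holds the leaves.
// An unpaired node at the end of a level is paired with itself (assumption).
function buildLevels(leaves) {
  if (leaves.length === 0) throw new Error('at least one leaf is required');
  const levels = [leaves];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(hashPair(prev[i], prev[i + 1] ?? prev[i]));
    }
    levels.push(next);
  }
  return levels;
}

const merkleRoot = (levels) => levels[levels.length - 1][0];

// Collect the sibling hash at each level, plus which side it sits on.
function merkleProof(levels, index) {
  const proof = [];
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const level = levels[depth];
    const siblingIndex = index % 2 === 0 ? index + 1 : index - 1;
    proof.push({
      hash: level[siblingIndex] ?? level[index],
      siblingOnLeft: index % 2 === 1,
    });
    index = Math.floor(index / 2);
  }
  return proof;
}

// Recompute the root from a leaf and its proof, then compare.
function verifyProof(leaf, proof, root) {
  let node = leaf;
  for (const { hash, siblingOnLeft } of proof) {
    node = siblingOnLeft ? hashPair(hash, node) : hashPair(node, hash);
  }
  return node.equals(root);
}

// Usage with three made-up paper identifiers.
const leaves = ['paper-a', 'paper-b', 'paper-c'].map((p) => sha256(p));
const levels = buildLevels(leaves);
const root = merkleRoot(levels);
const proof = merkleProof(levels, 2);
console.log(verifyProof(leaves[2], proof, root)); // true
console.log(verifyProof(sha256('forged'), proof, root)); // false
```

A proof holds one sibling hash per tree level, so a verifier needs only the leaf, the proof, and the root rather than the full set of paper hashes.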
Paperbaum leverages the InterPlanetary File System (IPFS) for decentralized storage of full paper content. This integration ensures that papers are stored in a distributed, content-addressed manner, enhancing accessibility and permanence.
Paperbaum implements a vector similarity engine for semantic paper linking. This system uses OpenAI's text embedding model to generate vector representations of papers, enabling efficient similarity searches. The `generateEmbedding` function creates a vector representation of text, while `cosineSimilarity` computes the similarity between two vectors.
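As a rough sketch of the similarity step, cosine similarity is the dot product of two vectors divided by the product of their lengths. The version below may differ from the project's own `cosineSimilarity`. The `mostSimilar` helper, the zero-vector convention, and the toy three-dimensional vectors are assumptions for illustration, since real embeddings come from the embedding model and have far more dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Returning 0 for a zero vector is a convention chosen here (assumption).
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank stored papers by similarity to a query vector.
function mostSimilar(queryVector, papers, topK = 3) {
  return papers
    .map((p) => ({ title: p.title, score: cosineSimilarity(queryVector, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors standing in for real embeddings.
const papers = [
  { title: 'Paper A', vector: [1, 0, 0] },
  { title: 'Paper B', vector: [0.9, 0.1, 0] },
  { title: 'Paper C', vector: [0, 0, 1] },
];
console.log(mostSimilar([1, 0, 0], papers, 2).map((r) => r.title));
// [ 'Paper A', 'Paper B' ]
```

Because the measure depends only on direction, two papers whose embeddings differ in magnitude but point the same way still score as similar.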
When a paper is uploaded, Paperbaum processes the PDF, extracts key metadata, and generates a vector representation:
- PDF text extraction
- Metadata extraction using GPT-4o mini
- Vector embedding generation
- IPFS upload
- In-memory storage of the metadata and vector in a Merkle tree
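The steps above can be sketched as a single async function that runs them in order. This is an illustrative outline, not the backend's actual code: `processUpload`, the `steps` object, and all stub implementations are hypothetical, and a plain `Map` stands in for the in-memory Merkle tree to keep the example short.

```javascript
// Run the upload steps in order. Every step is injected, so a real PDF
// parser, LLM call, embedding call, and IPFS client can be swapped in.
async function processUpload(pdfBuffer, steps, store) {
  const text = await steps.extractText(pdfBuffer);
  const metadata = await steps.extractMetadata(text);
  const vector = await steps.generateEmbedding(text);
  const cid = await steps.uploadToIpfs(pdfBuffer);
  const record = { cid, metadata, vector };
  store.set(cid, record); // Map keyed by content identifier (stand-in store)
  return record;
}

// Placeholder steps for illustration only; none of them call a real service.
const stubSteps = {
  extractText: async (buf) => buf.toString('utf8'),
  extractMetadata: async (text) => ({ title: text.split('\n')[0] }),
  generateEmbedding: async (text) => [text.length, 0, 0],
  uploadToIpfs: async (buf) => `stub-cid-${buf.length}`,
};

const store = new Map();
const done = processUpload(Buffer.from('A Title\nBody text'), stubSteps, store)
  .then((record) => {
    console.log(record.metadata.title, record.cid);
    return record;
  });
```

Passing the steps in as functions keeps the ordering logic testable without network access or API keys.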
![](https://private-user-images.githubusercontent.com/80065244/350760139-cebaa7ac-5f2e-4efb-bfa8-d3afce0734e3.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4Mzg4MjksIm5iZiI6MTczODgzODUyOSwicGF0aCI6Ii84MDA2NTI0NC8zNTA3NjAxMzktY2ViYWE3YWMtNWYyZS00ZWZiLWJmYTgtZDNhZmNlMDczNGUzLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA2VDEwNDIwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTVjNDU4ZGEwNjI2YzQyM2I4MjcwZjhiZDMwYWY1ODNkODJhZjc5MGRhZTJjZDlmNjQ4MTZlNDQ4MzkyOGUzZmEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.s7NRfvGnxodk-dG5BUTsaL_OXYowJzxzWGhWM7m44jc)
![](https://private-user-images.githubusercontent.com/80065244/350760140-1c621f98-d9df-417e-92d7-db6a1b2354c9.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4Mzg4MjksIm5iZiI6MTczODgzODUyOSwicGF0aCI6Ii84MDA2NTI0NC8zNTA3NjAxNDAtMWM2MjFmOTgtZDlkZi00MTdlLTkyZDctZGI2YTFiMjM1NGM5LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA2VDEwNDIwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZlZDVmZDg4NDI4NWRlNDU2ODdjODU4NzNlOTZlYjAyOWIzNTAxZDhhMzg2OTBiZDlhZjZkNDY2NjA3NDk3M2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.4Dkfq-_7BlrLkmfWI29mRQOQlFQA56r3lezC3wAF1F0)
![](https://private-user-images.githubusercontent.com/80065244/350760141-d1d4f39c-96e0-423b-9c1f-be90f3ce1a3b.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4Mzg4MjksIm5iZiI6MTczODgzODUyOSwicGF0aCI6Ii84MDA2NTI0NC8zNTA3NjAxNDEtZDFkNGYzOWMtOTZlMC00MjNiLTljMWYtYmU5MGYzY2UxYTNiLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA2VDEwNDIwOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWNlM2QwM2FmNDEyYTdlNTRhMGYxOWMzMTYzOTYzZjYwNzZjNjkyNzFmZTk2MjRmMWVlZDMwMWU4MTg5NzA4YTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.B8JfDjjjHUGs90_oAFOOaEfMxrm7C8gDa83dVga50EE)
For the parachain, first compile it with `cargo build --release`, then run `./target/release/node-template --dev` to start the Substrate node on 127.0.0.1:9944.
To run the backend server, enter the `backend` directory and run `npm install`, then run `node server.js` to start the server on localhost:3000.
To run the frontend, enter the `frontend` directory and run `npm install`, then run `npm run dev` to start the frontend on localhost:3001.
Planned future work:
- Develop a more sophisticated Merkle tree structure for efficient paper linking and verification.
- Implement Merkle Mountain Ranges (MMR) for dynamic dataset management, allowing efficient updates and proofs of inclusion.
- Develop a ZK-based reputation system for anonymous yet credible peer reviews.
- Create ZK proofs for citation verification without revealing full paper contents.
- Implement double-blind review processes using ZK proofs.
- Develop a reputation system for reviewers based on the quality and timeliness of their reviews.
- Develop cross-chain citation verification and tracking.
- Create a system for recognizing academic credentials and reputations across different blockchain networks.
- Implement versioning and provenance tracking of papers using OriginTrail's blockchain-agnostic protocol.
- Develop an AI-assisted discovery system leveraging OriginTrail's semantic data structure.
See MIT License