crowdsourcing Metadata #2455
Comments
A wiki-like approach? What would be editable within and outside Tribler? |
Refocus this issue on torrent channels based on magnet links with rich metadata. The thesis goal is to deploy creation and search of rich metadata. A more ambitious goal would be to integrate voting to obtain trustworthy metadata. Each channel can only be modified by the channel owner. Users vote on the quality of channels, and quality emerges from the Tribler user collective. It is easy to copy an entire channel, re-use quality content from a channel, and re-mix channels, which leads indirectly to crowdsourcing. We specifically avoid the problems of edit wars, collaborative editing, and undoing changes. This leads to a realistic master project. Only with an active community in place can we move to the next stage and experiment with more sophisticated crowdsourcing models.

The torrent channel is expanded with a new type: rich metadata channels. Channel owners indicate the content type of each magnet link. We still keep it simple: 1 magnet link has 1 rich metadata description. The first step is to create a simple editor inside Tribler that supports several content types, combining enrichment of music, podcasts, movies, vloggers, scientific articles, etc.

This master thesis puts the foundations in place for rich metadata. In the future we want to integrate collaborative tools. For instance, we assume scientific papers are available in Tribler and we can semi-automatically create survey papers. Still out of scope: |
Demonstrates the content type idea (radio button or drop down): |
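The channel model described above (one rich metadata description per magnet link, a content type chosen per link, edits restricted to the channel owner, and cheap copying for re-mixing) could be sketched roughly as follows. This is only an illustration, not Tribler code: `ContentType`, `RichMetadata`, `RichMetadataChannel`, `set_metadata` and `copy_for` are hypothetical names, and owners are identified by plain strings rather than real peer keys.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContentType(Enum):
    # Hypothetical set of content types, taken from the examples in the thread.
    MUSIC = "music"
    PODCAST = "podcast"
    MOVIE = "movie"
    VLOG = "vlog"
    SCIENTIFIC_ARTICLE = "scientific_article"
    OTHER = "other"


@dataclass
class RichMetadata:
    content_type: ContentType
    fields: dict = field(default_factory=dict)  # free-form descriptive fields


@dataclass
class RichMetadataChannel:
    owner_id: str  # assumption: a string stands in for the owner's identity
    entries: dict = field(default_factory=dict)  # magnet link -> RichMetadata

    def set_metadata(self, editor_id, magnet_link, metadata):
        # Only the channel owner may modify the channel.
        if editor_id != self.owner_id:
            raise PermissionError("only the channel owner may modify this channel")
        # One magnet link has exactly one description: a new one replaces the old.
        self.entries[magnet_link] = metadata

    def copy_for(self, new_owner_id):
        # Copying a channel gives the new owner an independent, editable re-mix.
        return RichMetadataChannel(owner_id=new_owner_id, entries=dict(self.entries))
```

Because the entries are keyed by magnet link, the "1 magnet link has 1 rich metadata description" rule falls out of the data structure itself, and edit wars cannot occur since a non-owner can only work on their own copy.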
@devos50 Do you have time to walk through the stack this afternoon? I have a doctor's appointment in an hour, so I'll be at the lab after lunch. I've seen most of the code and looked into Qt last week, but it is still a bit fuzzy. |
He is on vacation this week. But others should be able to help. |
Got my community running: peers can exchange messages (directed and broadcast). The next step is to define the behavior of the community and the message types that can be sent. @synctext should I look at a more branching metadata structure such as here (inspired by YouTube, The Pirate Bay and Dublin Core)? After that, the next step would be to structure the database and the messages sent. |
@svanschooten I would advise to keep it as simple as possible for now and not go wild with many different metadata types/complicated structures yet. Also, nice to see that you have a basic community up and running! |
@devos50 welcome back! I agree, which is why I re-researched the desired structure. Once I have something solid I'll implement a data structure that I can store in the database (and also start implementing the distribution mechanics). Due to a family crisis I have not been able to come to the lab Friday and today, but I have done some reading and thinking on the categorization issue. Most content management systems use a tree-based structure to define archetypes, subtypes and properties, though I have come across some interesting work. Twitter has published a content categorization method that looks interesting, though it is not directly applicable to our case. Based on these articles and papers I have opted to design a more 'flat' category structure, which I have documented in my repository. |
#1150 is about to start. First finish this quick MAX 4 week prototype, then think about how to build on top of scalable channels, when they're hopefully ready! Write rich metadata on Trustchain? (e.g. barter records, voting for channels, trading honesty, and metadata enrichment). Then we have 4 contexts of reputations to merge somewhat. Next step is to remove all non-blockchain data sync mechanisms in Tribler... Remove all storage in Dispersy #2778, all communities, and replace it with IPv8-based Trustchain storage. Keep it simple-and-get-it-running-first-you-stupid model: only channel owner can do metadata enrichment :-) |
Next (@synctext ??): writing metadata to persistence layer, more UI screens (only on torrent add for now) or better metadata models? |
Most metadata models approach the problem from an unstructured-data angle: they usually have a (semi-)fixed tree structure for fields, but no simple and straightforward approach to storing it in a relational database. I do not want to introduce more dependencies, but I think a NoSQL storage method would be easiest?
|
Why do you want to make the Consider adopting the distribution of scientific works as the test community for your entire thesis {or something else additionally; http://bt.etree.org}. Or create a tool and test how many hours it takes to put stuff like 400k scientific journals into your rich metadata. a.k.a. just make a music table, movie, clip, series, vlog, ebook, adult entertainment, other
ID3v1 pre-defines a set of genres denoted by numerical codes. Keeps it trivial... Future: #3484 After this 4-week prototype is completed, explore more advanced architecture. We are prototyping using our Trustchain idea as the only storage paradigm in Tribler. It would contain: bandwidth barter transactions, voting for channels, trading of bandwidth coins #3326. Additionally, possibly rich metadata of channels; this thesis. Warning: this idea for yet another Tribler overhaul would take years to complete and get stable! |
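As a rough illustration of the "just make a music table" suggestion, the sketch below stores one row per magnet link in a plain relational table using Python's built-in `sqlite3` module, so no extra dependency is needed. The schema, the column names, and the helpers `create_schema`, `add_music` and `titles_by_genre` are assumptions made for this example, not Tribler's actual database layout. The genre column holds a numeric code in the style of ID3v1; only a small excerpt of that genre list is included.

```python
import sqlite3

# Small excerpt of the ID3v1 genre codes; the full list defines many more.
ID3V1_GENRES = {0: "Blues", 8: "Jazz", 13: "Pop", 17: "Rock"}


def create_schema(conn):
    # Hypothetical schema: one row (one description) per magnet link.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS music (
            magnet_link TEXT PRIMARY KEY,
            title       TEXT NOT NULL,
            artist      TEXT,
            genre_code  INTEGER
        )
        """
    )


def add_music(conn, magnet_link, title, artist, genre_code):
    if genre_code not in ID3V1_GENRES:
        raise ValueError(f"unknown genre code: {genre_code}")
    # INSERT OR REPLACE keeps a single description per magnet link.
    conn.execute(
        "INSERT OR REPLACE INTO music VALUES (?, ?, ?, ?)",
        (magnet_link, title, artist, genre_code),
    )


def titles_by_genre(conn, genre_code):
    rows = conn.execute(
        "SELECT title FROM music WHERE genre_code = ? ORDER BY title",
        (genre_code,),
    )
    return [row[0] for row in rows]
```

A movie, clip, or ebook table would follow the same pattern with its own columns; using a fixed numeric genre code keeps searching trivial and avoids free-text spelling variants.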
The underlying data model has been simplified and abstracted further, to provide generic reading and setting handles. It is completely flat now. TODO:
|
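A completely flat model with generic read and set handles might look something like the sketch below. This is a guess at the general shape, not the code from the thesis repository: `FlatMetadata`, its `get`/`set` methods and the JSON serialisation are all hypothetical, and "flat" is interpreted here as "only scalar values, no nested structures".

```python
import json

# Assumption: a flat model only allows scalar values, never lists or dicts.
_SCALAR_TYPES = (str, int, float, bool)


class FlatMetadata:
    def __init__(self, content_type):
        self._fields = {"content_type": content_type}

    def set(self, key, value):
        # Generic setter: any key is accepted, but nested values are rejected.
        if not isinstance(value, _SCALAR_TYPES):
            raise TypeError(f"field {key!r} must be a scalar, got {type(value).__name__}")
        self._fields[key] = value

    def get(self, key, default=None):
        # Generic reader: unknown keys return the default instead of raising.
        return self._fields.get(key, default)

    def to_json(self):
        # Sorted keys give a stable serialisation for storage or messaging.
        return json.dumps(self._fields, sort_keys=True)
```

Keeping every description a single level deep means it can be stored as one database row or one message payload without a schema per content type.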
Refined thesis subject: Searching in enriched metadata using deduplicated tag clouds.
|
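To make "deduplicated tag clouds" a little more concrete, here is a minimal sketch under the assumption that deduplication means normalising case and whitespace so that variants such as "Jazz", " jazz " and "JAZZ" collapse into one tag. The functions `normalise`, `build_tag_cloud` and `search` are hypothetical and only illustrate the idea; the thesis may use a different or more sophisticated deduplication method.

```python
import re
from collections import Counter, defaultdict


def normalise(tag):
    # Collapse case and whitespace variants into one canonical form.
    return re.sub(r"\s+", " ", tag.strip().casefold())


def build_tag_cloud(tagged):
    """Build tag counts and a tag -> magnet links index.

    `tagged` maps a magnet link to an iterable of raw, user-entered tags.
    """
    counts = Counter()
    index = defaultdict(set)
    for magnet_link, raw_tags in tagged.items():
        # A set ensures each magnet link counts at most once per tag.
        for tag in {normalise(t) for t in raw_tags}:
            if tag:
                counts[tag] += 1
                index[tag].add(magnet_link)
    return counts, index


def search(index, query):
    # Look up the normalised query; return matching magnet links in sorted order.
    return sorted(index.get(normalise(query), set()))
```

The counts give the weight of each tag in the cloud, while the index answers searches directly, so a query is normalised in the same way as the stored tags before lookup.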
It might be helpful for you to sync with @xoriole; your ideas seem to overlap somewhat. |
@devos50 It was quite a long discussion. Lots of ideas floating. We'll see how the design materializes. |
related to #6217 |
Allow any user to improve the metadata. Examples of existing approaches:
Time slicing is too heavy for Tribler, out of scope. Just metadata.