@domschiener's project #47
In #46 (comment) @jbenet said:

> In #46 (comment) @domschiener said:
>
> @domschiener what's the title of your project?

I don't have a name for it yet. I'll reach out to some people and see if we can get some interest from other communities as well to work on this. Will keep you updated.
For the consensus engine, it doesn't necessarily have to be fully based on a reputation system.

(hashed just in case there is an apocalypse on the server: https://ipfs.io/ipfs/QmeA6i4taf1ufjqsA7BSnECBSA5KKBFxgBTJJkz7M4AFav)
@rht very nice find.
Progress Update

I wanted to give everyone a quick progress update on the project. I've basically built a simple prototype (https://github.com/domschiener/instant-wikipedia) that makes it possible to instantly search Wikipedia entries. To describe where I'm going with this:
Right now I'm continuously making API calls, but to take this product live I will need to download the Wikipedia dump (which is around 30 GB, I think) and utilise that. That means the next step would be to fork Wikipedia. But for this I absolutely need to discuss a few things with the IPFS community. Essentially, referring back to #46, we could fork Wikipedia and put it on IPFS so that applications like the one I'm trying to build can utilize the content and perform these operations on it. But I'm wondering if this is doable right now? Would love to get some input from you guys.
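For reference, here is a minimal sketch of what those per-keystroke lookups amount to, assuming the public MediaWiki opensearch endpoint (the actual prototype may call the API differently):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def instant_search(prefix, limit=10):
    """Return Wikipedia title suggestions for a search prefix.

    One HTTP request per call -- this is the per-keystroke load
    on Wikipedia's servers described above.
    """
    params = {
        "action": "opensearch",
        "search": prefix,
        "limit": limit,
        "format": "json",
    }
    resp = requests.get(API, params=params, timeout=5)
    resp.raise_for_status()
    # opensearch returns [query, titles, descriptions, urls]
    _, titles, _, urls = resp.json()
    return list(zip(titles, urls))

if __name__ == "__main__":
    for title, url in instant_search("ipfs"):
        print(title, url)
```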
I do not want a fork of Wikipedia. Maintaining Wikipedia is a gargantuan amount of work. Please do not do this -- just provide a different view based on different storage for now. We can ingest all of the data, and then run periodic scripts that update all data from Wikipedia's servers as it is created. Thus you do not fork, you merely show how it would work distributed.
Anyway, the website looks really cool and I'm happy to help get it all set up! Let's just make sure it's the real Wikipedia and not a fork.
Well, the point is that I can't take the current system into production as I'm making API calls on each keyboard entry, which means that with many concurrent clients making requests this will just put unnecessary load on Wikipedia's servers. That is why I would much rather download Wikipedia's dump and use it to provide the current features. This is why I was thinking of potentially "forking" Wikipedia and uploading the content to IPFS. But I agree that this is a pretty useless way to move forward. I like the idea of simply mirroring Wikipedia as is. This way we can "prefill" the website, and on top of that we let users create entries for Factoids and for the summary of content -- which is our unique way of useful content creation. Of course the question then becomes: what if users want to edit the main content mirrored from Wikipedia? Do we fork these individual pages? But I suppose we can wait to answer that question until after we have contributors to Factoids/Summaries.
Yes, I'm not suggesting "don't download Wikipedia". I'm saying there's a big difference between "cloning and staying up to date with upstream" and "forking". "Forking" implies changes in your clone that are not in upstream.
I will write a more serious Concept Paper about the overall idea of this Decentralized Collaborative Community and hopefully more people will join it then. Will keep you posted.
@jbenet What do you think is the best strategy for uploading the Wikipedia dumps to IPFS regularly? According to DBPedia.org, "1.4 Wikipedia articles are modified per second, which results in 84 articles per minute". This means that around 120,000 (probably fewer) articles are edited per day. Wikipedia creates new dumps roughly once a month, so we would have to catch up with that updated content and create the updates for the IPFS backups. We could of course write a program that compares the old articles with the new articles (from the dump), determines what has changed, and only uploads the changes to IPFS. But would love to get your input on this. Btw, I will have the concept paper ready by tonight so you can all take a look.
Also, it should be noted that the dumps consist of roughly 45 GB of material in total.
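To make the compare-and-upload idea concrete, here is a rough sketch, assuming two pages-articles XML dumps are already on disk and the ipfs CLI is on the PATH (the file names and the namespace string are assumptions for illustration; a real pipeline would need streaming-friendly handling of the 45 GB files):

```python
import hashlib
import subprocess
import xml.etree.ElementTree as ET

# MediaWiki export namespace; the exact version string varies between dumps.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def article_hashes(dump_path):
    """Map article title -> SHA-1 of its wikitext for one XML dump."""
    hashes = {}
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag != NS + "page":
            continue
        title = elem.findtext(NS + "title")
        text = elem.findtext(f"{NS}revision/{NS}text") or ""
        hashes[title] = hashlib.sha1(text.encode("utf-8")).hexdigest()
        elem.clear()  # free the parsed page so a huge dump stays manageable
    return hashes

def changed_titles(old_dump, new_dump):
    """Titles that are new or whose text changed between two dumps."""
    old, new = article_hashes(old_dump), article_hashes(new_dump)
    return [t for t, h in new.items() if old.get(t) != h]

def add_to_ipfs(path):
    """Add one file to IPFS and return its hash (requires the ipfs CLI)."""
    out = subprocess.run(["ipfs", "add", "-q", path],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip().splitlines()[-1]

if __name__ == "__main__":
    for title in changed_titles("enwiki-old.xml", "enwiki-new.xml"):
        print("changed:", title)
```

Only the changed articles would then be written out and re-added; everything unchanged keeps its existing IPFS hash, so a monthly update only costs the delta.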
Hrm, interesting... I thought the foundation would also publish changesets. Maybe they became too massive for the English version? I found Special:Export, which can return the history of a specific article. This might give us a good chance to get more feedback on #23, as it might be necessary to reconstruct the history from revision data without access to the actual VCS. Tangential question for @domschiener: have you considered wikidata.org as a source for your project as well?
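A small sketch of that, assuming the standard MediaWiki Special:Export parameters (pages, history); note that Wikipedia may throttle or cap full-history exports over the web, so this is illustrative rather than a production approach:

```python
import requests

EXPORT_URL = "https://en.wikipedia.org/w/index.php"

def export_article(title, full_history=True):
    """Fetch the Special:Export XML for one article (optionally with all revisions)."""
    params = {"title": "Special:Export", "pages": title}
    if full_history:
        params["history"] = "1"
    resp = requests.get(EXPORT_URL, params=params, timeout=60)
    resp.raise_for_status()
    return resp.text  # MediaWiki export XML, one <revision> element per revision

if __name__ == "__main__":
    xml = export_article("Wikipedia")
    print(xml[:500])
```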
@cryptix I think that all the dumps include metadata, such as the article history, as well. But I haven't downloaded the dumps yet to confirm this. If they contain this information it will definitely make our job easier. Here are the dumps I'm referring to, btw: https://dumps.wikimedia.org/backup-index.html (enwiki to be precise).

Re: wikidata.org. I really like what they are doing; they are basically taking the opposite approach to DBPedia.org. But after some research, I think that utilizing DBPedia is still the better approach considering that they are more established and have more data entries. But perhaps we can find a way to use both.

The way we could use DBPedia is as a sort of "semantic overlay" to the platform, which offers a richer and more informative user experience when a user searches for a specific subject. We can, for example, change the way people get to their desired information by extending the way search queries can be performed (http://dbpedia.org/use-cases/revolutionize-wikipedia-search-0), and we can also create "portable knowledge". What I mean by that is, for example, that you as the owner of a website about your favorite football club, let's say FC Bayern, can utilize our API to construct a detailed profile about each player. Instead of having to find out all the information yourself, you make a simple API call which returns the required information, ready to be displayed on your website. Or we can even go a step further and allow users to create predefined profiles that can be embedded on websites (similar to how onename.com does it with their identity profiles). The goal is to make knowledge even more accessible and easier for people to get the information they want in as short a time as possible. But that is what the future of this project hopefully is. In the present we need to create an active community that is incentivized to actively contribute to the platform.
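To make the "semantic overlay" idea a bit more concrete: DBpedia already exposes a public SPARQL endpoint, so a first cut of such an API call could simply translate to a query like the sketch below (the player resource and the property filtering are only illustrative; the profile API described above does not exist yet):

```python
import requests

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

# Pull raw facts about one (illustrative) player resource.
QUERY = """
SELECT ?property ?value WHERE {
  <http://dbpedia.org/resource/Manuel_Neuer> ?property ?value .
  FILTER (lang(?value) = "en" || !isLiteral(?value))
}
LIMIT 50
"""

def fetch_facts():
    """Query DBpedia and return (property, value) pairs for the resource."""
    resp = requests.get(
        SPARQL_ENDPOINT,
        params={"query": QUERY, "format": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return [(b["property"]["value"], b["value"]["value"]) for b in bindings]

if __name__ == "__main__":
    for prop, val in fetch_facts()[:10]:
        print(prop, "->", val)
```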
Here is the concept paper: https://medium.com/college-cryptocurrency-network/reposium-dco-the-future-of-wikipedia-4be080cfa027
If anyone is interested in improving on Wikipedia, I'd love to chat with you. I think today's Wikipedia is fundamentally broken and there's room for at least an order of magnitude improvement (same as from Britannica -> Wikipedia). Get in touch. @domschiener: got your Slack message btw, am working on a reply right now.
I am totally in for such a project, and I think this is the kind of project that IPFS could enable. @taoeffect, I may get in touch with you, but what I have in mind right now is the semantic web: just storing the content of Wikipedia is one thing, but IPFS could enable a better view of that data. I know there was a lot of hype around the semantic web some years ago and it fell through, but I think the idea of organizing and structuring information is crucial for the future of the web. An IPFS data node has some links that can be used to create relations between data, and right now I don't think these links can be tagged (to have an annotated DAG), but this could be emulated (or changed). It may also be interesting to think about a webfs standard, a bit like unixfs; it could provide some ideas for crawling and searching an IPFS web.
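A rough sketch of how the "annotated DAG" could be emulated today, assuming a go-ipfs/kubo CLI that provides `ipfs dag put` (the node layout and field names are made up for illustration): the relation name lives in the key of an ordinary IPLD link, so the edge is effectively tagged.

```python
import json
import subprocess

def dag_put(obj):
    """Store a JSON object as an IPLD node via the ipfs CLI and return its CID."""
    out = subprocess.run(["ipfs", "dag", "put"],
                         input=json.dumps(obj),
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

# Two plain "article" nodes (contents are just placeholders).
munich = dag_put({"title": "Munich", "text": "..."})
bavaria = dag_put({"title": "Bavaria", "text": "..."})

# The relation name ("capitalOf") is the key of the link map, which is how a
# tagged edge between two IPFS nodes can be emulated with ordinary IPLD links.
fact = dag_put({
    "subject": {"/": munich},
    "relations": {"capitalOf": {"/": bavaria}},
})

print("fact node:", fact)   # inspect with: ipfs dag get <cid>
```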
Hi, I just stumbled onto this thread. I am very interested in such projects and would like to help. I also have to read more of this thread. Very briefly, because I'm traveling right now: @cornerman and I developed a hypergraph-based discussion system as our master's theses in computer science. The goal was to build a discussion system that scales with the number of people. We also did our own take on community moderation. I will write about more concepts soon. Prototype: http://lanzarote.informatik.rwth-aachen.de:9001/dashboard (please play around and do the tutorial to understand the most important concepts)
WordPress integration?
In #46 (comment) @domschiener had said:
I moved this to keep it as its own discussion.