================= is a collection of R functions I wrote to speed up my daily workflow. Thats why they do not relate to a common working field such as functions in a package would.
this function takes an igraph graph object and collapses it by a given node attribute (byAtt). The agg-option allows to reduce certain node attributes by functions as for example sum,mean,median,min,max for numeric values or paste, unique etc. for strings.
tries to identify the domain of a given set of urls. The result is not just the first part of an URL but the "real" domain. this returns derstandard.at for http://m.derstandard.at/xxx/yyy/zzz and not m.derstandard.at therefore a list of registered topleveldomains (tlds.txt) is used. appending toplevel domains to this list allows you to identify subdomains of services. for example adding wordpress.com will return supersambo.wordpress.com for http://supersambo.wordpress.com/p=123
means duplicated urls. tries to identify different urls which lead to the same contents. This is done by searching for different patterns. for a vector of urls it returns a vector of ids of the same length. if two urls get the same id the function identified them as leading to the same content. This is working quite well but has to be checked manually. It is also very slow when it comes to tens of thousands of urls.
tries to identify overview page in a vector of urls based on certain patterns
simply counts occurence of words in a string and returns a data frame.
a function for igraph objects which matches nodes attributes with edgelist column. The function returns a data.frame or a new igraph object with the corresponding additional edge attributes
this function computes a bridgeness score for graphs with a given community partition. It's based on what Tamás Nepusz one of the creators of igraph suggested here. It takes: *g igraph object *att the name of the vertex attribute containing the community membership *mode the neigbhorhood mode c("in","out","all"), which is directly passed to igraphs neighboorhood function *calc the type of calculation to be done c("membscore","expent"), where membscore (for membership score) is supposed to be the first option you suggested and expent (for expontiated entropy of the memebership score vector) which hopefully implements your second suggestion
it returns a vector of the length length(V(g)) with the respective bridgeness score for each vertex.
converts an unweighted directed graph into an weighted undirected graph by calculating bibliographic coupling or cocitation for each pair of nodes. This is similar to a onemode projection of a bipartite graph, but in this case the input graph is onemode as well. Input is given int the form of an edgelist and a nodelist just like this is used for igraphs graph.data.frame() function.
writer is just a shortcut for writing dataframes to csv. It just takes a dataframe and saves it by its own name. It saves to exports/ if available (that's my normal workflow) and if not directly to the working directory. opts are row.names=FALSE, sep=";"
##most was originally made to return the element of a vector with the most occurrencies. It's now a little bit more complex and lets you adress all elements in a sorted dataframe. The default output is the element ("which") with the most (nr=1) occurrencies. but its also possible the adress the elements name, its number of occurencies and the fraction of the whole vector most(x,nr=1, c("which", "freq", "perc"))
##maintext is basically an interface to Full-text Rss service from FiveFilters.org installed on a private server. In this case the function is used to retrieve the main text of pages with articles on it. Its not intended to work as rss-feed creator. It takes an *inputURL: url of which the text is to be retrieved *host: baseURL of the specific installation. *parsed: if result should be a parsed hmtl object or a character string
*the required apikey should be saved in a text file and the location of the latter hast do be specified it returns Title, Text, Author, Date and the Hyperlinks found in the text
##geo_function is an interface to yahoos placemaker api.
##panoramio_function is an interface to the panoramio API. Allows to get photos of places by their respective longitude and latitude (min, max). Returns a data.frame with urls an other info. Api is very simple allows 100 000 calls a day.
##colorbright_function lets you change brightness of a hexacode color by a provided factor (higher than 1=brighter. Smaller than 1=darker). Note that this function is not very sophisticated. It manipulates RGB Integers and controls so they do not pass 255.