Kaushal Kurapati\’s blog

Thoughts on Search, Technology and Management

Archive for June, 2007

Structure of graphs and networks – part 4: scale-free networks

Posted by kaushalkurapati on June 23, 2007

Till now we have seen (part1, part 2, part 3) that random networks can typically be captured by a bell curve. Many nodes have similar ‘scale’ and there is a characteristic peak. Few nodes in the network have many links or few links. In contrast, Barabasi’s group discovered that certain networks do not show peaks and there is no characteristic node or scale in the network. His group started to describe such networks as “scale-free”. They are represented mathematically in terms of a power-law. Most nodes have few links and the network is held together by a few highly connected ‘hubs’. To quote from the book,

The random network theory of Erdos and Renyi and its cluster-friendly extension by Watts and Strogatz both insisted that the number of nodes with k links should decreases exponentially–a much faster decay than that predicted by a power law. They both told us, in rigorous mathematical terms, that hubs do not exist.

Networks before the proposal of scale-free model by Barabasi’s group were modeled as static and randomly connected, which even though could model a small world, did not model the Web or spread of AIDS or computer viruses, etc. Real world networks grow, and there is preferential attachment (rich get richer). Incorporating these two conditions into a network model helps capture the scale-free model represented by the Web, for instance.

Scale-free networks can also be small-world, just like the random + clustered model proposed by Watts & Strogatz; meaning in few hops one can get from any node to another node. The difference in these networks comes in how they break down and how information spreads through these networks. In random networks, taking many nodes out won’t affect reachability in most of the network until a large % of the network nodes are removed. In scale-free networks as well, if nodes are removed randomly then the network holds up, which is great news for the stability of a network like the Internet. However, if key hubs are removed in the scale-free network, you will immediately have islands in the network thereby crippling reachability across the network. Consider the spread of information or virus or a disease: if the web of intimate links in a human society were to be random, something like AIDS may not have spread like it did. Unfortunately, that intimate links graph apparently is a scale-free hubs-type network and so once a hub gets a disease, it spreads rapidly to many-many links in the network. The silver lining here could be that curing only the hubs, which may be cost effective, can drastically reduce the spread of the disease. If it were a random network, we would have to cure all nodes in the network to stop the spread of a disease in the network. The challenge I would think is figuring out who the hubs are, especially in a disease-spread scenario. In the case of computers, it is relatively easy to figure that out given there is lot of knowledge about the network.

Posted in Graph Theory, Math | Leave a Comment »

Search Engine Market Share: May 2007

Posted by kaushalkurapati on June 23, 2007

comScore released May 2007 search engine market share stats: Google crosses 50% market share–as per comScore–for first time. Everyone else decline marginally or stay flat.

  • Total searches in May 2007: 7.6B (up 11% y/y and 4% over previous month)
  • Google: 50.7% share (3.9B queries)
  • Yahoo: 26.4% share (2B queries)
  • MSN: 10.3% share (782M queries)
  • Ask: 5% share (384M queries)
  • AOL: 4.6% share (348M queries)

Nielsen figures have been different. Trends in Google hold up though. Ask and AOL swap places in Nielsen figures. Per Nielsen Google has 56% share, Yahoo 22%, MSN 8%, AOL 5%, Ask 2% (excluding some network searches I think).

According to HitWise, Google accounted for 65% of all US searches in a 4-week period in May 2007. Yahoo stood at 21%, MSN 8.4%, and Ask 4%.

Yahoo & MSN stats are quite close in Hitwise and Nielsen measurements (21% and 8% respectively) — their comScore numbers are not far either (26% and 10%). Ask is also in the 4-5% range per Hitwise and comScore (the 2% reported by Nielsen is due to exclusion of Ask’s network searches I believe). Only Google numbers vary a lot across various measurement systems. Differences could be due to sampling methodology, and the way searches / queries are counted, etc.

Posted in Search, Stats | Leave a Comment »