Kaushal Kurapati’s blog

Thoughts on Search, Technology and Management

Books I read recently

Posted by kaushalkurapati on March 10, 2008

List of books I read during the past few months:

Posted in Books | 1 Comment »

Search Engine Market Share: January, February 2008

Posted by kaushalkurapati on March 10, 2008

Compete.com’s blog post on February 2008 search engine market share is interesting. Google touches 70% market share. Yahoo! drops to ~16%. Microsoft, at 8.4%, is about half of Yahoo!. Ask.com, at 3.7%, is less than half of Microsoft, and AOL, at 1.8%, is half of Ask.com. In terms of query volume growth, Google has grown queries by 50% y-o-y! (almost 4B -> 6B). Yahoo! did not have any query growth y-o-y (1.3B). Microsoft had 12% query growth. Ask.com had huge query growth (55%; 200M -> 310M). AOL also showed a big 45% query growth y-o-y (100M -> 150M).

comScore data for January 2008 paints a slightly different picture, although the relative positions and trends hold up. The total number of searches conducted at ‘core’ engines in January 2008 was 10.5B.

  • Google: 58.5% (6.1B queries; matches roughly with Compete’s numbers)
  • Yahoo!: 22.2% (2.3B queries; this is vastly higher than Compete’s 1.3B, a big discrepancy that must come from the way the various services count queries)
  • Microsoft: 9.8% (1B queries)
  • AOL network: 4.9% (510M queries)
  • ASK network: 4.5% (475M queries)

Other notable sites with major query volumes include eBay (460M/month), craigslist (250M), Amazon (160M), MySpace (375M), and Facebook (100M).

Posted in Search, Stats | Leave a Comment »

Search Engine Market Share: June 2007

Posted by kaushalkurapati on August 10, 2007

comScore released June 2007 search engine market share stats a few weeks ago. Google dropped a touch month/month to 49.5%. Yahoo also dropped 1% point to 25%. Microsoft notched up an impressive 3% points to 13%. Ask.com held steady at 5% and AOL dropped further to 4%. Microsoft’s gain was purely because they were offering free games to people who searched on MSN Live. It’s doubtful that these queries represent ‘real’ traffic; those searches would probably not monetize well anyway, not that MSN is looking for that incremental revenue of course. So while MSN appears to have gained by traditional measurement metrics, the gain is not due to any improvements in the site or the engine itself.

  • Total searches in the month: 8B (up 6% month/month and 26% year/year).
  • Google handled 4B searches (50%)
  • Yahoo handled 2B searches (25%)
  • Microsoft handled 1.1B searches (13%)
  • Ask network handled 400M searches (5%) – breaching 400M for the first time
  • AOL handled 340M searches (4%)
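As a quick sanity check on the list above, the shares can be recomputed from the rounded query volumes. This is only a sketch: the dictionary and function names are made up for illustration, and the inputs are the rounded figures quoted in this post, not comScore’s raw data.

```python
# Monthly query volumes in millions, as quoted in the list above (comScore, June 2007).
queries_millions = {
    "Google": 4000,
    "Yahoo": 2000,
    "Microsoft": 1100,
    "Ask network": 400,
    "AOL": 340,
}
total_millions = 8000  # total searches in the month, from the same list


def share_percent(volume, total):
    """Return volume as a percentage of total."""
    return 100.0 * volume / total


shares = {name: share_percent(v, total_millions) for name, v in queries_millions.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")

# The five engines listed do not add up to the full 8B; the remainder is other sites.
unlisted = total_millions - sum(queries_millions.values())
print(f"Other: {share_percent(unlisted, total_millions):.1f}%")
```

Because the volumes are rounded, the recomputed shares only roughly match the reported ones (Microsoft comes out nearer 14% than 13%, for example).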

Compete reports somewhat different numbers, especially for Google & Ask. For July 2007, Google comes in at 66% – 4.8B queries; Yahoo at 20% – 1.44B queries; Microsoft at 10% – 744M queries; Ask.com at 3.3% – 244M queries. Compete must be including AOL under the Google column, which is fair. Not sure what else is counted under Google to make it 66% (counting Google.com only & AOL makes it 50 + 5 = 55%). Also, the # of queries on the Ask network may not be counted appropriately in this data set: the discrepancy between the 244M queries reported here and the 400M reported by comScore is huge.

Posted in Data, Search, Stats | 2 Comments »

Structure of graphs and networks – part 4: scale-free networks

Posted by kaushalkurapati on June 23, 2007

So far (part 1, part 2, part 3) we have seen that random networks can typically be captured by a bell curve. Most nodes have a similar ‘scale’ and there is a characteristic peak; only a few nodes have very many or very few links. In contrast, Barabasi’s group discovered that certain networks do not show a peak, and there is no characteristic node or scale in the network. His group started to describe such networks as “scale-free”. They are represented mathematically by a power law. Most nodes have few links and the network is held together by a few highly connected ‘hubs’. To quote from the book,

The random network theory of Erdos and Renyi and its cluster-friendly extension by Watts and Strogatz both insisted that the number of nodes with k links should decrease exponentially–a much faster decay than that predicted by a power law. They both told us, in rigorous mathematical terms, that hubs do not exist.

Before Barabasi’s group proposed the scale-free model, networks were modeled as static and randomly connected. Such models could capture a small world, but they did not capture the Web, the spread of AIDS, computer viruses, etc. Real-world networks grow, and there is preferential attachment (the rich get richer). Incorporating these two conditions into a network model helps capture the scale-free structure seen in the Web, for instance.
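To make the two ingredients concrete, here is a minimal sketch of growth plus preferential attachment. The function name, the network size and the number of links per new node are my own choices for illustration, not something taken from the book.

```python
import random
from collections import Counter


def grow_preferential(n_nodes, links_per_node, seed=0):
    """Grow a network one node at a time; new nodes prefer well-connected nodes.

    Returns a dict mapping node -> number of links.
    """
    rng = random.Random(seed)
    degree = Counter()
    # Every node appears in `stubs` once per link it has, so a uniform draw
    # from `stubs` picks a node with probability proportional to its degree.
    stubs = []
    # Starting point: a small fully connected core.
    core = links_per_node + 1
    for a in range(core):
        for b in range(a + 1, core):
            degree[a] += 1
            degree[b] += 1
            stubs += [a, b]
    # Growth: each new node links to `links_per_node` distinct existing nodes.
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(stubs))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            stubs += [new, t]
    return dict(degree)


degrees = grow_preferential(n_nodes=2000, links_per_node=2)
mean_degree = sum(degrees.values()) / len(degrees)
median_degree = sorted(degrees.values())[len(degrees) // 2]
max_degree = max(degrees.values())
print(f"mean links: {mean_degree:.1f}, median: {median_degree}, largest hub: {max_degree}")
```

Most nodes end up with only the couple of links they arrived with, while a handful of early nodes collect far more than the average. That is the rich-get-richer effect, and there is no ‘typical’ node.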

Scale-free networks can also be small-world, just like the random + clustered model proposed by Watts & Strogatz, meaning that one can get from any node to any other node in a few hops. The difference lies in how these networks break down and how information spreads through them. In random networks, removing nodes won’t affect reachability in most of the network until a large % of the nodes are removed. In scale-free networks as well, if nodes are removed randomly the network holds up, which is great news for the stability of a network like the Internet. However, if key hubs are removed from a scale-free network, you will quickly have islands in the network, crippling reachability across it.

Consider the spread of information, a virus, or a disease: if the web of intimate links in a human society were random, something like AIDS might not have spread like it did. Unfortunately, that graph of intimate links is apparently a scale-free, hub-dominated network, so once a hub gets a disease, it spreads rapidly across its many links. The silver lining could be that curing only the hubs, which may be cost effective, can drastically reduce the spread of the disease. If it were a random network, we would have to cure all nodes to stop the spread.

The challenge, I would think, is figuring out who the hubs are, especially in a disease-spread scenario. In the case of computers, it is relatively easy to figure that out, given that there is a lot of knowledge about the network.
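The difference between random failures and losing hubs can be tried out on a toy network. The sketch below grows a hub-dominated network by preferential attachment, removes 5% of the nodes either at random or starting from the best-connected ones, and reports the size of the largest connected piece that is left. All names, sizes and the 5% figure are assumptions picked for illustration.

```python
import random
from collections import deque


def grow_hub_network(n_nodes, links_per_node=2, seed=0):
    """Preferential-attachment growth; returns adjacency as dict node -> set."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_nodes)}
    stubs = []  # each node appears once per link, so draws favour hubs

    def connect(a, b):
        adj[a].add(b)
        adj[b].add(a)
        stubs.extend((a, b))

    core = links_per_node + 1
    for a in range(core):
        for b in range(a + 1, core):
            connect(a, b)
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(stubs))
        for t in targets:
            connect(new, t)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected piece once `removed` nodes are taken out."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


adj = grow_hub_network(2000)
n_remove = 100  # 5% of the nodes

rng = random.Random(1)
random_failures = set(rng.sample(sorted(adj), n_remove))
hub_attack = set(sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:n_remove])

after_random = largest_component(adj, random_failures)
after_attack = largest_component(adj, hub_attack)
print(f"largest component after random failures: {after_random}")
print(f"largest component after removing hubs:   {after_attack}")
```

The same two functions can be pointed at a disease scenario: ‘removing’ a node then stands for curing or immunizing it, and the largest component is the biggest group the disease can still travel through.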

Posted in Graph Theory, Math | Leave a Comment »

Search Engine Market Share: May 2007

Posted by kaushalkurapati on June 23, 2007

comScore released May 2007 search engine market share stats: Google crosses 50% market share–as per comScore–for the first time. Everyone else declined marginally or stayed flat.

  • Total searches in May 2007: 7.6B (up 11% y/y and 4% over previous month)
  • Google: 50.7% share (3.9B queries)
  • Yahoo: 26.4% share (2B queries)
  • MSN: 10.3% share (782M queries)
  • Ask: 5% share (384M queries)
  • AOL: 4.6% share (348M queries)

Nielsen figures have been different. Trends in Google hold up though. Ask and AOL swap places in Nielsen figures. Per Nielsen Google has 56% share, Yahoo 22%, MSN 8%, AOL 5%, Ask 2% (excluding some network searches I think).

According to Hitwise, Google accounted for 65% of all US searches in a 4-week period in May 2007. Yahoo stood at 21%, MSN 8.4%, and Ask 4%.

Yahoo & MSN stats are quite close in Hitwise and Nielsen measurements (21-22% and 8% respectively); their comScore numbers are not far off either (26% and 10%). Ask is also in the 4-5% range per Hitwise and comScore (the 2% reported by Nielsen is due to the exclusion of Ask’s network searches, I believe). Only the Google numbers vary a lot across measurement systems. Differences could be due to sampling methodology, the way searches / queries are counted, etc.

Posted in Search, Stats | Leave a Comment »

Mid-2007: India a Trillion dollar economy

Posted by kaushalkurapati on May 31, 2007

As per Govt. of India GDP stats and current Rupee/$ value of 40.7, India is a Trillion Dollar economy now! Remember the date – May 2007. 12 nations belong to this club now.

So what would it take to double the GDP from here on? If India grows at 9% per annum, it would take 8 years, i.e. until 2015. I would think that once an economy hits $1T, it kicks into higher gear and things accelerate further. So the next trillion may happen sooner, say 2013-2014, assuming growth of 10-11%. Crippling infrastructure problems (power shortages, clean water, roads, ports) could be our only brake. Agriculture: although agriculture’s share of GDP is declining, a majority of the country engages in it, so agriculture’s performance is key to driving up consumer demand, which clearly is a big chunk of the GDP.
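The doubling times above follow from compound growth: at a constant annual rate r, the GDP doubles after ln(2) / ln(1 + r) years. A small sketch of that arithmetic (the function name is mine, and a constant growth rate is of course a simplifying assumption):

```python
import math


def years_to_double(annual_growth):
    """Years for a quantity to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


for rate in (0.09, 0.10, 0.11):
    years = years_to_double(rate)
    print(f"{rate:.0%} growth: doubles in {years:.1f} years (around {2007 + years:.0f})")
```

At 9% this gives about 8 years, and at 10-11% roughly 6.6 to 7.3 years, which is where the 2015 and 2013-2014 estimates come from.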

Posted in Economy, India | 1 Comment »

Bill & Steve together on stage – one for the history books

Posted by kaushalkurapati on May 31, 2007

Steve Jobs & Bill Gates were interviewed together on stage by Walt Mossberg at the “D” conference outside San Diego. This is one for the history books given the huge contributions each of them has made to the technology industry. Great advice from the tech leaders on what makes people successful, in one word: “PASSION”.

“8:40 p.m.: Q.: Advice for the upcoming entrepreneur?
Gates: The idea of being at the forefront and increasing in size has been one of our greatest challenges. Our business is really about the passion.
Jobs: If you don’t love it, you’re going to fail. You’ve got to love it and you’ve got to have passion. And you’ve got to be a great talent scout, you can only build a great organization around great people.”

Being successful is all about having “passion” for what you do. This is the essence really. I have seen this in my own work; when I am passionate about what I do and really love it, that’s when my best creativity comes forward. When I don’t like what I do, I don’t even come close.

Posted in Innovation, Management, Technology | Leave a Comment »

Structure of graphs and networks – part 3

Posted by kaushalkurapati on May 28, 2007

Part 1 of this series looked at Erdos and Renyi’s Random model of networks. Part 2 of the series looked at Six degrees of separation as per Milgram’s experiment. We continue here with Granovetter’s strength of weak ties and Watts & Strogatz’ clustered world approach.

Strength of Weak Ties

Mark Granovetter identified a critical element in modeling real world networks, called the strength of weak ties. In this model, we have very close ties to a few friends, forming a complete graph, which implies that all our friends are friends of one another too (strong ties). Some members of our close-friend circle have acquaintance relationships (weak ties) with others, who in turn have their own friend circles. So the entire human network is a connected graph made up of lumps of close friends (strong ties), which are joined to other lumps by weak ties.

These weak ties are what help us find jobs apparently–at least better than our strong ties. The weak ties lead us to new worlds and new opportunities that neither we nor our strongly-tied friends are aware of. Our close friend circle is presumably aware of the same opportunities as we are…so it is unlikely to open new doors.

Contrast this with the random model of Erdos and Renyi — in that model any two arbitrary nodes are just as likely to be connected as our close friends are! That seems quite unlikely given what we know of our world. Granovetter says that social networks are not random and that our close friends form a near complete graph (strong ties) with a high clustering coefficient and we are tied to acquaintances through weak ties. 

Duncan Watts & Steven Strogatz proposed a model where people are envisioned to live on a circle. We are close to the nodes next to us and also to the ones one step away from our immediate neighbors. This network offers a highly clustered world model–like Granovetter imagined–but is also a large world model. It would take many steps to reach a node that is diametrically opposite to a node on the circle. Watts & Strogatz went on to add a few random links between distant nodes on the circle. This suddenly shrank the distance between diametrically opposite nodes and their next neighbors. Importantly, a few such long-distance links are enough to reduce the overall average separation between nodes. This model then accommodates the six-degrees world view as well. A few nodes / people have distant links to people living far off and thereby become bridges/connectors that reduce hopping distance.
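The effect of a few long-distance links is easy to try out. The sketch below follows the description above: nodes on a circle linked to their two nearest neighbors on each side, plus a handful of added random shortcuts. The circle size, the number of shortcuts and all function names are arbitrary choices for illustration; the average separation is measured with a breadth-first search from every node.

```python
import random
from collections import deque


def ring_lattice(n_nodes, reach=2):
    """Nodes on a circle, each linked to the `reach` nearest nodes on either side."""
    adj = {i: set() for i in range(n_nodes)}
    for i in range(n_nodes):
        for step in range(1, reach + 1):
            j = (i + step) % n_nodes
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_shortcuts(adj, count, seed=0):
    """Add `count` random links between pairs of nodes that are not yet linked."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    added = 0
    while added < count:
        a, b = rng.sample(nodes, 2)
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1


def average_separation(adj):
    """Mean number of hops over all pairs of nodes, via breadth-first search."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


world = ring_lattice(400)
before = average_separation(world)
add_shortcuts(world, count=10)
after = average_separation(world)
print(f"average separation on the plain circle: {before:.1f} hops")
print(f"average separation with 10 shortcuts:   {after:.1f} hops")
```

On the plain circle of 400 nodes the average separation is about 50 hops. Ten shortcuts are only a tiny fraction of the 800 links already there, yet they are enough to bring the average down.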

According to the book “this [Watts & Strogatz] model offered an elegant compromise between the completely random world of Erdos and Renyi, which is a small world but hostile to circles of friends, and a regular lattice, which displays high clustering but in which nodes are far from each other.”

Posted in Graph Theory, Math | Leave a Comment »

Structure of Graphs and Networks – part 2

Posted by kaushalkurapati on May 28, 2007

We looked at the random network model in Part 1 of this series.

Six degrees of Separation

In 1967 Stanley Milgram, a Harvard professor, ran an interesting experiment. He chose two people in Boston as the targets. He sent letters to randomly chosen people in the midwest (Omaha, Wichita). He asked these people to send the letters to the targets if they knew them directly; if not, they were to send the letters to personal acquaintances who they thought might know the targets directly. Apparently 42 of the 160 letters he sent made it to the targets! (26%). The median number of intermediaries required to reach the target was 5.5. Rounding it up to 6 gives the famous “any two people are separated by six degrees of separation”.

This model of the world says that we live in a small world. So any two nodes in a large human network can, on average, be reached via 6 links. This does not imply that reaching a node 6 links away is easy…because at each node you would have to know which out-bound link to pursue to get to your target. Without that knowledge, the number of possible paths grows exponentially with each hop and the search for the target quickly becomes impossible.
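A back-of-the-envelope count shows how fast a blind search blows up. The figure of 100 acquaintances per person is a made-up round number, used only to show the growth.

```python
def chains_to_try(hops, acquaintances):
    """Number of distinct chains of the given length when every link is followed blindly."""
    return acquaintances ** hops


ACQUAINTANCES = 100  # hypothetical number of acquaintances per person
for hops in range(1, 7):
    print(f"{hops} hops: up to {chains_to_try(hops, ACQUAINTANCES):,} chains to try")
```

With these assumed numbers, six hops already means up to a trillion possible chains, which is why knowing the right out-bound link at each step matters so much.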

Posted in Graph Theory, Math | Leave a Comment »

Structure of Graphs and Networks – part 1

Posted by kaushalkurapati on May 28, 2007

I have been reading “LINKED” by Albert-Laszlo Barabasi. Through a series of blog posts I will summarize my understandings of the various concepts regarding the structure of graphs and networks.

Random Network model

Paul Erdos and Alfred Renyi, both great mathematicians, assumed that complex networks are essentially random. Start with a large set of unconnected nodes and begin adding links randomly between nodes. After a while, most of the nodes will be connected and each node will have approximately the same number of links. There may be some outliers–far more links than most or far fewer links than most–but in general most of the nodes will end up with approximately the same number of links. The number of links per node follows a Poisson distribution.

Imagine a cocktail party with a large number of guests. You incentivize your guests to pass on a secret, which you introduce by telling it to one or a few guests. The guests have to make acquaintances to pass on the message. Will the message reach everyone? Almost everyone is the answer. If you plot a histogram of how many of the guests had 1, 2, …N acquaintances, the distribution will turn out to be Poisson! This is as per the random network model of Erdos and Renyi. A majority of the guests would have made about the same # of acquaintances, and on either side of the peak the distribution diminishes rapidly, indicating that extreme variations are very rare.
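The cocktail party can be simulated in a few lines. In this sketch every pair of guests becomes acquainted independently with a small fixed probability; the party size of 1000 and the probability of 0.01 are arbitrary choices, and the function names are mine. The observed histogram is printed next to the Poisson prediction for the same average.

```python
import math
import random
from collections import Counter


def random_network_degrees(n_nodes, link_probability, seed=0):
    """Link every pair of nodes independently with the given probability.

    Returns a list with the number of links of each node.
    """
    rng = random.Random(seed)
    degree = [0] * n_nodes
    for a in range(n_nodes):
        for b in range(a + 1, n_nodes):
            if rng.random() < link_probability:
                degree[a] += 1
                degree[b] += 1
    return degree


def poisson_pmf(k, mean):
    """Probability that a Poisson variable with the given mean equals k."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


guests = 1000
degrees = random_network_degrees(guests, link_probability=0.01)
mean_degree = sum(degrees) / guests
histogram = Counter(degrees)

print(f"average acquaintances per guest: {mean_degree:.2f}")
for k in range(0, max(degrees) + 1):
    observed = histogram[k] / guests
    expected = poisson_pmf(k, mean_degree)
    print(f"{k:2d} acquaintances: observed {observed:.3f}, Poisson {expected:.3f}")
```

The histogram peaks near the average of about 10 acquaintances and falls off quickly on both sides; nobody ends up with several times the average.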

It is worthwhile to note that Erdos and Renyi did not intend to model real-world phenomena like web-page distributions, cell-phone distributions, etc., with their random network model. They were purely interested in the mechanics of graphs.

To summarize the random network model then, it states that the average is the norm. Most people have about the same number of acquaintances; very few people know tons of others and very few people are completely isolated. This model does not explain what real-world networks actually look like. Other models were derived to explain that.

Posted in Graph Theory, Math | 2 Comments »