In a recent roundup of techniques and tools for doing more precise blog research on Technorati, to be published in the December Information Advisor, I had a chance to ask Technorati CEO David Sifrey several questions about the latest enhancements to his site, as well as a few broader questions about the company.
There wasn't room in the article for everything Sifrey told me, so I thought Intelligent Agent readers might appreciate some of the comments that did not make it into the piece. Here they are:
Sifrey Talks About Technorati and Trends in CGM Monitoring
In addition to clarifying some of the new Technorati features, we spoke with David Sifrey about larger trends in blog searching and about some of Technorati's future plans. Here is an edited transcript of our talk:
Q. Some people have told me that they don’t rely on Technorati because it misses a lot of bloggers’ postings—that they simply don’t show up. Can you comment on that?
A. Let me address this concern by making several points:
1. We're still pretty young, and we're not bug-free. Sometimes we screw up, but we try to fix it.
2. We are growing as a company, and we're not a subscription service. We're also getting more support queries, such as "Why isn't my blog being indexed?", and it can be hard to deal with this success. So I extend my apologies for not being as responsive as we should be. We are working on tools that let people fix their own problems.
3. Now and then, the blog and feed providers themselves, such as FeedBurner, TypePad, and Blogger, may have their own issues. It's not always our fault.
4. Finally, when we developed Technorati, we recognized early on that the ability to eliminate spam would be a tremendous differentiator from other services. We want to be the most comprehensive index and provide relevant, spam-free results. But this presented us with an internal conflict. We had to make a choice: we could be a little more lenient in spam detection to make sure we did not reject legitimate postings, but that would mean more spam; or we could be really stringent in blocking spam, but that would mean we might get "false positives." In other words, a legitimate blog can trigger our spam filter.
Q. One of Technorati's editors recently posted that bloggers are more likely to have their posts indexed if they use the Atom standard rather than RSS. Is that right?
A. Yes, there is a small difference. RSS is fine, but for technical reasons we do recommend that you have an Atom feed as well, to help us distinguish whether you are sending an excerpt or the full content of your blog post.
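Sifrey doesn't spell out the technical reasons, but the Atom format (RFC 4287) does give excerpts and full posts separate elements: `<summary>` is defined as a summary, abstract, or excerpt, while `<content>` contains or links to the entry itself. A core RSS 2.0 `<item>` has a single `<description>`, which publishers fill with either one. The minimal Python sketch below shows how an indexer *might* use that distinction. It is an illustration only, not Technorati's code; the function names are hypothetical, and treating any `<content>` element as "full" is a simplifying heuristic, since a publisher could still put an excerpt there.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def classify_atom_entry(entry_xml: str) -> str:
    """Hypothetical heuristic label for one Atom <entry>: 'full', 'excerpt', or 'unknown'."""
    entry = ET.fromstring(entry_xml)
    # <content> contains (or links to) the post itself.
    if entry.find(f"{ATOM_NS}content") is not None:
        return "full"
    # <summary> is defined as a summary, abstract, or excerpt.
    if entry.find(f"{ATOM_NS}summary") is not None:
        return "excerpt"
    return "unknown"


def classify_rss_item(item_xml: str) -> str:
    """A core RSS 2.0 <item> has one <description>; it may hold an excerpt or the full post."""
    item = ET.fromstring(item_xml)
    return "ambiguous" if item.find("description") is not None else "unknown"


full_entry = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hello</title>
  <summary>The first few lines...</summary>
  <content type="html">&lt;p&gt;The whole post.&lt;/p&gt;</content>
</entry>"""

excerpt_entry = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hello</title>
  <summary>The first few lines...</summary>
</entry>"""

rss_item = "<item><title>Hello</title><description>All or part of the post?</description></item>"

print(classify_atom_entry(full_entry))     # full
print(classify_atom_entry(excerpt_entry))  # excerpt
print(classify_rss_item(rss_item))         # ambiguous
```

With RSS alone, the indexer can only guess whether the description is the whole post, which is one plausible reason an accompanying Atom feed helps.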
Q. Where are we in terms of indexing non-English blogs?
A. Look at my latest State of the Blogosphere report: English now represents just over a third of blogs, so it's no longer the majority. Japanese and Chinese have become really big. Now we are trying to improve our localization of search and discovery for other languages. We have a relationship with a team in Japan for technorati.jp, and we're working on Romance language localization too. Searchers can now even type Japanese or Chinese right into the search box.
This was all pretty easy for Romance languages, where there is a space between words. But for Japanese and Chinese, the word-boundaries problem is not a trivial issue.
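Sifrey doesn't say how Technorati handles word boundaries. To illustrate why spaces matter, here is a minimal Python sketch contrasting whitespace tokenization with character n-gram matching (indexing single characters and adjacent pairs), which is one common workaround in Chinese and Japanese search; many real systems use dictionary- or statistics-based word segmenters instead. The function names are hypothetical, and this is not presented as Technorati's approach.

```python
from __future__ import annotations


def whitespace_match(query: str, document: str) -> bool:
    """Space-delimited tokenization: works when words are separated by spaces."""
    return query in document.split()


def _chars(text: str) -> list[str]:
    # Ignore whitespace so spaced and unspaced text are treated alike.
    return [c for c in text if not c.isspace()]


def ngram_index(document: str) -> set[str]:
    """Index every single character and every adjacent character pair (bigram)."""
    chars = _chars(document)
    return set(chars) | {a + b for a, b in zip(chars, chars[1:])}


def ngram_match(query: str, document: str) -> bool:
    """Match if all of the query's bigrams (or its lone character) occur in the document."""
    chars = _chars(query)
    grams = chars if len(chars) < 2 else [a + b for a, b in zip(chars, chars[1:])]
    index = ngram_index(document)
    return bool(grams) and all(g in index for g in grams)


english = "tips for blog search"
chinese = "我喜欢博客"  # "I like blogs", written with no spaces between words

print(whitespace_match("blog", english))  # True
print(whitespace_match("博客", chinese))   # False: the whole sentence is one "token"
print(ngram_match("博客", chinese))        # True
```

The trade-off is precision: a longer query can match a document that contains all of its bigrams without containing them contiguously, which is one reason segmentation remains a hard problem rather than a solved one.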
Q. Where are we in terms of indexing words on podcasts and video?
A. We're all about understanding the live web. Blogs were the first big symptom of what occurs when people have enough bandwidth to be on the web all the time and have access to good tools, so they can be creators as well as consumers. All these other media are part of the live web too. Voice-to-text technologies are immature, though, so for this kind of more opaque content, we need to rely on our tags page. As the technologies mature enough to make audio and video posts transparent, we will pursue this aggressively, but for now it is neither effective nor reliable.
Technorati Tags: Technorati, David Sifrey