Tag Archives: Aussie Blogs
Aussie Blogs .. revived
Posted on 09. Jan, 2005
The update tracker is back, I've found someone to take over the entire technical side. It'll run on my servers for the next few months until it can be moved. Thanks to everyone who sent thank you emails, sorry for the false alarm.
http://aussieblogs.org/...
Aussie Blogs Closed
Posted on 03. Jan, 2005
The Aussie Blogs update tracker and web ring are now closed permanently. Keeping the site running through solving technical issues, programming new updates to the detection routines and answering enquiry emails has been a time consuming process, and in 2005 I really feel my time could be better spent on other pursuits. I've thoroughly enjoyed running the site over the past 5 years, but since moving overseas my feeling of satisfaction from running the tracker has been steadily diminishing.
Thanks to the hundreds of sites that have linked to the tracker and web ring over the years, and the thousands of Australian blogs who've benefited from the more than half-million combined referrals. Thanks also to the volunteer editors, in particular Victor and Wade, and to Benn for his design contributions.
The web ring is available to anyone who would like to take ownership, however you must have a track record of at least 3 years blogging to prove you're going to keep it up and running in the long term. You must also be over 21 years of age. This site (the update tracker) is not available.
The update tracker database is available in OPML format by request should you wish to build your own Australian Weblogs update tracker. The code behind the web site and detection bots is NOT available. The 15 Australian blogging specific domain names that point to this site are for sale.
http://aussieblogs.org/...
Aussie Blogs RSS Feeds
Posted on 16. Dec, 2004
I whipped up two quick beta RSS feeds from the Aussie Blogs Update Tracker this morning before work. I've been meaning to do it for ages, but well you know. There's an updates and an aggregator feed. The updates feed contains a list of the last 6 hours of updated Australian weblogs, and the aggregator feed contains the last 100 posts appearing on Australian weblogs. The aggregator feed is particularly interesting as you can basically read all of Australia's weblog posts in one aggregated feed. The aggregator probably needs a bit of work with full versus excerpt posts and support for Atom feeds, but it's a start. Feedback? Anyone?
http://aussieblogs.org/...
Aussie Blogs - approaching 5,000 monitored blogs!
Posted on 10. Dec, 2004
Since pointing the automated Aussie Blogs FinderBots at several geographically specific blog directories including Blogger.com, Blogwise, Rice Bowl Journals and Brave Journal, the database is now at almost 4,600 sites. We should also easily hit 5,000 blogs and half-a-million click-thrus in the next few months, currently at 361,073 clicks. Again, while my enthusiasm for Aussie Blogs has dropped significantly after all these years of running the site, it has been fun improving the detection bots and programming new finder bots. The dozen or so bots that drive Aussie Blogs are getting quite clever now, particularly when it comes to geographic classification. Aussie Blogs has to be one of the best and most accurate single-country listings of blogs available. Given unlimited spare time I'd like to open source the bots and convert the site from Lotus Domino to PHP/MySQL. But, alas, no one has stepped up to help out with coding after 5 years of asking, and I have no time. If you're a proficient PHP developer and want to compile your own country specific weblog monitor, let me know, maybe we can work together on redeveloping the site.
This weekend will be the last lot of work on Aussie Blogs for quite a while. Plans:
- Set up secondary bot servers and add failover wrappers to all bots. The bots will check in with the web server every few minutes to see who should be running; if the primary bots haven't checked in for a while, the secondary bot servers will take over. This should help ensure there are always results showing on the front page.
- Upgrade Informa RSS library to support Atom 0.3 feed parsing on the Recent Posts page.
- Various tweaks including: updating the CheckerBot excluded URL list and tweaking work file and database cleanup scripts.
- Update site help page.
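The failover plan in the first item above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the server labels and the 10 minute staleness limit are hypothetical (the post only says the bots check in "every few minutes"), and the real wrappers were planned for the Java bots, not Python.

```python
import time

# Hypothetical limit: how long the primary may stay silent before failover.
STALE_AFTER_SECONDS = 10 * 60


class CheckInRegistry:
    """Web-server side record of when each bot server last checked in."""

    def __init__(self, primary, secondary, clock=time.time):
        self.primary = primary
        self.secondary = secondary
        self.clock = clock
        self.last_seen = {}

    def check_in(self, server):
        """Record a check-in; return True if this server should run the bots."""
        now = self.clock()
        self.last_seen[server] = now
        return server == self.active_server(now)

    def active_server(self, now=None):
        """The primary runs unless it has been silent for too long."""
        now = self.clock() if now is None else now
        seen = self.last_seen.get(self.primary)
        if seen is not None and now - seen <= STALE_AFTER_SECONDS:
            return self.primary
        return self.secondary
```

Each bot wrapper would call `check_in` with its own server label before a run and simply exit when the answer is False, so the primary takes over again as soon as it resumes checking in.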
http://aussieblogs.org/...
More Domains!
Posted on 10. Dec, 2004
Given the dropping USD, I thought it was a good time to protect a few more Australian specific blogging domains from ad spammers. The following now point to the appropriate pages on Aussie Blogs: sydneyblogs.com, sydneyblogs.net, perthblogs.com, perthblogs.net, canberrablogs.com, australianblogs.net, australianblogs.org and brisbaneblogs.net. I now hold 22 domain names, 15 of which point to Aussie Blogs.
http://aussieblogs.org/...
Aussie Blogs Codefest 2004
Posted on 06. Dec, 2004
Spent the weekend coding on Aussie Blogs, except when I was getting my
ears raped on Saturday night at the Prodigy concert. I fixed lots of niggling bugs and also added
some cool features. Summary of the more interesting stuff:
- Blogger.com Australia plus major cities directories are now searched by a new bot daily for new aussie blogs and automatically added to the database.
- The entire Blogwise Australia directory is now checked for aussie blogs weekly by finder bot, not just the first page as before.
- The blo.gs listener bot is out of testing and now running permanently watching for updated aussie blogs that ping blo.gs. The bot is running on my London server.
- Improved the alternate URL matching logic to improve the chances of matching a pinging blog with the aussieblogs database.
- Added source stats for the last 48 hours to the sidebar so we can monitor where updates are coming from.
- Re-enabled the CheckerBot for non-pinging blogs. I'm in the process of switching to threading to handle cases where bad blogs can sometimes block further execution.
- Added an awaiting-review statistic to the right sidebar to show how many blog edits or additions to the aussieblogs database are waiting for human editors to check (currently 222 awaiting review!).
- The connection handling, polling frequency and efficiency of the weblogs.com and blogger.com bots have been improved.
- Migrated the aussieblogs site (and my personal site) to the new server at the MCI data centre in Sydney (there was some downtime while data was migrated and dns entries changed).
- Fixed a thread pooling bug on the new server stopping some features from working on the site.
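The alternate URL matching mentioned in the list above is not described in detail, so the following is only a guess at the general idea: reduce both the pinged URL and the stored URL to a common key before comparing. The function names and the list of index pages are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumed set of default pages that should not affect a match.
INDEX_PAGES = ("index.html", "index.htm", "index.php")


def canonical_key(url):
    """Reduce a URL to a comparison key.

    Lower-cases the host, drops a leading 'www.', a trailing slash and a
    default index page, and ignores the scheme and query string.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path
    for page in INDEX_PAGES:
        if path.lower().endswith("/" + page):
            path = path[: -len(page)]
            break
    return host + path.rstrip("/")


def build_index(known_urls):
    """Map each canonical key to the URL stored in the database."""
    return {canonical_key(u): u for u in known_urls}


def match_ping(index, pinged_url):
    """Return the stored URL matching a pinged URL, or None."""
    return index.get(canonical_key(pinged_url))
```

With the index built once per run, each incoming ping costs a single dictionary lookup regardless of how many alternate spellings of the URL exist.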
http://aussieblogs.org/...
Aussie Blogs - belated renewed enthusiasm kicks in
Posted on 03. Dec, 2004
Looks like things aren't as grim as I had thought for aussieblogs. Weblogs.com seems to be alive again, and better still I discovered their shortChanges.xml file which shows the last 5 minutes of updates and is much nicer to poll for than their often-flakey-minimum-3-polls-per-hour-2000-entry changes.xml beast. Using shortChanges.xml also reduces the impact of losing a whole chunk of updates due to badly formed XML, which has been a problem of late with weblogs.com. So good old weblogs.com may have some life left in it yet.
It seems my worries about blo.gs stopping their changes.xml interface were a bit unfounded. I just implemented their alternative, and it is fantastic! Rather than downloading and parsing their changes.xml file every 30 mins, I set up a listener connected to ping.blo.gs:9999 and they stream new updates live down to me in zipped XML! I used their simple PHP 5.0 based stream_socket_listener example script and set it to kick off a special version of my Java checking bot (BloDotGsBot) each time a new update is detected by blo.gs. BloDotGsBot accepts the URL and RSS-URL as arguments and runs a fast comparison against the aussieblogs database; if it finds a match it runs off and does all the normal bot tasks that the other update tracker bots do (visit site, parse meta-data, download last post, post update). It's great to just sit there and watch the stream listener kick off all these little BloDotGsBots as bloggers all over the world ping blo.gs hoping that one of those blogs will match the aussieblogs database. This approach means blogs will appear within 10 minutes of making a post. Of course, if too many people blog at the same time around the world and ping blo.gs, my bot server might run out of memory ;) .. DoS by blogging! More work required in this area obviously.
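One way to limit the damage from badly formed XML is to fall back to salvaging individual entries when the strict parse fails. The sketch below assumes the changes files use self-closing `weblog` elements with `name` and `url` attributes inside a `weblogUpdates` root; that layout and the function name are assumptions, and this is not how the Java bots are known to work.

```python
import re
import xml.etree.ElementTree as ET

# Matches one self-closing <weblog .../> element on its own.
WEBLOG_TAG = re.compile(r"<weblog\b[^<>]*/>")


def parse_changes(xml_text):
    """Return (name, url) pairs from a changes-style document.

    Tries a strict parse first. If the document is malformed, salvages
    every <weblog .../> element that is well formed on its own, so one
    damaged entry no longer costs the whole batch.
    """
    try:
        elements = list(ET.fromstring(xml_text).iter("weblog"))
    except ET.ParseError:
        elements = []
        for tag in WEBLOG_TAG.findall(xml_text):
            try:
                elements.append(ET.fromstring(tag))
            except ET.ParseError:
                continue  # skip only the damaged entry
    return [(e.get("name", ""), e.get("url")) for e in elements if e.get("url")]
```

The regular expression is deliberately narrow: it is a recovery path for a known, simple element shape, not a general XML parser.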
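The memory concern at the end of that post could be addressed by capping how many checks run at once and how many may wait. The following is a minimal sketch of that idea only; the class name, the worker count and the backlog size are hypothetical, and the real setup launches a separate Java bot per ping rather than calling a Python handler.

```python
import queue
import threading


class BoundedDispatcher:
    """Run a handler for each ping with a fixed number of workers.

    Pings arriving while the backlog is full are dropped and counted
    instead of being allowed to pile up in memory.
    """

    def __init__(self, handler, workers=4, backlog=100):
        self._handler = handler
        self._queue = queue.Queue(maxsize=backlog)
        self.dropped = 0
        self.failed = 0
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, url, feed_url):
        """Queue one ping; return False if it had to be dropped."""
        try:
            self._queue.put_nowait((url, feed_url))
        except queue.Full:
            self.dropped += 1
            return False
        return True

    def _run(self):
        while True:
            url, feed_url = self._queue.get()
            try:
                self._handler(url, feed_url)
            except Exception:
                self.failed += 1  # one bad blog must not kill the worker
            finally:
                self._queue.task_done()

    def drain(self):
        """Block until every queued ping has been handled."""
        self._queue.join()
```

Dropping pings under load trades completeness for stability; a dropped blog would still be picked up later by whichever bot polls it on a schedule.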
On to Technorati's Attention API, I had to change my Attention experiment a little as no matter how hard I try I can't get the Technorati API to recognise my username and MD5 hashed password! It just says bad username or password. I've emailed, but alas, no response :( .. I assume the API is just broken for me. My new bot (AttentionBot) posts a sites.opml file to Technorati which rewrites the outline elements with extra attributes including lastUpdate and sends it back to me. From there I parse this list and look for updated blogs within the last hour and process them accordingly (visit site, parse meta-data, download last post, post update). It seems to work well, although with my test data I found that Technorati was disappointingly lagging. For example it showed my weblog hadn't been updated for 18 days! I don't know how good this will be long term as a source for hourly updates, but it's the best around with the most financial backing so hopefully it'll turn out to be a good place for harvesting updates.
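The parse-and-filter step described for AttentionBot might look something like the sketch below. The post only names the `lastUpdate` attribute; the `url` attribute, the RFC 822 style date format and the function name are assumptions, so treat this as an illustration rather than a description of what Technorati actually returned.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def recently_updated(opml_text, now, window=timedelta(hours=1)):
    """Return URLs of outlines whose lastUpdate falls within the window.

    Assumes each <outline> carries 'url' and an RFC 822 style
    'lastUpdate'; entries missing either, or with unparsable dates,
    are skipped.
    """
    updated = []
    for outline in ET.fromstring(opml_text).iter("outline"):
        stamp = outline.get("lastUpdate")
        url = outline.get("url")
        if not stamp or not url:
            continue
        try:
            when = parsedate_to_datetime(stamp)
        except (TypeError, ValueError):
            continue
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
        if timedelta(0) <= now - when <= window:
            updated.append(url)
    return updated
```

Passing `now` in explicitly keeps the function easy to test and makes the one hour window relative to the moment the response was received.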
Anyway, as you can see, lots going on to improve update detection through a variety of methods despite my general lack of interest in keeping the tracker running. And still no Java programmers willing to kick in and help :(
http://anthonyjhicks.com/...
This will be in the exam
Posted on 11. Jul, 2004
According to the referrers, presumably Aussie Blogs featured in the week 10 workshop notes for MDCM1000 New Media Technologies A at UNSW. Amusing.
http://www.student.unsw.edu.au/...
Aussie Blogs Finder Bot
Posted on 01. Apr, 2004
I've finished the final major automation task for Aussie Blogs, a brand new bot called FinderBot. This bot monitors several other geographically specific weblog listings, collects the links and compares them against the Aussie Blogs database. If a new link is found it goes into a queue for review and approval by the Aussie Blogs Editors. FinderBot is quite adaptive, not only can it parse raw links appearing in a page such as Adelaide Blogs, Melbourne Blogs and Blizg, it can also handle sites that list URLs behind redirects, either by parsing it as a parameter on the URL like on eatonweb.com or completely masked under a coded redirect used at Blogwise and Peppy's. FinderBot follows the redirect and gets the new location from the header, but doesn't actually have to visit the redirected page. This results in a fairly inexpensive acquisition of the real URL hidden behind the redirect. FinderBot also attempts geographic classification through where the URL was found with my other bots subsequently checking for geo meta-tags and completing geographic classification where possible. After unleashing FinderBot a few days ago it has collected some 300 unique not previously listed Aussie Blogs. If only Technorati had geographic classification, then I'd really be able to keep up with new Aussie Blogs.
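The two redirect cases described above can be sketched as follows. The function names and the default parameter name `url` are hypothetical, and FinderBot itself is a Java bot; this only illustrates the technique of reading the target from a query parameter or from the Location header without fetching the target page.

```python
import http.client
from urllib.parse import parse_qs, urlsplit


def target_from_parameter(link, param="url"):
    """Return the target carried as a query parameter, or None."""
    values = parse_qs(urlsplit(link).query).get(param)
    return values[0] if values else None


def target_from_redirect(link, timeout=10):
    """Ask the directory for the redirect and read the Location header.

    http.client never follows redirects, so the target blog is not visited.
    """
    parts = urlsplit(link)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in (301, 302, 303, 307, 308):
            return response.getheader("Location")
        return None
    finally:
        conn.close()
```

A HEAD request keeps the exchange to headers only; some servers answer HEAD differently from GET, in which case a GET whose body is never read would be the fallback.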
http://anthonyjhicks.com/...
Adelaide Blogs Forced to Shut Down
Posted on 01. Apr, 2004
Wow. This is staggering. Just staggering. I hope this is just a great April Fools joke.
Update: James you bastard, nice one! I sweated over this for 2 hours waiting for a similar fate to befall my site. And it was still 31st March for me ;)
http://jamesr.net/...
Aussie Blogs Recent Posts
Posted on 29. Mar, 2004
Aside from many small bug fixes and tweaks I've made to the Aussie Blogs update checker bots over the past couple of weeks, I've launched a new feature: Recent Posts. This page shows an aggregated view of the latest post appearing on each updated Aussie blog if that blog has a parsable 0.9x, 1.0 or 2.0 RSS feed (Atom support soon!). I actually programmed this feature two years ago, but never got around to finishing the parser. Thanks to the excellent Informa RSS Library for Java I had it up and running in a couple of hours! Give it a go, I think Recent Posts is great as it gives a useful view of the posts appearing on a variety of Australian weblogs. It fills the gap where Update Tracker doesn't really give enough information to help you decide whether you should actually visit a site appearing in the recently updated list -- it's hard to figure out if you're clicking on some teen angst site or that of an interesting knowledgeable commentator.
Anyway, the work now distributed between the three Aussie Blogs Bots is growing and they're getting much smarter at handling the variety of problems I've learnt you have to handle when mass monitoring and analysing over 1,500 sites on an hourly basis. What's new:
* ChangesBot, as always, checks changes.xml files at Weblogs.com, Blogger and blo.gs every 10 to 30 minutes for updates. I upgraded the comparison search to use a multi-tree style search algorithm allowing for the thousands of comparisons that need to be performed with different combinations of URLs to complete in seconds rather than several minutes.
* As always, CheckerBot visits all sites not previously detected by ChangesBot and does a size comparison against a snapshot to figure out if a site has been updated. If a difference of 250 bytes is found, a change is registered on the Update Tracker. CheckerBot is the bandwidth sucker. It checks over 500 URLs every few hours for updates, sometimes resulting in only a 5% hit rate in finding updates. So I've added much improved error trapping and storage of HTTP status codes so that sites returning 40n or 50n errors are more quickly discarded from the update tracker. While I was doing this before, I was not handling it as well as I could have and in many cases error-returning sites continued to be checked.
* Both the ChangesBot and CheckerBots now parse a variety of Geo and DC tags and resolve countries and regions against the ISO3166-1/2 code tables for better automated geographical classification of blogs. I'm also picking up the ICBM tag used by GeoURL for later use in determining location via lat/long where a specific Geo tag is not used.
* Both the ChangesBot and CheckerBots now visit the page/frames at the blog URL and collect all href links and the RSS feed (if available). Links are submitted to the Topic Tracker database, and the RSS feed is parsed for the last post submitted to the Recent Posts page.
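The CheckerBot logic in the list above (size comparison against a snapshot, plus discarding sites that keep returning errors) can be sketched like this. The 250 byte threshold comes from the post; the record layout, the function name and the limit of three consecutive errors are assumptions made for the example.

```python
CHANGE_THRESHOLD_BYTES = 250  # from the post
MAX_CONSECUTIVE_ERRORS = 3    # assumed; the post gives no figure


def check_site(record, status, body_length):
    """Update a site record after one visit and classify the result.

    record is a dict holding 'snapshot_size' and 'error_count'.
    Returns 'updated', 'unchanged', 'error' or 'discarded'.
    """
    record["last_status"] = status
    if status >= 400:
        record["error_count"] = record.get("error_count", 0) + 1
        if record["error_count"] >= MAX_CONSECUTIVE_ERRORS:
            return "discarded"
        return "error"

    record["error_count"] = 0
    previous = record.get("snapshot_size")
    if previous is None:
        record["snapshot_size"] = body_length  # first visit: just take a snapshot
        return "unchanged"
    if abs(body_length - previous) >= CHANGE_THRESHOLD_BYTES:
        record["snapshot_size"] = body_length
        return "updated"
    return "unchanged"
```

The snapshot is only replaced when a change is registered, so several small edits can still add up to a detected update instead of each one quietly resetting the baseline.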
Of interest to Java developers, Aussie Blogs uses:
JDK 1.4.2_04,
Jakarta Commons HTTP Client 2.0,
Jakarta Commons Logging 1.0.3,
Xerces 2.6.2,
Informa 0.5.0,
do.org HtmlStreamTokenizer 1.0,
Red Hat Linux 9 (bot server),
Red Hat Linux 7.3 (web/database server),
Domino 6.0.3
http://www.anthonyjhicks.com/...
Aussie Blogs Topic Tracker
Posted on 27. Mar, 2004
The Topic Tracker, my hack at a blogdex or Technorati style link popularity tracker, is back online. I put this together early last year to collect links appearing on Australian weblogs and apply some dark-magic ranking algorithms, providing an insight into what Aussie Bloggers are linking to. While sites like Blogdex and Technorati are far better at this than mine, it's hard to get a regional Australia only perspective. The Topic Tracker usually requires a couple of days of processing links before it settles down and shows good results, however already this morning several links to SMH and The Age etc. were detected and featured. Also when I originally put this together I never got around to programming a lookup of the link title and description, so I've put together a new bot (called TopicBot) which runs every hour updating the titles and descriptions of links found. I reckon the end result is pretty cool (and unique!), a timely view of what Australian bloggers are linking to. I've got several other things I've been working on in the background for Aussie Blogs too, which I'll release details of soon.
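The ranking algorithms themselves are not described, so the sketch below is a stand-in rather than the Topic Tracker's method: it simply ranks links by how many distinct blogs point at them, which is the most basic form of link popularity. The function name and input shape are hypothetical.

```python
from collections import defaultdict


def rank_links(sightings, top=10):
    """Rank links by the number of distinct blogs linking to them.

    sightings is an iterable of (blog_url, link_url) pairs. Repeated
    links from the same blog count once. Ties are broken alphabetically
    so the output is stable.
    """
    blogs_per_link = defaultdict(set)
    for blog, link in sightings:
        blogs_per_link[link].add(blog)
    ranked = sorted(blogs_per_link.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(link, len(blogs)) for link, blogs in ranked[:top]]
```

Counting distinct blogs rather than raw link occurrences stops a single site that repeats a link in its sidebar from dominating the list.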
http://anthonyjhicks.com/...
Aussie Blogs Editors and Coders
Posted on 28. Jan, 2004
I've had three volunteer editors so far for the Aussie Blogs Update Tracker and Webring which is really cool. I'm still looking for a Domino or Java developer to help out with the coding though.
http://anthonyjhicks.com/...
Aussie Blogs down for a couple of days
Posted on 28. Jan, 2004
Aussie Blogs will be down for a few days. The firewall for the network that hosts the Aussie Blogs update tracker spiders died yesterday and needs to be rebuilt. Unfortunately I do not have a redundant server in place and the firewall is in Sydney. To avoid problems like this in the future I plan to set up a redundant spider server on a different network, which should help ensure that the Update Tracker is always showing results.
http://aussieblogs.org/...
Aussie Blogs is running again after 4 months downtime
Posted on 05. Jan, 2004
I've got quite a few things I've been meaning to post about which I guess I'll get around to over the next few days. I've spent far too much time doing backend coding on my site over the past few weeks, and I'm still not quite there for XHTML 1.0 Transitional compliance, I'm close though! Getting to XHTML 1.1 will be a little way off yet.
I finally re-enabled the Aussie Blogs update tracker yesterday after 4 months of downtime. The tracker is now checking Blogspot and Weblogs.com pinging blogs every 10 minutes, and non-pinging blogs every 4 hours. I had to take it down in August/September due to exceeding my mate's Tel$tra bandwidth allowance a few times and had to wait until he moved to a different provider. I also had a spambot run through and corrupt a number of the site records looking for form mail exploits due to my open Wiki style policy of allowing anyone to edit site information without needing a login. I now have an approval process where all edits go into a holding queue for me to review before allowing them to go live. I always knew a completely open no-login edit policy would bite me one day. Luckily I had most of the site details backed up, however I still have 37 bad records. Bloody spambots.
I finally made up a FOAF file. FeedDemon is a pretty decent RSS aggregator, I may even pay for it.. ultimately though I still think I prefer a web-based server-side aggregator rather than having to install a piece of software to read my feeds on each of the PCs I use.
http://aussieblogs.org/...
Aussie Blogs is down indefinitely
Posted on 14. Oct, 2003
The update tracker consists of two servers: the Spider Server handles checking sites for updates every 10 minutes, and the Web Server hosts the web site and databases. The Spider Server is connected to a Telstra ADSL line, and often exceeds the monthly download allowance, meaning I have to switch it off towards the end of the month to avoid being overcharged by Tel$tra. Despite switching over to using changes.xml files for monitoring of a fair number of the sites, I still exceed the piss poor limits imposed by Tel$tra. I'm currently looking for alternative hosting for the Spider Server, however this is difficult. No web hosting providers will host spiders. Until I find new hosting the tracker is down indefinitely, and will display the last set of updated sites from weeks ago. Changing ADSL providers is not an option as it was running on a friend's ADSL. Running it in the UK is not an option as I only have my laptop over here with me. This is a real shame, however the best place to lay blame is the extraordinary overpricing of broadband in Australia, and Tel$tra's monopoly helping to keep Australia down ;)
http://anthonyjhicks.com/...
Aussie Blogs Topic Tracker disabled..
Posted on 07. Sep, 2003
No feedback, no interest.. so I've disabled the Topic Tracker as it was starting to chew quite a bit of download allowance on my mate's ADSL. Interesting experiment, the quality of results was getting better, but it's probably only marginally useful to a few people interested in monitoring the popularity of outgoing links appearing on Australian weblogs. Fairly niche target I guess, at least I have the code and database there if I ever want to re-enable it -- it's not something that can be turned on and off too easily though as once the database of profiled outgoing links gets stale, it takes a week or so of profiling blogs until the ranking algorithms smooth everything out and start returning fresh relevant outgoing links again.
Aussie Blogs Topic Tracker .. does anyone use it?
Posted on 06. Sep, 2003
I haven't had any feedback on the Aussie Blogs Topic Tracker. I'm not expecting a swamp of emails, but no feedback at all seems to indicate to me I may be wasting my time with it. I'm not sure whether people find it interesting, confusing or useless -- I guess I'm the only real user as I use it daily to see what other Australian bloggers are linking to, with the results generally dominated by links to articles on the SMH and The Age web sites. The problem is that it's a real download allowance chewer, so I may pull the plug on it altogether if I'm the only regular user. The database has now profiled 70,000 outgoing urls, and reliably indicates outgoing links appearing on Australian weblogs similar to the way Blogdex and Daypop work.
http://www.anthonyjhicks.com/...
Aussie Blogs Update Tracker is back!
Posted on 10. Aug, 2003
Thanks to Rob & Steve for rescuing the critical directory that held all the change files and to Rob for rebuilding the box. The Blogger and Weblogs.com changes.xml checkers are working fine and run several times an hour, however the checker for blogs that do not ping when updated has a memory leak. I'll sort it soon enough, until then it dies about 200 sites into a checking run. For those who use blog software that supports sending a ping to Weblogs.com, I highly recommend you enable pinging, updates to your blog will definitely be recognised by Aussie Blogs update tracker then.
http://anthonyjhicks.com/...
Aussie Blogs Update Tracker downtime
Posted on 28. Jul, 2003
Apologies for the extended downtime of the Aussie Blogs update tracker. I moved to London to live in mid-July and had to disconnect the ADSL line at my old apartment in Sydney used by the Update Tracker bots to check for updated sites. I moved the server that runs the bots to a friend's house, however during the transit the hard drive must have been damaged and the machine would not boot. As I was leaving for London the following morning I didn't have time to rebuild the server running the bots. I have left this in the hands of a Linux friend who will hopefully get it done over the next couple of weeks. This is also made hard by the fact that I am still walking up to an Internet cafe to use the net in London as BT seem to be taking forever to indicate whether I can or cannot get ADSL at the new flat.



