June 5, 2008

Web Crawlers and HTTP Referer

Something that’s annoyed me about web crawlers for as long as I can remember: when a bot makes a request, it doesn’t tell you where it got the URL from. I’ve lost count of the times I’ve seen a bot in my logs requesting a URL that’s absurd or obviously wrong. You think “I wonder where that bot got that URL from”, and then you start getting paranoid that a link on a page somewhere is totally broken and your visitors might hit that URL too. If you knew which page it picked the link up from, you could check it, and maybe report to the crawler’s developers that it’s doing something completely wrong.

What I’m asking (begging) for is this: if you develop a web crawler/spider (whatever) of any kind, could you please, for the love of all that’s holy, store the URL of the page where you first picked up each URL, and send it in the Referer header when you index the linked page? Then us lowly developer-types can check/fix/report any issues. What we’re all supposed to be working towards here is a useful resource: us with no broken links, and you with complete, useful indexes of pages. Best of all, it would actually be telling the truth (just because you didn’t index the page until a week later doesn’t make the referer invalid), so it’s a perfectly valid and sensible thing to do.
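For anyone building a crawler, here’s a rough Python sketch of what I’m asking for. It assumes a simple breadth-first queue. The `Frontier` class and the `ExampleBot/0.1` user agent are made up for illustration. It remembers the first page each URL was found on and puts that page in the `Referer` header (HTTP really does spell it that way) when it builds the request. It only builds a standard-library `urllib.request.Request`; fetching and parsing pages is left out.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.request import Request


class Frontier:
    """Hypothetical crawl queue that remembers where each URL was first found."""

    def __init__(self, seeds):
        self.queue = deque(seeds)
        # url -> page the URL was first seen on (None for seed URLs)
        self.first_seen_on = {url: None for url in seeds}

    def add_links(self, page_url, hrefs):
        """Queue links found on page_url, keeping only the FIRST referring page."""
        for href in hrefs:
            url = urljoin(page_url, href)  # resolve relative links against the page
            if url not in self.first_seen_on:
                self.first_seen_on[url] = page_url
                self.queue.append(url)

    def next_request(self, user_agent="ExampleBot/0.1"):
        """Build the request for the next URL, adding Referer when we know it."""
        url = self.queue.popleft()
        headers = {"User-Agent": user_agent}
        referer = self.first_seen_on[url]
        if referer is not None:
            headers["Referer"] = referer
        return Request(url, headers=headers)
```

You would pass each returned request to `urllib.request.urlopen()` to fetch it. Storing only the first referring page keeps the bookkeeping to one string per URL. A real crawler might prefer the most recent referrer instead.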

The day that gets added to the major engines is the day everybody’s life gets so much simpler. If you’re writing a new/open source crawler or search engine/whatever, make sure you add this; it’ll be worth its weight in gold.

Comments | Categories: Uncategorized | Author: streaky




March 21, 2008

Phorm, Privacy and CCTV

So for a while now a company named Phorm has been doing the rounds at various ISPs in the UK, including mine. They have been touting an advertising system that tracks users’ browsing behaviour to figure out the most relevant ads to show people, because apparently relevant ads are what people click (the fact is nobody clicks ads no matter how relevant they are, but whatever).

This isn’t so much a privacy issue as a downright interception-of-public-communications issue – the kind of thing that even the police need warrants for – so why are we beating around the bush with this?

2 Comments | Categories: RAWR! | Author: streaky




February 1, 2008

<3 Poker, Hate Aggressive Players

So I was playing poker a few minutes ago.

Sat down at a fairly low-limit table at PokerStars with $15, at the same time as somebody else. A few hands went past and this other guy had won something like 5 of the 6 hands by buying the pot. I was pretty annoyed; I hate people who do that.

Next hand I get pocket Aces. I check, this guy bets 50c, I call, and everybody else folds. There’s a K and a Q on the table, but I decide my pocket Aces are gonna win. Next card, nothing much. I check and he hits it for $2. I know he has nothing because he’s obviously a fool, and how many hands can one person win in a row? Next card, nothing much again. I’ve still got my pair of Aces, and he obviously thinks he’s gonna 1UP me with a pair of Kings or something. I check again, he puts down another $2, and I raise all in. He’s obviously pissed off but can’t stand it, and folds. I didn’t show my cards, which probably annoyed the guy even more.

The next hand I get A9 in the pocket, and he calls to see the flop. Two Aces drop on the flop, so I know I’ve basically won whatever happens. Sure, you never know, but this guy is a tool. He bets $2 after my check, which I call. He must think I’ve lost the plot at this point.

The turn comes with nothing useful. I check, and this drives him nuts. He puts $4.50 on the table, which I raise; he sits there for a minute considering, then calls.

The river? Nothing. I go all in with what I have left, and he goes nuts and calls (he kind of has to, because he’s now invested most of his stack in the pot). I show my trip Aces, he mucks, and I’m now up over $15 after 5 minutes’ play.

Some online players are complete idiots. He didn’t buy back in.

Comments | Categories: Poker | Author: streaky




December 28, 2007

Can’t think of a title..

I was just re-listening to a recent episode of A State Of Trance (Episode 327), and Armin reminded me of one of my favourite tunes when he played the Signum remix of Someone by Ascension. I’ve never really understood why I like it so much, but I do; it’s a real classic. Nice to see Signum not messing around with it too much, just doing enough to freshen it up.

Comments | Categories: Music | Author: streaky




November 2, 2007

What the F**k are EA Trying to Pull Now?

A long time ago I wanted a copy of the BF2142 beta before launch. I knew that if I didn’t get a copy and report the 9 million bugs that there was a 500% chance would be in it, the whole thing would be totally messed up. I went looking, and the only way I could get a copy was to subscribe to FilePlanet. For those who don’t know, these FilePlanet guys are scrounging bastard rip-off merchants of the worst kind: they’ll happily take your cash for something that should be free.

Fast-forward to today. I want a copy of the Crysis MP beta to make sure there aren’t any stupid show-stopper bugs in the damn thing when I come to buy it, but OFC, just like 2142, if you want it you need to subscribe to the scum at FilePlanet. It’ll cost you $6.95 per month, which, okay, isn’t that much. But considering you’re only really paying for access to the betas they somehow get their grubby mitts on, it’s frankly a big greasy money-grabbing rip-off. F**k that.

2 Comments | Categories: RAWR! | Author: streaky




November 1, 2007

Million Dollar Homepage? No fuck off.

When I first saw the reports about the Million Dollar Homepage spam/scrounge-fest, my first thought was “what retard is going to be stupid enough to be scammed by that?”. Turns out lots of people, and it’s the stupidest thing I ever heard. I don’t know when it filled up, but the last time I saw it, it was pretty much empty. Now it’s chock-full and I fear for the human race.

Seriously when did people start giving money to fucking scroungers?

I just thought of a new idea, “The Billion Punches in the Face Home Page” - you give me a billion dollars and I’ll give you a shitty link on an even shittier web site then come to your house and give you a billion punches in the face for being a dumb bastard. I think it might work if this shit is anything to go by. *applies for a Swiss bank account*.

The thing that really irks me is that right in the middle of all the spam there’s a big link to The Times Online. It’s a reasonably classy newspaper (okay, it’s News International toilet paper, so not that classy, but it’s still reasonably well regarded. I guess NI haven’t quite figured out the whole internet thing yet; shit, they bought MySpace for more than the $5 it’s worth, Q.E.D.), and here it is stooping to common spammage. Rupert Murdoch needs to fire whoever came up with that dumb idea right now.

Bastards.

3 Comments | Categories: RAWR! | Author: streaky




September 23, 2007

New Hotness

So after roughly 2 years with a few stops and starts and about 5 rewrites I’m finally managing to get large chunks of Booguu into SVN. What a nightmare.

It’s tough to rip the whole thing apart and push it into a repo like that, because I’ve been building it as I’ve been using it. That presents a problem: chunks of it are ‘private’ code and other chunks are what I need to have in SVN. There’s also code that I wrote right into the core to do something I needed, which I never intended for the actual product. That all makes a mess, believe me.

That said the layout of ‘B’, as it’s become known, is actually designed to be worked with that way – you can build onto it without disturbing the core of the code, which would be great if I’d actually used it that way in the past.

I also needed to build a default “Well done, you’re not a moron – you managed to copy the files onto your server” application for it, and a stupidly basic default template too.

But yeah, that’s done so I guess I must have finalised the API around there somewhere eh? Oh well.

Next stop – it needs a site, and a logo :/

2 Comments | Categories: Booguu, PHP, Projects | Author: streaky




September 22, 2007

Code Documentation is Bogus.

So here’s something that’s been annoying me ever since I started writing code (and, of course, reading other people’s): code and API documentation.

Basically every piece of code I’ve ever read falls on one of two sides of the coin when it comes to documenting code and APIs: it’s either stupidly over-documented or badly under-documented. And beyond that, does a given project even need documentation at all?

The way I look at code documentation is that if you need it, you’re probably writing bad code. It should always be easy to see what a piece of code is doing by following it through, and you’d be reading the code at that level anyway to see any inline docs. If it isn’t easy to follow, rewrite the code and make it right.

The other side is API documentation. You always need that regardless of how your code is written, but sensible method and variable names can go a long way. A developer should be able to say “Well hey, this method is called save_cache(), so I know its job is to save the cache”. Yes, the API docs should explain further where it makes sense: to point out that the method exists at all and to expand on any arguments it takes (though a given function’s arguments should be self-descriptive too). Beyond that, only document what is absolutely necessary, otherwise you’re going too far.
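Here’s a small Python sketch that makes the naming point concrete. Apart from `save_cache`, the names (`load_cache`, `cache_dir`, `ttl_seconds`) are hypothetical. Storing entries as JSON files is also an assumption, made so the example is self-contained. The function and argument names carry most of the meaning, so each function only gets a one-line docstring for the API docs.

```python
import json
import time
from pathlib import Path
from typing import Optional


def save_cache(cache_dir: Path, key: str, value: dict, ttl_seconds: float) -> Path:
    """Save value under key in cache_dir; it expires ttl_seconds from now."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{key}.json"
    entry = {"expires_at": time.time() + ttl_seconds, "value": value}
    cache_file.write_text(json.dumps(entry))
    return cache_file


def load_cache(cache_dir: Path, key: str) -> Optional[dict]:
    """Return the value cached under key, or None if it is missing or expired."""
    cache_file = cache_dir / f"{key}.json"
    if not cache_file.exists():
        return None
    entry = json.loads(cache_file.read_text())
    if time.time() >= entry["expires_at"]:
        return None
    return entry["value"]
```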

Some people will reply to this with a statement like “Hey, what about the maintenance developer, you’re just making it harder for them to understand what’s going on”. Well no, it’s a case of making the code more sensible and getting rid of potentially reams of documentation that makes their job harder not easier.

The idea in the end is that any code you write describes itself, rather than you needing to stuff it full of comments to explain the hack you just threw together. Seems like a sensible idea to me.

1 Comment | Categories: PHP, Projects | Author: streaky




August 8, 2007

Making SPAM Detection More Fun

Spam is a tricky area. A pretty large amount of spam flows through the internet these days, and spammers use semi-clever tricks to get past spam detection, such as single-word messages. Okay, it’s not genius, but it works.

My latest (and hopefully last) attempt at a ‘final solution’ is a multi-pronged attack on spammers.

I can’t talk about it in too much detail, partly so I don’t help people find ways around it (although that will be seriously difficult, because there’s manual filtering in there too) and partly because it isn’t finished yet. It includes manual filtering, an IP whitelist/greylist/blacklist, karma filtering (whether the user has posted non-spam messages before or not), extended blocking, and range-blocking for bad subnets. Extending blocks is automated, so when a known spammer sends another spam message, their already long-term block is automatically extended for an even longer period.
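Here’s a minimal Python sketch of two of those prongs, escalating blocks and subnet range-blocking, plus a simple karma count. The post doesn’t say how long blocks last or how they grow. The one-day base block and the growth factor of 4 are therefore assumptions, as are all the names (`Blocklist`, `confirm_spam` and so on). The current time is passed in as `now` so the behaviour is easy to test.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class SenderRecord:
    good_posts: int = 0          # karma: accepted non-spam messages so far
    offences: int = 0            # confirmed spam messages so far
    blocked_until: float = 0.0   # unix time the current block ends (0 = never blocked)


class Blocklist:
    def __init__(self, base_block_seconds=86_400, growth_factor=4, trusted_after=3):
        self.base_block_seconds = base_block_seconds
        self.growth_factor = growth_factor
        self.trusted_after = trusted_after
        self.records = {}          # ip string -> SenderRecord
        self.blocked_ranges = []   # ipaddress network objects

    def _record(self, ip):
        return self.records.setdefault(ip, SenderRecord())

    def block_range(self, cidr):
        """Block a whole subnet, e.g. '198.51.100.0/24'."""
        self.blocked_ranges.append(ipaddress.ip_network(cidr))

    def is_blocked(self, ip, now):
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in self.blocked_ranges):
            return True
        rec = self.records.get(ip)
        return rec is not None and rec.blocked_until > now

    def is_trusted(self, ip):
        """Karma check: enough good posts and no confirmed spam."""
        rec = self.records.get(ip)
        return rec is not None and rec.offences == 0 and rec.good_posts >= self.trusted_after

    def confirm_ham(self, ip):
        self._record(ip).good_posts += 1

    def confirm_spam(self, ip, now):
        """Record a confirmed spam message and extend the block; returns the new end time."""
        rec = self._record(ip)
        rec.offences += 1
        duration = self.base_block_seconds * self.growth_factor ** (rec.offences - 1)
        # Extend from the later of "now" and the end of any block still running.
        rec.blocked_until = max(now, rec.blocked_until) + duration
        return rec.blocked_until
```

Extending from the end of a block that is still running, rather than from `now`, means a spammer who keeps trying while blocked only pushes the block further out.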

All the way through, the spammer has no idea this is happening. They aren’t told they’re on a list, and there’s no way for them to find out, because they can’t check against a database. It looks like their message went through. On some blogs that I control there will be no CAPTCHA, email validation or anything of that sort, all to help create a semi-spam trap (we want the spam bots to get caught). It’s not like an email spam trap, though: messages from first-time offenders will be manually validated and added to the database, and after that the whole process is automated to make their lives a living hell.

Some people might think they can just push hundreds of thousands of messages through in one go and hope the workload will be overwhelming. I’ve tested this with old data, though, and I can blacklist about 500,000 comments in about 3 minutes, thanks to a custom UI I built for the task. You basically skim through what the system thinks is spam, check that everything looks in order, and click a button to blacklist everything you haven’t marked as a false positive.

Another prong of the system deals with anything else that gets through. If the blog requests it, the system can retroactively remove posts from spammers, and also remove posts that are later marked as spam (for example, if I spot that the system has misclassified something). The system also has grades of spam. If a message looks like a known spam message from a known spammer, a blog system can rest assured it is spam and delete it on the spot without even mentioning it. This is opt-in; the service has no control over what a blog application actually does with the information it sends back.
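The grading idea could look something like this rough Python sketch. The three grades, their names, and the rule that only known spam from a known spammer counts as certain are all assumptions made for illustration. The point from the post that it keeps is that the service only returns a verdict, and the blog decides (opt-in) whether to delete silently.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "ham"                       # publish normally
    PROBABLE_SPAM = "probable_spam"   # over the threshold: hold for moderation
    CERTAIN_SPAM = "certain_spam"     # known spam from a known spammer


def grade(score, threshold, sender_is_known_spammer, matches_known_spam):
    """Turn a spam score plus sender history into a graded verdict."""
    if score < threshold:
        return Verdict.HAM
    if sender_is_known_spammer and matches_known_spam:
        return Verdict.CERTAIN_SPAM
    return Verdict.PROBABLE_SPAM


def blog_action(verdict, auto_delete_opt_in):
    """What a blog might do with the verdict; the service itself never deletes."""
    if verdict is Verdict.HAM:
        return "publish"
    if verdict is Verdict.CERTAIN_SPAM and auto_delete_opt_in:
        return "delete"
    return "hold"
```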

Another part of the system is understanding that an English blog is exactly that: an English blog. Initially this limits the system to English blogs, but it means the system can check a dictionary to make sure a comment isn’t just a bunch of random letters thrown in to confuse spam filters.
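Here’s a simple version of the dictionary check, sketched in Python. It counts how many of a comment’s words appear in an English word list and flags the comment if too few do. The word list, the 0.5 cut-off and the function names are all assumptions. A real deployment would load a full dictionary rather than a handful of words.

```python
import re

# Letters plus apostrophes, so "don't" stays one word.
WORD_RE = re.compile(r"[A-Za-z']+")


def dictionary_ratio(comment, dictionary):
    """Fraction of the comment's words found in the dictionary (0.0 if it has no words)."""
    words = [w.lower() for w in WORD_RE.findall(comment)]
    if not words:
        return 0.0
    known = sum(1 for w in words if w in dictionary)
    return known / len(words)


def looks_like_gibberish(comment, dictionary, min_ratio=0.5):
    """Flag comments where fewer than min_ratio of the words are real English words."""
    return dictionary_ratio(comment, dictionary) < min_ratio
```

A single real word still passes this check, so it complements the other prongs rather than catching single-word spam by itself.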

One of the other main parts is unit-testing-style performance checking built into the checking code. For this the system needs to archive pretty much everything. New rules can then be checked to make sure they don’t make spam detection worse, and the archive can be used to find the point at which something should be marked as spam. The system goes through every message posted within, say, the last 6 months and decides which spam score defines a spam message. (Why not over all time? Because spam methods change, and running the system will itself change how spammers operate.) It can also automatically remove filters that don’t help or that make spam detection worse, and add them back in later if it needs to.
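Here’s a rough Python sketch of the threshold-picking step. It assumes every archived message carries a manually verified spam/not-spam label and a score from the rules being tested. It tries each observed score as a cut-off over roughly the last 6 months (183 days here, an assumption) and keeps the one with the fewest misclassifications. `rule_set_is_no_worse` then shows the “don’t make detection worse” check for a candidate rule set. All names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: "six months" approximated as 183 days.
SIX_MONTHS_SECONDS = 183 * 24 * 3600


@dataclass
class ArchivedMessage:
    posted_at: float   # unix timestamp
    score: float       # spam score from the rule set under test
    is_spam: bool      # manually verified label


def choose_threshold(messages, now, window=SIX_MONTHS_SECONDS):
    """Return (threshold, errors) for the cut-off that misclassifies the fewest
    recent messages. A message counts as spam when score >= threshold."""
    recent = [m for m in messages if now - m.posted_at <= window]
    # Every observed score is a candidate; infinity means "nothing is spam".
    candidates = sorted({m.score for m in recent}) + [float("inf")]
    best_threshold, best_errors = float("inf"), len(recent) + 1
    for threshold in candidates:
        errors = sum((m.score >= threshold) != m.is_spam for m in recent)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def rule_set_is_no_worse(old_scored, new_scored, now):
    """Unit-test-style check on the same archive scored by the old and the new
    rules: accept the new rules only if they don't increase the error count."""
    _, old_errors = choose_threshold(old_scored, now)
    _, new_errors = choose_threshold(new_scored, now)
    return new_errors <= old_errors
```

In this naive form, trying every observed score is O(n²). Sorting once and sweeping through the scores would bring it down to O(n log n), which would matter with millions of archived comments.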

It’s seriously complicated stuff but hopefully it works out. The service will be free for single personal blogs to join and there will be some pricing for bulk blogs and corporate blogs (and possibly personal blogs with huge comment traffic depending how it works out), it won’t be expensive though.

Comments | Categories: PHP, Projects, spam | Author: streaky




June 20, 2007

So Gatecrasher Burned Down

I don’t know what to think of that. I used to go there quite a lot and some of my best memories are from there; it feels like moving out of the house you grew up in or something.

It’s not like when Gatecrasher @ Republic turned into Gatecrasher One, which incidentally I had high hopes for until I saw the sorts of events they were putting on. That felt like somebody had killed my dog.

Some people are laying flowers on Saturday. It’s almost excessive, but I can really see it; Gatecrasher had such a massive effect on people’s lives. This is the thing with trance people that you don’t get anywhere else: the camaraderie, the instant acceptance of anybody who wants to join in, and the love for the place and each other. There’s really nothing like it. It’s actually a Trance/Progressive thing, but you saw it more at Gatecrasher than anywhere else; it was almost a family thing.

Gatecrasher-that-was people were music connoisseurs, DJs tried out new stuff, and Gatecrasher @ Republic was the front line. I hope, nay, I know the DJs loved it as much as we did. The friendship you felt with the people who went, the DJs, even the staff was amazing. It’s not normal to be in a club where the people who work there love what they do so much; usually it’s students who need cash and couldn’t care less.

Then that ended in 2003 or whenever it was. I mean seriously, I find the very idea of Tim Westwood @ Gatecrasher One pretty.. Insulting? .. Obscure?

No, we should have done the flower thing back then, looking at the Facebook discussion on the subject, at least one person agrees ;)

There’s a thread on Facebook about people’s best Gatecrasher memories and I posted mine.

It’s basically this:

Saturday November 4th 2000 – was basically Bonfire Night because the 5th was a Sunday. We’d been to local clubs the 2 nights before, so we were pretty pumped up and I was ready to do some serious bouncing. 8 of us had got a van-taxi-thing to do the ~40 mile drive, and we were having a good laugh, chatting, listening to the music we’d handed the driver, and watching the fireworks.

We’d actually pre-ordered our tickets to get to the side entrance where the ‘shorter’ queues are, but my mate who ordered them hadn’t received them and had been told that they would be with the ticket people there. Well, they weren’t – lesson learned, don’t use Ticketmaster.

So we had to go in the ‘normal’ queue, not too much of a problem except to say that we had to pay twice.

The first thing you noticed in Gatecrasher back then was all the silver Gatecrasher logos hanging from the ceiling, which looked very cool, and then, a few seconds later, the sheer volume of what was playing. IIRC, at that point Gatecrasher had the most powerful permanent sound system in the world. Fibre channel everywhere, amps hanging from every strut, lol.

We moved round to the main dance floor, where Scott Bond was playing his set at this point, and danced for a bit.

By the time Matt Hardwick started his shift we were well into dancing and whatnot. I don’t think I was prepared for how amazing Matt’s set was; it was really good. You might not understand the very concept, but let me put it this way: Matt Hardwick was relatively unknown at this point, but you knew he was going on to bigger things.

Sander Kleinenberg was up next. I’d never heard him play before, and I think he may have topped Matt Hardwick with his set. About halfway through it (which must have been 2am or something) I moved round and took up residence on the balconies overlooking the entrance. By this point I’d probably noticed that Cygnus X – Superstring had been played a lot. I don’t know if the DJs who played there made a conscious decision to make it the theme of the night, but in there it sounded absolutely amazing.

Judge Jules kept up that tradition when he went on (though I didn’t even know he had at the time; I was over the other end of the building near the toilets). Let me say that if you didn’t hear Judge Jules live around the time he was the #1 DJ in the world, you missed out big time; he was simply sublime. By 4am I’d taken up residence on the walkway that leads up to the little mini-club at the top, whose name I can’t remember, my memory is so bad. I needed space and had found it. Later I pushed down to the area in front of the cigarette machine; there was hardly anybody round there and acres of space. I don’t recall what JJ played to finish the night at 6am, but it was something mainstream and commercial, and actually a lot of fun to hear.

We were dead, and I got back to Lincoln just as my dad was going to work. “Where have you been?” .. “Just out”. Little did he know (or could he comprehend) what an amazing night I’d had, let alone the amazing people I’d met.

Happy days.

Comments | Categories: Uncategorized | Author: streaky