Marcel Neuhausler's World: September 2004 Archives


September 30, 2004

Fun Google hack

Here’s a fun thing to play around with if you’re bored at 3am. Most Sony digital cameras start saving photos with the name “DSC00001.JPG”, and a lot of people upload these photos to the web, where the all-knowing, all-seeing Google later catalogs them. So by clicking this link you can see the first photo taken by someone with their new camera (or, for some cameras, a newly formatted card). This is what it looks like when thousands of Sony cameras lose their photo-virginity.

Update: Bonus fun, IMG_0001.JPG is the equivalent for Canon cameras, so that will work too. Here’s CASIO, Pentax, Kodak and Nikon. And one of our readers points out: “Using DSC as a search string in P2P programs also works great for turning up things on people's hard drives they didn't know they were sharing”.
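The trick depends on nothing more than predictable default file names. As a rough sketch, assuming the counter keeps the same zero-padded width as the two names quoted above (the class and method names here are made up for illustration), the names could be generated like this:

```java
import java.util.Locale;

// Hypothetical helper, not from the original post: builds default camera file names
// following the two patterns quoted above.
final class CameraFileNames {
    // Sony pattern from the post: "DSC" plus a five-digit counter, e.g. DSC00001.JPG
    static String sony(int counter) {
        return String.format(Locale.ROOT, "DSC%05d.JPG", counter);
    }

    // Canon pattern from the update: "IMG_" plus a four-digit counter, e.g. IMG_0001.JPG
    static String canon(int counter) {
        return String.format(Locale.ROOT, "IMG_%04d.JPG", counter);
    }

    public static void main(String[] args) {
        System.out.println(sony(1));
        System.out.println(canon(1));
    }
}
```

Feeding either name to a search engine as a quoted phrase is all the "hack" amounts to.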

[Engadget]

AIM Game Bot

A bot that allows you to play Infocom games via AIM.

Guide to Hack your DirectTiVo

Nice step-by-step guide to hacking a DirectTiVo.

September 29, 2004

Sphinx-4, an open source speech recognition engine written in Java.

Carnegie Mellon University has posted the first beta of Sphinx-4, an open source speech recognition engine written in pure Java.
[Cafe au Lait Java News and Resources]

Introducing Frontier 10.0a1.

Introducing Frontier 10.0a1. Open source.
[Hack the Planet]

September 28, 2004

Indexing Object Graphs with Lucene

When I posted on Lucene and OJB, Erik Hatcher commented:

... at first glance it doesn't look like you're doing hierarchical indexing. How are you handling that type of thing, if I've missed it?

And he is right, I wasn't there -- it was simply flat indexing. Never one to turn down a fun-sounding problem (I am weak that way), I spent Sunday afternoon (while Joy was out looking for wedding stuff -- not enough time to hack when planning a wedding, when does the "Yes dear, that sounds good" phase kick in?) tossing together a tool to build indexes on arbitrary object graphs in a useful way. Had some success too! Here's a source xref of the results, with some junit code showing how to use it.

The code basically builds an index on the graph, allowing queries of the form beer.name: Schlitz to look for instances of beer whose name field is Schlitz. A more fun one is cooler.beer.name: Shitz~ AND cooler.location: My House, which hits the documents indexing org.skife.lucene.graph.helper.Cooler instances whose location is "My House" and which contain a beer whose name is like "Shitz~" (i.e., Schlitz).
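Why does a misspelled term with a trailing ~ still hit? Lucene's fuzzy operator is based on Levenshtein edit distance. The sketch below is a plain textbook implementation of that distance, not Lucene's own code, and it assumes the terms have already been lowercased; the class name is made up for illustration.

```java
// Textbook Levenshtein distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn one string into another.
final class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("shitz", "schlitz"));
        System.out.println(levenshtein("shitz", "caffreys"));
    }
}
```

The misspelling is only two insertions away from "schlitz", while the other beer names are much further off, which is why a fuzzy query can single out the Schlitz.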

Setting it up is pretty easy to do:

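    // indexer and parser are fields of the test class (declarations not shown).

    public void setUp() throws Exception {
        final SimpleNameMapper mapper = new SimpleNameMapper();

        // The MetadataFactory adds extra fields to every indexed instance;
        // here it adds a "name" field holding the mapped class name.
        indexer = new GraphIndexer(new MetadataFactory() {
            public Field[] build(final Object entity) {
                final Field name = Field.Text("name", mapper.build(entity));
                return new Field[]{name};
            }
        });
    }

    public void testFuzzy() throws Exception {
        // Build a small graph: one cooler stocked with three beers.
        final Cooler cooler = new Cooler(1);
        cooler.stock(new Beer(2, "Schlitz"));
        cooler.stock(new Beer(3, "Caffreys"));
        cooler.stock(new Beer(4, "McEwan's"));

        // Index the graph, then open the resulting index directory for searching.
        final File index = indexer.index(cooler);
        final IndexReader reader = IndexReader.open(index);
        final IndexSearcher searcher = new IndexSearcher(reader);

        // Fuzzy query: the misspelled "Schitz~" should still match "Schlitz".
        final Query query = parser.parse("cooler.beer.name: Schitz~");
        final Hits hits = searcher.search(query);

        assertEquals(1, hits.length());
    }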

The GraphIndexer builds an index (or adds to an existing one) via the final File index = indexer.index(cooler); call, which returns the directory on the filesystem where the index is stored. Indexing the cooler above creates the fields:

Field: cooler.beer.name
Field: beers
Field: cooler.beer.identity
Field: beer.name
Field: identity
Field: cooler.identity
Field: beer.identity
Field: cooler.beer
Field: name
Field: cooler.beers

The indexer also populates the correct fields onto the correct documents. You can query against the index on simple property names (name, identity, etc.) or qualify them with types (beer.name, cooler.identity, etc.). This makes all of the following useful (? ;-) queries:

cooler.identity: 7
cooler.beer.name: mcewans
name: schlitz
cozy.beer.name: Bob OR cozy.owner: Bob
identity: 1 OR 2 OR 3
beer.identity: 1 OR 2 OR 3
beer.name: Coors OR cooler.beer.name: Schlitz
beer
cooler
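Many of the field names listed earlier look like the trailing portions of a dotted property path: cooler.beer.name also shows up as beer.name and name. The following is only a guess at that naming scheme, not the GraphIndexer's actual code, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive every field name under which one property path
// could be queried, from fully qualified down to the bare property name.
final class FieldNames {
    static List<String> suffixes(String path) {
        List<String> names = new ArrayList<>();
        String rest = path;
        while (true) {
            names.add(rest);
            int dot = rest.indexOf('.');
            if (dot < 0) {
                break; // no more leading segments to drop
            }
            rest = rest.substring(dot + 1);
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the three names seen in the field list for a beer's name.
        System.out.println(suffixes("cooler.beer.name"));
    }
}
```

Indexing the same value under each suffix is what would let both name: schlitz and cooler.beer.name: schlitz find the same document.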

Also notice the MetadataFactory passed to the GraphIndexer. This is just a convenient way to add additional fields on a per-instance basis. I use it here to add a name field to every indexed instance, where the value is the simple mapped class name (drop the package and downcase). I also use the name field as the default search field, so you can do nice searches like beer quality: good and get back hits for all instances of good beer. In a real application I would add the information required to query for the entity, such as the class name and pk value (using OJB, probably just the stringified Identity for the instance, as I can extract all of that from the identity =).
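The "drop package and downcase" mapping is simple enough to sketch. This is an illustration of the idea only; it is not the SimpleNameMapper from the project, and the class and method names are assumptions.

```java
import java.util.Locale;

// Hypothetical stand-in for the name mapping described above:
// strip the package from a fully qualified class name and lowercase the rest.
final class SimpleNames {
    static String fromClassName(String qualifiedName) {
        // lastIndexOf returns -1 when there is no package, so +1 yields index 0.
        String simple = qualifiedName.substring(qualifiedName.lastIndexOf('.') + 1);
        return simple.toLowerCase(Locale.ROOT);
    }

    static String of(Object entity) {
        return fromClassName(entity.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(fromClassName("org.skife.lucene.graph.helper.Cooler"));
    }
}
```

A value like this in the default search field is what makes a bare query such as beer or cooler meaningful.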

Right now it has a couple of quirks. Being a Sunday afternoon hack (and with Joy getting back from looking for wedding stuff), there is one case which is not handled: downstream mapping from a cycle in the graph. This is a *very* small case though, and won't be difficult to add when I get a chance. The second gotcha is that a really big interconnected graph (say several gigabytes) will be interesting to index, because right now the indexer needs to keep the full graph in memory while it works. I don't think this can be gotten around without implementing some kind of relationship-only caching, which would be almost as memory intensive as the whole graph -- the same order of magnitude, if with a smaller constant. The workaround is to index aggregates, recognizing that property chaining between aggregates will be slightly off. For graphs which are fairly hierarchical (where you can keep all cycles in the same aggregate being indexed at once), this will all index properly.
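For readers wondering why cycles need care at all: any object graph walker has to remember which instances it has already seen, or a cooler that points at a beer that points back at the cooler will loop forever. The sketch below shows the usual identity-based visited set. It is not grafolia's or the GraphIndexer's code, it does not attempt the downstream mapping case mentioned above, and GraphNode and GraphWalk are invented names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical graph node: a label plus outgoing references.
final class GraphNode {
    final String label;
    final List<GraphNode> links = new ArrayList<>();

    GraphNode(String label) {
        this.label = label;
    }
}

final class GraphWalk {
    // Visits each reachable instance exactly once, even when the graph has cycles.
    static List<String> visitOnce(GraphNode root) {
        // Identity-based set: two distinct instances never count as "the same",
        // regardless of how equals() is implemented.
        Set<GraphNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<GraphNode> pending = new ArrayDeque<>();
        List<String> order = new ArrayList<>();

        seen.add(root);
        pending.push(root);
        while (!pending.isEmpty()) {
            GraphNode current = pending.pop();
            order.add(current.label);
            for (GraphNode next : current.links) {
                // add() returns false if the instance was already seen.
                if (seen.add(next)) {
                    pending.push(next);
                }
            }
        }
        return order;
    }
}
```

Note that the visited set itself grows with the number of instances, which is the same memory pressure described above for very large graphs.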

Most of the behavior is configurable, from the manner in which it traverses the graph, to how it names root classes, to how it stringifies things, to filtering out instances. The defaults (go read what they do) should be pretty reasonable for most cases and for playing around, though.

Lastly, it has a couple of dependencies. The first I am happy about: it uses the grafolia library I wrote for object graph manipulation for OJB 1.1. I pulled it out from OJB because I realized it was the type of code I had implemented many times before (arbitrary object graph traversal type stuff). Glad to see it fit naturally here. The other dependency is commons-beanutils, because it is just so much more convenient than using java.beans. BeanUtils drags commons-logging along with it. Sorry.

Tarball is available if anyone wants to play. It is a pretty basic little maven project right now (javadocs, xrefs, etc posted), so should be easy to build. Remember, maven idea and maven eclipse are your friends ;-)

[Waste of Time]

September 27, 2004

MBX

MBX offers appliances with interesting branding options.

September 20, 2004

Desktop Transporter

Desktop Transporter 1.1 - Remote desktop controller with auto discovery.
[MacUpdate - Mac OS X]

September 17, 2004

Metasploit

The Metasploit Framework is an advanced open-source platform for developing, testing, and using exploit code. This project initially started off as a portable network game and has evolved into a powerful tool for penetration testing, exploit development, and vulnerability research.

September 16, 2004

A9

"The web is easy to use, but using it well is not easy. We are inventing new ways to take search one step farther and make it more effective. We provide a unique set of powerful features to find information, organize it, and remember it—all in one place. A9.com is a powerful search engine, using web search and image search results enhanced by Google, Search Inside the Book™ results from Amazon.com, reference results from GuruNet, movies results from IMDb, and more."

Digital Street Game

"Digital Street Game is a battle for turf, a contest of wills, a way to explore the city. Dominate the gameboard of NYC by staging and documenting stunts on the streets." .. fascinating :-)

September 14, 2004

ASM

ASM is a Java bytecode manipulation framework. It can be used to dynamically generate stub classes or other proxy classes, directly in binary form, or to dynamically modify classes at load time, i.e., just before they are loaded into the Java Virtual Machine.

Hacking on OS X

Looking for OS X hacking and security applications? Thanks go out to Jeremy C. for the following link:

www.undergroundmac.com

As many of you know, I'm a big fan of Apple's OS X. Primarily because it's based on FreeBSD, and from my experience rock solid. I wish the hardware was less expensive, but I don't see that changing anytime soon.

Mac fans, enjoy!

+krose
[kevin rose dot com]

September 13, 2004

Terrabrowser

Terrabrowser 1.5a5 - View topographical satellite maps.
[MacUpdate - Mac OS X]

September 08, 2004

Browsers that Aren't Browsers

These days, we no longer simply browse the Web as much as we mine it. You have your favorite browser for viewing pages, but Giles Turnbull thought you might enjoy learning about a few new-generation web tools, too.
[O'Reilly MacDevCenter.com]

September 03, 2004

Mac PVR 1.0

Mac PVR turns a Mac into a personal video recorder using a VCR (with a camcorder or DV bridge) and Apple's QuickTime Broadcaster.
[MacInTouch]

Tivo

How to enable web-based viewing and remote control of your Tivo.

September 01, 2004

Software maker launches remote-access tools

Among new products from 3am Labs is a free tool for accessing a PC via any device with a browser.
[CNET News.com]