September 12, 2007
Egon Willighagen has posted on the release of the latest InChI software. Egon (and others) are concerned about the implementation, especially the fact that InChIKeys aren't guaranteed to be unique. At a more basic level, I'm wondering whether people agree with the stated needs for InChIKey.
- Facilitate web searching
  - Even though Google are coping with InChI very well, a representation of InChI that didn't break standard tokenization routines and that could be included attractively in prose would be handy.
- Allow development of a web-based lookup service
  - I'm not really sure what's meant here. As Egon pointed out in the comments to his post, he already has one of these, and it didn't require InChIKey!
- Permit an InChI representation to be stored in fixed-length fields to make chemical structure database indexing easier
  - Because RDBMSs have such a hard time indexing VARCHAR? Really?
- Allow verification of InChI strings after network transmission
  - This is not a problem that needs solving again: MD5 checksums would do the same job.
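To make the last two points concrete, here is a minimal sketch using only Python's standard library. It assumes SQLite as a stand-in for "an RDBMS" and uses two example InChI strings (ethanol and methane) chosen purely for illustration. The full, variable-length InChI goes into an ordinary indexed TEXT column, and an MD5 digest computed before sending lets the receiver check the string afterwards. The helper names (`build_index`, `lookup`, `md5_of`, `verify`) are hypothetical.

```python
import hashlib
import sqlite3

# Example InChI strings (ethanol, methane), used only for illustration.
INCHIS = [
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "InChI=1S/CH4/h1H4",
]


def build_index(inchis):
    """Store full, variable-length InChIs in an indexed TEXT column (no fixed-length key)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE structures (id INTEGER PRIMARY KEY, inchi TEXT NOT NULL)")
    # An ordinary B-tree index on the variable-length column.
    conn.execute("CREATE UNIQUE INDEX idx_inchi ON structures (inchi)")
    conn.executemany("INSERT INTO structures (inchi) VALUES (?)", [(s,) for s in inchis])
    return conn


def lookup(conn, inchi):
    """Exact-match lookup on the indexed InChI column; returns the row id or None."""
    row = conn.execute("SELECT id FROM structures WHERE inchi = ?", (inchi,)).fetchone()
    return row[0] if row else None


def md5_of(inchi):
    """Hex MD5 digest of the InChI's UTF-8 bytes, to be sent alongside the string."""
    return hashlib.md5(inchi.encode("utf-8")).hexdigest()


def verify(received, checksum):
    """True if the received InChI matches the checksum computed before transmission."""
    return md5_of(received) == checksum


if __name__ == "__main__":
    conn = build_index(INCHIS)
    print(lookup(conn, "InChI=1S/CH4/h1H4"))  # → 2
    checksum = md5_of(INCHIS[0])
    print(verify(INCHIS[0], checksum))  # → True
    # A string corrupted in transit no longer matches its checksum.
    print(verify(INCHIS[0].replace("C2", "C3"), checksum))  # → False
```

Neither task needs a new identifier format: the database indexes the long string directly, and the checksum travels separately from the InChI itself.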
I make that one out of four, and I would argue that the only real problems with InChI are the length of the identifiers and the issues caused by the characters they use. These could be solved by a centralized service that assigns short HTTP URLs to InChIs, ensuring a one-to-one relationship between InChIs and shorthand URLs.
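A centralized service of that kind might look roughly like the following in-memory sketch. The base URL `http://inchi.example/` and the `ShortUrlRegistry` class are hypothetical. Codes are assigned from a counter and encoded in base 62. They are not derived from a hash, so two InChIs can never share a short URL. Two dictionaries keep the mapping one-to-one in both directions.

```python
import string

# Hypothetical base URL for the central service; not a real endpoint.
BASE_URL = "http://inchi.example/"
ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters


def encode_base62(n):
    """Encode a non-negative integer using ALPHABET (0 -> '0')."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


class ShortUrlRegistry:
    """In-memory sketch of a registry keeping a one-to-one InChI <-> short URL mapping."""

    def __init__(self):
        self._by_inchi = {}  # InChI -> short code
        self._by_code = {}   # short code -> InChI
        self._next_id = 0    # counter for the next unused code

    def shorten(self, inchi):
        """Return the existing short URL for this InChI, or assign the next sequential one."""
        code = self._by_inchi.get(inchi)
        if code is None:
            code = encode_base62(self._next_id)
            self._next_id += 1
            self._by_inchi[inchi] = code
            self._by_code[code] = inchi
        return BASE_URL + code

    def resolve(self, url):
        """Map a short URL back to its InChI; raises KeyError for unknown URLs."""
        if not url.startswith(BASE_URL):
            raise KeyError(url)
        return self._by_code[url[len(BASE_URL):]]


if __name__ == "__main__":
    reg = ShortUrlRegistry()
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    print(reg.shorten(ethanol))  # → http://inchi.example/0
    print(reg.shorten(ethanol))  # same InChI, same URL: http://inchi.example/0
    print(reg.resolve(BASE_URL + "0") == ethanol)  # → True
```

The trade-off of this design is that a short URL cannot be computed from the structure offline; it only exists once the central registry has assigned it, in exchange for guaranteed uniqueness.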