Opinions | Iralis

In November 2006, IraLIS also consulted the Web4Lib list (Web4lib@webjunction.org) by sending this message:
Dear colleagues,
For many years I have been searching online databases, and I have always had to fight the problem of inconsistent author names, caused either by the database's policy on name formats or, more frequently, by authors signing each article differently.
I had imagined that "some day" the problem would disappear but, on the contrary, I realize that it is growing, because there are now many more new information resources.
In part, this problem is a matter for the database (or resource) producers, but mainly it is a matter of authors' awareness.
In Spanish- (and Portuguese-) speaking countries, as you might know, people sign using more words than in English. People often use a double first name and a double family name (father's family name + mother's family name). Sometimes even the family names are compound. Example: Pérez Álvarez-Ossorio, José Ramón
When they have to shorten their names, people give more importance to the father's family name, unless that name is very common, in which case the person will probably want to always use both names (in my example, Pérez is as common as Smith in English). This has led to a big problem when trying to standardize names, because you find a lot of variations, as the individuals themselves normally have not yet settled on a fixed form.
I am the director of a Spanish LIS journal indexed since January 2006 in the Social SCI database (Thomson ISI), and I have realized how disappointed some authors are when they see themselves indexed in the SSCI database under their mother's family name (when American people see a three-word name, they take the last word as the database entry).
An author like Imma Subirats Coll, who is the coordinator of the E-LIS repository (http://eprints.rclis.org/), has that official name written on her passport, and she is "Ms Subirats" for everybody who knows her. But if she signs an article with her full name, it will appear indexed in ISI as "Coll, IS".
Of course perhaps this is not important for everybody, but it is for academic people.
Therefore, an important format to take into account is the ISI one (or, in general, what we could call the "English format"): to avoid confusion, people should always sign with two strings, one for the first name and one for the family name. To avoid inconsistency problems, the above Spaniard should choose one of these forms and keep it forever:
Perez-Alvarez-Ossorio, Jose-Ramon
or
P.-Alvarez-Ossorio, Jose-Ramon
or
P.-Alvarez-Ossorio, Jose-R.
I have removed the accents; I am not sure whether we should remove the dots as well.
Well, this could be a "solution" for Spanish-speaking countries, although it will be difficult to convince some people to follow this rule at the first try, because it implies creating "artificial family names".
Please, what do you think about authors' names in other countries? Could you apply some similar rule? In English there are also some "problems" with people adding more words to their names.
Thank you very much for your attention. I would greatly appreciate your comments on this.
Tomas Baiget
http://www.elprofesionaldelainformacion.com
http://directorioexit.info/consulta.php?directorio=exit&campo=ID&texto=46
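
The signing rule proposed in the message above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: `naive_index_entry` is a hypothetical, deliberately simplified indexer that treats the last space-separated word as the family name (the behaviour the message attributes to English-language databases, not any real ISI code), and `signing_form` is a hypothetical helper that removes accents and joins compound names with hyphens.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks, e.g. 'Pérez' -> 'Perez'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def signing_form(family_names: str, first_names: str) -> str:
    """Hypothetical helper: one hyphenated string per name part, no accents."""
    family = "-".join(strip_accents(family_names).split())
    first = "-".join(strip_accents(first_names).split())
    return f"{family}, {first}"


def naive_index_entry(signed_name: str) -> str:
    """Hypothetical indexer: last word is the family name, the rest become initials."""
    words = signed_name.split()
    initials = "".join(word[0].upper() for word in words[:-1])
    return f"{words[-1]}, {initials}"


print(signing_form("Pérez Álvarez-Ossorio", "José Ramón"))
# Perez-Alvarez-Ossorio, Jose-Ramon
print(naive_index_entry("Imma Subirats Coll"))   # Coll, IS
print(naive_index_entry("Imma Subirats-Coll"))   # Subirats-Coll, I
```

With the hyphenated form, even the naive last-word rule keeps both family names together, which is the point of the proposal.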

These were some of the replies received:

1. Thomas Krichel

It is quite useless to want to impose rules on the consistency of name writing. A consistent rule for all languages would be very hard for anyone to remember, and could therefore not be adequately implemented. You may want to have a look at the ACIS (Academic Contributor Information System) project: http://acis.openlib.org This builds software to run portals where authors can register and maintain a name variations profile. The software is implemented for the RePEc digital library at the RePEc author service, see http://authors.repec.org
Thomas Krichel krichel openlib.org http://openlib.org/home/krichel RePEc:per:1965-06-05:thomas_krichel skype id: thomaskrichel
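
The idea of an author-maintained name variations profile can be sketched roughly as follows. This is not ACIS code: the `AuthorProfile` class, the `normalize` and `match_byline` functions, and the identifier `author-001` are all hypothetical names chosen for illustration, and the matching is a simple exact comparison after normalization.

```python
from dataclasses import dataclass, field


def normalize(name: str) -> str:
    """Lowercase, replace periods and commas with spaces, collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


@dataclass
class AuthorProfile:
    author_id: str  # hypothetical identifier format
    variations: set = field(default_factory=set)

    def add_variation(self, name: str) -> None:
        """Register one more way in which this author's name may appear."""
        self.variations.add(normalize(name))


def match_byline(byline: str, profiles: list) -> list:
    """Return the IDs of all profiles that list this byline as a variation."""
    key = normalize(byline)
    return [p.author_id for p in profiles if key in p.variations]


profile = AuthorProfile("author-001")
for form in ("Imma Subirats Coll", "Subirats, Imma", "Coll, IS"):
    profile.add_variation(form)

print(match_byline("coll, is", [profile]))   # ['author-001']
print(match_byline("Smith, JS", [profile]))  # []
```

The design leaves the naming rule to each author: nobody has to remember a universal convention, because the profile records whatever forms actually occur.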

2. David Goodman

You might also want to look at the way Web of Science and Scopus do it. They have each managed to get most of the variations into clumps, and are continuing with the others.
It is good to note that, after they run their algorithms, they are using variations on the RePEc technique of hoping to reach the individual authors. Of course, they could have done this 5, or 25, years ago. We learn faster.

David Goodman, Ph.D., M.L.S. previously: Bibliographer and Research Librarian Princeton University Library dgoodman princeton.edu

3. Matthias Steffens

> For many years I have been searching online databases, and I have always had to fight the problem of inconsistent author names, caused either by the database's policy on name formats or, more frequently, by authors signing each article differently.

Tomas, I agree that this is a major issue, and it is very cumbersome to deal with this sort of problem.

As a scientist and programmer of a bibliographic web application, I must admit that it completely eludes me why the big publishers (and/or libraries) have not yet pushed the development of a unique and universal author ID system, similar to what has been achieved with DOIs.(*)

Personally I think that the lack of unique and universally supported author IDs is one of the biggest obstacles when working with academic bibliographies and literature databases. A truly unique author ID would solve all issues around author naming inconsistencies and would (IMHO) greatly ease the workflow of all involved parties. And it would be a big push towards a semantic web.

I could imagine that every author would be asked (i.e., required) to register for an ID (or enter an existing one) when submitting an article. These author IDs would be passed along with the bibliographic metadata, similar to how it is currently done with DOI numbers. Such a system would not only substantially ease the process of submitting a paper, it would also allow the development of useful software tools. E.g., finding the CV of an author or the list of their recent publications would be a snap.

The OpenID system comes to mind here, but AFAIK it's currently more targeted at authentication issues.

Best regards, Matthias

(*) I've talked to Ed Pentz (CrossRef) some time ago and he said that they were thinking about "author DOIs". I desperately hope that such a system will eventually see the light.

Matthias Steffens mat extracts.de http://www.extracts.de
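
A minimal sketch of what the message above describes, author IDs travelling with the bibliographic metadata next to the DOI, might look like this. Everything in it is assumed for illustration: the record layout, the `author_id` field, the `A-0001` style identifiers and the `10.0000/example.N` DOIs are hypothetical and do not correspond to any existing registry.

```python
from collections import defaultdict

# Hypothetical metadata records: each author entry carries an ID as well as a name.
records = [
    {"doi": "10.0000/example.1",
     "authors": [{"name": "Coll, IS", "author_id": "A-0001"}]},
    {"doi": "10.0000/example.2",
     "authors": [{"name": "Subirats, Imma", "author_id": "A-0001"}]},
    {"doi": "10.0000/example.3",
     "authors": [{"name": "Smith, JS", "author_id": "A-0002"}]},
]


def publications_by_author(records: list) -> dict:
    """Group DOIs by author ID, ignoring how the name was written."""
    index = defaultdict(list)
    for record in records:
        for author in record["authors"]:
            index[author["author_id"]].append(record["doi"])
    return dict(index)


print(publications_by_author(records)["A-0001"])
# ['10.0000/example.1', '10.0000/example.2']
```

Once the ID is present, a publication list is a simple lookup, even though the two records spell the name differently.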

4. Karen Coyle

Matthias Steffens wrote:
> As a scientist and programmer of a bibliographic web application, I must admit that it completely eludes me why the big publishers (and/or libraries) have not yet pushed the development of a unique and universal author ID system, similar to what has been achieved with DOIs.(*)
An author ID would solve part of the problem; that is, it could make it explicit that author 1234567 wrote both article A and article P, at least in a situation where the author ID was attached to both articles. It doesn't solve the problem of moving from a human-readable (and often ambiguous) citation like "JS Smith" to the ability to retrieve articles by that precise author, who could be John Stevens Smith or Jane Smiley Smith, or any of a number of other individuals with that moniker.
Somehow we still have to get people from an individual non-unique name to an author. This is in part an issue of legacy data, which does not have unique IDs (and there's a lot of it). There's also an interesting issue of assigning the IDs: who does it, what guarantees that they get it right, etc.?
Although identifiers might be part of a solution, they rarely are the solution itself.
Karen Coyle / Digital Library Consultant kcoyle@kcoyle.net http://www.kcoyle.net ph.: 510-540-7596; fax: 510-848-3913 mo.: 510-435-8234
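
The ambiguity described in the message above can be shown with a small sketch. The two people and the ID 1234567 come from the message; the second ID, the `citation_form` function and the `candidates` function are hypothetical and only illustrate how an initials-based citation collapses distinct authors into one string.

```python
def citation_form(given_names: str, family_name: str) -> str:
    """Reduce a full name to the 'initials + family name' form used in citations."""
    initials = "".join(part[0].upper() for part in given_names.split())
    return f"{initials} {family_name}"


# Author ID -> (given names, family name). The second ID is made up.
authors = {
    "1234567": ("John Stevens", "Smith"),
    "7654321": ("Jane Smiley", "Smith"),
}


def candidates(citation: str, authors: dict) -> list:
    """Return every author ID whose citation form matches the given string."""
    return sorted(
        author_id
        for author_id, (given, family) in authors.items()
        if citation_form(given, family) == citation
    )


print(candidates("JS Smith", authors))  # ['1234567', '7654321']
```

Going from an ID to a name is unambiguous, but going from a legacy citation to an ID returns several candidates, so something else (affiliation, co-authors, or the authors themselves) still has to decide.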

5. Edward F. Spodick

And do not forget that there may be more than one fully preferred / authorized version of an author's name, depending on what language or geographic region they are writing for. All of these need to be linked. And there are those who feel that a library catalog in one geographic region should not be forced to use the author name form which is preferred in another region, rather than in theirs. Consider an author writing under their Chinese name, in complex / traditional characters vs. simplified characters, versus writing under the romanized form of their name (and the romanization system may have changed, e.g. from Wade-Giles to Pinyin), versus writing under the English form of their name (which may be different from the romanized version).

If this interests you, consider checking out the work of K.T. Lam and Louisa Kwok at the links below. They are colleagues of mine.

"XML and global name access control" from OCLC Systems and Services, available at http://hdl.handle.net/1783.1/443 and "XML name access control metadata repository: an experiment at HKUST library" at http://hdl.handle.net/1783.1/668

and browse through the functional experimental "XML Name Access Control Repository" at HKUST Library at http://library.ust.hk/info/nac/

Edward F. Spodick, Information Technology Manager Hong Kong University of Science & Technology Library lbspodic@ust.hk Tel.:852-2358-6743; fax: 852-2358-1043
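
As a rough sketch of the linking described in the message above, one record can hold several authorized forms of the same name, and each catalog can pick the form it prefers. This is not the HKUST repository's data model: the record layout, the form labels, the identifier `nac-0001` and both functions are assumptions for illustration. The name used is a well-known one with distinct traditional, simplified, Wade-Giles and Pinyin forms.

```python
# One linked record with several authorized forms of the same name.
record = {
    "id": "nac-0001",  # hypothetical identifier
    "forms": {
        "traditional": "毛澤東",
        "simplified": "毛泽东",
        "wade-giles": "Mao, Tse-tung",
        "pinyin": "Mao, Zedong",
    },
}


def display_form(record: dict, preferred: str, fallback: str = "pinyin") -> str:
    """Return the form a given catalog prefers, or the fallback form if absent."""
    forms = record["forms"]
    return forms.get(preferred, forms[fallback])


def resolve(name: str, records: list) -> list:
    """Find the records that list this exact string as one of their forms."""
    return [r["id"] for r in records if name in r["forms"].values()]


print(display_form(record, "simplified"))    # 毛泽东
print(display_form(record, "cyrillic"))      # Mao, Zedong
print(resolve("Mao, Tse-tung", [record]))    # ['nac-0001']
```

No single form is treated as the authorized one here; each region's catalog chooses its display form, while every form still resolves to the same record.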