This is the IMDb contributor's newsletter, published every 6-8 weeks. To unsubscribe, send a message to data-news-unsubscribe@mlists.imdb.com. To subscribe, send a message to data-news-subscribe@mlists.imdb.com. You can also use the signup page at http://www.imdb.com/register/maillists . Back issues are available at http://www.imdb.com/Newsletter/ . Feedback on these articles or suggestions for new topics are welcome; contact dnews@imdb.com. The most interesting questions will be used in the next issue.

Issue #5

In this issue
-------------

- Forming titles
- Character names
- Locations
- Writing credits
- Cinematographer vs. director of photography
- Processing cycle update
- EPISODECORRECT-GUEST
- Revised name filmographies
- UNIX tool releases: 3.20, 3.21

Forming titles
--------------

One continuing source of confusion for people contributing new titles is how to format them. We have several precise rules (and some that are admittedly a bit less precise).

The basic rules for forming a title: TV series and mini-series should be enclosed in quotation marks; anything other than a TV series or theatrical movie needs a description added to the end: (TV) for a made-for-TV movie; (V) for direct-to-video; (VG) for a video game; (mini) for a mini-series. Please note that quotation marks can only be used for TV series and mini-series; there's no such thing as a "video series", for example. Other factors such as whether the title is a documentary, short film, etc. should not be included in the title (unless they appear in the title on screen).

That brings us to the next rule: The primary title is the title as it appears on screen, and if it differs between the beginning and end of the film, then it's the title at the beginning of the film. Oddities such as substituting numbers for letters (e.g., Se7en) and intentional misspellings should be preserved.
In English, a subtitle is set off from the main title with a colon (e.g., Lord of the Rings: The Fellowship of the Ring, The); as that example shows, a leading article moves to the very end of the title. The exception to the article rule: the article stays in place if the title is in a different language from the movie, as with Les Girls, El Cid, or La Bamba. Also, the French articles un/une/des and the Portuguese articles um/uma do not move. (In German, a subtitle is separated with a hyphen.)

One exception to the "title as it appears on screen" rule: Author or filmmaker possessives such as Bram Stoker's Dracula or Disney's The Kid or Andy Warhol's Flesh are used only in alternate titles with the attribute (complete title). This doesn't apply to working titles, like Woody Allen Fall Project 2000, since that's not a possessive in the same sense.

The primary title is the original title of a movie in its original language. If the movie is a coproduction that uses several languages, pick the dominant language with regard to dialogue, director, cast and principal crew. If no language dominates, pick any one. The title should be that used at the first public screening; a film can have a different title at film festivals from when it goes into general release, or be retitled on rereleases. It's also common for titles to be changed for television and video. Again, these should be treated as alternate titles with the appropriate attributes. Other commonly used titles, such as those on posters or in reference books, should also be sent as alternate titles.

Accents should appear as they are on screen, except that accents omitted over upper-case letters should be restored. Please note that accents should be limited to those in the ISO-8859-1 character set; this means that some accents from languages such as Turkish will have to be omitted. Languages using non-Roman alphabets should be transliterated; for Japanese, use Hepburn romanization (Hyôjun-shiki).
If the title is transliterated in the original release (films from India and Hong Kong often include English subtitles in the original), use the on-screen transliteration.

The year used should be the year of first public exhibition of the final version, whether that was at a film festival, general release, television showing, or whatever. If that year is not known, approximations from other sources can be used, such as copyright date. If no good approximation is available, use ???? for the year.

If two entries have the same title and year, they should be distinguished with Roman numerals (e.g., Hamlet (2000/I)). For this purpose, articles, punctuation, and title type are ignored; thus, Magicians, The (2000/I) (TV) and Magicians (2000/II). There are cases where the /I might be omitted, but it's best to leave those decisions to us.

A TV special should be classified as a TV movie, but with the keyword tv-special. That keyword, as well as the Documentary and Short genres, causes special treatment of the title in some filmography listings. For our purposes, a mini-series runs at least 240 minutes excluding commercials; anything shorter is a TV movie, regardless of the number of parts it is divided into (though in some cases it might be a series, depending on the circumstances). Direct-to-video and TV movie are determined by the intent; for example, Theodore Rex (1995) went straight to video, but it was intended for and budgeted as a theatrical release.

The rules for capitalization depend on the language of a title, and not what appears on screen, since the on-screen capitalization is often chosen for design reasons. In English, "book" rules apply: All words are capitalized except for articles and most prepositions and conjunctions of four or fewer letters; the first word, and the first word after a colon, period, exclamation point, or question mark are always capitalized (and the second word, if the first is an article).
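
The English "book" rule just described can be sketched roughly as follows. This is only an illustration, not the code IMDb uses: the function name and the list of "small" words are assumptions (the rule says "most" prepositions and conjunctions without listing them), the whole title is lower-cased first, so deliberate oddities and acronyms are not preserved, and the "second word" rule is assumed to apply after punctuation as well as at the start.

```python
import string

# Assumption: the newsletter does not list the "small" words, so this set of
# articles and short prepositions/conjunctions is illustrative only.
ARTICLES = {"a", "an", "the"}
SMALL_WORDS = ARTICLES | {
    "and", "but", "or", "nor", "for", "of", "in", "on", "at", "to",
    "by", "from", "with", "as", "into", "over",
}
FORCING_PUNCTUATION = (":", ".", "!", "?")


def capitalize_english_title(title):
    """Apply the "book" capitalization rule to an English title."""
    words = title.lower().split()
    result = []
    force = True  # the first word is always capitalized
    for index, word in enumerate(words):
        bare = word.strip(string.punctuation)
        # An article moved to the end ("..., The") keeps its capital.
        moved_article = (
            index > 0
            and index == len(words) - 1
            and words[index - 1].endswith(",")
            and bare in ARTICLES
        )
        if force or moved_article or bare not in SMALL_WORDS:
            result.append(word[:1].upper() + word[1:])
        else:
            result.append(word)
        # Capitalize the next word after forcing punctuation, or when the
        # word just forced was an article (the "second word" rule).
        force = word.endswith(FORCING_PUNCTUATION) or (force and bare in ARTICLES)
    return " ".join(result)


print(capitalize_english_title("lord of the rings: the fellowship of the ring, the"))
```

Special cases still need a human decision; the sketch only covers the mechanical part of the rule.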
Exceptions are made, rarely, when needed for clarity -- for example, BUtterfield 8, where the first two letters represent a telephone exchange. Portuguese, Hebrew, and Indian languages use the same rules. In most other languages, only the first word (first two, if the first is an article), proper names, and the first word after certain punctuation as above are capitalized. German uses the usual German mixed rules.

Admittedly, this seems like a lot of rules, but most of them are common sense; in reviewing over 350,000 titles, we've had to deal with a large number of unusual cases.

Character Names
---------------

We try to list character names as they appear in on-screen credits (i.e., the end titles cast listing). We make occasional exceptions when character names are not listed onscreen or when the character descriptions in the end titles include spoilers, but as a rule we try to stick to credits as closely as possible. If you don't know what the onscreen character name is or one isn't listed, here are some guidelines to help with character name contributions:

1. Keep it simple

Please omit redundant information/irrelevant details: Ralph Fiennes' character in Red Dragon (2002) is called Francis Dolarhyde, and that's how he's listed in the credits. It's simply overkill to have him listed as "Francis Dolarhyde/The Tooth Fairy/The Red Dragon" even though those are factually correct descriptions. Names are usually enough and character names shouldn't be descriptive, unless absolutely necessary to identify the actor (e.g., if a role doesn't have a name, someone may be identified as 'Man in van' or 'Woman with umbrella').
Avoid extra embellishments/repetitions/nicknames unless they are part of the credited character name: it's enough to list Robert Patrick as John Doggett in the "X-Files" TV series, instead of "Special Agent Jonathan Jay 'John' Doggett"; Jeri Ryan played Seven of Nine on "Star Trek: Voyager", not "Seven of Nine, Tertiary Adjunct of Unimatrix 01, aka Annika Hansen"; Ed Norton played Will Graham in Red Dragon (2002), not "William Graham" or "Special Agent Graham" or "FBI Special Agent William 'Will' Graham"; Matt LeBlanc plays Joey Tribbiani in "Friends", not "Joseph 'Joey' Francis Tribbiani". You get the idea. Whether that extra info is accurate or not doesn't matter. Robert Englund's character in the Nightmare on Elm Street films is known as Freddy Krueger, not Frederick Krueger or Frederick 'Freddy' Krueger, even though Freddy is probably the diminutive form of Frederick.

2. Character descriptions must be limited to the context of the film.

Anthony Hopkins plays Hannibal Lecter in The Silence of the Lambs (1991); Brian Cox plays Hannibal Lecktor in Manhunter (1986). Yes, they're the same character but the spelling is different and we will stick to each film's peculiar version. Sigourney Weaver's character in Alien (1979) is called simply Ripley. The fact that her first name is Ellen is not disclosed/introduced until the sequel Aliens (1986): therefore her character name in Alien (1979) is Ripley, not Ellen Ripley.

Including extra information that comes from sources other than the film is especially wrong: Nichelle Nichols plays "Uhura" in the TV series "Star Trek" and in the films. Even though, according to some Star Trek books and novelizations, her first name is Nyota, that name is not used in the films or TV series to the best of our knowledge.
Even if the various Star Wars books and novelizations may include name, rank and serial numbers for every single Imperial Stormtrooper ever shown in the films, we'll still list them all simply as 'Stormtroopers' unless the onscreen credits have a different description.

3. No Spoilers

Ian Hart plays Professor Quirrell in Harry Potter and the Sorcerer's Stone (2001). You're not supposed to know he's also Voldemort. Ian Holm plays Sir William Gull in From Hell (2001). His character name is not "Jack the Ripper". Those are both supposed to be surprises. If you haven't seen those two films, we just spoiled them for you. Sorry about that, but imagine how our users feel when they come to the site and see those character names before seeing the film. Even if factually correct, character names that constitute spoilers must be avoided at all costs. This is especially true for multiple character names that can be easily omitted: it's perfectly adequate to say that Cary Grant plays "Peter Joshua" in Charade (1963). There is no need to say that he plays "Peter Joshua/Alexander Dyle/Adam Canfield/Brian Cruikshank", even if that's true.

4. Language

David Prowse plays Darth Vader in Star Wars (1977). Clint Eastwood plays Harry Callahan in Dirty Harry (1971). Even though the Italian releases of those films changed the names to Darth Fener and Harry Callaghan, we will stick to the character names used in the original version.

5. For TV series, use years when needed

Cast changes are the rule on long running TV series. Unless an actor has been part of the cast for the entire run of a series, we try to include the time frame of his/her appearances on the series. For example, see the following character descriptions for "ER" (1994):

    Noah Wyle ... Dr. John Carter
    George Clooney ... Dr. Doug Ross (1994-1999)
    Paul McCrane ... Dr. Robert Romano (1997-)

Noah Wyle has been a cast member on "ER" (1994) since the first episode and still appears in new episodes.
His character name therefore doesn't need a year attribute. George Clooney was one of the original cast members on "ER" (1994) but left the series in 1999. Paul McCrane joined "ER" in 1997 and is still a cast member to this day. Note that these are part of the character name, and not separate attributes.

Locations
---------

When sending location information, bear in mind that it becomes part of a location tree (http://www.imdb.com/LocationTree), so it should make sense within that structure. In particular, the locations within a country should be treated consistently; for example, within the United States, the location must include a state, and (when possible) a city, but not the county/parish unless no more detailed location is available. Remember that each level of a location description is separated by a comma; multiple locations within, say, the same state should appear as separate location entries.

In general, smaller countries (both by area and by number of films) should omit any political subdivision between the city and country. Major cities and locations should be given their English names (thus, Rome, Lazio, Italy, not Roma, Lazio, Italia); this also applies to major international airports, etc. Smaller towns and landmarks should use the local names. There is a long-range plan to allow proper use of local names everywhere with automatic translation, but this is still some time in the future.

Los Angeles deserves a few words of its own, both because of the number of locations in the area and the complexity. The city of Los Angeles has several named neighborhoods that are actually part of the city; some of the better known ones include Hollywood, Venice, Van Nuys, and Encino. These are treated as divisions of the city (e.g., Hollywood, Los Angeles, California, USA). Named buildings are noted with their street address at the same level -- for example, Bradbury Building - 304 S. Broadway, Downtown, Los Angeles, California, USA.
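
To see why consistent, comma-separated levels matter, here is a minimal sketch of how location strings like those above could nest into a tree. It is an illustration only: the function names and the nested-dictionary layout are assumptions, not a description of how the IMDb location tree is actually built.

```python
def split_location(location):
    """Split a location string into its levels, most specific first."""
    return [part.strip() for part in location.split(",") if part.strip()]


def build_tree(locations):
    """Nest locations by level, starting from the country (hypothetical layout)."""
    tree = {}
    for location in locations:
        node = tree
        # Walk from the country (last level) down to the most specific level.
        for level in reversed(split_location(location)):
            node = node.setdefault(level, {})
    return tree


entries = [
    "Hollywood, Los Angeles, California, USA",
    "Bradbury Building - 304 S. Broadway, Downtown, Los Angeles, California, USA",
    "Rome, Lazio, Italy",
]
tree = build_tree(entries)
print(sorted(tree["USA"]["California"]["Los Angeles"]))
```

An entry sent as "Roma, Lazio, Italia" would create a second, parallel branch under a different country name, which is the kind of inconsistency the rules above are meant to prevent.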
The new LOCATIONCORRECT keyword can be very convenient when cleaning up the locations in a given portion of the location tree. The usage:

    LOCATIONCORRECT
    wrong-location|correct-location|
    END

There is no web form support for this keyword; it must be sent directly to the email interface (adds@imdb.com). Changing a location will change all subordinate locations; for example, LOCATIONCORRECT Rome, Italy|Rome, Lazio, Italy| will also change Tivoli, Rome, Italy. Several countries have already been cleaned up; before starting on a major cleanup project for a country, it's best to check in on the Contributors Help message board to see if there are others working on that country and to form a consensus on the proper subdivisions to use.

Writing credits
---------------

In the past, writing credits with no attributes were assumed to be "(screenplay)". This is no longer true; all writing attributes, including (screenplay), (teleplay), and (written by), should be included with writing credits. In addition, the "(also story)" form should no longer be used; instead, send separate (story) and (screenplay) credits. For example, where you might have sent:

    Doe, John|Title (2002)|(also novel)

you should send

    Doe, John|Title (2002)|(novel)
    Doe, John|Title (2002)|(screenplay)

If you are comfortable with sequence numbers, include them, but even without them, credits should be split as shown here.

It's probably worth noting here that "(written by)" has a specific meaning, at least for titles covered by the Writers Guild of America (WGA). It means that the same writer(s) did essentially all the writing -- story and screenplay/teleplay -- and there is no adapted source material (novel, short story, article, etc.).

Cinematographer vs. director of photography
-------------------------------------------

In the past, the terms "cinematographer" and "director of photography" were used interchangeably.
While we still believe they are virtually identical, we are now permitting "(director of photography)" as an attribute in the cinematographer list if that is how the on-screen credit reads. Cinematographers should still be sent with no attributes.

Processing cycle update
-----------------------

Since the last newsletter, we have continued reducing our cycle times. This has been most visible on the guest appearance list, where data is now processed every other day. Many other lists are still being processed on a weekly cycle, but with a cycle that isn't necessarily tied to the Thursday-to-Wednesday cycle for names and titles. We have determined that the processing of alternate names is best handled on a monthly cycle.

New title approval has taken great strides recently. We have made it possible for several staff members to help with title approval; that, combined with new tools, has greatly reduced our backlog. In addition, many of the people who contributed new titles still in backlog have received mail messages informing them of what additional information will speed approval of their titles. Various groups of titles have been identified for speedier approval; some of these groups, such as titles from the USA or UK with valid release dates, no longer have backlogs.

EPISODECORRECT-GUEST
--------------------

One of the improvements we made to processing of guest appearances is a new keyword, EPISODECORRECT-GUEST, that makes it much easier to clean up the episode lists for a given title. In conjunction with some of our contributors, we have already cleaned up the data for a number of popular series. Where the data for a series was fairly complete (and the series is no longer in production), we have removed data that lacked episode information. This should serve as incentive to re-contribute it with complete information.

Revised name filmographies
--------------------------

We've recently improved the name filmography pages.
Most notably, appearances as "Himself" or "Herself" have been moved into a separate category. The "self filmography" will eventually be a separate category; in the meantime, it includes appearances in Documentaries and tv-specials (as determined by genre and keyword entries, respectively), appearances marked (archive footage) with no character name, and appearances in any type of project as "Himself" or "Herself." We recognize this is imperfect (for example, some documentaries use re-enactment actors who are not playing themselves), which is why it's an interim approach; we feel the benefits are significant enough, particularly for well-known people with large "self" filmographies, to make it worthwhile.

Titles that are still in production are also being flagged. This area should be expanding in the future, as we are now including more in-production data from our partners at the Hollywood Reporter. (Subscribers to IMDbPro will note expanded company contact information for such titles.)

UNIX tool releases: 3.20, 3.21
------------------------------

The moviedb package (a local UNIX version of the database) has again been updated to correct various capacity problems. Version 3.20 was released in late January; version 3.21 was released in late March, and is essential if you are using the current data files. It can be found at the usual FTP sites; see http://www.imdb.com/interfaces for details. Installation remains the same as for earlier releases.

Note that if you are using the X Windows interface, xregal, it cannot be compiled with current releases of X. While the changes in this release do not require recompilation of xregal, some of the capacity problems will continue to occur if you do not. If you have a working binary of xregal, you can keep using it, but you will probably see an increasing number of crashes, particularly for name filmographies with long episode lists.
Alas, the author of xregal has chosen to stop supporting it, so a newer version is not available.

To rebuild: Extract the tar file into a directory named database. Assuming you already have a copy of the database files, from ./database/ :

    make compile
    make installbin
    cd imoviedb; make; make install
    # If you are able to build xregal:
    # cd ../xregal; make; make install
    cd ..
    make cleandbs
    make update-local
    ./etc/cgencompl -all    # optional

If it's not working for you, check the following things first:

 . Do you have enough disk space?
 . Are the source files for moviedb up to date?
 . Are all the binaries in database/bin/ and database/etc/ up to date?
 . Did you do *all* relevant steps above in the order listed?

For further support, contact unix@imdb.com.

Questions
---------

Q: Can people in still photos be listed in cast credits?
A: If, and only if, the person in a still photo is listed in the credits, they can be added to our credits list. If they are not, they cannot be. If an uncredited photo is notable, then it should be listed in trivia.

Q: Do running times include commercials?
A: Ideally, no; however, particularly for older programs, this may be the only data available. In this case, please add the attribute (including commercials).

---------------------------------------------------------------------------
IMDb - Data Contributor's Newsletter - Issue 5 - THE END