# good reference: http://www.robotstxt.org/orig.html

# not used by internal search
User-agent: ocelli
Disallow: /
# Ocelli is an engineering index crawler - causes lots of errors, page crashes, etc. 2/3/10

User-agent: *
# updated 7/20/05 by Vern
# updated 7/14/06 by Janet
# http://www.robotstxt.org/wc/norobots.html URL of Robots Info
# our standard = only put in folders we don't want Google to index (e.g. images)
# folders like nebraskaccess that we don't want users outside Nebraska finding
# folders like nebraskaacess, database, redirects that are just redirects
# folders we don't want to advertise, but if the bad guys get in there it's fine
# Mostly folders that would confuse outside users
# We don't need to put in secure folders
# blog: disallow category folders; all content, including comments, is found in 2006, 2007, etc. folders
Disallow: /blogs/nlc/books_reading/
# categories is ok, contains list for rss discovery
Disallow: /blogs/nlc/education_training/
Disallow: /blogs/nlc/general/
Disallow: /blogs/nlc/information_resources/
Disallow: /blogs/nlc/library_management/
Disallow: /blogs/nlc/nebraskaccess/
Disallow: /blogs/nlc/now_hiring_your_library/
Disallow: /blogs/nlc/public_relations/
Disallow: /blogs/nlc/talking_book_braille_service_t/
Disallow: /blogs/nlc/technology/
Disallow: /blogs/nlc/uploads/
Disallow: /blogs/nlc/youth_services/
Disallow: /centennial/2000s/20002009/LibrariesStaffList.html
Disallow: /comp/magnet/
Disallow: /Images/
# magnet was built to attract the bad guys, yes log this
Disallow: /docs/pilot/scans/
Disallow: /docs/pilot/pubs/eofiles/ # these PDFs cannot be OCRed correctly per Bonnie 1.30.06
#Disallow: /docs/shippinglists/ # per Bonnie, no one needs to search these lists 1.20.06
# changed from /docs/shippingLists/microfiche/ 1/4/10 because URLs in old shipping lists are incorrect
Disallow: /libdev/liblaws/lawarchives/ # our indexing doesn't index this either
# can only get to it via a link from current liblaws page
Disallow: /lists/commands.asp
Disallow: /nebraskaaccess/ # searchable by our search engine but not from outside
# (because only Nebraskans can use the databases)
# Allana does not want the following indexed by Google
Disallow: /nebraskaccess/
Disallow: /nebraskaccess/help/home/tutorials/
Disallow: /nebraskaccess/help/images/
Disallow: /nebraskaccess/help/k12/tutorials/
Disallow: /nebraskaccess/help/library/tutorials/
Disallow: /nebraskaccess/help/tutorials/
Disallow: /nebraskaccess/images/
Disallow: /nebraskaccess/journallists/images/
Disallow: /nebraskaccess/nacomments
Disallow: /nebraskamemories/md/ # files that are used by the separate member db users
# not searchable by our search engine
Disallow: /netserv/class/
Disallow: /netserv/learnatest/
Disallow: /netserv/netlibrary/
Disallow: /netserv/nsedbinfo/ # searchable by our search engine but not from outside
# (because only Nebraskans can use the databases)
Disallow: /netserv/overdrivemp3titles20080819.xls
Disallow: /netserv/OverDriveTitles20080206.xls
Disallow: /netserv/overdrivetitles20080618.xls
Disallow: /netserv/training/onlinesessions/archives/ # javascripts, etc.
Disallow: /news/content/ # not searchable by our search engine either
Disallow: /publications/lal/lalletters/ # not searchable by our search engine either
Disallow: /ref/classes/ # not searchable by our search engine either
Disallow: /ref/ill/ # not searchable by our search engine either
Disallow: /ref/star/
Disallow: /reserve/ # for short term pages, links from announcements or calendar entries, and our bookclubs
Disallow: /scripts/ # not searchable by our search engine either
Disallow: /searches/ # not searchable by our search engine either
Disallow: /statistics/downloads/ # not searchable by our search engine either
Disallow: /libraries/collect2/
Disallow: /system/map/ # NLC search engine only does shallow here - no sub directories
Disallow: /Staff_Archives_Email/ # this is a virtual drive from h:\
Disallow: /Staff_Archives_Documents/ # this is a virtual drive from h:\
# make certain some sub folders are browseable
# inhouse/makingadifference, inhouse/notepads
# added 3/30/08 to prevent Google from continuing to crawl (probably can remove after a few weeks)
# changed from /wikis/futuresearch 8/23/08 because itsapic was swamping the FS wiki (a loop apparently)
Disallow: /wikis/

User-agent: Browsershots
Disallow: