SOME EXPLANATION

News was originally transmitted in "batches," along with e-mail, hopping from box to box using the UUCP protocol. Most of the boxes were UNIX machines of various sorts, running at universities or larger companies, and the UUCP transfers took place late at night when the calls were cheapest. In those days, "Net Access" meant "access to Usenet news and to e-mail," and propagation cross-country and back could take 3 to 5 days (or more). UUCP is still used as a "transport" mechanism for news (and for e-mail), but most of the news and e-mail traffic has long since migrated to that global TCP/IP network called "the Internet." You may have heard of it...
DEFINITIONS"News Transfer" is the process of moving the actual news articles around (articles that have already been "injected" into the Usenet news network). This is now usually done via the NNTP (Network News Transfer Protocol), which runs on top of TCP/IP. "News Reading" is the process of querying a machine's stored database of news articles and groups - and also of "posting" news. News posting refers to the original place that a given article is injected into the news system. When an article is posted (not transferred), it is given a globally-unique message ID which identifies the article as it passes from system to system.
HOW NEWS IS PASSED AROUND

Since the beginning of Usenet, the idea was to avoid having one or more key central sites without which the system would fall apart. So the system was designed for minimal intelligence and maximal redundancy. In general, every news server peers with at least one other news server, and automatically offers any article it receives to all of its news peers (except the peer it heard the article from). So if you have one news peer, you'll offer back only articles that originated locally - but if you have two news peers, you'll offer each peer your own local articles plus the articles learned from the other peer. And if you have 10 news peers and one of them is much faster than the others, you'll end up offering hundreds of thousands of articles a day to each of the other nine peers. Actually, I'm lying a bit here. You do have the ability to restrict which articles you send to which peers - you don't have to offer everyone a "full feed."
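To make the flooding behavior concrete, here is a minimal sketch of that rule - offer each article to every peer except the one that sent it, and use the history of already-seen message IDs to stop duplicates. The names here (Peer, offer(), NewsServer) are invented for illustration; real servers such as INN keep the history in an on-disk database, not a Python set.

    # Sketch of Usenet-style flood propagation with an in-memory history set.

    class Peer:
        def __init__(self, name):
            self.name = name

        def offer(self, message_id, article):
            # In real life this is an IHAVE (or streaming) exchange over NNTP;
            # here we just pretend the peer accepts everything.
            print(f"offering {message_id} to {self.name}")


    class NewsServer:
        def __init__(self, peers):
            self.peers = peers        # list of Peer objects
            self.history = set()      # message IDs we have already seen

        def receive(self, message_id, article, from_peer=None):
            if message_id in self.history:
                return                # duplicate: drop it, don't re-flood
            self.history.add(message_id)
            for peer in self.peers:
                if peer is not from_peer:     # never offer it back to the sender
                    peer.offer(message_id, article)


    # A locally posted article (from_peer=None) goes to every peer; an article
    # learned from peer A goes to everyone except A.
    server = NewsServer([Peer("peerA"), Peer("peerB"), Peer("peerC")])
    server.receive("<1998abc@example.host>", "article body...")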
NEWS PEERING - WHO'S THE CUSTOMER?

With the Usenet system, it's hard to tell who's the downstream or customer end of a news peering session (vs. who's the upstream or provider end). Everyone peers with everyone else, and may the fastest box win.
NEWS STORAGE

The actual news articles are stored in the UNIX file system, and each newsgroup has a directory. So alt.binaries is the /news/alt/binaries directory, and article 5 in alt.binaries is found at /news/alt/binaries/5. alt.binaries.really.really.sticky is found at /news/alt/binaries/really/really/sticky, and so on. Each system has a history database which keeps track of the news articles that have already been seen by the news system. These articles may currently exist on-disk; they may be older articles that at one point were on-disk but have since expired; or they may be articles that were not in a newsgroup carried by the system. Even if an article is not to be stored and kept around on a given news server, its message ID should be noted in the history database, so you don't waste the bandwidth and CPU time to retrieve the article again and then have to make the same "not interested" determination. A point about news: yes, it's true. Unless you tell your news peers "don't send me alt.some.new.group.that.someone.created," you must first receive an article that has that group listed before you can decide to toss it. This means that you can waste tons and tons of bandwidth and CPU time receiving binaries articles that you're never going to store.
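As a rough illustration of that directory mapping (assuming a spool rooted at /news, as in the examples above), converting a group name plus article number into a spool path is just string manipulation:

    import os

    SPOOL_ROOT = "/news"   # spool root used in the examples above

    def article_path(group, artnum):
        """Map a newsgroup name and article number to its spool file."""
        # alt.binaries.really.really.sticky article 5
        #   -> /news/alt/binaries/really/really/sticky/5
        return os.path.join(SPOOL_ROOT, group.replace(".", "/"), str(artnum))

    print(article_path("alt.binaries", 5))   # /news/alt/binaries/5
    print(article_path("alt.binaries.really.really.sticky", 5))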
NEWS SYSTEM ARCHITECTURE

There's a single process, the inn "daemon" - called innd - that lives off in the background and handles all news-connection requests and all of the news-feeding tasks. Any host wanting to talk to the innd and transfer news has to be listed in the hosts.nntp file. Any other host is handed off by innd to an nnrpd (Network News Reading Protocol Daemon, which speaks a reading-oriented subset of NNTP). If that host is listed in the nnrp.access file, the nnrpd will talk to it and let it read news - otherwise it'll deny it access. Each nnrpd handles only one news reader at a time, while the innd process handles many (potentially hundreds of) simultaneous news-transfer sessions.
THE MOST IMPORTANT FILE

The most important file in the news system is the "active file." This is a list of every newsgroup the system will carry; the minimum and maximum article numbers currently on-disk in each newsgroup; and whether or not the group is moderated. The active file is maintained by the innd process. You use the "ctlinnd" program to tell the innd process to add or delete groups. As news is posted and transferred in, the innd process updates its in-memory idea of the maximum article number for each group. innd writes the active file out to disk every N minutes (N is a tunable parameter). The nnrpds (nowadays) all share a read-only copy of the active file - which is good, since it's usually at least a few hundred kilobytes and often a megabyte or two. The size of the active file is one of the major reasons not to throw in thousands of unused extra newsgroups (i.e., "We have all 45,000 newsgroups out there!"). Before the "shared-active patch" (which is now not a patch but is built into the inn distribution), each nnrpd loaded and refreshed its own copy of the active file, which was a huge waste of memory!
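For reference, each line of the active file has the form "groupname himark lomark flag," where the flag is y (posting allowed), n (no local posting), m (moderated), and so on. Here's a minimal parser; the sample group lines are made up for illustration:

    # Parse active-file lines of the form: "groupname himark lomark flag"
    SAMPLE_ACTIVE = """\
    comp.lang.c 0000051234 0000049876 y
    news.announce.newgroups 0000001823 0000001799 m
    alt.binaries.pictures.misc 0000412345 0000398712 y
    """

    def parse_active(text):
        groups = {}
        for line in text.splitlines():
            name, hi, lo, flag = line.split()[:4]
            groups[name] = {
                "max": int(hi),          # highest article number on-disk
                "min": int(lo),          # lowest article number on-disk
                "moderated": flag == "m",
            }
        return groups

    for name, info in parse_active(SAMPLE_ACTIVE).items():
        print(name, info)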
NEWS.DAILY: THE OVERNIGHT "THING"

Currently, inn requires an overnight cleanup session to purge old news from the news store, and to process logs and clean up some of the databases. The script news.daily, usually run by the cron daemon, takes care of this. For most of the time that news.daily runs, the news system is still available to handle new article posts, but you should expect 15 to 45 minutes of news-server unavailability overnight (unless you modify inn) as news.daily finishes. Basically, news.daily's job is to run expire and then re-update the databases. Expire looks at the time stamps of the entries in the history database, figures out which articles (out of the hundreds of thousands or millions you'll probably have on-disk) need to be deleted, and then goes through the process of removing them. Once that's done, the overview indexes in each directory are rebuilt, and then the server pauses to renumber. This is the period where you'll have to modify the inn system if you want to be able to accept posts 24x7. The renumbering process involves looking at each news directory (potentially tens of thousands of them) and updating the active file's notion of the minimum and maximum article numbers for each group. As mentioned, logs are processed, rotated, and compressed, and the summary report(s) are mailed to the news administrator(s) of the system.
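Conceptually, expire is just a sweep over the history entries. The sketch below assumes a much-simplified history format (message ID, arrival time, spool path) purely for illustration - the real INN history file has more fields and a companion dbz index - but it shows the basic "delete what's too old, keep the rest" pass:

    # Simplified sketch of an expire pass -- not the real INN implementation.
    # Assumed toy history format, one entry per line:
    #     <message-id> <arrival-unix-time> <spool-path>
    import os
    import time

    KEEP_SECONDS = 7 * 24 * 3600    # e.g. keep articles for one week

    def expire(history_in, history_out, now=None):
        now = now or time.time()
        kept = removed = 0
        with open(history_in) as src, open(history_out, "w") as dst:
            for line in src:
                msgid, arrival, path = line.split()
                if now - float(arrival) > KEEP_SECONDS:
                    try:
                        os.unlink(path)          # old article: remove the spool file
                    except FileNotFoundError:
                        pass
                    removed += 1
                    # (Real expire remembers the message ID a while longer so a
                    # just-expired article isn't accepted all over again.)
                else:
                    dst.write(line)              # still fresh: keep the history entry
                    kept += 1
        return kept, removed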
NEWS FEEDING: NON-STREAMING
The original NNTP protocol had each peer say to a news peer: "IHAVE <message-id>." The receiving peer answers 335 ("send me the article") or 435 ("I don't want it"), and if the article is wanted, the sender transmits it and waits for a 235 acknowledgment.
But there's a problem with using that protocol when your latency (the round-trip time to send data from site A to site B and back to site A) isn't very low - at least, not with today's news loads.
Suppose you're trying to send six articles per second. Let's do the math. If you assume that transferring each article takes only as much time as the inter-machine latency (not a good assumption, but an excellent simplification), we have: 1 second / 12 = 83ms. Twelve is the number of round-trip communications per second, since each article costs an IHAVE/335 round trip plus an article-transfer/235 round trip, and 6 articles x 2 round trips = 12.
Of course, it usually takes longer than simply the round-trip latency to transfer an article - especially if
the article is a few hundred kilobytes in size.
Anyway, it's apparent that if the latency goes above 83ms between the two ends of a news peering session,
a full feed isn't possible.
The situation is even worse over saturated 56K and T-1 links, and over satellite and trans-oceanic links, where round-trip times of 500ms and up are common.
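To put numbers on that, here's a quick back-of-the-envelope calculation - using the same simplification as above, that each article costs exactly two round trips and nothing else - of the best-case article rate for the non-streaming IHAVE exchange at various round-trip times:

    # Best-case non-streaming feed rate, assuming each article costs exactly
    # two round trips (offer/response, then article/acknowledgment).
    ROUND_TRIPS_PER_ARTICLE = 2

    def max_articles_per_second(rtt_seconds):
        return 1.0 / (ROUND_TRIPS_PER_ARTICLE * rtt_seconds)

    for rtt_ms in (10, 83, 250, 500):
        rate = max_articles_per_second(rtt_ms / 1000.0)
        print(f"RTT {rtt_ms:3d} ms -> at most {rate:5.1f} articles/sec "
              f"(~{rate * 86400:,.0f} articles/day)")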
We'll skip the implementation details for now, but the "streaming extensions to NNTP" are commonly used. Basically, a message is sent saying "Here are 10 message IDs. Which ones do you want?" The responder gives back the list of message IDs it wants, and the sender sends them all. Though roughly the same amount of data is sent, there are far fewer "latency" delays.
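Since the article skips the details, treat this as a rough sketch only: the streaming extension uses the CHECK and TAKETHIS commands. A batch of CHECK <message-id> offers goes out without waiting for replies, the peer answers 238 ("send it") or 438 ("already have it"), and the wanted articles follow via TAKETHIS. The send_line(), read_line(), and get_article() helpers below are assumed placeholders for the real connection handling:

    # Toy illustration of the streaming offer/response flow; dot-stuffing,
    # error handling, and the real socket code are all omitted.

    def stream_batch(message_ids, get_article, send_line, read_line):
        # Fire off all the offers first -- no waiting in between.
        for msgid in message_ids:
            send_line(f"CHECK {msgid}")

        # Read the responses: 238 means "send it", 438 means "already have it".
        wanted = []
        for _ in message_ids:
            code, msgid = read_line().split(None, 1)
            if code == "238":
                wanted.append(msgid.strip())

        # Push the wanted articles, again without per-article round trips.
        for msgid in wanted:
            send_line(f"TAKETHIS {msgid}")
            send_line(get_article(msgid))   # the article text
            send_line(".")                  # end-of-article marker
        # The 239/439 acknowledgments for TAKETHIS are read afterwards.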
Well, what's so hard about designing a news server? Disks. You need disks. Lots of disks. Yes, you also need a fairly powerful machine: something like a Sun Sparc 10 with a 60 to 80 MHz CPU; or a P120 or greater running some sort of BSD or Linux; or an Alpha with a bit of cache RAM (the Multias won't do); and so on. For most architectures, 128 MB of RAM will be enough to support 5 to 30 simultaneous news readers, but more memory never hurts and memory is cheap.
But about disks, also called "spindles": the problem is that each article that comes in causes a write to the news spool (the article file itself), to the history database, to the .overview data for each group the article appears in, and to the news logs.
And a full news feed of 600,000 articles a day means that you have to keep up with roughly 7 articles per second on average - and with peaks of maybe 20 to 30 articles per second. Even if you take only 200,000 articles per day, you still have to write a couple of articles per second, and do the bookkeeping associated with that, every second. Then, overnight, you have to search for a full day's load of articles and expire it!
Additionally, you have to support the nnrpds, which want to retrieve articles and .overview files (most of
the I/O done by nnrpds is now .overview lookups).
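A rough way to size that write load (ignoring reader traffic) is articles per day divided by 86,400 seconds, times the number of separate places each article touches on disk:

    # Back-of-the-envelope write load, assuming each incoming article touches
    # roughly four places on disk (spool file, history, overview, logs).
    WRITES_PER_ARTICLE = 4

    def write_load(articles_per_day):
        per_sec = articles_per_day / 86400
        return per_sec, per_sec * WRITES_PER_ARTICLE

    for label, per_day in (("full feed", 600_000), ("partial feed", 200_000)):
        rate, writes = write_load(per_day)
        print(f"{label:12s}: ~{rate:.1f} articles/sec, ~{writes:.1f} writes/sec "
              "(before peaks or reader traffic)")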
A fairly ideal news disk layout is: one disk for the operating system; one for swap; one for the history database; one for /var/log/news; one for the .overview files; and several disks for the news spool itself.
In reality, few can afford that many disks. So what you do is make trade-offs. The most common trade-off
is to put the .overview files in with the news spool disks. This should be fine until you start getting more
than 50 or so simultaneous news readers (nnrpds) running. Often, there is no separate swap disk, which is
acceptable if you have 256 MB or so of RAM. Under no circumstances, though, should /var/log/news be
on the same disk with the history database - and neither should be on a news spool disk.
With the above configuration, you should be able to hold a week or so of non-binaries groups and two to
three days of binaries groups - binaries groups are any groups with "binaries" or "sex" in the title -
depending on how many of the binaries groups you accept.
NNTP is a text protocol. This means that you can just Telnet to port 119 on a news server; type NNTP
commands; and see the same responses that a news reading or transferring peer would see.
One simple way to know that your innd is very overloaded is to test the time it takes to Telnet to port 119
on it; get a welcome banner; type "QUIT"; and have the connection close. If any of these operations takes
much more than half of a second, your news box is getting overloaded. If it takes many seconds, it's
seriously overloaded.
This is a test of the "select loop" - how fast innd can come around and service each request.
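Here's a small script that automates exactly that check - connect to port 119, wait for the banner, send QUIT, and time the whole exchange. The host name news.example.com is a placeholder; substitute your own server:

    # Time the connect / banner / QUIT exchange against a news server.
    import socket
    import time

    HOST, PORT = "news.example.com", 119   # placeholder host -- use your own

    start = time.time()
    with socket.create_connection((HOST, PORT), timeout=10) as s:
        f = s.makefile("rwb")
        banner = f.readline()       # e.g. "200 ... InterNetNews server ..."
        f.write(b"QUIT\r\n")
        f.flush()
        f.readline()                # the "205" goodbye line
    elapsed = time.time() - start

    print(banner.decode(errors="replace").strip())
    print(f"banner + QUIT took {elapsed * 1000:.0f} ms"
          + ("  <-- looks overloaded" if elapsed > 0.5 else ""))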
Usually the bottleneck is disk I/O. If the innd is waiting for disks to spin so it can deposit articles or log
data, it'll fall behind and not be able to deal with other requests (like requests for new connections and
even simple requests to quit). You can use the "iostat -D 1" command under most UNIX OS flavors to see
this. If the percent use is near 100 for many seconds, you've got overloaded disks.
If you're running an older version of innd (say, before 1.5.1), sites that stream to you can slow down the
innd "select" loop. The problem is that innd would sit and read many articles from a given peer before
coming along to service the next peer waiting for innd's attention.
The fix was to make each read() from a remote site read at most 2K or 4K or so before going back to the select loop to service other peers. Normally this is a bad thing - increasing the number of system calls (read() and select() are system calls) increases operating-system overhead on the machine - but since innd isn't (yet) multi-threaded, the only other choice would have been to disable streaming from remote sites. The fix is called the "streaming patch," and you should apply it if your select loop is slow but there doesn't appear to be any disk I/O bottleneck.
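The idea behind that patch, in miniature: cap how much you read from any one ready connection before returning to the event loop, so a fast streaming peer can't starve everyone else. This is a generic single-threaded sketch using Python's selectors module, not INN's actual C code:

    # Generic event loop with a per-wakeup read cap -- the same fairness idea
    # as the innd "streaming patch", not the actual implementation.
    import selectors
    import socket

    MAX_READ = 4096      # read at most this much per peer per loop iteration

    sel = selectors.DefaultSelector()

    def serve(listen_port):
        listener = socket.socket()
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("", listen_port))
        listener.listen()
        listener.setblocking(False)
        sel.register(listener, selectors.EVENT_READ, data=None)

        while True:
            for key, _events in sel.select():
                if key.data is None:                  # new connection arriving
                    conn, _addr = key.fileobj.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data=b"peer")
                else:
                    conn = key.fileobj
                    chunk = conn.recv(MAX_READ)       # bounded read, then move on
                    if not chunk:
                        sel.unregister(conn)
                        conn.close()
                    # A real server would parse NNTP commands from 'chunk' here;
                    # by not draining the socket, other peers get serviced promptly.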
Stock innd will store all of the 100,000+ cancel messages in one huge directory. If allowed to accumulate,
this can cause the expire times to balloon by many hours. Reading from a UNIX directory with tens or
hundreds of thousands of entries takes forever! These "cancel control messages" are stored in the spool
directory for the phantom group control.cancel.
The answer is to run a process every few minutes to wipe out any files (including .overview) in your
control.cancel directory. (Depending on where you put your news spool, this might be called
/var/spool/news/control/cancel).
You can add a UNIX crontab entry to do this (the command "crontab -e news" will edit the news user's crontab file on many UNIX flavors). The line looks like:
0,10,20,30,40,50 * * * * (cd /var/spool/news/control/cancel ; rm *)
This makes sure that there will only be a couple of thousand entries in the control.cancel directory when expire runs.
Running a news server is like having a baby. It's more expensive than you could ever initially imagine,
both in terms of equipment and especially in terms of man-hours. There are companies that will let you
point your users at their news servers. These "news-reading" companies include zippo.com,
supernews.com (the oldest of the bunch), as well as new-comers like newsread.com (owned by the author)
and ispnews.com .
Things to look for in a news-reading provider are:
Almost all news-reading providers provide a free trial period, so take advantage of it and get your "news-hog" users to test the services out for you.