If you read the documentation on the Sun website, you’ll notice that the “medium” configuration I’m currently testing has “8 cores”, which, if you remember my previous post on the T2000, you might interpret as being capable of doing 8 different things at the same time. In the startup messages I’ve posted, you could see that it presents itself to the operating system as 32 CPUs, which you might interpret as being capable of doing 32 different things at the same time.
So which is it? 8? 32?
With these new ways of doing processing and having multiple CPUs in the same chip, sharing some resources, it isn’t as black and white as it used to be back when each CPU in a computer was a separate chip. So, let’s see if we can find out what the “real” number is.
Maarten asked me why I didn’t do a parallel build of some software, and that’s exactly what I used to see if I could find an answer. PHP is an open source programming language; this weblog uses it to generate the web page you’re reading. Compiling it takes 3 minutes 5 seconds on my 3.2 GHz Pentium 4. If I use exactly the same build method on the T2000, it takes about 27 and a half minutes. Sounds awful, but remember that this method uses only about 1/32 of the capacity of the Sun. Maarten’s question was why I wasn’t using the tools that split the job into lots of small pieces that can be done in parallel.
Part of the job can be done in parallel – PHP is a large collection of small source files written in the C language, and compiling each of those files can be done independently from all the others. The results must be combined into a few libraries and applications, and those steps cannot be done in parallel. A rough guess is that about half the work of building PHP cannot be done in parallel, and must therefore be done the “slow” way. This shows that building software is not the best thing to buy a T2000 for, but it allows me to test the machine in one way: how many things can be done in parallel? 8? 32? A number in between?
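To put a number on what “half the work cannot be done in parallel” implies, here is a small sketch using Amdahl's law. The post does not mention that law by name; it is brought in here only as an illustration, and the 0.5 serial fraction is the rough guess from the paragraph above, not a measurement. It also assumes every extra worker runs as fast as the first one, which is exactly what this machine's shared cores put in question.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part always takes its full time; only the parallel
    # part shrinks as more workers are added.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumption: the post's rough guess that half the build is serial.
SERIAL_GUESS = 0.5

for workers in (1, 8, 16, 32):
    print(f"{workers:2d} workers: at most {amdahl_speedup(SERIAL_GUESS, workers):.2f}x faster")
```

Under that guess the build can never get more than twice as fast, no matter how many CPUs there are, which is why the shape of the curve matters more here than the absolute build time.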
The GNU version of Make has a parameter that tells it how many jobs it attempts to do in parallel. As this Sun article says:
Your results will vary based on the particular compiler, options, and language being compiled, as well as whether the sources are local or remote. A common rule-of-thumb is to request the number of parallel jobs to be approximately 1.5 times the number of available CPUs on the machine.
I decided to do the reverse: build PHP repeatedly, each time with a different number of parallel jobs. Afterwards, I look at which job count gave the best build time, divide that by 1.5, and take the result as “the number of CPUs” in the classic sense.
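A sweep like this could be scripted roughly as follows. This is a minimal sketch, not the script used for the post: the function names are made up for illustration, and it assumes GNU Make is on the PATH as "make" and that the source tree has a "clean" target. The cleanup step is kept outside the timed section so only the build itself is measured.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable, List


def make_command(jobs: int) -> List[str]:
    """Command line for GNU Make with the given number of parallel jobs."""
    return ["make", f"-j{jobs}"]


def clean_tree() -> None:
    # Assumption: the Makefile provides a "clean" target.
    subprocess.run(["make", "clean"], check=True)


def run_build(jobs: int) -> None:
    subprocess.run(make_command(jobs), check=True)


def sweep(
    job_counts: Iterable[int],
    build: Callable[[int], None] = run_build,
    prepare: Callable[[], None] = clean_tree,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, float]:
    """Build once per job count and return wall-clock seconds per count."""
    timings: Dict[int, float] = {}
    for jobs in job_counts:
        prepare()              # start every run from a clean tree
        start = clock()
        build(jobs)
        timings[jobs] = clock() - start
    return timings


# Example (run from the top of the source tree):
#     for jobs, seconds in sweep([1, 2, 4, 8, 16, 24, 32, 40]).items():
#         print(jobs, round(seconds))
```

The build, cleanup and clock are passed in as parameters so the loop can be tried out without actually compiling anything.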
Here’s a screenshot of what a parallel build looks like:

As you can see, plenty of “cc1” processes doing work, and a respectable load average. This screenshot was made when using 40 jobs, and vmstat showed quite a number of processes in the “runnable” state, which means there were processes waiting for an available processor. That indicates that the “classic” number of CPUs is somewhat less than 40 / 1.5.
So here’s the actual graph:

On the horizontal axis you’ll find the “-j” parameter I gave to make, the number of parallel jobs to use. The vertical axis is the number of seconds the build took. As you can see, with 1 job, or the non-parallel way to build, it took over 1600 seconds. The machine was mostly idle during this time. Increase the number of parallel things the machine is allowed to do, and performance increases rapidly. Optimal build times appear at around 24 jobs; allocating more jobs doesn’t make things much faster, so the machine is pretty much using all available resources at that point.
Divide that by 1.5, and you get 16 CPUs.
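The “find the knee, then divide by 1.5” step can be written down as a few lines of code. This is only a sketch: the function name and the 5% tolerance are assumptions of mine, and the timings in the example are invented round numbers shaped like the graph, not the actual measurements.

```python
from typing import Dict, Tuple


def estimate_classic_cpus(
    timings: Dict[int, float],
    tolerance: float = 0.05,
    jobs_per_cpu: float = 1.5,
) -> Tuple[int, float]:
    """Return (knee, knee / jobs_per_cpu).

    The knee is the smallest job count whose build time is within
    `tolerance` of the best time seen, i.e. the point after which
    extra jobs stop helping much.
    """
    best = min(timings.values())
    knee = min(jobs for jobs, secs in timings.items() if secs <= best * (1 + tolerance))
    return knee, knee / jobs_per_cpu


# Invented placeholder timings (seconds), NOT the data behind the graph.
MADE_UP_TIMINGS = {1: 1600.0, 4: 800.0, 8: 500.0, 16: 350.0, 24: 300.0, 32: 295.0, 40: 298.0}

knee, cpus = estimate_classic_cpus(MADE_UP_TIMINGS)
print(knee, cpus)  # 24 16.0
```

The tolerance decides how flat the curve has to be before it counts as “not getting much faster”; a different choice can move the knee, so the resulting CPU count is an estimate rather than a hard number.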
So here’s the dilemma Sun must have faced: if they’d told the operating system that this hardware had 8 CPUs available, the operating system would have scheduled no more than 8 threads executing at the same time, and capacity would have been wasted. But how high should they go? It all depends on the workload and the kind of application that runs – and since not every application needs “just” CPU, but disk and network access as well, it’s difficult to get the number exactly right for all tasks at hand. So they probably did tests similar to this simple one I did, and concluded that the safe number was to tell the operating system that 32 CPUs are available. If the operating system actually has 32 threads that need work at the same time, some of them will be slowed down a bit, but no capacity will be wasted. It’s better to overstate the available capacity a bit and build up a run queue than to lose capacity because the operating system thinks it has fewer CPUs available than it really does.
But don’t use the above timing numbers to compare it to the 3 minutes my Pentium 4 took – a large part of the build process cannot be done in parallel, and more realistic “benchmarks” will be done later… and you’ll probably find all kinds of benchmarks on the web anyway – I wanted to take a slightly different look at what drives the performance of this fairly unique machine…

[Quote:]
Sometimes brotherhood means much more than sharing parents. Sometimes it means sharing hands.
When a young Fort Lewis soldier returned from Iraq paralyzed from the upper chest down, it was his teenage brother who assumed the role of roommate and primary caretaker.
They’ve learned what it means to feel completely dependent and what it means to feel completely responsible.





[Quote:]
Dell is refusing to give a refund to a customer who believes she was wrongly sold a server.
Kate was asked by her boss to contact Dell to buy two new desktop computers and “something to link them”. Dell sold her two PCs and a PowerEdge server, ironically with only one network card.
Kate realised her mistake because she spoke to a more technically-literate friend who told her she needed a cable to link two computers, not a server. Chris asked Dell to refund Kate because the server was effectively mis-sold. Her company has only two employees.
But Dell refused to give her a refund because the order was made through its business channel.
Dell is also one of the very few companies who insist on sending me weekly leaflets with advertising. They’re totally incapable of understanding the word “no”. Any sane company appreciates that you tell them you’ll never buy from them anyway and stop sending crap. Dell is not a sane company.
[Quote:]
It didn’t take long before the wheels started coming off Kathleen Troia “KT” McFarland’s campaign to unseat Sen. Hillary Rodham Clinton.
Things got off to a promising start in early March, with favorable publicity and several national TV interviews for the wealthy, 54-year-old McFarland, a Reagan-era Pentagon official who spent the last 20 years raising a family and has never held elective office.
But then McFarland was hit with embarrassing disclosures about her voting history, including her registering in two places and missing several elections. Records suggest she did not even vote in 1984, when Reagan, her boss and political hero, was seeking re-election, though she believes she didn’t miss that one.
[..]
“Build up, tear down. Build up, tear down. There’s no one who can survive this political process intact,” McFarland said during an interview in her Park Avenue apartment.
“What it means is that real people don’t run for public office anymore,” she added. “Manufactured people run for public office.”
She’s right, of course. And all others will be swift-boated.
Out of interest, is your Pentium a hyperthreaded one too?
Yes, it is. However, in FreeBSD 5.4 on this box, it still shows as just one:
suske# sysctl -a | grep cpu
kern.threads.virtual_cpu: 1
kern.smp.maxcpus: 1
kern.smp.cpus: 1
…
dev.cpu.0.freq: 3194
In FreeBSD 5.4 hyperthreading is supported by default, here’s a snippet from the boot messages:
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz (3192.01-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf41 Stepping = 1
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Hyperthreading: 2 logical CPUs
Out of curiosity, what is the max % of CPU usage reported for a thread that’s hammering away at a compile? (In the screenshot the max is 2.8% but it’s sorted and I imagine some scrolled off the screen.)
You’re suggesting that they’re faking the number of processors (32) in the OS, but the chip actually has 8 cores that are designed to handle 4 simultaneous threads each, so there’s a real hardware basis for the number 32. Here’s a quote from a technical paper:
From: http://opensparc.sunsource.net/nonav/publications/D05_01Aut2.pdf, found among other papers at: http://opensparc.sunsource.net/nonav/pubs.html
No, no, I’m not claiming they’re faking the number of processors, I’m saying the technology cannot be compared to 32 actual separate CPUs – as your quote clearly demonstrates, and as my “test” shows on a practical level. Interleaving threads is a good basis for showing the OS a certain number of CPUs, and you’re absolutely right that this is the source of the number 32 – I must have missed that in my research, that’s a good find. Whether the interleaving has a positive effect on the CPU capacity available for actual work depends on the type of workload – there’s a reason Sun tells you what the machine is very good at.
I was reacting to your: “So they probably did tests similar to this simple one I did, and concluded that the safe number was to tell the operating system that 32 CPU’s are available.” They may well have done tests, but the bottom line is that they built support for 32 simultaneous threads, so the machine identifies itself as a 32-processor machine.
I wonder what the impact on compiler smarts is going to be. Why bother to work hard to avoid stalling the pipeline when you know there are 3 other threads that’ll jump in and utilize the processor?
Correct, I was probably wrong when I said that. And yes, compilers suddenly became even more interesting with this chip, that’s right – but since this chip shines at threading, and the easiest language to do threading in is Java, I expect the same thing to happen in virtual machine technology.