String Interning and Threads
Anton Tagunov added an excellent comment to yesterday’s entry:
The one thing that has always stopped me from doing this was: interning must use some global lock. So, while interning would cause little to no harm for desktop application it is likely to introduce extra synchronization bottleneck for server applications. Thus potentially degrading performance on some multi-cpu beast. Would you agree with this, Adrian?
Firstly, let me make something very clear: string interning will cause bugs in your code and they will be hard to track down, although they will be easy to fix once found. At some point, someone, somewhere will forget to intern a string and let it pass into your code that assumes strings are interned.
Also, string interning is an extremely situational optimization. In most cases, it will worsen performance because the overhead of interning the strings will not be made up for by the cheaper comparisons. Even in the cases where it does help, most of the time the difference will not be noticeable. As always, don’t bother optimizing until you know that you need to and that it will help; this is an optimization that makes the code less maintainable.
Having said that, let’s go back to the original question. Firstly, does string interning require synchronization? Probably, but not in terms of Java. The String.intern() method is a native method and works via JNI. It would be difficult to imagine a way of achieving the behavior without at least some synchronization though. The synchronized block however would be very small, and very rarely encountered.
There are two situations to consider: either the string is already in the interned list or it is not. If it is, then no synchronization needs to occur because the list is only being read. So multiple strings can be interned at once so long as all of them are already in the interned list. Synchronization will be needed however whenever a string is interned for the first time (i.e. it doesn’t match any String constant that has been loaded or any previously interned string). So on a multiple CPU system, it would be very bad to intern a lot of strings that are only ever used once or twice, as they would require a lot of synchronization for no benefit. Of course on a single CPU system, doing this would be a bad thing anyway because it would incur the extra cost of comparing strings to check if they match an interned string without gaining any real benefit.
My theory would then be (and only real world application profiling will confirm this in any particular situation) that the string interning technique is slightly less likely to pay off on multiple CPU systems. However, the situations in which string interning is useful require that the vast majority of String.intern() calls match something already in the cache (most likely one of the string constants they’re to be compared against), so the question of how many CPUs will be in use isn’t going to have any significant impact.
I can’t stress enough though that if you don’t have specific profiling data that shows String comparisons as the biggest bottleneck in your application, you shouldn’t apply this optimization. Great question.
UPDATE: Here’s an interesting discussion of interning relating specifically to this question.
The automatic google search on the side (if you actually click through to this blog entry) is very handy at times.
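To make the "someone forgets to intern a string" failure mode from this entry concrete, here is a minimal sketch. The class name InternPitfall and the isGet helper are made up for illustration; they stand in for any code that compares with == on the assumption that every string reaching it has been interned.

```java
final class InternPitfall {
    // Hypothetical fast-path check that assumes its argument is interned.
    // String literals such as "GET" are always interned by the JVM.
    static boolean isGet(String method) {
        return method == "GET"; // identity comparison, not content comparison
    }

    public static void main(String[] args) {
        // Simulates a string built at runtime (e.g. parsed from input):
        // same characters as the literal, but a different object.
        String parsed = new String("GET");

        System.out.println(isGet(parsed));          // false: caller forgot to intern
        System.out.println(isGet(parsed.intern())); // true: intern() returns the pooled instance
        System.out.println("GET".equals(parsed));   // true: equals() never had the problem
    }
}
```

The first call is the hard-to-find bug: nothing fails loudly, the comparison just quietly returns false. The fix is the one-word change shown in the second call, which is why these bugs are easy to fix once found.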
String Interning
Elan Meng investigates the behaviours of string constants, interning and == compared to .equals. Very informative. The question remaining is, why would anyone ever use == instead of .equals on a String, considering how likely it is to cause confusion and the potential for disaster if one of the strings for some reason isn’t interned? The answer is performance.
In the average case, the performance difference between == and .equals is pretty much non-existent. The first thing .equals does (like any good equals method) is check if the objects are == and return true immediately if they are. The second thing .equals does is check that the strings have the same length (in Java the length of the string is a constant field and so requires no computation). If an answer still hasn’t been found, the characters of each String are iterated over and as soon as they differ, false is returned. Now, consider the possible cases:
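The order of checks described above can be sketched as follows. This is an illustrative re-implementation, not the actual java.lang.String source; the class name EqualsSketch and the method sketchEquals are invented for this example.

```java
final class EqualsSketch {
    // Mirrors the three steps described in the entry. Illustration only.
    static boolean sketchEquals(String a, String b) {
        if (a == b) return true;                     // 1. same object: done immediately
        if (a == null || b == null) return false;    //    guard added for this sketch
        if (a.length() != b.length()) return false;  // 2. cheap length comparison
        for (int i = 0; i < a.length(); i++) {       // 3. stop at the first difference
            if (a.charAt(i) != b.charAt(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sketchEquals("interned", "interned"));       // true, decided at step 1
        System.out.println(sketchEquals("short", "much longer"));       // false, decided at step 2
        System.out.println(sketchEquals("same length", "same lengtH")); // false, decided at step 3
        System.out.println(sketchEquals("copy", new String("copy")));   // true, after a full scan
    }
}
```

Only the last call pays for a full character scan: two distinct objects with identical contents, which is exactly the case interning removes.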
Preparing For Screen Tests
Angel Studios is starting to plan a round of screen tests for the people interested in being a part of our films and stage productions. We’d essentially like a database of actors we can flick through and find a list of people who might be suitable for a given role. We also need to do a bunch of auditions for our upcoming short film which is just starting production. There are a few people I want to get back in touch with to particularly invite along to do a screen test, but we still need to sort out exactly how we want to do it. Even so, if anyone’s interested in getting involved in film or stage productions, no experience is necessary. Give me a yell (adrian at intencha dot com) and I’ll let you know when we’re actually ready to start doing some. It’s a lot of fun.
Stuck In A Mindset
This is a great example of getting stuck in a mindset. A piece of very poorly written Java code is presented followed by a much shorter piece of Groovy code and Groovy is declared the winner.
The original Groovy:
list = ["Rod", "James", "Chris"]
shorts = list.findAll { it.size() <= 4 }
shorts.each { println it }
Java:
for ( String item : new String[] {"Rod", "James", "Chris" } ) if ( item.length() <= 4 ) System.out.println(item);
oooo, one line! It must be good… Of course if I were actually going to write that, I’d write it as:
Scripting Musings
Continuing my journey of learning regarding scripting languages (starting here):
I like it because it saves typing. There was a good example of a 20-line Java program that was reduced to 3 lines in Groovy. I’ve lost the link though.
Posted by: Jonathan Aquino
I like readability. Java can definitely be overly verbose at times, particularly with IO:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
However, the number of lines required to do something isn’t something that I really consider when choosing a language. Obviously if it’s at either extreme it matters, but few languages are either so short or so long that it really impacts readability. I’m also far more concerned with maintenance time than with development time because maintenance time almost always outweighs development time. Having said that, many people do like really compact languages and some scripting languages provide that (HyperTalk certainly doesn’t).
Greg Black adds some interesting comments as well. I certainly see the benefits of scripting for quick and dirty solutions or small programs as I tried to point out. Greg also quotes me as saying:
Why Do People Like Scripting Languages?
As much as the title seems to suggest one of my rants, this is actually a valid question along with a bit of my own pondering. Scripting languages seem to be the flavor of the month these days and I’m not really sure why. I’ve got nothing against scripting languages but I don’t see why they should be considered the be all and end all solution that people seem to think they are. Interestingly, when I first seriously got into programming, it was using HyperCard and there was a constant barrage of insults coming from the “real programmers” about these hobbyists using scripting languages. More than ten years later and all of a sudden you’re just not groovy (pardon the pun) if you’re not using a scripting language.
I love being able to write a quick Perl script to munge a text file in an odd format or to run through the Xalan codebase and change the package so that it doesn’t conflict with the version in the JRE. Our support auto-responder at work is a cool little Perl script that I wrote to take the incoming (evil MS HTML formatted) email that comes from the web form, parse it, log the details in our tracking system and fire back an email to the user with the tracking number. Works a charm. It would be a real pain to write that stuff in anything but Perl because of Perl’s awesome support for text parsing and the abundant, if unwieldy and occasionally unreliable, libraries in CPAN.
I’ve also written a major business system in Perl with database interaction, workflow and all that jazz. It worked well but it was certainly no easier to do it in Perl than Java or most likely C given appropriate libraries. I wouldn’t consider C an option for server systems unless performance was absolutely critical, however, because it leaves open the risk of buffer overruns and similar security holes that can be completely eliminated automatically by most other languages. Even if speed were critical I’d recommend buying faster hardware or using a distributed system before writing a server in C.
I’ve also written little scripts and smallish sites in PHP. It’s a nice language that I enjoy using but again I don’t see anything hugely wonderful about it. Everyone seems to be very Python oriented these days and I must admit to having almost no knowledge of the language, but from the code I have tried to modify in Python I really don’t see any reason to be overly excited about it. Again, there doesn’t seem to be anything particularly wrong with it, but I don’t see why it would be so much better than C, Java, Visual Basic, C# etc.
I also use JavaScript a lot at work and use it as a full programming language, not just to do roll over graphics. It can do some cool things but the resulting code is far from easily maintainable and again, I don’t see the advantage other than it’s the only option for code-in-a-browser when working cross-browser.
The most common reason I hear people giving for why they like scripting languages is because they “just flow better”. I just don’t buy that. I grew up on scripting languages and I just don’t find that they flow any better than any other languages. They do tend to be easier to learn because you get to ignore most of the rules of good programming while you learn (think Perl without the use strict directive). If you want to write good code though, you should put that use strict line back in and pay attention to all those little details that make initial coding harder but maintainability easier. Once you’re thinking about all those little things I find scripting has the same feel as “programming”. So am I abnormal or am I just missing something? Maybe it’s both….
Installing On Linux
A while back Kyle Rankin questioned why people would use InstallShield under Linux. He suggests people use the standard package management schemes that the various distributions provide and he’s dead right. You should make things consistent for the user because it’s consistency that makes a user interface easy to learn and makes it more productive (once you’ve learnt it, you can use it in a whole heap of places without taking the time to think of the right way to do it). There is however a problem with this. There are just way too many Linux distributions. Even taking into account the fact that there’s a lot of overlap in package management tools, it’s a big ask to expect a company to provide different packages for all of the different systems. Given that, they now have two choices:
Swing Text APIs
On a more positive note, if you need to work with the Swing Text APIs in any detail, would like to do something with text that you currently can’t, or for some reason are implementing a text API, take a look at this overview and the articles here (the text articles are towards the bottom). Some of the docs seem to be fairly old but the basic features of the API really haven’t changed all that much, and it’s far more important to understand the design than the specific methods that are available anyway. Once you understand the design, you’ll know where to look for the methods.
Language Bigotry
I’m getting really tired of the amount of bigotry in regards to programming languages. First let me admit that I’m certainly guilty of technology bigotry: I’m definitely biased towards Mac OS and Java to name a couple of things, but I’m at least aware of it and willing to admit it. Furthermore I can provide reasonable explanations as to why I like them, if not actually explain in every situation why I favor them (read: at times I’m just being bigoted about it).
When it comes to programming languages, and in fact most things in technology, there are few things that are not the best choice in at least some situation, even if it is only one incredibly specific situation. This is even more true when you only take into account the things that are widely used. Lately Java has been copping it (more than usual), with people actually booing at demonstrations that used Java and a lot of fairly immature comments being made.
Let’s take the viewpoint that Java is a pathetically hopeless, good for nothing language. Everything about it sucks, the people who use it are obviously idiots and it’s too slow to do anything useful with. Obviously the company I work for made a massive mistake implementing their product in Java and at long last we’ve realized it and want to reimplement it in a good language (we’ll probably have to fire all those stupid Java developers too but one step at a time). Let’s look at the (highly informal) requirements for these products:
Dependencies Redux
I ranted earlier about dependencies and the way Java programmers are always pining for the latest and greatest. The comment by Stephen Thorne to the article deserves being published with the same level of visibility as the original post so I quote it below. It also deserves some rebuttal which is also below.
I’ve been saying this for years, and yet I still run into rabid pro-java programmers who manage to rattle off a list of reasons why java is the bestest programming language in the world, including “good library support”, to which my response is “hold on, slow down, good library support? let me tell you a story…” and I recite any one of a dozen anecdotes about dependency hell in java. The only java project I’ve seen that manages to avoid this without extreme pain is kowari, which takes the slant “We’ll distribute the ENTIRE DAMNED THING as a .jar, libraries included, just make sure you have a recent enough JVM.” They don’t have the windowing troubles because it’s a database. ;)
The first thing I want to correct is the implication that Java doesn’t have good library support. It absolutely does. Look around at the huge range of libraries available for Java and particularly look at the fantastic standard library that comes with it. There are plenty of top quality libraries for pretty much every moderately common task. There is however a separate issue of dependency management. Problematic dependency management (aka dll hell) is caused by two things:
Greg Meet Ken
Greg Black comments on the IBM donation of code to the ASF (at least I think that’s what he meant). Ken Coar has already provided the explanations. For the record Greg, your site just gives me “access denied” so I can only read your blog via Planet Humbug and can’t leave this as a comment.
JRT
James Strachan puts forward a proposal that Sun opensource the standard Java libraries and I see no problems with it. In general I don’t think we’ll see all the benefits James predicts from it though we would see some. In particular I want to point out one benefit that James claims we’ll get but that we definitely won’t get:
more eyeballs are now looking closely at the code
This is the biggest advantage opensource proponents put forward for opensourcing any code but it just doesn’t apply to Java. Why not? The source code to the standard Java libraries is included with every copy of the JRE. You can already go and inspect the source code, find bugs and submit patches to Sun. The fact is, most people can’t be bothered. They have real work to do and don’t want to be wasting time analyzing the source code for the library they’re using, they just want it to work. When it doesn’t work, people tend to turn to the source, and people are already doing that.
I also don’t believe that people really want to write Java code that can run on the .Net platform. Java has always suffered from people feeling that it was second rate due to its cross platform nature. This is true to some extent: when writing any cross platform code you inherently make it harder to use the platform’s native resources due to the extra layer of abstraction. Why would you use the .Net platform when all of the extra functionality it would provide (like tie-ins to Word, InfoPath etc) wouldn’t be available from Java (because they’re not cross-platform)? You’d have to use JNI or similar to call “native” .Net code and so you’d have to learn C#, the .Net libraries, their toolsets etc.
Also, opensource developers who are refusing to use Java now will refuse to use Java then. Read some of the blogs coming out of OSCON at java.net and you’ll realize that the majority of the opensource world is just unreasonably bigoted against Java. That’s okay though, I’ve grown to really dislike the GPL anyway.