Wednesday, November 26, 2008

setting up DSpace and tomcat on windoze

My attention has turned to digital repositories. I looked at Greenstone but had enough problems with even a simple setup (admittedly trying to build from source) that I gave up. So I am now looking at DSpace. I had terrible trouble getting it to work on Windoze with Tomcat. I got it working eventually. Here is the tale of my voyage of discovery.

Prerequisites

Postgres.

Go to http://www.postgresql.org/download/ and click on the link for Windoze. This takes you to a one-click installer for 8.3, which runs a nice setup wizard. Database superuser = postgres,1gandalf. Listening port = 5432 (default). Locale = English, United Kingdom.

I got the popup "A non-fatal error occured during the cluster initialisation. Please check the installation log in /tmp for details." There did not seem to be a log in either c:\temp or c:\windows\temp. The wizard then went on to completion. The service says it is starting for quite a while, then stops, which seems wrong. The Windoze event log has a timeout waiting for server startup, preceded by a fatal error: "could not create lock file 'postmaster.pid', permission denied". I went into c:/Program Files/PostgreSQL/8.3 and did a recursive chmod 777, then restarted the service. This fixed it.
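A quick way to confirm that the service really is up after the restart is to try a TCP connection to the listening port. This is only a minimal sketch: the function name is my own invention, and it only proves that something accepts connections on the port, not that it is Postgres.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: port_is_listening("localhost", 5432) for the default Postgres port
```

If this returns False straight after starting the service, the event log is the next place to look, as described above.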

tomcat.

Download the binary distro for Tomcat 5.5 core, Windows installer. Connector port = 8030 (default is 8080). u,pw=admin,1gandalf. The service fails to start. Turn on debug in the Tomcat configuration panel, then restart. The catalina logfile shows a bind error, address already in use. This was because a leftover process was interfering. I found the process using tcpview. (I need to put tcpview on my web pages; it is much better than netstat.) After the kill, Tomcat restarted just fine.
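The "address already in use" bind error can be checked for before starting Tomcat by trying to bind the connector port yourself. A rough sketch, with a made-up function name; it assumes Tomcat binds the wildcard address, and it does not tell you which process holds the port (tcpview is still the tool for that).

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if we can bind host:port, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": a leftover process has the port
            return False
        return True


# Example: port_is_free(8030) for the connector port chosen above
```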

maven.

It does not seem to come with anything ready for Windoze. I unpacked the binary zip and found that it is itself an installation directory. mvn can be run from the bin directory. Add the full pathname of the bin dir to the PATH using the control panel.
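To check that the PATH change took effect (a new cmd window is needed to pick it up), something like the following can be used. It is just a sketch around the standard library's shutil.which; the helper name is hypothetical.

```python
import shutil


def find_tool(name, path=None):
    """Return the full path of an executable found on PATH, or None."""
    # path=None means "use the PATH environment variable"
    return shutil.which(name, path=path)


# Example: find_tool("mvn") should return the mvn script inside the Maven bin dir
```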

postgres (continued)

Run PG Admin III (Start -> PostgreSQL 8.3 -> pgAdmin III). Double-click on localhost to connect to the db and enter the password for the postgres user. Double-click on Login Roles, then right-click and select 'New login role...'. u,pw=dspace,dspace; check 'can create database objects' and 'can create roles'. Double-click on Databases, then right-click and select 'New database...'. name=dspace, owner=dspace, UTF8 encoding. Under privilege properties, add user/group public with connect rights.
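The same role and database can presumably be set up with plain SQL instead of clicking through the GUI. The sketch below only builds the statements; the function name is invented, and the mapping from the pgAdmin checkboxes to CREATEDB and CREATEROLE is my assumption. The statements would be run as the postgres superuser, e.g. from psql.

```python
import re

# Only allow simple lowercase identifiers so they need no quoting
_SIMPLE_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def dspace_db_sql(role="dspace", password="dspace", database="dspace"):
    """Return the SQL statements that mirror the pgAdmin steps above."""
    for name in (role, database):
        if not _SIMPLE_NAME.fullmatch(name):
            raise ValueError("unsupported identifier: %r" % name)
    quoted_pw = password.replace("'", "''")  # double any single quotes
    return [
        # 'can create database objects' and 'can create roles' (assumed mapping)
        "CREATE ROLE %s LOGIN CREATEDB CREATEROLE PASSWORD '%s';" % (role, quoted_pw),
        "CREATE DATABASE %s OWNER %s ENCODING 'UTF8';" % (database, role),
        # user/group public with connect rights
        "GRANT CONNECT ON DATABASE %s TO PUBLIC;" % database,
    ]
```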

dspace.

edit dspace/config/dspace.cfg.
change pathnames so they work for Windoze, e.g. /dspace => c:/dspace
change port number to 8030, i.e. the one used by tomcat.
db.name = postgres
db.driver = org.postgresql.Driver
db.username = dspace
db.password = dspace
mail.server=mail.company.co.uk
mail.from.address = amarlow@company.com
feedback.recipient = amarlow@company.com
handle.prefix = 123400009
authentication.password.domain.valid = company.com
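Since a typo in dspace.cfg only shows up much later, it may be worth checking the edited values before building. Below is a minimal sketch that reads simple key = value lines and compares them with the database settings listed above. It is not a full Java properties parser (no line continuations or escapes), and the names are my own.

```python
EXPECTED = {
    "db.name": "postgres",
    "db.driver": "org.postgresql.Driver",
    "db.username": "dspace",
    "db.password": "dspace",
}


def parse_cfg(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def mismatches(settings, expected=EXPECTED):
    """Return {key: actual value} for every key that differs from expected."""
    return {k: settings.get(k) for k, v in expected.items() if settings.get(k) != v}
```

An empty result from mismatches(parse_cfg(open("c:/dspace-src/dspace/config/dspace.cfg").read())) would mean the database settings match; the path here is only a placeholder for wherever DSpace was unpacked.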

Create c:\dspace. cd to where DSpace was unpacked and run mvn package. This starts by doing lots of downloading and takes a *long* time: over 77 minutes! Then cd dspace/target/dspace-1.5.1-build.dir and run ant fresh_install.

using a DOS cmd window:
cd c:\dspace
bin\dsrun org.dspace.administer.CreateAdministrator
email: amarlow@company.com
first name: andrew
last name: marlow
password: 1gandalf

Shut down Tomcat. cd to the Tomcat conf dir and edit context.xml, putting in the DSpace stuff. (The context file is new to Tomcat v5.) Restart Tomcat. http://localhost:8030/jspui does not work! I could not get this to work. I tried copying webapps/jspui to the Tomcat webapps dir, but this also failed, for a different reason: it gave an error in the Tomcat log because it couldn't find ${dspace.dir}\config\dspace.cfg. I also got an NPE from log4j, which was not properly configured. The problem seems to be to do with the config directory not being copied over. But where to put it?

See http://mailman.mit.edu/pipermail/dspace-general/2008-July/002103.html for someone else who had exactly the same problem as me.

There seems to be a servlet, LoadDSpaceConfig, that is supposed to manage this and presumably report any problems. But it seems that this is not working. I found this little nugget in the DSpace manual on page 111, way after the installation instructions!:

The org.dspace.app.webui.servlet.LoadDSpaceConfig servlet is always loaded first. This is a very simple servlet that checks the dspace-config context parameter from the DSpace deployment descriptor, and uses it to locate dspace.cfg. It also loads up the Log4j configuration. It's important that this servlet is loaded first, since if another servlet is loaded up, it will cause the system to try and load DSpace and Log4j configurations, neither of which would be found.

I removed the context stuff from context.xml. This is to pursue a working configuration using the jspui that is copied into the tomcat webapps dir.

I edited webapps/jspui/WEB-INF/web.xml, replacing the dspace.dir param value with the DOS path (C:\dspace.....). This did have an effect: it gave the following error:

log4j:WARN No appenders could be found for logger (org.apache.commons.digester.Digester.sax).
log4j:WARN Please initialize the log4j system properly.
INFO: Loading provided config file: c:\dspace\config\dspace.cfg
INFO: Using dspace provided log configuration (log.init.config)
INFO: Loading: c:/dspace/config/log4j.properties
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: c:\dspace\log\dspace.log (Access is denied)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(Unknown Source)
        at java.io.FileOutputStream.<init>(Unknown Source)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:167)

To fix this I did cd dspace; chmod -R 777 .

This gave:

log4j:WARN No appenders could be found for logger (org.apache.commons.digester.Digester.sax).
log4j:WARN Please initialize the log4j system properly.
INFO: Loading provided config file: c:\dspace\config\dspace.cfg
INFO: Using dspace provided log configuration (log.init.config)
INFO: Loading: c:/dspace/config/log4j.properties

This looks a lot healthier. Now when I visit http://localhost:8030/jspui/ I get the page I expect. Wow!

3 comments:

Andrew Marlow said...

I had another go at getting tomcat and dspace to play nice, but it was no good. Tomcat just refuses to see dspace unless webapps/xmlui is copied to the tomcat webapps dir. But there is a better way to do the copy:

cd dspace-1.5.1-build.dir
ant update
cp -R webapps/xmlui /cygdrive/c/Program\ Files/Apache\ Software\ Foundation/Tomcat\ 6.0/webapps

Unfortunately, in web.xml this leaves a dspace-config param with a bad value: ${dspace.dir} where dspace.dir is not defined. This must be edited to refer to c:/dspace.
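The web.xml edit can be scripted so it does not have to be redone by hand after every copy. This is a sketch only: it does a plain text substitution of the ${dspace.dir} placeholder rather than parsing the XML, keeps a .bak copy, and uses function names I made up. The web.xml location is whatever your Tomcat webapps dir is.

```python
from pathlib import Path

PLACEHOLDER = "${dspace.dir}"


def resolve_dspace_dir(web_xml_text, dspace_dir="c:/dspace"):
    """Replace every ${dspace.dir} placeholder with the real install dir."""
    return web_xml_text.replace(PLACEHOLDER, dspace_dir)


def patch_web_xml(path, dspace_dir="c:/dspace"):
    """Patch web.xml in place; return True if anything was changed."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    patched = resolve_dspace_dir(original, dspace_dir)
    if patched == original:
        return False
    # Keep the untouched file next to it as web.xml.bak
    target.with_name(target.name + ".bak").write_text(original, encoding="utf-8")
    target.write_text(patched, encoding="utf-8")
    return True
```

Running patch_web_xml on the copied webapps/xmlui/WEB-INF/web.xml a second time should return False, since no placeholder is left.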

Also permissions need to be opened on c:/dspace.

Then a missing logs dir needs to be created, tomcat/webapps/xmlui/WEB-INF/logs
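Creating the missing logs dir and checking that it can be written to can be done in one go. A small sketch with a hypothetical helper name; note that it tests write access for whoever runs the script, which is not necessarily the account the Tomcat service runs under.

```python
import tempfile
from pathlib import Path


def ensure_writable_dir(path):
    """Create the directory if needed and return True if we can write in it."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    try:
        # Creating (and automatically deleting) a temp file proves write access
        with tempfile.TemporaryFile(dir=str(directory)):
            pass
    except OSError:
        return False
    return True


# Example: ensure_writable_dir("tomcat/webapps/xmlui/WEB-INF/logs")
```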

Andrew Marlow said...

There is an issue when doing a bulk import into DSpace. Some of the titles will contain accented characters, e.g. German and French ones. Any script that produces an import XML from a CSV of journal titles will need to convert them to the encoding the XML declares.
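One way to do the conversion is to decode the CSV explicitly and let an XML library write UTF-8. The sketch below assumes the CSV is Latin-1 with the title in the first column (both are guesses about the input, not facts from my data) and produces one dublin_core document per title, in the style of the DSpace simple archive format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def titles_to_dublin_core(csv_bytes, source_encoding="latin-1"):
    """Return one UTF-8 encoded dublin_core XML document per CSV row."""
    text = csv_bytes.decode(source_encoding)  # assumed input encoding
    documents = []
    for row in csv.reader(io.StringIO(text, newline="")):
        if not row or not row[0].strip():
            continue  # skip empty lines
        root = ET.Element("dublin_core")
        value = ET.SubElement(root, "dcvalue", element="title", qualifier="none")
        value.text = row[0].strip()  # ElementTree escapes &, < and > itself
        documents.append(ET.tostring(root, encoding="utf-8"))
    return documents
```

Decoding with the wrong source encoding will not raise an error for Latin-1, it will just produce wrong characters, so it is worth eyeballing a few accented titles after the import.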

Andrew Marlow said...

To clear out the database take the following steps:

Stop tomcat (otherwise you will get database in use errors).

Delete the database:
Run PG Admin III (Start -> PostgreSQL 8.3 -> pgAdmin III). Double-click on localhost to connect to the db, then double-click on Databases. Right-click the dspace database and delete (drop) it. Then create it again as before: right-click on Databases and select 'New database...'. name=dspace, owner=dspace, UTF8 encoding. Under privilege properties, add user/group public with connect rights.

Re-create the database by saying ant fresh_install in the dspace build dir.