Wednesday, April 14, 2010

Broken buckets in Eucalyptus

I ran into an issue with Eucalyptus Walrus when uploading large chunks of data: at some point the upload failed without finishing and left the bucket in an inconsistent state, so that any attempt to list its contents returns an error message such as:

dd1e48ad-45ea-44e3-9227-e9fd67309ab0adminBukkitListBucketTypejava.lang.NullPointerException

The steps I followed to fix it:
a) Stop both Cloud Controller and Walrus
b) Physically remove the bucket from the Walrus storage
c) Remove the affected entries from the CLC database
d) Restart both components

The tricky part is fixing the database (step c), which I did as follows.
On the CLC server the Eucalyptus database files are in:

/var/lib/eucalyptus/db

The .script files contain the data that is loaded into the in-memory database at CLC startup time.
The relevant file for buckets is eucalyptus_walrus.script, from which the following entries must be removed:

1.- Look for the entry where the affected bucket is inserted and remove it:

INSERT INTO BUCKETS VALUES(10,'image-bucket',0,'2010-03-05 18:16:29.965000000',FALSE,FALSE,FALSE,FALSE,FALSE,'US','admin')

The first value in the INSERT is the bucket ID (10), which we will need later.

2.- Look for the entries where the objects in the affected bucket are inserted and remove them (recognisable by the name of the bucket in the INSERT):

INSERT INTO OBJECTS VALUES(16,'image-bucket',NULL,'application/xml','c0a490cf7d28955fd3e8d0b249c64c60',FALSE,FALSE,FALSE,FALSE,'2010-03-05 18:16:31.012000000','debian.5-0.x86-64.img.manifest.xml','debian.5-0.x86-64.img.manifest.xmlLCc2jw..','admin',6293,'STANDARD')
INSERT INTO OBJECTS VALUES(17,'image-bucket',NULL,NULL,NULL,FALSE,FALSE,FALSE,FALSE,NULL,'debian.5-0.x86-64.img.part.0','debian.5-0.x86-64.img.part.0nOsjWQ..','admin',NULL,NULL)

The first value in each insert is the object ID (16 and 17), which we will need later.

3.- Look for the entry where access is granted to the bucket (recognisable by the bucket ID) and remove it.

INSERT INTO BUCKET_HAS_GRANTS VALUES(10,26)

The INSERT goes into the BUCKET_HAS_GRANTS table and the first value (10) is the bucket ID we found earlier. The second value is the access grant ID (26), which we will need later.

4.- Look for the entries where access is granted to the objects (recognisable by the object IDs) and remove them.

INSERT INTO OBJECT_HAS_GRANTS VALUES(16,27)
INSERT INTO OBJECT_HAS_GRANTS VALUES(17,28)

The INSERTs go into the OBJECT_HAS_GRANTS table and the first values are the object IDs we found earlier (16 and 17). The second values are the access grant IDs (27 and 28), which we will need later.

5.- Look for the entries where access rights are defined for the bucket and the objects (recognisable by the access grant IDs) and remove them:

INSERT INTO GRANTS VALUES(26,TRUE,TRUE,TRUE,TRUE,NULL,'admin')
INSERT INTO GRANTS VALUES(27,TRUE,TRUE,TRUE,TRUE,NULL,'admin')
INSERT INTO GRANTS VALUES(28,TRUE,TRUE,TRUE,TRUE,NULL,'admin')

The INSERTs are into the GRANTS table and the first values are the access grant IDs we found out earlier.
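
The five steps above can also be sketched as a small Python script that works out which lines belong to the broken bucket. This is only a minimal sketch: it assumes the table names and column order shown in the examples in this post (other Eucalyptus versions may differ), and the names parse and split_script are my own, not part of Eucalyptus. It only works on a list of lines in memory and never touches the real file.

```python
import re

# Matches lines such as: INSERT INTO BUCKETS VALUES(10,'image-bucket',...)
INSERT_RE = re.compile(r"^INSERT INTO (\w+) VALUES\((.*)\)\s*$")


def parse(line):
    """Return (table, first_value, second_value), or (None, None, None)."""
    m = INSERT_RE.match(line)
    if not m:
        return None, None, None
    fields = [f.strip().strip("'") for f in m.group(2).split(",", 2)[:2]]
    fields += [None] * (2 - len(fields))
    return m.group(1), fields[0], fields[1]


def split_script(lines, bucket_name):
    """Split lines into (kept, removed) for the given bucket name."""
    rows = [(line, *parse(line)) for line in lines]

    # Steps 1 and 2: IDs of the bucket and of the objects stored in it.
    bucket_ids = {a for _, t, a, b in rows if t == "BUCKETS" and b == bucket_name}
    object_ids = {a for _, t, a, b in rows if t == "OBJECTS" and b == bucket_name}

    # Steps 3 and 4: grant IDs linked to those bucket and object IDs.
    grant_ids = {
        b for _, t, a, b in rows
        if (t == "BUCKET_HAS_GRANTS" and a in bucket_ids)
        or (t == "OBJECT_HAS_GRANTS" and a in object_ids)
    }

    def doomed(t, a, b):
        if t in ("BUCKETS", "OBJECTS"):
            return b == bucket_name
        if t == "BUCKET_HAS_GRANTS":
            return a in bucket_ids
        if t == "OBJECT_HAS_GRANTS":
            return a in object_ids
        return t == "GRANTS" and a in grant_ids  # step 5

    kept = [line for line, t, a, b in rows if not doomed(t, a, b)]
    removed = [line for line, t, a, b in rows if doomed(t, a, b)]
    return kept, removed


if __name__ == "__main__":
    sample = [
        "INSERT INTO BUCKETS VALUES(10,'image-bucket',0,'2010-03-05 18:16:29.965000000',FALSE,FALSE,FALSE,FALSE,FALSE,'US','admin')",
        "INSERT INTO OBJECTS VALUES(17,'image-bucket',NULL,NULL,NULL,FALSE,FALSE,FALSE,FALSE,NULL,'debian.5-0.x86-64.img.part.0','debian.5-0.x86-64.img.part.0nOsjWQ..','admin',NULL,NULL)",
        "INSERT INTO BUCKET_HAS_GRANTS VALUES(10,26)",
        "INSERT INTO OBJECT_HAS_GRANTS VALUES(17,28)",
        "INSERT INTO GRANTS VALUES(26,TRUE,TRUE,TRUE,TRUE,NULL,'admin')",
        "INSERT INTO GRANTS VALUES(28,TRUE,TRUE,TRUE,TRUE,NULL,'admin')",
    ]
    kept, removed = split_script(sample, "image-bucket")
    for line in removed:
        print("remove:", line)
```

To use it for real, I would run it on a copy of eucalyptus_walrus.script with the CLC stopped, review the "removed" list by hand, and only then write the "kept" lines back.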

(Also posted in the Eucalyptus forum)