HTML based form for XML database entry


jharriso
09-05-2006, 06:38 PM
Ed, Mark, anyone with XML experience, really - I'm working on a project documenting some Native American languages before they go extinct, and we're building XML databases as dictionaries.

We'd like people with no coding experience of any kind to be able to enter data. Is there a free and easy way to take data from forms on an HTML page and have it produce XML code in a predefined format that we can then move into our running dictionary?

We're going to have 4 cross referenced databases, this is our structure:
Metadata
ID
Source name
Source Description
Source File
Type (audio, print, video)
start time
speaker
genre
date

Audio
ID
Audio file path
metadata ID <ref>
start time

Lexicon
ID
Lexeme
Metadata ID
long gloss
short gloss
page
start time
pos
Plural
audio ref
Audio ref
semantic domain

Text
ID
transcription
translation
Lexeme ID
Metadata ID

Metadata will consist of information on our sources, lexicon is actual word entries, audio is for audio clips, and text will be for example excerpts of text.

Do these look like reasonable data structures?

fischerm
09-07-2006, 02:40 PM
I doubt there is anything out there already that will do what you want. However, what you describe doesn't seem difficult to write from scratch. One question about XML: are you choosing XML as a convenient and standard way to distribute the database once it is complete? If distribution is the primary reason, you might think about using a database to store the entries while you do the data entry, and then writing a second program to pull all the records out of the database and build your final XML document for distribution.
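
To make the "database first, XML later" idea concrete, here is a rough Python sketch of the second program. It assumes SQLite as the database, and the table name `lexicon`, its columns, and the `<lexicon>`/`<entry>` tag names are all made up for illustration; the real dictionary format would dictate the actual names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_lexicon(conn):
    """Build a <lexicon> element with one <entry> per database row."""
    root = ET.Element("lexicon")  # hypothetical root tag
    rows = conn.execute(
        "SELECT id, lexeme, short_gloss, pos FROM lexicon ORDER BY id"
    )
    columns = [col[0] for col in rows.description]
    for row in rows:
        entry = ET.SubElement(root, "entry")
        for name, value in zip(columns, row):
            # NULL columns become empty tags rather than being dropped.
            ET.SubElement(entry, name).text = "" if value is None else str(value)
    return root


# In-memory database for the demo; a real project would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lexicon (id INTEGER PRIMARY KEY, lexeme TEXT NOT NULL,"
    " short_gloss TEXT, pos TEXT)"
)
conn.execute(
    "INSERT INTO lexicon (lexeme, short_gloss, pos) VALUES (?, ?, ?)",
    ("example", "a sample word", "noun"),
)
xml_text = ET.tostring(export_lexicon(conn), encoding="unicode")
print(xml_text)
```

The point of the split is that data entry only ever touches the database, and the XML file is regenerated from it whenever a fresh copy is needed for distribution.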

If a database isn't an option, creating an HTML form that simply formats up an XML entry that you can cut and paste into the 'master' document is really simple to do.

emurphy1
09-07-2006, 02:42 PM
Is there a free and easy way to take data from forms on an HTML page and have it produce XML code in a predefined format that we can then move into our running dictionary?

Uhh, free yes...easy no. I'm pretty inexperienced using XML so I can't say "Yeah I've done that and here is how you do it". But O'Reilly has a good article (http://www.xml.com/pub/a/2002/06/12/xupdate.html) on how to do it using XML, XSLT (Style Sheets) and XUpdate. The article discusses data schemas and transforming data to an HTML form for editing.


We're going to have 4 cross referenced databases, this is our structure..

Why 4 databases? Why not one database with 4 tables?


Metadata will consist of information on our sources, lexicon is actual word entries, audio is for audio clips, and text will be for example excerpts of text.

Do these look like reasonable data structures?
Yes, it looks like you are already planning on using foreign keys to enhance referential integrity...good move. Also, I don't see duplicate data across tables...another good thing. Without a better explanation of the data that will be stored in the tables I can't comment further. Overall your database design looks like a good start. :)
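
For what it's worth, here is a minimal sketch of what "one database with 4 tables" and foreign keys could look like, using SQLite from Python. The table and column names are my own guesses based on the field list, trimmed down to a few columns each, and it assumes each lexeme points at a single metadata record (a list of references would need an extra link table).

```python
import sqlite3

# Hypothetical, trimmed-down schema based on the four structures in the post.
SCHEMA = """
CREATE TABLE metadata (
    id INTEGER PRIMARY KEY,
    source_name TEXT NOT NULL,
    source_type TEXT CHECK (source_type IN ('audio', 'print', 'video'))
);
CREATE TABLE audio (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    metadata_id INTEGER NOT NULL REFERENCES metadata(id),
    start_time TEXT
);
CREATE TABLE lexicon (
    id INTEGER PRIMARY KEY,
    lexeme TEXT NOT NULL,
    metadata_id INTEGER REFERENCES metadata(id),
    audio_id INTEGER REFERENCES audio(id),
    short_gloss TEXT
);
CREATE TABLE texts (
    id INTEGER PRIMARY KEY,
    transcription TEXT,
    translation TEXT,
    lexeme_id INTEGER REFERENCES lexicon(id),
    metadata_id INTEGER REFERENCES metadata(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO metadata (id, source_name, source_type)"
    " VALUES (1, 'Field tape 1', 'audio')"
)
conn.execute("INSERT INTO lexicon (lexeme, metadata_id) VALUES ('example', 1)")

try:
    # Metadata row 99 does not exist, so this insert should be rejected.
    conn.execute("INSERT INTO lexicon (lexeme, metadata_id) VALUES ('orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan entry rejected:", rejected)
```

That rejection is the referential integrity part: a lexicon entry cannot point at a source that was never entered.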

jharriso
09-07-2006, 04:42 PM
Commented code for Ed


Metadata
ID //Metadata ID
Source name //Title of source material
Source Description //Description of source material
Source File //File name of source material
Type //(audio, print, video)
start time //Start time of source material
speaker //Speakers in material
genre //Story, interview, etc
date //Date material was produced

Audio
ID //Audio file ID
Audio file path //path to sound on server
metadata ID <ref> //Link to data about sources
start time //start time of word in source file

Lexicon
ID //Lexical entry ID
Lexeme //representation of word
Metadata ID //list of Metadata IDs
long gloss //Detailed translation
short gloss //General translation
page //Page in source material
start time //Start time in source material
pos //Part of speech
Plural //Plural forms
audio ref //Audio representation
Audio ref //Audio representation
semantic domain //Classification of meaning

Text
ID //Texts ID
Transcription //Transcription of text in native language
Translation //Translation of text
Lexeme ID //List of lexeme IDs
Metadata ID //List of Metadata IDs



All in all, we're looking to wind up with something like this: http://corpus.linguistics.berkeley.edu/~yurok/web/lexicon.html

They're using three different databases, with the metadata entries holding both our metadata and text tables. We want more flexibility with our texts, so we're building a fourth file.

Is there an advantage to putting everything into four files, or into one file with separate tables?

Yes Mark, distribution is the overall reason for using XML as our database of choice. I'm not sure we'd have the resources to build the alternative you describe: one program for data entry and a second one to translate the records to XML.

Basically, I'm looking for a functioning version of the mockup that is attached, where the entries are optional, but inputting nothing would still generate the tag with nothing in it. Clicking submit would output the formatted XML code below, that could be copied and pasted into the working XML document.
Clicking clear would totally refresh the page.
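
The behaviour described above (every tag is emitted, even for fields left blank) can be sketched in a few lines of Python. This only covers the "form values in, XML text out" step; the tag names are hypothetical ones derived from the Lexicon field list, the `<entry>` wrapper is an assumption, and how the values arrive from the HTML form is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names derived from the Lexicon field list; the real
# dictionary schema may differ.
LEXICON_FIELDS = [
    "id", "lexeme", "metadata_id", "long_gloss", "short_gloss", "page",
    "start_time", "pos", "plural", "audio_ref", "semantic_domain",
]


def build_entry(form_data):
    """Return one <entry> element as an indented XML string.

    Every field in LEXICON_FIELDS gets a tag, even when the form left it blank.
    """
    entry = ET.Element("entry")
    for name in LEXICON_FIELDS:
        child = ET.SubElement(entry, name)
        # Missing or blank fields become an empty string so the tag still appears.
        child.text = (form_data.get(name) or "").strip()
    ET.indent(entry)  # pretty-printing helper, needs Python 3.9 or later
    # short_empty_elements=False writes <page></page> instead of <page />.
    return ET.tostring(entry, encoding="unicode", short_empty_elements=False)


print(build_entry({"id": "L0001", "lexeme": "example", "pos": "noun"}))
```

Because the text goes through ElementTree, characters such as `<` and `&` typed into the form are escaped automatically, which matters when the output is pasted into a larger XML document.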

I don't have any experience with HTML forms or behind-the-scenes processing online, so that, I think, is my main question.

picch
09-07-2006, 11:35 PM
I can ask the Development Director of CAL how he did it (I'm sure it's coded from scratch). What happens with us is that we have a news XML feed: when someone creates a news post on the main page (in simple text), it is converted and dumped into our XML feed once the post is created. We also have other XML feeds for our divisions, leagues, and members, which are updated on a nightly basis by means of a cron job or PHP script; I can't remember which off the top of my head.
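
I don't know how the CAL code actually does it, but the "convert the post and dump it into the feed" step could look roughly like this in Python. The file name, the `<lexicon>` root tag, and the `<entry>` tag are placeholders for illustration, not anything from our real feeds.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def append_entry(path, fields):
    """Add one <entry> to the XML file at `path`, creating the file if needed."""
    if os.path.exists(path):
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        root = ET.Element("lexicon")  # hypothetical root tag
        tree = ET.ElementTree(root)
    entry = ET.SubElement(root, "entry")
    for name, value in fields.items():
        ET.SubElement(entry, name).text = value
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return len(root)  # number of entries now in the file


# Demo in a throwaway directory so nothing real gets overwritten.
with tempfile.TemporaryDirectory() as tmp:
    master = os.path.join(tmp, "lexicon.xml")
    append_entry(master, {"lexeme": "first"})
    total = append_entry(master, {"lexeme": "second"})
    print(total)
```

Run on the server each time a form is submitted, something like this would keep one shared XML file up to date without anyone copying and pasting.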

Is that along the lines of what you're looking for?

Here is the page that contains information about all of our feeds: http://caleague.com/?page=xml

Don't click on the users.xml feed; it's huge and will probably crash your browser, and possibly your PC if you don't have tons of RAM. It contains information for 450,000 users.

If you click the others, just give them a few seconds to load, because they aren't small either.

jharriso
09-08-2006, 01:17 AM
Garret, that would probably work great, especially because we wouldn't have to set it up locally on each person's machine... Whatever information you can provide on the matter would be greatly appreciated.