[redland-dev] Parsing ntriples from a string
Dave Beckett
dave at dajobe.org
Mon Aug 21 04:24:12 UTC 2006
DeeJay-G615 wrote:
> After a fair bit of digging through redland and raptor I found the
> source of my issues.
>
> ntriples_parse.c :: raptor_ntriples_parse_chunk
>
> if(is_end) {
> if(ntriples_parser->offset != ntriples_parser->line_length)
> raptor_parser_error(rdf_parser, "Junk at end of input.\"");
> return 0;
> }
>
> I guess I should have read the grammar for N-Triples
> (http://www.w3.org/TR/rdf-testcases/#ntriples) as I would have
> discovered that each triple must be terminated by an end of line character.
>
> So
>
> "<http://example/q?abc=1&def=2>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> \"xxx\" ."
>
> should be
>
> "<http://example/q?abc=1&def=2>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#value> \"xxx\" .\n"
>
> Going back to the code excerpt above: surely it should be
>
> if(is_end) {
> if(ntriples_parser->offset != ntriples_parser->line_length) {
> raptor_parser_error(rdf_parser, "Junk at end of input.\"");
> return 1;
> } else {
> return 0;
> }
> }
>
> As the procedure should return non zero on failure.
Yes, I've made this change to raptor.
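
For readers following along, here is a minimal standalone sketch of the corrected end-of-input check. The struct and the function name check_end_of_input are hypothetical stand-ins for illustration, not raptor's real types or API; only the offset and line_length field names come from the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the parser state in the quoted excerpt;
 * this is NOT raptor's real structure. */
struct ntriples_state {
  size_t offset;       /* bytes consumed so far */
  size_t line_length;  /* bytes currently buffered */
};

/* Returns 0 on success and non-zero on failure, the convention
 * discussed above. Unconsumed bytes at end of input are an error. */
static int check_end_of_input(const struct ntriples_state *st, int is_end)
{
  assert(st != NULL);

  if(!is_end)
    return 0;  /* more chunks may follow; nothing to check yet */

  if(st->offset != st->line_length) {
    fprintf(stderr, "Junk at end of input.\n");
    return 1;  /* report the failure to the caller */
  }

  return 0;
}
```

With this shape a caller can tell a final line that lacks its terminating newline apart from a clean end of input, rather than getting 0 in both cases.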
>
> Something else I wanted to bring up as a result of my travels through
> the parsing parts of the API...
>
> rdf_parser_raptor.c :: librdf_parser_raptor_parse_into_model_common
in redland
> which encapsulates the functionality of parsing from a uri, from a
> string and from a counted string.
>
> the parameter list is
> (void *context,
> librdf_uri *uri,
> const unsigned char *string,
> size_t length,
> librdf_uri *base_uri,
> librdf_model* model)
>
> Procedures that use this to parse from a uri, pass the URI, null for the
> string and 0 for the length. Functions that parse from a string, pass
> null for the uri, pass the string and then 0 if they don't have the length.
and the number of parameters shown is a bit of a mess, which is why it's
internal only.
> How that calling convention interacts with the following code is my
> point of interest.
>
> if(!base_uri)
> base_uri=uri;
>
> /* No base URI given, cannot proceed */
> if(!base_uri)
> return 1;
>
> If you don't pass a base uri when you are parsing from a URI you are ok,
> as it sets the base_uri to the uri as a default.
> However if you are parsing from a string, and you don't supply
> base_uri... base_uri (which is already NULL) gets set to NULL, and then
> the second if block executes and it returns a failure.
>
> Was this the intended behaviour? I'm guessing yes, as back up in
> rdf_parse.c, the 'parse from string' procedures check for the base URI
> being null, however the 'parse from uri' procedures don't.
>
> What is the rhyme or reason to this?
It's called a bug, software has them.
The underlying problem is that some syntaxes require a base URI and some
don't. The safe way is to always pass one in, however you call a parser.
That will never fail. If you want to be lazy and hope that the parser
doesn't need it, some of the tests above - which I agree are too strict -
will fail.
I'll add a function to raptor to test when a parser needs a base URI,
so that the test above can be more specific.
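
A rough sketch of what the more specific test could look like, under stated assumptions: plain C strings stand in for librdf_uri, resolve_base_uri is a hypothetical helper, and the needs_base flag stands in for the answer from the not-yet-written raptor function. None of these names exist in redland or raptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: choose the base URI to hand to the parser.
 * Strings stand in for librdf_uri; needs_base is an assumed flag saying
 * whether the chosen syntax requires a base URI.
 * Returns 0 on success, non-zero if a required base URI is missing. */
static int resolve_base_uri(const char *uri, const char *base_uri,
                            int needs_base, const char **resolved)
{
  assert(resolved != NULL);

  /* Default to the document URI when no explicit base is given. */
  if(!base_uri)
    base_uri = uri;

  /* Fail only when the syntax actually needs a base URI. */
  if(!base_uri && needs_base)
    return 1;

  *resolved = base_uri;  /* may be NULL for syntaxes that do not need one */
  return 0;
}
```

Parsing from a string with no base URI would then succeed for syntaxes that do not need one, and still fail early for those that do. Passing a base URI every time remains the safe option.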
Dave