Trying to crawl your site gives 500 internal server error?

Daan_Timmer · March 23, 2009, 3:07pm

Hello sparkfun,

For a school assignment I have to make a nice simple list of different available sensors. They also want to know which (online) shops sell what sensors.

Maintaining that by hand would be a pain in the behind, so I thought of the following: make a very simple crawler, specific to a few sites (but expandable to other sites as well).

So it starts at the basics,

<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.sparkfun.com/");
?>

But that gives me:

Warning: DOMDocument::loadHTMLFile(http://www.sparkfun.com/) [function.DOMDocument-loadHTMLFile]: failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error in D:\webserver\htdocs\school\core\crawler.class.php on line 19

Warning: DOMDocument::loadHTMLFile() [function.DOMDocument-loadHTMLFile]: I/O warning : failed to load external entity "http://www.sparkfun.com/" in D:\webserver\htdocs\school\core\crawler.class.php on line 19

So, before continuing, I would like to know: am I allowed to crawl your site for this purpose? (Never bad to ask for such a thing, right? :P) Secondly, why do I get the 500 status? >_< I tried Google and that works without a problem, although it gives a sh*t load of tag errors.

And, perhaps a ruder question: would it be possible for you to supply a data file with a dump of your products database? That would make it even easier.

Thanks in advance,

Daan Timmer

p.s. Why can I post this topic as a sticky? bug?

riden · March 23, 2009, 7:45pm

Instead of pulling down potentially megabytes of data, most of which you won’t need, why not download their PDF catalog?

http://www.sparkfun.com/commerce/downlo … atalog.pdf

Daan_Timmer · March 23, 2009, 7:56pm

Well, hehe, megabytes O.- I don’t think we will reach that very fast.

The HTML is nice and clean, so.

When crawling I don’t pull any images, only the HTML.

I only want a certain portion of the products (sensors category).

And I wouldn’t mind using the PDF catalog, but that is 4.5 MB in size; that way we would pull way more data off the servers.

It is also not like we fetch data on each request. More like an update about once a week that takes less than 1 minute to complete.

Another thing: if you can provide me with a PHP PDF reader that can extract the data the way I need it, then I would use the PDF immediately.

I’ve found a workaround that works.

Will be waiting for a reply from an official SparkFun member to tell me whether I am allowed to use this data or not.

busonerd · March 23, 2009, 9:43pm

Email them.

Daan_Timmer · March 23, 2009, 11:41pm

Sure, no problem, if there would be an email address to mail to…

I’ve searched the site but there was no email address >< Only about technical (product) issues ><

Or I am blind! That is possible too of course!

reklipz · March 24, 2009, 10:47am

http://www.sparkfun.com/commerce/popup_feedback.php

Hope this helps!

Frencil · April 1, 2009, 2:48pm

Daan Timmer,

Try this:

<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("http://www.sparkfun.com/commerce/");
?>

There’s a lot of stuff on the SparkFun domain that you’re not going to have access to. The entire storefront lives under that directory.

Hope that helps!
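
For anyone following along, here is a rough sketch of how the fetch and the parse could be split, so that a failed request (like the 500 above) is caught before DOMDocument sees it, and the tag errors mentioned earlier are collected quietly instead of printed as warnings. The helper names fetchHtml() and extractLinks() and the User-Agent string are made up for illustration; nothing here is confirmed as what the SparkFun server expects.

```php
<?php
// Sketch only: fetchHtml() and extractLinks() are hypothetical helpers.

/**
 * Fetch a page as a string, or return null if the request fails
 * (for example on an HTTP 500 response).
 */
function fetchHtml(string $url): ?string
{
    // Assumption: identifying the crawler and setting a timeout is polite;
    // it is not known whether this affects the 500 response.
    $context = stream_context_create([
        'http' => [
            'user_agent' => 'SensorListCrawler/0.1 (school project)',
            'timeout'    => 10,
        ],
    ]);

    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

/**
 * Parse an HTML string and return the href of every <a> element.
 */
function extractLinks(string $html): array
{
    if ($html === '') {
        return [];
    }

    // Keep libxml parse errors internal instead of emitting a warning
    // for every malformed tag.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}
```

Usage would be something like calling fetchHtml("http://www.sparkfun.com/commerce/"), checking the result for null, and only then passing it to extractLinks(). Keeping the parse step separate also means it can be tried on a saved copy of a page without hitting the server again.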
