StarExec

Creating Subsets


Post by timking Mon Mar 30, 2015 9:10 am

I currently have a lot of trouble creating representative subsets of existing spaces. The main idea is that, given a space, one should be able to select and run an arbitrary subset of the hierarchy. Creating representative subsets was extremely useful at NYU for testing with our mini-cluster, and it is something I often find myself wanting to do. I am not sure whether I am just not thinking through the available tools properly or whether there really isn't good support for this yet, so I thought I would ask others if and how they do this. I am especially interested in whether the StarExec devs have tried to do this.

Task 0: List Paths
Given a root space, list all of the benchmark ids accessible in the hierarchy and, for each path from the root space to a benchmark, print the path as a string of space names. (Alternatively, print the space ids along the path.) This is more of a warm-up for the next tasks.
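A minimal sketch of Task 0 in Python, assuming the space XML nests `Space` elements that carry `name` attributes and hold `Benchmark` elements with `id` and `name` attributes. These tag and attribute names, and the sample document, are assumptions for illustration; check them against a real space XML download before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical space XML; tag and attribute names are assumptions.
SAMPLE = """
<Spaces>
  <Space id="1" name="r">
    <Space id="2" name="s1"><Benchmark id="101" name="bench1"/></Space>
    <Space id="3" name="s2"><Benchmark id="102" name="bench2"/></Space>
  </Space>
</Spaces>
"""

def list_paths(space, prefix=()):
    """Yield (benchmark id, path string) for every benchmark under `space`."""
    path = prefix + (space.get("name"),)
    # Benchmarks directly inside this space.
    for bench in space.findall("Benchmark"):
        yield bench.get("id"), "/".join(path + (bench.get("name"),))
    # Recurse into subspaces, extending the path.
    for sub in space.findall("Space"):
        yield from list_paths(sub, path)

root_space = ET.fromstring(SAMPLE).find("Space")
paths = list(list_paths(root_space))
for bench_id, path in paths:
    print(bench_id, path)
```

A benchmark that appears in several spaces is listed once per path, so the same id can show up on more than one line.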

Task 1: Create Subsets
Given a set of benchmark ids and a root space, create a new space inside the space $TARGET that contains exactly those benchmarks from the hierarchy under the root space whose ids are in the set. The resulting space should preserve the subspace hierarchy along the paths to the selected benchmarks. Space attributes and other objects in the space do not have to be preserved. (They can be set to default values.)
Example:
Suppose you start with the space
  (space r
    (space s1 (benchmarks bench1))
    (space s2 (benchmarks bench2)))
and you select bench2, then the output should be
  (space r
    (space s2 (benchmarks bench2)))
[with fresh ids for r and s2 of course].
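The pruning step of Task 1 could be sketched as follows in Python. This only covers the client-side edit of the space XML, not the download or upload. The `Space` and `Benchmark` tag names, the `id` and `name` attributes, and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical space XML mirroring the example above; names are assumptions.
SAMPLE = """
<Spaces>
  <Space id="1" name="r">
    <Space id="2" name="s1"><Benchmark id="101" name="bench1"/></Space>
    <Space id="3" name="s2"><Benchmark id="102" name="bench2"/></Space>
  </Space>
</Spaces>
"""

def prune(space, keep_ids):
    """Remove unselected benchmarks and empty subspaces in place.

    Returns True if `space` still contains a benchmark or a subspace.
    """
    for bench in space.findall("Benchmark"):
        if bench.get("id") not in keep_ids:
            space.remove(bench)
    # Prune children first, then drop any child that ended up empty.
    for sub in space.findall("Space"):
        if not prune(sub, keep_ids):
            space.remove(sub)
    return space.find("Benchmark") is not None or space.find("Space") is not None

doc = ET.fromstring(SAMPLE)
prune(doc.find("Space"), {"102"})
print(ET.tostring(doc, encoding="unicode"))
```

The root space is kept even if nothing under it survives; whether an empty root should be dropped as well is a design choice left open here.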


Task 2: Create Subsets (relaxed)
This is the same as Task 1, except that any subspace of the root may appear in the resulting space, which allows potentially empty spaces. (This is simply to loosen the requirements of the problem if it happens to be easier.)
Example:
For the previous example, the output where s1 is kept but empty would also be acceptable.
  (space r
    (space s1)
    (space s2 (benchmarks bench2)))

It is important for the tasks to be fully scripted as it will involve hundreds to thousands of inclusions/omissions. Doing this manually won't scale.

Alternatively, the input to Tasks 1 and 2 could be paths to benchmarks from the root space instead of benchmark ids. (This is a bit nicer, as a benchmark can appear in several different spaces reachable from the root.)
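A sketch of the path-based variant in Python, under the assumption that a path is the slash-joined space names followed by the benchmark name. The tag names, attribute names, and sample document are hypothetical. Here the same benchmark id sits in two spaces and only one copy is selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical space XML: benchmark 101 appears in both s1 and s2.
SAMPLE = """
<Spaces>
  <Space id="1" name="r">
    <Space id="2" name="s1"><Benchmark id="101" name="bench1"/></Space>
    <Space id="3" name="s2"><Benchmark id="101" name="bench1"/></Space>
  </Space>
</Spaces>
"""

def prune_by_path(space, keep_paths, prefix=""):
    """Keep only benchmarks whose full path is in `keep_paths`; drop empty subspaces.

    Returns True if `space` still contains a benchmark or a subspace.
    """
    here = prefix + space.get("name")
    for bench in space.findall("Benchmark"):
        if here + "/" + bench.get("name") not in keep_paths:
            space.remove(bench)
    for sub in space.findall("Space"):
        if not prune_by_path(sub, keep_paths, here + "/"):
            space.remove(sub)
    return space.find("Benchmark") is not None or space.find("Space") is not None

doc = ET.fromstring(SAMPLE)
prune_by_path(doc.find("Space"), {"r/s2/bench1"})
print(ET.tostring(doc, encoding="unicode"))
```

Selecting by path rather than by id is what lets the two copies be told apart.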

timking

Posts : 23
Join date : 2014-04-23


Post by Admin Mon Mar 30, 2015 3:54 pm

Hi, Tim.

I can see the benefits of a feature like this. But isn't this pretty easy to do on the client side with space XML? You download the space XML for your hierarchy, select only the benchmarks in that space XML with ids in your id set (by deleting the other benchmark elements), and then upload the modified space XML to create a new space with just those benchmarks.
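The edit step described above (deleting the benchmark elements whose ids are not in the set) might look like this in Python. It is a sketch only: the `Space` and `Benchmark` tag names, the `id` attribute, and the sample document are assumptions, and the download and upload steps are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical space XML; tag and attribute names are assumptions.
SAMPLE = """
<Spaces>
  <Space id="1" name="r">
    <Space id="2" name="s1"><Benchmark id="101" name="bench1"/></Space>
    <Space id="3" name="s2"><Benchmark id="102" name="bench2"/></Space>
  </Space>
</Spaces>
"""

def keep_only(doc, keep_ids):
    """Delete every Benchmark element whose id is not in `keep_ids`."""
    # Materialize the space list first so removal does not disturb iteration.
    for space in list(doc.iter("Space")):
        for bench in space.findall("Benchmark"):
            if bench.get("id") not in keep_ids:
                space.remove(bench)

doc = ET.fromstring(SAMPLE)
keep_only(doc, {"102"})
print(ET.tostring(doc, encoding="unicode"))
```

Spaces are left in place even when they end up empty, so s1 survives here without any benchmarks.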

Again, I'm not saying we couldn't add this, I am just wondering if I am missing something and it would be hard to do yourself.

Aaron

Admin
Admin

Posts : 162
Join date : 2014-04-22

http://www.cs.uiowa.edu/~astump/


Post by timking Tue Mar 31, 2015 8:58 am

Hi Aaron,

Let me try to explain why this seems less simple to me than you are suggesting. Supposing I knew the exact list of benchmark ids I wanted to keep (or throw out), I could make a script (of questionable quality) that looks for lines in the XML file that are Benchmark tags, uses a regexp to match the id, and drops those lines if the id is not in the set. This is to me the obvious implementation of your suggestion. It is also an implementation of Task 2, but not of Task 1, as it will leave empty spaces. (It is completely space agnostic. More worrying is that it is brittle against changes to the format of the Benchmark tag.) Also, copies of a benchmark contained in two spaces cannot be told apart, so either all copies of a benchmark are kept or all are removed.

Before you have a set of ids, though, it is very helpful to have the path to each benchmark as a string. In my prior experience making problem sets, I would often select benchmarks by matching patterns against their paths. This is an easy way to select subsets of benchmarks. So before I have the desired benchmark ids, I often want the full paths of the benchmarks as strings (relative to the root) and to select benchmarks from this list. This makes selecting families/subfamilies quite a bit simpler. Problem set construction can get a bit messy at the edges, but this is a generally useful early step. This is why I included Task 0 in my original post.
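Pattern selection over path strings could be sketched like this in Python, using shell-style globs from the standard `fnmatch` module. The tag names, attribute names, sample document, and the glob syntax itself are assumptions chosen for illustration.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical space XML; tag and attribute names are assumptions.
SAMPLE = """
<Spaces>
  <Space id="1" name="r">
    <Space id="2" name="s1"><Benchmark id="101" name="bench1"/></Space>
    <Space id="3" name="s2"><Benchmark id="102" name="bench2"/></Space>
  </Space>
</Spaces>
"""

def list_paths(space, prefix=()):
    """Yield (benchmark id, path string) for every benchmark under `space`."""
    path = prefix + (space.get("name"),)
    for bench in space.findall("Benchmark"):
        yield bench.get("id"), "/".join(path + (bench.get("name"),))
    for sub in space.findall("Space"):
        yield from list_paths(sub, path)

def select_ids(space, pattern):
    """Return the ids of benchmarks whose path matches the glob `pattern`."""
    return [bid for bid, path in list_paths(space)
            if fnmatch.fnmatchcase(path, pattern)]

root_space = ET.fromstring(SAMPLE).find("Space")
selected = select_ids(root_space, "r/s2/*")
print(selected)
```

Note that `*` in `fnmatch` also matches `/`, so `r/*` selects everything below r, not just its direct children.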

I am not saying that this can't be done, but I am not sure how to do this cleanly with a low amount of effort. The obvious answer to making it clean is to stop using line and regular expression based parsing and switch to using an xml parser, but I am hoping that someone has figured out a lighter weight solution.

Tim


Post by Admin Wed Apr 01, 2015 10:37 am

Hi, Tim.

The tasks you are describing are very easy to do by processing the XML, right? You can determine the path(s) to a benchmark in the space hierarchy simply by traversing the XML and building up the path to each benchmark element as you go. You can also select a single copy of a benchmark and drop empty spaces. I do see that there is general functionality here. But given how easy this is to do by processing the XML, and given that it would be a bit of an exercise to define a general interface that doesn't just degenerate into giving you the information in the XML in textual form, I think we will have to hold off on implementing this in StarExec for now. I hope you understand. If there is a lot of interest from others, we could revisit this, though.

There is a ramp-up cost to processing XML, sure. But it is quite worth it, as then you can do really anything you want with the space hierarchy information. You won't be left having to ask us to do something, waiting while we debate the proper general form of the functionality, and finally getting it implemented.

There are quite a few XML-related libraries for OCaml, for example (and certainly for other languages). I believe I used XML-Light (which seems to be rather old) a year ago or so, and found it quite workable.

Aaron


Post by j.waldmann Thu Apr 02, 2015 5:20 am

Hi.

Just a data point from Termination. We have this "subset selection rule" for competitions
http://termination-portal.org/wiki/Termination_Competition_Problem_Selection_Algorithm
and I implemented this in our star-exec client
https://github.com/stefanvonderkrone/star-exec-presenter/blob/master/Control/Job.hs#L136

Actually we did not apply the rule for the competition last year (we ran on the complete benchmark set)
but the "random subset" selection was still very useful for testing.
You are of course very welcome to use, fork, patch our code.

- Johannes.

PS: since I noticed the word "regexp" in the above ... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

j.waldmann

Posts : 84
Join date : 2014-04-26


Post by Admin Thu Apr 02, 2015 10:41 am

Johannes, your comment puts Tim's request in a somewhat different light for me. I can indeed see the usefulness of being able to create a job to run a small random subset of benchmarks in a space hierarchy. I too have done this before by hand.

What if we added an option to run a random subset of a space hierarchy by choosing X percent of benchmarks in each space in the hierarchy, where the user specifies X? What do you (Johannes and Tim) think? This would be another option for job creation via the web interface (and it could be added to StarExecCommand, too).
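Until such an option exists, the per-space sampling could be approximated on the client side against the space XML. The Python sketch below is not how StarExec would implement it. It assumes `Space` and `Benchmark` tag names, rounds the count up so small spaces keep at least one benchmark (an arbitrary choice), and uses a seed so runs are repeatable.

```python
import math
import random
import xml.etree.ElementTree as ET

# Hypothetical space XML; tag and attribute names are assumptions.
SAMPLE = """
<Spaces>
  <Space id="1" name="r">
    <Space id="2" name="s1">
      <Benchmark id="101" name="a"/><Benchmark id="102" name="b"/>
      <Benchmark id="103" name="c"/><Benchmark id="104" name="d"/>
    </Space>
    <Space id="3" name="s2">
      <Benchmark id="105" name="e"/><Benchmark id="106" name="f"/>
    </Space>
  </Space>
</Spaces>
"""

def sample_per_space(doc, percent, seed=0):
    """Keep roughly `percent` percent of the benchmarks in each space, in place."""
    rng = random.Random(seed)
    for space in list(doc.iter("Space")):
        benches = space.findall("Benchmark")
        # Round up: an assumption, so non-empty spaces never become empty.
        k = math.ceil(len(benches) * percent / 100)
        chosen = set(rng.sample(range(len(benches)), k))
        for i, bench in enumerate(benches):
            if i not in chosen:
                space.remove(bench)

doc = ET.fromstring(SAMPLE)
sample_per_space(doc, 50)
counts = {s.get("name"): len(s.findall("Benchmark")) for s in doc.iter("Space")}
print(counts)
```

Sampling within each space, rather than over the whole hierarchy, keeps every family represented in proportion to its size.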

Aaron
PS Johannes, I hope you'll be happy to hear that we are documenting our URL interface currently, and should have that available sometime this month.
